Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
36 commits
Select commit Hold shift + click to select a range
7dc2968
Add dataset utilities and tests
EtonT471 Mar 2, 2026
47fc954
Adding dataset_tests.jl
EtonT471 Mar 3, 2026
787a88d
Small changes to design.md
EtonT471 Mar 3, 2026
2dd2295
March 16 Updates
EtonT471 Mar 17, 2026
173cdd1
Ridge Regression file
EtonT471 Mar 17, 2026
042d5d6
dataset.jl small update
EtonT471 Mar 17, 2026
70cfa67
Updated Experimental Units and Treatments Sections
EtonT471 Mar 17, 2026
9b8c48c
Small changes
EtonT471 Mar 17, 2026
6c923ab
Bidiagonalization Stuff
EtonT471 Mar 17, 2026
6727a90
Adding Linear Algebra to Project.toml
EtonT471 Mar 17, 2026
358e881
Design Branch
EtonT471 Mar 17, 2026
2ad65a9
Updated design 3/19
EtonT471 Mar 19, 2026
fc2595f
3/19 edits
EtonT471 Mar 19, 2026
788665c
Update experimental design document
EtonT471 Mar 24, 2026
698aff7
Remove code files from design branch
EtonT471 Mar 24, 2026
f904351
Compiling issues
EtonT471 Mar 24, 2026
49c8a81
Attempt 1
EtonT471 Mar 24, 2026
8529e34
Minor adjustment
EtonT471 Mar 24, 2026
3734a46
restoring .jl files
EtonT471 Mar 24, 2026
48edb29
moving to src
EtonT471 Mar 24, 2026
8c2e5bc
Put source file back into the folder.
vp314 Mar 24, 2026
8192f16
Delete docs/src/RidgeRegression.jl
vp314 Mar 24, 2026
8a6525e
Source folder re-added
EtonT471 Mar 24, 2026
dc10590
Fixed math notation
vp314 Mar 24, 2026
6b893ef
recent changes
EtonT471 Apr 7, 2026
a605e40
Updated design.md
EtonT471 Apr 7, 2026
d9a63e7
Remove RidgeRegression.jl changes from Design branch
EtonT471 Apr 7, 2026
c29b983
fixing compiling
EtonT471 Apr 7, 2026
0ba580c
Fixing dollar signs
EtonT471 Apr 7, 2026
e080193
XtopX change
EtonT471 Apr 7, 2026
26a6198
Update 4/13
EtonT471 Apr 13, 2026
cf1e17e
4/14 Update
EtonT471 Apr 15, 2026
9ca4815
4/28 updates
EtonT471 Apr 28, 2026
2c641dd
Fixed compiling issue
EtonT471 Apr 28, 2026
06f1c06
More compiling fixed
EtonT471 Apr 28, 2026
c932604
Updates to experimental design
vp314 May 21, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -5,3 +5,4 @@
/docs/build/
Manifest.toml
build/
.DS_Store
10 changes: 9 additions & 1 deletion Project.toml
Original file line number Diff line number Diff line change
@@ -1,7 +1,15 @@
name = "RidgeRegression"
uuid = "739161c8-60e1-4c49-8f89-ff30998444b1"
authors = ["Vivak Patel <vp314@users.noreply.github.com>"]
version = "0.1.0"
authors = ["Eton Tackett <etont@icloud.com>", "Vivak Patel <vp314@users.noreply.github.com>"]

[deps]
CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
Downloads = "f43a241f-c20a-4ad4-852c-f6b1247861c6"

[compat]
CSV = "0.10.15"
DataFrames = "1.8.1"
Downloads = "1.7.0"
julia = "1.12.4"
1 change: 1 addition & 0 deletions docs/make.jl
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@ makedocs(;
),
pages=[
"Home" => "index.md",
"Design" => "design.md",
],
)

Expand Down
162 changes: 162 additions & 0 deletions docs/src/design.md
Comment thread
vp314 marked this conversation as resolved.
Original file line number Diff line number Diff line change
@@ -0,0 +1,162 @@
# Motivation and Background
Many modern science problems involve regression problems with extremely large numbers of
predictors.
[Genome-wide association studies (GWAS)](https://doi.org/10.1371/journal.pone.0245376),
for example, try to identify genetic
variants associated with a disease phenotype using hundreds of thousands or millions of
genomic features. In such settings, traditional least squares methods fail.
Penalized Least Squares (PLS) extends ordinary least squares (OLS)
regression by adding a penalty term to shrink parameter estimates. Ridge regression, an
approach within PLS, adds a regularization term, producing a regularized estimator.

Mathematically, ridge regression estimates the regression coefficients by solving the
penalized least squares problem
```math
\hat{\boldsymbol{\beta}} =
\arg\min_{\boldsymbol{\beta}}
\left(
\| \mathbf{y} - X\boldsymbol{\beta} \|_2^2
+
\lambda \| \boldsymbol{\beta} \|_2^2
\right)
Comment on lines +15 to +21
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should indicate which norms you are using by using a subscript _2 for instance.

```
where $\lambda > 0$ is a regularization parameter that controls the strength of the penalty.

The purpose of ridge regression is to stabilize regression estimates when the predictors are
highly correlated or the design matrix $X$ is nearly singular. Ridge regression modifies the
least squares objective by adding a penalty on the squared $\ell_2$-norm of the coefficient
vector. The estimator is obtained by minimizing a penalized least squares objective in which
large coefficient values are discouraged through the penalty term $\lambda
\|\boldsymbol{\beta}\|_2^2$. This penalty shrinks the estimated coefficients toward the
origin, which reduces the variance of the estimator and mitigates the effects of
multicollinearity.

There are many numerical algorithms available to compute ridge regression estimates
including direct methods, Krylov subspace methods, gradient-based optimization, coordinate
descent, and stochastic gradient descent. These algorithms differ in their computational
costs and numerical stability.

The goal of this experiment is to investigate the performance of these algorithms when we
vary the structure and scale of the regression problem. To do this, we consider the linear
model $\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ where the matrix ${X}$
may be constructed with varying dimensions, sparsity patterns, and conditioning properties.
# Questions
The primary goal of this experiment is to compare numerical algorithms for computing ridge
regression estimates under various conditions. In particular, we aim to address the
following questions:

1. How does the performance of ridge regression algorithms change as the structural
and numerical properties of the regression problem vary?

2. Which ridge regression algorithm provides the best balance between numerical
stability and computational cost across these problem regimes?

# Experimental Units
An experimental unit is defined by a triplet: a design matrix, $X$; a response vector, $y$;
and a non-negative regularization parameter $\lambda$.
Experimental units are selected based on factors we believe will potentially impact
the performance of a specific algorithm.
These factors, or blocks, correspond to:
- The dimensional regime of $X \in \mathbb{R}^{n \times p}$:
$p \ll n$, $p ≈ n$, and $p \gg n$.
- The sparsity of the design matrix: measured as a fraction in $[0,1]$.
- The magnitude of the regularization parameter, $\lambda$, as described below.

The regularization parameter is often selected using one of the following strategies,
depending on application domain:

- [Generalized Cross Validation](https://doi.org/10.1080/00401706.1979.10489751)
- [The L-Curve Method](https://doi.org/10.1137/1034115)
- [Information Criteria](https://doi.org/10.1002/wics.1607)
- [Morozov's Discrepancy Principle](https://doi.org/10.1088/0266-5611/26/2/025001)

Excluding Morozov's discrepancy principle, the remaining methods require
computing $\hat{\beta}_\lambda = (X^\top X + \lambda I)^{-1}X^\top y$ or
$\Vert X\hat{\beta}_\lambda - y \Vert_2$ over a grid of values for $\lambda$.
Given our assumption that problem conditioning will impact algorithm behavior,
we will choose this grid to modify the problem's condition number systematically.

Recall, the ridge estimator's closed form is given by solving
```math
\min_{\beta} \left\Vert \begin{bmatrix} X \\ \lambda I \end{bmatrix}\beta -
y \right\Vert_2^2,
```
or (equivalently, from a mathematical perspective) solving
$(X^\top X + \lambda I)\hat{\beta}_\lambda = X^\top y$.
Let $\sigma_{\max}$ denote the largest singular value
of $X$, and let $\sigma_{\min}$ denote the smallest singular value of
$X$ (note, $0$ is allowed).
Then the condition number for the normal systems approach is
```math
\kappa(X^\top X+\lambda I)
=
\frac{\sigma_{\max}^2+\lambda}{\sigma_{\min}^2+\lambda}.
```

There are two cases we consider.
- Case 1: $\sigma_{\min}=0$. In this case, the least squares problem is ill-posed. We choose
the smallest value of $\lambda$ in the grid to be
$\sigma_{\max}^2 / (10^{12} - 1)$, which ensures
$\kappa(X^\top X + \lambda I) = 10^{12}$.
- Case 2: $\sigma_{\min}>0$. In this case, we choose the smallest value of $\lambda$ in the
grid to be $\sigma_{\max}\sigma_{\min}$, which ensures
$\kappa(X^\top X + \lambda I) = \sigma_{\max}/\sigma_{\min}$.
That is, it sets the regularized normal system's condition number to that of the original
least squares problem.

In both cases, assuming $\sigma_{\max}^2 > 2 \sigma_{\min}$, we choose the larges value of
$\lambda$ to be $\sigma_{\max}^2 - 2 \sigma_{\min}$. The condition number for the
regularized normal equations is $\kappa(X^\top X + \lambda I)=2$.


The total number of block combinations is determined by the product of the number of levels
in each blocking factor, denoted b. For example, if the experiment includes three
dimensional regimes, two sparsity levels, and two regularization strengths, then there are
$3 * 2 * 2 = 12$ block combinations. We will also denote r to be the number of replicated
datasets in each block. Here, we mean the number datasets within a block. The total number
of experimental units is then ${b * r}$.


| Blocking System | Factor | Blocks |
|:----------------|:-------|:-------|
| Dataset | Dimensional regime | $(p \ll n)$, $(p \approx n)$, $(p \gg n)$|
| Ridge Penalty | Magnitude of ${\lambda}$ relative to the spectral scale of $X^\top X$ | Strong: $\kappa \approx 10$, Moderate: $\kappa \approx 10^3$, Weak: $\kappa \approx 10^6$, with $\lambda$ computed from the corresponding $\epsilon$. |
| Matrix Sparsity| Density of non-zero values in $X$ | Sparse (< 10% non-zero), Moderate (10%-50% non-zero), Dense (> 50% non-zero)|

# Treatments

The treatments are the ridge regression solution methods:

- Gradient-based optimization
- Stochastic gradient descent
- Direct Methods
- Golub Kahan Bidiagonalization

Since each experimental unit will recieves all t treatments, the total number of algorithm
runs in the experiment is ${t * b * r}$. For this experiment, ${t=3}$. To ensure fair
comparison between algorithms, each treatment will be applied under a fixed wall time
constraint. Each algorithm will be run for a maximum of two hours per experimental unit.
# Observational Units and Measurements

The observational units are each algorithm-dataset pair. For each combination we will observe the following

| Column Name | Data Type | Description |
|:---|:---|:---|
| `dataset_id` | Positive Integer | Identifier for the generated dataset (experimental unit). |
| `dimensional_regime` | String | Relationship between predictors and observations: `p << n`, `p ≈ n`, or `p >> n`. |
| `sparsity_level` | String | Density of the matrix `X`: `Sparse`, `Moderate`, or `Dense`. |
| `lambda_level` | String | Relative magnitude of the ridge penalty parameter `λ`: `Weak`, `Moderate`, or `Strong`. |
| `algorithm` | String | Ridge regression solution method used: `GradientDescent`, `SGD`, or `DirectMethod`. |
| `runtime_seconds` | Positive Floating-point | Time required for the algorithm to compute a solution. |
| `iterations` | Positive Integer | Number of iterations performed by the algorithm (`NA` for direct methods). |
| `step_size` | Positive Floating-point | Step size used in gradient descent or SGD (`NA` for direct methods). |
| `batch_size` | Positive Integer | Number of samples used per SGD update (`NA` for direct methods and gradient descent). |
| `number_of_epochs` | Positive Integer | Number epochs per observation (`NA` for direct methods). |



The collected measurements will be written to a CSV file. Each row in the file corresponds
to a single algorithm–dataset pair, which forms the observational unit of the experiment.
The columns represent the recorded measurements. After the experiment, the resulting CSV
file should contain ${Algorithms∗Datasets}$ number of rows and each row will contain exactly
10 columns.
3 changes: 3 additions & 0 deletions test/Project.toml
Original file line number Diff line number Diff line change
@@ -1,2 +1,5 @@
[deps]
CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"