Gradient Descent
Pseudo-code
function update_weights(x, y, m, c, alpha):
    # Step 1: Calculate total error (cost)
    J = 0
    for i from 0 to len(x)-1:
        prediction = m * x[i] + c
        error = y[i] - prediction
        J += error^2
    # Step 2: Calculate gradients (direction to move)
    dJ_dm = 0
    dJ_dc = 0
    for i from 0 to len(x)-1:
        prediction = m * x[i] + c
        error = y[i] - prediction
        dJ_dm += -2 * x[i] * error
        dJ_dc += -2 * error
    # Step 3: Update parameters (take a step downhill)
    m = m - alpha * (dJ_dm / len(x))
    c = c - alpha * (dJ_dc / len(x))
    return m, c

function gradient_descent(x, y, m, c, alpha, iterations):
    for step from 1 to iterations:
        m, c = update_weights(x, y, m, c, alpha)
        # Think: each loop is one small step downhill
    return m, c
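The pseudo-code above can be sketched as runnable Python. The data points at the bottom are made-up for illustration (they lie exactly on y = 2x + 1, so the fitted slope and intercept should approach 2 and 1):

```python
# Minimal sketch of the pseudo-code above, using plain Python lists.
def update_weights(x, y, m, c, alpha):
    n = len(x)
    # Gradients of the mean squared error with respect to m and c
    dJ_dm = 0.0
    dJ_dc = 0.0
    for xi, yi in zip(x, y):
        error = yi - (m * xi + c)
        dJ_dm += -2 * xi * error
        dJ_dc += -2 * error
    # Take one step downhill, scaled by the learning rate
    m = m - alpha * (dJ_dm / n)
    c = c - alpha * (dJ_dc / n)
    return m, c

def gradient_descent(x, y, m=0.0, c=0.0, alpha=0.01, iterations=10000):
    for _ in range(iterations):
        m, c = update_weights(x, y, m, c, alpha)
    return m, c

# Made-up data lying exactly on y = 2x + 1
m, c = gradient_descent([1, 2, 3, 4], [3, 5, 7, 9])
print(m, c)  # m converges toward 2, c toward 1
```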
- x: The input data (like hours studied or the size of a house). Think of it as the 'given information'.
- y: The actual correct output. This is what we want our model to match (like real exam scores or house prices).
- m: The slope of the line. It controls how steep the line is. Imagine tilting a stick up or down.
- c: The y-intercept. It shifts the line up or down. Think of sliding the line vertically.
- alpha: The learning rate. It controls how big each step is when adjusting the model. Small steps = slow but safe; big steps = fast but risky.
- J: The cost (error). It measures how wrong the model is. Think of it like a score where lower is better.
- 0: A common starting value (like m = 0, c = 0). This means we start with a flat line.
- 0.01: A typical small learning rate. It ensures we take careful steps instead of jumping wildly.
- n: The number of data points. We divide by n to get the average error.
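The "small steps = slow but safe, big steps = fast but risky" trade-off can be seen on the simplest possible cost, J(w) = w². This is a toy sketch, not part of the recipe: the gradient is 2w, so each update is w = w - alpha * 2 * w.

```python
# Toy 1-D descent on J(w) = w^2, whose gradient is 2w.
def descend(alpha, steps=20, w=1.0):
    for _ in range(steps):
        w = w - alpha * 2 * w   # one gradient step
    return w

small = descend(alpha=0.01)  # careful steps: shrinks toward 0, but slowly
good = descend(alpha=0.1)    # reaches near 0 much faster
big = descend(alpha=1.1)     # overshoots: |w| grows with every step
print(small, good, big)
```

With alpha = 1.1 each step multiplies w by (1 - 2.2) = -1.2, so the iterate bounces across the minimum and diverges, which is exactly the "big steps = risky" case.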
Recipe Steps
1. Start with data points. Imagine dots on a graph (like study hours vs. exam scores).
2. Guess a line. Start with a random line (your first guess). It will probably be wrong.
3. Measure how wrong you are. Calculate how far your line is from the real points (this is the cost J).
4. Adjust the line slightly. Move it in the direction that reduces the error, like adjusting a tilted board to better match the points.
5. Repeat many times. Each step improves the line little by little, like walking downhill toward the lowest point.
6. Stop when the changes are tiny. You have reached the best-fit line.
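The recipe above, including the "stop when changes are tiny" step, can be sketched end to end. The hours/scores numbers here are invented for illustration; the exact least-squares answer for them is m = 5.1, c = 46.7, so the loop should land very close to that:

```python
# Made-up data: hours studied vs. exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 57, 61, 68, 72]

def cost(m, c):
    # Mean squared error of the line y = m*x + c
    return sum((y - (m * x + c)) ** 2 for x, y in zip(hours, scores)) / len(hours)

m, c, alpha = 0.0, 0.0, 0.01          # step 2: first guess (a flat line)
prev = cost(m, c)                      # step 3: measure how wrong we are
n = len(hours)
for step in range(100000):             # step 5: repeat many times
    dm = sum(-2 * x * (y - (m * x + c)) for x, y in zip(hours, scores)) / n
    dc = sum(-2 * (y - (m * x + c)) for x, y in zip(hours, scores)) / n
    m -= alpha * dm                    # step 4: adjust the line slightly
    c -= alpha * dc
    J = cost(m, c)
    if abs(prev - J) < 1e-9:           # step 6: stop when changes are tiny
        break
    prev = J
print(m, c)
```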
Questions & Answers
- What is a good intuition for gradient descent? Like walking down a foggy mountain: you cannot see the bottom, but you can feel the slope and step downward.
- What happens if the learning rate is badly chosen? Too small = very slow; too big = unstable.
- Why normalize the data first? Not normalizing the data can make learning unstable.
- Does it always find the best solution? Gradient descent may find a local minimum, not always the global one.
- How can you tell if learning is working? Plot the cost over time and check that it is decreasing.
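The "plot the cost over time" check can be sketched without a plotting library by simply recording J at every iteration; if learning is working, the recorded values slope steadily downward. The three-point data set here is made up:

```python
# Record the cost J at each iteration and confirm it keeps decreasing.
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]   # made-up data on the line y = 2x
m, c, alpha = 0.0, 0.0, 0.05
n = len(x)

history = []
for _ in range(50):
    errors = [yi - (m * xi + c) for xi, yi in zip(x, y)]
    history.append(sum(e * e for e in errors) / n)   # current cost J
    m -= alpha * sum(-2 * xi * e for xi, e in zip(x, errors)) / n
    c -= alpha * sum(-2 * e for e in errors) / n

# A healthy run: the first cost is large, the last is much smaller.
print(history[0], history[-1])
```

Feeding `history` to any plotting tool gives the classic downward-sloping cost curve; a curve that flattens too early suggests alpha is too small, and one that jumps around suggests it is too large.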
Related Algorithms
Decision Tree (ID3 / CART)
A Decision Tree builds a flowchart-like structure that splits data step by step using features, until it can make a clear prediction.
K-Means Clustering
K-Means Clustering groups data points into k clusters by repeatedly assigning points to the nearest center and updating those centers.
Longest Common Subsequence (LCS)
The Longest Common Subsequence (LCS) finds the longest sequence of characters that appear in the same order in both sequences (not necessarily next to each other).