
Gradient Descent

Roll downhill to find the minimum — the engine of machine learning

Gradient descent finds the minimum of a function by repeatedly stepping in the direction of steepest descent. The gradient (derivative) tells you which way is "downhill," and the learning rate controls how big each step is.

The graph shows f(x) = x⁴ − 3x² + 2, which has two valleys (local minima). Starting from a point, the algorithm follows the slope downhill, step by step, until it reaches a valley. This is exactly how neural networks learn — they "roll downhill" on a loss landscape.
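The descent described above can be sketched in a few lines of Python (the starting point, learning rate, and iteration count are illustrative choices, not fixed by the page):

```python
def f(x):
    """The graph's example function: f(x) = x^4 - 3x^2 + 2."""
    return x**4 - 3*x**2 + 2

def df(x):
    """Its derivative: f'(x) = 4x^3 - 6x."""
    return 4*x**3 - 6*x

x = 2.0        # starting point
alpha = 0.05   # learning rate
for _ in range(100):
    x = x - alpha * df(x)   # step opposite to the slope, i.e. downhill

print(round(x, 4))  # ≈ 1.2247, the right-hand valley at x = sqrt(3/2)
```

Each iteration moves against the slope, so the point slides down the right-hand wall of the curve and settles in the nearby valley.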

Ask the AI "Start at x = 2 and descend" or "What happens with a large learning rate?"

[Interactive graph: f(x) = x⁴ − 3x² + 2]

FAQ

What is gradient descent?
An optimization algorithm that finds function minima by iterating: x_{n+1} = x_n − α · f'(x_n), where α is the learning rate. At each step, you move in the direction opposite to the gradient (downhill).
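For instance, a single step of this update on the graph's function, with an illustrative starting point and learning rate:

```python
alpha = 0.05               # learning rate
x = 2.0                    # current position
grad = 4*x**3 - 6*x        # f'(2) = 32 - 12 = 20
x_next = x - alpha * grad  # 2 - 0.05 * 20 = 1.0
print(x_next)              # 1.0
```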
What is the learning rate?
The learning rate α controls step size. Too small: slow convergence, many steps needed. Too large: you overshoot the minimum and may diverge. Finding the right learning rate is a key challenge in machine learning.
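Both failure modes show up on the graph's function. A sketch, with the specific α values chosen for illustration:

```python
def df(x):
    return 4*x**3 - 6*x   # derivative of f(x) = x^4 - 3x^2 + 2

def descend(x, alpha, steps=100):
    for _ in range(steps):
        x = x - alpha * df(x)
        if abs(x) > 1e6:          # flew far off the graph: diverged
            return None
    return x

print(descend(2.0, alpha=0.05))   # converges near the valley at ~1.2247
print(descend(2.0, alpha=0.3))    # overshoots harder each step -> None
```

With α = 0.3 the very first step jumps from x = 2 clear across to x = −4, where the slope is even steeper, so each subsequent step overshoots by more and the iterate runs away.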
What is the difference between local and global minimum?
A local minimum is a valley that's lower than its immediate surroundings but may not be the absolute lowest point. A global minimum is the lowest point overall. Gradient descent can get stuck in local minima — it only sees the local slope.
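On the graph's function the two valleys happen to be equally deep (at x ≈ ±1.2247, both with f = −0.25), but the key point still shows: the starting point decides which valley you land in. A sketch:

```python
def df(x):
    return 4*x**3 - 6*x    # derivative of f(x) = x^4 - 3x^2 + 2

def descend(x, alpha=0.05, steps=200):
    for _ in range(steps):
        x -= alpha * df(x)
    return x

print(round(descend(2.0), 4))    # right valley, near  1.2247
print(round(descend(-0.5), 4))   # left valley,  near -1.2247
# Starting exactly at x = 0 (a local maximum) the slope is 0,
# so gradient descent never moves at all.
```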
How is this used in machine learning?
Neural networks have a "loss function" that measures prediction error. Gradient descent adjusts the model's parameters to minimize this loss — literally rolling downhill on a high-dimensional landscape. Every AI model you use was trained this way.
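A toy version of this idea: a one-parameter linear model fit by gradient descent on mean squared error. The data, learning rate, and iteration count here are made up for illustration:

```python
# Toy data drawn from y = 2x, so the best weight is w = 2.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w = 0.0        # model: prediction = w * x
alpha = 0.05   # learning rate

for _ in range(100):
    # loss = mean((w*x - y)^2); its gradient with respect to w:
    grad = sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)
    w -= alpha * grad   # roll downhill on the loss

print(round(w, 3))  # ≈ 2.0, the weight that minimizes the loss
```

Real networks do exactly this, just with millions of parameters instead of one, and with the gradient computed by backpropagation.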