Why Momentum Works - https://news.ycombinator.com/item?id=14034426 - April 2017 (95 comments)
var s = 3
var x = xy[0]; var y = xy[1]*s
var fx = (1-x)*(1-x) + 20*(y - x*x )*(y - x*x )
var dfx = [-2*(1-x) - 80*x*(-x*x + y), s*40*(-x*x + y)]
The interactive example uses an initial guess of [-1.21, 0.853] and a fixed 150 iterations, with no convergence test.From manually fiddling with (step-size) alpha & (momentum) beta parameters, and editing the code to specify a smaller number of iterations, it seems quite difficult to tune this momentum-based approach to get near the minima and stay there without bouncing away in 50 iterations or fewer.
Out of curiosity, I compared minimising this bananaf function with scipy.optimize.minimize, using the same initial guess.
If we force scipy.optimize.minimize to use method='cg', leaving all other parameters as defaults, it converges to the optimal solution of [1.0, 1./3.] requiring 43 evaluations of fx and dfx,
If we allow scipy.optimize.minimize to use all defaults -- including the default method='bfgs', it converges to the optimal solution after only 34 evaluations of fx and dfx.
Under the hood, scipy's method='cg' and method='bfgs' solvers do not use a fixed step size or momentum to determine the step size, but instead solve a line search problem. The line search problem is to identify a step size that satisfies a sufficient decrease condition and a curvature condition - see Wolfe conditions [1]. Scipy's default line search method -- used for cg and bfgs -- is a python port [2] of the dcsrch routine from MINPACK2. A good reference covering line search methods & BFGS is Nocedal & Wright's 2006 book Numerical Optimization.
[1] https://en.wikipedia.org/wiki/Wolfe_conditions [2] https://github.com/scipy/scipy/blob/main/scipy/optimize/_dcs...
Momentum, or specifically inertia, in physics, what the hell is it? There's a Feynman tale where he asked his father why the ball rolled to the back of a trolly when he pulled the trolley. The answer he received was the usual description of inertia, but also the rarely given insight that describing something and giving it a name is completely different from knowing why it happens.
It's one of those things that I lie in bed thinking about. The other one is position, I can grasp the notion of spacetime and the idea of movement and speed as changes in position in space relative to position in time. I really don't have a grasp of what position is though. I know the name, I can attach the numbers to it, but that doesn't really cover what the numbers are of though.