| Name | Description |
|------|-------------|
| Adagrad | Automatically tunes the step size for each feature. Features with steep slopes rapidly get smaller steps, while features with shallow slopes get larger steps. Either way, the step size for each feature is strictly decreasing over time. |
| Adam | Adam, or "Adaptive Moment Estimation", is another schedule that automatically tunes the step size for each coefficient. It builds on the theoretical foundation of RMSProp and addresses some issues that RMSProp and Adagrad have. This is the currently recommended implementation for adaptive gradients and should be safe to use without manual tuning of the constructor parameters. |
| Fixed | A simple step rule that always provides a fixed step size to the descent. |
| RmsProp | Essentially takes a moving average of the squares of the gradient and uses that to calculate step sizes. As with Adagrad, steeper slopes lead to smaller step sizes while shallower slopes lead to larger step sizes. Unlike Adagrad, step sizes are not necessarily strictly decreasing. |
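To make the differences between these schedules concrete, the following is a minimal Python sketch of the update rule each one uses. The function names, parameter names, and default values here are illustrative assumptions, not part of this library's API; they only mirror the standard formulations of Fixed, Adagrad, RMSProp, and Adam steps described in the table above.

```python
import numpy as np

# Illustrative step rules; `grad` may be a scalar or a NumPy array of
# per-feature gradients, and `state` is a dict carrying running statistics.

def fixed_step(grad, lr=0.01):
    """Fixed: the same step size at every iteration."""
    return lr * grad

def adagrad_step(grad, state, lr=0.01, eps=1e-8):
    """Adagrad: accumulate squared gradients, so per-feature steps only shrink."""
    state["sum_sq"] = state.get("sum_sq", 0.0) + grad ** 2
    return lr * grad / (np.sqrt(state["sum_sq"]) + eps)

def rmsprop_step(grad, state, lr=0.01, decay=0.9, eps=1e-8):
    """RMSProp: exponential moving average of squared gradients, so step
    sizes can grow again when recent gradients become smaller."""
    state["avg_sq"] = decay * state.get("avg_sq", 0.0) + (1 - decay) * grad ** 2
    return lr * grad / (np.sqrt(state["avg_sq"]) + eps)

def adam_step(grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: moving averages of the gradient (first moment) and its square
    (second moment), with bias correction for the early iterations."""
    t = state.get("t", 0) + 1
    state["t"] = t
    state["m"] = beta1 * state.get("m", 0.0) + (1 - beta1) * grad
    state["v"] = beta2 * state.get("v", 0.0) + (1 - beta2) * grad ** 2
    m_hat = state["m"] / (1 - beta1 ** t)
    v_hat = state["v"] / (1 - beta2 ** t)
    return lr * m_hat / (np.sqrt(v_hat) + eps)
```

The key contrast is visible in the accumulators: Adagrad sums squared gradients forever, so its denominator only grows and its steps only shrink, while RMSProp and Adam use decaying averages, which lets their step sizes recover; Adam additionally smooths the gradient itself and applies bias correction, which is why it tends to need the least manual tuning.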