Adam
- class viabel.Adam(learning_rate, *, beta1=0.9, beta2=0.999, jitter=1e-08, iterate_avg_prop=0.2, diagnostics=False)
Adam optimization method (Kingma and Ba, 2015)
Tracks exponential moving averages of the gradient and of the squared gradient:
\[\begin{split}m^{(k+1)} &= \beta_1 m^{(k)} + (1-\beta_1) \hat{g}^{(k)}\\
\nu^{(k+1)} &= \beta_2 \nu^{(k)} + (1-\beta_2) \hat{g}^{(k)} \cdot \hat{g}^{(k)}\end{split}\]
and uses \(m^{(k)}\) and \(\nu^{(k)}\) to rescale the current stochastic gradient, giving the descent direction
\[m^{(k)}/\sqrt{\nu^{(k)}}\]
(a minimal NumPy sketch of this update follows the parameter list below).
- Parameters:
- beta1 : float, optional
Gradient moving average hyperparameter. The default is 0.9.
- beta2 : float, optional
Squared gradient moving average hyperparameter. The default is 0.999.
- jitter : float, optional
Small value used for numerical stability. The default is 1e-8.
- component_wise : bool, optional
Indicator for component-wise normalization of the descent direction.
- Returns:
- descent_dir : numpy.ndarray, shape (var_param_dim,)
Descent direction of the optimization algorithm.
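To make the update concrete, here is a minimal NumPy sketch of a single Adam step. It mirrors the equations above; the function name and the use of jitter in the denominator are illustrative assumptions, not viabel internals.

```python
import numpy as np

def adam_step(m, nu, g_hat, beta1=0.9, beta2=0.999, jitter=1e-8):
    """One Adam update: moment estimates and rescaled descent direction.

    Illustrative sketch of the equations above, not viabel's implementation.
    """
    m = beta1 * m + (1 - beta1) * g_hat             # moving average of the gradient
    nu = beta2 * nu + (1 - beta2) * g_hat * g_hat   # moving average of the squared gradient
    descent_dir = m / (np.sqrt(nu) + jitter)        # jitter assumed here for numerical stability
    return m, nu, descent_dir
```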
Methods
- descent_direction(grad)
Compute descent direction for optimization.
- optimize(n_iters, objective, init_param[, ...])
Run the optimization after resetting \(m\) and \(\nu\), the exponential moving averages of the gradient and squared gradient, respectively.
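A hedged usage sketch follows. Only the constructor and the optimize signature come from this page; the toy objective, a plain callable returning the stochastic objective value and its gradient, is a hypothetical stand-in for viabel's variational objective classes.

```python
import numpy as np
from viabel import Adam

rng = np.random.default_rng(0)

def toy_objective(var_param):
    # Hypothetical stand-in for a viabel variational objective:
    # a noisy quadratic and its stochastic gradient.
    noise = 0.01 * rng.standard_normal(var_param.shape)
    return np.sum(var_param ** 2), 2 * var_param + noise

opt = Adam(learning_rate=0.05, beta1=0.9, beta2=0.999, jitter=1e-8)
results = opt.optimize(500, toy_objective, np.ones(3))
```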
- __init__(learning_rate, *, beta1=0.9, beta2=0.999, jitter=1e-08, iterate_avg_prop=0.2, diagnostics=False)
- Parameters:
- learning_rate : float
Tuning parameter that determines the step size.
- weight_decay : float
L2 regularization weight.
- iterate_avg_prop : float
Proportion of iterates to use for computing the iterate average. None means no iterate averaging. The default is 0.2.
- diagnostics : bool, optional
Record diagnostic information if True. The default is False.
- descent_direction(grad)
Compute descent direction for optimization.
Default implementation returns grad. For Adam, the descent direction is instead the rescaled gradient \(m^{(k)}/\sqrt{\nu^{(k)}}\) described above.
- Parameters:
- grad : numpy.ndarray, shape (var_param_dim,)
(Stochastic) gradient of the objective function.
- Returns:
- descent_dir : numpy.ndarray, shape (var_param_dim,)
Descent direction of the optimization algorithm.
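To illustrate what an Adam-style override of descent_direction computes, here is a minimal stateful sketch; the class and attribute names are hypothetical and the jitter placement is an assumption, so this only mirrors the equations above rather than viabel's actual implementation.

```python
import numpy as np

class AdamDirectionSketch:
    """Illustrative Adam-style descent direction; not viabel's implementation."""

    def __init__(self, beta1=0.9, beta2=0.999, jitter=1e-8):
        self._beta1, self._beta2, self._jitter = beta1, beta2, jitter
        self._m = None   # exponential moving average of the gradient
        self._nu = None  # exponential moving average of the squared gradient

    def descent_direction(self, grad):
        if self._m is None:  # lazily initialize state to match grad's shape
            self._m = np.zeros_like(grad)
            self._nu = np.zeros_like(grad)
        self._m = self._beta1 * self._m + (1 - self._beta1) * grad
        self._nu = self._beta2 * self._nu + (1 - self._beta2) * grad * grad
        return self._m / (np.sqrt(self._nu) + self._jitter)
```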