Adam

class viabel.Adam(learning_rate, *, beta1=0.9, beta2=0.999, jitter=1e-08, iterate_avg_prop=0.2, diagnostics=False)[source]

Adam optimization method (Kingma and Ba, 2015)

Tracks exponential moving averages of both the gradient and the squared gradient:

\[\begin{split}m^{(k+1)} &= \beta_1 m^{(k)} + (1-\beta_1) \hat{g}^{(k)}\\ \nu^{(k+1)} &= \beta_2 \nu^{(k)} + (1-\beta_2) \hat{g}^{(k)} \cdot \hat{g}^{(k)}\end{split}\]

and uses \(m^{(k)}\) and \(\nu^{(k)}\) to rescale the current stochastic gradient:

\[m^{(k)}/\sqrt{\nu^{(k)}}.\]
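As a quick illustration, the recursions above can be written in a few lines of NumPy. This is a stand-alone sketch, not the library's implementation; in particular, adding the jitter term to the denominator is an assumption about how the numerical-stability constant enters the rescaling.

    import numpy as np

    def adam_update(m, nu, grad, beta1=0.9, beta2=0.999, jitter=1e-8):
        # Exponential moving averages of the gradient and the squared gradient.
        m = beta1 * m + (1 - beta1) * grad
        nu = beta2 * nu + (1 - beta2) * grad * grad
        # Rescaled (descent) direction; jitter guards against division by zero.
        descent_dir = m / (np.sqrt(nu) + jitter)
        return m, nu, descent_dir

    # One update from zero-initialized averages with a toy gradient.
    grad = np.array([0.5, -1.0, 2.0])
    m, nu, direction = adam_update(np.zeros(3), np.zeros(3), grad)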
Parameters:
beta1 : float, optional

Gradient moving average hyperparameter. The default is 0.9.

beta2 : float, optional

Squared gradient moving average hyperparameter. The default is 0.999.

jitter : float, optional

Small value used for numerical stability. The default is 1e-8.

component_wise : bool, optional

Indicator for component-wise normalization of the descent direction.

Returns:
descent_dir : numpy.ndarray, shape (var_param_dim,)

Descent direction of the optimization algorithm.

Methods

descent_direction(grad)

Compute descent direction for optimization.

optimize(n_iters, objective, init_param[, ...])

Run the optimization of objective for n_iters iterations, starting from init_param.

reset_state()

Reset \(m\) and \(\nu\), the exponential moving averages of the gradient and squared gradient, respectively.

__init__(learning_rate, *, beta1=0.9, beta2=0.999, jitter=1e-08, iterate_avg_prop=0.2, diagnostics=False)[source]
Parameters:
learning_rate : float

Tuning parameter that determines the step size.

weight_decay : float

L2 regularization weight.

iterate_avg_prop : float

Proportion of iterates to use for computing the iterate average. None means no iterate averaging. The default is 0.2.

diagnostics : bool, optional

Record diagnostic information if True. The default is False.
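A minimal usage sketch for the constructor documented above. The keyword arguments mirror the signature at the top of this page; the toy objective is hypothetical and assumes that optimize accepts a callable returning a (value, gradient) pair for a given variational parameter vector; the actual interface is defined by viabel's objective classes.

    import numpy as np
    from viabel import Adam

    # Hypothetical stand-in for a viabel objective: a callable that returns
    # a (value, gradient) pair. The real interface comes from viabel's
    # objective classes; this is only for illustration.
    def toy_objective(var_param):
        value = 0.5 * np.sum(var_param ** 2)  # simple quadratic objective
        grad = var_param                      # its exact gradient
        return value, grad

    init_param = np.full(5, 2.0)              # shape (var_param_dim,)
    opt = Adam(learning_rate=0.05, beta1=0.9, beta2=0.999,
               jitter=1e-8, iterate_avg_prop=0.2, diagnostics=False)
    results = opt.optimize(1000, toy_objective, init_param)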

descent_direction(grad)[source]

Compute descent direction for optimization.

The base-class default returns grad; Adam instead returns the rescaled gradient \(m^{(k)}/\sqrt{\nu^{(k)}}\) described above.

Parameters:
grad : numpy.ndarray, shape (var_param_dim,)

(Stochastic) gradient of the objective function.

Returns:
descent_dir : numpy.ndarray, shape (var_param_dim,)

Descent direction of the optimization algorithm.
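A small call sketch for this method, assuming the optimizer's internal moving averages are initialized at construction (or after reset_state()). It only illustrates the documented input and output shapes, using a randomly drawn stand-in for the stochastic gradient.

    import numpy as np
    from viabel import Adam

    opt = Adam(learning_rate=0.01)
    grad = np.random.randn(5)                # stochastic gradient, shape (var_param_dim,)
    direction = opt.descent_direction(grad)  # same shape as grad, per the docs above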

reset_state()[source]

Reset \(m\) and \(\nu\), the exponential moving averages of the gradient and squared gradient, respectively.