AveragedRMSProp

class viabel.AveragedRMSProp(learning_rate, *, jitter=1e-08, diagnostics=False, component_wise=True)

Averaged RMSProp optimization method (Mukkamala and Hein, 2017, §4).

Uses the average of the squared gradients, obtained by setting \(\beta_k = 1-1/k\) so that

\[\nu^{(k+1)} = \beta_k \nu^{(k)} + (1-\beta_k) \hat{g}^{(k)} \cdot \hat{g}^{(k)}.\]

Then,

\[\nu^{(k+1)} = (k+1)^{-1} \sum_{k^\prime=0}^{k} \hat{g}^{(k^\prime)} \cdot \hat{g}^{(k^\prime)},\]

where \(\nu^{(k)}\) converges to a constant almost surely under certain conditions.
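As a quick numerical check of the identity above, the following sketch runs the recursion with \(\beta_k = 1-1/k\) and compares it with the plain running mean of the squared gradients. It assumes iterations are indexed from \(k = 1\), so that \(\beta_1 = 0\) and the first squared gradient initializes \(\nu\). The function name `averaged_squared_gradients` and the random test gradients are illustrative only and are not part of viabel.

```python
import numpy as np


def averaged_squared_gradients(grads):
    """Run nu <- beta_k * nu + (1 - beta_k) * g * g with beta_k = 1 - 1/k.

    Illustrative helper (not a viabel function); k is counted from 1.
    """
    nu = np.zeros_like(grads[0], dtype=float)
    history = []
    for k, g in enumerate(grads, start=1):
        beta_k = 1.0 - 1.0 / k
        # Element-wise update of the squared-gradient average.
        nu = beta_k * nu + (1.0 - beta_k) * g * g
        history.append(nu.copy())
    return np.array(history)


# Synthetic stochastic gradients: 50 iterations, 3 parameters.
rng = np.random.default_rng(0)
grads = rng.normal(size=(50, 3))

nu_history = averaged_squared_gradients(grads)

# Plain running mean of the squared gradients, for comparison.
counts = np.arange(1, grads.shape[0] + 1)[:, None]
running_mean = np.cumsum(grads**2, axis=0) / counts

assert np.allclose(nu_history, running_mean)
```

Because each new squared gradient enters with weight \(1/k\), its influence shrinks over time, which is why \(\nu\) can settle to a constant rather than tracking only recent gradients.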

Parameters:

jitter: `float` optional: Small value used for numerical stability. The default is 1e-8.
component_wise: `boolean` optional: Whether to compute the descent direction component-wise.

Returns:

descent_dir: `numpy.ndarray`, shape(var_param_dim,): Descent direction of the optimization algorithm

Methods

descent_direction(grad): Compute descent direction for optimization.

optimize(n_iters, objective, init_param[, ...])

reset_state(): Reset \(\nu\) and \(k\), the exponential moving average of the squared gradient and the iteration counter, respectively.

__init__(learning_rate, *, jitter=1e-08, diagnostics=False, component_wise=True)

Parameters:

learning_rate: `float`: Tuning parameter that determines the step size
weight_decay: `float`: L2 regularization weight
iterate_avg_prop: `float`: Proportion of iterates to use for computing iterate average. None means no iterate averaging. The default is 0.2.
diagnostics: `bool` optional: Record diagnostic information if True. The default is False.

descent_direction(grad)

Compute descent direction for optimization.

Default implementation returns grad.

Parameters:

grad: `numpy.ndarray`, shape(var_param_dim,): (stochastic) gradient of the objective function

Returns:

descent_dir: `numpy.ndarray`, shape(var_param_dim,): Descent direction of the optimization algorithm

reset_state(): Reset \(\nu\) and \(k\), the exponential moving average of the squared gradient and the iteration counter, respectively.
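To make the roles of `descent_direction` and `reset_state` concrete, here is a minimal NumPy sketch of an optimizer with the same two methods. It is a hypothetical stand-in, not viabel's implementation: the class name `AveragedRMSPropSketch`, the choice to scale the gradient by \(1/\sqrt{\nu + \text{jitter}}\), and the choice to apply the learning rate outside `descent_direction` are all assumptions made for illustration. Only the component-wise case is shown.

```python
import numpy as np


class AveragedRMSPropSketch:
    """Hypothetical illustration of averaged RMSProp (not viabel's class)."""

    def __init__(self, learning_rate, *, jitter=1e-8):
        self.learning_rate = learning_rate
        self.jitter = jitter
        self.reset_state()

    def reset_state(self):
        # Forget the squared-gradient average and restart the counter.
        self.nu = None
        self.k = 0

    def descent_direction(self, grad):
        grad = np.asarray(grad, dtype=float)
        self.k += 1
        beta = 1.0 - 1.0 / self.k  # beta_k = 1 - 1/k, counting from k = 1
        if self.nu is None:
            self.nu = np.zeros_like(grad)
        self.nu = beta * self.nu + (1.0 - beta) * grad * grad
        # Assumed scaling: divide each component by sqrt(nu + jitter).
        return grad / np.sqrt(self.nu + self.jitter)


# Usage: noisy gradient descent on f(x) = 0.5 * ||x||^2, whose gradient is x.
rng = np.random.default_rng(1)
opt = AveragedRMSPropSketch(learning_rate=0.05)
x = np.array([3.0, -2.0])
start_norm = np.linalg.norm(x)
for _ in range(200):
    noisy_grad = x + 0.01 * rng.normal(size=x.shape)
    x = x - opt.learning_rate * opt.descent_direction(noisy_grad)
end_norm = np.linalg.norm(x)

# Resetting clears the accumulated state so the optimizer can be reused.
opt.reset_state()
```

On the first call after a reset, \(\nu\) equals the squared gradient, so each component of the returned direction is approximately the sign of the corresponding gradient component.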