AveragedRMSProp
- class viabel.AveragedRMSProp(learning_rate, *, jitter=1e-08, diagnostics=False, component_wise=True)[source]
Averaged RMSProp optimization method (Mukkamala and Hein, 2017, §4)
Uses an averaged squared gradient, obtained by setting \(\beta_k = 1-1/k\) so that
\[\nu^{(k+1)} = \beta_k \nu^{(k)} + (1-\beta_k)\, \hat{g}^{(k)} \cdot \hat{g}^{(k)}.\]
Then
\[\nu^{(k+1)} = (k+1)^{-1} \sum^{k}_{k^\prime = 0} \hat{g}^{(k^\prime)} \cdot \hat{g}^{(k^\prime)},\]
where \(\nu^{(k)}\) converges to a constant almost surely under certain conditions.
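For intuition: with \(\beta_k = 1 - 1/k\), the recursion is just an incremental way of computing the running mean of the squared gradients. A minimal NumPy sketch (standalone, not viabel's implementation) verifying that equivalence:

```python
import numpy as np

rng = np.random.default_rng(0)
grads = [rng.normal(size=5) for _ in range(100)]  # stand-in stochastic gradients

# Recursive form: nu <- beta_k * nu + (1 - beta_k) * g * g, with beta_k = 1 - 1/k
nu = np.zeros(5)
for k, g in enumerate(grads, start=1):
    beta_k = 1.0 - 1.0 / k
    nu = beta_k * nu + (1.0 - beta_k) * g * g

# Closed form: the plain running mean of the squared gradients
nu_mean = np.mean([g * g for g in grads], axis=0)
print(np.allclose(nu, nu_mean))  # True
```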
- Parameters:
- jitter : float, optional
Small value used for numerical stability. The default is 1e-8.
- component_wise : bool, optional
Whether to compute the descent direction component-wise. The default is True.
- Returns:
- descent_dir : numpy.ndarray, shape (var_param_dim,)
Descent direction of the optimization algorithm.
Methods
- descent_direction(grad)
Compute descent direction for optimization.
- optimize(n_iters, objective, init_param[, ...])
Run the optimization, resetting \(\nu\) and \(k\) (the exponential moving average of the squared gradient and the iteration counter, respectively).
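A hedged usage sketch based only on the constructor and optimize signatures documented on this page; objective and init_param are hypothetical placeholders (in viabel they would be a variational objective and an initial variational parameter vector, so this snippet is not runnable as-is):

```python
from viabel import AveragedRMSProp

# Hypothetical placeholders: `objective` stands for a viabel variational objective and
# `init_param` for an initial variational parameter vector of shape (var_param_dim,).
opt = AveragedRMSProp(learning_rate=0.01, jitter=1e-8,
                      diagnostics=True, component_wise=True)
results = opt.optimize(10000, objective, init_param)
```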
- __init__(learning_rate, *, jitter=1e-08, diagnostics=False, component_wise=True)[source]
- Parameters:
- learning_rate : float
Tuning parameter that determines the step size.
- weight_decay : float
L2 regularization weight.
- iterate_avg_prop : float
Proportion of iterates to use for computing the iterate average. None means no iterate averaging. The default is 0.2.
- diagnostics : bool, optional
Record diagnostic information if True. The default is False.
- descent_direction(grad)[source]
Compute descent direction for optimization.
Default implementation returns grad.
- Parameters:
- grad : numpy.ndarray, shape (var_param_dim,)
(Stochastic) gradient of the objective function.
- Returns:
- descent_dir : numpy.ndarray, shape (var_param_dim,)
Descent direction of the optimization algorithm.
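In the RMSProp family, the averaged squared gradient \(\nu\) is typically used to rescale the gradient coordinate-wise (which is what the component_wise flag suggests). A standalone sketch of that scaling, assuming the usual RMSProp form \(\hat{g} / (\sqrt{\nu} + \text{jitter})\); this is an illustration of the general technique, not a copy of viabel's descent_direction:

```python
import numpy as np

def averaged_rmsprop_direction(grad, nu, jitter=1e-8):
    # Elementwise RMSProp-style scaling by the root of the averaged squared gradient.
    return grad / (np.sqrt(nu) + jitter)

grad = np.array([0.5, -2.0, 0.1])
nu = np.array([0.25, 4.0, 0.01])  # running mean of squared gradients (see above)
print(averaged_rmsprop_direction(grad, nu))  # approximately [1., -1., 1.]
```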