WindowedAdagrad
- class viabel.WindowedAdagrad(learning_rate, *, weight_decay=0, window_size=10, jitter=1e-08, diagnostics=False)
Windowed Adagrad optimization method (the default optimizer in PyMC3).
Uses a running window of size w of past squared gradients to rescale the current stochastic gradient:
\[\frac{\hat{g}^{(k+1)}}{\sqrt{\sum^k_{k^\prime = k-w} \hat{g}^{(k^\prime)} \cdot \hat{g}^{(k^\prime)}}}\]
- Parameters:
- window_size : int, optional
Window size used to store the squares of the gradients. The default is 10.
- jitter : float, optional
Small value used for numerical stability. The default is 1e-8.
- Returns:
- descent_dir : numpy.ndarray, shape (var_param_dim,)
Descent direction of the optimization algorithm.
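As a concrete illustration of the windowed rescaling formula above, here is a minimal NumPy sketch. It is not viabel's implementation; the closure-based `make_windowed_adagrad` helper is invented for this example, and it simply divides the incoming gradient elementwise by the square root of the windowed sum of squared gradients, with `jitter` added for stability.

```python
import numpy as np
from collections import deque

def make_windowed_adagrad(window_size=10, jitter=1e-8):
    """Return a closure that rescales each gradient by the windowed sum of squares."""
    history = deque(maxlen=window_size)  # last `window_size` squared gradients

    def descent_direction(grad):
        history.append(np.square(grad))
        # Elementwise sqrt of the windowed sum of squared gradients,
        # with `jitter` added for numerical stability.
        denom = np.sqrt(np.sum(np.stack(history), axis=0)) + jitter
        return grad / denom

    return descent_direction

step = make_windowed_adagrad(window_size=3)
d1 = step(np.array([3.0, 4.0]))  # window holds one squared gradient
d2 = step(np.array([3.0, 4.0]))  # window holds two
```

On the first call the denominator equals the gradient's elementwise magnitude, so the direction has unit entries; as the window fills with similar gradients, the direction shrinks toward a sign-like normalized step.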
Methods
- descent_direction(grad): Compute descent direction for optimization.
- optimize(n_iters, objective, init_param[, ...]): Run the optimization, resetting the running squared gradients.
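To show how a descent direction like this plugs into an optimization loop, here is a generic gradient-descent sketch. The `optimize` function below, its `grad_fn` argument, and the plain Python loop are illustrative stand-ins under assumed semantics, not viabel's `optimize` API.

```python
import numpy as np
from collections import deque

def optimize(n_iters, grad_fn, init_param, learning_rate=0.1,
             window_size=10, jitter=1e-8):
    """Plain SGD loop: param <- param - lr * windowed-Adagrad direction."""
    param = np.asarray(init_param, dtype=float)
    history = deque(maxlen=window_size)  # running window of squared gradients
    for _ in range(n_iters):
        grad = grad_fn(param)
        history.append(np.square(grad))
        denom = np.sqrt(np.sum(np.stack(history), axis=0)) + jitter
        param = param - learning_rate * grad / denom
    return param

# Minimize f(x) = ||x||^2 / 2, whose gradient is x itself.
opt_param = optimize(500, lambda x: x, np.array([5.0, -3.0]), learning_rate=0.1)
```

Because the rescaled direction behaves like a normalized gradient, the iterates approach the optimum in roughly fixed-size steps and then oscillate in a band whose width scales with the learning rate.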
- __init__(learning_rate, *, weight_decay=0, window_size=10, jitter=1e-08, diagnostics=False)
- Parameters:
- learning_rate : float
Tuning parameter that determines the step size.
- weight_decay : float
L2 regularization weight.
- iterate_avg_prop : float
Proportion of iterates to use for computing the iterate average; None means no iterate averaging. The default is 0.2.
- diagnostics : bool, optional
Record diagnostic information if True. The default is False.
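The iterate_avg_prop behavior can be sketched as follows. The `iterate_average` helper here is hypothetical (not part of viabel); it assumes averaging is taken over the final fraction of iterates, which is the usual convention for iterate averaging in stochastic optimization.

```python
import numpy as np

def iterate_average(iterates, iterate_avg_prop=0.2):
    """Average the last `iterate_avg_prop` fraction of iterates.

    `iterates` has shape (n_iters, var_param_dim). Passing None disables
    averaging and returns the final iterate, matching the docstring above.
    """
    iterates = np.asarray(iterates)
    if iterate_avg_prop is None:
        return iterates[-1]
    n_avg = max(1, int(iterate_avg_prop * len(iterates)))
    return iterates[-n_avg:].mean(axis=0)

trace = np.arange(10.0).reshape(10, 1)   # 10 one-dimensional iterates: 0..9
avg = iterate_average(trace)             # averages the last 2 iterates (8 and 9)
last = iterate_average(trace, None)      # no averaging: final iterate only
```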
- descent_direction(grad)
Compute descent direction for optimization. The default implementation returns grad.
- Parameters:
- grad : numpy.ndarray, shape (var_param_dim,)
(Stochastic) gradient of the objective function.
- Returns:
- descent_dir : numpy.ndarray, shape (var_param_dim,)
Descent direction of the optimization algorithm.