Base class for optimizers. This class defines the API to add Ops to train a model. You will almost never use this class directly; instead you will use one of its subclasses, such as the GradientDescentOptimizer, AdadeltaOptimizer, AdagradOptimizer, MomentumOptimizer, and AdamOptimizer classes covered below.
The methods of this base class are listed in detail below; for the other classes only the constructor is described, because the remaining methods differ very little from subclass to subclass.
This class is an optimizer that implements the gradient descent algorithm. (As the theory suggests, its constructor needs nothing more than a learning rate.)
tf.train.GradientDescentOptimizer.__init__(learning_rate, use_locking=False, name='GradientDescent')
compute_gradients(loss, var_list=None, gate_gradients=GATE_OP, aggregation_method=None, colocate_gradients_with_ops=False, grad_loss=None): the first half of minimize(); returns a list of (gradient, variable) pairs.
apply_gradients(grads_and_vars, global_step=None, name=None): the second half of minimize(); applies the computed gradients to the variables.
get_name(): returns the name given to the optimizer at construction time.
minimize(loss, global_step=None, var_list=None, gate_gradients=GATE_OP, aggregation_method=None, colocate_gradients_with_ops=False, name=None, grad_loss=None): combines compute_gradients() and apply_gradients() in a single call; see the sketch below.
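A minimal sketch of how these methods fit together, assuming TensorFlow 1.x graph mode; the variable w, the toy loss, and the clipping range are all illustrative, not part of the API:

```python
import tensorflow as tf

w = tf.Variable(5.0)
loss = tf.square(w - 3.0)                      # toy loss, minimized at w = 3
global_step = tf.Variable(0, trainable=False)

opt = tf.train.GradientDescentOptimizer(learning_rate=0.1)

# One-call form: minimize() is compute_gradients() + apply_gradients().
train_op = opt.minimize(loss, global_step=global_step)

# Two-call form: useful when you want to transform the gradients first,
# e.g. clip them before they are applied.
grads_and_vars = opt.compute_gradients(loss)   # list of (gradient, variable)
clipped = [(tf.clip_by_value(g, -1.0, 1.0), v) for g, v in grads_and_vars]
train_op_clipped = opt.apply_gradients(clipped, global_step=global_step)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(100):
        sess.run(train_op)
    print(sess.run(w))                         # close to 3.0
```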
Optimizer that implements the Adadelta algorithm; it can be regarded as an improved version of the Adagrad algorithm described below.
Constructor:
tf.train.AdadeltaOptimizer.__init__(learning_rate=0.001, rho=0.95, epsilon=1e-08, use_locking=False, name='Adadelta')
Purpose: construct a new optimizer that uses the Adadelta algorithm.
Args:
learning_rate: A Tensor or a floating point value. The learning rate.
rho: A Tensor or a floating point value. The decay rate.
epsilon: A Tensor or a floating point value. A constant epsilon used to better condition the gradient update.
use_locking: If True use locks for update operations.
name: Optional name for the operations created when applying gradients. Defaults to "Adadelta".
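A minimal construction sketch, again assuming TensorFlow 1.x and a made-up loss; the defaults are spelled out only for clarity:

```python
import tensorflow as tf

w = tf.Variable([2.0, -1.0])
loss = tf.reduce_sum(tf.square(w))             # toy loss, minimized at w = 0

# Build the training Op with Adadelta's default hyperparameters.
opt = tf.train.AdadeltaOptimizer(learning_rate=0.001, rho=0.95, epsilon=1e-08)
train_op = opt.minimize(loss)
```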
Optimizer that implements the Adagrad algorithm.
See Duchi et al. (2011), "Adaptive Subgradient Methods for Online Learning and Stochastic Optimization" (http://jmlr.org/papers/v12/duchi11a.html).
tf.train.AdagradOptimizer.__init__(learning_rate, initial_accumulator_value=0.1, use_locking=False, name='Adagrad')
Construct a new Adagrad optimizer.
Args:
learning_rate: A Tensor or a floating point value. The learning rate.
initial_accumulator_value: A floating point value. Starting value for the accumulators; must be positive.
use_locking: If True use locks for update operations.
name: Optional name prefix for the operations created when applying gradients. Defaults to "Adagrad".
Raises:
ValueError: If the initial_accumulator_value is invalid.
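A minimal sketch (TensorFlow 1.x, illustrative loss). Adagrad keeps a per-coordinate accumulator of squared gradients, seeded with initial_accumulator_value, which is why the constructor rejects non-positive values:

```python
import tensorflow as tf

w = tf.Variable([1.0, 2.0])
loss = tf.reduce_sum(tf.square(w))

opt = tf.train.AdagradOptimizer(learning_rate=0.1,
                                initial_accumulator_value=0.1)
train_op = opt.minimize(loss)

# Passing initial_accumulator_value=0.0 instead would raise ValueError
# (see Raises above).
```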
Optimizer that implements the Momentum algorithm.
tf.train.MomentumOptimizer.__init__(learning_rate, momentum, use_locking=False, name='Momentum', use_nesterov=False)
Construct a new Momentum optimizer.
Args:
learning_rate: A Tensor or a floating point value. The learning rate.
momentum: A Tensor or a floating point value. The momentum.
use_locking: If True use locks for update operations.
name: Optional name prefix for the operations created when applying gradients. Defaults to "Momentum".
use_nesterov: If True use Nesterov Momentum.
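A minimal sketch (TensorFlow 1.x, illustrative loss); flipping use_nesterov switches the same constructor from plain momentum to Nesterov momentum:

```python
import tensorflow as tf

w = tf.Variable(5.0)
loss = tf.square(w - 3.0)

opt = tf.train.MomentumOptimizer(learning_rate=0.01, momentum=0.9,
                                 use_nesterov=True)  # plain momentum if False
train_op = opt.minimize(loss)
```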
Optimizer that implements the Adam algorithm.
Constructor:
tf.train.AdamOptimizer.__init__(learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08, use_locking=False, name='Adam')
Construct a new Adam optimizer.
Initialization:
m_0 <- 0 (Initialize initial 1st moment vector)
v_0 <- 0 (Initialize initial 2nd moment vector)
t <- 0 (Initialize timestep)
The update rule for variable with gradient g uses an optimization described at the end of section 2 of the paper:
t <- t + 1
lr_t <- learning_rate * sqrt(1 - beta2^t) / (1 - beta1^t)
m_t <- beta1 * m_{t-1} + (1 - beta1) * g
v_t <- beta2 * v_{t-1} + (1 - beta2) * g * g
variable <- variable - lr_t * m_t / (sqrt(v_t) + epsilon)
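The update rule transcribes directly into NumPy; the sketch below is only an illustration of the math above, not TensorFlow's actual kernel:

```python
import numpy as np

def adam_step(variable, g, m, v, t,
              learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08):
    """One Adam update, line for line from the rule above."""
    t = t + 1
    lr_t = learning_rate * np.sqrt(1 - beta2**t) / (1 - beta1**t)
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    variable = variable - lr_t * m / (np.sqrt(v) + epsilon)
    return variable, m, v, t
```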
The default value of 1e-8 for epsilon might not be a good default in general. For example, when training an Inception network on ImageNet a current good choice is 1.0 or 0.1.
Note that in the dense implementation of this algorithm, m_t, v_t and variable will be updated even if g is zero, but in the sparse implementation, m_t, v_t and variable will not be updated in iterations where g is zero.
Args:
learning_rate: A Tensor or a floating point value. The learning rate.
beta1: A float value or a constant float tensor. The exponential decay rate for the 1st moment estimates.
beta2: A float value or a constant float tensor. The exponential decay rate for the 2nd moment estimates.
epsilon: A small constant for numerical stability.
use_locking: If True use locks for update operations.
name: Optional name for the operations created when applying gradients. Defaults to "Adam".
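A minimal sketch (TensorFlow 1.x, illustrative loss) that also applies the epsilon advice above; get_name() is the base-class method listed earlier:

```python
import tensorflow as tf

w = tf.Variable(tf.zeros([10]))
loss = tf.reduce_sum(tf.square(w - 1.0))

# epsilon raised from the 1e-08 default, per the note above.
opt = tf.train.AdamOptimizer(learning_rate=0.001, epsilon=0.1)
train_op = opt.minimize(loss)
print(opt.get_name())                          # "Adam"
```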