The paper "An overview of gradient descent optimization algorithms" introduces the principles behind these algorithms; refer to that blog post for the underlying theory.
The relevant functions in the package:
tf.train.GradientDescentOptimizer(learning_rate, use_locking=False, name='GradientDescent')
learning_rate: the learning rate, which controls how quickly the parameters are updated. Values that are too large or too small both hurt running time and results: too large and the algorithm tends to diverge, too small and training takes too long.
The other parameters can usually be left at their defaults.
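As a minimal sketch of how such an optimizer is typically wired into a TensorFlow 1.x training loop (the variable w, the quadratic loss, and the learning rate 0.1 are made-up placeholders, not from the original text):

```python
import tensorflow as tf

# Hypothetical setup: a single trainable variable and a simple quadratic loss.
w = tf.Variable(5.0, name="w")
loss = tf.square(w - 3.0)

# Plain gradient descent with a fixed learning rate.
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
train_op = optimizer.minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(100):
        sess.run(train_op)
    print(sess.run(w))  # converges toward 3.0
```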
tf.train.AdadeltaOptimizer(learning_rate=0.001, rho=0.95, epsilon=1e-08, use_locking=False, name='Adadelta')
learning_rate: A Tensor or a floating point value; the learning rate.
rho: A Tensor or a floating point value; the decay rate.
epsilon: A Tensor or a floating point value; a small constant used to better condition the gradient update.
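A short sketch of constructing Adadelta with these parameters; the loss is again a hypothetical placeholder, and the optimizer drops into the same minimize/Session pattern shown above:

```python
import tensorflow as tf

w = tf.Variable(5.0)
loss = tf.square(w - 3.0)  # hypothetical placeholder loss

# Adadelta: rho is the decay rate of the accumulated squared gradients,
# epsilon keeps the parameter update numerically well conditioned.
optimizer = tf.train.AdadeltaOptimizer(learning_rate=1.0, rho=0.95, epsilon=1e-6)
train_op = optimizer.minimize(loss)
```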
tf.train.AdagradOptimizer(learning_rate, initial_accumulator_value=0.1, use_locking=False, name='Adagrad')
learning_rate: The learning rate.
initial_accumulator_value: A floating point value. Starting value for the accumulators; must be positive.
use_locking: Defaults to False, which allows concurrent reads and writes of the variables; if True, locks are used to prevent concurrent updates.
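A similar sketch for Adagrad (placeholder loss; the hyperparameter values are illustrative only):

```python
import tensorflow as tf

w = tf.Variable(5.0)
loss = tf.square(w - 3.0)  # hypothetical placeholder loss

# Adagrad accumulates squared gradients per parameter, so frequently
# updated parameters get smaller effective learning rates over time.
optimizer = tf.train.AdagradOptimizer(learning_rate=0.1,
                                      initial_accumulator_value=0.1)
train_op = optimizer.minimize(loss)
```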
tf.train.MomentumOptimizer(learning_rate, momentum, use_locking=False, name='Momentum', use_nesterov=False)
learning_rate: A Tensor or a floating point value; the learning rate.
momentum: A Tensor or a floating point value; the momentum.
use_locking: If True, use locks for the update operations.
use_nesterov: If True, use Nesterov momentum instead of standard momentum.
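A sketch of momentum with the Nesterov variant switched on (the values and the loss are illustrative assumptions):

```python
import tensorflow as tf

w = tf.Variable(5.0)
loss = tf.square(w - 3.0)  # hypothetical placeholder loss

# Momentum keeps a velocity term; use_nesterov=True selects Nesterov
# accelerated gradient rather than standard momentum.
optimizer = tf.train.MomentumOptimizer(learning_rate=0.01, momentum=0.9,
                                       use_nesterov=True)
train_op = optimizer.minimize(loss)
```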
tf.train.AdamOptimizer(learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08, use_locking=False, name='Adam')
learning_rate: The learning rate.
beta1: A float value or a constant float tensor. The exponential decay rate for the 1st moment estimates.
beta2: A float value or a constant float tensor. The exponential decay rate for the 2nd moment estimates.
epsilon: A small constant for numerical stability.
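And a sketch with Adam at its usual defaults (placeholder loss again):

```python
import tensorflow as tf

w = tf.Variable(5.0)
loss = tf.square(w - 3.0)  # hypothetical placeholder loss

# Adam: beta1/beta2 control the exponential decay of the first- and
# second-moment estimates of the gradient; epsilon avoids division by zero.
optimizer = tf.train.AdamOptimizer(learning_rate=0.001, beta1=0.9,
                                   beta2=0.999, epsilon=1e-8)
train_op = optimizer.minimize(loss)
```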
tf.train.exponential_decay(learning_rate, global_step, decay_steps, decay_rate, staircase=False, name=None)
learning_rate: The initial learning rate.
global_step: The current training step (iteration count).
decay_steps: The decay interval; after this many steps the learning rate has decayed to learning_rate * decay_rate.
decay_rate: The decay factor, usually between 0 and 1.
staircase: Defaults to False; if True, global_step / decay_steps is truncated to an integer, so the learning rate decays in discrete steps rather than continuously.
name: Optional name for the operation.
The learning rate then changes according to the following formula:
decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)
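A sketch of wiring exponential_decay into training: passing global_step to minimize() makes it increment on every update, so the learning rate actually decays as training proceeds (the numbers and the loss are illustrative assumptions):

```python
import tensorflow as tf

w = tf.Variable(5.0)
loss = tf.square(w - 3.0)  # hypothetical placeholder loss

global_step = tf.Variable(0, trainable=False)

# Start at 0.1 and multiply by 0.96 every 1000 steps; staircase=True makes
# the decay happen in discrete jumps instead of continuously.
learning_rate = tf.train.exponential_decay(learning_rate=0.1,
                                           global_step=global_step,
                                           decay_steps=1000,
                                           decay_rate=0.96,
                                           staircase=True)

optimizer = tf.train.GradientDescentOptimizer(learning_rate)
# minimize() increments global_step once per training step.
train_op = optimizer.minimize(loss, global_step=global_step)
```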