A Multi-Parameterized Gated Activation Function
  
DOI:
Keywords: activation function; gating mechanism; deep neural network
Funding:
Author affiliations:
Xia Zhengxin, School of Continuing Education, Nanjing University of Posts and Telecommunications, Nanjing 210042, Jiangsu, China; Research Center for Medical Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210042, Jiangsu, China
Su Chong, School of Management, Nanjing University of Posts and Telecommunications, Nanjing 210003, Jiangsu, China; Research Center for Medical Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210042, Jiangsu, China
Chinese abstract (translated):
      An activation function supplies the nonlinearity that lets a neural network run effectively and maintain high performance, so a good choice of activation function can substantially improve network performance. ReLU is the most common choice in deep neural networks because it is simple and effective; however, for negative inputs its gradient is zero during backpropagation, which causes the dying-ReLU problem. To address this, activation functions based on a soft gating mechanism, Swish and Mish, were proposed in turn. They use an activation function such as Sigmoid or Tanh to control how far the gate opens or closes, thereby meeting the network's need for nonlinearity, and they have achieved better results on many challenging network models and datasets. Under this gating mechanism, however, the range of the activation function's saturation region is relatively fixed and cannot fit diverse network models and data distributions well. This paper proposes a multi-parameterized gated activation function (Mpish) that uses multiple learnable parameters to dynamically adjust the range of the saturation region, adapting to different network models and data distributions. Experimental results show that the proposed function effectively improves the accuracy and stability of neural-network training and works well in deeper network models.
English abstract:
      The activation function activates a neural network through its nonlinear mechanism, helping the network work effectively and maintain high performance. A good activation function is critical to network performance. The ReLU activation function is commonly used in deep neural networks because of its simplicity and effectiveness. However, when the input is negative, the gradient of ReLU is zero during backpropagation, leading to the "dying ReLU" problem. Researchers have therefore proposed activation functions based on soft gating mechanisms, such as Swish and Mish, which use an activation function such as Sigmoid or Tanh to realize the gate and meet the nonlinearity requirement of neural networks; their effectiveness has been verified on many challenging network models and datasets. However, since the range of the saturation region of these activation functions is relatively fixed, they cannot fit various network models and data distributions well. In this paper, we propose a multi-parameterized gated activation function (Mpish) that uses multiple parameters to dynamically adjust its saturation range and adapt to different network models and data distributions. Experimental results show that Mpish effectively improves the accuracy and stability of neural-network training and works well in deeper networks.
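The abstract contrasts ReLU with the soft-gated activations Swish (x multiplied by a sigmoid gate) and Mish (x multiplied by a tanh-of-softplus gate). A minimal numpy sketch of these standard functions is below. Note that the abstract does not give the Mpish formula, so `gated_activation` is a hypothetical parameterization added here only to illustrate the idea of learnable parameters reshaping the gate's saturation region; it is not the paper's definition.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Zero for x < 0, so the gradient there is zero: the "dying ReLU" problem.
    return np.maximum(0.0, x)

def swish(x, beta=1.0):
    # Soft gate: a sigmoid smoothly controls how much of x passes through.
    return x * sigmoid(beta * x)

def mish(x):
    # Tanh gate driven by softplus(x) = log(1 + e^x).
    return x * np.tanh(np.log1p(np.exp(x)))

def gated_activation(x, a=1.0, b=1.0, c=0.0):
    # HYPOTHETICAL multi-parameter gate (not the paper's Mpish formula):
    # a scales the output, b stretches the gate input, and c shifts it,
    # moving the saturation region; in training a, b, c would be learnable.
    return a * x * np.tanh(np.log1p(np.exp(b * x + c)))
```

With the defaults a = 1, b = 1, c = 0 the hypothetical gate reduces to Mish, which makes the role of each extra parameter easy to check in isolation.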

Copyright: Editorial Office of the Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition)
Tel:86-25-85866913 E-mail:xb@njupt.edu.cn