Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition)

A novel convolution calculation algorithm for memory-limited devices

DOI：

Keywords: deep learning; convolution calculation; memory optimization; data reuse; edge device

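Funding: Supported by the National Natural Science Foundation of China (62172235), the China Postdoctoral Science Foundation (2019M651923), and the Natural Science Foundation of Jiangsu Province (BK20191381)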

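Author	Affiliation
Sun Yanfei (孙雁飞)	School of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210003; Jiangsu Engineering Research Center of High-Performance Computing and Intelligent Processing, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210023
Wang Ziniu (王子牛)	Jiangsu Engineering Research Center of High-Performance Computing and Intelligent Processing, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210023; School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210023
Sun Ying (孙莹)	College of Automation and College of Artificial Intelligence, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210023
Qi Jin (亓晋)	School of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210003; Jiangsu Engineering Research Center of High-Performance Computing and Intelligent Processing, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210023
Dong Zhenjiang (董振江)	Jiangsu Engineering Research Center of High-Performance Computing and Intelligent Processing, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210023; School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210023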


Abstract:

Convolutional neural networks consume a large amount of memory during prediction, which makes them difficult to deploy on memory-limited devices. To address this problem, this paper proposes a novel convolution calculation method for memory-limited devices. The method convolves part of the data in the input matrix and stores the result in temporary memory. It then copies the result from temporary memory into the memory of the input matrix that is no longer needed, and repeats these steps until the whole input matrix has been convolved. The method is evaluated on single convolution calculations and on LeNet. The experimental results show that the method is faster than direct convolution. Compared with the im2col, MEC, and direct convolution methods, it reduces the average memory usage of a single convolution calculation by 89.29%, 82.60%, and 57.15%, respectively, and the memory usage of LeNet by 89.90%, 82.21%, and 28.07%, respectively. The method therefore effectively reduces the memory usage of convolutional neural networks and helps their deployment on memory-limited devices.
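The compute-then-copy-back idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes a single-channel, stride-1, unpadded convolution (cross-correlation, as is usual in CNNs) with one output row as the unit of temporary memory, and the names `conv2d_inplace` and `conv2d_direct` are hypothetical. Under these assumptions, once output row r has been computed, input row r is never read again, so the packed output can safely overwrite the front of the input buffer.

```python
def conv2d_direct(buf, height, width, kernel):
    """Reference direct convolution: allocates a separate output buffer."""
    k = len(kernel)
    out_h, out_w = height - k + 1, width - k + 1
    out = [0.0] * (out_h * out_w)
    for r in range(out_h):
        for c in range(out_w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    acc += buf[(r + i) * width + c + j] * kernel[i][j]
            out[r * out_w + c] = acc
    return out


def conv2d_inplace(buf, height, width, kernel):
    """Memory-reusing sketch: the result overwrites the front of `buf`.

    `buf` is a flat, row-major list of height * width values.
    Returns (out_h, out_w); the result occupies buf[:out_h * out_w].
    Assumption: single channel, stride 1, no padding, square kernel.
    """
    k = len(kernel)
    out_h, out_w = height - k + 1, width - k + 1
    temp = [0.0] * out_w  # temporary memory: one output row only
    for r in range(out_h):
        # Step 1: convolve input rows r .. r+k-1 into the temporary row.
        for c in range(out_w):
            acc = 0.0
            for i in range(k):
                base = (r + i) * width + c
                for j in range(k):
                    acc += buf[base + j] * kernel[i][j]
            temp[c] = acc
        # Step 2: copy the temporary row into input memory that is no
        # longer needed. The destination ends at (r + 1) * out_w, which is
        # at or before the start of input row r + 1 at (r + 1) * width.
        start = r * out_w
        buf[start:start + out_w] = temp
    return out_h, out_w


# Usage: a 4x4 input and a 2x2 kernel.
image = [float(v) for v in range(1, 17)]
kernel = [[1.0, 0.0], [0.0, 1.0]]
expected = conv2d_direct(list(image), 4, 4, kernel)
out_h, out_w = conv2d_inplace(image, 4, 4, kernel)
result = image[:out_h * out_w]
```

In this sketch the extra memory beyond the input is a single output row, whereas the direct version allocates a full output buffer. The percentages reported in the abstract come from the paper's own experiments and are not reproduced by this example.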
