Volume 15, Issue 2 (3-2023)                   itrc 2023, 15(2): 12-18 | Back to browse issues page




Farahani AD, Beitollahi H, Fathy M, Barangi R. A Partial Method for Calculating CNN Networks Based On Loop Tiling. itrc 2023; 15 (2): 2
URL: http://ijict.itrc.ac.ir/article-1-514-en.html
1- School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran
2- School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran, Beitollahi@iust.ac.ir
Abstract:

Convolutional Neural Networks (CNNs) are widely deployed in artificial intelligence and computer vision applications, where the CNN is usually the most computationally intensive component. When such an application runs on an embedded device, the embedded processor can hardly keep up with the processing load. This paper applies loop tiling to show how a lightweight, low-power, and efficient CNN hardware accelerator can be constructed for embedded computing devices. The method breaks a large CNN engine into small CNN engines and computes each of them with limited hardware resources; the results of the small engines are then added and concatenated to reconstruct the output of the large CNN. Using this method, a small accelerator can be configured to run a wide range of large CNNs. To evaluate the methodology, a small single-layer accelerator was designed. Our initial investigations show that this accelerator can run a modified version of MobileNetV1 70 times per second.
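The tiling idea in the abstract can be illustrated in software. The sketch below (a minimal NumPy illustration, not the paper's hardware design) computes a valid 2D convolution in row tiles: each output tile only needs a small input slab with a K-1 row halo, so a "small engine" sized for one slab can be reused across tiles, and concatenating the tile outputs reproduces the full result. Tiling over input channels would analogously produce partial sums that are added. The function names and tile size are illustrative assumptions.

```python
import numpy as np

def conv2d(x, w):
    # Direct valid convolution (cross-correlation), stride 1.
    H, W = x.shape
    K = w.shape[0]
    out = np.zeros((H - K + 1, W - K + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + K, j:j + K] * w)
    return out

def conv2d_tiled(x, w, tile=4):
    # Compute the output in row tiles; each tile needs K-1 extra
    # input rows (the halo) beyond its own output rows.
    K = w.shape[0]
    out_h = x.shape[0] - K + 1
    tiles = []
    for r0 in range(0, out_h, tile):
        r1 = min(r0 + tile, out_h)
        patch = x[r0:r1 + K - 1, :]        # input slab for this output tile
        tiles.append(conv2d(patch, w))     # "small engine" run on the slab
    return np.concatenate(tiles, axis=0)   # stitch tiles into the full output

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16))
w = rng.standard_normal((3, 3))
assert np.allclose(conv2d(x, w), conv2d_tiled(x, w))
```

Because every tile touches only a bounded slab of the input, the working set (and hence on-chip buffer size) is fixed by the tile size rather than by the layer dimensions, which is what lets one small accelerator serve arbitrarily large CNN layers.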

Article number: 2
Full-Text [PDF 823 kb]
Type of Study: Research | Subject: Information Technology



Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.