Volume 16, Issue 4 (12-2024)                   itrc 2024, 16(4): 9-19

NarimanJahan B, Khademzadeh A, Reza A. FUCA: a Frame to Prevent the Generation of Useless results in the Dataflows Based on Cartesian Product for Convolutional Neural Network Accelerators. itrc 2024; 16(4): 9-19
URL: http://journal.itrc.ac.ir/article-1-673-en.html
1- Department of Computer Engineering, Bonab Branch, Islamic Azad University, Bonab, Iran
2- Iran Telecommunication Research Center (ITRC), Tehran, Iran, itrc.ahmadkhademzadeh@gmail.com
3- Department of Computer Engineering, Shahr-e-Qods Branch, Islamic Azad University, Tehran, Iran
Abstract:

One of the most important issues in the design of CNN accelerators is the accelerator's ability to exploit the opportunities present in the type and processing of its input data, and this task falls largely to the dataflow. The equal channel size of the input feature map and the filter in CNNs is one such opportunity, and it makes it attractive to design the dataflow as Channel Dimension Stationary (CDS). In addition, computations based on the Cartesian product are less complex to design (owing to their all-to-all nature), especially in CDS dataflows. However, because the Cartesian product method generates useless products, and therefore reduces performance and energy efficiency, there is little inclination toward this type of design. This paper presents a frame called FUCA for Cartesian product-based dataflows that avoids the operations leading to useless products. The analysis shows that FUCA reduces runtime and energy consumption in the Cartesian product-based dataflow by 1.5x, potentially surpassing the sliding window-based dataflow.
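
As a rough illustration of the useless-product problem (a minimal 1-D sketch of the general idea, not the paper's implementation; the function names and the boundary test below are assumptions), the snippet enumerates the all-to-all Cartesian product of an input row and a filter and counts how many products map outside the valid output range:

# A Cartesian-product dataflow lets every input element meet every filter weight,
# but some of those pairwise products map to output coordinates that do not exist.
# This toy 1-D routine counts them; the `if 0 <= o < O` guard plays the role of a
# FUCA-like check that keeps such operations from being issued.

def cartesian_conv1d(inputs, weights):
    """Valid 1-D convolution computed as an all-to-all (Cartesian) product."""
    W, S = len(inputs), len(weights)
    O = W - S + 1                       # valid-convolution output width
    outputs = [0.0] * O
    useful = useless = 0
    for x in range(W):                  # all-to-all: every input meets every weight
        for s in range(S):
            o = x - s                   # output coordinate this product maps to
            if 0 <= o < O:              # only issue the MAC when the product is useful
                outputs[o] += inputs[x] * weights[s]
                useful += 1
            else:
                useless += 1            # a plain Cartesian-product engine would still
                                        # spend a multiply on this pair
    return outputs, useful, useless

if __name__ == "__main__":
    ins = [float(i) for i in range(8)]            # toy input row, W = 8
    wts = [1.0, 0.5, 0.25]                        # toy filter, S = 3
    out, useful, useless = cartesian_conv1d(ins, wts)
    print(out)                                    # matches a sliding-window convolution
    print(f"useful={useful}, useless={useless}")  # 24 products in total, 6 of them useless

In this toy case, 6 of the 24 pairwise products fall outside the output and would be wasted work in a naive Cartesian-product engine; skipping them up front is the kind of saving the abstract attributes to FUCA.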
Full-Text [PDF 1807 kb]
Type of Study: Research | Subject: Information Technology



Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.