Single node deep learning frameworks: Comparative study and CPU/GPU performance analysis

artificial intelligence; CPU; deep learning; distributed computing; frameworks; GPU; parallel computing; Comparatives studies; Deep learning; Design and implementations; Efficient sets; Framework; High-level programming; Learning frameworks; Parallel computing; Performances analysis; Software; Theoretical Computer Science; Computer Science Applications; Computer Networks and Communications; Computational Theory and Mathematics

Abstract :

Deep learning offers an efficient set of methods for learning from massive volumes of data using complex deep neural networks. To facilitate the design and implementation of algorithms, deep learning frameworks provide a high-level programming interface. Building on these frameworks, new models and applications make increasingly better predictions. One application area for deep learning is the Internet of Things, which can gather a continuous flow of data and thereby causes an explosion in the amount of data. To handle this data management issue, computation technologies can offer new perspectives for analyzing more data with more complex models. In this context, a cluster of computers can be used to deliver a model quickly or to enable the design of a complex neural network spread across computers. An alternative is to distribute a deep learning task over HPC cloud computing resources and to scale the cluster in order to train a neural network quickly and efficiently. As a first step toward designing an infrastructure-aware framework that is able to scale the computing nodes, this work reviews and analyzes the state-of-the-art frameworks by collecting device utilization data during the training task. We gather information about CPU, RAM, and GPU utilization on deep learning algorithms with and without multi-threading. The behavior of each framework is discussed and analyzed in order to shed light on the strengths and weaknesses of the different deep learning frameworks.

Disciplines :

Computer science

Author, co-author :

Lerat, Jean-Sébastien ; Faculty of Engineering, University of Mons, Mons, Belgium ; Department of Sciences and Technologies, Haute École en Hainaut, Mons, Belgium

Mahmoudi, Sidi ; Université de Mons - UMONS > Faculté Polytechnique > Service Informatique, Logiciel et Intelligence artificielle

Mahmoudi, Saïd ; Faculty of Engineering, University of Mons, Mons, Belgium

Language :

English

Title :

Single node deep learning frameworks: Comparative study and CPU/GPU performance analysis

Publication date :

25 June 2023

Event name :

Concurrency and Computation: Practice and Experience

Event date :

June 2023

Audience :

International

Journal title :

Concurrency and Computation: Practice and Experience

ISSN :

1532-0626

eISSN :

1532-0634

Publisher :

John Wiley and Sons Ltd

Volume :

Issue :

Peer reviewed :

Peer Reviewed verified by ORBi

Additional URL :

https://onlinelibrary.wiley.com/doi/pdf/10.1002/cpe.6730

Research unit :

F114 - Informatique, Logiciel et Intelligence artificielle

Research institute :

Infortech

Funding text :

This work was partially funded by the Wallonia-Brussels Federation (JCM/TP/BS/mo/c999)

Available on ORBi UMONS :

since 12 January 2024

Statistics

Number of views

46 (6 by UMONS)

Number of downloads

70 (2 by UMONS)
