Equalizer: Energy-efficient machine learning-based heterogeneous cluster load balancer

Keywords :

Energy; Execution time prediction; Heterogeneous cluster; Imbalance; Load balancing; Scheduling; Computing load; Heterogeneous systems; Load balancer; Machine learning; Real-time; Software; Theoretical Computer Science; Computer Science Applications; Computer Networks and Communications; Computational Theory and Mathematics

Abstract :

Heterogeneous systems deliver high computing performance when they are used effectively, which requires executing each application on the most suitable device while keeping the system balanced. However, distributing the computing load evenly is challenging because the devices in such a system differ in computing power and architecture. Scheduling applications in real time complicates the task further, since no prior information about the submitted applications is available. In this context, we introduce “Equalizer,” a real-time load balancer for heterogeneous systems. “Equalizer” uses machine learning to continuously monitor the system's state and to predict, at runtime, the most suitable devices for executing each application. It assigns each application to the device that minimizes the system's imbalance. To quantify this imbalance, we propose a novel metric that reflects the disparity in computing load across the system's devices; the metric is calculated from the predicted execution times of the applications. To validate the performance of “Equalizer,” we conducted a comparative study against two widely adopted approaches, Round Robin and Device Suitability. The experiments were performed on a heterogeneous cluster comprising a master host and three slave servers, equipped with a total of 4 central processing units (CPUs) and 4 graphics processing units (GPUs). All approaches were deployed on the cluster and evaluated on three workloads distinguished by computing intensity (medium, heavy, and a mix of heavy and medium) to simulate real-world scenarios. Each workload consisted of 80 OpenCL applications with varying input data sizes. The experimental results show that “Equalizer” effectively minimized the system's imbalance, reduced device idle time, and eliminated overloads.
Moreover, “Equalizer” yielded significant improvements in workload execution time, resource utilization, throughput, and energy consumption. Across all tested scenarios, “Equalizer” consistently outperformed the alternative approaches, demonstrating its robustness, its adaptability to dynamic environments, and its applicability in real-world practice.
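The abstract describes the placement rule only in outline. The following is a minimal Python sketch of that idea under loudly stated assumptions, not the authors' implementation: predicted execution times are supplied as a plain dictionary instead of coming from a trained model; the imbalance metric is taken to be the max-minus-min spread of the projected device loads (the abstract does not define the paper's metric); and "Device Suitability" is read as "send each application to its fastest predicted device, ignoring load". All function names (`imbalance`, `equalizer_assign`, `round_robin_assign`, `suitability_assign`) and device names are hypothetical.

```python
from itertools import cycle

# Hypothetical toy model of the placement step described in the abstract.
# predicted[app][device] stands in for an ML-predicted execution time (seconds).


def imbalance(loads):
    """Assumed imbalance metric: spread between the most and least loaded device.

    The paper proposes its own metric; max minus min is only a stand-in.
    """
    return max(loads.values()) - min(loads.values())


def equalizer_assign(apps, predicted, devices):
    """Greedy placement: each app goes to the device that leaves the system least imbalanced.

    Ties on imbalance are broken by the smaller resulting maximum load (an assumption).
    """
    loads = {d: 0.0 for d in devices}
    placement = {}
    for app in apps:
        def score(dev):
            # Projected loads if `app` were queued on `dev`.
            trial = dict(loads)
            trial[dev] += predicted[app][dev]
            return (imbalance(trial), max(trial.values()))

        best = min(devices, key=score)
        loads[best] += predicted[app][best]
        placement[app] = best
    return placement, loads


def round_robin_assign(apps, predicted, devices):
    """Baseline: cycle through the devices, ignoring predicted times."""
    loads = {d: 0.0 for d in devices}
    placement = {}
    for app, dev in zip(apps, cycle(devices)):
        loads[dev] += predicted[app][dev]
        placement[app] = dev
    return placement, loads


def suitability_assign(apps, predicted, devices):
    """Baseline (assumed reading of Device Suitability): fastest predicted device, load ignored."""
    loads = {d: 0.0 for d in devices}
    placement = {}
    for app in apps:
        dev = min(devices, key=lambda d: predicted[app][d])
        loads[dev] += predicted[app][dev]
        placement[app] = dev
    return placement, loads


if __name__ == "__main__":
    devices = ["cpu0", "gpu0"]
    # Toy predictions: "c" is far slower on the CPU than on the GPU.
    predicted = {
        "a": {"cpu0": 3.0, "gpu0": 2.0},
        "b": {"cpu0": 3.0, "gpu0": 2.0},
        "c": {"cpu0": 9.0, "gpu0": 1.0},
        "d": {"cpu0": 3.0, "gpu0": 2.0},
    }
    apps = list(predicted)
    for name, policy in [("Equalizer-style", equalizer_assign),
                         ("Round Robin", round_robin_assign),
                         ("fastest device", suitability_assign)]:
        placement, loads = policy(apps, predicted, devices)
        print(name, placement, loads, "imbalance:", imbalance(loads))
```

With these made-up predictions (not results from the paper), the greedy rule ends with 3.0 s queued on cpu0 and 5.0 s on gpu0 (imbalance 2.0), Round Robin ends with 12.0 s and 4.0 s because it places the CPU-unfriendly application "c" on the CPU, and the fastest-device rule piles all 7.0 s onto gpu0. A spread-only metric ignores total work, so a real balancer would likely need a tie-break or a combined objective such as the makespan term used here.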

Disciplines :

Computer science

Author, co-author :

Rahmani, Taha Abdelazziz; LIO Laboratory, Department of Computer Science, University of Oran 1, Oran, Algeria; FPMs-ILIA Laboratory, Department of Computer Science, University of Mons (UMONS), Mons, Belgium

Belalem, Ghalem; LIO Laboratory, Department of Computer Science, University of Oran 1, Oran, Algeria

Mahmoudi, Sidi; Service Informatique, Logiciel et Intelligence artificielle, Faculté Polytechnique, Université de Mons (UMONS), Mons, Belgium

Merad-Boudia, Omar Rafik; LIO Laboratory, Department of Computer Science, University of Oran 1, Oran, Algeria

Language :

English

Title :

Equalizer: Energy-efficient machine learning-based heterogeneous cluster load balancer

Publication date :

25 October 2024

Journal title :

Concurrency and Computation: Practice and Experience

ISSN :

1532-0626

eISSN :

1532-0634

Publisher :

John Wiley and Sons Ltd

Peer reviewed :

Peer Reviewed verified by ORBi

Additional URL :

https://onlinelibrary.wiley.com/doi/pdf/10.1002/cpe.8230

Research unit :

F114 - Informatique, Logiciel et Intelligence artificielle

Research institute :

R450 - Institut NUMEDIART pour les Technologies des Arts Numériques
R300 - Institut de Recherche en Technologies de l'Information et Sciences de l'Informatique

Available on ORBi UMONS :

since 14 January 2025
