Journal
PROCEEDINGS OF THE 48TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING WORKSHOPS (ICPP 2019)
Volume -, Issue -, Pages -
Publisher
ASSOC COMPUTING MACHINERY
DOI: 10.1145/3339186.3339215
Keywords
HPC; Energy Efficiency; Quantum Espresso; Big Data; Anomaly Detection; Artificial Intelligence; Datacentre automation
Funding
- EU FETHPC project ANTAREX [g.a. 671623]
- EU ERC Project MULTITHERMAN [g.a. 291125]
- CINECA research grant on Energy-Efficient HPC systems
Energy efficiency and datacentre automation are critical targets of the research and deployment agenda of CINECA and its research partners in the Energy Efficient System Laboratory of the University of Bologna and the Integrated System Laboratory at ETH Zurich. In this manuscript, we present the primary outcomes of the research conducted in this domain under the umbrella of several European, national and private funding schemes. These outcomes consist of: (i) the ExaMon scalable, flexible, holistic monitoring framework, which is capable of ingesting 70 GB/day of telemetry data from the entire CINECA datacentre and linking this data with machine learning and artificial intelligence techniques and tools; (ii) the exploitation of ExaMon to evaluate the viability of machine-learning based job scheduling, power prediction and deep-learning based anomaly detection of compute nodes; (iii) the viability of scalable, out-of-band and high-frequency power monitoring in compute nodes by leveraging low-cost, open-source embedded hardware and edge computing, namely DiG; (iv) finally, the viability of a runtime library, namely COUNTDOWN, that exploits communication regions in large-scale applications to reduce energy consumption without impairing execution time.