Related references
Deep Learning for Anomaly Detection: A Review
Guansong Pang et al.
ACM COMPUTING SURVEYS (2021)
A machine learning approach to online fault classification in HPC systems
Alessio Netti et al.
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE (2020)
The Landscape of Exascale Research: A Data-Driven Literature Analysis
Stijn Heldens et al.
ACM COMPUTING SURVEYS (2020)
Online Diagnosis of Performance Variation in HPC Systems Using Machine Learning
Ozan Tuncer et al.
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS (2019)
A semisupervised autoencoder-based approach for anomaly detection in high performance computing systems
Andrea Borghesi et al.
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE (2019)
Online Anomaly Detection in HPC Systems
Andrea Borghesi et al.
2019 IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE CIRCUITS AND SYSTEMS (AICAS 2019) (2019)
Paving the Way Toward Energy-Aware and Automated Datacentre
Andrea Bartolini et al.
PROCEEDINGS OF THE 48TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING WORKSHOPS (ICPP 2019) (2019)
FINJ: A Fault Injection Tool for HPC Systems
Alessio Netti et al.
EURO-PAR 2018: PARALLEL PROCESSING WORKSHOPS (2019)
Anomaly Detection in High Performance Computers: A Vicinity Perspective
Siavash Ghiasvand et al.
2019 18TH INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED COMPUTING (ISPDC 2019) (2019)
Unraveling Network-Induced Memory Contention: Deeper Insights with Machine Learning
Taylor Liles Groves et al.
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS (2018)
Modeling and Simulating Multiple Failure Masking Enabled by Local Recovery for Stencil-Based Applications at Extreme Scales
Marc Gamell et al.
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS (2017)
A Convolutional Neural Network for Fault Classification and Diagnosis in Semiconductor Manufacturing Processes
Ki Bum Lee et al.
IEEE TRANSACTIONS ON SEMICONDUCTOR MANUFACTURING (2017)
Adaptive Impact-Driven Detection of Silent Data Corruption for HPC Applications
Sheng Di et al.
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS (2016)
Using Migratable Objects to Enhance Fault Tolerance Schemes in Supercomputers
Esteban Meneses et al.
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS (2015)
Improving the computing efficiency of HPC systems using a combination of proactive and preventive checkpointing
Mohamed Slim Bouguerra et al.
IEEE 27TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2013) (2013)
The Reliability Wall for Exascale Supercomputing
Xuejun Yang et al.
IEEE TRANSACTIONS ON COMPUTERS (2012)