Related References
Aarohi: Making Real-Time Node Failure Prediction Feasible
Anwesha Das et al.
2020 IEEE 34TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM IPDPS 2020 (2020)
A Review of Recurrent Neural Networks: LSTM Cells and Network Architectures
Yong Yu et al.
NEURAL COMPUTATION (2019)
Operational Data Analytics: Optimizing the National Energy Research Scientific Computing Center Cooling Systems
Norman Bourassa et al.
PROCEEDINGS OF THE 48TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING WORKSHOPS (ICPP 2019) (2019)
Predicting Faults in High Performance Computing Systems: An In-Depth Survey of the State-of-the-Practice
David Jauk et al.
PROCEEDINGS OF SC19: THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (2019)
MELA: A Visual Analytics Tool for Studying Multifidelity HPC System Logs
Shilpika et al.
PROCEEDINGS OF DAAC 2019: THE 3RD IEEE/ACM INDUSTRY/UNIVERSITY JOINT INTERNATIONAL WORKSHOP ON DATA-CENTER AUTOMATION, ANALYTICS, AND CONTROL (DAAC) (2019)
Recurrent Neural Network Attention Mechanisms for Interpretable System Log Anomaly Detection
Andy Brown et al.
PROCEEDINGS OF THE 1ST WORKSHOP ON MACHINE LEARNING FOR COMPUTING SYSTEMS (MLCS 2018) (2018)
DeepLog: Anomaly Detection and Diagnosis from System Logs through Deep Learning
Min Du et al.
CCS'17: PROCEEDINGS OF THE 2017 ACM SIGSAC CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY (2017)
LOGAIDER: A tool for mining potential correlations of HPC log events
Sheng Di et al.
2017 17TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID) (2017)
Data Mining-based Analysis of HPC Center Operations
Jannis Klinkenberg et al.
2017 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER) (2017)
A Practical Approach to Hard Disk Failure Prediction in Cloud Platforms: Big Data Model for Failure Management in Datacenters
Sandipan Ganguly et al.
PROCEEDINGS 2016 IEEE SECOND INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING SERVICE AND APPLICATIONS (BIGDATASERVICE 2016) (2016)
LogMine: Fast Pattern Recognition for Log Analytics
Hossein Hamooni et al.
CIKM'16: PROCEEDINGS OF THE 2016 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT (2016)
Predicting Scheduling Failures in the Cloud: A Case Study with Google Clusters and Hadoop on Amazon EMR
Mbarka Soualhia et al.
2015 IEEE 17TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, 2015 IEEE 7TH INTERNATIONAL SYMPOSIUM ON CYBERSPACE SAFETY AND SECURITY, AND 2015 IEEE 12TH INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE AND SYSTEMS (ICESS) (2015)
Failure Prediction of Data Centers Using Time Series and Fault Tree Analysis
Thanyalak Chalermarrewong et al.
PROCEEDINGS OF THE 2012 IEEE 18TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS 2012) (2012)
LogMaster: Mining Event Correlations in Logs of Large-scale Cluster Systems
Xiaoyu Fu et al.
2012 31ST INTERNATIONAL SYMPOSIUM ON RELIABLE DISTRIBUTED SYSTEMS (SRDS 2012) (2012)
D3: Data-Driven Documents
Michael Bostock et al.
IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS (2011)
A Survey of Online Failure Prediction Methods
Felix Salfner et al.
ACM COMPUTING SURVEYS (2010)