Publisher
ASSOC COMPUTING MACHINERY
DOI: 10.1145/3176653.3176710
Keywords
Power supply system; Root-cause alarm; Topological constraints; Association analysis; FP-growth
The power supply system is critical in a data center because it underpins the operation of infrastructure such as servers and switches. Alarm management for the power supply is essential: whenever a power supply device is cut off or other equipment is damaged, an alarm flood follows, creating heavy pressure to process a large burst of data and a risk of missed alarms. In this paper, we propose a novel alarm root-cause association analysis method for alarm diagnosis in power supply systems, based on the actual topological constraints of the system. First, a new alarm clustering algorithm, named DTIBFS, is proposed to cluster alarm nodes and identify the root-cause node. With this approach, the system can handle over 90% of alarm records by dealing with only 4 alarm clusters during an alarm flood, or over 60% of alarm records by dealing with only 20 alarm clusters on average over a long period. Both results substantially improve efficiency and reduce operational workload compared with handling alarm records one by one. Furthermore, an improved FP-growth association analysis built on DTIBFS is introduced. Experiments on actual alarm records from a data center show that many meaningful rules can be obtained. Because it takes the supply system topology into account, our method can mine associations from both a statistical and a topological perspective, which helps detect missing alarms during alarm floods and provides hints for alarm diagnosis in data center operation.
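The abstract does not spell out how DTIBFS works. As a rough illustration of the general idea of topology-constrained alarm clustering, the sketch below groups alarming devices by breadth-first search along the power-supply topology and treats the most upstream alarming device in each group as the root-cause candidate. It is not the authors' DTIBFS, and it does not cover the FP-growth step. The function name `cluster_alarms`, the adjacency-dict topology format (feeder mapped to the devices it powers), the rule that traversal passes only through alarming devices, and the assumption that the topology is acyclic are all hypothetical choices made for illustration.

```python
from collections import deque
from typing import Dict, List, Set


def cluster_alarms(topology: Dict[str, List[str]], alarmed: Set[str]) -> Dict[str, List[str]]:
    """Hypothetical topology-constrained clustering (not the paper's DTIBFS).

    topology: maps each device to the downstream devices it feeds (assumed acyclic).
    alarmed:  devices currently raising alarms.
    Returns a dict keyed by root-cause candidate, listing its cluster in BFS order.
    """
    # Reverse the edges so each device knows its upstream feeders.
    parents: Dict[str, Set[str]] = {}
    for src, dsts in topology.items():
        for dst in dsts:
            parents.setdefault(dst, set()).add(src)

    # Assumption: a root-cause candidate is an alarming device with no alarming feeder.
    roots = sorted(n for n in alarmed if not (parents.get(n, set()) & alarmed))

    clusters: Dict[str, List[str]] = {}
    assigned: Set[str] = set()
    for root in roots:
        members = [root]
        assigned.add(root)
        queue = deque([root])
        while queue:
            node = queue.popleft()
            for child in topology.get(node, []):
                # Assumption: follow power flow only through devices that also alarm;
                # a device reachable from two roots joins the first cluster that reaches it.
                if child in alarmed and child not in assigned:
                    assigned.add(child)
                    members.append(child)
                    queue.append(child)
        clusters[root] = members
    return clusters


if __name__ == "__main__":
    # Hypothetical toy topology: two UPS units feeding PDUs, which feed servers.
    topo = {
        "UPS1": ["PDU1", "PDU2"],
        "PDU1": ["srv1", "srv2"],
        "PDU2": ["srv3"],
        "UPS2": ["PDU3"],
        "PDU3": ["srv4"],
    }
    alarms = {"UPS1", "PDU1", "PDU2", "srv1", "srv2", "srv3", "srv4"}
    print(cluster_alarms(topo, alarms))
```

In the toy example, the six alarms under `UPS1` collapse into a single cluster rooted at `UPS1`, while `srv4` (whose feeder `PDU3` is silent) forms its own cluster, so an operator would inspect two clusters instead of seven individual alarms. Requiring every intermediate device to be alarming keeps the sketch simple; a more tolerant variant could allow a bounded number of silent devices along a path.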