With the wide adoption of Hadoop-based big data platforms in industry, monitoring and maintaining Hadoop clusters has become increasingly critical, which in turn creates high demand for automatic or semi-automatic health checking and diagnostics. Anomaly detection plays an essential role in monitoring Hadoop clusters and provides the basis for automatic alert generation. Most existing anomaly detection methods for Hadoop clusters are rule-based and rely on characterizing anomalies with domain knowledge and an understanding of the specific target system, which creates barriers to practical application in complex business platforms. In this paper, we leverage statistical analysis to discover critical components in execution data collected through the Hadoop JobTracker. A two-stage anomaly detection method is proposed that integrates PCA and DBSCAN; it requires no prior knowledge of the target nodes and jobs. Experiments are conducted to detect buggy jobs in real execution data collected from three Hadoop clusters operated by one of the largest e-commerce companies in the world. The results show that our method is effective at detecting anomalies without specifying the features of anomalies in advance.
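The two-stage pipeline the abstract describes (PCA for dimensionality reduction, then DBSCAN, whose noise points serve as anomaly candidates) can be sketched as follows. This is a minimal illustration on synthetic data, not the paper's implementation: the feature matrix, component count, and DBSCAN parameters (`eps`, `min_samples`) are all assumptions, and real input would be per-job execution metrics collected from the JobTracker.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Synthetic stand-in for per-job execution metrics (e.g., task durations,
# resource counters); rows 200-204 are injected "buggy job" outliers.
normal = rng.normal(0.0, 1.0, size=(200, 10))
anomalies = rng.normal(6.0, 1.0, size=(5, 10))
X = np.vstack([normal, anomalies])

# Stage 1: standardize, then project onto a few principal components
# to isolate the directions of largest variation in the execution data.
X_scaled = StandardScaler().fit_transform(X)
X_reduced = PCA(n_components=3).fit_transform(X_scaled)

# Stage 2: density-based clustering; points that fall in no dense
# cluster are labeled -1 by DBSCAN and treated as anomaly candidates.
labels = DBSCAN(eps=1.5, min_samples=5).fit_predict(X_reduced)
outliers = np.where(labels == -1)[0]
print(f"{len(outliers)} suspected anomalies out of {len(X)} jobs")
```

Note that, consistent with the abstract's claim, nothing in this pipeline encodes what an anomaly looks like in advance: anomalies are simply the points that do not belong to any dense region after projection.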