ABSTRACT
Air quality is a long-standing concern for the public, environmental protection agencies, and governments. In large volumes of air quality data, abnormal records can interfere with subsequent experiments and analysis, so they need to be detected to improve data accuracy. However, traditional air quality outlier detection methods require at least one year of data before they can make inferences. This paper first analyzes the characteristics of air quality big data and then proposes a framework based on Bayesian non-parametric clustering, namely a Dirichlet Process (DP) clustering framework, for air quality outlier detection. Guided by the results of the data analysis, the framework extends the Gaussian mixture model to an infinite Gaussian mixture model and uses a neural network to cluster the data processed by the infinite Gaussian mixture model, which effectively improves clustering accuracy and avoids the need to collect a large amount of training data.
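The abstract does not spell out the framework's details, so the following is only a rough illustration of the non-parametric idea, not the paper's method. It swaps in DP-means, a hard-assignment simplification derived from Dirichlet Process mixtures, in which the number of clusters is not fixed in advance but grows whenever a point lies too far from every existing centroid. The readings (hypothetical PM2.5/PM10 pairs), the threshold `lam`, the `min_size` rule, and all function names are assumptions made for this sketch.

```python
from collections import Counter


def dp_means(points, lam, max_iter=100):
    """Hard-assignment DP-means clustering (a simplification, not an infinite GMM).

    A new cluster is opened whenever a point's squared Euclidean distance
    to every current centroid exceeds `lam`.
    """
    # Start with a single cluster centred on the global mean.
    centroids = [tuple(sum(c) / len(points) for c in zip(*points))]
    labels = [0] * len(points)
    for _ in range(max_iter):
        changed = False
        for i, p in enumerate(points):
            d2 = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            k = min(range(len(d2)), key=d2.__getitem__)
            if d2[k] > lam:
                # Too far from everything seen so far: open a new cluster here.
                centroids.append(tuple(p))
                k = len(centroids) - 1
            if labels[i] != k:
                labels[i] = k
                changed = True
        # Recompute centroids, dropping clusters that lost all their members.
        members = {}
        for lab, p in zip(labels, points):
            members.setdefault(lab, []).append(p)
        order = sorted(members)
        remap = {old: new for new, old in enumerate(order)}
        labels = [remap[lab] for lab in labels]
        centroids = [
            tuple(sum(c) / len(members[old]) for c in zip(*members[old]))
            for old in order
        ]
        if not changed:
            break
    return labels, centroids


def flag_outliers(labels, min_size=2):
    """Flag points whose cluster has fewer than `min_size` members (assumed rule)."""
    counts = Counter(labels)
    return [counts[lab] < min_size for lab in labels]


# Hypothetical (PM2.5, PM10) readings: two regimes plus one suspicious record.
readings = [
    (18, 38), (20, 40), (22, 41), (19, 42), (21, 39),
    (78, 118), (80, 120), (82, 121), (79, 122), (81, 119),
    (500, 30),
]
labels, centroids = dp_means(readings, lam=900.0)
flags = flag_outliers(labels)
for reading, is_outlier in zip(readings, flags):
    if is_outlier:
        print("possible outlier:", reading)
```

In this toy setup the isolated reading ends up alone in its own cluster and is the only one flagged. A probabilistic infinite Gaussian mixture would instead assign soft cluster memberships, and the neural network stage described in the abstract is not represented here at all.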
Index Terms
- Air Big Data Outlier Detection Based on Infinite Gauss Bayesian and CNN