research-article

Air Big Data Outlier Detection Based on Infinite Gauss Bayesian and CNN

Authors:
LiangQi Zhou

Engineering Laboratory on Radioactive Geoscience and Big Data Technology, East China University of Technology, Nanchang, China and School of Information Engineering, East China University of Technology, Nanchang, China

HongZhen Xu

Engineering Laboratory on Radioactive Geoscience and Big Data Technology, East China University of Technology, Nanchang, China and School of Information Engineering, East China University of Technology, Nanchang, China

Li Wei

School of Information Engineering, East China University of Technology, Nanchang, China

Quan Zhang

School of Information Engineering, East China University of Technology, Nanchang, China

Fei Zhou

School of Information Engineering, East China University of Technology, Nanchang, China

ZhuoPei Li

School of Information Engineering, East China University of Technology, Nanchang, China


ICMLC '19: Proceedings of the 2019 11th International Conference on Machine Learning and Computing, February 2019, Pages 317–321, https://doi.org/10.1145/3318299.3318384

Published: 22 February 2019


ABSTRACT

Air quality has long been a matter of concern to the public, environmental protection departments, and the government. In massive air quality datasets, abnormal data can interfere with subsequent experiments and analysis, so such data must be detected to improve data accuracy. However, traditional outlier detection methods for air quality require at least one year of data before they can make inferences about air quality. This paper first analyzes the characteristics of air quality big data and then proposes a framework based on Bayesian non-parametric clustering, namely a Dirichlet Process (DP) clustering framework, for detecting outliers in air quality data. Based on the results of the data analysis, the framework extends the Gaussian mixture model to an infinite Gaussian mixture model and uses a neural network to cluster the data processed by the infinite Gaussian mixture model. This improves clustering accuracy and avoids the need to collect a large amount of training data.
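The abstract does not specify the model in detail. As a rough illustration of how a Dirichlet process (infinite) Gaussian mixture can expose outliers, the sketch below runs a collapsed Gibbs sampler in the style of Neal's Algorithm 3 on one-dimensional readings. Everything beyond the DP mixture idea is an assumption made for illustration: all clusters share a known noise standard deviation `sigma`, cluster means have a normal prior `N(mu0, tau0^2)`, and a reading is scored by how often it sits in a cluster of at most `min_size` members after burn-in. The neural network stage described in the abstract is not modelled, and the function name `dp_gmm_outlier_scores` and its parameters are hypothetical.

```python
import math
import random


def normal_logpdf(x, mean, var):
    """Log density of a univariate normal distribution."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def dp_gmm_outlier_scores(data, alpha=0.5, sigma=5.0, mu0=None, tau0=100.0,
                          sweeps=100, burn_in=20, min_size=1, seed=0):
    """Collapsed Gibbs sampler for a 1-D Dirichlet process Gaussian mixture.

    Assumptions (hypothetical, not from the paper): every cluster shares a
    known noise std `sigma`, cluster means have a N(mu0, tau0^2) prior, and a
    point's outlier score is the fraction of post-burn-in sweeps in which it
    belongs to a cluster with at most `min_size` members.
    """
    rng = random.Random(seed)
    if mu0 is None:
        mu0 = sum(data) / len(data)
    s2, t2 = sigma ** 2, tau0 ** 2
    labels = [0] * len(data)          # start with every point in one cluster
    counts = {0: len(data)}           # cluster id -> number of members
    sums = {0: float(sum(data))}      # cluster id -> sum of member values
    next_id = 1
    small_hits = [0] * len(data)

    for sweep in range(sweeps):
        for i, x in enumerate(data):
            # Remove point i from its current cluster (drop the cluster if empty).
            c = labels[i]
            counts[c] -= 1
            sums[c] -= x
            if counts[c] == 0:
                del counts[c], sums[c]
            # Existing cluster k: log n_k + log posterior predictive of x.
            options, logw = [], []
            for k, n in counts.items():
                prec = 1.0 / t2 + n / s2
                mean = (mu0 / t2 + sums[k] / s2) / prec
                options.append(k)
                logw.append(math.log(n) + normal_logpdf(x, mean, 1.0 / prec + s2))
            # New cluster: log alpha + log prior predictive of x.
            options.append(None)
            logw.append(math.log(alpha) + normal_logpdf(x, mu0, t2 + s2))
            # Sample an assignment in proportion to the (stabilised) weights.
            top = max(logw)
            weights = [math.exp(w - top) for w in logw]
            choice = rng.choices(options, weights=weights)[0]
            if choice is None:
                choice = next_id
                next_id += 1
                counts[choice] = 0
                sums[choice] = 0.0
            labels[i] = choice
            counts[choice] += 1
            sums[choice] += x
        # After burn-in, record which points sit in small clusters.
        if sweep >= burn_in:
            for i, c in enumerate(labels):
                if counts[c] <= min_size:
                    small_hits[i] += 1

    kept = sweeps - burn_in
    return [h / kept for h in small_hits]


if __name__ == "__main__":
    # Hypothetical readings: two regimes (around 10 and 50) plus one spike.
    readings = ([10 + d for d in range(-4, 5)] * 3
                + [50 + d for d in range(-4, 5)] * 3 + [500])
    scores = dp_gmm_outlier_scores(readings)
    print([x for x, s in zip(readings, scores) if s > 0.5])
```

Because the DP prior lets the number of components grow with the data, no cluster count has to be fixed in advance: an isolated spike is cheaper to explain with a new component (weighted by `alpha`) than by stretching an existing one. A larger `alpha` or a smaller `sigma` makes the sampler more willing to open new clusters, so both would need tuning on real air quality data.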


Index Terms

1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches



Published in

ICMLC '19: Proceedings of the 2019 11th International Conference on Machine Learning and Computing
February 2019
563 pages
ISBN:9781450366007
DOI:10.1145/3318299

Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Air quality
Bayesian clustering
Dirichlet process
neural network
outlier detection
Qualifiers
- research-article
- Research
- Refereed limited

Article Metrics
- Total Citations: 2
- Total Downloads: 89
- Downloads (Last 12 months): 12
- Downloads (Last 6 weeks): 2