Abstract
With the continuous development of information technology, multi-media data of many kinds are constantly emerging, and these data are autonomous and heterogeneous. How to integrate and analyze such data more correctly and efficiently has become a challenging problem. First, to improve the quality of the integrated data, two real-time threads combined with a data adapter are used to monitor heterogeneous data sources and refresh necessary updates efficiently. Once the original data has been updated, the real-time data is loaded into the data center promptly. Second, a data reverse cleaning method is proposed to improve data quality. It uses the data source tree built during the data integration process to quickly find the location of the original data after reverse cleaning. Finally, a data accuracy assessment algorithm, based on a Bayesian network and the path condition algorithm, is designed for data quality assessment. Experimental results show that the quality of the integrated data is significantly higher than that of the original data.
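As a rough illustration of how a data source tree could map an integrated record back to its original location, the following minimal Python sketch may help. It is not the authors' implementation: the three-level layout (source, table, record key) and the names `SourceNode`, `SourceTree`, `register`, and `locate` are assumptions made only for this example.

```python
class SourceNode:
    """One node of a hypothetical data source tree (assumed layout:
    source -> table -> record key)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}

    def add_child(self, name):
        # Reuse an existing child so records from the same source share a path.
        if name not in self.children:
            self.children[name] = SourceNode(name, parent=self)
        return self.children[name]

    def path(self):
        # Walk parent links up to the root, then return root-first order.
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return list(reversed(parts))


class SourceTree:
    """Records, during integration, where each integrated record came from."""

    def __init__(self):
        self.root = SourceNode("root")
        self._index = {}  # integrated record id -> leaf node

    def register(self, integrated_id, source, table, record_key):
        # Called while integrating: remember the origin of this record.
        leaf = self.root.add_child(source).add_child(table).add_child(record_key)
        self._index[integrated_id] = leaf
        return leaf

    def locate(self, integrated_id):
        # Called after cleaning: return [source, table, record_key],
        # dropping the synthetic root node.
        return self._index[integrated_id].path()[1:]


# Usage with made-up source names.
tree = SourceTree()
tree.register(1, "video_db", "clips", "clip-42")
tree.register(2, "image_store", "photos", "img-7")
origin = tree.locate(1)
```

The index makes the lookup a single dictionary access followed by a short walk up the tree, which is one plausible reading of "find the location of the original data quickly"; the paper may organize its tree differently.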












Acknowledgments
This paper is partly supported by the National Science Foundation of China (Grant Nos. 61472132, 61472131, and 61300218).
Conflict of interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Cite this article
Chen, H., Ouyang, Y. & Jiang, W. An optimized data integration model based on reverse cleaning for heterogeneous multi-media data. Multimed Tools Appl 75, 15571–15586 (2016). https://doi.org/10.1007/s11042-015-2683-5
DOI: https://doi.org/10.1007/s11042-015-2683-5