
An optimized data integration model based on reverse cleaning for heterogeneous multi-media data

Multimedia Tools and Applications

Abstract

With the continuous development of information technology, multi-media data of many kinds are constantly emerging, and these data are autonomous and heterogeneous. How to integrate and analyze such data more correctly and efficiently has become a challenging problem. First, to improve the quality of the integrated data, two real-time threads combined with a data adapter are used to monitor the heterogeneous data sources and efficiently refresh necessary updates. Once the original data have been updated, the real-time data are loaded into the data center shortly afterward. Second, a data reverse cleaning method is proposed to improve data quality. It uses the data source tree built during the data integration process to quickly find the location of the original data after reverse cleaning. Finally, a data accuracy assessment algorithm, based on a Bayesian network and the path condition algorithm, is designed for data quality assessment. Experimental results show that the quality of the integrated data is significantly higher than that of the original data.
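The abstract does not specify how the data source tree is structured. As a rough illustration only, the sketch below assumes a three-level tree (source, table, row) that is filled in while records are integrated, so that a record flagged during cleaning can be traced back to its original location. The names `SourceNode`, `DataCenter`, `integrate`, and `locate` are hypothetical and do not come from the paper.

```python
from typing import Dict, List, Optional


class SourceNode:
    """One node of a hypothetical data source tree (assumed layout: source/table/row)."""

    def __init__(self, name: str, parent: Optional["SourceNode"] = None) -> None:
        self.name = name
        self.parent = parent
        self.children: Dict[str, "SourceNode"] = {}

    def add_child(self, name: str) -> "SourceNode":
        child = SourceNode(name, parent=self)
        self.children[name] = child
        return child

    def path(self) -> List[str]:
        # Walk up to the root, then reverse to get a root-to-leaf path.
        parts: List[str] = []
        node: Optional["SourceNode"] = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return list(reversed(parts))


class DataCenter:
    """Records where each integrated record came from (illustrative assumption)."""

    def __init__(self) -> None:
        self.root = SourceNode("root")
        self.origin: Dict[str, SourceNode] = {}

    def integrate(self, record_id: str, source: str, table: str, row: str) -> None:
        # Build (or reuse) the branch for this record during integration.
        node = self.root
        for part in (source, table, row):
            node = node.children.get(part) or node.add_child(part)
        self.origin[record_id] = node

    def locate(self, record_id: str) -> List[str]:
        # Trace a record back to its original location, skipping the root.
        return self.origin[record_id].path()[1:]


dc = DataCenter()
dc.integrate("r1", "mysql_a", "users", "42")
dc.integrate("r2", "mysql_a", "users", "43")
print(dc.locate("r1"))
```

In this sketch the lookup is a dictionary access followed by a walk up the tree, which is one plausible reading of finding the original data "quickly"; the paper's actual structure may differ.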

Acknowledgments

This work was supported in part by the National Science Foundation of China (Grant Nos. 61472132, 61472131, and 61300218).

Conflict of interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Author information

Corresponding author

Correspondence to Hao Chen.

About this article

Cite this article

Chen, H., Ouyang, Y. & Jiang, W. An optimized data integration model based on reverse cleaning for heterogeneous multi-media data. Multimed Tools Appl 75, 15571–15586 (2016). https://doi.org/10.1007/s11042-015-2683-5
