Bagging and Boosting Ensembles for Conflict Resolution on Heterogeneous Data

  • Conference paper
Intelligent Computing and Optimization (ICO 2019)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1072)

Abstract

In the era of big data and the Internet of Things (IoT), more and more devices are being connected to the internet and are sending voluminous amounts of data. The potential of this huge volume of unconnected data remains untapped. Generating insights from this dark data poses a great challenge at every stage of the data mining process, from pre-processing to report generation. The quality and reliability of the data are therefore of utmost importance. One of the challenges addressed in this paper is truth discovery, that is, establishing the veracity of the data. Estimating data veracity is difficult in both the IoT and wireless sensor networks (WSN): in the IoT it is achieved at the computational level, whereas in WSN it is achieved at the network level. When multiple information sources generate conflicting data, we must ascertain the correct value and provide a reliability index for every source. Although a number of truth discovery algorithms exist in the literature, a major challenge lies in deciding which method to select and in evaluating its performance, given the limited availability of ground truth values. In this paper we propose two algorithms, one using bootstrap aggregation (bagging) and one using boosting, and report their results on a weather data set. The data set contains both continuous and categorical values (heterogeneous data), and both kinds are handled by the proposed algorithms.
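The abstract does not spell out either algorithm, so the following is only a minimal sketch of how bootstrap aggregation could be applied to conflicting claims over heterogeneous data. It is not the authors' method. Everything in it is an assumption made for illustration: the function names `resolve` and `bagged_truth`, the station names and readings, the choice of median for continuous values and majority vote for categorical values, and the simple agreement-based reliability score.

```python
import random
from collections import Counter
from statistics import median


def resolve(claims):
    """Resolve one list of claims: median if all numeric, majority vote otherwise."""
    if all(isinstance(c, (int, float)) for c in claims):
        return median(claims)
    return Counter(claims).most_common(1)[0][0]


def bagged_truth(observations, rounds=200, seed=0):
    """Estimate truths and source reliabilities by bootstrapping over sources.

    observations: {source: {property: value}} (hypothetical input layout).
    Returns ({property: estimated truth}, {source: reliability in [0, 1]}).
    """
    rng = random.Random(seed)
    sources = sorted(observations)
    properties = sorted({p for claims in observations.values() for p in claims})

    # Each round resamples the sources with replacement and resolves every property.
    estimates = {p: [] for p in properties}
    for _ in range(rounds):
        sample = [rng.choice(sources) for _ in sources]
        for p in properties:
            claims = [observations[s][p] for s in sample if p in observations[s]]
            if claims:
                estimates[p].append(resolve(claims))

    # Aggregate the per-round estimates into one value per property.
    truth = {p: resolve(v) for p, v in estimates.items() if v}

    # Assumed reliability score: closeness for numbers, exact match for categories.
    reliability = {}
    for s in sources:
        scores = []
        for p, value in observations[s].items():
            if p not in truth:
                continue
            t = truth[p]
            if isinstance(value, (int, float)) and isinstance(t, (int, float)):
                scores.append(1.0 / (1.0 + abs(value - t)))
            else:
                scores.append(1.0 if value == t else 0.0)
        reliability[s] = sum(scores) / len(scores) if scores else 0.0
    return truth, reliability


# Made-up readings from five hypothetical weather sources; station_e is an outlier.
weather = {
    "station_a": {"temperature_f": 70.0, "condition": "sunny"},
    "station_b": {"temperature_f": 71.0, "condition": "sunny"},
    "station_c": {"temperature_f": 72.0, "condition": "sunny"},
    "station_d": {"temperature_f": 71.0, "condition": "sunny"},
    "station_e": {"temperature_f": 95.0, "condition": "rain"},
}

truth, reliability = bagged_truth(weather)
print(truth)
print(reliability)
```

A boosting variant would instead run the rounds sequentially and reweight sources according to their error in earlier rounds; that is left out here because the abstract gives no detail on the weighting scheme.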

Acknowledgment

We wish to acknowledge the Department of Science and Technology, India, and the School of Computing, Sathyabama Institute of Science and Technology, Chennai, for providing the facilities to carry out this research under the DST-FIST Grant Project No. SR/FST/ETI-364/2014.

Author information

Corresponding author

Correspondence to Subhashini Radhakrishnan.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Vadavalli, A., Radhakrishnan, S. (2020). Bagging and Boosting Ensembles for Conflict Resolution on Heterogeneous Data. In: Vasant, P., Zelinka, I., Weber, GW. (eds) Intelligent Computing and Optimization. ICO 2019. Advances in Intelligent Systems and Computing, vol 1072. Springer, Cham. https://doi.org/10.1007/978-3-030-33585-4_43
