Abstract
In the era of big data, and with the advent of the Internet of Things (IoT), more and more devices are being connected to the Internet and are sending voluminous amounts of data. The potential of this huge volume of unconnected data remains untapped. Generating insights from this dark data is challenging at every stage of the data mining process, from data pre-processing to report generation, so the quality and reliability of the data are of utmost importance. This paper addresses one of these challenges: truth discovery, i.e. establishing the veracity of the data. Estimating data veracity is difficult in both IoT and wireless sensor networks (WSN); in IoT it is achieved at the computational level, whereas in WSN it is achieved at the network level. When multiple information sources generate conflicting data, we must determine the correct value and assign a reliability index to each source. Although the literature offers a number of truth discovery algorithms, a major challenge lies in selecting a method and evaluating its performance, given the limited availability of ground-truth values. In this paper we propose two algorithms, based on bootstrap aggregation (Bagging) and on Boosting respectively, and apply them to a weather data set. The chosen weather data set contains both continuous and categorical values (heterogeneous data), and the proposed algorithms handle both types.
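The abstract does not give the details of the proposed algorithms. As a rough illustration of how bagging can be combined with truth discovery on heterogeneous data, the Python sketch below assumes a common iterative weighting scheme: the truth of a continuous attribute is the weighted mean of the sources' claims, the truth of a categorical attribute is the weighted vote, and each source's weight is minus the logarithm of its share of the total loss (squared deviation normalised by the attribute's standard deviation for continuous values, 0/1 mismatch for categorical ones). Bagging is added by running this procedure on bootstrap samples of the objects and averaging the per-bag source weights, which then serve as the source reliability index. The toy weather claims (sources A, B and C; attributes temp and sky), the function names and parameters such as n_bags are hypothetical; this is not the authors' algorithm, and it covers the bagging idea only.

```python
import math
import random
import statistics
from collections import defaultdict

# Hypothetical toy data: claims[object][attribute][source] = claimed value.
# "temp" is continuous; "sky" is categorical.
CONTINUOUS = {"temp"}

claims = {
    "day1": {"temp": {"A": 30.1, "B": 30.4, "C": 35.0}, "sky": {"A": "sunny", "B": "sunny", "C": "rain"}},
    "day2": {"temp": {"A": 28.0, "B": 27.6, "C": 22.5}, "sky": {"A": "cloudy", "B": "cloudy", "C": "sunny"}},
    "day3": {"temp": {"A": 25.2, "B": 25.5, "C": 20.0}, "sky": {"A": "rain", "B": "rain", "C": "sunny"}},
    "day4": {"temp": {"A": 31.0, "B": 30.5, "C": 26.0}, "sky": {"A": "sunny", "B": "sunny", "C": "cloudy"}},
}


def attribute_scales(data):
    """Population std of each continuous attribute, used to normalise losses."""
    scales = {}
    for attr in CONTINUOUS:
        vals = [v for attrs in data.values() for v in attrs.get(attr, {}).values()]
        scales[attr] = (statistics.pstdev(vals) if vals else 0.0) or 1.0
    return scales


def estimate_truths(data, weights):
    """Weighted mean for continuous attributes, weighted vote for categorical ones."""
    truths = {}
    for obj, attrs in data.items():
        for attr, by_src in attrs.items():
            if attr in CONTINUOUS:
                total = sum(weights[s] for s in by_src)
                truths[(obj, attr)] = sum(weights[s] * v for s, v in by_src.items()) / total
            else:
                votes = defaultdict(float)
                for s, v in by_src.items():
                    votes[v] += weights[s]
                truths[(obj, attr)] = max(votes, key=votes.get)
    return truths


def update_weights(data, truths, sources, scales):
    """Source weight = -log(share of total loss): lower loss gives higher weight."""
    loss = {s: 1e-6 for s in sources}  # small floor avoids log(0)
    for obj, attrs in data.items():
        for attr, by_src in attrs.items():
            t = truths[(obj, attr)]
            for s, v in by_src.items():
                if attr in CONTINUOUS:
                    loss[s] += ((v - t) / scales[attr]) ** 2
                else:
                    loss[s] += 0.0 if v == t else 1.0
    total = sum(loss.values())
    return {s: -math.log(loss[s] / total) for s in sources}


def truth_discovery(data, scales, iters=10):
    """Alternate truth estimation and weight updates, starting from uniform weights."""
    sources = sorted({s for attrs in data.values() for by_src in attrs.values() for s in by_src})
    weights = {s: 1.0 for s in sources}
    for _ in range(iters):
        truths = estimate_truths(data, weights)
        weights = update_weights(data, truths, sources, scales)
    return estimate_truths(data, weights), weights


def bagged_source_weights(data, scales, n_bags=25, seed=0):
    """Run truth discovery on bootstrap samples of objects; average the source weights."""
    rng = random.Random(seed)
    objs = list(data)
    sums = defaultdict(float)
    for _ in range(n_bags):
        sample = [rng.choice(objs) for _ in objs]  # draw with replacement
        # Key each draw by its position so repeated objects stay distinct.
        boot = {(i, o): data[o] for i, o in enumerate(sample)}
        _, w = truth_discovery(boot, scales)
        for s, val in w.items():
            sums[s] += val
    return {s: total / n_bags for s, total in sums.items()}


scales = attribute_scales(claims)  # computed once on the full data set
weights = bagged_source_weights(claims, scales)
truths = estimate_truths(claims, weights)
print(weights)
print(truths)
```

With this toy data, source C, which disagrees with A and B on every day, should receive the lowest averaged weight, and the estimated sky condition for each day should follow A and B. Averaging the per-bag weights is one simple way to reduce the variance of the reliability estimates; other aggregation choices, such as the median of the per-bag weights, would be equally plausible.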
Acknowledgment
We wish to acknowledge the Department of Science and Technology, India, and the School of Computing, Sathyabama Institute of Science and Technology, Chennai, for providing the facilities to carry out this research under DST-FIST Grant Project No. SR/FST/ETI-364/2014.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Vadavalli, A., Radhakrishnan, S. (2020). Bagging and Boosting Ensembles for Conflict Resolution on Heterogeneous Data. In: Vasant, P., Zelinka, I., Weber, GW. (eds) Intelligent Computing and Optimization. ICO 2019. Advances in Intelligent Systems and Computing, vol 1072. Springer, Cham. https://doi.org/10.1007/978-3-030-33585-4_43
DOI: https://doi.org/10.1007/978-3-030-33585-4_43
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33584-7
Online ISBN: 978-3-030-33585-4
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)