Abstract
With the increasing popularity of tablets, smartphones, and other mobile devices, users commonly complete online tasks across several devices. Identifying individual users across different digital devices has therefore become an active research topic. Methods based on names, email addresses, and other demographic information have received much attention, but a complete set of such information is often difficult to obtain. In this paper, we take a probabilistic approach to the cross-device identity problem and focus on comparing different algorithms. We conduct an in-depth study and expand the attribute set of the data by studying the relationships between attributes. Dummy variables are introduced to improve the efficiency of the models. Experimental results on four datasets (released by the ICDM Challenge) show that eXtreme Gradient Boosting consistently and significantly outperforms the other algorithms on both accuracy and F1-score. It also consistently performs better than the methods we used in the ICDM 2015 Challenge, in which we achieved a moderate ranking using C4.5 and BP models, and it achieves a better overall evaluation ranking.
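As a rough illustration of two ideas named in the abstract, the sketch below shows how a categorical attribute might be expanded into dummy variables and how accuracy and F1-score can be computed for binary "same user / different user" predictions. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names, the `device_types` attribute, and the toy labels are all hypothetical.

```python
from typing import Dict, List, Sequence


def dummy_encode(values: Sequence[str]) -> List[Dict[str, int]]:
    """Expand one categorical attribute into 0/1 dummy variables."""
    categories = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]


def accuracy_and_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy and F1 for binary labels (1 = same user, 0 = different users)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    total = len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / total if total else 0.0
    return {"accuracy": accuracy, "f1": f1}


# Hypothetical categorical attribute (not taken from the paper's datasets).
device_types = ["phone", "tablet", "phone", "desktop"]
encoded = dummy_encode(device_types)

# Toy labels for candidate device pairs, for illustration only.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
scores = accuracy_and_f1(y_true, y_pred)
print(encoded[0])
print(scores)
```

Reporting F1 alongside accuracy is useful when matching pairs are rare among candidate pairs, since accuracy alone can look high for a model that rarely predicts a match.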
Acknowledgments
This work is supported by the Natural Science Foundation of China (No. 61170192) and the National High-tech R&D Program of China (No. 2013AA013801). L. Li is the corresponding author for the paper.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Song, R., Chen, S., Deng, B., Li, L. (2016). eXtreme Gradient Boosting for Identifying Individual Users Across Different Digital Devices. In: Cui, B., Zhang, N., Xu, J., Lian, X., Liu, D. (eds) Web-Age Information Management. WAIM 2016. Lecture Notes in Computer Science, vol 9658. Springer, Cham. https://doi.org/10.1007/978-3-319-39937-9_4
DOI: https://doi.org/10.1007/978-3-319-39937-9_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-39936-2
Online ISBN: 978-3-319-39937-9
eBook Packages: Computer Science, Computer Science (R0)