Abstract
Identifying the same commodity entity across heterogeneous, multi-source e-commerce big data is a major challenge. This paper introduces IIRS, a MapReduce-based framework composed of four components: data indexing, data integration, entity recognition, and data sorting. IIRS aims to produce a unified model of commodity information that can be processed efficiently. It builds an index model based on commodity attribute/value pairs and constructs a global model map that records each commodity's attributes and values. It then identifies commodity entities across different e-commerce sources by measuring the similarity of commodity identities. Finally, it outputs the sets of commodities sharing the same identity, together with their associated properties, organized as inverted index lists. An extensive experimental study on a real e-commerce dataset on Hadoop demonstrates the feasibility, accuracy, and efficiency of IIRS.
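The pipeline described above (index by attribute/value, compare commodity identities, emit identity sets as inverted index lists) can be illustrated with a small in-memory sketch of the map, shuffle, and reduce steps. This is not the IIRS implementation: the abstract does not specify the index model, the similarity measure, or the matching threshold, so the blocking attribute (`brand`), the Jaccard similarity over normalized attribute/value pairs, the 0.5 threshold, the union-find grouping, and the identity ids `E0`, `E1`, ... are all assumptions chosen for illustration, and the functions are hypothetical names rather than Hadoop APIs.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records: (source site, commodity id, {attribute: value}).
sample = [
    ("shopA", "a1", {"brand": "Acme", "model": "X100", "color": "black"}),
    ("shopB", "b7", {"brand": "ACME", "model": "x100", "color": "Black"}),
    ("shopC", "c3", {"brand": "Acme", "model": "Z9", "color": "white"}),
    ("shopB", "b9", {"brand": "Other", "model": "X100", "color": "black"}),
]


def map_phase(records, block_attr="brand"):
    """Map step (assumed): key each record by a blocking attribute so only
    commodities sharing that attribute value are compared together."""
    for source, cid, attrs in records:
        yield attrs.get(block_attr, "").strip().lower(), (source, cid, attrs)


def shuffle(pairs):
    """Group mapped values by key, as a MapReduce shuffle would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def normalize(attrs):
    """Lower-case and trim attribute/value pairs before comparison."""
    return {(k.lower(), v.strip().lower()) for k, v in attrs.items()}


def jaccard(a, b):
    """Assumed similarity: Jaccard overlap of attribute/value pairs."""
    sa, sb = normalize(a), normalize(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def reduce_phase(block, threshold=0.5):
    """Reduce step (assumed): link pairs whose similarity reaches the
    threshold and return connected components as identity sets."""
    parent = list(range(len(block)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(block)), 2):
        if jaccard(block[i][2], block[j][2]) >= threshold:
            parent[find(i)] = find(j)
    components = defaultdict(list)
    for i, record in enumerate(block):
        components[find(i)].append(record)
    return list(components.values())


def identify(records, threshold=0.5):
    """Build two inverted indexes: identity id -> member commodities, and
    (attribute, value) -> identity ids that carry that property."""
    identities = {}
    attr_index = defaultdict(set)
    for _, block in sorted(shuffle(map_phase(records)).items()):
        for members in reduce_phase(block, threshold):
            eid = f"E{len(identities)}"
            identities[eid] = sorted((s, c) for s, c, _ in members)
            for _, _, attrs in members:
                for pair in normalize(attrs):
                    attr_index[pair].add(eid)
    return identities, dict(attr_index)


identities, attr_index = identify(sample)
print(identities)
print(attr_index[("model", "x100")])
```

With this sample, `a1` and `b7` collapse into one identity because their normalized attribute/value pairs coincide, while `c3` shares only the brand and stays separate. Blocking keeps the number of pairwise comparisons per reducer small, which is the usual reason to partition candidates by key before computing similarities.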
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Fang, Q., Hu, Y., Lv, S., Guo, L., Xiao, L., Hu, Y. (2015). IIRS: A Novel Framework of Identifying Commodity Entities on E-commerce Big Data. In: Dong, X., Yu, X., Li, J., Sun, Y. (eds) Web-Age Information Management. WAIM 2015. Lecture Notes in Computer Science, vol 9098. Springer, Cham. https://doi.org/10.1007/978-3-319-21042-1_44
DOI: https://doi.org/10.1007/978-3-319-21042-1_44
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-21041-4
Online ISBN: 978-3-319-21042-1
eBook Packages: Computer Science, Computer Science (R0)