Abstract
In recent years, Machine Learning and Artificial Intelligence have been reshaping the landscape of e-commerce and retail. Using advanced analytics, behavioral modeling, and inference, representatives of these industries can leverage collected data to increase their market performance. To perform assortment optimization – one of the most fundamental problems in retail – one has to identify products that are present in the competitors’ portfolios. This is not possible without effective product matching. The paper deals with finding identical products in the offers of different retailers. The task is performed using a text-mining approach, assuming that the data may contain incomplete information. Besides a description of the algorithm, results for real-world data fetched from the offers of two consumer electronics retailers are presented.
Acknowledgment
The work was supported by the Faculty of Physics and Applied Computer Science AGH UST statutory tasks within the subsidy of MEiN.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Łukasik, S., Michałowski, A., Kowalski, P.A., Gandomi, A.H. (2021). Text-Based Product Matching with Incomplete and Inconsistent Items Descriptions. In: Paszynski, M., Kranzlmüller, D., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M.A. (eds) Computational Science – ICCS 2021. ICCS 2021. Lecture Notes in Computer Science, vol 12743. Springer, Cham. https://doi.org/10.1007/978-3-030-77964-1_8
DOI: https://doi.org/10.1007/978-3-030-77964-1_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-77963-4
Online ISBN: 978-3-030-77964-1
eBook Packages: Computer Science, Computer Science (R0)