Abstract
Most businesses need to match their own product codes manually to the codes their suppliers use for the same products. Our industry-based project investigated two techniques for learning such matches automatically. The first approach applies synonyms while preprocessing the data and then performs approximate string matching. Of the seven popular matching methods we evaluated, trigram cosine distance matching performed best. The second approach couples approximate string matching with deep learning. Here, the Siamese Manhattan biLSTM method achieves higher accuracy and lower run time than the multiple LSTM 1D CNN. Suggesting the top three candidates to a domain expert yields near-perfect accuracy with good turnaround time in our real-world business context.
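A minimal sketch of the first approach follows: synonym substitution during preprocessing, trigram cosine distance for approximate matching, and a shortlist of the three closest supplier entries for a domain expert to confirm. It rests on several assumptions not stated above. Products are compared as free-text descriptions. The synonym table `SYNONYMS` and the helpers `normalise`, `trigrams`, `cosine_distance` and `top_candidates` are hypothetical. Trigrams are unpadded character 3-grams, and distance is defined as 1 minus the cosine similarity of trigram count vectors. It is not the authors' implementation.

```python
import math
from collections import Counter

# Hypothetical synonym table (an assumption for illustration): maps
# abbreviations found in product descriptions to one canonical token.
SYNONYMS = {"blk": "black", "lg": "large", "sm": "small"}


def normalise(text: str, synonyms: dict[str, str] = SYNONYMS) -> str:
    """Lower-case the text and replace each whitespace-separated token by its synonym."""
    return " ".join(synonyms.get(tok, tok) for tok in text.lower().split())


def trigrams(text: str) -> Counter:
    """Count the unpadded character trigrams of a string."""
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine_distance(a: str, b: str) -> float:
    """Return 1 - cosine similarity of the trigram count vectors of a and b."""
    va, vb = trigrams(a), trigrams(b)
    # Counter returns 0 for missing trigrams, so only shared trigrams contribute.
    dot = sum(count * vb[g] for g, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    if norm == 0:
        # Assumption: strings shorter than three characters are treated as maximally distant.
        return 1.0
    return 1.0 - dot / norm


def top_candidates(query: str, candidates: list[str], k: int = 3) -> list[str]:
    """Normalise all strings, then return the k candidates closest to the query."""
    q = normalise(query)
    return sorted(candidates, key=lambda c: cosine_distance(q, normalise(c)))[:k]


if __name__ == "__main__":
    supplier_items = ["black pen large", "blue pen small", "black marker", "red pen large"]
    print(top_candidates("BLK PEN LG", supplier_items))
```

Here `"BLK PEN LG"` normalises to `"black pen large"`, so the identical supplier entry ranks first. Returning three candidates, rather than a single best match, leaves the final decision with the domain expert.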
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Excell, Y., Link, S. (2022). Learning to Match Product Codes. In: Fujita, H., Fournier-Viger, P., Ali, M., Wang, Y. (eds) Advances and Trends in Artificial Intelligence. Theory and Practices in Artificial Intelligence. IEA/AIE 2022. Lecture Notes in Computer Science, vol 13343. Springer, Cham. https://doi.org/10.1007/978-3-031-08530-7_3
DOI: https://doi.org/10.1007/978-3-031-08530-7_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-08529-1
Online ISBN: 978-3-031-08530-7
eBook Packages: Computer Science, Computer Science (R0)