Learning to Match Product Codes

Excell, Ying; Link, Sebastian

doi:10.1007/978-3-031-08530-7_3

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13343)

Included in the following conference series:

International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems


Abstract

Most businesses must manually match their own product codes to the codes their suppliers use for the same products. Our industry-based project investigated two techniques for learning such matches automatically. The first approach uses synonyms during data preprocessing, before approximate string matching is applied. Trigram cosine distance matching outperformed the six other popular matching methods we evaluated. The second approach couples approximate string matching with deep learning. Here, the Siamese Manhattan biLSTM achieves higher accuracy and lower run time than the Multiple LSTM 1D CNN. Suggesting the top three candidates to a domain expert yields near-perfect accuracy with good turnaround time in our real-world business context.
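The first approach (synonym preprocessing, then trigram cosine distance, then a top-three shortlist for an expert) can be illustrated with a minimal Python sketch. It is not the authors' code: it assumes whitespace tokenisation, a hypothetical synonym table `SYNONYMS`, and q-gram profiles built without padding. The function names `preprocess`, `cosine_distance` and `top_k_candidates` and the toy catalogue are illustrative only. The cosine distance is 1 minus the cosine similarity of the two trigram count vectors.

```python
import math
from collections import Counter

# Hypothetical synonym table mapping supplier abbreviations to canonical tokens.
# The synonym list used in the paper is not given here; these entries are illustrative.
SYNONYMS = {"blk": "black", "wht": "white", "pk": "pack", "ltr": "litre"}


def preprocess(text: str, synonyms: dict[str, str] = SYNONYMS) -> str:
    """Lower-case, split on whitespace, and replace each token by its canonical synonym."""
    return " ".join(synonyms.get(tok, tok) for tok in text.lower().split())


def qgram_profile(s: str, q: int = 3) -> Counter:
    """Count every overlapping substring of length q (no padding)."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def cosine_distance(a: str, b: str, q: int = 3) -> float:
    """1 - cosine similarity of the q-gram count vectors of a and b."""
    pa, pb = qgram_profile(a, q), qgram_profile(b, q)
    sq_a = sum(v * v for v in pa.values())
    sq_b = sum(v * v for v in pb.values())
    if sq_a == 0 or sq_b == 0:
        # Strings shorter than q have no q-grams; treat them as identical only if equal.
        return 0.0 if a == b else 1.0
    # Counter returns 0 for q-grams missing from pb, so only shared q-grams contribute.
    dot = sum(count * pb[gram] for gram, count in pa.items())
    return 1.0 - dot / math.sqrt(sq_a * sq_b)


def top_k_candidates(query: str, catalogue: list[str], k: int = 3, q: int = 3) -> list[tuple[float, str]]:
    """Return the k catalogue entries closest to query as (distance, entry) pairs."""
    cleaned_query = preprocess(query)
    scored = [(cosine_distance(cleaned_query, preprocess(item), q), item) for item in catalogue]
    scored.sort(key=lambda pair: pair[0])
    return scored[:k]


if __name__ == "__main__":
    # Toy catalogue invented for illustration.
    catalogue = ["Black Pen 10 Pack", "White Paper A4 500", "Blue Pen 10 Pack", "Black Marker"]
    for distance, item in top_k_candidates("blk pen 10 pk", catalogue):
        print(f"{distance:.3f}  {item}")
```

In this toy run the synonym step rewrites `blk pen 10 pk` as `black pen 10 pack`, which matches `Black Pen 10 Pack` exactly (distance 0.0). `Blue Pen 10 Pack` and `Black Marker` follow as the next closest candidates, and the expert confirms one of the three.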



Author information

Authors and Affiliations

School of Computer Science, University of Auckland, Auckland, New Zealand
Ying Excell & Sebastian Link


Corresponding author

Correspondence to Sebastian Link.

Editor information

Editors and Affiliations

i-SOMET, Inc., Morioka-shi, Iwate, Japan
Hamido Fujita
College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong, China
Philippe Fournier-Viger
Texas State University, San Marcos, TX, USA
Moonis Ali
Shanghai University of Finance and Economics, Shanghai, China
Yinglin Wang


About this paper

Cite this paper

Excell, Y., Link, S. (2022). Learning to Match Product Codes. In: Fujita, H., Fournier-Viger, P., Ali, M., Wang, Y. (eds) Advances and Trends in Artificial Intelligence. Theory and Practices in Artificial Intelligence. IEA/AIE 2022. Lecture Notes in Computer Science, vol 13343. Springer, Cham. https://doi.org/10.1007/978-3-031-08530-7_3


DOI: https://doi.org/10.1007/978-3-031-08530-7_3
Published: 30 August 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-08529-1
Online ISBN: 978-3-031-08530-7
eBook Packages: Computer Science, Computer Science (R0)
