Abstract

Most businesses must manually match their own product codes to the codes that suppliers use for the same products. Our industry-based project investigated two techniques for learning such matches automatically. The first approach uses synonyms to preprocess the data before applying approximate string matching. We found that trigram cosine distance matching outperforms the other six popular matching methods we evaluated. The second approach couples approximate string matching with deep learning. Here, the Siamese Manhattan biLSTM method achieves higher accuracy and lower run time than multiple LSTM 1D CNN. Suggesting the top three candidates to a domain expert leads to near-perfect accuracy with good turnaround time in our real-world business context.
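
As a rough illustration of the first approach, trigram cosine matching with a top-three suggestion list might be sketched as follows. This is a minimal sketch under stated assumptions, not the chapter's implementation: the function names, the padding scheme, the lower-casing step, and the sample catalogue are all hypothetical, and the synonym preprocessing is omitted. The abstract speaks of cosine distance, which is one minus the similarity computed here.

```python
from collections import Counter
from math import sqrt


def trigrams(text: str) -> Counter:
    """Count overlapping character trigrams of a lower-cased, space-padded string.

    Padding and lower-casing are assumptions made for this sketch.
    """
    padded = f"  {text.lower().strip()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two trigram count profiles."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_candidates(query: str, catalogue: list[str], k: int = 3) -> list[tuple[str, float]]:
    """Return the k catalogue entries most similar to the query, best first."""
    query_profile = trigrams(query)
    scored = [(cosine_similarity(query_profile, trigrams(item)), item)
              for item in catalogue]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(item, score) for score, item in scored[:k]]


if __name__ == "__main__":
    # Hypothetical supplier descriptions, invented for illustration only.
    catalogue = [
        "Stainless steel bolt M8 x 40",
        "Stainless steel nut M8",
        "Brass hinge 75 mm",
        "Steel bolt M10 x 40",
    ]
    for item, score in top_candidates("SS bolt M8x40", catalogue):
        print(f"{score:.3f}  {item}")
```

Returning a short ranked list rather than a single best match mirrors the workflow described above, in which a domain expert picks the correct entry from the top three suggestions.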

Author information

Corresponding author

Correspondence to Sebastian Link.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Excell, Y., Link, S. (2022). Learning to Match Product Codes. In: Fujita, H., Fournier-Viger, P., Ali, M., Wang, Y. (eds) Advances and Trends in Artificial Intelligence. Theory and Practices in Artificial Intelligence. IEA/AIE 2022. Lecture Notes in Computer Science, vol 13343. Springer, Cham. https://doi.org/10.1007/978-3-031-08530-7_3

  • DOI: https://doi.org/10.1007/978-3-031-08530-7_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-08529-1

  • Online ISBN: 978-3-031-08530-7

  • eBook Packages: Computer Science (R0)
