Automatic Semantic Typing of Pet E-commerce Products Using Crowdsourced Reviews: An Experimental Study

  • Conference paper
Knowledge Graphs and Semantic Web (KGSWC 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14382)

Abstract

This paper considers the problem of semantically typing pet products using only the independent, crowdsourced reviews that customers post for them on e-commerce websites after purchase, rather than detailed product descriptions. Instead of proposing new methods, we assess the feasibility of established text classification algorithms for this goal. We conduct a detailed series of experiments using three different methodologies and a two-level pet product taxonomy. Our results show that classic methods can serve as robust solutions to this problem. Language models and word embeddings, while promising when more data is available, tend to be more computationally intensive and are also susceptible to degraded performance in the long tail.
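As a rough illustration of the kind of "classic" baseline the abstract refers to, the sketch below combines the components named in Notes 4 to 8: scikit-learn's `TfidfVectorizer`, Python's `string` module for punctuation removal, scikit-learn's built-in English stop-word list, `normalize`, and `LogisticRegressionCV`. It is a minimal sketch under stated assumptions, not the authors' implementation. Representing a product by concatenating all of its reviews, the helper names (`preprocess`, `product_document`, `train_classifier`, `predict`), and the settings `cv=2` and `max_iter=1000` are all hypothetical choices made for illustration.

```python
import string
from typing import List, Sequence, Tuple

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegressionCV
from sklearn.preprocessing import normalize

# Translation table that deletes every character in string.punctuation.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def preprocess(review: str) -> str:
    """Lower-case a review and remove ASCII punctuation."""
    return review.lower().translate(_STRIP_PUNCT)


def product_document(reviews: Sequence[str]) -> str:
    """Hypothetical choice: represent a product by concatenating all its reviews."""
    return " ".join(preprocess(r) for r in reviews)


def train_classifier(
    products: Sequence[Sequence[str]], labels: Sequence[str]
) -> Tuple[TfidfVectorizer, LogisticRegressionCV]:
    """Fit TF-IDF features and a cross-validated logistic regression."""
    docs = [product_document(reviews) for reviews in products]
    # stop_words="english" selects scikit-learn's built-in English stop-word list.
    vectorizer = TfidfVectorizer(stop_words="english")
    # TfidfVectorizer already L2-normalizes rows by default (norm="l2"), so this
    # explicit normalize() call is a no-op; it only mirrors the normalize step.
    features = normalize(vectorizer.fit_transform(docs))
    # LogisticRegressionCV selects the regularization strength C by cross-validation.
    classifier = LogisticRegressionCV(cv=2, max_iter=1000)
    classifier.fit(features, labels)
    return vectorizer, classifier


def predict(
    vectorizer: TfidfVectorizer,
    classifier: LogisticRegressionCV,
    products: Sequence[Sequence[str]],
) -> List[str]:
    """Predict a category for each product from its reviews."""
    docs = [product_document(reviews) for reviews in products]
    features = normalize(vectorizer.transform(docs))
    return list(classifier.predict(features))
```

A linear model over sparse TF-IDF features trains in seconds on a CPU, which is one plausible reason such baselines compare favorably on cost with fine-tuning a transformer such as DistilBERT.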

These authors contributed equally to this work.


Notes

  1. https://www.google.com/basepages/producttype/taxonomy-with-ids.en-US.txt.

  2. https://huggingface.co/datasets/amazon_us_reviews/viewer/Pet_Products_v1_00/train; Accessed: January 25, 2022.

  3. We could also have performed a ‘global’ 80-20 split of the entire manually labeled sample of 1000 products, but this would not have guaranteed that all Level 1 categories were, in fact, represented. The risk is heightened by the ‘long-tail’ nature of the distribution: Level 1 categories such as Reptile are significantly less prevalent than Dog or Cat, both in the sample and in the overall dataset.

  4. https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html; Accessed: March 1, 2023.

  5. https://docs.python.org/3/library/string.html; Accessed: March 1, 2023.

  6. https://scikit-learn.org/stable/modules/feature_extraction.html#stop-words; Accessed: March 1, 2023.

  7. https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.normalize.html; Accessed: March 1, 2023.

  8. https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegressionCV.html; Accessed: March 1, 2023.

  9. https://www.nltk.org/api/nltk.tokenize.html; Accessed: March 1, 2023.

  10. Specifically, the “fasttext-wiki-news-subwords-300” vectors, accessed at https://fasttext.cc/docs/en/english-vectors.html.

  11. https://nlp.stanford.edu/projects/glove/; Accessed: March 1, 2023.

  12. Specifically, the distilbert-base-uncased model, accessed at https://huggingface.co/distilbert-base-uncased.

  13. Note that, because we partition products (rather than individual reviews) into training and test sets, reviews do not ‘straddle’ the two sets: all n reviews of a product are allocated either to the training partition or to the test partition.
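Notes 3 and 13 describe two constraints on the train/test partition: the 80-20 split appears to be made within each Level 1 category rather than globally, and products (not individual reviews) are the unit being split. A minimal sketch of such a split follows, assuming each product is keyed by a product ID and labeled with its Level 1 category. The function names (`per_category_split`, `assign_reviews`), the fixed seed, and the rule that keeps at least one product on each side of any category with two or more products are illustrative assumptions, not details taken from the paper.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def per_category_split(
    product_category: Dict[str, str], train_frac: float = 0.8, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Split product IDs into train/test separately within each Level 1 category.

    Hypothetical helper: every category with at least two products is
    guaranteed to appear in both partitions, even rare ones such as Reptile.
    """
    by_category: Dict[str, List[str]] = defaultdict(list)
    for product_id, category in product_category.items():
        by_category[category].append(product_id)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_ids: List[str] = []
    test_ids: List[str] = []
    for category in sorted(by_category):
        ids = sorted(by_category[category])
        rng.shuffle(ids)
        n_train = round(train_frac * len(ids))
        if len(ids) >= 2:
            # Assumption: keep at least one product on each side of the split.
            n_train = min(max(n_train, 1), len(ids) - 1)
        train_ids.extend(ids[:n_train])
        test_ids.extend(ids[n_train:])
    return train_ids, test_ids


def assign_reviews(
    reviews: Iterable[Tuple[str, str]], train_ids: List[str], test_ids: List[str]
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Route each (product_id, review_text) pair to its product's partition.

    Because the split is made over products, all reviews of one product land
    on the same side and never straddle the training and test sets.
    """
    train_set, test_set = set(train_ids), set(test_ids)
    train_reviews: List[Tuple[str, str]] = []
    test_reviews: List[Tuple[str, str]] = []
    for product_id, text in reviews:
        if product_id in train_set:
            train_reviews.append((product_id, text))
        elif product_id in test_set:
            test_reviews.append((product_id, text))
    return train_reviews, test_reviews
```

A category with only one labeled product cannot appear on both sides under any split, so this sketch places it in the training partition.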


Author information

Corresponding author: Mayank Kejriwal

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Liu, X. et al. (2023). Automatic Semantic Typing of Pet E-commerce Products Using Crowdsourced Reviews: An Experimental Study. In: Ortiz-Rodriguez, F., Villazón-Terrazas, B., Tiwari, S., Bobed, C. (eds) Knowledge Graphs and Semantic Web. KGSWC 2023. Lecture Notes in Computer Science, vol 14382. Springer, Cham. https://doi.org/10.1007/978-3-031-47745-4_12

  • DOI: https://doi.org/10.1007/978-3-031-47745-4_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-47744-7

  • Online ISBN: 978-3-031-47745-4

  • eBook Packages: Computer Science, Computer Science (R0)
