Engineering doc2vec for automatic classification of product descriptions on O2O applications

Lee, Hana; Yoon, Young

doi:10.1007/s10660-017-9268-5

Published: 28 August 2017

Electronic Commerce Research, volume 18, pages 433–456 (2018)

Abstract

In this paper, we develop an automatic product classifier that can become a vital part of a natural user interface for an integrated online-to-offline (O2O) service platform. We devise a novel feature extraction technique to represent product descriptions that are expressed in full natural-language sentences. Specifically, we adapt the doc2vec algorithm, which implements the document embedding technique. Doc2vec is a way to predict a vector of salient contexts that are specific to a document. Our classifier is trained to classify a product description based on a doc2vec-based feature that is augmented in various ways. We trained and tested our classifier with up to 53,000 real product descriptions from Groupon, a popular social commerce site that also offers O2O commerce features such as online ordering for in-store pick-up. Compared to the baseline approaches of bag-of-words modeling and word-level embedding, our classifier showed a significant improvement in classification accuracy when our adapted doc2vec-based feature was used.
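
For orientation, the bag-of-words baseline named in the abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' pipeline: the toy descriptions, the category labels, the tokenizer, and the nearest-centroid decision rule are all hypothetical choices made here. A doc2vec-based approach would replace the sparse word-count vector with a learned dense document vector before classification.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a description and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(text):
    """Represent a description as a sparse word-count vector."""
    return Counter(tokenize(text))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v.get(word, 0) for word, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def train_centroids(examples):
    """Sum the word counts of every description in each category."""
    centroids = defaultdict(Counter)
    for text, label in examples:
        centroids[label].update(bag_of_words(text))
    return dict(centroids)


def classify(text, centroids):
    """Return the category whose centroid is most similar to the description."""
    vector = bag_of_words(text)
    return max(centroids, key=lambda label: cosine(vector, centroids[label]))


# Hypothetical toy data; the paper's Groupon corpus is not reproduced here.
TRAIN = [
    ("Relaxing sixty minute Swedish massage at a downtown spa", "beauty"),
    ("Haircut and deep conditioning treatment at a salon", "beauty"),
    ("Three course Italian dinner for two with wine", "food"),
    ("Wood fired pizza and craft beer tasting", "food"),
]

centroids = train_centroids(TRAIN)
prediction = classify("Deep tissue massage and facial at a spa", centroids)
print(prediction)
```

Because word counts ignore word order and context, two descriptions that share few literal words score as dissimilar even when they describe the same kind of product; that limitation is what document-level embeddings are meant to address.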

Notes

https://radimrehurek.com/gensim/.
A comprehensive explanation is available in [32]. A visual explanation is available at https://ronxin.github.io/wevi/.

References

1. Abrahams, S. L. (2008). Handmade online: The crafting of commerce, aesthetics and community on Etsy.com. Chapel Hill: The University of North Carolina.
2. Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993–1022.
3. Dai, A. M., Olah, C., & Le, Q. V. (2015). Document embedding with paragraph vectors. arXiv preprint arXiv:1507.07998.
4. Das, P., Xia, Y., Levine, A., Di Fabbrizio, G., & Datta, A. (2016). Large-scale taxonomy categorization for noisy product listings. In 2016 IEEE international conference on big data (Big Data) (pp. 3885–3894). IEEE.
5. Ding, Y., Korotkiy, M., Omelayenko, B., Kartseva, V., Zykov, V., Klein, M., et al. (2002). Goldenbullet: Automated classification of product data in e-commerce. In Proceedings of the 5th international conference on business information systems.
6. Dos Santos, C. N., & Gatti, M. (2014). Deep convolutional neural networks for sentiment analysis of short texts. In COLING (pp. 69–78).
7. Goldberg, Y., & Levy, O. (2014). word2vec explained: Deriving Mikolov et al.’s negative-sampling word-embedding method. arXiv preprint arXiv:1402.3722.
8. Gottipati, S., & Vauhkonen, M. (2012). E-commerce product categorization. Stanford CS229 Final Projects.
9. Hashimoto, K., Stenetorp, P., Miwa, M., & Tsuruoka, Y. (2015). Task-oriented learning of word embeddings for semantic relation classification. arXiv preprint arXiv:1503.00095.
10. Hull, D. A., et al. (1996). Stemming algorithms: A case study for detailed evaluation. JASIS, 47(1), 70–84.
11. Ju, R., Zhou, P., Li, C. H., & Liu, L. (2015). An efficient method for document categorization based on word2vec and latent semantic analysis. In 2015 IEEE international conference on computer and information technology; ubiquitous computing and communications; dependable, autonomic and secure computing; pervasive intelligence and computing (CIT/IUCC/DASC/PICOM) (pp. 2276–2283). IEEE.
12. Kim, Y. (2014). Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882.
13. Kim, Y. G., Lee, T., Chun, J., & Lee, S. G. (2006). Modified naïve Bayes classifier for e-catalog classification. In Data engineering issues in e-commerce and services (pp. 246–257). Springer.
14. Kohavi, R. (1996). Scaling up the accuracy of naive-Bayes classifiers: A decision-tree hybrid. In KDD (vol. 96, pp. 202–207). Citeseer.
15. Kononenko, I. (1993). Inductive and Bayesian learning in medical diagnosis. Applied Artificial Intelligence: An International Journal, 7(4), 317–337.
16. Kozareva, Z. (2015). Everyone likes shopping! Multi-class product categorization for e-commerce. In HLT-NAACL (pp. 1329–1333).
17. Le, Q. V., & Mikolov, T. (2014). Distributed representations of sentences and documents. In ICML (vol. 14, pp. 1188–1196).
18. Lee, H., Lim, E., Cho, Y., & Yoon, Y. (2016). Automatic classification of product data for natural general-purpose O2O application user interface. In The 2016 fall conference of the KIPS (pp. 382–385).
19. Lee, J. H., Ha, J., Jung, J. Y., & Lee, S. (2013). Semantic contextual advertising based on the open directory project. ACM Transactions on the Web (TWEB), 7(4), 24.
20. Lee, Y. E., & Benbasat, I. (2003). Interface design for mobile commerce. Communications of the ACM, 46(12), 48–52.
21. Liu, Y., Liu, Z., Chua, T. S., & Sun, M. (2015). Topical word embeddings. In AAAI (pp. 2418–2424).
22. Lu, S. H., Chiang, D. A., Keh, H. C., & Huang, H. H. (2010). Chinese text classification by the naïve Bayes classifier and the associative classifier with multiple confidence threshold values. Knowledge-Based Systems, 23(6), 598–604.
23. Ma, C., Xu, W., Li, P., & Yan, Y. (2015). Distributional representations of words for short text classification. In VS@ HLT-NAACL (pp. 33–38).
24. Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
25. Palangi, H., Deng, L., Shen, Y., Gao, J., He, X., Chen, J., et al. (2016). Deep sentence embedding using long short-term memory networks: Analysis and application to information retrieval. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 24(4), 694–707.
26. Panetto, H., Dassisti, M., & Tursi, A. (2012). Onto-PDM: Product-driven ontology for product data management interoperability within manufacturing process environment. Advanced Engineering Informatics, 26(2), 334–348.
27. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., et al. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
28. Perez, S. (2014). Etsy moves further into the offline world with launch of card reader for in-person payments. https://techcrunch.com/2014/10/23/etsy-moves-further-into-the-offline-world-with-launch-of-card-reader-for-in-person-payments/.
29. Ren, Y., Wang, R., & Ji, D. (2016). A topic-enhanced word embedding for Twitter sentiment classification. Information Sciences, 369, 188–198.
30. Ren, Y., Zhang, Y., Zhang, M., & Ji, D. (2016). Context-sensitive Twitter sentiment classification using neural network. In AAAI (pp. 215–221).
31. Robertson, S. (2004). Understanding inverse document frequency: On theoretical arguments for IDF. Journal of Documentation, 60(5), 503–520.
32. Rong, X. (2014). word2vec parameter learning explained. arXiv preprint arXiv:1411.2738.
33. Sahami, M., Dumais, S., Heckerman, D., & Horvitz, E. (1998). A Bayesian approach to filtering junk e-mail. In Learning for text categorization: Papers from the 1998 workshop (vol. 62, pp. 98–105).
34. Scholl, N. B., Crawford, J., & Puckett, J. (2013). Online ordering for in-shop service. US Patent App. 13/839,414.
35. Staykova, K. S., & Damsgaard, J. (2016). Platform expansion design as strategic choice: The case of WeChat and KakaoTalk. http://aisel.aisnet.org/ecis2016_rp/78.
36. Tang, D. (2015). Sentiment-specific representation learning for document-level sentiment analysis. In Proceedings of the eighth ACM international conference on web search and data mining (pp. 447–452). ACM.
37. Wang, Q., Garrity, G. M., Tiedje, J. M., & Cole, J. R. (2007). Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Applied and Environmental Microbiology, 73(16), 5261–5267.
38. Yang, X., Macdonald, C., & Ounis, I. (2016). Using word embeddings in Twitter election classification. arXiv preprint arXiv:1606.07006.

Acknowledgements

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (NRF-2016R1D1A1B03931324), and by the 2017 Hongik University Research Fund.

Author information

Authors and Affiliations

Department of Computer Engineering, Hongik University, 94 Seocho-gu Wowsan-ro, Seoul, South Korea
Hana Lee & Young Yoon

Corresponding author

Correspondence to Young Yoon.

About this article

Cite this article

Lee, H., Yoon, Y. Engineering doc2vec for automatic classification of product descriptions on O2O applications. Electron Commer Res 18, 433–456 (2018). https://doi.org/10.1007/s10660-017-9268-5

Published: 28 August 2017
Issue Date: September 2018
DOI: https://doi.org/10.1007/s10660-017-9268-5
