Entity name recognition of cross-border e-commerce commodity titles based on TWs-LSTM

Luo, Yongcong; Ma, Jing; Li, Chi

doi:10.1007/s10660-019-09371-6

Published: 03 September 2019

Volume 20, pages 405–426, (2020)
Electronic Commerce Research

Yongcong Luo¹,
Jing Ma¹ &
Chi Li²

Abstract

Commodity information must be matched to an HSCode so that exported goods can clear customs quickly. It is therefore particularly important to identify the entity names in commodity titles on e-commerce platforms quickly and accurately. To address this problem, an approach based on TWs-LSTM is proposed to identify commodity entity names. In this paper, we apply the TF-IDF algorithm to the commodity text corpus to obtain a weight matrix of the commodity words. Meanwhile, we use the Word2Vec model to represent the semantic meanings of the words extracted from the bag of words. Then, the weight vector of each commodity title and the word vector of every word in the title are combined into a new one-dimensional vector. We use these one-dimensional vectors to represent the commodity titles and call this representation the TWs model. Finally, we feed the TWs vectors into an LSTM for commodity entity name recognition. In the experiments, we divide the commodity entity name data into a training set and a testing set and compare the TWs-LSTM model with other text processing models. With the TWs-LSTM model, the F1-score reached 64.58% on the commodity title corpus of the Tmall platform, where TWs-LSTM achieves state-of-the-art performance in comparison with the baseline models and previous studies.
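The construction of a TWs vector described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the weights and word vectors are combined, so plain concatenation is assumed here. The function names `tfidf_weights` and `tws_vector`, the toy titles, and the two-dimensional embeddings (stand-ins for Word2Vec output) are all hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def tfidf_weights(titles):
    """Return one {word: TF-IDF weight} dict per tokenized title."""
    n_docs = len(titles)
    # Number of titles in which each word appears at least once.
    doc_freq = Counter(word for title in titles for word in set(title))
    weights = []
    for title in titles:
        counts = Counter(title)
        weights.append({
            word: (count / len(title)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights


def tws_vector(title, title_weights, embeddings, dim):
    """Build a one-dimensional vector for a title.

    Assumption: the weight vector and the word vectors are concatenated;
    the paper may combine them differently.
    """
    weight_part = [title_weights[word] for word in title]
    vector_part = []
    for word in title:
        # Unknown words fall back to a zero vector (an assumption).
        vector_part.extend(embeddings.get(word, [0.0] * dim))
    return weight_part + vector_part


# Toy data: tokenized titles and made-up embeddings, for illustration only.
titles = [["lion", "toothbrush", "nylon"], ["nylon", "rope"]]
embeddings = {
    "lion": [0.1, 0.9],
    "toothbrush": [0.8, 0.2],
    "nylon": [0.4, 0.4],
    "rope": [0.7, 0.1],
}

weights = tfidf_weights(titles)
vec = tws_vector(titles[0], weights[0], embeddings, dim=2)
print(len(vec))  # 3 weights followed by 3 word vectors of length 2
```

In a full pipeline, vectors of this kind would be padded to a common length and passed to an LSTM tagger; that step is omitted here because the abstract gives no architectural details.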

Acknowledgements

The authors are grateful for the helpful suggestions from the anonymous reviewers. This research was supported by the National Natural Science Foundation of China (No. 71373123) and the Fundamental Research Funds for the Central Universities (No. NW2018004).

Author information

Authors and Affiliations

College of Economics and Management, Nanjing University of Aeronautics and Astronautics, Nanjing, 211106, China
Yongcong Luo & Jing Ma
Cainiao Logistics Co., Ltd., Hangzhou, 311100, China
Chi Li

Corresponding author

Correspondence to Jing Ma.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Of the ten digits in an HSCode, the first six follow the common international standard, which divides all international trade commodities into 22 sections and 98 chapters; the last four digits are assigned by individual countries according to their own conditions. Each chapter includes commodity headings and subheadings.

Fig. 11 shows an HSCode and the associated commodity information, which can be described as follows:

The ten digits 9603210000 in the red box: 96 represents the chapter, 03 the heading and 21 the subheading; 0000 is the code assigned by China Customs.
The information in the blue box is the commodity name, ‘toothbrush’.
The items in the black box are ‘cleaning’ (the function of the commodity), ‘polypropylene/nylon’ (the constituents of the commodity) and ‘LION brand’ (the commodity brand).
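The digit layout described above can be sketched as a small parser. This is only an illustration of the 2-2-2-4 split given in the text; the names `HSCodeParts` and `split_hscode` are hypothetical, and the ten-digit length check reflects the format described here rather than any official validation rule.

```python
from typing import NamedTuple


class HSCodeParts(NamedTuple):
    chapter: str      # digits 1-2
    heading: str      # digits 3-4
    subheading: str   # digits 5-6
    national: str     # digits 7-10, assigned by the individual country


def split_hscode(code: str) -> HSCodeParts:
    """Split a ten-digit HSCode into the parts described in the appendix."""
    if len(code) != 10 or not code.isdigit():
        raise ValueError(f"expected a ten-digit code, got {code!r}")
    return HSCodeParts(code[0:2], code[2:4], code[4:6], code[6:10])


# The toothbrush example from the appendix.
parts = split_hscode("9603210000")
print(parts)
```

The codes are kept as strings rather than integers so that leading zeros, as in heading 03, are preserved.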

About this article

Cite this article

Luo, Y., Ma, J. & Li, C. Entity name recognition of cross-border e-commerce commodity titles based on TWs-LSTM. Electron Commer Res 20, 405–426 (2020). https://doi.org/10.1007/s10660-019-09371-6

Issue Date: June 2020
