
Entity name recognition of cross-border e-commerce commodity titles based on TWs-LSTM

Published in Electronic Commerce Research

Abstract

Commodity information must be matched to an HSCode before exported goods can clear customs quickly, so it is particularly important to identify the entity names in e-commerce commodity titles both quickly and accurately. To address this problem, an approach based on TWs-LSTM is proposed for recognizing commodity entity names. We apply the TF-IDF algorithm to the commodity text corpus to obtain a weight matrix for the commodity words, and we use the Word2Vec model to represent the semantics of the words extracted from the bag of words. The weight vector of a commodity title and each word vector of that title are then combined into a new one-dimensional vector; these one-dimensional vectors, which represent the commodity titles, constitute the TWs model. Finally, the TWs vectors are fed into an LSTM for commodity entity name recognition. In the experiments, we divide the commodity entity name data into a training set and a testing set and compare the TWs-LSTM model with other text processing models. On the commodity title corpus of the Tmall platform, TWs-LSTM reaches an F1-score of 64.58%, achieving state-of-the-art performance relative to the baseline models and previous studies.
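As a rough illustration of the TWs feature construction summarized above, the following Python sketch computes TF-IDF weights and Word2Vec embeddings for a toy set of commodity titles and joins them into one combined vector per word. The toy corpus, the scikit-learn/gensim tooling, and the exact way the weight and the embedding are joined are assumptions made for illustration; this is not the authors' implementation.

```python
# Minimal sketch of a TWs-style feature construction (illustrative assumptions only).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from gensim.models import Word2Vec

# Toy commodity titles standing in for the Tmall corpus.
titles = [
    "lion toothbrush soft bristle cleaning",
    "nylon toothbrush travel pack",
]
tokenized = [t.split() for t in titles]

# Word2Vec embeddings for the title vocabulary (tiny toy settings).
w2v = Word2Vec(sentences=tokenized, vector_size=8, min_count=1, window=2, seed=1)

# TF-IDF weights over the same corpus.
tfidf = TfidfVectorizer(tokenizer=str.split, lowercase=False)
tfidf_matrix = tfidf.fit_transform(titles)
vocab = tfidf.vocabulary_

def tws_features(title_idx, tokens):
    """Concatenate each word's TF-IDF weight with its embedding, giving one
    combined vector per word (one plausible reading of the TWs step)."""
    row = tfidf_matrix[title_idx].toarray().ravel()
    feats = []
    for tok in tokens:
        weight = row[vocab[tok]] if tok in vocab else 0.0
        feats.append(np.concatenate(([weight], w2v.wv[tok])))
    return np.stack(feats)  # shape: (title_length, 1 + embedding_dim)

print(tws_features(0, tokenized[0]).shape)  # (5, 9) for the first toy title
```

In the full model, the per-word vectors built this way form the input sequence that is fed to the LSTM tagger for entity name recognition.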





Acknowledgements

The authors are grateful for the helpful suggestions from the anonymous reviewers. This research was supported by the National Natural Science Foundation of China (No. 71373123) and the Fundamental Research Funds for the Central Universities (No. NW2018004).

Author information

Corresponding author

Correspondence to Jing Ma.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Of the ten digits in an HSCode, the first six follow the international common standard, which divides all internationally traded commodities into 22 sections and 98 chapters; the last four digits are coded individually by each country according to its own conditions. Each chapter includes commodity headings and subheadings.

Figure 11 shows the HSCode and the commodity information on a customs export declaration form; it can be described as follows:

Fig. 11 Customs Export Declaration Form of the People's Republic of China

  • The ten-digit number 9603210000 in the red box: 96 denotes the chapter, 03 the heading, 21 the subheading, and 0000 is the extension coded by China Customs (see the sketch after this list).

  • The information in the blue box is the commodity name, 'toothbrush'.

  • The items in the black box are 'cleaning' (the function of the commodity), 'polypropylene/nylon' (the constituents of the commodity) and 'LION brand' (the commodity brand).
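To make the digit layout concrete, the following small Python helper splits a ten-digit HSCode such as 9603210000 into the parts listed above. The function name and return format are illustrative assumptions only, not part of any customs system or of the paper's code.

```python
# Illustrative helper for splitting a ten-digit HSCode (assumed interface).
def split_hscode(code: str) -> dict:
    """Split a ten-digit HSCode such as '9603210000' into its labeled parts."""
    if len(code) != 10 or not code.isdigit():
        raise ValueError("expected a ten-digit HSCode")
    return {
        "chapter": code[0:2],     # 96 -> chapter
        "heading": code[2:4],     # 03 -> heading
        "subheading": code[4:6],  # 21 -> subheading
        "national": code[6:10],   # 0000 -> coded by China Customs
    }

print(split_hscode("9603210000"))
# {'chapter': '96', 'heading': '03', 'subheading': '21', 'national': '0000'}
```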


Cite this article

Luo, Y., Ma, J. & Li, C. Entity name recognition of cross-border e-commerce commodity titles based on TWs-LSTM. Electron Commer Res 20, 405–426 (2020). https://doi.org/10.1007/s10660-019-09371-6
