Product named entity recognition in Chinese text

Language Resources and Evaluation

Abstract

There are many expressive and structural differences between product names and general named entities such as person names, location names and organization names. To date, there has been little research on product named entity recognition (NER), which is crucial and valuable for information extraction in the field of market intelligence. This paper focuses on product NER (PRO NER) in Chinese text. First, we describe our efforts on data annotation, including well-defined specifications, data analysis and development of a corpus with annotated product named entities. Second, a hierarchical hidden Markov model-based approach to PRO NER is proposed and evaluated. Extensive experiments show that the proposed method outperforms the cascaded maximum entropy model and obtains promising results on the data sets of two different electronic product domains (digital and cell phone).

Notes

  1. For purposes of clarity and precision, the singular forms “product named entity” and “named entity” are abbreviated “PRO NE” and “NE”, respectively, while the plural forms “product named entities” and “named entities” are abbreviated “PRO NEs” and “NEs”, respectively.

References

  • Aberdeen, J., et al. (1995). MITRE: Description of the ALEMBIC system used for MUC-6. In Proceedings of the Sixth Message Understanding Conference (MUC-6) (pp. 141–155).

  • Bick, E. (2004). A named entity recognizer for Danish. In Lino et al. (Eds.), Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC2004), Lisbon (pp. 305–308).

  • Bikel, D. M., Miller, S., Schwartz, R., & Weischedel, R. (1997). Nymble: A high-performance learning name-finder. In Proceedings of the Fifth Conference on Applied Natural Language Processing (pp. 194–201), ACL.

  • Borthwick, A. (1999). A maximum entropy approach to named entity recognition. PhD Dissertation. Computer Science Department, New York University.

  • Carletta, J. (1996). Assessing agreement on classification tasks: The Kappa statistic. Computational Linguistics, 22(2), 249–254.

  • Collier, N., Nobata, C., & Tsujii, J. (2000). Extracting the names of genes and gene products with a hidden Markov model. In Proceedings of the 18th International Conference on Computational Linguistics (COLING’2000), Saarbrücken, Germany (pp. 201–207).

  • Fine, S., Singer, Y., & Tishby, N. (1998). The hierarchical hidden Markov model: Analysis and applications. Machine Learning, 32(1), 41–62.

  • Jelinek, F., & Mercer, R. L. (1980). Interpolated estimation of Markov source parameters from sparse data. In D. Gelsema & L. Kanal (Eds.), Pattern recognition in practice. North-Holland.

  • McCallum, A., Freitag, D., & Pereira, F. (2000). Maximum entropy Markov models for information extraction and segmentation. In Proceedings of the Seventeenth International Conference on Machine Learning (ICML-2000), Stanford, CA (pp. 591–598).

  • Niu, C., Li, W., Ding, J., & Srihari, R. K. (2003). A bootstrapping approach to named entity classification using successive learners. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (ACL 2003), Sapporo (pp. 335–342).

  • Pierre, J. M. (2002). Mining knowledge from text collections using automatically generated metadata. In Proceedings of Fourth International Conference on Practical Aspects of Knowledge Management (PAKM2002), Vienna (pp. 537–548).

  • Sekine, S., Grishman, R., & Shinnou, H. (1998). A decision tree method for finding and classifying names in Japanese texts. In Proceedings of the Sixth Workshop on Very Large Corpora, Canada, http://www.cs.nyu.edu/~sekine/papers/wvlc98.pdf.

  • Siegel, S., & Castellan, N. J. (1988). Non-parametric statistics for the behavioral sciences (2nd ed.). McGraw-Hill.

  • Wu, Y., Zhao, J., & Xu, B. (2003). Chinese named entity recognition combining statistical model with human knowledge. In The Workshop attached with 41st ACL for Multilingual and Mix-language Named Entity Recognition: Combining Statistical and Symbolic Models, Sapporo (pp. 65–72).

  • Xiong, D., Yu, H., & Liu, Q. (2004). Tagging complex NEs with Maxent models: Layered structures versus extended Tagset. In Proceedings of the First International Joint Conference on Natural Language Processing (IJCNLP-04), Sanya (pp. 638–643).

  • Yi, E., Lee, G. G., & Park, S.-J. (2004). SVM-based biological named entity recognition using minimum edit-distance feature boosted by virtual examples. In Proceedings of the First International Joint Conference on Natural Language Processing (IJCNLP-04), Sanya (pp. 22–24).

  • Yu, S., Duan, H., Zhu, X., Swen, B., & Chang, B. (2003). Word segmentation, POS tagging and phonetic notation. International Journal of The Chinese and Oriental Languages Information Processing Society, 13(2), 121–159.

Acknowledgments

This work is supported by the National High Technology Development 863 Program of China under Grant No. 2006AA01Z144, the National Natural Science Foundation of China under Grant No. 60673042, and the Natural Science Foundation of Beijing under Grants No. 4052027 and 4073043. This research is also carried out as part of a cooperative project with Fujitsu R&D Center Co., Ltd. We would like to thank Dr. Hao YU, Dr. Yingju XIA, and Dr. Fumihito Nishino for helpful conversations and feedback on the corpus. We would like to thank Dr. Yang LIU of the University of Texas at Dallas, Dr. Ying ZHAO of Tsinghua University, and Mr. Matthew Trueman for their useful suggestions for modifying earlier drafts of the paper. We are grateful to the anonymous reviewers for very helpful comments on an earlier draft. Their insights and suggestions have led to many improvements in the paper.

Author information

Corresponding author

Correspondence to Jun Zhao.

Additional information

This research was conducted under the framework of the Chinese Linguistic Data Consortium (ChineseLDC). In the first phase, ChineseLDC created a series of fundamental Chinese language resources, including Comprehensive Chinese Lexicon, Chinese Grammatical Knowledge Base (frequent words), Word-segmented and POS-tagged Chinese Corpus, Syntactic Treebank, Chinese–English Parallel Corpus, Chinese Semantic Lexicon, etc. Construction of the Product Named Entity Tagged Corpus and development of the Automatic Product Named Entity Recognition Tool are among the tasks of the second phase of ChineseLDC.

Appendix: Peking University’s TagSet for POS Tagging Chinese Texts (Yu et al. 2003)

About this article

Cite this article

Zhao, J., Liu, F. Product named entity recognition in Chinese text. Lang Resources & Evaluation 42, 197–217 (2008). https://doi.org/10.1007/s10579-008-9066-8

  • DOI: https://doi.org/10.1007/s10579-008-9066-8
