Recent advances on NLP research in Harbin Institute of Technology

Zhao, Tiejun; Guan, Yi; Liu, Ting; Wang, Qiang

doi:10.1007/s11704-007-0040-1

Review Article
Published: October 2007

Volume 1, pages 413–428, (2007)

Frontiers of Computer Science in China

Abstract

Researchers at Harbin Institute of Technology (HIT) began working on natural language processing in the 1960s. After more than 40 years of effort, HIT has established three research laboratories for Chinese information processing: the Machine Intelligence and Translation Laboratory (MI&T Lab), the Intelligent Technology and Natural Language Processing Laboratory (ITNLP), and the Information Retrieval Laboratory (IR-Lab). It now has a well-balanced research team of over 200 people, and its research interests have extended to language processing, machine translation, text retrieval, and other fields. HIT has accumulated a set of key techniques and data resources and has won many prizes in technical evaluations at home and abroad, becoming one of the most important bases for teaching and research in natural language processing in China. This paper introduces HIT's achievements in NLP.

Author information

Authors and Affiliations

MOE-MS Key Laboratory of Natural Language Processing and Speech, Harbin Institute of Technology, Harbin, 150001, China
Zhao Tiejun, Guan Yi, Liu Ting & Wang Qiang

Corresponding author

Correspondence to Zhao Tiejun.

About this article

Cite this article

Zhao, T., Guan, Y., Liu, T. et al. Recent advances on NLP research in Harbin Institute of Technology. Front. Comput. Sc. China 1, 413–428 (2007). https://doi.org/10.1007/s11704-007-0040-1

Received: 12 July 2007
Accepted: 23 September 2007
Issue Date: October 2007
DOI: https://doi.org/10.1007/s11704-007-0040-1
