Abstract
In the 1960s, researchers at Harbin Institute of Technology (HIT) began to explore natural language processing. After more than 40 years of effort, HIT has established three research laboratories for Chinese information processing: the Machine Intelligence and Translation Laboratory (MI&T Lab), the Intelligent Technology and Natural Language Processing Laboratory (ITNLP), and the Information Retrieval Laboratory (IR-Lab). At present, it has a well-balanced research team of over 200 people, and its research interests have extended to language processing, machine translation, text retrieval, and other fields. HIT has accumulated a body of key techniques and data resources and has won many prizes in technical evaluations at home and abroad. It has become one of the most important bases in China for teaching and scientific research in natural language processing. This paper introduces HIT's achievements in NLP.
Zhao, T., Guan, Y., Liu, T. et al. Recent advances on NLP research in Harbin Institute of Technology. Front. Comput. Sc. China 1, 413–428 (2007). https://doi.org/10.1007/s11704-007-0040-1