Recent advances on NLP research in Harbin Institute of Technology

  • Review Article
  • Frontiers of Computer Science in China

Abstract

In the 1960s, researchers at Harbin Institute of Technology (HIT) first undertook research on natural language processing (NLP). After more than 40 years of effort, HIT has established three research laboratories for Chinese information processing: the Machine Intelligence and Translation Laboratory (MI&T Lab), the Intelligent Technology and Natural Language Processing Laboratory (ITNLP), and the Information Retrieval Laboratory (IR-Lab). At present, HIT has a well-balanced research team of over 200 people, and its research interests have extended to language processing, machine translation, text retrieval, and other fields. HIT has accumulated a set of key techniques and data resources and has won many prizes in technical evaluations at home and abroad. It is now one of the most important bases for teaching and scientific research on natural language processing in China. This paper introduces HIT's achievements in NLP.

Author information

Correspondence to Zhao Tiejun.

About this article

Cite this article

Zhao, T., Guan, Y., Liu, T. et al. Recent advances on NLP research in Harbin Institute of Technology. Front. Comput. Sc. China 1, 413–428 (2007). https://doi.org/10.1007/s11704-007-0040-1

  • DOI: https://doi.org/10.1007/s11704-007-0040-1
