DOI: 10.1145/2043674.2043685
research-article

Keyword extraction based on sequential pattern mining

Published: 05 August 2011

Abstract

Keyword extraction is the task of automatically extracting keywords that capture the main topic of a given document. In this paper, a new keyword extraction algorithm based on sequential patterns is proposed. After preprocessing, a document is represented as sequences of words, to which a sequential pattern mining algorithm is applied; the important sequential patterns that are mined reflect the semantic relatedness between words. Both statistical features and pattern features of words are used to build the keyword extraction model. The algorithm is language-independent and does not require a semantic dictionary to obtain semantic features. Experimental results on Chinese journal articles show that the proposed algorithm always outperforms the baseline method KEA.
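
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the mining algorithm, the feature definitions, or the learning model, so everything below is an assumption made for illustration. The function names (to_sequences, frequent_pairs, word_features), the restriction to two-word patterns, and the min_support and max_gap parameters are all hypothetical. Each sentence is treated as one word sequence, ordered word pairs that recur across sequences stand in for sequential patterns, and each word receives one statistical feature (term frequency) and one pattern feature (the number of frequent patterns it appears in).

```python
from collections import Counter


def to_sequences(text):
    """Split text into sentences, each a list of lowercase words (assumed preprocessing)."""
    sentences = text.replace("!", ".").replace("?", ".").split(".")
    return [s.lower().split() for s in sentences if s.strip()]


def frequent_pairs(sequences, min_support=2, max_gap=2):
    """Mine ordered word pairs (a, b), where b follows a within max_gap
    positions, that occur in at least min_support sequences.
    Only length-2 patterns are mined; this is a simplification."""
    support = Counter()
    for seq in sequences:
        seen = set()
        for i, a in enumerate(seq):
            for b in seq[i + 1 : i + 1 + max_gap]:
                if a != b:
                    seen.add((a, b))
        support.update(seen)  # count each pattern at most once per sequence
    return {p: c for p, c in support.items() if c >= min_support}


def word_features(sequences, patterns):
    """Per word: (term frequency, number of frequent patterns containing the word)."""
    tf = Counter(w for seq in sequences for w in seq)
    in_patterns = Counter(w for p in patterns for w in p)
    return {w: (tf[w], in_patterns[w]) for w in tf}


text = ("Pattern mining finds frequent patterns. Pattern mining scales well. "
        "Keyword extraction uses pattern mining.")
sequences = to_sequences(text)
patterns = frequent_pairs(sequences, min_support=2, max_gap=2)
features = word_features(sequences, patterns)
```

On this toy input the only frequent pattern is ("pattern", "mining"), so those two words get a non-zero pattern feature while every other word gets zero. In a fuller system the feature tuples would feed a trained classifier, and a complete sequential pattern miner would replace the pair-only search.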



      Published In

      ICIMCS '11: Proceedings of the Third International Conference on Internet Multimedia Computing and Service
      August 2011
      208 pages
      ISBN: 9781450309189
      DOI: 10.1145/2043674

      Sponsors

      • Sichuan University
      • Chinese Academy of Sciences
      • SCF: Sichuan Computer Federation
      • Southwest Jiaotong University
      • Beijing ACM SIGMM Chapter

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. keyword extraction
      2. pattern features
      3. sequential pattern mining

      Qualifiers

      • Research-article

      Conference

      ICIMCS '11
      Sponsor:
      • SCF

      Acceptance Rates

      Overall Acceptance Rate 163 of 456 submissions, 36%

      Cited By

      • (2024) A Case Study for Language-Free Keyword Extraction with Statistical and Graph-Based Features. 2024 6th International Conference on Computing and Informatics (ICCI), 518-521. DOI: 10.1109/ICCI61671.2024.10485164. Online publication date: 6-Mar-2024.
      • (2021) Supervised sequential pattern mining of event sequences in sport to identify important patterns of play: An application to rugby union. PLOS ONE 16:9, e0256329. DOI: 10.1371/journal.pone.0256329. Online publication date: 23-Sep-2021.
      • (2020) FLAKE: Fuzzy Graph Centrality-based Automatic Keyword Extraction. The Computer Journal 65:4, 926-939. DOI: 10.1093/comjnl/bxaa133. Online publication date: 5-Dec-2020.
      • (2019) A knowledge construction methodology to automate case-based learning using clinical documents. Expert Systems 37:1. DOI: 10.1111/exsy.12401. Online publication date: 10-Apr-2019.
      • (2018) A Flexible Keyphrase Extraction Technique for Academic Literature. Procedia Computer Science 135, 553-563. DOI: 10.1016/j.procs.2018.08.208. Online publication date: 2018.
      • (2017) Data-Driven Job Search Engine Using Skills and Company Attribute Filters. 2017 IEEE International Conference on Data Mining Workshops (ICDMW), 199-206. DOI: 10.1109/ICDMW.2017.33. Online publication date: Nov-2017.
      • (2017) A heuristics-based keyword and phrase ranking in the text corpus for question answering systems. 2017 20th International Conference of Computer and Information Technology (ICCIT), 1-6. DOI: 10.1109/ICCITECHN.2017.8281773. Online publication date: Dec-2017.
      • (2017) Keyphrase Extraction Using Sequential Pattern Mining and Entropy. 2017 IEEE International Conference on Big Knowledge (ICBK), 88-95. DOI: 10.1109/ICBK.2017.20. Online publication date: Aug-2017.
      • (2015) Mining Itemset-based Distinguishing Sequential Patterns with Gap Constraint. Database Systems for Advanced Applications, 39-54. DOI: 10.1007/978-3-319-18120-2_3. Online publication date: 9-Apr-2015.
      • (2014) Detecting of PIU Behaviors Based on Discovered Generators and Emerging Patterns from Computer-Mediated Interaction Events. Web-Age Information Management, 277-293. DOI: 10.1007/978-3-319-08010-9_31. Online publication date: 2014.
