DOI: 10.1145/2987386.2987399
research-article

The Method of Semi-supervised Automatic Keyword Extraction for Web Documents using Transition Probability Distribution Generator

Published: 11 October 2016

ABSTRACT

This paper introduces a semi-supervised method for automatic keyword extraction from web documents. The method uses an unconventional Markov chain with a conditional transition matrix for each distinct feature, where the matrices are distributed by a Transition Probability Distribution Generator (TPDG). Keywords are the set of the most appropriate and relevant words that define the context of a document precisely and concisely, so many applications that derive high-quality information from text, such as text data mining, text analytics, and other natural language processing tasks, can take advantage of them. The conditional transition matrices for each distinct feature are the state-of-the-art component of the model: they rely mostly on the characteristics of the keywords and on the distribution probabilities of each feature over the state space in order to learn the sequential behavior of keywords across various web documents. According to the experimental results, the proposed method outperforms the baseline keyword extraction methods both in terms of performance and semantically.
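
The idea of keeping one transition matrix per feature can be illustrated with a minimal sketch. The abstract does not specify the TPDG, the feature set, or the state space, so everything below is an assumption made for illustration: a two-state chain (`other`, `keyword`), a single hypothetical feature (where the token occurs), add-one smoothing, and the helper name `estimate_transitions`. It is not the authors' implementation.

```python
from collections import defaultdict

# Assumed two-state space; the paper's actual state space is not given in the abstract.
STATES = ("other", "keyword")


def estimate_transitions(sequences, smoothing=1.0):
    """Estimate one 2x2 transition matrix per feature value (hypothetical helper).

    sequences: list of token sequences; each token is a (feature_value, state) pair.
    Returns {feature_value: {prev_state: {next_state: probability}}}.
    """
    # Start every cell at the smoothing constant so unseen transitions keep nonzero mass.
    counts = defaultdict(
        lambda: {p: {n: smoothing for n in STATES} for p in STATES}
    )
    for seq in sequences:
        # Each transition is conditioned on the feature of the token being entered.
        for (_, prev_state), (feature, next_state) in zip(seq, seq[1:]):
            counts[feature][prev_state][next_state] += 1

    matrices = {}
    for feature, table in counts.items():
        matrices[feature] = {}
        for prev_state, row in table.items():
            total = sum(row.values())
            # Normalize each row into a probability distribution.
            matrices[feature][prev_state] = {n: c / total for n, c in row.items()}
    return matrices


# Toy labeled data (invented for illustration): feature = where the token occurs.
labeled = [
    [("body", "other"), ("title", "keyword"), ("body", "other")],
    [("title", "keyword"), ("title", "keyword"), ("body", "other")],
]
matrices = estimate_transitions(labeled)
```

Each row of each per-feature matrix sums to 1, so a token sequence can be scored by multiplying the transition probabilities selected by the features observed along it. How the paper combines several features, and how the semi-supervised step uses unlabeled pages, is not recoverable from the abstract.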

    • Published in

      RACS '16: Proceedings of the International Conference on Research in Adaptive and Convergent Systems
      October 2016
      266 pages
      ISBN:9781450344555
      DOI:10.1145/2987386

      Copyright © 2016 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • research-article
      • Research
      • Refereed limited

      Acceptance Rates

      RACS '16 Paper Acceptance Rate: 40 of 161 submissions, 25%
      Overall Acceptance Rate: 393 of 1,581 submissions, 25%
