
Improving Supervised Keyphrase Indexer Classification of Keyphrases with Text Denoising

  • Conference paper
The Outreach of Digital Libraries: A Globalized Resource Network (ICADL 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7634)

Abstract

Text denoising is a text reduction method that extracts the content-rich parts from full-text research articles. These content-rich parts, known as the denoised texts, suffice for information extraction tasks such as automatic relation mining and keyphrase extraction. In this paper, we concentrate on keyphrase extraction and show that two state-of-the-art supervised keyphrase indexers, KEA and KEA++, induce improved keyphrase classifiers when paired with text denoising. The classifiers’ performance is demonstrated on three standard full-text corpora collected from the food and agriculture, nuclear physics, and biomedical domains. The indexers induce keyphrase classifiers from the denoised parts of the texts, and these classifiers are later used for full-text keyphrase extraction. Experimental results show that, measured against a gold standard, these classifiers perform better than those induced from full texts.
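The abstract does not say how extracted keyphrases are scored against the gold standard. As a minimal sketch of one common way to do such a comparison, the Python function below computes exact-match precision, recall, and F1 for a single document. The function name `keyphrase_scores`, the lowercasing and whitespace normalization, and the sample phrases are all assumptions made for illustration, not the paper's actual protocol.

```python
def keyphrase_scores(extracted, gold):
    """Exact-match precision, recall and F1 for one document.

    Hypothetical helper: matching on lowercased, stripped phrases is an
    assumption, not the evaluation procedure described in the paper.
    """
    extracted_set = {phrase.strip().lower() for phrase in extracted}
    gold_set = {phrase.strip().lower() for phrase in gold}

    # Phrases the extractor proposed that also appear in the gold standard.
    matched = len(extracted_set & gold_set)

    precision = matched / len(extracted_set) if extracted_set else 0.0
    recall = matched / len(gold_set) if gold_set else 0.0
    # F1 is the harmonic mean; define it as 0 when nothing matched.
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Illustrative phrases only; not data from the paper.
    extracted = ["Text Denoising", "keyphrase extraction", "corpus"]
    gold = ["text denoising", "keyphrase extraction", "fog index", "classifier"]
    p, r, f = keyphrase_scores(extracted, gold)
    print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Under this scheme, a classifier induced from denoised texts would be judged "better" than one induced from full texts if it scores higher on these measures over the same test documents. Averaging per-document scores over a corpus is one reasonable choice, though the abstract does not specify which aggregation was used.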

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Shams, R., Mercer, R.E. (2012). Improving Supervised Keyphrase Indexer Classification of Keyphrases with Text Denoising. In: Chen, HH., Chowdhury, G. (eds) The Outreach of Digital Libraries: A Globalized Resource Network. ICADL 2012. Lecture Notes in Computer Science, vol 7634. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34752-8_11

  • DOI: https://doi.org/10.1007/978-3-642-34752-8_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-34751-1

  • Online ISBN: 978-3-642-34752-8

  • eBook Packages: Computer Science, Computer Science (R0)
