MEDLINE Abstracts Classification Based on Noun Phrases Extraction

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 25)

Abstract

Many algorithms for automated text categorization have emerged in recent years. They have been studied exhaustively, leading to several variants and combinations, not only in the procedures themselves but also in the treatment of the input data. A widely used approach represents documents as a Bag-Of-Words (BOW) and weights tokens with the TFIDF scheme. Many researchers have pursued improvements in precision and recall, and reductions in classification time, by enriching BOW with stemming, n-grams, feature selection, noun phrases, metadata, weight normalization, etc. We contribute to this field with a novel combination of these techniques. For evaluation purposes, we provide comparisons with previous work, using SVM, against the simple BOW. The well-known OHSUMED corpus is used, and different sets of categories are selected, as previously done in the literature. The conclusion is that the proposed method can be successfully applied to existing binary classifiers such as SVM, outperforming the combination of BOW and TFIDF.
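As a rough illustration of the BOW and TFIDF baseline mentioned in the abstract, the following minimal sketch builds TFIDF-weighted bag-of-words vectors in plain Python. The abstract does not specify the exact weighting formula or tokenizer, so the choices here (whitespace tokenization, raw term frequency times log(N/df), and L2 weight normalization) are assumptions, and the names `tokenize` and `tfidf_vectors` are hypothetical helpers, not part of the paper.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split stands in for a real tokenizer.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse {token: weight} dict per document.

    Assumed weighting: tf(t, d) * log(N / df(t)), then L2 normalization.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each token.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vec = {t: tf[t] * math.log(n_docs / df[t]) for t in tf}
        # Normalize to unit length so long abstracts do not dominate.
        norm = math.sqrt(sum(w * w for w in vec.values()))
        if norm > 0:
            vec = {t: w / norm for t, w in vec.items()}
        vectors.append(vec)
    return vectors


# Toy example (invented text, not taken from OHSUMED).
docs = ["heart attack risk", "heart surgery", "lung cancer risk"]
vectors = tfidf_vectors(docs)
```

Vectors of this kind could then be passed to a binary classifier such as an SVM, one classifier per category. Enrichments like noun phrases would add extra tokens to each document before weighting.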

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ruiz-Rico, F., Vicedo, JL., Rubio-Sánchez, MC. (2008). MEDLINE Abstracts Classification Based on Noun Phrases Extraction. In: Fred, A., Filipe, J., Gamboa, H. (eds) Biomedical Engineering Systems and Technologies. BIOSTEC 2008. Communications in Computer and Information Science, vol 25. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-92219-3_38

  • DOI: https://doi.org/10.1007/978-3-540-92219-3_38

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-92218-6

  • Online ISBN: 978-3-540-92219-3

  • eBook Packages: Computer Science, Computer Science (R0)
