Temporal Feature Space for Text Classification

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10415)

Abstract

In supervised learning algorithms for text classification, text content is usually represented by the frequencies of the words it contains, ignoring the words' semantics and the relationships among them. Words within temporal expressions such as “today” or “last February” are particularly affected by this simplification: the same expression can have different meanings in documents with different timestamps, while different expressions can refer to the same time. After extracting the temporal expressions in documents, we model a set of temporal features derived from the times mentioned in each document and show the relation between these features and the category to which the document belongs. We test our temporal approach on a subset of the New York Times corpus, showing a significant improvement over the text-only baseline.
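The timestamp dependence described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the toy `resolve` function, its reading of "last <month>" as the most recent such month before the timestamp's month, and the feature names returned by `temporal_features` are all assumptions made for illustration only.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional
import re

MONTHS = {name: number for number, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}


def resolve(expression: str, timestamp: date) -> Optional[date]:
    """Toy resolver (hypothetical): map an expression to a date, given the document timestamp."""
    text = expression.strip().lower()
    if text == "today":
        return timestamp
    if text == "yesterday":
        return timestamp - timedelta(days=1)
    match = re.fullmatch(r"last (\w+)", text)
    if match and match.group(1) in MONTHS:
        month = MONTHS[match.group(1)]
        # Assumption: "last <month>" is the most recent such month strictly
        # before the timestamp's month, represented by its first day.
        year = timestamp.year if month < timestamp.month else timestamp.year - 1
        return date(year, month, 1)
    return None  # expression not covered by this toy resolver


def temporal_features(expressions: List[str], timestamp: date) -> Dict[str, float]:
    """Illustrative features: day offsets of mentioned times from the timestamp."""
    offsets = []
    for expression in expressions:
        resolved = resolve(expression, timestamp)
        if resolved is not None:
            offsets.append((resolved - timestamp).days)
    if not offsets:
        return {"count": 0, "mean_offset_days": 0.0,
                "min_offset_days": 0, "max_offset_days": 0}
    return {"count": len(offsets),
            "mean_offset_days": sum(offsets) / len(offsets),
            "min_offset_days": min(offsets),
            "max_offset_days": max(offsets)}


# Same expression, different timestamps: different dates.
march_doc = date(2007, 3, 15)
february_doc = date(2007, 2, 1)
same_expression_differs = resolve("today", march_doc) != resolve("today", february_doc)

# Different expressions, different timestamps: the same date.
different_expressions_agree = (
    resolve("last February", march_doc) == resolve("today", february_doc))

features = temporal_features(["today", "last February", "soon"], march_doc)
```

A bag-of-words representation would treat both occurrences of "today" as the same token, whereas features computed from the resolved dates differ between the two documents.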

Notes

  1. Dataset with precomputed features available at https://smartdata.cs.unibo.it/data/TFTC/.

  2. NYT Corpus available at https://catalog.ldc.upenn.edu/LDC2008T19.

Author information

Corresponding author

Correspondence to Stefano Giovanni Rizzo.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Rizzo, S.G., Montesi, D. (2017). Temporal Feature Space for Text Classification. In: Ekštein, K., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2017. Lecture Notes in Computer Science, vol 10415. Springer, Cham. https://doi.org/10.1007/978-3-319-64206-2_41

  • DOI: https://doi.org/10.1007/978-3-319-64206-2_41

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-64205-5

  • Online ISBN: 978-3-319-64206-2

  • eBook Packages: Computer Science, Computer Science (R0)
