StoryTracker: A Semantic-Oriented Tool for Automatic Tracking Events by Web Documents

  • Conference paper
Computational Science and Its Applications – ICCSA 2021 (ICCSA 2021)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12951)


Abstract

Media outlets play an essential role in investigating events and keeping the public informed. Indirectly, the logs of daily events kept by newspapers and magazines have built rich collections of data that can be used by many professionals, such as economists, historians, and political scientists. However, exploring these logs with traditional search engines has become impractical for more demanding users. In this paper, we propose StoryTracker, a temporal exploration tool that helps users query news collections. We focus our efforts on (i) allowing users to make queries by adding information from documents represented by word embeddings and (ii) developing a strategy for retrieving temporal information to generate timelines and present them using a suitable interface for temporal exploration. We evaluated our solution using a real database of articles from a major Brazilian newspaper and showed that our tool can trace different timelines, covering different subtopics of the same theme.
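The two ideas in the abstract, enriching a query with information from a document embedding and grouping the retrieved articles by time, can be illustrated with a minimal sketch. This is not the StoryTracker implementation: the function names, the linear mixing weight, the cosine threshold, the monthly grouping, and the toy three-dimensional vectors (which stand in for learned document embeddings) are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_query(query_vec, doc_vec, weight=0.5):
    """Mix the query vector with a user-selected document vector (assumed scheme)."""
    return [(1 - weight) * q + weight * d for q, d in zip(query_vec, doc_vec)]


def build_timeline(query_vec, docs, threshold=0.8):
    """Group documents similar to the query by (year, month), best match first."""
    timeline = defaultdict(list)
    for doc in docs:
        score = cosine(query_vec, doc["vec"])
        if score >= threshold:
            period = (doc["date"].year, doc["date"].month)
            timeline[period].append((score, doc["title"]))
    return {
        period: [title for _, title in sorted(hits, reverse=True)]
        for period, hits in sorted(timeline.items())
    }


# Hypothetical articles; the vectors are hand-written, not learned embeddings.
docs = [
    {"title": "Refinery contract probe", "date": date(2015, 3, 2), "vec": [0.9, 0.1, 0.0]},
    {"title": "Contractor executives charged", "date": date(2015, 6, 19), "vec": [0.7, 0.6, 0.1]},
    {"title": "Football final recap", "date": date(2015, 6, 20), "vec": [0.0, 0.1, 0.9]},
]

query = [1.0, 0.0, 0.0]
baseline = build_timeline(query, docs)           # original query only
expanded = expand_query(query, docs[1]["vec"])   # user picks the second article
timeline = build_timeline(expanded, docs)        # query enriched with that article
```

In this toy data the original query retrieves only the March article, while the expanded query also reaches the June article on the related subtopic and still excludes the unrelated one.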

Supported by CAPES, CNPq, Finep, Fapesp and Fapemig.


Notes

  1. https://www.kaggle.com/marlesson/news-of-the-site-folhauol.

  2. The code is available at https://github.com/warSantos/StoryTracker.git.

  3. https://radimrehurek.com/gensim/models/doc2vec.html.

  4. https://www.kaggle.com/marlesson/news-of-the-site-folhauol.

  5. https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html.

  6. https://en.wikipedia.org/wiki/Odebrecht.

  7. https://en.wikipedia.org/wiki/Petrobras.

Acknowledgments

This project was partially supported by Huawei do Brasil Telecomunicações Ltda (Fundunesp Process # 3123/2020), FAPEMIG, and CAPES.

Author information

Corresponding author

Correspondence to Diego Dias.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Santos, W., Fazzion, E., Tuler, E., Dias, D., Guimarães, M., Rocha, L. (2021). StoryTracker: A Semantic-Oriented Tool for Automatic Tracking Events by Web Documents. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2021. ICCSA 2021. Lecture Notes in Computer Science, vol 12951. Springer, Cham. https://doi.org/10.1007/978-3-030-86970-0_10

  • DOI: https://doi.org/10.1007/978-3-030-86970-0_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86969-4

  • Online ISBN: 978-3-030-86970-0

  • eBook Packages: Computer Science, Computer Science (R0)
