Text documents streams with improved incremental similarity

  • Original Article
Social Network Analysis and Mining

Abstract

The research community has devoted significant effort to methods for organizing documentation with the help of Information Retrieval techniques. In this paper, we present several experiments that use stream analysis methods to explore streams of text documents. We also present possible architectures for Text Document Stream Organization, built on incremental algorithms such as Incremental Sparse TF-IDF and Incremental Similarity. Our results show that such an architecture significantly improves the efficiency of grouping similar documents. These improvements matter because the analysis of large amounts of text is widely recognized as a high-dimensional and complex problem in data analysis.
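To make the idea of incremental processing concrete, the following is a minimal sketch of a sparse TF-IDF model whose document frequencies are updated as each document arrives, paired with cosine similarity over sparse vectors. It is an illustration under stated assumptions and not the Incremental Sparse TF-IDF or Incremental Similarity algorithms of this paper: the class `IncrementalTfidf`, the function `cosine`, the whitespace tokenizer, and the smoothed IDF formula are all hypothetical choices made for the example.

```python
import math
from collections import Counter


class IncrementalTfidf:
    """Hypothetical helper: updates document frequencies as documents arrive."""

    def __init__(self):
        self.n_docs = 0
        self.doc_freq = Counter()  # term -> number of documents containing it
        self.term_counts = []      # one sparse Counter of raw counts per document

    def add(self, text):
        """Register one new document from the stream and return its index."""
        counts = Counter(text.lower().split())  # assumption: whitespace tokens
        self.term_counts.append(counts)
        self.doc_freq.update(counts.keys())     # each distinct term counted once
        self.n_docs += 1
        return self.n_docs - 1

    def vector(self, idx):
        """Sparse TF-IDF vector (dict) for a document under current statistics."""
        counts = self.term_counts[idx]
        total = sum(counts.values())
        if total == 0:
            return {}
        # Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1, so no weight is zero.
        return {
            term: (c / total)
            * (math.log((1 + self.n_docs) / (1 + self.doc_freq[term])) + 1.0)
            for term, c in counts.items()
        }


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


model = IncrementalTfidf()
first = model.add("streams of text documents")
second = model.add("text documents arrive in streams")
third = model.add("graph clustering benchmark")

sim_related = cosine(model.vector(first), model.vector(second))
sim_unrelated = cosine(model.vector(first), model.vector(third))
```

In this sketch only the term counts and document frequencies are stored, and vectors are rebuilt on demand from the current statistics. A truly incremental similarity scheme would additionally avoid recomputing pairs that a new document does not affect, which this example does not attempt.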

Acknowledgements

This work was fully financed by the Faculty of Engineering of Porto University. Rui Portocarrero Sarmento also gratefully acknowledges funding from FCT (Portuguese Foundation for Science and Technology) through a Ph.D. grant (SFRH/BD/119108/2016).

Author information

Corresponding author

Correspondence to Rui Portocarrero Sarmento.

About this article

Cite this article

Sarmento, R.P., O. Cardoso, D., Dearo, K. et al. Text documents streams with improved incremental similarity. Soc. Netw. Anal. Min. 11, 113 (2021). https://doi.org/10.1007/s13278-021-00826-z

  • DOI: https://doi.org/10.1007/s13278-021-00826-z
