Abstract

In earlier work we proposed the WAPO-Structure, an intermediate form of text representation that preserves semantics by keeping related terms together. This intermediate form can be visualized as a tag cloud, which in turn serves as a tool for textual navigation and retrieval. WAPO-Structures were obtained through a modification of the Apriori algorithm, which spends much of its processing time computing frequent sequences because it must read the text repeatedly until it finds the frequent sequences of maximal level.

In this paper we present an alternative method that generates the WAPO-Structure from the inverted indexes of the text. This method saves processing time for texts whose inverted index has already been computed.
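
To illustrate the general idea of mining frequent term sequences from an index instead of rereading the text, here is a minimal Python sketch. It is not the authors' algorithm: the function names, the whitespace tokenization, the restriction to consecutive terms, and the use of occurrence counts as support are all assumptions made for this example. Once a positional inverted index exists, each longer sequence is found by intersecting position sets, so the text itself is read only when the index is built.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map each term to the set of (doc_id, position) pairs where it occurs."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        # Assumption: lowercase whitespace tokenization is good enough here.
        for pos, term in enumerate(text.lower().split()):
            index[term].add((doc_id, pos))
    return index


def frequent_sequences(index, min_support):
    """Grow frequent sequences of consecutive terms using only the index.

    Returns a dict mapping each frequent term tuple to the set of
    (doc_id, start_position) pairs where the sequence begins.
    Assumption: support is the number of occurrences, not of documents.
    """
    frequent_terms = {t: occ for t, occ in index.items() if len(occ) >= min_support}
    current = {(t,): occ for t, occ in frequent_terms.items()}
    result = dict(current)
    while current:
        longer = {}
        for seq, starts in current.items():
            k = len(seq)
            for term, occ in frequent_terms.items():
                # Keep the starts where `term` appears right after the sequence.
                extended = {(d, p) for (d, p) in starts if (d, p + k) in occ}
                if len(extended) >= min_support:
                    longer[seq + (term,)] = extended
        result.update(longer)
        current = longer
    return result


def maximal_sequences(freq):
    """Keep sequences that are not a contiguous part of a longer frequent one."""
    def contained(a, b):
        n = len(a)
        return n < len(b) and any(b[i:i + n] == a for i in range(len(b) - n + 1))

    return {s for s in freq if not any(contained(s, t) for t in freq)}


docs = [
    "data mining finds patterns",
    "data mining is fun",
    "text mining finds patterns",
]
index = build_positional_index(docs)
freq = frequent_sequences(index, min_support=2)
print(sorted(maximal_sequences(freq)))
# → [('data', 'mining'), ('mining', 'finds', 'patterns')]
```

The sketch covers only the frequent-sequence step that the abstract identifies as the costly part. Assembling those sequences into a WAPO-Structure is outside its scope.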

Acknowledgements

This work has been partially supported by the “Plan Andaluz de Investigación, Junta de Andalucía” (Spain) under research project P10-TIC6019.

Author information

Corresponding author

Correspondence to Úrsula Torres-Parejo.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Torres-Parejo, Ú., Campaña, J.R., Vila, MA., Delgado, M. (2018). Obtaining WAPO-Structure Through Inverted Indexes. In: Medina, J., et al. Information Processing and Management of Uncertainty in Knowledge-Based Systems. Theory and Foundations. IPMU 2018. Communications in Computer and Information Science, vol 854. Springer, Cham. https://doi.org/10.1007/978-3-319-91476-3_53

  • DOI: https://doi.org/10.1007/978-3-319-91476-3_53

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-91475-6

  • Online ISBN: 978-3-319-91476-3

  • eBook Packages: Computer Science, Computer Science (R0)
