Obtaining WAPO-Structure Through Inverted Indexes

Torres-Parejo, Úrsula; Campaña, Jesús R.; Vila, Maria-Amparo; Delgado, Miguel

doi:10.1007/978-3-319-91476-3_53

Part of the book series: Communications in Computer and Information Science (CCIS, volume 854)

Included in the following conference series:

International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems


Abstract

To represent texts while preserving their semantics, in earlier work we proposed the WAPO-Structure, an intermediate form of representation that keeps related terms together. This intermediate form can be visualized as a tag cloud, which in turn serves as a tool for textual navigation and retrieval. WAPO-Structures were obtained through a modification of the Apriori algorithm, which spends a great deal of processing time computing frequent sequences, because it must make numerous passes over the text until it finds the frequent sequences of maximal level.
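The cost described above comes from the level-wise structure of Apriori: every level k requires a fresh pass over the text to count candidate sequences of length k. The following is a minimal sketch of that pattern, not the authors' modified algorithm. It assumes that a "sequence" is a run of contiguous terms, that support is the number of occurrences in a single token list, and that `min_support` is a hypothetical threshold; the WAPO-Structure itself carries more information than these counts.

```python
from collections import Counter


def apriori_sequences(tokens, min_support):
    """Level-wise (Apriori-style) mining of frequent contiguous term sequences.

    Assumption: support = number of occurrences in `tokens`.
    Returns (frequent, passes): frequent maps each sequence (a tuple) to its
    count, and passes is the number of full scans made over `tokens`.
    """
    frequent = {}
    # Level 1: count single terms (first full pass over the text).
    counts = Counter((t,) for t in tokens)
    level = {s: c for s, c in counts.items() if c >= min_support}
    passes = 1
    k = 1
    while level:
        frequent.update(level)
        k += 1
        # Level k: another full pass over the text.
        counts = Counter()
        for i in range(len(tokens) - k + 1):
            seq = tuple(tokens[i:i + k])
            # Apriori pruning: a contiguous k-sequence can only be frequent
            # if its (k-1)-prefix and (k-1)-suffix were both frequent.
            if seq[:-1] in level and seq[1:] in level:
                counts[seq] += 1
        passes += 1
        level = {s: c for s, c in counts.items() if c >= min_support}
    return frequent, passes
```

On the toy text `"a b c a b c a b"` with `min_support=2`, the longest frequent sequence found is `a b c a b`, and the loop scans the text once per level plus one final scan that finds nothing frequent.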

In this paper we present an alternative method for generating the WAPO-Structure from the inverted indexes of the text. This method saves processing time for texts whose inverted index has already been computed.
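The abstract does not describe the method in detail. The sketch below only illustrates why an existing index can replace repeated passes over the text. It assumes a positional inverted index (each term mapped to the word offsets where it occurs) and the same hypothetical `min_support` threshold and contiguous-sequence definition as above. A sequence is extended by one term by keeping the start positions `p` of the shorter sequence for which the new term occurs at `p + k`, so only posting lists are consulted after the index is built. The function names are illustrative, not taken from the paper.

```python
from collections import defaultdict


def build_positional_index(tokens):
    """Hypothetical helper: map each term to the set of offsets where it occurs."""
    index = defaultdict(set)
    for pos, term in enumerate(tokens):
        index[term].add(pos)
    return index


def sequences_from_index(index, min_support):
    """Derive frequent contiguous term sequences from a positional index alone.

    Assumption: support = number of occurrences. Returns a dict mapping each
    frequent sequence (a tuple) to its count.
    """
    # Level 1: a term's support is the size of its posting list.
    level = {(t,): set(p) for t, p in index.items() if len(p) >= min_support}
    frequent_terms = {seq[0]: pos for seq, pos in level.items()}
    frequent = {}
    k = 1  # length of the sequences in `level`
    while level:
        frequent.update({s: len(p) for s, p in level.items()})
        next_level = {}
        for seq, starts in level.items():
            for term, term_pos in frequent_terms.items():
                cand = seq + (term,)
                # Apriori-style pruning: the length-k suffix must be frequent too.
                if cand[1:] not in level:
                    continue
                # cand starts at p iff seq starts at p and term occurs at p + k.
                occ = {p for p in starts if p + k in term_pos}
                if len(occ) >= min_support:
                    next_level[cand] = occ
        level = next_level
        k += 1
    return frequent
```

On the same toy text this yields the same frequent sequences and counts as the level-wise scan, but the text itself is read only once, when the index is built. If the index already exists, that read is avoided entirely.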


References

Agrawal, R., Srikant, R.: Fast algorithms for mining association rules. In: Proceedings of the 20th International Conference on Very Large Data Bases, VLDB, vol. 1215, pp. 487–499. Citeseer (1994)
Agrawal, R., Srikant, R.: Mining sequential patterns. In: Proceedings of the Eleventh International Conference on Data Engineering, pp. 3–14. IEEE (1995). https://doi.org/10.1109/ICDE.1995.380415
Baeza-Yates, R., Ribeiro-Neto, B., et al.: Modern Information Retrieval. ACM Press, New York (1999)
Blumer, A., Blumer, J., Haussler, D., McConnell, R., Ehrenfeucht, A.: Complete inverted files for efficient text retrieval and analysis. J. ACM 34(3), 578–595 (1987). https://doi.org/10.1145/28869.28873
Cutting, D., Karger, D., Pedersen, J., Tukey, J.: Scatter/Gather: a cluster-based approach to browsing large document collections. In: ACM SIGIR Forum, vol. 51, pp. 148–159. ACM (2017). https://doi.org/10.1145/3130348.3130362
Hipp, J., Güntzer, U., Nakhaeizadeh, G.: Algorithms for association rule mining - a general survey and comparison. SIGKDD Explor. Newsl. 2, 58–64 (2000). https://doi.org/10.1145/360402.360421
Patil, M., Thankachan, S., Shah, R., Hon, W., Vitter, J., Chandrasekaran, S.: Inverted indexes for phrases and strings. In: Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 555–564. ACM (2011). https://doi.org/10.1145/2009916.2009992
Torres-Parejo, U., Campaña, J.R., Vila, M.A., Delgado, M.: MTCIR: a multi-term tag cloud information retrieval system. Expert Syst. Appl. 40(14), 5448–5455 (2013). https://doi.org/10.1016/j.eswa.2013.04.010
Torres-Parejo, U., Campaña, J., Vila, M., Delgado, M.: A theoretical model for the automatic generation of tag clouds. Knowl. Inf. Syst. 40(2), 315–347 (2014). https://doi.org/10.1007/s10115-013-0651-9
Voorhees, E.: The cluster hypothesis revisited. In: ACM SIGIR Forum, vol. 51, pp. 35–43. ACM (2017). https://doi.org/10.1145/3130348.3130353
Zaki, M.: SPADE: an efficient algorithm for mining frequent sequences. Mach. Learn. 42(1), 31–60 (2001). https://doi.org/10.1109/ICDE.2004.1320012


Acknowledgements

This work has been partially supported by the “Plan Andaluz de Investigación, Junta de Andalucía” (Spain) under research project P10-TIC6019.

Author information

Authors and Affiliations

Department of Statistics and Operational Research, University of Cádiz, Cádiz, Spain
Úrsula Torres-Parejo
Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
Jesús R. Campaña, Maria-Amparo Vila & Miguel Delgado


Corresponding author

Correspondence to Úrsula Torres-Parejo.

Editor information

Editors and Affiliations

Universidad de Cádiz, Cádiz, Cádiz, Spain
Jesús Medina
Universidad de Málaga, Málaga, Málaga, Spain
Manuel Ojeda-Aciego
Universidad de Granada, Granada, Spain
José Luis Verdegay
Universidad de Granada, Granada, Spain
David A. Pelta
Universidad de Málaga, Málaga, Málaga, Spain
Inma P. Cabrera
LIP6, Université Pierre et Marie Curie, CNRS, Paris, France
Bernadette Bouchon-Meunier
Iona College, New Rochelle, New York, USA
Ronald R. Yager


About this paper

Cite this paper

Torres-Parejo, Ú., Campaña, J.R., Vila, MA., Delgado, M. (2018). Obtaining WAPO-Structure Through Inverted Indexes. In: Medina, J., et al. Information Processing and Management of Uncertainty in Knowledge-Based Systems. Theory and Foundations. IPMU 2018. Communications in Computer and Information Science, vol 854. Springer, Cham. https://doi.org/10.1007/978-3-319-91476-3_53


DOI: https://doi.org/10.1007/978-3-319-91476-3_53
Published: 18 May 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91475-6
Online ISBN: 978-3-319-91476-3
eBook Packages: Computer Science, Computer Science (R0)
