Abstract
To represent texts while preserving their semantics, in earlier work we proposed the WAPO-Structure, an intermediate form of representation that keeps related terms together. This intermediate form can be visualized as a tag cloud, which in turn serves as a tool for textual navigation and retrieval. WAPO-Structures were obtained with a modification of the Apriori algorithm, which spends considerable processing time computing frequent sequences because it must read the text repeatedly until it finds the frequent sequences of maximal level.
In this paper we present an alternative method that generates the WAPO-Structure from the inverted index of the text. This method saves processing time for texts whose inverted index has already been computed.
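The general idea of mining frequent term sequences from an index rather than from repeated passes over the text can be sketched as follows. This is a minimal illustration only, not the procedure from the paper: it assumes a positional inverted index (term mapped to text identifier and word position), contiguous sequences, and support counted as the number of distinct texts containing a sequence. All function names are hypothetical.

```python
from collections import defaultdict


def build_positional_index(texts):
    """Map each term to the set of (text_id, position) pairs where it occurs."""
    index = defaultdict(set)
    for text_id, text in enumerate(texts):
        for pos, term in enumerate(text.lower().split()):
            index[term].add((text_id, pos))
    return index


def support(occurrences):
    """Assumed support measure: number of distinct texts with an occurrence."""
    return len({text_id for text_id, _ in occurrences})


def frequent_sequences(index, min_support):
    """Grow frequent contiguous term sequences level by level.

    Only the index is consulted; the original texts are never read again.
    """
    frequent_terms = {
        term: occ for term, occ in index.items() if support(occ) >= min_support
    }
    # Each sequence is stored with the start positions of its occurrences.
    current = {(term,): occ for term, occ in frequent_terms.items()}
    result = {}
    while current:
        result.update({seq: support(occ) for seq, occ in current.items()})
        next_level = {}
        for seq, starts in current.items():
            k = len(seq)
            for term, postings in frequent_terms.items():
                # Keep the starts whose position k words later holds `term`.
                extended = {(d, p) for d, p in starts if (d, p + k) in postings}
                if support(extended) >= min_support:
                    next_level[seq + (term,)] = extended
        current = next_level
    return result


def is_contiguous_subsequence(short, long):
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def maximal_sequences(frequent):
    """Keep the sequences not contained in any longer frequent sequence."""
    return {
        seq: sup
        for seq, sup in frequent.items()
        if not any(
            seq != other and is_contiguous_subsequence(seq, other)
            for other in frequent
        )
    }


texts = ["data mining tools", "data mining methods", "text mining"]
index = build_positional_index(texts)
frequent = frequent_sequences(index, min_support=2)
maximal = maximal_sequences(frequent)
```

With the three sample texts and a minimum support of 2, `frequent` holds `("data",)`, `("mining",)` and `("data", "mining")`, and only `("data", "mining")` remains in `maximal`. Each level is obtained by intersecting shifted posting sets, so the cost depends on the size of the postings rather than on the number of passes over the text.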
Acknowledgements
This work has been partially supported by the “Plan Andaluz de Investigación, Junta de Andalucía” (Spain) under research project P10-TIC6019.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Torres-Parejo, Ú., Campaña, J.R., Vila, MA., Delgado, M. (2018). Obtaining WAPO-Structure Through Inverted Indexes. In: Medina, J., et al. Information Processing and Management of Uncertainty in Knowledge-Based Systems. Theory and Foundations. IPMU 2018. Communications in Computer and Information Science, vol 854. Springer, Cham. https://doi.org/10.1007/978-3-319-91476-3_53
DOI: https://doi.org/10.1007/978-3-319-91476-3_53
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91475-6
Online ISBN: 978-3-319-91476-3
eBook Packages: Computer Science (R0)