Leveraging Closed Patterns and Formal Concept Analysis for Enhanced Microblogs Retrieval

Complex Data Analytics with Formal Concept Analysis

Abstract

Social microblogging services have attracted considerable interest over the past decade. These Web 2.0 platforms let users produce, share, and exchange varied content, and have given rise to a large volume of data. Twitter is one of the most popular microblogging sites, used by people to find relevant posts that satisfy their information needs (e.g., breaking news, popular trends, information about people of interest). However, Twitter queries and messages are short, and the variety of published content and the sheer amount of data generated make it difficult for users to find relevant information. This work is situated in the context of social information retrieval (SIR) and aims to improve the quality of tweet retrieval. To that end, we propose a query expansion method. The approach is based on Formal Concept Analysis: it extracts patterns from the documents retrieved by the search system. The method also uses word embeddings to enrich the patterns with similar words. The final query is obtained by merging the initial query with the extended query. We evaluate the proposed method on the TREC 2011 dataset, which contains approximately 16 million tweets and 49 queries. The results show the effectiveness of the proposed approach and the benefit of combining patterns and word embeddings for enhanced microblog retrieval.
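
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`closed_patterns`, `expand_query`), the support and similarity thresholds, the toy tweets, and the two-dimensional "embeddings" are all assumptions made for illustration, and the rule for selecting patterns (keep those that share a term with the query) is one plausible choice among several. The sketch treats the retrieved tweets as a formal context (tweets as objects, terms as attributes), mines the closed term sets, enriches them with nearest neighbours by cosine similarity, and merges the result with the initial query.

```python
from math import sqrt


def closed_patterns(docs, min_support=2):
    """Return {closed term set: support} for a list of tokenized documents."""
    rows = [frozenset(d) for d in docs]
    # Every closed term set is the intersection of one or more documents,
    # so we close the set of rows under pairwise intersection.
    closed = set(rows)
    frontier = set(rows)
    while frontier:
        new = {a & b for a in frontier for b in closed} - closed
        closed |= new
        frontier = new
    # Support = number of documents containing the pattern (empty set skipped).
    support = {p: sum(1 for r in rows if p <= r) for p in closed if p}
    return {p: s for p, s in support.items() if s >= min_support}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_query(query, docs, vectors, min_support=2, threshold=0.9):
    """Merge the initial query with pattern terms and their embedding neighbours."""
    query_terms = list(query)
    query_set = set(query_terms)
    # Assumed selection rule: keep closed patterns that share a term with the query.
    pattern_terms = set()
    for pattern in closed_patterns(docs, min_support):
        if pattern & query_set:
            pattern_terms |= pattern
    pattern_terms -= query_set
    # Enrich each pattern term with words whose vectors are close to it.
    similar = set()
    for term in pattern_terms:
        if term not in vectors:
            continue
        for word, vec in vectors.items():
            if word != term and cosine(vectors[term], vec) >= threshold:
                similar.add(word)
    expansion = sorted((pattern_terms | similar) - query_set)
    return query_terms + expansion


# Toy data standing in for top-ranked tweets and a trained embedding model.
docs = [
    ["egypt", "protest", "cairo"],
    ["egypt", "protest", "mubarak"],
    ["egypt", "protest", "cairo", "tahrir"],
    ["superbowl", "commercial"],
]
vectors = {
    "protest": (1.0, 0.0),
    "demonstration": (0.9, 0.1),
    "cairo": (0.0, 1.0),
    "football": (-1.0, 0.2),
}

print(expand_query(["egypt"], docs, vectors))
# prints ['egypt', 'cairo', 'demonstration', 'protest']
```

Closing the documents under intersection is only practical for a small feedback set such as the top-ranked tweets; for larger collections a dedicated closed itemset miner would be the usual choice.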

Notes

  1. https://twitter.com/

  2. Terrier is an effective open source search engine (Information Retrieval system), readily deployable on large-scale collections of documents.

  3. https://github.com/medallia/Word2VecJava

  4. http://engineering.medallia.com

  5. https://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html

  6. https://github.com/ravikiranj/twitter-sentiment-analyzer/blob/master/data/feature_list/stopwords.txt

  7. http://terrier.org/

  8. http://terrier.org/docs/v4.0/javadoc/org/terrier/matching/models/BM25.html

  9. https://trec.nist.gov/trec_eval/


Author information

Corresponding author

Correspondence to Meryem Bendella.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Bendella, M., Quafafou, M. (2022). Leveraging Closed Patterns and Formal Concept Analysis for Enhanced Microblogs Retrieval. In: Missaoui, R., Kwuida, L., Abdessalem, T. (eds) Complex Data Analytics with Formal Concept Analysis. Springer, Cham. https://doi.org/10.1007/978-3-030-93278-7_7

  • DOI: https://doi.org/10.1007/978-3-030-93278-7_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-93277-0

  • Online ISBN: 978-3-030-93278-7

  • eBook Packages: Computer Science (R0)
