On exploiting static and dynamically mined metadata for exploratory web searching

  • Regular Paper
  • Knowledge and Information Systems

Abstract

Most Web Search Engines (WSEs) are appropriate for focalized search, i.e., they assume that users can accurately describe their information need using a small sequence of terms. However, as several user studies have shown, a high percentage of search tasks are exploratory, and focalized search very commonly leads to inadequate interactions and poor results. This paper proposes exploiting static and dynamically mined metadata to enrich web searching with exploration services. Online results clustering, which is a mining task of a dynamic nature since it is based on query-dependent snippets, is useful for providing users with overviews of the top results, allowing them to restrict their focus to the desired parts. On the other hand, the various static metadata that are available to a search engine (e.g., domain, language, date, and filetype) are commonly exploited only through the advanced (form-based) search facilities that some WSEs offer and that users rarely use. We propose an approach that combines both kinds of metadata by adopting the interaction paradigm of dynamic taxonomies and faceted exploration, which allows users to restrict their focus gradually using both static and dynamically derived metadata. Special focus is given to the design and analysis of incremental algorithms for speeding up the exploration process. The experimental evaluation over a real WSE shows that this combination results in an effective, flexible, and efficient exploration experience. Finally, we report the results of a user study indicating that this direction is promising in terms of user preference, satisfaction, and effort.
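To make the interaction paradigm concrete, the following is a minimal sketch of faceted restriction over a handful of top results, assuming each result carries static metadata (filetype, language) and a set of cluster labels mined from its snippet. The sample data, the field names, and the functions `facet_counts` and `restrict` are hypothetical and chosen only for illustration. The sketch recomputes the counts from scratch after every restriction, so it does not reproduce the incremental algorithms the paper describes.

```python
from collections import Counter

# Hypothetical top results: static metadata plus cluster labels mined from snippets.
RESULTS = [
    {"url": "a.example/1", "filetype": "html", "language": "en", "clusters": {"jaguar car"}},
    {"url": "b.example/2", "filetype": "pdf", "language": "en", "clusters": {"jaguar animal"}},
    {"url": "c.example/3", "filetype": "html", "language": "fr", "clusters": {"jaguar car", "dealers"}},
    {"url": "d.example/4", "filetype": "pdf", "language": "en", "clusters": {"jaguar car"}},
]

# Static facets assumed for this sketch; a real engine may expose more (domain, date, ...).
STATIC_FACETS = ("filetype", "language")


def facet_counts(results):
    """Count how many results in the current focus fall under each facet value."""
    counts = {facet: Counter() for facet in STATIC_FACETS}
    counts["clusters"] = Counter()
    for r in results:
        for facet in STATIC_FACETS:
            counts[facet][r[facet]] += 1
        # A result may belong to several clusters, so each label is counted once per result.
        counts["clusters"].update(r["clusters"])
    return counts


def restrict(results, facet, value):
    """Narrow the focus to the results that match one facet value."""
    if facet == "clusters":
        return [r for r in results if value in r["clusters"]]
    return [r for r in results if r[facet] == value]


# Gradual restriction: first by a dynamically mined cluster, then by a static facet.
focus = restrict(RESULTS, "clusters", "jaguar car")
focus = restrict(focus, "filetype", "html")
counts = facet_counts(focus)
```

After each restriction, the counts are computed only over the current focus, so every value still offered to the user leads to a non-empty result set, which is the property that distinguishes this style of exploration from form-based advanced search.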

Author information

Corresponding author

Correspondence to Yannis Tzitzikas.

About this article

Cite this article

Papadakos, P., Armenatzoglou, N., Kopidaki, S. et al. On exploiting static and dynamically mined metadata for exploratory web searching. Knowl Inf Syst 30, 493–525 (2012). https://doi.org/10.1007/s10115-011-0388-2

  • DOI: https://doi.org/10.1007/s10115-011-0388-2
