Abstract
We propose heading-aware methods for generating search result snippets for web pages. A heading is a brief description of the topic of its associated sentences. When selecting sentences to include in a length-limited snippet, some existing methods give priority to sentences containing many words that also appear in headings. However, according to our observations, words in headings are very often omitted from their associated sentences because readers can understand the topic of the sentences by reading the heading. To score sentences while accounting for such omission, our methods count keyword occurrences in their headings as well as in the sentences themselves. Our evaluation results indicated that our methods were effective only for queries with clear intent or with four or more keywords. To discuss the statistical significance of the results, another evaluation with more queries is needed.
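The scoring idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the scoring formula, so plain keyword counting stands in for it here. The names `Sentence`, `heading_aware_score`, `heading_weight`, `select_snippet`, and `max_chars` are hypothetical, and the greedy length-bounded selection is an assumption made only to complete the example.

```python
import re
from dataclasses import dataclass, field


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumeric characters (a simplification)."""
    return re.findall(r"[a-z0-9]+", text.lower())


@dataclass
class Sentence:
    text: str
    # Headings the sentence falls under (hypothetical representation).
    headings: list[str] = field(default_factory=list)


def count_keywords(text: str, keywords: set[str]) -> int:
    """Count tokens in `text` that are query keywords."""
    return sum(1 for token in tokenize(text) if token in keywords)


def heading_aware_score(sentence: Sentence, query: str,
                        heading_weight: float = 1.0) -> float:
    """Keyword occurrences in the sentence plus those in its headings.

    `heading_weight` is an assumed knob; setting it to 0 gives a
    heading-unaware baseline.
    """
    keywords = set(tokenize(query))
    own = count_keywords(sentence.text, keywords)
    inherited = sum(count_keywords(h, keywords) for h in sentence.headings)
    return own + heading_weight * inherited


def select_snippet(sentences: list[Sentence], query: str,
                   max_chars: int = 150) -> list[Sentence]:
    """Greedily pick the highest-scoring sentences that fit in `max_chars`."""
    ranked = sorted(sentences,
                    key=lambda s: heading_aware_score(s, query),
                    reverse=True)
    chosen, used = [], 0
    for s in ranked:
        if used + len(s.text) <= max_chars:
            chosen.append(s)
            used += len(s.text)
    return chosen


# A sentence that omits the words of its heading still matches the query.
priced = Sentence("Prices start at 20 dollars per night.", ["Kyoto hotels"])
intro = Sentence("Kyoto is a city in Japan.", ["Overview"])
query = "kyoto hotels prices"
print(heading_aware_score(priced, query))  # counts "prices" plus both heading words
print(heading_aware_score(intro, query))   # counts only "kyoto"
```

With `heading_weight=0.0` the two example sentences tie, which is the omission problem the abstract describes: the sentence about prices never repeats "Kyoto" or "hotels" because its heading already says so.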
T. Manabe is a Research Fellow of the Japan Society for the Promotion of Science.
Acknowledgment
This work was supported by JSPS KAKENHI Grant Numbers 13J06384 and 26540163.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Manabe, T., Tajima, K. (2015). Heading-Aware Snippet Generation for Web Search. In: Zuccon, G., Geva, S., Joho, H., Scholer, F., Sun, A., Zhang, P. (eds) Information Retrieval Technology. AIRS 2015. Lecture Notes in Computer Science, vol 9460. Springer, Cham. https://doi.org/10.1007/978-3-319-28940-3_15
DOI: https://doi.org/10.1007/978-3-319-28940-3_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28939-7
Online ISBN: 978-3-319-28940-3
eBook Packages: Computer Science, Computer Science (R0)