Heading-Aware Snippet Generation for Web Search

  • Conference paper
Information Retrieval Technology (AIRS 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9460)

Abstract

We propose heading-aware methods for generating search result snippets of web pages. A heading is a brief description of the topic of its associated sentences. When selecting sentences for a length-limited snippet, some existing methods give priority to sentences containing many words that also appear in headings. However, we observed that words in a heading are very often omitted from its associated sentences, because readers can understand the topic of the sentences by reading the heading. To score sentences while accounting for such omission, our methods count query keyword occurrences in a sentence's headings as well as in the sentence itself. Our evaluation results indicated that our methods were effective only for queries with clear intents or containing four or more keywords. Discussing the statistical significance of this result requires another evaluation with more queries.
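
The scoring idea in the abstract can be illustrated with a minimal Python sketch. It is not the scoring function from the paper: the `Sentence` structure, the `heading_weight` parameter, the regex tokenizer, and the greedy length-limited selection are all assumptions made for illustration. The sketch only shows how counting keyword occurrences in a sentence's headings, in addition to the sentence itself, can raise the score of a sentence that omits the heading words.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sentence:
    # Hypothetical structure: a sentence plus the headings it belongs under.
    text: str
    headings: List[str] = field(default_factory=list)


def tokenize(text: str) -> List[str]:
    # Crude stand-in for real tokenization and stemming (an assumption).
    return re.findall(r"[a-z0-9]+", text.lower())


def baseline_score(sentence: Sentence, keywords: List[str]) -> int:
    # Counts keyword occurrences in the sentence text only.
    tokens = tokenize(sentence.text)
    return sum(tokens.count(k) for k in keywords)


def heading_aware_score(sentence: Sentence, keywords: List[str],
                        heading_weight: float = 1.0) -> float:
    # Also counts keyword occurrences in the sentence's headings.
    # heading_weight is a hypothetical knob, not a parameter from the paper.
    heading_tokens = [t for h in sentence.headings for t in tokenize(h)]
    in_headings = sum(heading_tokens.count(k) for k in keywords)
    return baseline_score(sentence, keywords) + heading_weight * in_headings


def select_snippet(sentences: List[Sentence], keywords: List[str],
                   max_chars: int = 100) -> List[Sentence]:
    # Greedy selection under a character budget (an illustrative choice).
    ranked = sorted(sentences,
                    key=lambda s: heading_aware_score(s, keywords),
                    reverse=True)
    chosen, used = [], 0
    for s in ranked:
        if used + len(s.text) <= max_chars:
            chosen.append(s)
            used += len(s.text)
    return chosen


keywords = ["kyoto", "hotels"]
s1 = Sentence("Rooms start at 80 dollars per night.", headings=["Kyoto hotels"])
s2 = Sentence("Kyoto is a city in Japan.", headings=["Overview"])
snippet = select_snippet([s2, s1], keywords, max_chars=40)
```

In this toy example, `s1` contains none of the keywords, so a sentence-only count scores it 0 and ranks it below `s2`. Counting the heading "Kyoto hotels" as well lifts `s1` above `s2`, which is the omission effect the abstract describes.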

T. Manabe—Research Fellow of Japan Society for the Promotion of Science.

Acknowledgment

This work was supported by JSPS KAKENHI Grant Numbers 13J06384 and 26540163.

Author information

Corresponding author

Correspondence to Tomohiro Manabe.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Manabe, T., Tajima, K. (2015). Heading-Aware Snippet Generation for Web Search. In: Zuccon, G., Geva, S., Joho, H., Scholer, F., Sun, A., Zhang, P. (eds) Information Retrieval Technology. AIRS 2015. Lecture Notes in Computer Science, vol 9460. Springer, Cham. https://doi.org/10.1007/978-3-319-28940-3_15

  • DOI: https://doi.org/10.1007/978-3-319-28940-3_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-28939-7

  • Online ISBN: 978-3-319-28940-3

  • eBook Packages: Computer Science, Computer Science (R0)
