Diverse and proportional size-l object summaries using pairwise relevance

  • Regular Paper
The VLDB Journal

Abstract

The abundance and ubiquity of graphs (e.g., online social networks such as Google+ and Facebook; bibliographic graphs such as DBLP) necessitate effective and efficient search over them. Given a set of keywords that can identify a data subject (DS), a recently proposed keyword search paradigm produces a set of object summaries (OSs) as results. An OS is a tree structure rooted at the DS node (i.e., a node containing the keywords) with surrounding nodes that summarize all data held on the graph about the DS. OS snippets, denoted as size-l OSs, have also been investigated. A size-l OS is a partial OS containing l nodes chosen so that the sum of their importance scores is the maximum possible. However, the set of nodes that maximizes the total importance score may yield an uninformative size-l OS, because very important nodes may be repeated in it and dominate other representative information. In view of this limitation, in this paper, we investigate the effective and efficient generation of two novel types of OS snippets, i.e., diverse and proportional size-l OSs, denoted as DSize-l and PSize-l OSs. Specifically, besides the importance of each node, we also consider its pairwise relevance (similarity) to the other nodes in the OS and the snippet. We conduct an extensive evaluation on two real graphs (DBLP and Google+). We verify effectiveness by collecting user feedback, e.g., by asking DBLP authors (i.e., the DSs themselves) to evaluate our results. In addition, we verify the efficiency of our algorithms and evaluate the quality of the snippets that they produce.
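
To make the contrast between importance-only and diversity-aware snippets concrete, the following is a minimal Python sketch. It is an illustration under stated assumptions and not the algorithms of the paper: the function name `greedy_diverse_snippet`, the `penalty` parameter, the greedy selection rule, the topic-based similarity, and the toy scores are all hypothetical. The sketch grows a connected snippet from the DS root and, at each step, discounts a candidate's importance by its highest similarity to the nodes already chosen.

```python
from typing import Callable, Dict, Hashable, List

Node = Hashable


def greedy_diverse_snippet(
    root: Node,
    children: Dict[Node, List[Node]],
    importance: Dict[Node, float],
    similarity: Callable[[Node, Node], float],
    l: int,
    penalty: float = 1.0,
) -> List[Node]:
    """Greedily pick up to l nodes that form a tree rooted at `root`.

    Hypothetical heuristic: each candidate is scored by its importance
    minus `penalty` times its highest similarity to an already selected
    node. With penalty=0 the choice depends on importance alone.
    """
    selected = [root]
    # Only children of selected nodes are candidates, so the snippet
    # always stays connected to the root.
    frontier = list(children.get(root, []))
    while len(selected) < l and frontier:
        def gain(n: Node) -> float:
            return importance[n] - penalty * max(
                similarity(n, s) for s in selected
            )

        best = max(frontier, key=gain)
        frontier.remove(best)
        selected.append(best)
        frontier.extend(children.get(best, []))
    return selected


# Toy OS (assumed data): an author with three papers; p1 and p2 share a topic.
children = {"author": ["p1", "p2", "p3"]}
importance = {"author": 1.0, "p1": 0.9, "p2": 0.85, "p3": 0.6}
topic = {"p1": "keyword search", "p2": "keyword search", "p3": "spatial data"}


def same_topic(a: Node, b: Node) -> float:
    """Assumed similarity: 1.0 if both nodes have the same known topic."""
    ta, tb = topic.get(a), topic.get(b)
    return 1.0 if ta is not None and ta == tb else 0.0


by_importance = greedy_diverse_snippet(
    "author", children, importance, same_topic, l=3, penalty=0.0
)
diverse = greedy_diverse_snippet(
    "author", children, importance, same_topic, l=3, penalty=1.0
)
print(by_importance)  # ['author', 'p1', 'p2']
print(diverse)        # ['author', 'p1', 'p3']
```

In this toy example, the importance-only selection keeps both papers on the same topic, while the penalized selection replaces the second one with the less important but dissimilar paper. A greedy rule of this kind does not guarantee the maximum total score, and it does not model proportionality.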

Notes

  1. http://snap.stanford.edu/data/egonets-Gplus.html.

  2. www.wikipedia.org/wiki/Usability.

Acknowledgments

Georgios Fakas was supported by GRF Grant 617412 from Hong Kong RGC. Zhi Cai was supported by Research Foundation of Beijing Municipal Education Commission Grant KM201610005022 and Natural Science Foundation of China Grant 91546111.

Author information

Corresponding author

Correspondence to Zhi Cai.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 1151 KB)

Appendix

Fig. 17 The \(G^{\mathrm{A}}\)s for the DBLP and Google+ datasets. a The DBLP \(G^{\mathrm{A}}\). b The Google+ \(G^{\mathrm{A}}\)

Fig. 18 The Google+ User \(G^{\mathrm{DS}}\) (relevance type)

Cite this article

Fakas, G.J., Cai, Z. & Mamoulis, N. Diverse and proportional size-l object summaries using pairwise relevance. The VLDB Journal 25, 791–816 (2016). https://doi.org/10.1007/s00778-016-0433-6

  • DOI: https://doi.org/10.1007/s00778-016-0433-6
