Abstract
The abundance and ubiquity of graphs (e.g., online social networks such as Google+ and Facebook; bibliographic graphs such as DBLP) necessitate effective and efficient search over them. Given a set of keywords that can identify a data subject (DS), a recently proposed keyword search paradigm produces a set of object summaries (OSs) as results. An OS is a tree structure rooted at the DS node (i.e., a node containing the keywords), with surrounding nodes that summarize all data held on the graph about the DS. OS snippets, denoted as size-l OSs, have also been investigated. A size-l OS is a partial OS containing l nodes, chosen so that the sum of their importance scores is the maximum possible. However, the set of nodes that maximizes the total importance score may yield an uninformative size-l OS, because very important nodes may be repeated in it and dominate other representative information. In view of this limitation, in this paper we investigate the effective and efficient generation of two novel types of OS snippets, i.e., diverse and proportional size-l OSs, denoted as DSize-l and PSize-l OSs. Namely, besides the importance of each node, we also consider its pairwise relevance (similarity) to the other nodes in the OS and in the snippet. We conduct an extensive evaluation on two real graphs (DBLP and Google+). We verify effectiveness by collecting user feedback, e.g., by asking DBLP authors (i.e., the DSs themselves) to evaluate our results. In addition, we verify the efficiency of our algorithms and evaluate the quality of the snippets that they produce.
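To make the contrast between importance-only snippets and similarity-aware snippets concrete, here is a minimal Python sketch. It is not the authors' algorithm and does not implement their exact DSize-l or PSize-l objectives. It assumes the OS is given as a hypothetical parent-to-children dictionary with precomputed importance scores, assumes a snippet must be a connected subtree that contains the DS root, and grows the snippet greedily. With `lam` = 0 it simply picks the most important reachable node at each step; with `lam` > 0 each candidate is penalized by `lam` times its summed similarity to nodes already in the snippet, a heuristic in the spirit of maximal marginal relevance (MMR) reranking. The function `greedy_snippet`, the similarity function `topic_sim`, and the toy data are all made up for illustration.

```python
from typing import Callable, Dict, List, Set

# Hypothetical OS encoding: each node maps to its child nodes in the OS tree.
Tree = Dict[str, List[str]]


def greedy_snippet(
    tree: Tree,
    root: str,
    importance: Dict[str, float],
    size_l: int,
    sim: Callable[[str, str], float] = lambda u, v: 0.0,
    lam: float = 0.0,
) -> List[str]:
    """Greedily grow a connected snippet of at most size_l nodes (always including root).

    lam == 0: pick the most important reachable node at each step.
    lam > 0: penalize each candidate by lam * (sum of its similarity to
    nodes already in the snippet), an MMR-style diversity heuristic.
    """
    chosen: List[str] = [root]
    # Only children of already-chosen nodes are eligible, so the snippet stays connected.
    frontier: Set[str] = set(tree.get(root, []))

    def gain(v: str) -> float:
        redundancy = sum(sim(u, v) for u in chosen)
        return importance[v] - lam * redundancy

    while len(chosen) < size_l and frontier:
        # sorted() makes tie-breaking deterministic (lexicographically smallest wins).
        best = max(sorted(frontier), key=gain)
        chosen.append(best)
        frontier.remove(best)
        frontier.update(tree.get(best, []))
    return chosen


# Toy DS "A" with two near-duplicate database papers and one IR paper (made-up data).
tree = {"A": ["db1", "db2", "ir1"]}
imp = {"A": 1.0, "db1": 0.9, "db2": 0.85, "ir1": 0.6}


def topic_sim(u: str, v: str) -> float:
    # Hypothetical similarity: 1.0 if two nodes share a two-letter topic prefix, else 0.0.
    return 1.0 if u[:2] == v[:2] else 0.0


print(greedy_snippet(tree, "A", imp, 3))                          # ['A', 'db1', 'db2']
print(greedy_snippet(tree, "A", imp, 3, sim=topic_sim, lam=0.5))  # ['A', 'db1', 'ir1']
```

In the toy run, the importance-only snippet spends both remaining slots on the two near-duplicate "db" papers, whereas the similarity-penalized run replaces the second one with the less important but dissimilar "ir1" node, which illustrates the redundancy problem described above. A greedy heuristic like this is not guaranteed to find the maximum-score snippet.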
Acknowledgments
Georgios Fakas was supported by GRF Grant 617412 from Hong Kong RGC. Zhi Cai was supported by Research Foundation of Beijing Municipal Education Commission Grant KM201610005022 and Natural Science Foundation of China Grant 91546111.
Cite this article
Fakas, G.J., Cai, Z. & Mamoulis, N. Diverse and proportional size-l object summaries using pairwise relevance. The VLDB Journal 25, 791–816 (2016). https://doi.org/10.1007/s00778-016-0433-6