Answering unique topic queries with dynamic threshold

World Wide Web

Abstract

Threshold queries are common when dealing with unstructured data such as text corpora, and users often need several exploratory attempts before they reach a final result. In this work, we propose an automatic sampling method that determines the threshold without any interaction with users, together with two optimized algorithms that reach the lower-bound time complexity in each sampling trial. We evaluate our methods in several experiments and demonstrate their effectiveness, which makes the approach a powerful tool for ordinary users.
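This excerpt does not specify the query semantics or the two optimized algorithms, so the following is only a hypothetical Python sketch of the general idea of choosing a threshold automatically through repeated trials. It rests on several assumptions of ours, none taken from the paper. Each document in a time-ordered corpus carries a precomputed weight for the query topic. An answer is the set of maximal runs of consecutive documents whose weight is at least the threshold. A good threshold is the most permissive one whose answer covers at most `budget` documents. The names `maximal_ranges`, `covered`, `choose_threshold`, and `budget` are illustrative only. In place of the paper's sampling procedure, the sketch uses a plain binary search over the distinct weights. Each trial is a single O(n) scan, which matches the trivial lower bound of reading every weight once.

```python
from typing import List, Optional, Tuple

Range = Tuple[int, int]  # hypothetical answer unit: inclusive (start, end) indices


def maximal_ranges(weights: List[float], theta: float) -> List[Range]:
    """One trial: a single linear scan that returns every maximal run of
    consecutive positions whose weight is at least theta (O(n) time)."""
    ranges: List[Range] = []
    start: Optional[int] = None
    for i, w in enumerate(weights):
        if w >= theta:
            if start is None:
                start = i  # a new qualifying run begins here
        elif start is not None:
            ranges.append((start, i - 1))  # the run ended at the previous position
            start = None
    if start is not None:
        ranges.append((start, len(weights) - 1))  # close a run that reaches the end
    return ranges


def covered(ranges: List[Range]) -> int:
    """Total number of positions contained in the given ranges."""
    return sum(end - start + 1 for start, end in ranges)


def choose_threshold(weights: List[float], budget: int) -> float:
    """Hypothetical automatic threshold selection (not the paper's method):
    binary-search the sorted distinct weights, the only values at which the
    answer can change, for the lowest threshold whose answer covers at most
    `budget` positions. Returns inf (empty answer) if no candidate fits."""
    candidates = sorted(set(weights))
    lo, hi = 0, len(candidates) - 1
    best = float("inf")
    while lo <= hi:
        mid = (lo + hi) // 2
        theta = candidates[mid]
        if covered(maximal_ranges(weights, theta)) <= budget:
            best = theta  # fits the budget; try a more permissive threshold
            hi = mid - 1
        else:
            lo = mid + 1  # answer too large; raise the threshold
    return best
```

For example, with `weights = [0.1, 0.8, 0.9, 0.2, 0.7, 0.95, 0.85, 0.3]` and `budget=3`, the search settles on a threshold of 0.85, whose answer is the ranges `(2, 2)` and `(5, 6)`. Binary search is valid here only because the number of covered documents can never increase as the threshold rises. A different answer-size measure, such as the number of ranges (which can grow when a run splits), would not have this property.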

[Figures 1–9 not reproduced]

Acknowledgment

This work is supported by NSFC (No. 61732004, 61370080) and the Shanghai Innovation Action Project (No. 16DZ1100200).

Author information

Correspondence to Yinan Jing, Zhenying He or X. Sean Wang.

About this article

Cite this article

Ma, H., Yang, Z., Jing, Y. et al. Answering unique topic queries with dynamic threshold. World Wide Web 22, 39–58 (2019). https://doi.org/10.1007/s11280-018-0528-7

  • DOI: https://doi.org/10.1007/s11280-018-0528-7
