Optimization of Bounded Continuous Search Queries Based on Ranking Distributions

  • Conference paper
Web Information Systems Engineering – WISE 2007 (WISE 2007)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4831)

Abstract

A common search problem on the World Wide Web is finding information when it is not known in advance when the sources will appear or how long they will remain available, as is the case for sales offers or news reports. Continuous queries are a means to monitor the Web over a specific period of time. The main problems in optimizing such queries are to provide high-quality, up-to-date results and to control the amount of information returned by a continuous query engine. In this paper we present a new method to realize such search queries, based on extracting the distribution of ranking values and on a new strategy for selecting relevant data objects in a stream of documents. The new method provides results of significantly higher quality if the ranking distributions can be modeled by Gaussian distributions. This is usually the case if a larger number of information sources on the Web and higher quality candidates are considered.
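
The general idea of selecting a bounded number of documents from a stream by means of a fitted ranking distribution can be illustrated with a minimal Python sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes that ranking scores are independent and Gaussian, that a learning prefix of the stream is observed first, and that the length of the remaining stream is known. The function names `gaussian_threshold` and `select_from_stream` and the parameters `learn` and `k` are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def gaussian_threshold(sample_scores, remaining, k):
    """Return a score that roughly k of `remaining` future documents exceed.

    Assumption of this sketch: scores are i.i.d. Gaussian, so the threshold
    is the (1 - k/remaining) quantile of the fitted normal distribution.
    """
    if not 0 < k < remaining:
        raise ValueError("k must satisfy 0 < k < remaining")
    fitted = NormalDist(mean(sample_scores), stdev(sample_scores))
    return fitted.inv_cdf(1.0 - k / remaining)


def select_from_stream(scores, learn, k):
    """Observe the first `learn` scores, then select up to k later documents.

    Returns the threshold and a list of (stream position, score) pairs.
    """
    prefix, rest = scores[:learn], scores[learn:]
    threshold = gaussian_threshold(prefix, len(rest), k)
    chosen = []
    for position, score in enumerate(rest, start=learn):
        # Select a document as soon as it clears the threshold,
        # but never return more than k results (the query is bounded).
        if score >= threshold and len(chosen) < k:
            chosen.append((position, score))
    return threshold, chosen


if __name__ == "__main__":
    stream = [0.4, 0.5, 0.6, 0.9, 0.1, 0.7, 0.2]
    print(select_from_stream(stream, learn=3, k=2))
```

In this sketch the bound `k` limits the amount of information returned, while the threshold derived from the fitted distribution decides immediately whether an arriving document is reported, which keeps results up to date at the cost of possibly missing later, better documents.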

Editor information

Boualem Benatallah Fabio Casati Dimitrios Georgakopoulos Claudio Bartolini Wasim Sadiq Claude Godart

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kukulenz, D., Hoeller, N., Groppe, S., Linnemann, V. (2007). Optimization of Bounded Continuous Search Queries Based on Ranking Distributions. In: Benatallah, B., Casati, F., Georgakopoulos, D., Bartolini, C., Sadiq, W., Godart, C. (eds) Web Information Systems Engineering – WISE 2007. WISE 2007. Lecture Notes in Computer Science, vol 4831. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76993-4_3

  • DOI: https://doi.org/10.1007/978-3-540-76993-4_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-76992-7

  • Online ISBN: 978-3-540-76993-4

  • eBook Packages: Computer Science, Computer Science (R0)