Answering form-based web queries using the data-mining approach

Journal of Intelligent Information Systems

Abstract

Web users often retrieve data by posting queries through form-based interfaces. Answers to these queries, however, are mostly computed from the keywords entered into the individual fields of the query interface, so their precision and recall can be low. Both ratios can be improved by considering closely related queries previously submitted through the same interface, along with their answers. In this paper, we present a data-mining approach that enhances the retrieval of relevant answers to a form-based Web query by using previous, relevant queries and their answers. Experimental results on a randomly selected set of 3,800 documents retrieved from various Web sites show that our data-mining, query-rewriting approach achieves average precision and true positive ratios on rewritten queries in the upper 80% range, whereas the average false positive ratio is less than 2.0%.
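
For readers unfamiliar with the three ratios reported above, the following is a minimal sketch of how they are conventionally computed from retrieval counts. It assumes the standard textbook definitions; the abstract does not spell out the exact formulas used in the paper. The `Counts` class, the function names, and the numbers in the usage note are hypothetical illustrations, not data from the experiments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Retrieval outcome counts for one query (illustrative, not from the paper)."""
    tp: int  # relevant documents that were retrieved
    fp: int  # non-relevant documents that were retrieved
    fn: int  # relevant documents that were not retrieved
    tn: int  # non-relevant documents that were not retrieved


def safe_ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def precision(c: Counts) -> float:
    """Fraction of retrieved documents that are relevant."""
    return safe_ratio(c.tp, c.tp + c.fp)


def true_positive_ratio(c: Counts) -> float:
    """Fraction of relevant documents that were retrieved (also called recall)."""
    return safe_ratio(c.tp, c.tp + c.fn)


def false_positive_ratio(c: Counts) -> float:
    """Fraction of non-relevant documents that were retrieved."""
    return safe_ratio(c.fp, c.fp + c.tn)


if __name__ == "__main__":
    example = Counts(tp=8, fp=2, fn=2, tn=88)  # made-up counts
    print(f"precision            = {precision(example):.3f}")
    print(f"true positive ratio  = {true_positive_ratio(example):.3f}")
    print(f"false positive ratio = {false_positive_ratio(example):.3f}")
```

With the made-up counts in the usage block, 8 of the 10 retrieved documents are relevant and 8 of the 10 relevant documents are retrieved, so precision and the true positive ratio are both 0.8, while 2 of the 90 non-relevant documents are retrieved.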

Author information

Corresponding author

Correspondence to Xiaochun Yang.

Additional information

Work partially done during a visit to BYU and partially supported by National Natural Science Foundation of China No. 60503036 and Fok YingTong Education Foundation No. 104027.

About this article

Cite this article

Yang, X., Ng, YK. Answering form-based web queries using the data-mining approach. J Intell Inf Syst 30, 1–32 (2008). https://doi.org/10.1007/s10844-006-0017-9

  • DOI: https://doi.org/10.1007/s10844-006-0017-9
