Discovering Document Semantics QBYS: A System for Querying the WWW by Semantics

Multimedia Tools and Applications

Abstract

This paper describes our research into a query-by-semantics approach to searching the World Wide Web. This research extends earlier work, which focused on a query-by-structure approach for the Web. We present a system that allows users not only to request documents containing specific content, but also to specify that the documents be of a certain type. The system captures and uses structure information as well as content during a distributed query of the Web. Users can also create their own document types by providing the system with example documents. In addition, although the system still gives users the option of querying the Web dynamically, the incorporation of a document database has improved search response time. Based on the extensive testing and validation presented herein, a system that incorporates structure and document semantic information into the query process can significantly improve search results over a standard keyword search.

Cite this article

Johnson, M., Fotouhi, F., Drăghici, S. et al. Discovering Document Semantics QBYS: A System for Querying the WWW by Semantics. Multimedia Tools and Applications 24, 155–188 (2004). https://doi.org/10.1023/B:MTAP.0000036841.99415.44

  • DOI: https://doi.org/10.1023/B:MTAP.0000036841.99415.44
