OOIS 2000, pp 349–357

A Probabilistic Model for Classification of Multiple-Record Web Documents

  • Conference paper

Abstract

The amount of information available on the World Wide Web, which appears in various Web documents, has been increasing dramatically in recent years. Classification of Web documents is becoming an increasingly important method for organizing such information. In this paper, we adopt a probabilistic model to classify Web documents as relevant or irrelevant with respect to an application ontology. Our model is based on multivariate statistical analysis and differs from conventional probabilistic information retrieval models. The results of the experiments we have conducted using our probabilistic model look promising for the classification of multiple-record Web documents.

Copyright information

© 2001 Springer-Verlag London Limited

About this paper

Cite this paper

Tang, J., Ng, YK. (2001). A Probabilistic Model for Classification of Multiple-Record Web Documents. In: Patel, D., Choudhury, I., Patel, S., de Cesare, S. (eds) OOIS 2000. Springer, London. https://doi.org/10.1007/978-1-4471-0299-1_29

  • DOI: https://doi.org/10.1007/978-1-4471-0299-1_29

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-85233-420-8

  • Online ISBN: 978-1-4471-0299-1

  • eBook Packages: Springer Book Archive
