
Determining the Semantic Orientation of Web-Based Corpora

  • Conference paper
Intelligent Data Engineering and Automated Learning (IDEAL 2003)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2690)

Abstract

The Web media monitoring methodology underlying this paper provides descriptive linguistic statistics by automatically mirroring, processing and comparing large samples of Web-based corpora. Since May 1999, the database of the webLyzard project has been continually extended; it now comprises more than 3,700 sites, which are monitored at monthly intervals. Structural and textual analysis converts the wealth of information contained in these sites into aggregated representations. Perceptual maps and the semantic orientation of Web-based corpora towards particular concepts are then computed from word frequencies and distance measures.
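The abstract names word frequencies and distance measures as the basis for computing semantic orientation towards a concept. The following is a minimal sketch of one way such a measure could work, assuming a simple lexicon-based proximity approach rather than the authors' actual method. It counts evaluative words that occur within a fixed token distance of each occurrence of a target concept and returns (positive − negative) / (positive + negative). The word lists `POSITIVE` and `NEGATIVE`, the `window` parameter, and the function names are hypothetical and used only for illustration.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon of evaluative words; an illustrative assumption,
# not the dictionary used in the paper.
POSITIVE = {"good", "excellent", "reliable", "innovative", "secure"}
NEGATIVE = {"bad", "poor", "unreliable", "risky", "slow"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def semantic_orientation(text: str, concept: str, window: int = 5) -> float:
    """Score the orientation of `text` towards `concept` in [-1, 1].

    For each occurrence of `concept`, evaluative words within `window`
    tokens on either side are counted (the distance measure). The score is
    (pos - neg) / (pos + neg), or 0.0 if no evaluative words are found.
    """
    tokens = tokenize(text)
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != concept:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j == i:
                continue  # skip the concept token itself
            if tokens[j] in POSITIVE:
                counts["pos"] += 1
            elif tokens[j] in NEGATIVE:
                counts["neg"] += 1
    total = counts["pos"] + counts["neg"]
    return 0.0 if total == 0 else (counts["pos"] - counts["neg"]) / total


if __name__ == "__main__":
    sample = "The new service is reliable and secure. Critics say the service is slow."
    print(semantic_orientation(sample, "service"))            # 0.5
    print(semantic_orientation(sample, "service", window=2))  # 0.0
```

In this sketch, overlapping windows count an evaluative word once per nearby occurrence of the concept, so "secure" in the sample contributes twice at the default window size. Narrowing `window` restricts the measure to closer, and arguably more reliably related, context.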




Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Scharl, A., Pollach, I., Bauer, C. (2003). Determining the Semantic Orientation of Web-Based Corpora. In: Liu, J., Cheung, Ym., Yin, H. (eds) Intelligent Data Engineering and Automated Learning. IDEAL 2003. Lecture Notes in Computer Science, vol 2690. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45080-1_116


  • DOI: https://doi.org/10.1007/978-3-540-45080-1_116

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40550-4

  • Online ISBN: 978-3-540-45080-1

  • eBook Packages: Springer Book Archive
