Corpus-Based Study of Scientific Methodology: Comparing the Historical and Experimental Sciences

  • Chapter
Computing Attitude and Affect in Text: Theory and Applications

Part of the book series: The Information Retrieval Series (INRE, volume 20)

Abstract

This chapter studies the use of textual features based on systemic functional linguistics for genre-based text categorization. We describe feature sets that represent different types of conjunction and modal assessment, which together partially indicate how different genres structure text and which classes of attitude towards propositions they prefer. Examining which features matter most for classification then enables analysis of large-scale rhetorical differences between genres. The specific domain we studied comprises scientific articles in a historical science (paleontology) and an experimental science (physical chemistry). We applied the SMO learning algorithm, which with our feature set achieved over 83% accuracy in classifying articles by field, even though no field-specific terms were used as features. The most highly weighted features for each field were consistent with hypothesized methodological differences between historical and experimental sciences, lending empirical support to the recent philosophical claim that there are multiple scientific methods.
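
The pipeline summarized in the abstract (count function-word features, train a linear classifier, then read off the feature weights) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the four feature classes and their word lists are hypothetical stand-ins for the chapter's feature sets, the four toy sentences are invented and say nothing about the chapter's findings, and the trainer is a plain subgradient-descent linear SVM without a bias term, swapped in for SMO to keep the example dependency-free.

```python
import re
from collections import Counter

# Hypothetical word lists standing in for the chapter's feature sets;
# the actual systemic-functional taxonomy is not reproduced here.
FEATURES = {
    "conj_extension": {"and", "but", "or", "however", "instead"},
    "conj_enhancement": {"because", "so", "then", "therefore", "thus"},
    "modal_probability": {"may", "might", "possibly", "probably", "perhaps"},
    "modal_obligation": {"must", "should", "necessarily"},
}
NAMES = list(FEATURES)

def featurize(text):
    """Relative frequency of each feature class among all word tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = max(len(tokens), 1)
    return [sum(counts[w] for w in FEATURES[name]) / total for name in NAMES]

def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_linear_svm(X, y, epochs=200, lr=0.5, reg=0.01):
    """Hinge-loss subgradient descent, no bias term (a stand-in, not SMO)."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for x, label in zip(X, y):
            violated = label * score(w, x) < 1
            for i in range(len(w)):
                grad = reg * w[i] - (label * x[i] if violated else 0.0)
                w[i] -= lr * grad
    return w

def predict(w, x):
    return 1 if score(w, x) >= 0 else -1

# Invented toy texts; labels +1 / -1 are arbitrary class tags.
docs = [
    ("The fossils may possibly indicate a warmer climate, "
     "but perhaps the strata might be older.", 1),
    ("The specimen probably lived near water, or it might have "
     "migrated; however, erosion may mislead.", 1),
    ("The sample was heated because the reaction must complete; "
     "therefore the solution should then be cooled.", -1),
    ("The yield increased, so the catalyst must be active; "
     "thus the rate should rise.", -1),
]
X = [featurize(text) for text, _ in docs]
y = [label for _, label in docs]
weights = train_linear_svm(X, y)

# Rank feature classes by absolute weight, as one would when asking
# which features matter most for the classification.
ranking = sorted(zip(NAMES, weights), key=lambda p: abs(p[1]), reverse=True)
for name, weight in ranking:
    print(f"{name:18s} {weight:+.3f}")
```

The sign of each weight shows which class a feature pulls towards, and its magnitude gives a rough ordering of importance. On real corpora one would use far richer feature sets, held-out evaluation, and an established SVM implementation.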

Copyright information

© 2006 Springer

About this chapter

Cite this chapter

Argamon, S., Dodick, J. (2006). Corpus-Based Study of Scientific Methodology: Comparing the Historical and Experimental Sciences. In: Shanahan, J.G., Qu, Y., Wiebe, J. (eds) Computing Attitude and Affect in Text: Theory and Applications. The Information Retrieval Series, vol 20. Springer, Dordrecht. https://doi.org/10.1007/1-4020-4102-0_17

  • DOI: https://doi.org/10.1007/1-4020-4102-0_17

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-1-4020-4026-9

  • Online ISBN: 978-1-4020-4102-0

  • eBook Packages: Computer Science, Computer Science (R0)
