A performance evaluation of automatic survey classifiers

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1433)

Abstract

A novel NLP task, automatic survey coding, is described, and two methods for performing this task are presented. The first method uses a Boolean pattern-matching strategy to code survey responses, while the second uses a vector-based (probabilistic) method. The performance of the two methods is tested and compared on three representative survey datasets. The Boolean method is shown to perform slightly better on average than the vector-based method. Linguistic factors affecting the difficulty of the coding task for each survey are discussed.
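
The abstract contrasts a Boolean pattern-matching coder with a vector-based one. As a rough illustration only, the following Python sketch implements a toy version of each. It is not the paper's implementation. The code frame (`ECON`, `HEALTH`, `OTHER`), the rules, the example responses, and every function name are hypothetical. The vector coder uses plain cosine similarity between raw term-count vectors. That standard vector-space technique is substituted here because the abstract does not specify the paper's probabilistic weighting.

```python
import math
import re
from collections import Counter

# Hypothetical code frame. Each code maps to a Boolean rule written as a list
# of alternatives (OR); each alternative is a set of words that must all be
# present (AND).
BOOLEAN_RULES = {
    "ECON": [{"job"}, {"money"}, {"economy"}],
    "HEALTH": [{"health", "care"}, {"doctor"}, {"hospital"}],
}


def tokenize(text):
    """Lowercase the response and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def boolean_code(response, rules, default="OTHER"):
    """Return the first code with an alternative whose words all occur in the response."""
    words = set(tokenize(response))
    for code, alternatives in rules.items():
        if any(required <= words for required in alternatives):
            return code
    return default


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def train_vector_coder(examples):
    """Build one summed term-count profile per code from (response, code) pairs."""
    profiles = {}
    for text, code in examples:
        profiles.setdefault(code, Counter()).update(tokenize(text))
    return profiles


def vector_code(response, profiles, default="OTHER"):
    """Assign the code whose profile is most similar; fall back to default if nothing overlaps."""
    vec = Counter(tokenize(response))
    best_code, best_score = default, 0.0
    for code, profile in profiles.items():
        score = cosine(vec, profile)
        if score > best_score:
            best_code, best_score = code, score
    return best_code


def accuracy(coder, labelled):
    """Fraction of (response, code) pairs the coder labels correctly."""
    hits = sum(coder(text) == code for text, code in labelled)
    return hits / len(labelled)


# Invented toy data, for illustration only.
train = [
    ("lost my job last year", "ECON"),
    ("not enough money to pay bills", "ECON"),
    ("the economy is bad", "ECON"),
    ("cannot afford health care", "HEALTH"),
    ("need to see a doctor", "HEALTH"),
    ("the hospital is too far", "HEALTH"),
]
held_out = [
    ("worried about my job", "ECON"),
    ("health care costs too much", "HEALTH"),
    ("doctor bills", "HEALTH"),
]

profiles = train_vector_coder(train)
print("Boolean accuracy:", accuracy(lambda r: boolean_code(r, BOOLEAN_RULES), held_out))
print("Vector accuracy:", accuracy(lambda r: vector_code(r, profiles), held_out))
```

The Boolean coder needs hand-written rules but no coded training data, while the vector coder needs coded examples but no rules. On a response like "doctor bills", the vector coder's decision rests on a narrow similarity margin, which hints at how response wording can make coding harder for one method than the other.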

Author information

P. Viechnicki

Editor information

Vasant Honavar, Giora Slutzki

Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Viechnicki, P. (1998). A performance evaluation of automatic survey classifiers. In: Honavar, V., Slutzki, G. (eds) Grammatical Inference. ICGI 1998. Lecture Notes in Computer Science, vol 1433. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0054080

  • DOI: https://doi.org/10.1007/BFb0054080

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-64776-8

  • Online ISBN: 978-3-540-68707-8

  • eBook Packages: Springer Book Archive
