DOI: 10.1145/2872518.2890559

A Test Suite for Evaluating POS Taggers across Varieties of English

Published: 11 April 2016

Abstract

We present a suite of 12 datasets for evaluating POS taggers across varieties of English, enabling researchers to evaluate the robustness of their models. The suite includes three new datasets, sampled from lyrics by black American hip-hop artists, from southeastern American Twitter, and from the subtitles of the TV series The Wire. We present an example evaluation of an off-the-shelf POS tagger across these datasets.
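Evaluating one tagger across a suite like this amounts to scoring it separately on each dataset and comparing the scores. The following is a minimal Python sketch under stated assumptions: each dataset is a list of sentences of (token, gold tag) pairs, a tagger is any function mapping a token list to a tag list, and gold and predicted tags share one tagset. The function names `token_accuracy` and `evaluate_suite`, the toy datasets, and the lexicon baseline are hypothetical illustrations, not the authors' code, data, or the tagger they evaluated.

```python
from typing import Callable, Dict, List, Tuple

# A sentence is a list of (token, gold_tag) pairs (assumed data format).
Sentence = List[Tuple[str, str]]
# A tagger maps a list of tokens to one predicted tag per token (hypothetical interface).
Tagger = Callable[[List[str]], List[str]]


def token_accuracy(tagger: Tagger, sentences: List[Sentence]) -> float:
    """Fraction of tokens whose predicted tag matches the gold tag."""
    correct = 0
    total = 0
    for sent in sentences:
        tokens = [tok for tok, _ in sent]
        gold = [tag for _, tag in sent]
        pred = tagger(tokens)
        if len(pred) != len(gold):
            raise ValueError("tagger must return exactly one tag per token")
        correct += sum(p == g for p, g in zip(pred, gold))
        total += len(gold)
    return correct / total if total else 0.0


def evaluate_suite(tagger: Tagger, suite: Dict[str, List[Sentence]]) -> Dict[str, float]:
    """Token accuracy per dataset, so performance can be compared across varieties."""
    return {name: token_accuracy(tagger, sents) for name, sents in suite.items()}


if __name__ == "__main__":
    # Toy stand-ins for two datasets; NOT data from the actual suite.
    suite = {
        "newswire_toy": [[("The", "DET"), ("dog", "NOUN"), ("runs", "VERB")]],
        "twitter_toy": [[("he", "PRON"), ("finna", "VERB"), ("go", "VERB")]],
    }

    # Deliberately naive baseline: tiny lexicon lookup, unknown words default to NOUN.
    lexicon = {"the": "DET", "runs": "VERB", "he": "PRON", "go": "VERB"}

    def lexicon_tagger(tokens: List[str]) -> List[str]:
        return [lexicon.get(tok.lower(), "NOUN") for tok in tokens]

    scores = evaluate_suite(lexicon_tagger, suite)
    for name, acc in scores.items():
        print(f"{name}: {acc:.2f}")  # newswire_toy: 1.00, twitter_toy: 0.67
    print(f"macro-average: {sum(scores.values()) / len(scores):.2f}")  # 0.83
```

Reporting one score per dataset, rather than pooling all tokens, makes drops on particular varieties visible; the macro-average shown weights every dataset equally regardless of its size, which is one possible summary choice among several.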

    Published In

    ACM Other conferences
    WWW '16 Companion: Proceedings of the 25th International Conference Companion on World Wide Web
    April 2016
    1094 pages
    ISBN:9781450341448

    Sponsors

    • IW3C2: International World Wide Web Conference Committee

    Publisher

    International World Wide Web Conferences Steering Committee

    Republic and Canton of Geneva, Switzerland

    Author Tags

    1. datasets
    2. evaluation
    3. performance
    4. pos tagging

    Qualifiers

    • Abstract

    Funding Sources

    • ERC Starting Grant LOWLANDS

    Conference

    WWW '16: 25th International World Wide Web Conference
    April 11 - 15, 2016
    Montréal, Québec, Canada

    Acceptance Rates

    WWW '16 Companion Paper Acceptance Rate 115 of 727 submissions, 16%;
    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

    Article Metrics

    • Total Citations: 0
    • Total Downloads: 104
    • Downloads (Last 12 months): 5
    • Downloads (Last 6 weeks): 1
    Reflects downloads up to 03 Mar 2025
