DOI: 10.1145/1277741.1277848
Article

Improving text classification for oral history archives with temporal domain knowledge

Published: 23 July 2007

Abstract

This paper describes two new techniques for increasing the accuracy of topic label assignment to conversational speech from oral history interviews using supervised machine learning in conjunction with automatic speech recognition. The first, time-shifted classification, leverages local sequence information from the order in which the story is told. The second, temporal label weighting, takes the complementary perspective by using the position within an interview to bias label assignment probabilities. These methods, when used in combination, yield relative improvements of between 6% and 15% in classification accuracy using a clipped R-precision measure that models the utility of label sets as segment summaries in interactive speech retrieval applications.
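The abstract does not give the formulation of temporal label weighting, so the following is only a minimal sketch of the general idea under stated assumptions: label frequencies are counted per relative-position bin in training interviews, and a base classifier's scores are multiplied by those frequencies. The function names, the binning scheme, the `floor` parameter, and the example labels are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def position_priors(training_segments, n_bins=4):
    """Estimate the frequency of each label per position bin.

    training_segments: iterable of (relative_position, labels) pairs, where
    relative_position is in [0, 1] (0 = start of interview, 1 = end).
    Binning is an assumption of this sketch, not the paper's method.
    """
    counts = [defaultdict(int) for _ in range(n_bins)]
    totals = [0] * n_bins
    for rel_pos, labels in training_segments:
        b = min(int(rel_pos * n_bins), n_bins - 1)
        totals[b] += 1
        for label in labels:
            counts[b][label] += 1
    return [
        {label: c / totals[b] for label, c in counts[b].items()} if totals[b] else {}
        for b in range(n_bins)
    ]


def reweight(scores, rel_pos, priors, floor=0.01):
    """Rank labels by classifier score multiplied by the position prior.

    `floor` (hypothetical) keeps labels unseen in a bin from being zeroed out.
    """
    n_bins = len(priors)
    b = min(int(rel_pos * n_bins), n_bins - 1)
    weighted = {
        label: score * max(priors[b].get(label, 0.0), floor)
        for label, score in scores.items()
    }
    return sorted(weighted, key=weighted.get, reverse=True)


# Hypothetical training data: early segments about childhood, late ones about liberation.
training = [
    (0.1, ["childhood"]),
    (0.2, ["childhood"]),
    (0.8, ["liberation"]),
    (0.9, ["liberation"]),
]
priors = position_priors(training, n_bins=2)

# The base classifier is undecided; position within the interview breaks the tie.
tied_scores = {"childhood": 0.5, "liberation": 0.5}
early_ranking = reweight(tied_scores, 0.05, priors)
late_ranking = reweight(tied_scores, 0.95, priors)
```

With tied base scores, the ranking depends only on where the segment falls in the interview, which is the kind of positional bias the abstract describes. A smoothed density estimate could replace the hard bins, but that too would be an assumption.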

      Published In

      SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
      July 2007
      946 pages
      ISBN:9781595935977
      DOI:10.1145/1277741

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. automatic topic classification
      2. classifying with domain knowledge
      3. spoken document classification

      Qualifiers

      • Article

      Conference

      SIGIR07
      SIGIR07: The 30th Annual International SIGIR Conference
      July 23 - 27, 2007
      Amsterdam, The Netherlands

      Acceptance Rates

      Overall Acceptance Rate 792 of 3,983 submissions, 20%

      Cited By

      • (2014) Bejo. Proceedings of the 2014 IEEE 6th International Conference on Cloud Computing Technology and Science, pp. 10-17. DOI: 10.1109/CloudCom.2014.48. Online publication date: 15-Dec-2014
      • (2014) English and Chinese bilingual topic aspect classification: Exploring similarity measures, optimal LSA dimensions, and centroid correction of translated training examples. Proceedings of the American Society for Information Science and Technology, 50:1, pp. 1-12. DOI: 10.1002/meet.14505001039. Online publication date: 8-May-2014
      • (2013) English and Chinese bilingual topic aspect classification. Proceedings of the 76th ASIS&T Annual Meeting: Beyond the Cloud: Rethinking Information Boundaries, pp. 1-12. DOI: 10.5555/2655780.2655823. Online publication date: 1-Nov-2013
      • (2011) Automatic tagging and geotagging in video collections and communities. Proceedings of the 1st ACM International Conference on Multimedia Retrieval, pp. 1-8. DOI: 10.1145/1991996.1992047. Online publication date: 18-Apr-2011
      • (2010) Natural Language Processing for Cultural Heritage Domains. Language and Linguistics Compass, 4:9, pp. 750-768. DOI: 10.1111/j.1749-818X.2010.00230.x. Online publication date: 6-Sep-2010
      • (2008) Pairwise document similarity in large collections with MapReduce. Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers, pp. 265-268. DOI: 10.5555/1557690.1557767. Online publication date: 16-Jun-2008
      • (2008) Bilingual topic aspect classification with a few training examples. Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, pp. 203-210. DOI: 10.1145/1390334.1390371. Online publication date: 20-Jul-2008
      • (2008) Access to recorded interviews. Journal on Computing and Cultural Heritage, 1:1, pp. 1-27. DOI: 10.1145/1367080.1367083. Online publication date: 18-Jun-2008
