Article

Intelligent information triage

Authors:
Sofus A. Macskassy, Rutgers Univ., Piscataway, NJ
Foster Provost, NYU Stern School of Business, New York, NY

SIGIR '01: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, September 2001, Pages 318–326
https://doi.org/10.1145/383952.384015

Published: 01 September 2001

ABSTRACT

In many applications, large volumes of time-sensitive textual information require triage: rapid, approximate prioritization for subsequent action. In this paper, we explore the use of prospective indications of the importance of a time-sensitive document, for the purpose of producing better document filtering or ranking. By prospective, we mean importance that could be assessed by actions that occur in the future. For example, a news story may be assessed (retrospectively) as being important, based on events that occurred after the story appeared, such as a stock price plummeting or the issuance of many follow-up stories. If a system could anticipate (prospectively) such occurrences, it could provide a timely indication of importance. Clearly, perfect prescience is impossible. However, sometimes there is sufficient correlation between the content of an information item and the events that occur subsequently. We describe a process for creating and evaluating approximate information-triage procedures that are based on prospective indications. Unlike many information-retrieval applications for which document labeling is a laborious, manual process, for many prospective criteria it is possible to build very large, labeled, training corpora automatically. Such corpora can be used to train text classification procedures that will predict the (prospective) importance of each document. This paper illustrates the process with two case studies, demonstrating the ability to predict whether a news story will be followed by many, very similar news stories, and also whether the stock price of one or more companies associated with a news story will move significantly following the appearance of that story. We conclude by discussing how the comprehensibility of the learned classifiers can be critical to success.
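As a rough illustration of the process the abstract outlines, the sketch below labels stories automatically from a later price movement and then trains a small text classifier to rank unseen stories. It is a minimal sketch under stated assumptions, not the authors' procedure: the 5% threshold, the toy stories and prices, the function names, and the choice of a multinomial naive Bayes scorer are all hypothetical stand-ins chosen for brevity.

```python
import math
from collections import Counter, defaultdict

def label_by_price_move(stories, prices, threshold=0.05):
    """Label a story 1 if the associated price moved by more than
    `threshold` (relative change) after the story appeared, else 0.
    The threshold is an assumed value, not one taken from the paper."""
    labeled = []
    for story_id, text, ticker in stories:
        before, after = prices[(story_id, ticker)]
        moved = abs(after - before) / before > threshold
        labeled.append((text, 1 if moved else 0))
    return labeled

def train_naive_bayes(labeled):
    """Collect per-class word counts from automatically labeled text."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for text, label in labeled:
        tokens = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, class_counts, vocab

def score(model, text):
    """Log-odds that `text` belongs to class 1; higher means rank earlier."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    log_prob = {}
    for label in (0, 1):
        n_words = sum(word_counts[label].values())
        # Laplace-smoothed class prior and word likelihoods.
        lp = math.log((class_counts[label] + 1) / (total + 2))
        for token in text.lower().split():
            if token in vocab:  # ignore words never seen in training
                count = word_counts[label][token]
                lp += math.log((count + 1) / (n_words + len(vocab)))
        log_prob[label] = lp
    return log_prob[1] - log_prob[0]

# Hypothetical toy data: (story id, text, ticker) and (price before, price after).
stories = [
    (1, "acme issues profit warning", "ACME"),
    (2, "acme opens new office", "ACME"),
    (3, "globex profit warning shocks investors", "GLBX"),
    (4, "globex sponsors local office party", "GLBX"),
]
prices = {
    (1, "ACME"): (100.0, 80.0),
    (2, "ACME"): (100.0, 101.0),
    (3, "GLBX"): (50.0, 40.0),
    (4, "GLBX"): (50.0, 50.5),
}

labeled = label_by_price_move(stories, prices)
model = train_naive_bayes(labeled)

# Triage: rank incoming stories by predicted prospective importance.
incoming = ["new office party", "profit warning at initech"]
ranked = sorted(incoming, key=lambda t: score(model, t), reverse=True)
print(ranked)
```

The labels come entirely from events observed after each story appeared, so no manual annotation is needed; the classifier only ever sees the story text, which is what lets it score a new story at the moment it arrives.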

Index Terms

Intelligent information triage
1. Applied computing
  1. Document management and text processing
    1. Document capture
      1. Document analysis
2. Information systems
  1. Information retrieval
    1. Information retrieval query processing
    2. Retrieval models and ranking

Published in
SIGIR '01: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
September 2001
454 pages
ISBN: 1581133316
DOI: 10.1145/383952
Chairmen:
Donald H. Kraft, Louisiana State Univ.
W. Bruce Croft, University of Massachusetts (For the Americas)
David J. Harper, The Robert Gordon University (For Europe and Africa)
Justin Zobel, RMIT University (For Asia and Australasia)
Copyright © 2001 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
SIGIR '01 Paper Acceptance Rate: 47 of 201 submissions, 23%
Overall Acceptance Rate: 792 of 3,983 submissions, 20%