Abstract:
We explore the problem of locating documents pertaining to critical technologies (e.g., restricted, proprietary, or sensitive technical information) within a massive and highly heterogeneous collection of largely unimportant files. We present a system that uses supervised machine learning (i.e., pattern recognition) to detect such critical documents. To address difficult or ambiguous instances, we supplement the text classifier with an automated keyword search. That is, we automatically extract discriminative terms (i.e., keywords) from the training set and match them against documents during the classification process. We demonstrate the effectiveness of this hybrid approach through a series of validation tests and case studies.
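The hybrid idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say how terms are scored or how the keyword search is combined with the classifier, so everything specific here is an assumption made for illustration: the smoothed log-odds scoring rule, the function names `extract_keywords` and `hybrid_predict`, and the thresholds `low`, `high`, and `min_hits`. The classifier itself is represented only by a score in [0, 1] supplied by the caller.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def extract_keywords(critical_docs, other_docs, top_k=5):
    """Rank terms by smoothed log-odds of document frequency.

    Assumed scoring rule, not taken from the paper: a term scores high
    when it appears in many critical documents and few others.
    """
    crit_df = Counter(t for d in critical_docs for t in set(tokenize(d)))
    other_df = Counter(t for d in other_docs for t in set(tokenize(d)))
    n_crit, n_other = len(critical_docs), len(other_docs)
    scores = {}
    for term, df in crit_df.items():
        p_crit = (df + 1) / (n_crit + 2)                # add-one smoothing
        p_other = (other_df[term] + 1) / (n_other + 2)
        scores[term] = math.log(p_crit / p_other)
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return [t for t in ranked[:top_k] if scores[t] > 0]


def hybrid_predict(doc, classifier_score, keywords, low=0.4, high=0.6, min_hits=2):
    """Trust the classifier when it is confident; otherwise fall back to keyword matching.

    The thresholds are hypothetical and would need tuning on validation data.
    """
    if classifier_score >= high:
        return True
    if classifier_score <= low:
        return False
    hits = len(set(tokenize(doc)) & set(keywords))
    return hits >= min_hits
```

In this sketch the keyword match only decides the ambiguous middle band of classifier scores, which is one plausible reading of "difficult or ambiguous instances"; other combinations, such as feeding keyword hits to the classifier as features, would be equally consistent with the abstract.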
Date of Conference: 29 October 2012 - 01 November 2012
Date Added to IEEE Xplore: 28 January 2013