DOI: 10.1145/1557019.1557055
research-article

Learning with a non-exhaustive training dataset: a case study: detection of bacteria cultures using optical-scattering technology

Published: 28 June 2009

Abstract

When a training dataset covers a nonexhaustive list of classes, i.e., some classes are not yet known and hence are not represented, the resulting learning problem is ill-defined. In this case, a sample from a missing class is incorrectly assigned to one of the existing classes. For some applications the cost of such a misclassification may be negligible. However, the significance of the problem becomes clear when one considers the potentially undesirable consequences of classifying a food pathogen as a nonpathogen. Our research is directed toward the real-time detection of food pathogens using optical-scattering technology. Bacterial colonies, each consisting of the progeny of a single parent cell, scatter light at 635 nm to produce unique forward-scatter signatures. These spectral signatures contain descriptive characteristics of bacterial colonies that can be used to identify bacteria cultures in real time. One bottleneck that remains to be addressed is the nonexhaustive nature of the training library: it is very difficult, if not impractical, to collect samples from all possible bacteria colonies and construct a digital library with an exhaustive set of scatter signatures. This study deals with the real-time detection of samples from a missing class and the associated problem of learning with a nonexhaustive training dataset. Our proposed method assumes a common prior for the set of all classes, known and missing. The parameters of the prior are estimated from the samples of the known classes. This prior is then used to generate a large number of samples to simulate the space of missing classes. Finally, a Bayesian maximum likelihood classifier is implemented using samples from real as well as simulated classes. Experiments performed with samples collected for 28 bacteria subclasses favor the proposed approach over the state of the art.
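The pipeline outlined in the abstract (estimate a common prior from the known classes, simulate missing classes from it, then classify by maximum likelihood over real and simulated classes) can be illustrated with a minimal sketch. The abstract does not state the form of the prior or of the class densities, so everything specific here is an assumption made for illustration: Gaussian class-conditional densities with one shared (pooled) covariance, a Gaussian prior over class means, and all function names, which are hypothetical. It is not the authors' implementation.

```python
import numpy as np

def fit_known_classes(X, y):
    """Per-class means and a pooled covariance (assumption: shared covariance)."""
    labels = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in labels])
    centered = np.vstack([X[y == c] - X[y == c].mean(axis=0) for c in labels])
    pooled_cov = centered.T @ centered / (len(X) - len(labels))
    return labels, means, pooled_cov

def fit_prior(means):
    """Assumed Gaussian prior over class means, estimated from the known-class means."""
    return means.mean(axis=0), np.cov(means, rowvar=False)

def simulate_missing_classes(prior_mean, prior_cov, n_sim, rng):
    """Draw n_sim simulated class means from the prior to stand in for missing classes."""
    return rng.multivariate_normal(prior_mean, prior_cov, size=n_sim)

def log_likelihoods(X, means, cov):
    """Gaussian log-likelihood of each sample under each class mean.

    The normalizing constant is dropped because the covariance is shared,
    so it is identical for every class and does not affect the argmax.
    """
    cov_inv = np.linalg.inv(cov)
    diff = X[:, None, :] - means[None, :, :]  # shape (n_samples, n_classes, n_features)
    return -0.5 * np.einsum("nkd,de,nke->nk", diff, cov_inv, diff)

def classify(X, labels, known_means, sim_means, cov, unknown_label=-1):
    """Maximum-likelihood label over known and simulated classes.

    A sample whose best-scoring class is a simulated one is reported as
    unknown_label (assumes integer class labels).
    """
    all_means = np.vstack([known_means, sim_means])
    winner = log_likelihoods(X, all_means, cov).argmax(axis=1)
    out = np.full(len(X), unknown_label, dtype=labels.dtype)
    is_known = winner < len(labels)
    out[is_known] = labels[winner[is_known]]
    return out
```

In this sketch a sample is flagged as coming from a missing class whenever a simulated class explains it better than every known class, so the number of simulated classes (`n_sim`) controls how densely the space between and around the known classes is covered.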

Supplementary Material

JPG File (p279-dundar.jpg)
MP4 File (p279-dundar.mp4)

Cited By

  • (2024) Combined Mutual Learning Net for Raman Spectral Microbial Strain Identification. Analytical Chemistry 96:15 (5824-5831). DOI: 10.1021/acs.analchem.3c05107. Online publication date: 4-Apr-2024
  • (2023) Keyword Extraction From Specification Documents for Planning Security Mechanisms. 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE) (1661-1673). DOI: 10.1109/ICSE48619.2023.00143. Online publication date: May-2023
  • (2010) A machine-learning approach to detecting unknown bacterial serovars. Statistical Analysis and Data Mining: The ASA Data Science Journal 3:5 (289-301). DOI: 10.1002/sam.10085. Online publication date: 3-Aug-2010
  • (2010) Discovering the unknown: Detection of emerging pathogens using a label-free light-scattering system. Cytometry Part A 77A:12 (1103-1112). DOI: 10.1002/cyto.a.20978. Online publication date: 24-Nov-2010

      Published In

      KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
      June 2009
      1426 pages
      ISBN:9781605584959
      DOI:10.1145/1557019

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. anomaly detection
      2. bacteria detection
      3. bayes classifier
      4. nonexhaustive learning
      5. novelty detection

      Qualifiers

      • Research-article

      Conference

      KDD09

      Acceptance Rates

      Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
