A multi-one-class dynamic classifier for adaptive digitization of document streams

Ngo Ho, Anh Khoi; Eglin, Véronique; Ragot, Nicolas; Ramel, Jean-Yves

doi:10.1007/s10032-017-0286-6

Original Paper
Published: 18 May 2017

Volume 20, pages 137–154 (2017)

International Journal on Document Analysis and Recognition (IJDAR)

Abstract

In this paper, we present a new dynamic classifier design based on a set of independent one-class SVMs for image data stream categorization. Dynamic or continuous learning and classification have recently been investigated to deal with different situations, such as online learning of fixed concepts, learning in non-stationary environments (concept drift), or learning from imbalanced data. Most solutions cannot handle several of these specificities at the same time. In particular, adding new concepts and merging or splitting concepts are usually considered less important and are consequently less studied, whereas they are of high interest for stream-based document image classification. To deal with this kind of data, we explore a learning and classification scheme based on one-class SVM classifiers that we call mOC-iSVM (multi-one-class incremental SVM). Even though one-class classifiers suffer from a lack of discriminative power, they have, as a counterpart, many interesting properties that stem from their independent modeling. The experiments presented in the paper show the theoretical feasibility of the approach on different benchmarks when new classes are added. Experiments also demonstrate that the mOC-iSVM model can be used efficiently for document classification tasks (by image quality and by image content) in a stream context, handling many typical scenarios of concept extension, drift, split and merge.
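The key property highlighted in the abstract is that each class gets its own independent one-class model, so adding, merging or splitting concepts only touches the models concerned. The sketch below illustrates that architecture under our own assumptions; it is not the authors' mOC-iSVM implementation. In particular, `CentroidOneClass` is a hypothetical, dependency-free stand-in for a one-class SVM (a hypersphere around the class centroid), models are simply retrained from stored samples rather than updated incrementally, and the labels, feature vectors and zero rejection threshold are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class CentroidOneClass:
    """Hypothetical stand-in for a one-class SVM: the class boundary is a
    hypersphere around the centroid of the training samples."""
    centroid: list
    radius: float

    @classmethod
    def fit(cls, samples):
        dim = len(samples[0])
        centroid = [sum(s[i] for s in samples) / len(samples) for i in range(dim)]
        radius = max(math.dist(s, centroid) for s in samples)
        return cls(centroid, radius)

    def score(self, x):
        # >= 0 inside the boundary, < 0 outside (like an SVM decision value).
        return self.radius - math.dist(x, self.centroid)


class MultiOneClass:
    """One independent one-class model per label."""

    def __init__(self):
        self.data = {}    # label -> stored training samples
        self.models = {}  # label -> one-class model trained on that label only

    def learn(self, label, samples):
        # Adding a class or extending an existing one retrains only that model.
        self.data.setdefault(label, []).extend(samples)
        self.models[label] = CentroidOneClass.fit(self.data[label])

    def merge(self, a, b, new_label):
        # Merging two concepts: pool their samples into a single model.
        samples = self.data.pop(a) + self.data.pop(b)
        del self.models[a], self.models[b]
        self.learn(new_label, samples)

    def split(self, label, parts):
        # Splitting a concept: `parts` maps each new label to its samples.
        del self.data[label], self.models[label]
        for new_label, samples in parts.items():
            self.learn(new_label, samples)

    def predict(self, x):
        # Highest score wins; None (rejection) if x is outside every boundary.
        if not self.models:
            return None
        label, best = max(
            ((lbl, m.score(x)) for lbl, m in self.models.items()),
            key=lambda pair: pair[1],
        )
        return label if best >= 0 else None


clf = MultiOneClass()
clf.learn("sharp", [(0.9, 0.1), (1.0, 0.0), (0.8, 0.2)])
clf.learn("blurred", [(0.1, 0.9), (0.0, 1.0), (0.2, 0.8)])
print(clf.predict((0.95, 0.05)))  # sharp
print(clf.predict((5.0, 5.0)))    # None
```

Because no model depends on the others, learning a new class leaves every existing model object untouched; the price, as the abstract notes, is that no model ever sees the other classes during training.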

Notes

Europeana Project: http://www.europeana.eu/. Gallica Project: http://gallica.bnf.fr/. NYPL Digital Collection: http://digitalcollections.nypl.org/.
(ANR-10-CORD-0020), CONTenus et INTeractions (CONTINT): http://digidoc.labri.fr.
Please bear in mind that we are dealing with supervised learning.
The system could adapt its parameters at each learning step to ensure the best integration of concept changes. It could also use predefined parameters to reduce the learning cost.
We could have performed grid-search optimization at different times, as in batch mode, by keeping some training examples (for example, every 50 examples). A windowing technique could also have been used. Consequently, the results obtained could easily be improved.
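The note above mentions two options that were not used in the experiments: re-running a grid search periodically (for example, every 50 examples) and restricting it to a window of recent training examples. A minimal sketch combining both is given below, assuming a user-supplied `evaluate(param, examples)` function that scores a parameter on a set of labelled examples. The class `PeriodicTuner`, the window size of 200 and the toy evaluation function are hypothetical.

```python
from collections import deque


class PeriodicTuner:
    """Hypothetical helper: keeps a sliding window of recent labelled examples
    and re-runs a grid search over `grid` every `period` arrivals."""

    def __init__(self, grid, evaluate, period=50, window=200):
        self.grid = list(grid)
        self.evaluate = evaluate            # evaluate(param, examples) -> higher is better
        self.period = period
        self.window = deque(maxlen=window)  # oldest examples are dropped automatically
        self.seen = 0
        self.best = self.grid[0]            # predefined parameter until the first search

    def observe(self, example):
        self.window.append(example)
        self.seen += 1
        if self.seen % self.period == 0:
            # Grid search on the window only, instead of on the whole stream.
            examples = list(self.window)
            self.best = max(self.grid, key=lambda p: self.evaluate(p, examples))
        return self.best


# Toy usage: the "best" parameter is the grid value closest to the window mean.
tuner = PeriodicTuner([0.1, 0.5, 0.9], lambda p, ex: -abs(p - sum(ex) / len(ex)))
for _ in range(50):
    current = tuner.observe(0.5)
print(current)  # 0.5
```

Between searches the tuner keeps returning the last selected parameter, which matches the trade-off in the previous note between adapting parameters and keeping the learning cost low.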
Online results are also presented.
Or online learning.
Our experiments showed that, for one-class SVMs, using the available negative information for parameter selection can improve performance by at least 10%.
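The note above states that, in this supervised setting, negative examples from the other classes can be exploited when selecting the parameters of a one-class model. The sketch below illustrates the idea under our own assumptions: the one-class model is a hypothetical centroid/radius stand-in rather than an SVM, the tuned parameter is its acceptance radius rather than the SVM's own hyperparameters, and the selection criterion is balanced accuracy, one common choice when positives and negatives are imbalanced.

```python
import math


def balanced_accuracy(accept, positives, negatives):
    """Mean of the true-positive rate on target-class samples and the
    true-negative rate on samples from the other classes."""
    tpr = sum(1 for x in positives if accept(x)) / len(positives)
    tnr = sum(1 for x in negatives if not accept(x)) / len(negatives)
    return (tpr + tnr) / 2


def select_radius(train, positives, negatives, candidates):
    """Hypothetical selection of the acceptance radius of a centroid-based
    stand-in for a one-class SVM. The model itself is fitted on target-class
    data only; the negatives are used solely to choose the parameter."""
    dim = len(train[0])
    centre = [sum(s[i] for s in train) / len(train) for i in range(dim)]

    def score(radius):
        return balanced_accuracy(
            lambda x: math.dist(x, centre) <= radius, positives, negatives
        )

    # max() keeps the first candidate in case of a tie.
    return max(candidates, key=score)


train = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
positives = [(0.2, 0.0), (0.0, 0.2)]  # held-out samples of the target class
negatives = [(1.0, 0.0), (0.0, 1.0)]  # samples of the other classes
print(select_radius(train, positives, negatives, [0.05, 0.5, 2.0]))  # 0.5
```

With positives alone, the largest radius would look best because it accepts every target sample; the negatives are what penalize the overly permissive boundary.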
Note that in this figure, as in the others, interpolated curves are shown to better render the stream simulation (given our protocol, dot or step plots would have been a more exact representation). Consequently, there is an apparent loss of accuracy between steps (here, for example, between steps 5 and 6). In fact, in our scenarios, such a loss occurs only at the new step (here, step 6) and not before.
In online learning, one 400-page book costs approximately 10 s for quality learning and 14 s for content learning.
In particular, the time needed to read files (SVM models, features, etc.) has a large impact.
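Assuming the cost is spread evenly over the pages of a book, the online-learning figures given in the notes above correspond to the following per-page times:

```latex
\[
\frac{10\ \text{s}}{400\ \text{pages}} = 25\ \text{ms/page (quality)}, \qquad
\frac{14\ \text{s}}{400\ \text{pages}} = 35\ \text{ms/page (content)}
\]
```

Learning both aspects of one 400-page book would thus take about 24 s, i.e. roughly 60 ms per page, a large share of which, according to the last note, is spent reading files.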

Acknowledgements

This research has been carried out under the DIGIDOC project with financial support of the ANR (French National Agency for Research).

Author information

Authors and Affiliations

CNRS INSA-Lyon LIRIS - UMR 5205 CNRS, Université de Lyon, 69621, Lyon, France
Anh Khoi Ngo Ho & Véronique Eglin
Laboratoire Informatique - LI EA 6300, Université François Rabelais Tours, Tours, France
Nicolas Ragot & Jean-Yves Ramel

Corresponding author

Correspondence to Véronique Eglin.

About this article

Cite this article

Ngo Ho, A.K., Eglin, V., Ragot, N. et al. A multi-one-class dynamic classifier for adaptive digitization of document streams. IJDAR 20, 137–154 (2017). https://doi.org/10.1007/s10032-017-0286-6

Received: 22 April 2016
Revised: 26 April 2017
Accepted: 04 May 2017
Published: 18 May 2017
Issue Date: September 2017
DOI: https://doi.org/10.1007/s10032-017-0286-6
