A multi-one-class dynamic classifier for adaptive digitization of document streams

International Journal on Document Analysis and Recognition (IJDAR)

Abstract

In this paper, we present a new dynamic classifier design based on a set of independent one-class SVMs for image data stream categorization. Dynamic or continuous learning and classification has recently been investigated to deal with different situations, such as online learning of fixed concepts, learning in non-stationary environments (concept drift), or learning from imbalanced data. Most solutions cannot handle several of these situations at the same time. In particular, adding new concepts and merging or splitting existing concepts are usually considered less important and are consequently less studied, although they are of great interest for stream-based document image classification. To deal with this kind of data, we explore a learning and classification scheme based on one-class SVM classifiers that we call mOC-iSVM (multi-one-class incremental SVM). Although one-class classifiers lack discriminative power, their independent modeling gives them many useful properties in return. The experiments presented in the paper show the theoretical feasibility of the approach on several benchmarks involving the addition of new classes. The experiments also demonstrate that the mOC-iSVM model can be used efficiently for document classification tasks (by image quality and image content) in a streaming context, handling many typical scenarios of concept extension, drift, split and merge.
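To make the idea of independent per-class modeling concrete, here is a minimal sketch of a multi-one-class scheme. It is not the authors' mOC-iSVM: the one-class SVM is swapped for a much simpler stand-in (a centroid with an acceptance radius), and the class names, the `margin` parameter, and the names `CentroidOneClass` and `MultiOneClass` are all hypothetical. The stand-in also keeps every training sample, which a real incremental SVM would avoid. The point is only the structure: one model per class, so adding or merging a class touches that class alone.

```python
import math
from statistics import mean


class CentroidOneClass:
    """Stand-in for a one-class SVM (assumption): accepts points that lie
    within a radius of the class centroid."""

    def __init__(self, margin=1.5):
        self.margin = margin      # hypothetical slack factor on the radius
        self.samples = []
        self.centroid = None
        self.radius = 0.0

    def partial_fit(self, samples):
        # Incremental update: add the new samples, then refresh the boundary.
        self.samples.extend(samples)
        dims = range(len(self.samples[0]))
        self.centroid = tuple(mean(s[d] for s in self.samples) for d in dims)
        self.radius = self.margin * max(
            math.dist(s, self.centroid) for s in self.samples
        )
        return self

    def score(self, x):
        # Positive inside the boundary, negative outside.
        return self.radius - math.dist(x, self.centroid)


class MultiOneClass:
    """One independent one-class model per label."""

    def __init__(self):
        self.models = {}

    def learn(self, label, samples):
        # Only the model of `label` is created or updated.
        self.models.setdefault(label, CentroidOneClass()).partial_fit(samples)

    def predict(self, x):
        if not self.models:
            return None
        scores = {label: m.score(x) for label, m in self.models.items()}
        best = max(scores, key=scores.get)
        # None means every model rejected the sample.
        return best if scores[best] >= 0 else None

    def merge(self, labels, new_label):
        # Merge concepts by pooling their samples into a single new model.
        pooled = [s for lab in labels for s in self.models.pop(lab).samples]
        self.models[new_label] = CentroidOneClass().partial_fit(pooled)


clf = MultiOneClass()
clf.learn("sharp", [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)])
clf.learn("blurred", [(10.0, 10.0), (11.0, 10.0), (10.0, 11.0)])
print(clf.predict((0.5, 0.5)))
print(clf.predict((50.0, 50.0)))

# A new class arrives later in the stream; existing models are not retrained.
clf.learn("text", [(20.0, 0.0), (21.0, 0.0), (20.0, 1.0)])
print(clf.predict((20.5, 0.5)))
```

In this toy setting a sample far from every class is rejected by all models rather than forced into one of them, which is one of the properties that independent modeling offers. A split could be sketched the same way as `merge`, by relabeling part of a class's samples and fitting two new models.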

Notes

  1. Europeana Project: http://www.europeana.eu/. Gallica Project: http://gallica.bnf.fr/. NYPL Digital Collection: http://digitalcollections.nypl.org/.

  2. (ANR-10-CORD-0020)—CONTenus et INTeractions (CONTINT) http://digidoc.labri.fr.

  3. Recall that we are dealing with supervised learning.

  4. The system could adapt the parameters at each learning step to ensure the best integration of concept changes. It could also use predefined parameters to reduce the learning cost.

  5. We could have run a grid search optimization at different times, as in batch mode, by keeping some training examples (for example, every 50 samples). A windowing technique could also have been used. Consequently, the results obtained could easily be improved.

  6. Online results are also presented.

  7. Or online learning.

  8. Our experiments have shown that, for one-class SVMs, using the available negative information for parameter selection can improve performance by at least 10%.

  9. Please note that in this figure, as in others, interpolated curves are shown to give a better rendering of a stream simulation (given our protocol, dot or step plots would have been a more exact representation). Consequently, there is an apparent loss of accuracy between steps (between steps 5 and 6 here, for example). In fact, in our scenarios such a loss only occurs at the new step (step 6 here) and not before.

  10. In online learning, one book of 400 pages costs approximately 10 s and 14 s for quality and content learning, respectively.

  11. In particular, the time needed to read files (SVM models, features, etc.) has a huge impact.

Acknowledgements

This research has been carried out under the DIGIDOC project with the financial support of the ANR (French National Agency for Research).

Author information

Corresponding author

Correspondence to Véronique Eglin.

About this article

Cite this article

Ngo Ho, A.K., Eglin, V., Ragot, N. et al. A multi-one-class dynamic classifier for adaptive digitization of document streams. IJDAR 20, 137–154 (2017). https://doi.org/10.1007/s10032-017-0286-6

  • DOI: https://doi.org/10.1007/s10032-017-0286-6
