
Multi-label incremental learning applied to web page categorization

  • Original Article
  • Neural Computing and Applications

Abstract

Multi-label problems are challenging because each instance may be associated with an unknown number of categories, and the relationships among the categories are not always known. A large amount of data is necessary to infer the required information about the categories, but these data are normally available only in small batches distributed over a period of time. In this work, multi-label problems are tackled using an incremental neural network known as the evolving Probabilistic Neural Network (ePNN). This neural network is capable of continuous learning while maintaining a reduced architecture, so it can incorporate training data whenever they become available without drastic growth of its structure. We carried out a series of experiments on web page data sets and compared the performance of ePNN to that of other multi-label categorizers. On average, ePNN outperformed the other categorizers in four of the five metrics used for evaluation, and its structure was less complex than that of the other algorithms evaluated.


Notes

  1. Times were obtained using a PC with an Intel Dual Core 2.30 GHz processor and 4 GB of RAM.

  2. Data set available at http://www.inf.ufes.br/alberto/vitoria.tar.gz.

  3. Data available at http://mulan.sourceforge.net/datasets.html.


Acknowledgments

We would like to thank Min-Ling Zhang for all the help with the ML-kNN categorization tool and web page data sets. P.M. Ciarelli thanks PPGEE (Programa de Pós-Graduação da Engenharia Elétrica) of UFES (Universidade Federal do Espírito Santo).


Corresponding author

Correspondence to Patrick Marques Ciarelli.

Appendix

Suppose that a training instance x is presented to the neural network and the outputs of all components of each GMM in the neural network are calculated.

If the component s is the most activated, and it is not in the GMM assigned to the class of x, then it is desirable to reduce the output value of s and increase the output value of the component r, which is the most activated in the GMM assigned to the class of x. In other words, if

$$ f_r(x,\mu_{r},\Sigma_{r},\varphi_{r}) < f_s(x,\mu_{s},\Sigma_{s},\varphi_{s}), $$

then it is desired to find new values of the receptive field sizes of the components s (\(\varphi_{s_{\mathrm{new}}}\)) and r (\(\varphi_{r_{\mathrm{new}}}\)), such that

$$ f_r(x,\mu_{r},\Sigma_{r},\varphi_{r_{\mathrm{new}}}) \geq f_s(x,\mu_{s},\Sigma_{s},\varphi_{s_{\mathrm{new}}}). \qquad (17) $$

The value of \(\varphi_{r_{\mathrm{new}}}\) is computed using Eq. (6). To obtain the value of \(\varphi_{s_{\mathrm{new}}}\), a constant η, 0 < η ≤ 1, is introduced as a multiplicative factor in Eq. (17) to turn the inequality into an equality

$$ \eta f_r(x,\mu_{r},\Sigma_{r},\varphi_{r_{\mathrm{new}}}) = f_s(x,\mu_{s},\Sigma_{s},\varphi_{s_{\mathrm{new}}}). $$

Therefore, from Eq. (2) one obtains

$$ \eta f_r(x,\mu_{r},\Sigma_{r},\varphi_{r_{\mathrm{new}}}) = \frac{1}{\sqrt{2\pi}\,\varphi_{s_{\mathrm{new}}}}\exp\left(\frac{D_s}{\varphi_{s_{\mathrm{new}}}^{2}}\right), $$

where \(D_s = x^{\mathrm{T}}\mu_{s} - 1\).

To solve this equation, a first-order Taylor expansion of the exponential function around \(\varphi_{s}\) is applied to linearize it in \(\varphi_{s_{\mathrm{new}}}\). Therefore,

$$ \eta f_r(x,\mu_{r},\Sigma_{r},\varphi_{r_{\mathrm{new}}}) = \frac{1}{\sqrt{2\pi}\,\varphi_{s_{\mathrm{new}}}} \left[\exp\left(\frac{D_s}{\varphi_{s}^{2}}\right) - \frac{2D_s}{\varphi_{s}^{3}} \exp\left(\frac{D_s}{\varphi_{s}^{2}}\right)\left(\varphi_{s_{\mathrm{new}}} - \varphi_{s}\right) \right]. $$

The Taylor expansion is valid for small values of \(\epsilon = \varphi_{s_{\mathrm{new}}} - \varphi_{s}\). After further manipulation (the intermediate steps are expanded below), the value of \(\varphi_{s_{\mathrm{new}}}\) can be obtained using Eq. (18),

$$ \varphi_{s_{\mathrm{new}}} = \frac{\varphi_{s}\, f_s(x,\mu_{s},\Sigma_{s},\varphi_{s})\left(\varphi_{s}^{2} + 2D_s\right)}{\eta\,\varphi_{s}^{2}\, f_r(x,\mu_{r},\Sigma_{r},\varphi_{r_{\mathrm{new}}}) + 2D_s\, f_s(x,\mu_{s},\Sigma_{s},\varphi_{s})}. \qquad (18) $$
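For completeness, the algebra behind "after further manipulation" can be reconstructed as follows (our expansion; the steps are not spelled out in the text). Writing \(f_r\) and \(f_s\) as shorthand for \(f_r(x,\mu_{r},\Sigma_{r},\varphi_{r_{\mathrm{new}}})\) and \(f_s(x,\mu_{s},\Sigma_{s},\varphi_{s})\), multiplying both sides of the linearized equation by \(\varphi_{s_{\mathrm{new}}}\), and using \(\exp(D_s/\varphi_{s}^{2}) = \sqrt{2\pi}\,\varphi_{s} f_s\), one obtains

$$ \begin{aligned} \eta f_r\,\varphi_{s_{\mathrm{new}}} &= \varphi_{s} f_s\left[1 - \frac{2D_s}{\varphi_{s}^{3}}\left(\varphi_{s_{\mathrm{new}}} - \varphi_{s}\right)\right],\\ \eta f_r\,\varphi_{s_{\mathrm{new}}} + \frac{2D_s f_s}{\varphi_{s}^{2}}\,\varphi_{s_{\mathrm{new}}} &= \varphi_{s} f_s + \frac{2D_s f_s}{\varphi_{s}},\\ \varphi_{s_{\mathrm{new}}}\left(\eta\,\varphi_{s}^{2} f_r + 2D_s f_s\right) &= \varphi_{s} f_s\left(\varphi_{s}^{2} + 2D_s\right), \end{aligned} $$

where the last line follows from multiplying both sides by \(\varphi_{s}^{2}\); dividing by the factor in parentheses yields Eq. (18).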

To prevent the receptive field size from being significantly altered when a single training instance is presented, two thresholds are employed. The first is applied through a saturated linear function that bounds the updated value so that the Taylor expansion remains applicable [parameter α in Eq. (7)]. The second limits how much the receptive field size may change in a single update [parameter ρ in Eqs. (6)–(7)].
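To make the update rule concrete, the following Python sketch shows one way the receptive-field adjustment could be implemented. It is an illustrative reconstruction under stated assumptions, not the authors' code: the component activation follows the expression used for \(f_s\) above, \(\varphi_{r_{\mathrm{new}}}\) is assumed to have been computed beforehand via Eq. (6) (not reproduced in this excerpt), and the eta and rho parameters together with the simple clipping step merely stand in for the thresholds α and ρ of Eqs. (6)–(7), whose exact form is not given here.

```python
import numpy as np


def component_output(x, mu, phi):
    """Component activation as used in the appendix:
    f = exp(D / phi^2) / (sqrt(2*pi) * phi), with D = x^T mu - 1.
    Assumes x and mu are L2-normalized vectors."""
    d = float(x @ mu) - 1.0
    return np.exp(d / phi ** 2) / (np.sqrt(2.0 * np.pi) * phi)


def update_receptive_field_s(x, mu_r, phi_r_new, mu_s, phi_s, eta=0.9, rho=0.1):
    """Adjust the receptive field of the wrongly activated component s
    so that the correct-class component r satisfies Eq. (17).

    phi_r_new is assumed to come from Eq. (6); eta is the constant
    0 < eta <= 1 of the appendix; rho is a stand-in for the threshold
    limiting the change caused by a single training instance."""
    f_r = component_output(x, mu_r, phi_r_new)
    f_s = component_output(x, mu_s, phi_s)
    if f_r >= f_s:
        return phi_s  # Eq. (17) already holds, nothing to do

    d_s = float(x @ mu_s) - 1.0  # D_s = x^T mu_s - 1

    # Eq. (18): closed-form phi_s_new from the linearized equality
    numerator = phi_s * f_s * (phi_s ** 2 + 2.0 * d_s)
    denominator = eta * phi_s ** 2 * f_r + 2.0 * d_s * f_s
    phi_s_new = numerator / denominator

    # Keep the step small so the first-order Taylor expansion stays valid
    # (illustrative clipping; the paper uses the thresholds of Eqs. (6)-(7)).
    low, high = phi_s * (1.0 - rho), phi_s * (1.0 + rho)
    return float(np.clip(phi_s_new, low, high))
```

In a full ePNN implementation, a routine of this kind would be called once per training instance, after the most activated components r and s have been identified across the GMMs of all classes.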


Cite this article

Ciarelli, P.M., Oliveira, E. & Salles, E.O.T. Multi-label incremental learning applied to web page categorization. Neural Comput & Applic 24, 1403–1419 (2014). https://doi.org/10.1007/s00521-013-1345-7

