Abstract
As semi-supervised classification draws more attention, many practical semi-supervised learning methods have been proposed. However, one important issue has been ignored by the current literature: how to estimate the exact number of labelled samples needed, given many unlabelled samples. Such an estimation method is important because labelled examples are rare and expensive to obtain, and it is also crucial for exploring the relative value of labelled and unlabelled samples for a specific model. Assuming a latent Gaussian distribution over the domain, we describe a method to estimate the number of labels required in a dataset for a semi-supervised linear discriminant classifier (Transductive LDA) to reach a desired accuracy. Our technique extends naturally to two harder problems: learning from Gaussian distributions with different covariances (QDA) and learning with multiple classes (MDA). The method is evaluated on two datasets, a toy dataset and a real-world wine dataset. The results of this research can be applied in areas such as text mining, information retrieval, and bioinformatics.
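The quantity underlying the abstract, the Bayes risk of a two-class Gaussian problem with a shared covariance, and the question of how many labels a semi-supervised LDA needs before its error approaches that risk, can be illustrated with a small simulation. The sketch below is not the paper's estimation method; the model parameters (mu0, mu1, Sigma), the tolerance eps, and the simple EM routine semi_supervised_lda are all illustrative assumptions.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Ground-truth two-class Gaussian model with a shared covariance (assumed for illustration).
mu0, mu1 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])

def bayes_risk(mu0, mu1, Sigma):
    # Bayes error for equal-prior Gaussians with a common covariance: Phi(-Delta/2).
    delta = np.sqrt((mu1 - mu0) @ np.linalg.solve(Sigma, mu1 - mu0))
    return norm.cdf(-delta / 2.0)

def sample(n):
    y = rng.integers(0, 2, n)
    X = np.where(y[:, None] == 0,
                 rng.multivariate_normal(mu0, Sigma, n),
                 rng.multivariate_normal(mu1, Sigma, n))
    return X, y

def semi_supervised_lda(X_lab, y_lab, X_unlab, n_iter=30):
    # Toy semi-supervised LDA: EM over the unlabelled points with a pooled covariance.
    means = np.array([X_lab[y_lab == k].mean(axis=0) for k in (0, 1)])
    cov = np.cov(X_lab.T) + 1e-6 * np.eye(X_lab.shape[1])
    X_all = np.vstack([X_lab, X_unlab])
    resp_lab = np.eye(2)[y_lab]                      # labelled points keep hard labels
    for _ in range(n_iter):
        inv = np.linalg.inv(cov)
        # E-step: class responsibilities for the unlabelled points (equal priors).
        d = np.stack([np.einsum('ij,jk,ik->i', X_unlab - m, inv, X_unlab - m)
                      for m in means], axis=1)
        logits = -0.5 * d
        logits -= logits.max(axis=1, keepdims=True)
        resp_unlab = np.exp(logits)
        resp_unlab /= resp_unlab.sum(axis=1, keepdims=True)
        resp = np.vstack([resp_lab, resp_unlab])
        # M-step: re-estimate the class means and the shared covariance.
        means = (resp.T @ X_all) / resp.sum(axis=0)[:, None]
        diffs = X_all[:, None, :] - means[None, :, :]
        cov = np.einsum('nk,nki,nkj->ij', resp, diffs, diffs) / len(X_all)
    return means, cov

def error_rate(means, cov, X, y):
    # LDA rule with equal priors: assign to the class with smaller Mahalanobis distance.
    inv = np.linalg.inv(cov)
    d = np.stack([np.einsum('ij,jk,ik->i', X - m, inv, X - m) for m in means], axis=1)
    return np.mean(d.argmin(axis=1) != y)

risk = bayes_risk(mu0, mu1, Sigma)
X_unlab, _ = sample(500)
X_test, y_test = sample(5000)
eps = 0.02                                            # tolerated excess over the Bayes risk (assumed)
for n_lab in (4, 8, 16, 32, 64, 128):
    X_lab, y_lab = sample(n_lab)
    if len(set(y_lab)) < 2:
        continue                                      # need at least one label per class
    means, cov = semi_supervised_lda(X_lab, y_lab, X_unlab)
    err = error_rate(means, cov, X_test, y_test)
    print(f"n_lab={n_lab:4d}  error={err:.3f}  Bayes risk={risk:.3f}")
    if err <= risk + eps:
        print(f"-> roughly {n_lab} labels suffice for the desired accuracy")
        break

Under equal priors and a common covariance, the Bayes risk has the closed form Phi(-Delta/2), where Delta is the Mahalanobis distance between the class means; the loop simply reports the smallest labelled sample size whose test error falls within the chosen tolerance of that risk.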
Keywords
- Linear Discriminant Analysis
- Unlabelled Data
- Quadratic Discriminant Analysis
- Classification Error Rate
- Multiple Discriminant Analysis
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Liu, H., Yuan, X., Tang, Q., Kustra, R. (2004). An Efficient Method to Estimate Labelled Sample Size for Transductive LDA(QDA/MDA) Based on Bayes Risk. In: Boulicaut, JF., Esposito, F., Giannotti, F., Pedreschi, D. (eds) Machine Learning: ECML 2004. Lecture Notes in Computer Science, vol. 3201. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30115-8_27
DOI: https://doi.org/10.1007/978-3-540-30115-8_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23105-9
Online ISBN: 978-3-540-30115-8
eBook Packages: Springer Book Archive