Abstract
Learning from Positive and Unlabelled examples (LPU) has emerged as an important problem in data mining and information retrieval applications. Existing techniques are not well suited to real-world scenarios where the datasets are linearly inseparable: they either build linear classifiers, or their non-linear classifiers fail to achieve the desired performance. In this work, we extend maximum margin clustering ideas and present an iterative procedure for designing a non-linear classifier for LPU. In particular, we build a least squares support vector classifier, which is well suited to this problem because of the symmetry of its loss function. Further, we present techniques for appropriately initializing the labels of the unlabelled examples and for enforcing the desired ratio of positive to negative examples while obtaining these labels. Experiments on real-world datasets demonstrate that the non-linear classifier designed using the proposed approach gives significantly better generalization performance than existing relevant approaches for LPU.
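As a rough illustration of the kind of iterative scheme the abstract describes, the following Python sketch (mine, not the authors' implementation) initializes labels for the unlabelled set from kernel similarity to the known positives, fits a non-linear least-squares classifier, and then re-labels the unlabelled points while holding a fixed positive-to-negative ratio. KernelRidge on {-1, +1} targets is used here only as a stand-in for an LS-SVM, and the pos_ratio, gamma, and alpha parameters are assumptions for the sketch, not values from the paper.

import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics.pairwise import rbf_kernel

def lpu_iterative(X_pos, X_unl, pos_ratio, gamma=1.0, alpha=1.0, n_iter=20):
    # Stack the labelled positives and the unlabelled points.
    X = np.vstack([X_pos, X_unl])
    n_pos, n_unl = len(X_pos), len(X_unl)
    n_take = int(round(pos_ratio * n_unl))  # unlabelled points to treat as positive

    # Initialise unlabelled labels from mean RBF similarity to the known positives.
    sim = rbf_kernel(X_unl, X_pos, gamma=gamma).mean(axis=1)
    y_unl = np.full(n_unl, -1.0)
    y_unl[np.argsort(sim)[::-1][:n_take]] = 1.0

    model = None
    for _ in range(n_iter):
        # Fit a kernel least-squares classifier on {-1, +1} targets
        # (stand-in for an LS-SVM).
        y = np.concatenate([np.ones(n_pos), y_unl])
        model = KernelRidge(alpha=alpha, kernel="rbf", gamma=gamma).fit(X, y)

        # Re-label the unlabelled points, enforcing the positive:negative ratio
        # by assigning +1 only to the top-scoring n_take points.
        scores = model.predict(X_unl)
        new_y = np.full(n_unl, -1.0)
        new_y[np.argsort(scores)[::-1][:n_take]] = 1.0
        if np.array_equal(new_y, y_unl):  # labels have stabilised; stop early
            break
        y_unl = new_y
    return model, y_unl

Under this sketch, a new point x would be classified by the sign of model.predict(x); the ratio constraint in the re-labelling step is what prevents the trivial solution of declaring all unlabelled points negative.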
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chaudhari, S., Shevade, S. (2012). Learning from Positive and Unlabelled Examples Using Maximum Margin Clustering. In: Huang, T., Zeng, Z., Li, C., Leung, C.S. (eds) Neural Information Processing. ICONIP 2012. Lecture Notes in Computer Science, vol 7665. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34487-9_56
DOI: https://doi.org/10.1007/978-3-642-34487-9_56
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34486-2
Online ISBN: 978-3-642-34487-9
eBook Packages: Computer Science (R0)