Abstract
For automatic object detection tasks, large numbers of training images are usually labeled to make the training of object classifiers more reliable; this is costly because it requires hiring professionals to label large-scale image collections. As the number of object classes grows, obtaining enough labeled training images becomes even more critical. There are three potential solutions to reduce the burden of image labeling: (1) allowing people to provide object labels loosely at the image level rather than at the object level (e.g., loosely-tagged images in which the exact object locations are not identified); (2) harnessing the large-scale collaboratively-tagged images that are available on the Internet; and (3) developing new machine learning algorithms that can directly leverage large-scale collaboratively- or loosely-tagged images to train a large number of object classifiers more effectively. Based on these observations, this paper develops a multi-task multi-label multiple instance learning (MTML-MIL) algorithm that leverages both inter-object correlations and large-scale loosely-labeled images for object classifier training. By seamlessly integrating multi-task learning, multi-label learning, and multiple instance learning, the MTML-MIL algorithm can train a large number of inter-related object classifiers more accurately; an object network is constructed to determine the inter-related learning tasks directly in the feature space rather than in the label space. Our experimental results show that the MTML-MIL algorithm achieves higher detection accuracy for automatic object detection.
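To make the notion of loosely-tagged images concrete, the following minimal sketch illustrates the standard multiple instance learning assumption only: an image is treated as a bag of region feature vectors, and a tag given at the image level is taken to hold if at least one region supports it. This is not the MTML-MIL algorithm itself; it models neither the object network nor the multi-task coupling between classifiers. The linear per-class scorers, the feature layout, and all function names are assumptions introduced for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical data layout: an image is a "bag" of region feature vectors;
# tags are known only for the whole image, not for individual regions.
Bag = List[Sequence[float]]
# Hypothetical per-class linear classifier: (weights, bias).
Classifier = Tuple[Sequence[float], float]


def instance_score(weights: Sequence[float], bias: float,
                   region: Sequence[float]) -> float:
    """Linear score of a single region for a single object class."""
    return sum(w * x for w, x in zip(weights, region)) + bias


def bag_score(weights: Sequence[float], bias: float, bag: Bag) -> float:
    """Standard MIL assumption: a bag scores as high as its best region."""
    return max(instance_score(weights, bias, region) for region in bag)


def predict_labels(classifiers: Dict[str, Classifier], bag: Bag) -> Dict[str, bool]:
    """Multi-label output: one image-level decision per object class."""
    return {name: bag_score(w, b, bag) > 0.0
            for name, (w, b) in classifiers.items()}


# Toy usage with made-up two-dimensional region features.
classifiers = {"sky": ([1.0, 0.0], -0.5), "grass": ([0.0, 1.0], -0.5)}
image_regions = [[0.9, 0.1], [0.2, 0.3]]
labels = predict_labels(classifiers, image_regions)
```

In this toy setting each class is scored independently; the approach summarized in the abstract additionally exploits inter-object correlations, which this sketch deliberately leaves out.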
Cite this article
Shen, Y., Fan, Jp. Multi-task multi-label multiple instance learning. J. Zhejiang Univ. - Sci. C 11, 860–871 (2010). https://doi.org/10.1631/jzus.C1001005
DOI: https://doi.org/10.1631/jzus.C1001005
Key words
- Object network
- Loosely tagged images
- Multi-task learning
- Multi-label learning
- Multiple instance learning