DOI: 10.1145/1631058.1631067
research-article

Large-scale multimedia semantic concept modeling using robust subspace bagging and MapReduce

Published: 23 October 2009

ABSTRACT

With the rapid growth of multimedia data, it is increasingly important to develop semantic concept modeling approaches that are consistently effective, highly efficient, and easily scalable. To this end, we first propose the robust subspace bagging (RB-SBag) algorithm, which augments random subspace bagging with forward model selection. Compared with traditional modeling approaches, RB-SBag offers a considerably faster learning process while minimizing the risk of overfitting. Its ensemble structure also allows it to be transformed easily into a simple parallel framework, MapReduce. To further improve scalability, we also develop a task scheduling algorithm that optimizes task placement for heterogeneous tasks. On a collection of more than 250,000 images and several standard TRECVID benchmark datasets, RB-SBag achieved more than a 10-fold speedup over baseline SVMs with comparable or even better classification performance. We also deployed the MapReduce implementation on a 16-node Hadoop cluster, where the proposed task scheduler showed significantly better scalability than the baseline scheduler in the presence of task heterogeneity.
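The abstract describes RB-SBag only at a high level. The following is a minimal sketch of the general idea (random subspace bagging followed by greedy forward model selection on a validation set) under assumptions that are ours, not the paper's: a hypothetical nearest-centroid base learner standing in for SVMs, bootstrap samples the size of the training set, half of the features per subspace, and validation accuracy as the selection criterion. The names `map_train_bag` and `reduce_forward_select` are hypothetical and only indicate how the independent bags could map onto MapReduce's map and reduce phases; the code itself runs sequentially.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


class CentroidModel:
    """Hypothetical base learner: a nearest-centroid scorer on one feature subspace.
    It stands in for a stronger learner (e.g. an SVM) to keep the sketch dependency-free."""

    def __init__(self, features: List[int], pos: List[float], neg: List[float]):
        self.features, self.pos, self.neg = features, pos, neg

    def score(self, x: Sequence[float]) -> float:
        # Positive when x lies closer to the positive-class centroid in this subspace.
        d_pos = sum((x[f] - c) ** 2 for f, c in zip(self.features, self.pos))
        d_neg = sum((x[f] - c) ** 2 for f, c in zip(self.features, self.neg))
        return d_neg - d_pos


def _centroid(rows: Matrix, features: List[int]) -> List[float]:
    return [sum(r[f] for r in rows) / len(rows) for f in features]


def map_train_bag(X: Matrix, y: List[int], n_samples: int, n_features: int,
                  seed: int) -> CentroidModel:
    """'Map' step (assumed): train one base model on a bootstrap sample restricted
    to a random feature subspace. Assumes both classes occur in the bag."""
    rng = random.Random(seed)
    rows = [rng.randrange(len(X)) for _ in range(n_samples)]  # bootstrap sample of examples
    feats = sorted(rng.sample(range(len(X[0])), n_features))   # random feature subspace
    pos = [X[i] for i in rows if y[i] == 1]
    neg = [X[i] for i in rows if y[i] == 0]
    return CentroidModel(feats, _centroid(pos, feats), _centroid(neg, feats))


def accuracy(models: List[CentroidModel], X: Matrix, y: List[int]) -> float:
    """Accuracy of the ensemble that averages the models' scores and thresholds at 0."""
    hits = 0
    for x, t in zip(X, y):
        s = sum(m.score(x) for m in models) / len(models)
        hits += int((s > 0) == (t == 1))
    return hits / len(X)


def reduce_forward_select(models: List[CentroidModel], X_val: Matrix, y_val: List[int],
                          max_size: int = 10) -> Tuple[List[CentroidModel], float]:
    """'Reduce' step (assumed): greedy forward selection, with replacement,
    on a held-out validation set."""
    selected: List[CentroidModel] = []
    best = -1.0  # guarantees that at least one model is selected
    while len(selected) < max_size:
        cand_acc, cand = max(((accuracy(selected + [m], X_val, y_val), m) for m in models),
                             key=lambda p: p[0])
        if cand_acc <= best:  # stop as soon as no candidate improves validation accuracy
            break
        selected.append(cand)
        best = cand_acc
    return selected, best


def rb_sbag(X: Matrix, y: List[int], X_val: Matrix, y_val: List[int], n_bags: int = 20,
            n_features: int = 0, seed: int = 0, max_size: int = 10):
    n_features = n_features or max(1, len(X[0]) // 2)
    # Bags are independent, so each call below could run as a separate map task.
    models = [map_train_bag(X, y, len(X), n_features, seed + b) for b in range(n_bags)]
    return reduce_forward_select(models, X_val, y_val, max_size)
```

Because each bag depends only on its own seed, the training step parallelizes without coordination, and the selection step only needs the trained models' scores on the validation set. Selecting with replacement lets a strong model appear several times, which effectively gives it more weight in the averaged score.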
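The abstract does not describe how the proposed scheduler places heterogeneous tasks. To illustrate why placement matters when task costs differ, the sketch below compares a classical heuristic, Longest Processing Time first (LPT), against placing tasks in submission order on whichever node frees up first, roughly what a scheduler that simply hands out the next queued task does. LPT is a swapped-in textbook technique, not necessarily the paper's algorithm; it assumes the cost of every task (for example, training one concept model) can be estimated in advance, and all function names and costs are hypothetical.

```python
import heapq
from typing import Dict, Iterable, List, Sequence, Tuple


def list_schedule(costs: Sequence[float], n_nodes: int,
                  order: Iterable[int]) -> Tuple[float, Dict[int, List[int]]]:
    """Greedy list scheduling: give each task, in the given order, to the least-loaded node."""
    heap = [(0.0, node) for node in range(n_nodes)]  # (current load, node id)
    heapq.heapify(heap)
    placement: Dict[int, List[int]] = {node: [] for node in range(n_nodes)}
    for task in order:
        load, node = heapq.heappop(heap)
        placement[node].append(task)
        heapq.heappush(heap, (load + costs[task], node))
    makespan = max(load for load, _ in heap)  # finishing time of the busiest node
    return makespan, placement


def arrival_order_schedule(costs: Sequence[float], n_nodes: int):
    """Baseline: tasks are placed in the order they were submitted."""
    return list_schedule(costs, n_nodes, range(len(costs)))


def lpt_schedule(costs: Sequence[float], n_nodes: int):
    """LPT: place the most expensive tasks first (ties keep submission order)."""
    order = sorted(range(len(costs)), key=lambda t: costs[t], reverse=True)
    return list_schedule(costs, n_nodes, order)


costs = [1, 1, 1, 1, 1, 1, 6]  # hypothetical costs: six short tasks and one long one
print(arrival_order_schedule(costs, 2)[0], lpt_schedule(costs, 2)[0])  # → 9.0 6.0
```

In submission order the long task starts last, so one node works for 6 extra time units while the other sits idle; LPT starts it first and lets the short tasks fill the other node. On m identical nodes, LPT's makespan is known to be at most 4/3 - 1/(3m) times the optimum (Graham's bound).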

Published in

LS-MMRM '09: Proceedings of the First ACM workshop on Large-scale multimedia retrieval and mining
October 2009
144 pages
ISBN: 9781605587561
DOI: 10.1145/1631058

Copyright © 2009 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher: Association for Computing Machinery, New York, NY, United States
