DOI: 10.1145/2733373.2807418
tutorial

Learning Knowledge Bases for Multimedia in 2015

Published: 13 October 2015

ABSTRACT

Knowledge acquisition, representation, and reasoning have long been among the central challenges in artificial intelligence and related application areas. Only in the past few years have massive amounts of structured and semi-structured data that directly or indirectly encode human knowledge become widely available, turning knowledge representation problems into a computational grand challenge with feasible solutions in sight. Research and development on knowledge bases is becoming a lively fusion area among web information extraction, machine learning, databases, and information retrieval, with knowledge over images and multimedia emerging as a new frontier of representation and acquisition. This tutorial aims to present a gentle overview of knowledge bases on text and multimedia, including representation, acquisition, and inference. In particular, the 2015 edition of the tutorial will include recent progress from several active research communities: web, natural language processing, and computer vision and multimedia.

    • Published in

      MM '15: Proceedings of the 23rd ACM international conference on Multimedia
      October 2015
      1402 pages
      ISBN:9781450334594
      DOI:10.1145/2733373

      Copyright © 2015 Owner/Author

      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      MM '15 Paper Acceptance Rate: 56 of 252 submissions, 22%. Overall Acceptance Rate: 995 of 4,171 submissions, 24%.
