MMDBench: A Benchmark for Hybrid Query in Multimodal Database

Mao, Along; Hu, Chuan; Li, Chong; Wang, Huajin; Rao, Junjian; Wang, Kainan; Shen, Zhihong

doi:10.1007/978-981-97-0316-6_6

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14521)

Included in the following conference series:

International Symposium on Benchmarking, Measuring and Optimization

Abstract

Multimodal data, which integrates data types such as images, text, audio, and video, has become prevalent in the era of big data. However, benchmarks designed specifically for multimodal data are lacking: existing benchmarks focus primarily on traditional and multi-model databases and offer no comprehensive framework for evaluating systems that handle multimodal data. In this paper, we present MMDBench, a novel benchmark designed to evaluate the performance of multimodal databases that accommodate various data modalities, including structured data, images, and text. The MMDBench workload comprises eleven tasks inspired by real-world social network scenarios that involve multiple data modalities. Each task simulates a specific scenario that requires integrating at least two distinct data modalities. To demonstrate the effectiveness of MMDBench, we developed a hybrid database system to execute the workload and uncovered diverse characteristics of multimodal databases in the execution of hybrid queries.

Supported by the National Key R&D Program of China (Grant No. 2022YFF0711600), the National Key R&D Program of China (Grant No. 2021YFF0704200), and the Informatization Plan of Chinese Academy of Sciences (Grant No. CAS-WX2022GC-02).

Author information

Authors and Affiliations

Computer Network Information Center, Chinese Academy of Sciences, Beijing, China
Along Mao, Chuan Hu, Chong Li, Huajin Wang, Junjian Rao, Kainan Wang & Zhihong Shen
University of Chinese Academy of Sciences, Beijing, China
Along Mao, Chuan Hu, Junjian Rao & Kainan Wang

Corresponding author

Correspondence to Zhihong Shen.

Editor information

Editors and Affiliations

TU Wien, Vienna, Austria
Sascha Hunold
Chinese Academy of Sciences, Beijing, China
Biwei Xie
Illinois Institute of Technology, Chicago, IL, USA
Kai Shu

About this paper

Cite this paper

Mao, A. et al. (2024). MMDBench: A Benchmark for Hybrid Query in Multimodal Database. In: Hunold, S., Xie, B., Shu, K. (eds) Benchmarking, Measuring, and Optimizing. Bench 2023. Lecture Notes in Computer Science, vol 14521. Springer, Singapore. https://doi.org/10.1007/978-981-97-0316-6_6

DOI: https://doi.org/10.1007/978-981-97-0316-6_6
Published: 14 February 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-0315-9
Online ISBN: 978-981-97-0316-6
eBook Packages: Computer Science, Computer Science (R0)