skip to main content
10.1145/3543873.3587305acmconferencesArticle/Chapter ViewAbstractPublication PagesthewebconfConference Proceedingsconference-collections
demonstration

DataExpo: A One-Stop Dataset Service for Open Science Research

Published: 30 April 2023 Publication History

Abstract

The large volumes of data on the Internet provides new opportunities for scientific discovery, especially promoting data-driven open science research. However, due to lack of accurate semantic markups, finding relevant data is still difficult. To address this problem, we develop a one-stop dataset service called DataExpo and propose a deep learning method for automatic metadata ingestion. In this demo paper, we describe the system architecture, and how DataExpo facilitates dataset discovery, search and recommendation. Up till now, DataExpo has indexed over 960,000 datasets from more than 27,000 repositories in the context of Deep-time Digital Earth Program. Demo visitors can explore our service via https://dataexpo.acemap.info.

References

[1]
Tarfah Alrashed, Dimitris Paparas, Omar Benjelloun, Ying Sheng, and Natasha F. Noy. 2021. Dataset or Not? A Study on the Veracity of Semantic Markup for Dataset Pages. In The Semantic Web - ISWC 2021 - 20th International Semantic Web Conference, ISWC 2021, Virtual Event, October 24-28, 2021, Proceedings, Vol. 12922. Springer, 338–356.
[2]
Omar Benjelloun, Shiyu Chen, and Natasha F. Noy. 2020. Google Dataset Search by the Numbers. In The Semantic Web - ISWC 2020 - 19th International Semantic Web Conference, Athens, Greece, November 2-6, 2020, Proceedings, Part II, Vol. 12507. Springer, 667–682.
[3]
Dan Brickley, Matthew Burgess, and Natasha F. Noy. 2019. Google Dataset Search: Building a search engine for datasets in an open Web ecosystem. In The World Wide Web Conference, WWW 2019, San Francisco, CA, USA, May 13-17, 2019. ACM, 1365–1375.
[4]
Eric Bunch, Qian You, and Glenn Fung. 2020. Human-In-The-Loop Topic Discovery with Embedded Text Representations. In 1st Workshop on Data Science with Human in the Loop, DaSH@KDD 2020, San Diego, California, USA, August 24, 2020.
[5]
Sonia Castelo, Rémi Rampin, Aécio S. R. Santos, Aline Bessa, Fernando Chirigati, and Juliana Freire. 2021. Auctus: A Dataset Search Engine for Data Discovery and Augmentation. Proc. VLDB Endow. 14, 12 (2021), 2791–2794.
[6]
Xiaoling Chen, Anupama E. Gururaj, Ibrahim Burak Özyurt, Ruiling Liu, Ergin Soysal, Trevor Cohen, Firat Tiryaki, Yueling Li, Nansu Zong, 2018. DataMed - an open source discovery index for finding biomedical datasets. J. Am. Medical Informatics Assoc. 25, 3 (2018), 300–308.
[7]
Ramanathan V. Guha, Dan Brickley, and Steve Macbeth. 2016. Schema.org: evolution of structured data on the web. Commun. ACM 59, 2 (2016), 44–51.
[8]
Yoon Kim. 2014. Convolutional Neural Networks for Sentence Classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 25-29, 2014, Doha, Qatar, A meeting of SIGDAT, a Special Interest Group of the ACL. ACL, 1746–1751.
[9]
Yaojie Lu, Qing Liu, Dai Dai, Xinyan Xiao, Hongyu Lin, Xianpei Han, Le Sun, and Hua Wu. 2022. Unified Structure Generation for Universal Information Extraction. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2022, Dublin, Ireland, May 22-27, 2022. 5755–5772.
[10]
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Journal of Machine Learning Research 21 (2020), 140:1–140:67.
[11]
Stephen E. Robertson and Hugo Zaragoza. 2009. The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3, 4 (2009), 333–389.
[12]
Chengshan Wang, Robert M Hazen, Qiuming Cheng, Michael H Stephenson, Chenghu Zhou, Peter Fox, Shu-zhong Shen, Roland Oberhänsli, Zengqian Hou, 2021. The Deep-Time Digital Earth program: data-driven discovery in geosciences. National Science Review 8, 9 (02 2021).
[13]
Chunting Zhou, Chonglin Sun, Zhiyuan Liu, and Francis C. M. Lau. 2015. A C-LSTM Neural Network for Text Classification. CoRR abs/1511.08630 (2015).

Cited By

View all
  • (2024)OXYGENERATORProceedings of the 41st International Conference on Machine Learning10.5555/3692070.3693419(33219-33242)Online publication date: 21-Jul-2024

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
WWW '23 Companion: Companion Proceedings of the ACM Web Conference 2023
April 2023
1567 pages
ISBN:9781450394192
DOI:10.1145/3543873
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 April 2023

Check for updates

Author Tags

  1. dataset discovery
  2. meta search
  3. web mining

Qualifiers

  • Demonstration
  • Research
  • Refereed limited

Conference

WWW '23
Sponsor:
WWW '23: The ACM Web Conference 2023
April 30 - May 4, 2023
TX, Austin, USA

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)106
  • Downloads (Last 6 weeks)6
Reflects downloads up to 17 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2024)OXYGENERATORProceedings of the 41st International Conference on Machine Learning10.5555/3692070.3693419(33219-33242)Online publication date: 21-Jul-2024

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

HTML Format

View this article in HTML Format.

HTML Format

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media