DOI: 10.1145/2797115.2797124
Research article

A Web-scale Study of the Adoption and Evolution of the schema.org Vocabulary over Time

Published: 13 July 2015

Abstract

Promoted by major search engines, schema.org has become a widely adopted standard for marking up structured data in HTML web pages. In this paper, we use a series of large-scale Web crawls to analyze the evolution and adoption of schema.org over time. The availability of data from different points in time, for both the schema and the websites deploying data, allows for a new kind of empirical analysis of standards adoption that has not been possible before. To conduct our analysis, we compare different versions of the schema.org vocabulary to the data that was deployed on hundreds of thousands of Web pages at different points in time. We measure both top-down adoption (i.e., the extent to which changes in the schema are adopted by data providers) and bottom-up evolution (i.e., the extent to which the actually deployed data drives changes in the schema). Our empirical analysis shows that both processes can be observed.
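The two measures named in the abstract can be illustrated with plain set arithmetic. The following is only a minimal sketch under stated assumptions, not the procedure used in the paper: each schema version and each crawl is reduced to a set of term names, the function names `top_down_adoption` and `bottom_up_evolution` are hypothetical, and the terms `TermA` to `TermE` are toy placeholders rather than data from the study.

```python
from typing import Set


def top_down_adoption(schema_old: Set[str], schema_new: Set[str],
                      deployed_before: Set[str], deployed_after: Set[str]) -> float:
    """Share of newly introduced schema terms that data providers picked up.

    Assumption for this sketch: a term counts as adopted top-down if it was
    introduced in schema_new, was not deployed before the release, and is
    deployed afterwards.
    """
    introduced = schema_new - schema_old
    candidates = introduced - deployed_before
    if not candidates:
        return 0.0
    return len(candidates & deployed_after) / len(candidates)


def bottom_up_evolution(schema_old: Set[str], schema_new: Set[str],
                        deployed_before: Set[str]) -> Set[str]:
    """Terms that were deployed before being defined and later entered the schema."""
    introduced = schema_new - schema_old
    undefined_but_used = deployed_before - schema_old
    return undefined_but_used & introduced


# Toy placeholder data (hypothetical, not taken from the paper).
schema_v1 = {"TermA", "TermB"}
schema_v2 = {"TermA", "TermB", "TermC", "TermD", "TermE"}
crawl_before = {"TermA", "TermC"}           # crawl taken before schema_v2
crawl_after = {"TermA", "TermC", "TermD"}   # crawl taken after schema_v2

print(top_down_adoption(schema_v1, schema_v2, crawl_before, crawl_after))  # 0.5
print(bottom_up_evolution(schema_v1, schema_v2, crawl_before))             # {'TermC'}
```

In this toy setting, `TermD` is picked up only after its introduction (top-down), while `TermC` was already in use before it became part of the schema (bottom-up). A real analysis would also have to weigh how many sites use each term, which this sketch ignores.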

Published In

WIMS '15: Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics
July 2015
176 pages
ISBN: 9781450332934
DOI: 10.1145/2797115
In-Cooperation

• WNRI: Western Norway Research Institute
• University of Cyprus

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. Adoption
2. Data Space Profiling
3. Microdata
4. Standardization
5. schema.org

Qualifiers

• Research-article
• Research
• Refereed limited

Conference

WIMS '15

Acceptance Rates

Overall Acceptance Rate: 140 of 278 submissions, 50%
