Any Suggestions? Active Schema Support for Structuring Web Information

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2014)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8422)

Abstract

Backed by major Web players, schema.org is the latest broad initiative for structuring Web information. Unfortunately, a representative analysis of a corpus of 733 million Web documents shows that, a year after its introduction, only 1.56% of documents featured any schema.org annotations. A probable reason is that providing annotations is quite tiresome, which hinders widespread adoption. Even state-of-the-art tools like Google’s Structured Data Markup Helper offer only limited support here. In this paper we propose SASS, a system for automatically finding high-quality schema suggestions for page content, to ease the annotation process. SASS intelligently blends supervised machine learning techniques with simple user feedback. Moreover, additional support features for binding attributes to values reduce the necessary effort even further. We show that SASS is superior to current tools for schema.org annotations.


Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Homoceanu, S., Geilert, F., Pek, C., Balke, WT. (2014). Any Suggestions? Active Schema Support for Structuring Web Information. In: Bhowmick, S.S., Dyreson, C.E., Jensen, C.S., Lee, M.L., Muliantara, A., Thalheim, B. (eds) Database Systems for Advanced Applications. DASFAA 2014. Lecture Notes in Computer Science, vol 8422. Springer, Cham. https://doi.org/10.1007/978-3-319-05813-9_17

  • DOI: https://doi.org/10.1007/978-3-319-05813-9_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-05812-2

  • Online ISBN: 978-3-319-05813-9

  • eBook Packages: Computer Science, Computer Science (R0)
