A Schema-Less Data Model for the Web

  • Conference paper
Conceptual Modeling (ER 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9381)

Abstract

To extract and represent domain-independent, web-scale data, we introduce a schema-less and self-describing data model called the Object-oriented Web Model (OWM), which is rich in semantics and flexible in structure. It represents web pages as objects with hierarchical structures, and the links in a web page as relationships to other objects, so that the objects form a network. With web segmentation techniques, data from data-intensive web pages can be extracted, represented, and integrated as OWM objects.
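
The idea of pages as hierarchical objects connected by links can be illustrated with a minimal sketch. It is not the paper's formal OWM definition: the names `Node`, `WebObject`, and `linked_objects`, and the example URLs, are hypothetical and chosen only for illustration. Each object carries its own labels, so no external schema is assumed.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical names throughout: an illustration of "objects with a
# hierarchical structure plus links", not the OWM definition itself.


@dataclass
class Node:
    """One labelled node in an object's hierarchy (for example a page segment)."""
    label: str
    value: str = ""
    children: List["Node"] = field(default_factory=list)


@dataclass
class WebObject:
    """A web page viewed as a self-describing object identified by its URL."""
    url: str
    root: Node
    # Relationship name -> URLs of the target objects (the page's links).
    links: Dict[str, List[str]] = field(default_factory=dict)


def linked_objects(obj: WebObject, store: Dict[str, WebObject]) -> List[WebObject]:
    """Follow every link of `obj` to the objects already present in `store`."""
    return [store[u] for targets in obj.links.values() for u in targets if u in store]


# Two example objects; the URLs are placeholders.
paper = WebObject(
    url="https://example.org/paper/1",
    root=Node("paper", children=[Node("title", "A Schema-Less Data Model for the Web")]),
    links={"author": ["https://example.org/person/7"]},
)
author = WebObject(
    url="https://example.org/person/7",
    root=Node("person", children=[Node("name", "Liu Chen")]),
)
store = {o.url: o for o in (paper, author)}
```

Calling `linked_objects(paper, store)` follows the `author` relationship to the second object. This is the sense in which linked objects form a network rather than isolated records.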

This work is supported by National Natural Science Funds of China under grant No. 61202100.

Author information

Corresponding author

Correspondence to Liu Chen.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Chen, L., Liu, M., Yu, T. (2015). A Schema-Less Data Model for the Web. In: Johannesson, P., Lee, M., Liddle, S., Opdahl, A., Pastor López, Ó. (eds) Conceptual Modeling. ER 2015. Lecture Notes in Computer Science, vol 9381. Springer, Cham. https://doi.org/10.1007/978-3-319-25264-3_44

  • DOI: https://doi.org/10.1007/978-3-319-25264-3_44

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25263-6

  • Online ISBN: 978-3-319-25264-3

  • eBook Packages: Computer Science, Computer Science (R0)
