A Schema-Less Data Model for the Web

Chen, Liu; Liu, Mengchi; Yu, Ting

doi:10.1007/978-3-319-25264-3_44

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9381)

Included in the following conference series:

International Conference on Conceptual Modeling

Abstract

To extract and represent domain-independent web-scale data, we introduce a schema-less, self-describing data model called the Object-oriented Web Model (OWM), which is rich in semantics and flexible in structure. It represents web pages as objects with hierarchical structures and the links in a web page as relationships to other objects, so that the objects form a network. Using web segmentation techniques, data from data-intensive web pages can be extracted, represented, and integrated as OWM objects.
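As a rough illustration of the idea in the abstract, the sketch below models a page as an object with nested, self-describing attributes and models links as labeled relationships to other objects. It is not the authors' formal definition of OWM: the class name WebObject, its fields, and the methods relate and lookup are all hypothetical names chosen for this example, and the sample values are taken from this page's own bibliographic data.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(eq=False)  # compare by identity; objects may reference each other
class WebObject:
    """Hypothetical stand-in for an OWM object representing one web page."""
    name: str
    # Nested dicts/lists model the page's hierarchical, self-describing content.
    attributes: Dict[str, Any] = field(default_factory=dict)
    # Labeled links to other objects; together they form a network.
    relationships: Dict[str, List["WebObject"]] = field(default_factory=dict)

    def relate(self, label: str, target: "WebObject") -> None:
        """Record a link on this page as a labeled relationship."""
        self.relationships.setdefault(label, []).append(target)

    def lookup(self, *path: str) -> Any:
        """Walk the attribute hierarchy along a sequence of keys."""
        node: Any = self.attributes
        for key in path:
            node = node[key]
        return node


# No schema is declared up front: each object carries its own structure.
paper = WebObject(
    "paper",
    {
        "title": "A Schema-Less Data Model for the Web",
        "venue": {"name": "ER 2015", "volume": "9381"},
    },
)
author = WebObject("author", {"name": "Liu Chen"})

# Links in both directions turn the two page objects into a small network.
paper.relate("author", author)
author.relate("publication", paper)
```

Because the two objects have different attribute shapes and no shared schema, a consumer has to discover structure by inspecting each object, for example with paper.lookup("venue", "name").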

This work is supported by National Natural Science Funds of China under grant No. 61202100.

Author information

Authors and Affiliations

State Key Lab of Software Engineering, School of Computer, Wuhan University, Wuhan, China
Liu Chen, Mengchi Liu & Ting Yu

Corresponding author

Correspondence to Liu Chen.

Editor information

Editors and Affiliations

Stockholm University, Kista, Sweden
Paul Johannesson
National University of Singapore, Singapore, Singapore
Mong Li Lee
Brigham Young University, Provo, Utah, USA
Stephen W. Liddle
University of Bergen, Bergen, Norway
Andreas L. Opdahl
Universidad Politécnica de Valencia, Valencia, Spain
Óscar Pastor López

About this paper

Cite this paper

Chen, L., Liu, M., Yu, T. (2015). A Schema-Less Data Model for the Web. In: Johannesson, P., Lee, M., Liddle, S., Opdahl, A., Pastor López, Ó. (eds) Conceptual Modeling. ER 2015. Lecture Notes in Computer Science, vol 9381. Springer, Cham. https://doi.org/10.1007/978-3-319-25264-3_44

DOI: https://doi.org/10.1007/978-3-319-25264-3_44
Published: 08 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25263-6
Online ISBN: 978-3-319-25264-3
eBook Packages: Computer Science, Computer Science (R0)
