RecipeCrawler: Collecting Recipe Data from WWW Incrementally

Li, Yu; Meng, Xiaofeng; Wang, Liping; Li, Qing

doi:10.1007/11775300_23

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4016)

Included in the following conference series:

International Conference on Web-Age Information Management

Abstract

The WWW has become the largest data repository ever available in human history. Using the Internet as a data source seems natural, and many efforts have been made to do so. In this paper we focus on establishing a robust system that collects structured recipe data from the Web incrementally, which we believe is a critical step towards practical, continuous, and reliable web data extraction systems, and therefore towards using the WWW as a data source for various database applications. The reasons for advocating such an incremental approach are two-fold: (1) it is impractical to crawl all the recipe pages from relevant web sites, as the Web is highly dynamic; (2) it is almost impossible to induce a general wrapper for future extraction from the initial batch of recipe web pages. We describe such a system, called RecipeCrawler, which targets the incremental collection of recipe data from the WWW. We consider general issues in establishing an incremental data extraction system and apply techniques to recipe data collection from the Web. RecipeCrawler serves as the backend of a fully fledged multimedia recipe database system being developed jointly by City University of Hong Kong and Renmin University of China.

Author information

Authors and Affiliations

School of Information, Renmin Univ. of China, China
Yu Li & Xiaofeng Meng
Computer Science Dept., City Univ. of Hong Kong, HKSAR, China
Liping Wang & Qing Li

Editor information

Editors and Affiliations

Chinese University of Hong Kong, Hong Kong, China
Jeffrey Xu Yu
Institute of Industrial Science, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, 153-8505, Tokyo, Japan
Masaru Kitsuregawa
Department of Computing, Hong Kong Polytechnic University, Hong Kong
Hong Va Leong

About this paper

Cite this paper

Li, Y., Meng, X., Wang, L., Li, Q. (2006). RecipeCrawler: Collecting Recipe Data from WWW Incrementally. In: Yu, J.X., Kitsuregawa, M., Leong, H.V. (eds) Advances in Web-Age Information Management. WAIM 2006. Lecture Notes in Computer Science, vol 4016. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11775300_23

DOI: https://doi.org/10.1007/11775300_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35225-9
Online ISBN: 978-3-540-35226-6
eBook Packages: Computer Science, Computer Science (R0)