Abstract
Many Web information extraction algorithms are based on machine learning techniques. For such algorithms, we propose a lazy learning strategy that builds a specialized model for each test instance, in order to improve extraction accuracy and avoid the disadvantages of constructing a single general model.
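The abstract does not describe the authors' concrete algorithm, so the following is only a minimal sketch of the general lazy learning idea: training is deferred until a test instance arrives, and a small model is then built just for that instance. It swaps in a plain similarity-weighted nearest-neighbor vote as the per-instance model, which is an assumption and not the method of the paper. The function name `lazy_classify`, the parameter `k`, and the feature and label names are all hypothetical.

```python
from collections import Counter


def lazy_classify(train, test_features, k=3):
    """Classify one test instance with a model built only for it.

    train: list of (feature_set, label) pairs (hypothetical format).
    test_features: set of features describing the test instance.
    Returns the predicted label, or None if there is no training data.
    """
    # Score every training instance by Jaccard similarity to the test instance.
    scored = []
    for features, label in train:
        union = features | test_features
        sim = len(features & test_features) / len(union) if union else 0.0
        scored.append((sim, label))

    # Keep only the k most similar instances: the instance-specific training set.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    neighbours = scored[:k]

    # The "specialized model" here is simply a similarity-weighted vote.
    votes = Counter()
    for sim, label in neighbours:
        votes[label] += sim
    return votes.most_common(1)[0][0] if votes else None


# Illustrative, made-up training data: features of text fragments on a Web page.
train = [
    ({"$", "digits", "bold"}, "price"),
    ({"$", "digits"}, "price"),
    ({"h1", "capitalized"}, "title"),
    ({"h1", "bold", "capitalized"}, "title"),
]
prediction = lazy_classify(train, {"$", "digits", "italic"})
```

No global model is trained in this sketch. Each call ranks the training data against the single instance being classified, which is the property that distinguishes a lazy strategy from an eager one that fits one general model in advance.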
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ozcan, R., Altingovde, I.S., Ulusoy, Ö. (2012). In Praise of Laziness: A Lazy Strategy for Web Information Extraction. In: Baeza-Yates, R., et al. Advances in Information Retrieval. ECIR 2012. Lecture Notes in Computer Science, vol 7224. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28997-2_65
DOI: https://doi.org/10.1007/978-3-642-28997-2_65
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28996-5
Online ISBN: 978-3-642-28997-2
eBook Packages: Computer Science, Computer Science (R0)