Abstract
In this tutorial we provide insight into Web Mining, i.e., discovering knowledge from the World Wide Web, with particular reference to the latest developments in Web technology. The topics covered are: the Deep Web, also known as the Hidden Web or Invisible Web; the Semantic Web, including standards such as RDFS and OWL; the eXtensible Markup Language (XML), a widespread communication medium for the Web; and domain-specific markup languages defined within the context of XML. We explain how each of these developments supports knowledge discovery from data stored over the Web, thereby assisting several real-world applications.
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Varde, A., Suchanek, F., Nayak, R., Senellart, P. (2009). Knowledge Discovery over the Deep Web, Semantic Web and XML. In: Zhou, X., Yokota, H., Deng, K., Liu, Q. (eds) Database Systems for Advanced Applications. DASFAA 2009. Lecture Notes in Computer Science, vol 5463. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00887-0_73
DOI: https://doi.org/10.1007/978-3-642-00887-0_73
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00886-3
Online ISBN: 978-3-642-00887-0
eBook Packages: Computer Science, Computer Science (R0)