DOI: 10.1145/1498759.1498816
research-article

A model for fast web mining prototyping

Published: 09 February 2009

Abstract

Web mining is a computation-intensive task, even after the mining tool itself has been developed. Most mining software is developed ad hoc and is usually neither scalable nor reused for other mining tasks. The objective of this paper is to present a model for fast Web mining prototyping, referred to as WIM (Web Information Mining). The underlying conceptual model of WIM provides its users with a level of abstraction appropriate for prototyping and experimentation throughout the Web data mining task. Abstracting from the idiosyncrasies of raw Web data representations facilitates the inherently iterative mining process. We present the WIM conceptual model, its associated algebra, and the software architecture of the WIM tool, which implements the model. We also illustrate how the model can be applied to real Web data mining tasks. Experimentation with WIM in real use cases has shown that it significantly facilitates Web mining prototyping.



Published In

WSDM '09: Proceedings of the Second ACM International Conference on Web Search and Data Mining
February 2009
314 pages
ISBN: 9781605583907
DOI: 10.1145/1498759
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. model
  2. prototyping
  3. web mining
  4. web mining applications

Qualifiers

  • Research-article

Acceptance Rates

Overall Acceptance Rate 498 of 2,863 submissions, 17%

Article Metrics

  • Downloads (last 12 months): 5
  • Downloads (last 6 weeks): 0

Reflects downloads up to 08 Feb 2025.

Cited By

  • (2018) Information and data management at PUC-rio and UFMG. Proceedings of the VLDB Endowment 11:12 (2114-2129). DOI: 10.14778/3229863.3240490. Online publication date: 1-Aug-2018
  • (2014) Rapid prototyping of a web categorization tool. Proceedings of the 18th International Database Engineering & Applications Symposium (294-297). DOI: 10.1145/2628194.2628216. Online publication date: 7-Jul-2014
  • (2012) GraphGen. Proceedings of the 2012 Eighth Latin American Web Congress (87-94). DOI: 10.1109/LA-WEB.2012.15. Online publication date: 25-Oct-2012
  • (2010) A model for automatic generation of multi-partite graphs from arbitrary data. Proceedings of the 2010 international conference on Web-age information management (49-60). DOI: 10.5555/1927585.1927591. Online publication date: 15-Jul-2010
  • (2010) A Model for Automatic Generation of Multi-partite Graphs from Arbitrary Data. Web-Age Information Management (49-60). DOI: 10.1007/978-3-642-16720-1_5. Online publication date: 2010
