DOI:10.1145/1963192.1963266
poster

Growing parallel paths for entity-page discovery

Published: 28 March 2011 Publication History

Abstract

In this paper, we use the structural and relational information on the Web to find entity-pages. Specifically, given a Web site and an entity-page (e.g., department and faculty member homepage) we seek to find all of the entity-pages of the same type (e.g., all faculty members in the department). To do this, we propose a web structure mining method which grows parallel paths through the web graph and DOM trees. We show that by utilizing these parallel paths we can efficiently discover all entity-pages of the same type. Finally, we demonstrate the accuracy of our method with a case study on various domains.
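
The abstract does not spell out the algorithm, so the following is only a toy sketch of the general idea of following "parallel" link paths, not the authors' method. It assumes a site can be modelled as a mapping from each page to its outgoing links, with each link labelled by the DOM path of its anchor element. The `SITE` data and the helper names `seed_signature` and `parallel_targets` are hypothetical and exist only for illustration.

```python
# Hypothetical toy site: page -> list of (dom_path_of_anchor, target_page).
SITE = {
    "/": [
        ("body/ul/li/a", "/people"),
        ("body/ul/li/a", "/research"),
        ("body/div/a", "/contact"),
    ],
    "/people": [
        ("body/table/tr/td/a", "/~alice"),
        ("body/table/tr/td/a", "/~bob"),
        ("body/table/tr/td/a", "/~carol"),
        ("body/p/a", "/"),
    ],
    "/research": [
        ("body/div/ul/li/a", "/labs/ai"),
    ],
}


def seed_signature(site, page_path):
    """Return the DOM-path label of each link along a known seed path of pages."""
    signature = []
    for src, dst in zip(page_path, page_path[1:]):
        # Take the label of the first link from src that points at dst.
        label = next(dom for dom, target in site[src] if target == dst)
        signature.append(label)
    return signature


def parallel_targets(site, root, signature):
    """Follow every link whose DOM path matches the signature, one hop per label."""
    frontier = {root}
    for label in signature:
        frontier = {
            target
            for page in frontier
            for dom, target in site.get(page, [])
            if dom == label
        }
    return frontier


# Seed: the site root, a listing page, and one known entity-page.
sig = seed_signature(SITE, ["/", "/people", "/~alice"])
candidates = parallel_targets(SITE, "/", sig)
print(sorted(candidates))
```

In this toy data, the link to `/labs/ai` is excluded only because its anchor sits at a different DOM path; a real site would need additional evidence to separate such cases.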

    Published In

    WWW '11: Proceedings of the 20th international conference companion on World wide web
    March 2011
    552 pages
    ISBN:9781450306379
    DOI:10.1145/1963192

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. entity pages
    2. parallel paths
    3. semi-structured data
    4. web structure mining

    Qualifiers

    • Poster

    Conference

    WWW '11
    WWW '11: 20th International World Wide Web Conference
    March 28 - April 1, 2011
    Hyderabad, India

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

    Cited By

    • (2013) The parallel path framework for entity discovery on the web. ACM Transactions on the Web 7:3 (1-29). DOI:10.1145/2516633.2516638. Online publication date: 30-Sep-2013
    • (2013) Research-insight. Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (1093-1096). DOI:10.1145/2463676.2463689. Online publication date: 22-Jun-2013
    • (2012) Building enriched web page representations using link paths. Proceedings of the 23rd ACM conference on Hypertext and social media (53-62). DOI:10.1145/2309996.2310006. Online publication date: 25-Jun-2012
    • (2012) Construction of Web-Based, Service-Oriented Information Networks: A Data Mining Perspective. Web-Age Information Management (17-19). DOI:10.1007/978-3-642-32281-5_2. Online publication date: 2012
    • (2011) Construction and analysis of web-based computer science information networks. Proceedings of the 13th international conference on Rough sets, fuzzy sets, data mining and granular computing (1-2). DOI:10.5555/2026782.2026784. Online publication date: 25-Jun-2011
    • (2011) WINACS. Proceedings of the 2011 ACM SIGMOD International Conference on Management of data (1255-1258). DOI:10.1145/1989323.1989469. Online publication date: 12-Jun-2011
    • (2011) Construction and Analysis of Web-Based Computer Science Information Networks. Rough Sets, Fuzzy Sets, Data Mining and Granular Computing (1-2). DOI:10.1007/978-3-642-21881-1_1. Online publication date: 2011
