Web information extraction using Markov logic networks

Poster. Published: 28 March 2011. DOI: 10.1145/1963192.1963251

Abstract

In this paper, we consider the problem of extracting structured data from web pages, taking into account both the content of individual attributes and the structure of pages and sites. We use Markov Logic Networks (MLNs) to capture both content and structural features in a single unified framework, which enables more accurate inference. We show that inference in our information extraction scenario reduces to solving an instance of the maximum weight subgraph problem. We develop specialized procedures for solving the maximum weight subgraph variants that are far more efficient than previously proposed MLN inference methods that solve variants of MAX-SAT. Experiments with real-life datasets demonstrate the effectiveness of our approach.
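To make the idea of one weighted model over content and structure concrete, here is a minimal Python sketch of MAP inference in a toy MLN. It follows the standard MLN definition of Richardson and Domingos (2006), in which a world's log-probability is, up to a normalizing constant, the sum of the weights of the ground formulas it satisfies. The segments, the DOM-path feature, the formulas, and their weights are all hypothetical. The exhaustive search is a deliberately naive stand-in: it is neither the maximum weight subgraph procedure nor a MAX-SAT solver of the kind the abstract mentions.

```python
from itertools import product

# Hypothetical toy input: text segments from two records on one list page.
# "path" stands in for a structural feature (a DOM path); all values are made up.
SEGMENTS = {
    "s1": {"path": "div/span[1]", "text": "$19.99"},
    "s2": {"path": "div/span[2]", "text": "Blue Widget"},
    "s3": {"path": "div/span[1]", "text": "$5.00"},
}
LABELS = ("price", "title", "none")


def ground_formulas():
    """Return (weight, test) pairs; test(labeling) is True if that ground formula holds."""
    formulas = []
    for s, seg in SEGMENTS.items():
        # Content formula, w=1.5: LooksLikePrice(s) => Label(s, price).
        # Groundings whose evidence is false are always satisfied, so they are skipped.
        if seg["text"].startswith("$"):
            formulas.append((1.5, lambda y, s=s: y[s] == "price"))
        # Content formula, w=1.0: Capitalized(s) => Label(s, title).
        if seg["text"][:1].isupper():
            formulas.append((1.0, lambda y, s=s: y[s] == "title"))
    ids = sorted(SEGMENTS)
    for i, s in enumerate(ids):
        for t in ids[i + 1:]:
            # Structure formula, w=2.0 (simplified): SamePath(s, t) => SameLabel(s, t).
            if SEGMENTS[s]["path"] == SEGMENTS[t]["path"]:
                formulas.append((2.0, lambda y, s=s, t=t: y[s] == y[t]))
    return formulas


def score(labeling, formulas):
    # Unnormalized MLN log-score: sum of weights of satisfied ground formulas.
    return sum(w for w, holds in formulas if holds(labeling))


def map_inference():
    # Brute-force MAP: try every labeling. Assigning each segment exactly one
    # label plays the role of a hard "one label per segment" constraint.
    formulas = ground_formulas()
    ids = sorted(SEGMENTS)
    best = max(
        (dict(zip(ids, labels)) for labels in product(LABELS, repeat=len(ids))),
        key=lambda y: score(y, formulas),
    )
    return best, score(best, formulas)


if __name__ == "__main__":
    labeling, total = map_inference()
    print(labeling, total)
```

Here the structural formula lets `s3` inherit the `price` label that its content also supports, and the best labeling satisfies every ground formula. Enumeration costs |LABELS|^n labelings for n segments, so it only shows what is being maximized, not how to maximize it efficiently at web scale.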

Cited By

  • (2022) Landmarks and regions: a robust approach to data extraction. Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, pp. 993-1009. DOI: 10.1145/3519939.3523705. Online publication date: 9-Jun-2022.
  • (2013) Towards high-throughput Gibbs sampling at scale. Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pp. 397-408. DOI: 10.1145/2463676.2463702. Online publication date: 22-Jun-2013.
  • (2012) Learning to adapt cross language information extraction wrapper. Applied Intelligence 36(4), pp. 918-931. DOI: 10.1007/s10489-011-0305-0. Online publication date: 1-Jun-2012.

Published In

WWW '11: Proceedings of the 20th international conference companion on World wide web
March 2011
552 pages
ISBN: 9781450306379
DOI: 10.1145/1963192

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. Markov logic networks
2. information extraction

Conference

WWW '11: 20th International World Wide Web Conference
March 28 - April 1, 2011
Hyderabad, India

Acceptance Rates

Overall acceptance rate: 1,899 of 8,196 submissions (23%)

Article Metrics

  • Downloads (last 12 months): 2
  • Downloads (last 6 weeks): 0

Reflects downloads up to 05 Mar 2025.
