Extracting company information from the web

Abstract:

As the World Wide Web becomes the most important information repository, an increasing amount of information is available on it. Currently, web search engines provide only document-oriented search. To make full use of the information on the web, effective and efficient extraction algorithms are needed. This paper first surveys existing work and then discusses our current web information extraction technique in detail. In our approach, rules and patterns are extracted from sample pages through a training process with human involvement. Our system represents rules and patterns with both keywords and regular expressions: the keywords act as anchors that locate the positions of potential information, and the regular expressions validate the candidate values. All extracted information is represented in XML format.
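
The keyword-anchor and regular-expression idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the system's actual rule format, so everything here is an assumption made for illustration: the `RULES` table, the field names, the patterns, the `WINDOW` size, and the `<company>` element name are all hypothetical, and the rules are hand-written rather than learned from sample pages.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical rules: field name -> (anchor keywords, validation regex).
# In the paper these come from a training process; here they are hand-written.
RULES = {
    "phone": (["Tel", "Phone"], re.compile(r"\+?\d[\d\s().-]{6,}\d")),
    "email": (["Email", "E-mail"], re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
}

WINDOW = 60  # characters inspected after an anchor keyword (assumed value)


def extract(text):
    """Return {field: value} for each rule whose anchor and pattern both match."""
    found = {}
    for field, (keywords, pattern) in RULES.items():
        for keyword in keywords:
            pos = text.find(keyword)  # the keyword locates a candidate position
            if pos == -1:
                continue
            start = pos + len(keyword)
            # The regex validates the value found near the anchor.
            match = pattern.search(text[start:start + WINDOW])
            if match:
                found[field] = match.group().strip()
                break
    return found


def to_xml(record):
    """Serialize an extracted record as a flat XML element."""
    root = ET.Element("company")  # element name is an assumption
    for field, value in record.items():
        ET.SubElement(root, field).text = value
    return ET.tostring(root, encoding="unicode")
```

Calling `to_xml(extract(page_text))` on the plain text of a contact page would yield one `<company>` element with a child per matched field. Limiting the regex search to a short window after the anchor is one simple way to keep a pattern from matching unrelated values elsewhere on the page.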

Published in: 2009 IEEE International Conference on Systems, Man and Cybernetics

Date of Conference: 11-14 October 2009

Date Added to IEEE Xplore: 04 December 2009

ISBN Information:

Print ISSN: 1062-922X

DOI: 10.1109/ICSMC.2009.5346863

Conference Location: San Antonio, TX, USA
