ABSTRACT
In this poster, we present an information extraction engine for web-based forums. The engine analyzes the HTML files crawled from web forums, deduces the wrapper (template) of the pages, and extracts information about posts (e.g., author, title, content, and number of replies and views). Extraction is an important module of a forum search engine because it helps the engine understand the content of a forum HTML page and facilitates ranking during retrieval. We discuss the system architecture of the extraction engine in the context of a forum search engine and present its components. We also briefly introduce the extraction process and discuss some implementation issues.
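To make the idea of deducing a wrapper concrete, the following is a minimal sketch and not the engine described in this poster. It assumes that two pages generated from the same forum template differ only in their data fields, so tokens shared by both pages are treated as template and differing runs as data slots. The function names (`tokenize`, `induce_template`, `extract`), the sample pages, and the field labels are all hypothetical; the alignment uses `difflib.SequenceMatcher` from the Python standard library.

```python
import re
from difflib import SequenceMatcher


def tokenize(html):
    """Split HTML into tag tokens and text tokens, dropping blank runs."""
    tokens = re.findall(r"<[^>]+>|[^<]+", html)
    return [t.strip() for t in tokens if t.strip()]


def induce_template(page_a, page_b):
    """Align two pages; shared tokens form the template, differing runs
    become data slots (represented as None). Illustrative assumption:
    both pages come from the same template."""
    a, b = tokenize(page_a), tokenize(page_b)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    template = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            template.extend(a[i1:i2])
        else:
            template.append(None)  # one slot per differing run
    return template


def extract(template, page):
    """Fill each slot with the tokens found between its fixed neighbours."""
    tokens = tokenize(page)
    values, pos = [], 0
    for k, item in enumerate(template):
        if item is not None:
            if pos >= len(tokens) or tokens[pos] != item:
                raise ValueError("page does not match template")
            pos += 1
        else:
            nxt = template[k + 1] if k + 1 < len(template) else None
            start = pos
            while pos < len(tokens) and tokens[pos] != nxt:
                pos += 1
            values.append(" ".join(tokens[start:pos]))
    return values


# Hypothetical forum pages sharing one layout.
PAGE_A = ('<div class="post"><span class="author">alice</span>'
          '<h2>Crawler tips</h2><p>Use polite delays.</p></div>')
PAGE_B = ('<div class="post"><span class="author">bob</span>'
          '<h2>Parsing HTML</h2><p>Prefer a real parser.</p></div>')
PAGE_C = ('<div class="post"><span class="author">carol</span>'
          '<h2>Ranking posts</h2><p>Use reply counts.</p></div>')

template = induce_template(PAGE_A, PAGE_B)
# The field labels below are assigned by hand for illustration only.
post = dict(zip(("author", "title", "content"), extract(template, PAGE_C)))
print(post)
```

A real wrapper-induction system would also have to handle repeated post blocks, optional fields, and automatic labeling of slots, none of which this sketch attempts.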
Index Terms
- An information extraction engine for web discussion forums