A hidden Markov model‐based approach for extracting information from web news

Brandt Tso (Department of Information Management, Management College, NDU, Taipei, Taiwan)

International Journal of Web Information Systems

ISSN: 1744-0084

Article publication date: 28 September 2007

Abstract

Purpose

This paper aims to present a method based on hidden Markov models (HMM) for extracting information from web news.

Design/methodology/approach

The samples under study are drawn from “People's Daily Online,” a web‐based news publication of the People's Republic of China whose archives are unstructured. The study develops HMM‐based tools for news filtering that retrieve terms of interest, such as “Geo‐location,” “System,” and “Personas.” The experiments are performed in two stages. In the first stage, each HMM is built to extract a single target term, in order to evaluate the basic information extraction (IE) capability. In the second stage, the experiment is extended to the more complex problem of multi‐term extraction.
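The single‐term stage can be pictured with a minimal sketch in which a two‐state HMM labels each token as background or target and Viterbi decoding recovers the most likely label sequence. The abstract does not give the paper's model topology, features, or parameters, so the state names, the capitalization feature, and every probability below are illustrative assumptions.

```python
import math

# Hypothetical two-state tagger. All names and probabilities here are
# illustrative assumptions, not values taken from the paper.
STATES = ["background", "target"]
START = {"background": 0.8, "target": 0.2}
TRANS = {
    "background": {"background": 0.8, "target": 0.2},
    "target": {"background": 0.4, "target": 0.6},
}
# Emission probabilities over a coarse, assumed token feature.
EMIT = {
    "background": {"capitalized": 0.05, "lowercase": 0.95},
    "target": {"capitalized": 0.9, "lowercase": 0.1},
}


def token_class(token):
    """Map a token to the assumed observation symbol."""
    return "capitalized" if token[:1].isupper() else "lowercase"


def viterbi(tokens):
    """Return the most likely state sequence for the tokens (log domain)."""
    if not tokens:
        return []
    obs = [token_class(t) for t in tokens]
    # scores[s] is the best log-probability of any path ending in state s.
    scores = {s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}
    backpointers = []
    for o in obs[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: scores[p] + math.log(TRANS[p][s]))
            new_scores[s] = (
                scores[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][o])
            )
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def extract_terms(tokens):
    """Keep the tokens that the decoded path labels as target."""
    return [t for t, s in zip(tokens, viterbi(tokens)) if s == "target"]
```

With these assumed parameters, `extract_terms(["officials", "met", "in", "Beijing", "today"])` keeps only `"Beijing"`. A multi‐term variant would add one state per term type rather than a single target state, which is again an assumption about how such a model could be laid out.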

Findings

The results show that HMM‐based extraction achieves an F‐measure of more than 70 per cent on average for the single‐term IE tasks, and no less than 66 per cent for multi‐term extraction.
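For readers unfamiliar with the metric, the F‐measure combines precision and recall into one score. The sketch below assumes the balanced form (beta = 1), which the abstract does not specify, and the counts in the usage note are made up for illustration rather than taken from the paper.

```python
def f_measure(true_positives, false_positives, false_negatives, beta=1.0):
    """Compute the F-measure from extraction counts.

    beta = 1.0 gives the balanced F1 score; that choice is an assumption,
    since the abstract does not say which weighting was used.
    """
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

For example, with hypothetical counts of 7 correctly extracted terms, 3 spurious extractions, and 3 missed terms, precision and recall are both 0.7, so `f_measure(7, 3, 3)` is also 0.7.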

Originality/value

The study shows the promise of HMMs for developing automatic tools that filter free‐structured data.

Citation

Tso, B. (2007), "A hidden Markov model‐based approach for extracting information from web news", International Journal of Web Information Systems, Vol. 3 No. 1/2, pp. 104-115. https://doi.org/10.1108/17440080710829243

Publisher

Emerald Group Publishing Limited

Copyright © 2007, Emerald Group Publishing Limited
