Abstract:
In this paper we analytically compare two widely accepted approaches to spoken document indexing, Position Specific Posterior Lattices (PSPL) and Confusion Network (CN), in terms of retrieval accuracy and index size. The fundamental distinctions between the two approaches in terms of construction units, posterior probabilities, number of clusters, indexing coverage, and space requirements are discussed in detail. A new approach to approximating subword posterior probabilities in a word lattice is also incorporated into PSPL/CN to handle out-of-vocabulary (OOV) and rare word problems, which were unaddressed in the original PSPL and CN approaches. Extensive experimental results on Chinese broadcast news segments indicate that PSPL offers higher accuracy than CN but requires much larger disk space, while subword-based PSPL turns out to be very attractive because it lowers the storage cost while offering even higher accuracy.
Date of Conference: 09-13 December 2007
Date Added to IEEE Xplore: 14 January 2008