Processing noisy structured textual data using a fuzzy matching approach: application to postal address errors

  • Original paper
Soft Computing

Abstract

A multiparadigm approach is developed and demonstrated that exploits knowledge about structure to extract information from noisy textual data. A motivating application is an address encoding system for a delivery service such as UPS, Federal Express or the United States Post Office. The approach combines database organization and clustering of records, fuzzy parsing, fuzzy retrieval, an aggregation algebra, and measures of both performance and accuracy. Fuzzy retrieval, in the form of set and fuzzy operators, is accomplished by considering each symbol of the input text to be imperfect and retrieving inexactly matching records from the database that satisfy a particular threshold value. The set of low-level database operators constrains the cardinality and accuracy of retrievals. A hierarchical method of clustering the database is defined, whereby the records are partitioned so that similar records fall in the same cluster. The resulting clusters are guaranteed to be mutually exclusive and to cover the data records completely. Associated with these clusters is an algebra that combines clusters of data into one window of ranked data. A set of fuzzy measures is defined and used to aggregate and rank sets of records.
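The abstract does not spell out the operators or measures, so the following is only a minimal sketch of the general idea: partition the records into disjoint clusters, retrieve inexact matches that meet a threshold, and merge the hits into one ranked window. The sample records, the ZIP-prefix clustering key, and the use of Python's `difflib.SequenceMatcher` as the similarity score are all assumptions made for illustration; they are not the authors' method.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical address records (illustrative only, not from the paper).
RECORDS = [
    "12 OAK STREET NEW ORLEANS LA 70118",
    "12 OAK AVENUE NEW ORLEANS LA 70118",
    "48 ELM STREET BATON ROUGE LA 70802",
    "7 PINE ROAD BATON ROUGE LA 70803",
]


def cluster_key(record):
    # Assumption: cluster on the first three ZIP digits. Every record has
    # exactly one key, so the clusters are disjoint and cover all records.
    return record.split()[-1][:3]


def build_clusters(records):
    clusters = defaultdict(list)
    for record in records:
        clusters[cluster_key(record)].append(record)
    return dict(clusters)


def similarity(a, b):
    # Stand-in similarity in [0, 1]; the paper's fuzzy measures may differ.
    return SequenceMatcher(None, a, b).ratio()


def fuzzy_retrieve(query, clusters, threshold):
    # Collect every record whose score meets the threshold, then merge
    # the hits from all clusters into one window ranked by score.
    window = []
    for members in clusters.values():
        for record in members:
            score = similarity(query, record)
            if score >= threshold:
                window.append((score, record))
    window.sort(key=lambda pair: pair[0], reverse=True)
    return window


if __name__ == "__main__":
    clusters = build_clusters(RECORDS)
    noisy = "12 OAK STRET NEW ORLEANS LA 70118"  # one dropped letter
    for score, record in fuzzy_retrieve(noisy, clusters, threshold=0.8):
        print(f"{score:.3f}  {record}")
```

Raising the threshold shrinks the window and lowering it admits more candidates, which is one simple way a retrieval operator can trade cardinality against accuracy.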

Cite this article

Buckley, J., Buckles, B. & Petry, F. Processing noisy structured textual data using a fuzzy matching approach: application to postal address errors. Soft Computing 4, 195–205 (2000). https://doi.org/10.1007/s005000000054

  • DOI: https://doi.org/10.1007/s005000000054
