Let $s = s_1 s_2 \ldots s_n$ be a text (or sequence) on a finite alphabet $\Sigma$. A fingerprint in $s$ is the set of distinct characters contained in one of its substrings. Fingerprinting a text consists of computing the set $\mathcal{F}$ of all fingerprints of all its substrings and being able to answer several questions on this set efficiently. A given fingerprint $f \in \mathcal{F}$ is represented by a binary array $F$ of size $|\Sigma|$, named a fingerprint table. A fingerprint $f \in \mathcal{F}$ admits a number of maximal locations $\langle i, j \rangle$ in $s$, that is, the alphabet of $s_i \ldots s_j$ is $f$ and $s_{i-1}$, $s_{j+1}$, if defined, are not in $f$. The set of maximal locations is $\mathcal{L}$. We present new algorithms and a new data structure for the following three problems: (1) compute $\mathcal{F}$; (2) given $F$, decide whether $F$ represents a fingerprint in $\mathcal{F}$; (3) given $F$, find all maximal locations of $F$ in $s$. These problems are solved, respectively, in $\ldots$, $\ldots$, and $\ldots$ time, where $K$ is the number of maximal locations of $F$.
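To make the definitions concrete, the following brute-force Python sketch enumerates every fingerprint of a short text together with its maximal locations, and answers problems (2) and (3) by dictionary lookup. It is an illustration of the definitions only, not the algorithms or data structure presented in this work, and it makes no attempt to match their time bounds. The function names `maximal_locations`, `fingerprint_table`, and `query`, and the example text `"abaca"`, are assumptions introduced here.

```python
from collections import defaultdict


def maximal_locations(s):
    """Map each fingerprint (a frozenset of characters) to its maximal locations.

    Locations are 1-based inclusive pairs (i, j) standing for s_i .. s_j.
    Brute force over all substrings; for illustration only.
    """
    n = len(s)
    locations = defaultdict(list)
    for i in range(n):
        seen = set()  # alphabet of s[i..j], grown as j advances
        for j in range(i, n):
            seen.add(s[j])
            # Maximal: the neighbours s_{i-1} and s_{j+1}, if defined,
            # must not belong to the fingerprint.
            left_ok = i == 0 or s[i - 1] not in seen
            right_ok = j == n - 1 or s[j + 1] not in seen
            if left_ok and right_ok:
                locations[frozenset(seen)].append((i + 1, j + 1))
    return dict(locations)


def fingerprint_table(chars, alphabet):
    """Binary array of size |alphabet|: entry k is 1 iff alphabet[k] is in chars."""
    return [1 if c in chars else 0 for c in alphabet]


def query(table, alphabet, locations):
    """Return the maximal locations of the fingerprint encoded by `table`.

    An empty list means the table does not represent a fingerprint of the text.
    """
    chars = frozenset(c for c, bit in zip(alphabet, table) if bit)
    return locations.get(chars, [])


if __name__ == "__main__":
    alphabet = "abc"
    locs = maximal_locations("abaca")
    print(len(locs))                          # 6 distinct fingerprints
    print(query([1, 0, 0], alphabet, locs))   # [(1, 1), (3, 3), (5, 5)]
    print(query([1, 0, 1], alphabet, locs))   # [(3, 5)]
    print(query([0, 1, 1], alphabet, locs))   # [] since {b, c} is not a fingerprint
```

Every fingerprint has at least one maximal location (extend any occurrence as far as possible in both directions), so the keys of the returned dictionary are exactly the fingerprints of the text.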
This work is supported by the Russian Foundation for Fundamental Research (Grant 05-01-00994) and the program of the President of the Russian Federation for the support of young researchers (Grant MD-3635.2005.1).