Authors:
Vasileios Iosifidis and Christos Makris
Affiliation:
University of Patras, Greece
Keyword(s):
Inverted File, Compression, LZ78, LZW, GZIP, Binary Interpolative Encoding, Gaps, Reorder, Searching and Browsing, Metrics and Performance.
Related Ontology Subjects/Areas/Topics:
Searching and Browsing; Web Information Systems and Technologies; Web Interfaces and Applications
Abstract:
In this paper, we present a compression algorithm that employs a modification of the well-known Lempel-Ziv-Welch (LZW) algorithm; it creates an index that treats terms as characters and stores encoded document identifier patterns efficiently. We also equip our approach with a set of preprocessing methods (reassignment of document identifiers, gaps) and post-processing methods (gaps, IPC encoding, GZIP) in order to attain more significant space improvements. We evaluated two different combinations of these discrete steps to see which one maximizes the performance of our modified LZW algorithm. Experiments on the Wikipedia dataset demonstrate the superior space compaction of the proposed technique.
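To make two of the building blocks named in the abstract more concrete, the following is a minimal sketch of gap encoding for a sorted list of document identifiers, followed by textbook LZW applied to the resulting integer sequence. It is not the authors' modified algorithm, whose details the abstract does not give. The function names (`to_gaps`, `from_gaps`, `lzw_encode`, `lzw_decode`), the choice to seed the dictionary with the distinct gap values, and the sample identifiers are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def to_gaps(doc_ids: Sequence[int]) -> List[int]:
    """Turn sorted document identifiers into the first id plus successive differences."""
    gaps, prev = [], 0
    for doc_id in doc_ids:
        gaps.append(doc_id - prev)
        prev = doc_id
    return gaps


def from_gaps(gaps: Sequence[int]) -> List[int]:
    """Invert to_gaps by taking running sums."""
    ids, total = [], 0
    for gap in gaps:
        total += gap
        ids.append(total)
    return ids


def lzw_encode(symbols: Sequence[int], alphabet: Sequence[int]) -> List[int]:
    """Textbook LZW over integer symbols (assumption: dictionary seeded with `alphabet`)."""
    table: Dict[Tuple[int, ...], int] = {(s,): i for i, s in enumerate(alphabet)}
    codes: List[int] = []
    current: Tuple[int, ...] = ()
    for s in symbols:
        candidate = current + (s,)
        if candidate in table:
            current = candidate              # keep extending the known pattern
        else:
            codes.append(table[current])     # emit code of the longest known pattern
            table[candidate] = len(table)    # learn the new, longer pattern
            current = (s,)
    if current:
        codes.append(table[current])
    return codes


def lzw_decode(codes: Sequence[int], alphabet: Sequence[int]) -> List[int]:
    """Rebuild the symbol sequence, reconstructing the same dictionary on the fly."""
    if not codes:
        return []
    table: Dict[int, Tuple[int, ...]] = {i: (s,) for i, s in enumerate(alphabet)}
    prev = table[codes[0]]
    out = list(prev)
    for code in codes[1:]:
        # A code not yet in the table can only be prev + first symbol of prev.
        entry = table[code] if code in table else prev + (prev[0],)
        out.extend(entry)
        table[len(table)] = prev + (entry[0],)
        prev = entry
    return out


# Hypothetical posting list for one term.
doc_ids = [3, 5, 7, 9, 11, 20, 22, 24]
gaps = to_gaps(doc_ids)
alphabet = sorted(set(gaps))
codes = lzw_encode(gaps, alphabet)
restored = from_gaps(lzw_decode(codes, alphabet))
```

Gap encoding tends to produce many small, repeated values, and repeated runs are what a dictionary coder such as LZW can replace with single codes. In this sketch the decoder needs the same seed alphabet as the encoder, so a real index would have to store or fix it in advance.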