A Fast, Constant-Order, Symbol Ranking Text Compressor

Reference

Computer Science Technical Reports 145 (1997)

Degree Grantor

Abstract

Recent work on “symbol ranking” text compressors included a version which combines moderate compression with high speed and has been described in a poster at the 1997 Data Compression Conference. That work is extended here, especially to produce a compressor capable of efficient hardware implementation. The compressor is based on a conventional set-associative cache, with LRU update. To process a symbol, a context of the three preceding symbols is hashed to access a particular cache line. If the access “hits”, the LRU index is emitted as the recoded symbol, or otherwise the symbol is emitted and the LRU status updated. Several versions are described, with some improving the performance by maintaining a Move-To-Front list of symbols and emitting some symbols as “short” 5-bit indices into this table, or by encoding runs of correctly-predicted symbols. The final performance is in the range of 3.5 – 4 bits per byte over the Calgary Corpus, with software speeds of up to 1 Mbyte/s. The best of the hardware designs should run at 100Mbyte/s with discrete components, or 200–300 Mbyte/s with a completely integrated design.

Description

DOI

Related Link

Keywords

ANZSRC 2020 Field of Research Codes