Abstract:
A recent development in text compression is a “block sorting” algorithm which permutes the input text
according to a special sort procedure and then processes the permuted text with Move-to-Front and a final
statistical compressor. The technique combines good speed with excellent compression performance.
This report investigates the block sorting compression algorithm, in particular trying to understand its operation
and limitations. Various approaches are investigated in an attempt to improve the compression with block
sorting, most of which involve a hierarchy of coding models to allow fast adaptation to local contexts. The best
technique involves a new “structured” coding model, especially designed for compressing data with skew
symbol distributions. Block sorting compression is found to be related to work by Shannon in 1951 on the
prediction of English text.
The work confirms block-sorting as a good text compression technique, with a compression approaching that of
the currently best compressors while being much faster than other compressors of comparable performance.