Abstract:
T-decomposition, first introduced by Mark Titchener in 1993, is a string parsing algorithm that has been investigated in the fields of coding and information measurement. This thesis shows that the information-measuring capability of T-decomposition compares well with that of the well-accepted Lempel-Ziv parsing algorithms. As one of its core results, the thesis also presents a T-decomposition algorithm with O(n log n) time complexity. This permits T-decomposition-based information measurements with the same time complexity as the fastest of the Lempel-Ziv parsing algorithms, and with comparable accuracy. The improved algorithm is applied to similarity measurement on both synthetic and real-world data (character recognition), with promising results.