Abstract:
Network steganography is the art of exploiting network protocols or network flows to
hide information innocuously. Network steganalysis is the study of analysing network
traffic and flows to detect, and thereby prevent, the illicit use of steganography.
Comparing malicious flows against matching benign flows using statistical metrics has
been an established methodological approach in network steganalysis. In many practical
situations, however, this approach may not work, because it is difficult to acquire both
malicious flows and their matching benign counterparts. This raises the fundamental
question that motivates this thesis: What if we have only the malicious flows at hand?
That is, what if we do not have access to the matching benign flow, so there is nothing
to compare against? Moreover, while detecting malicious flows with a steganalysis
technique is critical, there is a lack of methods for measuring how much damage those
flows cause. This leads to a second question: Can we estimate how much information a
malicious flow contains, and thereby indicate its potential damage? This thesis
investigates the use of complexity derivatives and a re-embedding technique to answer
these two fundamental questions. The
experiments presented here show that it is possible to detect and estimate the amount
of malicious information accurately in a number of different scenarios. However, this
method is semi-automatic and relies on a significant amount of manual work, making it
impractical for large-scale networks that may generate a significant number of network
flows. Therefore, this thesis investigates and proposes a number of approaches to fully
automate the process.