Abstract:
An IP flow is a group of packets that share a common attribute, such as their
source address. Ever-growing network traffic produces an enormous number of
flows. Recent studies attempt to simplify and mine flows in order to understand the
network’s behaviour. The traditional technique of aggregating packets into 5-tuple flows
provides an understanding of the flows themselves, but fails to capture the
aggregated end-point that generates those flows: the IP host.
This thesis describes the design, development and analysis of a measurement method
that identifies an IP host from network traffic. A conceptual model of IP host
aggregation has been designed to summarize traffic in stages: from 5-tuple flows to
2-tuple and finally to the 1-tuple IP host. Using this model, various observations and
analyses have been conducted at the host level, including empirical distributions and
behavioural relationships.
Several host characteristics and applications are examined using real-world network
data, such as characterizing the variability of host interactions and identifying
potentially significant hosts.