Abstract:
To experiment with machine learning and data mining techniques in the domain of Real-Time Strategy games such as StarCraft, a dataset is required that captures the complex detail of the interactions taking place between the players and the game. This paper describes a new extraction process in which game data is obtained both directly from game log (replay) files and indirectly by simulating the replays within the StarCraft game engine. The data is then stored in a compact, hierarchical, and easily accessible format. This process is applied to a collection of expert replays, creating a new standardised dataset. The recorded data is sufficient to reconstruct almost the complete game state, from either player's viewpoint, at any point in time (to the nearest second). The process has also revealed issues in some of the source replay files, as well as discrepancies in prior datasets. Where practical, these errors have been removed in order to produce a higher-quality reusable dataset.