Learning Behaviour by Observation for Real-Time Strategy Games

Robertson, Glen

Reference

2015

Degree Grantor

The University of Auckland

Abstract

This thesis details an exploratory investigation into the use of Learning By Observation (LBO) methods for the complex domain of real-time strategy games. The domain presents a number of interesting challenges for Artificial Intelligence (AI) research, many of which are analogous to real-world challenges in robotics. The aim of this research was to reduce the human effort required to develop new AI systems for such complex domains by using LBO methods to enable an agent to learn from human examples. Case-based reasoning was applied to the problem, creating a case base that associated observed game states with the actions humans took in those states. Even with a very simple metric for deciding which states were similar, this method selected actions appropriate enough to emulate a human in the very early stages of the game. However, it had difficulty choosing the correct order of actions and needed a relatively large amount of time to make decisions. Additionally, the existing dataset used to form the case base presented a very simplified view of the game state, with many attributes either not recorded or only infrequently sampled. To address this limitation, an improved dataset was created from the same original human match records used in prior work, but with much more precise and complete information extracted. This information was also stored more efficiently so that it could be searched and updated quickly and easily. While this information was being extracted and checked, previously undetected errors in certain records were discovered and those records were removed, making the dataset more accurate than the datasets used in prior work. Data mining algorithms were applied to the new dataset in order to extract association rules and frequent patterns that could be useful for online decision-making.
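The case-based reasoning step described above can be illustrated with a minimal sketch. It assumes that each game state is a flat dictionary of numeric attributes and that similarity is plain Euclidean distance with 1-nearest-neighbour retrieval. The abstract does not specify the actual metric or state attributes, so the `Case`, `distance`, and `retrieve_action` names, as well as the attribute and action names, are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Case:
    state: dict[str, float]  # observed game-state attributes (hypothetical names)
    action: str              # action the human took in that state


def distance(a: dict[str, float], b: dict[str, float]) -> float:
    # Euclidean distance over the union of attribute names;
    # an attribute missing from one state is treated as 0.
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def retrieve_action(case_base: list[Case], query: dict[str, float]) -> str:
    # 1-nearest-neighbour retrieval: reuse the action of the most similar case.
    best = min(case_base, key=lambda c: distance(c.state, query))
    return best.action


# Hypothetical case base built from observed early-game states.
case_base = [
    Case({"minerals": 50, "workers": 4, "supply_used": 4}, "train_worker"),
    Case({"minerals": 150, "workers": 8, "supply_used": 9}, "build_supply"),
    Case({"minerals": 200, "workers": 10, "supply_used": 10, "barracks": 0}, "build_barracks"),
]

print(retrieve_action(case_base, {"minerals": 140, "workers": 8, "supply_used": 9}))
# → build_supply
```

Because `retrieve_action` scans every stored case, the cost of each decision in this sketch grows linearly with the size of the case base. A spatial index such as a k-d tree is a common alternative when that scan becomes a bottleneck.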
Most of the tested data mining algorithms could not handle the large scale and complexity of the data without extensive human preprocessing, which made them unsuitable for LBO. A new method was devised to find simple high-confidence rules and patterns, and it was able to discover some game rules and common action sequences. An alternative approach was then taken with the dataset: player actions were used to build a behaviour tree characterising typical player actions in a match. This was achieved using a novel application of a motif-finding algorithm from computational biology to align subsections of observed action sequences. This method created a tree structure that generalised a large amount of action information, making it viable as a first step towards building a more compact and generalised representation of observed behaviour.
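As a rough illustration of what a simple high-confidence rule over action sequences can look like, the sketch below counts how often one action immediately follows another. It keeps only the pairs whose support (number of occurrences) and confidence (fraction of times the first action is followed by the second) clear fixed thresholds. This is a generic support/confidence filter, not the method devised in the thesis, whose details the abstract does not give. The function name, thresholds, and action names are hypothetical.

```python
from collections import Counter


def mine_next_action_rules(sequences, min_support=2, min_confidence=0.8):
    """Find rules 'A -> B' meaning action B immediately follows action A.

    support    = number of times the pair (A, B) occurs consecutively
    confidence = support / number of times A is followed by any action
    """
    pair_counts = Counter()
    antecedent_counts = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            pair_counts[(a, b)] += 1
            antecedent_counts[a] += 1

    rules = []
    for (a, b), support in pair_counts.items():
        confidence = support / antecedent_counts[a]
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    # Most confident rules first; ties broken by higher support.
    rules.sort(key=lambda r: (-r[3], -r[2]))
    return rules


# Hypothetical per-match action logs.
logs = [
    ["train_worker", "train_worker", "build_supply", "build_barracks", "train_marine"],
    ["train_worker", "build_supply", "build_barracks", "train_marine", "train_marine"],
    ["train_worker", "train_worker", "build_supply", "build_refinery", "build_barracks"],
]

print(mine_next_action_rules(logs))
# → [('build_barracks', 'train_marine', 2, 1.0)]
```

Lowering `min_confidence` to 0.5 on the same logs also admits `build_supply -> build_barracks` (confidence 2/3) and `train_worker -> build_supply` (confidence 3/5). This shows how the thresholds trade rule reliability against coverage.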