Abstract:
Sketch recognition is the automatic identification of objects in a hand-drawn diagram. Hand-drawn diagrams often consist of distinct objects, each sketched with multiple ink strokes, and determining which strokes group together to form an object is a difficult problem for sketch recognition. This research investigates the use of data mining techniques to improve the accuracy of grouping. The grouping of nodes and edges in graph-based diagrams is a particularly challenging domain and is used as the exemplar for this investigation. A review of the literature shows that feature-based grouping is both a promising and an underexplored technique. Such techniques first partition the individual strokes of the diagram according to the type of object to which they belong. The pairs of strokes in each partition are then classified into those that should be grouped together and those that should not. This procedure depends on good distinguishing features for both individual strokes and stroke pairs, which are fed into a pair of classifier algorithms to determine the stroke groups. Previous work has applied this approach with a limited number of features and with algorithms that were not selected for optimal performance. This research extends the feature set found in the literature and performs a systematic analysis to select the best-performing algorithms. An extended library of features was implemented, a repository of labelled sketch data was compiled, and a tool was designed to evaluate the grouping accuracy of various classifiers. A systematic investigation of machine learning algorithms followed, identifying the classifiers best suited to grouping nodes and edges. This evaluation showed that, when trained on the extended feature set, SMO, Logistic, and LADTree are significantly more accurate at grouping strokes than existing classifiers. None of these algorithms had previously been used for node-edge grouping.
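The two-stage procedure described above (partition strokes by object type, then classify stroke pairs within each partition) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation. The stroke representation (a list of (x, y) points), the function names, and the two hand-written rules `toy_stroke_type` and `toy_should_join` are all hypothetical; they stand in for trained classifiers such as SMO, Logistic, or LADTree operating on the extended feature set. Turning pairwise "join" decisions into groups by taking connected components is also an assumption made for this sketch.

```python
import math
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical stroke representation: an ordered list of (x, y) ink points.
Stroke = List[Tuple[float, float]]


def group_strokes(
    strokes: Sequence[Stroke],
    classify_stroke: Callable[[Stroke], str],
    should_join: Callable[[Stroke, Stroke], bool],
) -> List[List[int]]:
    """Group stroke indices in two stages.

    Stage 1 partitions strokes by predicted object type (e.g. "node"/"edge").
    Stage 2 classifies each pair of strokes *within* a partition as join or
    no-join. Assumption: groups are the connected components of join pairs.
    """
    # Stage 1: partition stroke indices by predicted type.
    partitions: Dict[str, List[int]] = {}
    for i, stroke in enumerate(strokes):
        partitions.setdefault(classify_stroke(stroke), []).append(i)

    # Union-find over stroke indices, used to merge joined pairs.
    parent = list(range(len(strokes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Stage 2: only pairs inside the same partition are ever classified.
    for members in partitions.values():
        for a, b in combinations(members, 2):
            if should_join(strokes[a], strokes[b]):
                parent[find(a)] = find(b)

    # Collect each connected component as one group of stroke indices.
    groups: Dict[int, List[int]] = {}
    for i in range(len(strokes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# --- Hypothetical stand-in classifiers (single hand-picked features) ---
CLOSE_TOL = 0.2  # endpoints closer than this => stroke treated as closed
JOIN_TOL = 0.5   # strokes closer than this => treated as one object


def toy_stroke_type(stroke: Stroke) -> str:
    """Closed strokes are labelled 'node', open strokes 'edge'."""
    return "node" if math.dist(stroke[0], stroke[-1]) < CLOSE_TOL else "edge"


def toy_should_join(s: Stroke, t: Stroke) -> bool:
    """Join two strokes if any pair of their points is close enough."""
    return min(math.dist(p, q) for p in s for q in t) < JOIN_TOL


STROKES: List[Stroke] = [
    [(0, 0), (1, 0), (1, 0.5), (1, 1), (0, 1), (0, 0)],  # 0: node outline
    [(0.1, 0.1), (0.9, 0.1), (0.9, 0.9), (0.1, 0.9), (0.1, 0.1)],  # 1: overdraw of node 0
    [(5, 0), (6, 0), (6, 1), (5, 1), (5, 0)],  # 2: second node
    [(1, 0.5), (5, 0.5)],  # 3: edge shaft touching node 0
    [(4.7, 0.3), (5, 0.5), (4.7, 0.7)],  # 4: arrowhead of edge 3
]

if __name__ == "__main__":
    print(group_strokes(STROKES, toy_stroke_type, toy_should_join))
```

Running the script prints `[[0, 1], [2], [3, 4]]`. Stroke 3 physically touches stroke 0, so `toy_should_join` alone would merge them, but because stage 1 places them in different partitions the pair is never classified. This illustrates why partitioning by stroke type comes before pairwise grouping.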