Riddle, Patricia
Hartill, Timothy John
2024-11-19
2024-11-19
2024
https://hdl.handle.net/2292/70611

We are inspired by recent progress with pretrained large Language Models (LLMs) that are able to answer questions unlikely to have been encountered during training. However, a diversity of potential applications exists in the broad domain of reasoning systems, and considerations such as latency, cost, available compute resource and internet connectivity are relevant in determining an appropriate approach. We consider the setting where some local compute capacity is available at inference time but internet connectivity is not.

Like a general-purpose LLM, we assume that our much smaller Reasoning Models may be asked arbitrary questions from unknown distributions, hence we focus on evaluation in an unseen setting where our evaluation datasets are disjoint from our training datasets. We equip our models to answer diverse questions through multitask training focused on instilling an ability to reason over a provided context to an answer. We acquire this context from two knowledge sources: a local Wikipedia corpus queried using a multi-hop dense retrieval system with novel extensions, and rationales generated from a larger Language Model optimised to run in a lower-resource environment.

Our main contributions to the study of question answering in this setting are as follows. We propose novel methods to evaluate whether our model is capable of answering contextualised questions without memorisation, and show that it is. We establish a comprehensive set of baseline results on unseen evaluation datasets. We show that the addition of novel retrieval-augmented training datasets (RATD) to the training regime of the Reasoning Model, in conjunction with our retrieval system, significantly improves results. We demonstrate further significant improvement through the application of methods for combining contextual knowledge from our two sources.
The first method (RR) involves training a novel Rationale Ranking model to score both generated rationales and retrieved contexts with respect to relevance and truthfulness. We then use these scores to derive combined contexts from both knowledge sources using a number of strategies. We also show that utilising the RATD datasets enables our model to become proficient at exploiting information from combined contexts, both separately and in conjunction with the RR method.

Items in ResearchSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
https://researchspace.auckland.ac.nz/docs/uoa-docs/rights.htm
http://creativecommons.org/licenses/by-nc-sa/3.0/nz/
Teaching Smaller Language Models To Generalise To Unseen Compositional Questions
Thesis
2024-11-18
Copyright: The author
http://purl.org/eprint/accessRights/OpenAccess