Explainable and Automated Scientific Fact-Checking with Neural Networks
Abstract
Fact-checking plays a crucial role in combating misinformation, especially in scientific domains where the stakes are high and the consequences of false claims can be severe. As the COVID-19 pandemic demonstrated, unfaithful claim verification underscores the need for robust fact-checking systems. This thesis addresses the challenge of verifying claims against the scientific literature using deep learning-based computational linguistics techniques, focusing on faithfully incorporating knowledge from a vast existing literature and on the scarcity of appropriate training datasets for robust fact-checking systems.

Language models in Natural Language Processing (NLP) are computational models designed to understand language. Trained on massive amounts of data, they learn statistical patterns and relationships within language by predicting words, sub-words, or characters in a sequence, taking into account the context provided by the preceding sequence elements. In recent years, the dominant architecture for language models has been transformer-based models, which mark a milestone in NLP research due to the significant improvements they brought to downstream tasks. However, these models also have limitations, which can be grouped into modelling challenges and dataset challenges.

Modelling challenges are the limitations and complexities inherent to the computational models themselves, particularly those based on deep learning and natural language processing techniques. They include difficulties in accurately capturing nuanced arguments, the potential generation of false information (‘hallucinations’), constraints on handling lengthy input texts, and limited reasoning capabilities. Dataset challenges, on the other hand, stem from the scarcity of appropriate training data essential for building robust fact-checking systems. The specialised nature of scientific content, coupled with the need for accurate annotations, creates an expertise bottleneck, making the development of large-scale, domain-specific datasets crucial for training models effectively. Together, these challenges call for innovative methodologies that enhance the capabilities of computational models for accurate scientific fact-checking, addressing both their inherent modelling intricacies and the scarcity of specialised training data.

This thesis proposes methods to advance scientific fact-checking by enhancing the capabilities of transformer-based language models. To address modelling challenges, we present a novel methodology that leverages multiple viewpoints from the scientific literature, allowing the assessment of contradictory arguments and implicit assumptions. Our proposed inference method enhances reasoning by distilling information from diverse, relevant scientific abstracts. This approach yields a verdict label that can be weighted by the reputation of the source article, together with an explanation that can be traced back to its sources to avoid hallucinations. Our findings show that human evaluators judge these explanations to be significantly superior to those of off-the-shelf models, enabling faithful tracing of evidence back to its original sources. For the problem of handling lengthy input texts, we introduce a method that utilises the layer-based attention scores of transformers to filter long inputs down to a manageable length. This approach proves efficient for scientific paper topic classification and verdict label prediction, both of which are critical for effective fact-checking.
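To illustrate the attention-based filtering idea, the sketch below scores the sentences of a long document by the attention their tokens receive in a transformer encoder and keeps only the top-ranked ones. It is a minimal sketch under assumed choices (the bert-base-uncased encoder, the last attention layer, and mean attention received as the sentence score); it is not the exact configuration developed in the thesis.

```python
# Illustrative sketch: rank sentences of a long document by the attention
# their tokens receive in a transformer encoder, then keep the top-ranked
# sentences so the filtered text fits downstream input limits.
# Model choice, layer, and scoring heuristic are assumptions for illustration.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

def filter_sentences(sentences, keep_ratio=0.5, layer=-1):
    """Keep the sentences whose tokens receive the most attention."""
    text = " ".join(sentences)
    enc = tokenizer(text, return_tensors="pt", truncation=True,
                    max_length=512, return_offsets_mapping=True)
    offsets = enc.pop("offset_mapping")[0]            # (seq, 2) character spans
    with torch.no_grad():
        out = model(**enc)
    # Average the chosen layer's attention over heads, then sum the attention
    # each token *receives* from every other token (column sums).
    attn = out.attentions[layer][0].mean(dim=0)       # (seq, seq)
    received = attn.sum(dim=0)                        # (seq,)

    # Character span of each sentence inside the joined text.
    spans, start = [], 0
    for s in sentences:
        spans.append((start, start + len(s)))
        start += len(s) + 1                           # +1 for the joining space

    # Score a sentence as the mean attention received by its tokens.
    scores = []
    for lo, hi in spans:
        idx = [i for i, (a, b) in enumerate(offsets.tolist())
               if a >= lo and b <= hi and b > a]      # (0, 0) = special tokens
        scores.append(received[idx].mean().item() if idx else 0.0)

    k = max(1, int(len(sentences) * keep_ratio))
    top = sorted(range(len(sentences)), key=scores.__getitem__, reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]        # preserve original order
```

The filtered sentences can then be concatenated and passed to a standard classifier, for example for topic classification or verdict label prediction, without exceeding the model's input limit.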
Regarding dataset challenges, we address the expertise bottleneck that limits the availability of appropriate training data for scientific fact-checking. We propose Multi2Claim, a pipeline for automatically converting multiple-choice questions into fact-checking data, and use it to create two large-scale datasets: Med-Fact for the medical domain and Gsci-Fact for general science. These datasets are significant contributions, being among the first large-scale scientific fact-checking datasets. Baseline models developed on each dataset show promising results, with performance improvements of up to 26% on existing fact-checking datasets such as SciFact, HEALTHVER, COVID-Fact, and CLIMATE-FEVER.

In conclusion, the methodologies proposed in this thesis advance scientific fact-checking by addressing both modelling intricacies and dataset challenges, offering a promising step towards more accurate and effective systems to combat misinformation in scientific domains.
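As a concrete illustration of the question-to-claim conversion underlying Multi2Claim, the sketch below pairs a multiple-choice question stem with each answer option to produce claim–verdict examples. The field names, label set, and naive template rewriting are assumptions made for illustration; the pipeline proposed in the thesis is more involved than this single step.

```python
# Illustrative sketch: turn a multiple-choice question into claim/verdict
# pairs in the spirit of Multi2Claim. The fields, labels, and template-based
# rewriting are assumptions for illustration, not the thesis's exact pipeline.
from dataclasses import dataclass

@dataclass
class FactCheckExample:
    claim: str
    label: str  # "SUPPORTED" or "REFUTED"

def mcq_to_claims(question: str, options: list[str], correct: str) -> list[FactCheckExample]:
    """Combine a question stem with each answer option to form labelled claims."""
    stem = question.rstrip("?").strip()
    examples = []
    for option in options:
        claim = f"{stem} is {option}."   # naive template; real rewriting is harder
        label = "SUPPORTED" if option == correct else "REFUTED"
        examples.append(FactCheckExample(claim, label))
    return examples

# Example usage with a made-up medical multiple-choice item.
for ex in mcq_to_claims(
    "The vitamin whose deficiency causes scurvy",
    ["vitamin C", "vitamin D", "vitamin K"],
    correct="vitamin C",
):
    print(ex.label, "-", ex.claim)
```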