Abstract:
The advancement of neural machine translation has had a phenomenal impact on the field of
machine translation. As a typical neural machine translation architecture, the Transformer
achieves further improvements in both translation quality and convergence time, which
also benefit downstream tasks such as cross-lingual question answering (CLQA).
CLQA refers to answering questions in one language through a question answering (QA)
model trained in another language, where machine translation models can be used to
translate the original question and the QA model's output into the corresponding languages.
Previous works improve the accuracy of CLQA tasks by producing more precise translations. However, the machine translation and QA models are often used off-the-shelf,
and the improvements are generally made by crafting rule-based corrections or introducing
additional translation modules. Therefore, the effects of the machine translation and QA
models themselves remain underexplored. In addition, the amount of test data is limited by
the span-based answer type, where the answer is a span of text that summarises or is extracted from the corresponding document. Hence, a translated answer may be correct
but not identical to the expected one, which requires manual evaluation. Consequently,
the evaluation process is laborious and may introduce biases.
The present thesis studies the effects of machine translation and QA models on CLQA by
training both kinds of models using publicly accessible data. The English–Chinese language
pair is used in this project to access a large variety of training data, and the span-based
QA tasks are replaced with multiple-choice QA tasks to address the evaluation issue.
Finally, this thesis empirically studies 24 machine translation models and 6 QA models.
The experimental results suggest that both the machine translation and QA models
significantly affect the accuracy of CLQA tasks, and that the translation model's domain
plays a more dominant role than its translation quality.