Abstract:
The advancement of neural machine translation has had a phenomenal impact on the field of
machine translation. As a typical neural machine translation architecture, the Transformer
achieves further improvements in both translation quality and convergence time, which
also benefit downstream tasks such as cross-lingual question answering (CLQA).
CLQA refers to answering questions in one language through a question answering (QA)
model trained in another language, where machine translation models can be used to
translate the original question and the QA model's output into the corresponding languages.
Previous works improve the accuracy of CLQA tasks by producing more precise translations. However, the machine translation and QA models are often used off-the-shelf,
and the improvements are generally made by crafting rule-based corrections or introducing
additional translation modules. Therefore, the effects of the machine translation and QA
models themselves remain underexplored. In addition, the amount of test data is limited by
the span-based answer type, where the answer is a span of text that summarises or is extracted from the corresponding document. Hence, a translated answer may be correct
but not identical to the expected one, which requires manual evaluation. Consequently,
the evaluation process is laborious and may introduce biases.
The present thesis studies the effects of machine translation and QA models on CLQA by
training both kinds of models using publicly accessible data. The English–Chinese language
pair is used in this project to access a large variety of training data, and the span-based
QA tasks are replaced with multiple-choice QA tasks to address the evaluation issue.
Finally, this thesis empirically studies 24 machine translation models and 6 QA models.
The experimental results suggest that both the machine translation and QA models
significantly affect the accuracy of CLQA tasks, and that the translation model's domain
plays a more dominant role than its translation quality.