Prediction of Exon Position in Plant Genomes from RNA-Seq using Multi-Layer Perceptrons

Degree Grantor

The University of Auckland

Abstract

Gene annotation remains a significant, non-trivial problem in bioinformatics. Its two main aspects are identifying gene positions within genomes and identifying each gene’s biological function. In this thesis, we use experimental and simulated RNA-Seq (ribonucleic acid sequencing) data from Arabidopsis thaliana and experimental RNA-Seq data from the kiwifruit variety Actinidia chinensis, each aligned to its respective host genome, to train linear predictor models and multi-layer perceptron (MLP) artificial neural networks (ANNs) to identify exon regions within 99 Arabidopsis thaliana genes and 99 Actinidia chinensis "Red5" genes. These genes were selected to represent a broad range of exon structure complexities, categorised as high, medium, or low. We conclude that linear predictor models are not generally capable of accurately predicting exon regions within the genes used in this study. By contrast, MLP-based ANN models perform well in our experiments, though the optimal architecture, an MLP with one or two hidden layers, depends on the exon structure complexity of the genes being predicted. Finally, the RNA-Seq data produced by the simulation method used in this work are not sufficient to train the ANN models to accurately recognise exon regions in the genes used in this study.

Description

Full Text is available to authenticated members of The University of Auckland only.
