A holistic understanding of cancer genomics to inform treatment and research
Abstract
Background: The growing volume of cancer genomic data has improved our understanding of cancer biology and laid the foundation for clinical precision oncology. The main limitation in the field now lies in analytical methods rather than in data generation, and there is currently no gold standard for analysing cancer next-generation sequencing data. We therefore need to better understand the limitations of available sequencing and bioinformatic analysis methods in order to improve the use of DNA and RNA sequence data for cancer research and patient care.

Objectives: To understand the validity and limitations of available bioinformatic methods, and then to apply this knowledge to advance our understanding of (i) the genomic landscape of pancreatic neuroendocrine tumours and (ii) the expression patterns and biological impact of alternative RNA splice variants of TP53.

Methods: To assess the validity of bioinformatic methods for studying tumours with few mutations and substantial aneuploidy, I used simulated data containing known DNA sequence variants to evaluate several somatic single nucleotide variant (SNV) calling methods and structural variant calling methods. To assess whether TP53 splice variants can be accurately quantitated by current genomic and bioinformatic methods, I then used simulated RNA sequencing (RNA-seq) data with known TP53 splice variant abundances and compared the results with gold-standard long digital PCR analysis.

Results: I identified a list of clinically important SNVs likely to be missed by commonly used somatic DNA variant calling methods, and determined the effectiveness and limitations of a range of methods for calling DNA structural variants. Analysis of combined clinicopathological and multi-omic data showed that pancreatic neuroendocrine tumours carry few somatic variants and are instead characterised by extensive aneuploidy, with the pattern of aneuploidy linked to patient outcome. I also showed that RNA-seq has difficulty quantitating low-abundance TP53 transcripts, especially those expressed at fewer than 1,000 copies/µg of RNA.

Conclusion: The distinct patterns of aneuploidy in pancreatic neuroendocrine tumours are strongly associated with patient outcome in two-thirds of the patients in our study, and are potentially valuable for clinical decision making. My identification of ‘false negative’ errors in SNV calling suggests that using variant calling software ‘out of the box’ may miss clinically important variants. Based on a detailed investigation of the accuracy of RNA-seq analysis of TP53 RNA splice variants, I recommend caution when interpreting sequencing-based analyses of TP53 RNA splicing, and suggest that this finding may generalise to other genes. The results of my thesis suggest that, as more bioinformatic methods are developed, there is a continuing need to evaluate them, understand their limitations and ensure that they are suited to the specific research questions being addressed.
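The benchmarking idea described under Methods, scoring a variant caller's output against simulated data in which the true variants are known, can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the pipeline used in this thesis: it assumes SNVs are already normalised and are matched by an exact (chromosome, position, reference, alternate) key, and the names `score_calls`, `missed_in_regions` and `GENE_A`, as well as all coordinates, are hypothetical and chosen only for illustration. False negatives (the ‘missed’ set) are what a list of clinically important variants overlooked by a caller would be drawn from.

```python
from __future__ import annotations

from typing import NamedTuple


class Variant(NamedTuple):
    """An SNV key: chromosome, 1-based position, reference base, alternate base."""
    chrom: str
    pos: int
    ref: str
    alt: str


def score_calls(truth: set[Variant], calls: set[Variant]) -> dict:
    """Score a caller's SNVs against a simulated truth set by exact key match."""
    tp = truth & calls   # spiked-in variants the caller recovered
    fn = truth - calls   # spiked-in variants the caller missed (false negatives)
    fp = calls - truth   # calls with no matching spiked-in variant
    recall = len(tp) / len(truth) if truth else 0.0
    precision = len(tp) / len(calls) if calls else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "tp": len(tp), "fp": len(fp), "fn": len(fn),
        "recall": recall, "precision": precision, "f1": f1,
        "missed": sorted(fn),
    }


def missed_in_regions(missed: list[Variant],
                      regions: dict[str, tuple[str, int, int]]) -> list[tuple[str, Variant]]:
    """Report missed variants inside hypothetical regions of interest,
    given as name -> (chrom, start, end) with inclusive 1-based coordinates."""
    hits = []
    for v in missed:
        for name, (chrom, start, end) in regions.items():
            if v.chrom == chrom and start <= v.pos <= end:
                hits.append((name, v))
    return hits


# Illustrative, made-up variants (not real clinical coordinates).
truth = {
    Variant("chr1", 100, "C", "T"),
    Variant("chr1", 250, "G", "A"),
    Variant("chr17", 500, "A", "G"),
    Variant("chr17", 900, "T", "C"),
}
calls = {
    Variant("chr1", 100, "C", "T"),
    Variant("chr17", 500, "A", "G"),
    Variant("chr2", 42, "G", "T"),
}

result = score_calls(truth, calls)
print(result["tp"], result["fp"], result["fn"], result["recall"])  # 2 1 2 0.5
print(missed_in_regions(result["missed"], {"GENE_A": ("chr17", 800, 1000)}))
```

Exact-key matching is a simplification: the same variant can be written in more than one way (particularly indels and multi-nucleotide changes), so dedicated benchmarking tools normalise variant representation before comparing. For isolated SNVs, a simple key is a reasonable starting point.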