Abstract:
Reconstructing the phylogeny of somatic evolution is challenging because of limited resources,
confounding biological factors, and technical limitations. Many methods concentrate on the
analysis of single-cell sequencing data. Although generally more accurate than bulk sequencing,
this approach can be limited by its computational complexity and potential technical artefacts.
We integrate two simulation software tools to model evolutionary histories
among multiple metastatic sites within a patient. The first tool generates
phylogenetic trees that represent the true evolutionary histories at both
the multi-tumour and the individual-cell level. The second tool then generates single-cell
tumour sequences based on the true cell-cell trees. Combining the two tools assigns each
cell to a specific tumour, so that cells are simulated under a structured population.
Our study evaluates the viability of pooling single-cell data into consensus sequences by comparing
how accurately they reconstruct the multi-tumour tree relative to pseudobulk data. This
approach addresses the challenge of obtaining multi-tumour-level evolutionary histories from
single-cell data. We aim to provide guidance for researchers choosing a sequencing analysis
method or seeking to trace multi-tumour evolution.
Under various biological conditions, we simulate single-cell tumour sequences from a predefined
multi-tumour tree and its corresponding cell-cell tree replicates. We construct consensus
sequences by pooling cell sequences that share a tumour lineage. For comparison,
we also construct pseudobulk data. Tree distances between the initial trees and the
reconstructed trees show that consensus sequences do not perform as well as a pseudobulk
dataset.
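
As an illustration of the pooling step, the following is a minimal Python sketch of one way to build a consensus sequence per tumour. It assumes a per-site majority vote over aligned cell sequences of equal length, with ties broken alphabetically; the abstract does not specify the consensus rule, so this rule, the treatment of missing characters, and the names consensus_sequence, pool_by_tumour, cells and tumour_of are all hypothetical.

```python
from collections import Counter


def consensus_sequence(cell_seqs, missing="N?-"):
    """Per-site majority vote across aligned, equal-length sequences.

    Assumption: majority rule with alphabetical tie-breaking; the source
    does not state which consensus rule was actually used.
    """
    if not cell_seqs:
        raise ValueError("need at least one sequence")
    length = len(cell_seqs[0])
    if any(len(seq) != length for seq in cell_seqs):
        raise ValueError("sequences must be aligned to the same length")

    consensus = []
    for column in zip(*cell_seqs):
        # Ignore missing or gap characters when counting bases at this site.
        counts = Counter(base for base in column if base not in missing)
        if not counts:
            consensus.append("N")  # no informative base at this site
            continue
        top = max(counts.values())
        # Break ties alphabetically so the result is deterministic.
        consensus.append(min(b for b, c in counts.items() if c == top))
    return "".join(consensus)


def pool_by_tumour(cells, tumour_of):
    """Group cell sequences by tumour label and return one consensus each."""
    groups = {}
    for cell_id, seq in cells.items():
        groups.setdefault(tumour_of[cell_id], []).append(seq)
    return {tumour: consensus_sequence(seqs) for tumour, seqs in groups.items()}


# Toy input: five cells assigned to two tumours (made-up data).
cells = {"c1": "ACGT", "c2": "ACGA", "c3": "ACTA", "c4": "GGGT", "c5": "GGNT"}
tumour_of = {"c1": "T1", "c2": "T1", "c3": "T1", "c4": "T2", "c5": "T2"}
pooled = pool_by_tumour(cells, tumour_of)
```

Each tumour is reduced to a single sequence, so the reconstructed tree has one tip per tumour and can be compared directly with the predefined multi-tumour tree.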
We explore the reconstruction of the true multi-tumour tree by integrating existing
data, compiled as sets of replicates. First, we perform tree reconstruction using a species
tree estimation method on single-cell data. Additionally, we explore supertree construction as
well as the use of concatenated sequences, leveraging pooled data. Through this analysis,
we find that reconstructing multi-tumour evolution from single-cell data directly, or from
pseudobulk data, yields the best results.