Abstract:
Two-phase experiments arise when the treatment effects of interest in an experiment (Phase 1) cannot be measured directly and the material from the experiment requires further processing (Phase 2 experiment) in order for these effects to be evaluated. Two designs are needed for these experiments, one for each phase. Since the allocation of experimental units from the Phase 1 experiment to blocks in the Phase 2 experiment results in the block effects from the two phases interacting with one another, the Phase 2 design must be constructed with reference to the Phase 1 design. Except for very small experiments, how this should be done in an optimal way is non-trivial. Theoretical analysis of variance (ANOVA) tables, showing the decomposition of the total variability in the data space into its constituent components of known sources of variation, and their corresponding degrees of freedom, are shown to play an important role in assessing the properties of competing designs for two-phase experiments. However, generating these ANOVA tables is a laborious manual task, even for relatively small two-phase experiments. To automate this process, an R package called infoDecompuTE was developed and is available on the Comprehensive R Archive Network. All of the ANOVA tables presented throughout this thesis were generated by infoDecompuTE. While the theoretical ANOVA tables are an important tool in assessing the properties of competing designs, the manual generation of optimal designs for two-phase experiments is non-trivial, particularly for non-orthogonal designs. Thus, a fundamental component of this thesis is the development of methodologies for the computer generation of designs for two-phase experiments. 
A combination of theory, to derive multi-criterion objective functions, and computing, in which a modified simulated annealing algorithm is developed, is used to identify A-optimal designs for the Phase 2 experiment when the Phase 1 experiment is arranged in either a completely randomised, a randomised complete block or a balanced incomplete block design. Optimal designs for a range of design parameters for both the Phase 1 and Phase 2 experiments are catalogued in the appendices of this thesis, as are summary tables of their properties. Data simulations were carried out to explore how well the variances of treatment effects are estimated among competing Phase 2 designs. For this, the effective degrees of freedom (EDF) for estimating the error variance, using Satterthwaite's approximation, were calculated using two methods of variance component estimation, namely taking linear combinations of expected mean squares and restricted maximum likelihood. The two methods of variance component estimation were found to have little effect on the EDF. However, the simulation studies were shown to be informative with respect to the preferred choice between two competing designs when the relative magnitudes of the variance components are known. While the motivating examples in this thesis come from proteomics experiments, whose goal is to link the identities and abundances of proteins in a biological sample to different experimental conditions (treatments), the methods presented in this thesis apply more generally across a wide range of biological, and other, experiments.
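The following is a minimal illustrative sketch of Satterthwaite's approximation. It is not code from the thesis or from infoDecompuTE. It assumes the error variance is estimated as a linear combination sum_i c_i * MS_i of independent mean squares, where MS_i has nu_i degrees of freedom. The approximation then gives EDF = (sum_i c_i MS_i)^2 / sum_i ((c_i MS_i)^2 / nu_i). The function name `satterthwaite_edf` and the numbers in the example are hypothetical.

```python
from typing import Sequence


def satterthwaite_edf(
    coefs: Sequence[float],
    mean_squares: Sequence[float],
    dfs: Sequence[float],
) -> float:
    """Satterthwaite effective degrees of freedom for sum_i coefs[i] * mean_squares[i].

    Hypothetical helper for illustration; assumes independent mean squares,
    each with the stated degrees of freedom.
    """
    if not (len(coefs) == len(mean_squares) == len(dfs)):
        raise ValueError("coefs, mean_squares and dfs must have equal length")
    # Each weighted mean square c_i * MS_i contributes to the estimate.
    terms = [c * ms for c, ms in zip(coefs, mean_squares)]
    numerator = sum(terms) ** 2
    # Approximate variance contribution of each term, scaled by its df.
    denominator = sum(t ** 2 / df for t, df in zip(terms, dfs))
    return numerator / denominator


# Hypothetical example: estimate 0.5*MS1 + 0.5*MS2 with MS1 = 4 (6 df), MS2 = 2 (12 df).
edf = satterthwaite_edf([0.5, 0.5], [4.0, 2.0], [6, 12])
print(round(edf, 6))  # 12.0
```

With a single mean square the formula returns that mean square's own degrees of freedom. Combining strata generally yields a non-integer EDF that depends on the relative magnitudes of the variance components.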