In eukaryotic organisms, DNA replication is initiated at “origins,” launching “forks” that spread bidirectionally to replicate the genome. The distribution and firing rate of these origins, together with the fork progression velocity, form the “replication program.” With Antoine Baker, I generalize a stochastic model of DNA replication to allow for space and time variations in origin-initiation rates, characterized by a function I(x,t). We then address the inverse problem of inferring I(x,t) from experimental data on replication in cell populations. Previous work based on curve fitting depended on arbitrarily chosen functional forms for I(x,t), with free parameters that were constrained by the data. We introduce a model-free, non-parametric method of inference based on Gaussian process regression, a well-known inference technique from the machine-learning community. The method replaces specific assumptions about the functional form of the initiation rate with more general prior expectations about how smoothly this rate varies along the genome and in time. Using this inference method, we can recover simulated replication programs from data typical of current experiments, without having to know or guess the functional form of the initiation rate I(x,t). I will argue that Gaussian process regression has many other potential applications in physics.
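
To illustrate the general idea of Gaussian process regression mentioned above, here is a minimal sketch in Python. It is not the inference scheme of the talk: it assumes direct, noisy observations of a one-dimensional function, whereas the talk concerns inferring a two-dimensional rate I(x,t) from replication data. The squared-exponential kernel, the toy function `true_rate`, and all parameter values (`length_scale`, `amplitude`, `noise`) are assumptions chosen for illustration only.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, amplitude=1.0):
    """Squared-exponential covariance between 1D point sets a and b.

    The length scale encodes the smoothness prior: no functional form
    is assumed for the underlying curve.
    """
    diff = a[:, None] - b[None, :]
    return amplitude**2 * np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_test,
                 length_scale=1.0, amplitude=1.0, noise=0.1):
    """Posterior mean and pointwise variance of a zero-mean GP."""
    K = rbf_kernel(x_train, x_train, length_scale, amplitude)
    K += noise**2 * np.eye(len(x_train))          # observation noise
    K_s = rbf_kernel(x_train, x_test, length_scale, amplitude)
    K_ss_diag = np.full(len(x_test), amplitude**2)  # k(x, x) for the RBF kernel

    # Cholesky factorization for a numerically stable solve of K^{-1} y.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha

    v = np.linalg.solve(L, K_s)
    var = K_ss_diag - np.sum(v**2, axis=0)
    return mean, var

# Hypothetical smooth "rate" profile, used only to generate toy data.
def true_rate(x):
    return 1.0 + np.sin(x)

rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 10.0, 40)
y_train = true_rate(x_train) + 0.1 * rng.standard_normal(x_train.size)

x_test = np.linspace(0.0, 10.0, 200)
mean, var = gp_posterior(x_train, y_train, x_test)

max_error = np.max(np.abs(mean - true_rate(x_test)))
print(f"max abs error of posterior mean: {max_error:.3f}")
```

The only prior input here is the kernel, which expresses how smoothly the function is expected to vary; the posterior variance `var` gives a pointwise uncertainty on the reconstruction. In practice the kernel hyperparameters would themselves be estimated from the data rather than fixed by hand as they are in this sketch.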


Talk Number PIRSA:13120017