Three-dimensional genome organization plays essential roles in all DNA-templated processes, including gene transcription, gene regulation, and DNA replication. Computer simulation can be an effective way to build structural models of the genome and improve our understanding of these molecular processes. It faces significant challenges, however. First, the human genome consists of over 6 billion base pairs, a system size that exceeds the capacity of traditional simulation approaches, even on the most powerful supercomputers. Second, the set of molecular interactions that folds the genome is complex, with a wide array of proteins mediating contacts between DNA segments. This complexity places high demands on the chemical accuracy of force fields. Finally, the genome is inherently a non-equilibrium system, and one must go beyond conventional sampling techniques to account for the impact of ATP-driven molecular motors on its organization. We tackle these challenges by bringing together statistical mechanics, computational modeling, and bioinformatics analysis to develop new methodologies that can accurately model the genome across length scales.
In one research direction, we are characterizing genome organization at near-atomistic resolution. Our simulation approach goes beyond existing phenomenological models by providing a much more accurate description of protein-protein and protein-DNA interactions. It is helping us resolve the long-standing controversy over the most stable organization of a string of nucleosomes and whether regular 30 nm fibers exist in situ. Our innovations here lie in developing force fields and sampling techniques that ensure both the chemical accuracy and the computational efficiency of chromatin modeling. These two features are essential for systematically investigating how chromatin stability depends on ionic concentration, nucleosome repeat length, DNA sequence, and histone modifications.
Because of its high computational cost, the first-principles approach outlined above is limited to studying a single gene. To simulate the whole genome, which contains over 20,000 genes, we are developing computational approaches inspired by statistical mechanics, polymer simulations, and, more recently, deep learning. One particular direction is what we call the data-driven mechanistic-modeling approach, which enables high-resolution reconstruction of three-dimensional genome organization. Importantly, it makes possible mechanistic studies of the causal relationships between genome organization and genetic and epigenetic marks. Meanwhile, we are working closely with experimental groups to identify the key molecular players that organize the genome and to investigate the role of genome misfolding in tumorigenesis.
Chromatin is inherently an active system. Its conformational dynamics are subject to perturbations from ATP-driven enzymes that break detailed balance. In addition, post-translational modifications of histone proteins undergo constant turnover, resulting in a time-dependent Hamiltonian. We are developing analytical approaches to address the impact of these non-equilibrium activities on chromatin organization, with a focus on perturbation theories that approximate non-equilibrium steady states with effective equilibrium models.