Understanding genome function is one of the grand scientific challenges of the 21st century. This challenge lies not only in the structural and spatial complexity of the genome’s organization, resulting from the coordinated action of many different components, but also in the dynamical complexity of genome organization, encompassing processes that occur over different time scales, ranging from nanoseconds to hours, to exquisitely regulate the function of the genome.

By developing novel theoretical and computational approaches, we aim to build a global framework for the human genome that connects its sequence with its structure and activity, and to enable quantitative and predictive modeling of genome structure and function.

Information theoretic study of genome structure and dynamics

Genome-wide chromosome conformation capture (Hi-C) experiments provide a rich set of high-resolution data on the spatial organization of the genome. Through chemical cross-linking and high-throughput sequencing, Hi-C experiments determine the frequency at which any given pair of genomic loci, each kilobases in length, come into close spatial proximity inside the cell nucleus. Because these experiments probe the genome in its native environment, structural models that incorporate the Hi-C-derived contacts between loci provide a more detailed and realistic 3D representation of the genome. We developed a maximum entropy approach to derive information theoretic energy landscapes of the chromosome that rigorously reproduce the contact probabilities measured in Hi-C. This approach has been applied to experimental data collected from both interphase and metaphase chromosomes, revealing many features of chromosome structure that were not obvious from direct analysis of the Hi-C contact map.
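
The general idea of fitting a maximum entropy energy function to measured contact probabilities can be illustrated with a small Python sketch. This is a toy illustration under stated assumptions, not the implementation used in the work described above: the state space is a hypothetical set of on/off patterns for four candidate contacts, small enough to enumerate exactly, and the names `boltzmann_weights`, `fit_contact_energies`, `prior_energy`, and `alpha_true` are all invented for this example. The "experimental" target is synthetic, generated from known parameters so that the fit can be checked.

```python
import itertools
import numpy as np

def boltzmann_weights(energies):
    """Normalized Boltzmann probabilities for energies given in units of kT."""
    shifted = energies - energies.min()  # shift for numerical stability
    weights = np.exp(-shifted)
    return weights / weights.sum()

def fit_contact_energies(contacts, prior_energy, target, lr=0.5, n_iter=5000):
    """Fit one energy parameter per contact so that the ensemble-averaged
    contact probabilities match `target`.

    The maximum entropy distribution has the form
        P(x) ~ exp(-[E0(x) + sum_k alpha_k * f_k(x)]),
    where f_k(x) is 1 if contact k is formed in state x and 0 otherwise.
    """
    alpha = np.zeros(contacts.shape[1])
    for _ in range(n_iter):
        p = boltzmann_weights(prior_energy + contacts @ alpha)
        model = p @ contacts  # current contact probabilities <f_k>
        # A contact that forms too often gets a higher (more repulsive) energy.
        alpha += lr * (model - target)
    return alpha

# Hypothetical toy state space: every on/off pattern of 4 candidate contacts.
n_contacts = 4
contacts = np.array(list(itertools.product([0, 1], repeat=n_contacts)), dtype=float)

# Assumed reference energy: a cost per contact plus a cooperative term
# between neighboring contacts (arbitrary illustrative values).
prior_energy = (0.5 * contacts.sum(axis=1)
                - 0.4 * (contacts[:, :-1] * contacts[:, 1:]).sum(axis=1))

# Synthetic "experimental" contact probabilities from known parameters.
alpha_true = np.array([-1.0, 0.5, -0.3, 0.8])
target = boltzmann_weights(prior_energy + contacts @ alpha_true) @ contacts

alpha_fit = fit_contact_energies(contacts, prior_energy, target)
fitted = boltzmann_weights(prior_energy + contacts @ alpha_fit) @ contacts

print("target contact probabilities:", np.round(target, 4))
print("fitted contact probabilities:", np.round(fitted, 4))
print("recovered parameters:        ", np.round(alpha_fit, 4))
```

In this toy the ensemble averages are computed by exact enumeration. For a realistic polymer model of a chromosome the state space cannot be enumerated, so the averages would have to be estimated by sampling conformations instead.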

Multiscale modeling of chromatin fiber and protein complex assemblies

The maximum entropy approach described above enables high-resolution reconstruction of 3D genome structures without explicitly considering the underlying physical principles that organize the genome. To better understand these principles, and to reveal chromosome structures at a finer resolution on the order of tens of nucleosomes, advanced computational models that allow direct first-principles simulations are needed. We are developing protein-DNA models that are both computationally efficient enough to reach large system sizes and chemically accurate enough to describe specific protein-protein interactions.

Coupling genome structural dynamics with gene regulation

The structure and dynamics of the genome need to be investigated in the context of its activity to fully appreciate the driving forces of genome folding. As with proteins, the function of the genome is tightly coupled to its structure, i.e., its 3D organization. A prominent example is chromosome folding: the loops that form bring enhancers and promoters that are far apart in linear sequence (~1 Mb) close together in space for gene regulation. At a shorter length scale (~10 kb), the conformational dynamics of the two forms of chromatin, euchromatin and heterochromatin, can also play a role in gene regulation. We are developing novel theoretical approaches to incorporate genome structure and dynamics into gene network models, and to elucidate the interplay of epigenetic modification and transcription factor regulation in cellular differentiation and fate determination.