An AI system can predict thousands of 3D structures of chromatin – the thread-like mixture of DNA and proteins that is packed into chromosomes – in just minutes. The deep learning approach could speed up research into how different chromatin structures affect the way genes are expressed in individual cells, which is important for understanding genetic diseases and developing gene-editing treatments.1
Chromatin is one of the most complex materials in cells and enables the massive amount of DNA in the genome to fold up and fit into the nucleus of each cell. The building blocks of this genetic packaging material are called nucleosomes, which comprise sections of DNA that are wound around a core of proteins called histones, resembling beads on a string. Together, these bead-like structures form chromatin fibres that fold up tightly into chromosomes.
Folded chromatin structures are not random, however. They are determined by the genetic code and, in turn, they can control how a given gene is regulated. That’s because some genes are associated with special promoter and enhancer regions on the DNA that regulate gene expression based on how close or distant these regions are to one another. Crucially, the structure of chromatin can determine this distance and thus influence gene expression, giving rise to the different cell types in the body, from brain cells to skin cells, despite each cell containing the same genome.
However, as chromatin can form a huge variety of unpredictable 3D conformations, understanding how they influence genetic function has been a long-standing challenge. Previously, high-throughput sequencing and microscopic imaging technologies had revealed that even cells of the same type can have widely different chromatin structures. Methods have been developed over the past 20 years to investigate the diversity of these structures but they have required labour-intensive and time-consuming experiments, taking around a week to determine just a few chromatin structures from a single cell.
‘Studying individual cells is crucial to learn how these structures form and how they affect gene expression,’ explains Bin Zhang, who developed the new AI system with his colleagues at the Massachusetts Institute of Technology, US. ‘Recent advances in deep learning and generative AI, particularly the success of AlphaFold in protein structure prediction, motivated us to explore these methods for chromatin structure prediction.’
Training montage
Zhang and his colleagues designed a generative AI model called ChromoGen to quickly ‘read’ DNA sequences and predict the chromatin structures that might form. ‘In this way, we hope it will provide the data necessary to answer some of the important questions relating chromatin structure and gene expression,’ says Zhang.
To develop the system, the team turned to diffusion modelling, a machine learning technique behind systems that turn text into artificially generated images. It has also found uses in predicting the 3D coordinates of ligands and protein molecules. During training, random noise is progressively added to the training data until they are corrupted, and the model learns to reverse that corruption step by step. Once trained, the model can start from pure noise and remove it gradually to generate new data that are realistic alternatives to the originals.
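The noising idea can be illustrated with a minimal Python sketch. It is a generic toy version of diffusion, not ChromoGen's code: the linear noise schedule, the step count, the 64-bead random 'structure' and all function names are assumptions chosen for illustration. In a real model a trained neural network would estimate the noise; here the true noise is passed back in simply to show that the corruption step is reversible once the noise is known.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Fraction of the original signal left after each step (assumed linear schedule)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(x0, t, alpha_bar, rng):
    """Forward (corrupting) process: blend clean coordinates with Gaussian noise."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

def estimate_clean(xt, eps_hat, t, alpha_bar):
    """Undo the corruption given a noise estimate (a trained network would supply eps_hat)."""
    return (xt - np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alpha_bar[t])

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()

# Hypothetical toy "structure": 64 beads with random 3D coordinates.
x0 = rng.standard_normal((64, 3))

# Corrupt the structure halfway through the schedule, then reverse the step
# using the true noise as a stand-in for a learned prediction.
xt, eps = add_noise(x0, 500, alpha_bar, rng)
x0_rec = estimate_clean(xt, eps, 500, alpha_bar)
```

Generating a genuinely new structure would repeat the denoising over many steps, starting from pure noise and using the network's noise estimate at each step.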
To develop ChromoGen, the team trained a deep learning model with over 11 million known 3D genome structures – a dataset that was obtained in 2018 using conventional cell-based experiments.2 The model was then also taught how to ‘read’ genomic DNA sequences to establish associations between chromatin structures and the underlying sequences that encode them.
This machine learning model was then combined with an AI diffusion model, which generated many predicted chromatin structures for a single cell type based on sequence data, effectively revealing sequence–structure relationships. The researchers confirmed that the generated structures were the same as or close to those previously observed in real experimental data.
The team explains that the system can generate a thousand structures for a particular region of DNA in 20 minutes on just one GPU, far outstripping the speed of existing methods. ‘We are optimising ChromoGen to enhance its efficiency in predicting chromatin structures over longer regions,’ says Zhang. ‘Our long-term objective is to identify novel DNA sequence motifs involved in chromatin folding and elucidate the functional implications of 3D genome organisation.’
‘This work is an important step in bringing the realm of genomic spatial interactions closer to their actual three-dimensional representation and it outlines the key driving role of the underlying DNA sequence,’ comments Aleksandr Sahakyan, who uses AI in genomics at the University of Oxford. ‘With this line of scientific contribution, we can already predict that the problem of genome folding will soon be solved just as the problem of protein folding is through AlphaFold.’
References
1 G Schuette et al, Sci. Adv., 2025 (DOI: 10.1126/sciadv.adr8265)
2 L Tan et al, Science, 2018, 361, 924 (DOI: 10.1126/science.aat5641)