During her 2018 Nobel Prize lecture in chemistry, Frances Arnold said: “Today we can for all practical purposes read, write and edit any sequence of DNA, but we cannot compose it.” That is no longer true.
Since then, science and technology have advanced so much that artificial intelligence has learned to compose DNA, and with genetically modified bacteria, scientists are well on their way to designing and manufacturing proteins to order.
The goal is that with AI's design talents and gene editing's engineering abilities, scientists can modify bacteria to act as mini-factories producing new proteins that can reduce greenhouse gases, digest plastics or act as species-specific pesticides.
As a chemistry professor and computational chemist who studies molecular science and environmental chemistry, I believe that advances in artificial intelligence and gene editing make this a realistic possibility.
Gene sequencing – reading life's recipes
All living things contain genetic material – DNA and RNA – that provides the hereditary information needed for replication and protein production. Proteins constitute 75% of human dry weight. They make up muscles, enzymes, hormones, blood, hair and cartilage. Understanding proteins means understanding much of biology. The order of the nucleotide bases in DNA, or in RNA in some viruses, encodes this information, and genome sequencing technologies identify the order of these bases.
The Human Genome Project was an international effort that sequenced the entire human genome between 1990 and 2003. Thanks to rapidly improving technologies, it took seven years to sequence the first 1 percent of the genome, and another seven years to sequence the remaining 99 percent. By 2003, scientists had the complete sequence of the 3 billion nucleotide base pairs coding for 20,000 to 25,000 genes in the human genome.
However, understanding the function of most proteins and correcting their malfunctions remained a challenge.
AI learns proteins
The shape of each protein is crucial to its function and is determined by its amino acid sequence, which in turn is determined by the nucleotide sequence of the gene. Misfolded proteins have the wrong shape and can cause illnesses such as neurodegenerative diseases, cystic fibrosis and type 2 diabetes. Understanding these diseases and developing treatments requires knowledge of protein shapes.
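The step from nucleotide sequence to amino acid sequence can be illustrated with a short sketch. It assumes the standard genetic code, a DNA coding strand read from its first base, and translation that stops at the first stop codon; the function name `translate` and the example sequence are made up for illustration.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA coding sequence into one-letter amino acid codes."""
    dna = dna.upper()
    protein = []
    # Read the sequence three bases (one codon) at a time.
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")  # "X" = unrecognized codon
        if amino_acid == "*":  # stop codon ends the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCTGA"))  # prints "MA": methionine, alanine, then stop
```

Changing a single base can change the amino acid at that position, which is one way a small change in a gene can alter the resulting protein.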
Before 2016, the only way to determine a protein’s shape was through X-ray crystallography, a laboratory technique that uses X-ray diffraction on single crystals to precisely determine the three-dimensional arrangement of atoms and molecules within a crystal. By that time, the structures of approximately 200,000 proteins had been determined using crystallography, at a cost of billions of dollars.
AlphaFold, a machine learning program, used these crystal structures as a training set to predict the shape of proteins from their nucleotide sequences. In less than a year, the program calculated the protein structures of all 214 million genes that had been sequenced and published. All protein structures predicted by AlphaFold have been made available in a publicly accessible database.
To effectively address non-communicable diseases and develop new drugs, scientists need more detailed knowledge of how proteins, especially enzymes, bind small molecules. Enzymes are protein catalysts that enable and regulate biochemical reactions.
AlphaFold3, released on May 8, 2024, can predict the shapes of proteins and the locations where small molecules can bind to those proteins. In rational drug design, drugs are designed to bind proteins involved in a pathway related to the disease being treated. Small-molecule drugs bind to the protein binding site and modulate its activity, thereby influencing the disease pathway. With the ability to predict protein binding sites, AlphaFold3 will enhance researchers’ drug discovery capabilities.
AI + CRISPR = composing new proteins
Around 2015, the development of CRISPR technology revolutionized gene editing. CRISPR can be used to find a specific part of a gene, change or remove it, cause a cell to express more or less of the gene product, or even add an entirely foreign gene in its place.
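How a CRISPR system "finds" a specific part of a gene can be sketched in code. The example below assumes the commonly used Cas9 enzyme from Streptococcus pyogenes, which targets a roughly 20-base stretch of DNA that is immediately followed by a short "NGG" motif (any base, then two guanines). The function name `find_cas9_sites` is hypothetical, and the sketch scans only one strand and ignores everything else that matters in real guide design, such as off-target matches.

```python
import re


def find_cas9_sites(dna, guide_len=20):
    """Return (position, target, motif) for each candidate Cas9 site.

    A candidate is `guide_len` bases followed directly by an NGG motif.
    Only the given strand is scanned; this is an illustrative simplification.
    """
    dna = dna.upper()
    # The lookahead (?=...) lets overlapping candidates all be reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]


# Made-up sequence: 20 bases followed by the motif "TGG".
example = "GATTACAGATTACAGATTAC" + "TGG"
for position, target, motif in find_cas9_sites(example):
    print(position, target, motif)
```

In practice, researchers choose among such candidates using dedicated design tools; the point here is only that the search for a target is, at its core, a pattern match on the DNA sequence.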
In 2020, Jennifer Doudna and Emmanuelle Charpentier were awarded the Nobel Prize in Chemistry “for the development of a method (CRISPR) for genome editing.” Thanks to CRISPR, gene editing that once took years and was species-specific, costly and labor-intensive can now be done in days and at a fraction of the cost.
Artificial intelligence and genetic engineering are developing rapidly. What was once complicated and expensive is now routine. Looking to the future, the dream is of proteins custom-designed and manufactured through a combination of machine learning and CRISPR-engineered bacteria. The artificial intelligence would design the proteins, and the CRISPR-altered bacteria would produce them. Enzymes produced in this way could potentially breathe in carbon dioxide and methane while exhaling organic raw materials, or break down plastics into substitutes for concrete.
I don’t think these ambitions are unrealistic, considering that genetically modified organisms already make up 2 percent of the U.S. economy in agriculture and pharmaceuticals.
Two groups have created functioning enzymes from scratch that were designed by different artificial intelligence systems. David Baker’s Institute for Protein Design at the University of Washington developed a new protein design strategy based on deep learning, which it called “family-wide hallucination,” and used it to produce a unique light-emitting enzyme. Meanwhile, biotech startup Profluent has used artificial intelligence trained on the sum of CRISPR-Cas knowledge to design new functioning genome editors.
If AI can learn to create new CRISPR systems and bioluminescent enzymes that work and have never been seen on Earth, the hope is that combining CRISPR with AI could be used to design other new, tailored enzymes. Although the CRISPR-AI combination is still in its infancy, once it matures, it is likely to be very beneficial and may even help the world tackle climate change.
However, it is important to remember that the more powerful the technology, the greater the risk it poses. Additionally, humans have not been very successful in engineering nature due to the complexity and interconnectedness of natural systems, which often leads to unintended consequences.