A research team from the LKS Faculty of Medicine at The University of Hong Kong (HKUMed) has created more effective CRISPR-Cas9 variants that could be used in gene therapy. The team developed a new pipeline that applies machine learning to high-throughput screening to accurately predict the activity of protein variants. With it, they can analyze up to 20 times more protein variants at once without collecting additional experimental data, which significantly speeds up protein engineering.
The pipeline has been applied successfully to several Cas9 optimizations and has produced new, more effective Staphylococcus aureus Cas9 (SaCas9) variants. A patent application has been filed based on this research, and the results were recently published in Nature Communications.
Because of its small size, SaCas9 is an excellent candidate for in vivo gene therapy: it can be packaged into adeno-associated viral vectors and delivered into human cells for therapeutic purposes. Its gene-editing activity may fall short at some disease loci, though.
Further optimization of SaCas9 is essential before it can be used in precision medicine as a reliable tool for treating human diseases. These improvements must include modifying the Cas9 protein to increase its efficiency and precision. The conventional approach to modifying the protein is saturation mutagenesis, in which the number of possible modifications is orders of magnitude greater than the screening capacity of even the most advanced high-throughput platforms.
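To give a sense of the scale involved, the short Python sketch below counts how many variants arise under two simple assumptions: every chosen position may hold any of the 20 standard amino acids, and SaCas9 is taken to be 1,053 residues long. The function names are hypothetical and the numbers are illustrative arithmetic, not figures reported by the study.

```python
from math import comb

AMINO_ACIDS = 20          # standard amino acids
SACAS9_LENGTH = 1053      # assumed SaCas9 length in residues (illustrative)


def library_size(num_positions: int) -> int:
    """Variants when each of num_positions residues may be any amino acid."""
    return AMINO_ACIDS ** num_positions


def variants_with_k_substitutions(protein_length: int, k: int) -> int:
    """Variants that differ from the wild type at exactly k positions."""
    # Choose which k positions change, then pick one of the 19
    # non-wild-type amino acids at each of them.
    return comb(protein_length, k) * (AMINO_ACIDS - 1) ** k


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(f"{n} fully randomized positions: {library_size(n):,} variants")
    for k in (1, 2, 3):
        count = variants_with_k_substitutions(SACAS9_LENGTH, k)
        print(f"exactly {k} substitution(s) anywhere: {count:,} variants")
```

Even a handful of fully randomized positions yields a library in the billions, which is why exhaustive experimental screening quickly becomes impractical.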
The researchers investigated whether combining machine learning with structure-guided mutagenesis library screening could enable the virtual screening of many more modifications, so that the rare, better-performing variants could be pinpointed for further in-depth validation. The team tested the machine learning framework on several previously published mutagenesis screens of Cas9 variants and showed that it could reliably identify the top-performing variants using just 5 to 20 percent of the experimentally determined data.
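The idea of training on a small measured subset and ranking the rest can be sketched in a few lines of Python. This is a minimal illustration, not the team's pipeline: the library (4 positions with 6 options each), the synthetic "activity" values, and the ridge regression model are all assumptions chosen to keep the example self-contained, and every function name is hypothetical.

```python
import itertools
import numpy as np

N_POSITIONS = 4   # hypothetical number of mutated residues
N_OPTIONS = 6     # hypothetical number of amino-acid choices per residue


def build_library():
    """Enumerate every combination as a row of option indices."""
    combos = itertools.product(range(N_OPTIONS), repeat=N_POSITIONS)
    return np.array(list(combos))


def one_hot(library):
    """Encode each variant as concatenated one-hot blocks, one per position."""
    n = len(library)
    X = np.zeros((n, N_POSITIONS * N_OPTIONS))
    for pos in range(N_POSITIONS):
        X[np.arange(n), pos * N_OPTIONS + library[:, pos]] = 1.0
    return X


def simulate_activity(X, rng, noise=0.1):
    """Synthetic stand-in for measured activity: additive effects plus noise."""
    effects = rng.normal(size=X.shape[1])
    return X @ effects + rng.normal(scale=noise, size=X.shape[0])


def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression; the intercept is the mean of y."""
    y_mean = y.mean()
    A = X.T @ X + lam * np.eye(X.shape[1])
    w = np.linalg.solve(A, X.T @ (y - y_mean))
    return w, y_mean


def virtual_screen(train_fraction=0.2, top_k=10, shortlist=50, seed=0):
    """Train on a measured subset, rank the whole library, report recall."""
    rng = np.random.default_rng(seed)
    library = build_library()
    X = one_hot(library)
    y_true = simulate_activity(X, rng)

    # Pretend only a fraction of the library was measured experimentally.
    n_measured = int(round(train_fraction * len(library)))
    measured = rng.choice(len(library), size=n_measured, replace=False)

    w, intercept = fit_ridge(X[measured], y_true[measured])
    y_pred = X @ w + intercept

    # How many of the truly best variants land in the predicted shortlist?
    true_top = set(np.argsort(y_true)[-top_k:].tolist())
    predicted = set(np.argsort(y_pred)[-shortlist:].tolist())
    recall = len(true_top & predicted) / top_k
    return n_measured, len(library), recall


if __name__ == "__main__":
    n_measured, n_total, recall = virtual_screen()
    print(f"measured {n_measured} of {n_total} variants")
    print(f"share of true top variants in the predicted shortlist: {recall:.2f}")
```

Because the synthetic activities here are purely additive, a linear model fits them easily; real variant libraries can involve interactions between mutations, so the recall seen in this toy setting says nothing about performance on actual screening data.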
The protospacer adjacent motif (PAM)-interacting (PI) and Wedge (WED) domains are two of the Cas9 protein's many domains that help it engage with the target DNA duplex. PAM is essential for Cas9 to edit the target DNA. The research team combined mutations in the PI and WED domains of SaCas9, which surround the PAM-containing DNA duplex, to build an activity-enhanced SaCas9 protein. The aim was to relax the PAM constraint for wider genome targeting while maintaining the protein structure by strengthening the interaction with the PAM-containing DNA duplex via the WED domain.
In the screen and subsequent validations, the researchers identified new variants, including one named KKH-SaCas9-plus, whose activity was improved by up to 33 percent at particular genomic loci. Subsequent protein modeling attributed the improved efficiency of KKH-SaCas9-plus to new interactions between the WED and PI domains, formed at multiple locations within the PAM-containing DNA duplex.
Structure-guided design has dominated Cas9 engineering until now. It explores only a small number of sites, amino-acid residues, and combinations, though. In this study, the research team showed that a machine learning-coupled, multi-domain combinatorial mutagenesis screening approach allows screening on a larger scale with less experimental work, time, and cost. This approach also allowed them to discover a new high-efficiency variant, KKH-SaCas9-plus.
According to the assistant professor of the School of Biomedical Sciences at HKUMed, this method will significantly speed up the optimization of Cas9 proteins, which could enable genome editing to treat genetic diseases more efficiently.