Protein sequence design and high-throughput screening solution

Overview

Traditional Enzyme Directed Evolution

Obtaining improved enzyme variants by fully random mutagenesis, or by random mutagenesis at selected sites, combined with high-throughput screening typically requires multiple rounds of screening. Beyond the heavy screening workload, the approach has a fundamental problem: with 20 canonical amino acids, a protein of typical length (300 residues) has 20^300 possible sequences, far beyond any screening capacity. Even if only 10 sites are randomized with NNK codons, the library contains about 1.1 x 10^15 variants (32^10 codon combinations), which is still beyond screening capacity. Conventional directed evolution therefore samples only a tiny fraction of the possible mutants, so the probability of finding a large performance improvement is typically low.
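The library sizes quoted above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming an NNK codon contributes 4 x 4 x 2 = 32 codons per randomized site; the function names are illustrative and not part of any described pipeline.

```python
def protein_sequence_space(length: int, alphabet: int = 20) -> int:
    """Number of distinct protein sequences of the given length."""
    return alphabet ** length


def nnk_library_size(sites: int) -> int:
    """DNA-level diversity of an NNK library (N = A/C/G/T, K = G/T)."""
    codons_per_site = 4 * 4 * 2  # 32 possible NNK codons per site
    return codons_per_site ** sites


full_space = protein_sequence_space(300)
nnk_space = nnk_library_size(10)

# 20^300 is too large to print usefully, so report its number of digits.
print(f"20^300 has {len(str(full_space))} decimal digits")
print(f"NNK library at 10 sites: {nnk_space:.2e} codon combinations")
```

Python integers have arbitrary precision, so both values are computed exactly rather than as floating-point approximations.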

Machine learning-assisted enzyme engineering

A manageable amount of mutant activity data is used to train a machine learning model that predicts highly active mutants, so only a small number of predicted sequences need to be screened to obtain significantly improved mutants (DOI: 10.1038/s41592-021-01100-y). This greatly reduces screening workload and time and improves the efficiency of enzyme engineering and design.

Process

Creation of screening methods

What is the ideal screening method?

For enzyme engineering, the most important step is choosing a screening method that can distinguish high-activity mutants from low-activity mutants. Once that condition is met, the higher the throughput, the better.

Ideal screening method

As shown in the figure, starting from the original protein (WT), the output signal of the screening system increases markedly as enzyme activity increases, so mutants with larger gains in activity can be distinguished by their output signal.

Suboptimal screening method

As enzyme activity increases, the output signal rises only slightly. To achieve better screening results, the dynamic range of the output signal must be increased.

Suboptimal screening method

The output signal of the original protein is at a relatively high level, making it difficult to distinguish high-activity mutants. To achieve better screening results, it is necessary to shift the entire curve to the right.

Establishment of a high-throughput screening method for TnpB

High-throughput methods: Conventional high-throughput methods fall into two main types: screening (based on fluorescent reporter genes such as GFP) and selection (higher-activity mutants grow better, while low-activity mutants die). To obtain the large amount of data needed for machine learning, screening usually relies on flow cytometry sorting combined with high-throughput sequencing (FACS + NGS), whereas selection only requires conventional amplicon high-throughput sequencing, which is more convenient and cost-effective.

The report with DOI 10.1101/2023.09.18.558227 describes the construction of yeast nutritional mutants to screen for high-activity CjCas9 mutants. When Cas9 activity is high, the nutritional mutant phenotype is restored, allowing normal cell growth. For high-activity SpCas9, the output signal of a single nutritional mutant screen is already close to the upper limit, and the output signal of a double mutant is also relatively high, so this system is not suitable for screening high-activity SpCas9 mutants. For low-activity CjCas9, however, the output signal of a single nutritional mutant is sufficient for screening. The authors ultimately used double-mutant yeast to screen for high-activity CjCas9 and identified mutants with a 12-fold improvement in activity.

For TnpB, whose activity is lower than that of SpCas9, we chose to construct a yeast strain with triple nutritional defects for screening. By adjusting the nutritional components of the culture medium, we can screen for a range of activities from 0 to 2 defects. The strain was constructed using SpCas9 combined with homologous recombination.

Constructing screening strains using SpCas9 multiple editing

Screening principle:

Taking screening at the ADE2 site as an example: after active TnpB cuts the DNA, homologous recombination occurs between the HR sequences at both ends, forming an active ADE2 gene.

Evaluation of high-throughput screening methods for TnpB

1. Is there a corresponding relationship between output signal and enzyme activity under screening conditions?

We will compare the original TnpB with a highly active TnpB mutant. Since no highly active TnpB mutant is currently available, SpCas9, which has higher activity than TnpB, will be used as a substitute. For each, we will examine the proportion of viable cells in both nutrient-deficient medium and complete medium. The goal is to determine whether higher enzyme activity correlates with a higher proportion of viable cells.

2. Can highly active enzymes be enriched under screening conditions?

Machine learning-assisted enzyme design and engineering

Why do you need machine learning?

Traditional directed evolution (DE) techniques have clear limitations in improving protein performance. Although they can use fitness landscape data to guide improvement, they sample only a small portion of the sequences in the fitness landscape. Traditional DE methods also often overlook spatial factors such as active sites and substrate pockets and focus solely on accumulating single-point effects, which can easily lead to local optima. In contrast, machine learning (ML) approaches can exploit spatial information when learning the mapping between sequence and fitness, enabling a broader search. ML methods can therefore significantly improve screening efficiency and provide more effective routes to enhanced protein performance, with more comprehensive and accurate predictions of protein behavior than traditional DE alone.

How to obtain large amounts of experimental data for machine learning?

Construct a single-point or multi-point mutation library, then culture and screen it in nutrient-deficient medium. High-activity mutants reach higher cell counts, while low-activity mutants reach lower cell counts or fail to grow at all. High-throughput sequencing (NGS) gives the proportion of each mutant in the library. Comparing each mutant's proportion under screening conditions with its proportion under non-screening conditions yields the fitness landscape shown in the following figure.
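The comparison of mutant proportions described above can be sketched as a log2 enrichment score per variant. This is a minimal illustration, not the pipeline actually used: the pseudocount, the normalization to wild type, the function names, and the read counts are all assumptions made for the example.

```python
from math import log2


def enrichment_scores(selected, unselected, pseudocount=0.5):
    """log2(frequency under selection / frequency without selection) per variant.

    A pseudocount (an assumption of this sketch) keeps variants with zero
    reads in one condition from producing log(0) or division by zero.
    """
    variants = set(selected) | set(unselected)
    n = len(variants)
    sel_total = sum(selected.values()) + pseudocount * n
    unsel_total = sum(unselected.values()) + pseudocount * n
    scores = {}
    for v in variants:
        f_sel = (selected.get(v, 0) + pseudocount) / sel_total
        f_unsel = (unselected.get(v, 0) + pseudocount) / unsel_total
        scores[v] = log2(f_sel / f_unsel)
    return scores


def relative_fitness(scores, wild_type="WT"):
    """Express each score relative to wild type, so WT has fitness 0."""
    return {v: s - scores[wild_type] for v, s in scores.items()}


# Hypothetical NGS read counts per variant (placeholder values, not real data).
reads_selective = {"WT": 100, "A10G": 400, "L25P": 0}
reads_nonselective = {"WT": 100, "A10G": 100, "L25P": 100}

fitness = relative_fitness(enrichment_scores(reads_selective, reads_nonselective))
for variant, value in sorted(fitness.items(), key=lambda kv: -kv[1]):
    print(f"{variant}: {value:+.2f}")
```

Positive values indicate variants enriched relative to wild type under screening conditions; negative values indicate depletion. Variants with very few reads in the non-screening sample give noisy scores and are usually filtered by a minimum read count before modeling.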

Establishment of enzyme activity evaluation methods and mutant activity evaluation

How to ensure that the mutants obtained through screening perform better than the original protein?

1. High-throughput screening inevitably has some error, so mutants obtained from it usually need to be re-validated for activity under the screening conditions to exclude false positives.

2. High-throughput screening generally follows the rule "you get what you screen for": the mutants obtained are usually those improved under the screening conditions. However, the actual enzyme reaction system may differ from the screening system, so screened mutants may not perform as expected in practice. We therefore need to re-screen some of the mutants under two further conditions: first, verify their activity by in vitro enzyme catalysis; second, verify their activity in the actual application system, and from these results select the best mutant.

1. Re-screening: evaluate the mutants with the same cell survival rate readout used in the primary screening.

Desired mutant effect

2. In vitro activity assay: Express and purify the mutant in E. coli, cleave DNA with the enzyme in vitro, analyze the products by electrophoresis, and determine whether mutants with high activity in the screen also cleave DNA more efficiently in vitro.

Desired mutant effect

3. Animal cell activity assay: Synthesize mRNA of the mutant and transfect 293T cells. Select multiple sites with potential applications for preliminary validation, use the T7 endonuclease method to roughly estimate the editing efficiency at each site, and then select the sites with better editing efficiency for high-throughput sequencing to evaluate their editing efficiency accurately.
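The rough T7 endonuclease estimate in step 3 is commonly derived from gel band intensities. The sketch below uses the widely used approximation indel % = 100 x (1 - sqrt(1 - f_cut)), where f_cut is the cleaved fraction of total band intensity. The text above does not state which formula is applied, so this choice is an assumption, and the function name and example intensities are hypothetical.

```python
from math import sqrt


def t7e1_indel_percent(uncut_intensity, cut_intensities):
    """Estimate indel frequency (%) from T7 endonuclease gel band intensities.

    uncut_intensity: intensity of the full-length (uncleaved) PCR band.
    cut_intensities: intensities of the cleavage product bands.
    """
    cut = sum(cut_intensities)
    total = uncut_intensity + cut
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    fraction_cleaved = cut / total
    # The square root accounts for random re-annealing of edited and
    # unedited strands into heteroduplexes before T7 endonuclease digestion.
    return 100.0 * (1.0 - sqrt(1.0 - fraction_cleaved))


# Placeholder densitometry values for one target site (not real data).
estimate = t7e1_indel_percent(uncut_intensity=25.0, cut_intensities=[50.0, 25.0])
print(f"Estimated editing efficiency: {estimate:.1f}%")
```

Because this estimate depends on gel quantification and misses some edit types, it only serves to rank sites before the more accurate sequencing-based measurement.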
