Proteins are crucial for the functioning of cells and are closely associated with diseases. Understanding and characterizing proteins can provide insights into the mechanisms of diseases and facilitate the development of new therapies. However, the current process of designing proteins is complex and expensive, requiring significant computational and human resources.
To simplify this process, Microsoft has introduced EvoDiff, a general-purpose framework for generating proteins. Unlike many other protein-generation frameworks, EvoDiff does not require structural information about the target protein; it works from the sequence alone, which makes the design process more efficient. The framework has the potential to create enzymes for therapeutics, drug delivery methods, and industrial chemical reactions.
EvoDiff is built on a 640-million-parameter model trained on data from various species and protein classes. It uses a diffusion model, similar to those used for image generation, which starts from a protein made up almost entirely of noise and removes that noise step by step until a specific protein sequence remains.
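The step-by-step denoising idea can be illustrated with a toy sketch. This is not EvoDiff's code or API. It assumes, purely for illustration, that "noise" means masked positions and that one position is filled in per step; the names `toy_denoiser` and `generate` are hypothetical, and the stand-in "model" samples amino acids uniformly instead of using a trained network.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
MASK = "#"  # marks a position that is still "noise" (assumption for this sketch)


def toy_denoiser(sequence, position, rng):
    """Hypothetical stand-in for a trained network.

    A real model would predict an amino-acid distribution conditioned on the
    partially denoised sequence; this stub ignores the context and samples
    uniformly at random.
    """
    return rng.choice(AMINO_ACIDS)


def generate(length, seed=0):
    """Start from a fully masked sequence and fill in one position per step."""
    rng = random.Random(seed)
    sequence = [MASK] * length
    order = list(range(length))
    rng.shuffle(order)  # visit positions in a random order
    for position in order:
        sequence[position] = toy_denoiser(sequence, position, rng)
    return "".join(sequence)


if __name__ == "__main__":
    print(generate(30))
```

Replacing the uniform sampler with a learned, context-aware predictor is what would turn a loop like this into a generative model; the loop structure itself only shows how a sequence can emerge gradually from an uninformative starting point.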
The framework can generate both folded and disordered proteins. Disordered proteins do not fold into a stable three-dimensional structure, yet they play crucial roles in biology and disease.
While EvoDiff has not yet undergone peer review, the Microsoft research team plans to continue scaling up the framework to improve generation quality. They also aim to test the viability of the proteins generated by EvoDiff in laboratory experiments to further validate the framework.
EvoDiff represents a significant advance in protein engineering, moving away from the traditional structure-function approach toward sequence-first design. The framework offers greater generality, scale, and modularity in protein generation, and could substantially change how proteins are engineered.
Further development of EvoDiff may involve incorporating additional information, such as text or chemical data, to achieve more precise control over protein design.