This part records the progress of our team and some communication activities we participated in.
We built our team and researched possible directions for our iDEC project. In the end, we settled on peptide drug discovery.
To begin with, we came up with an initial version of the program framework.
Figure1: Sampling of Pocking Domain
Figure2: Peptide generation
Figure3: How we envisioned the tool being used
Figure4: Dr. Cao, our instructor, envisioned the base framework of Peplib Generator
We began with the peptide-protein interaction (PPI) prediction model and pre-processed PPI databases such as Propedia and PepBDB. We also tested the speed of peptide-protein docking, which was discouraging: it took 9-13 hours.
Figure5: Evolution Algorithm: we proposed a sequential genetic algorithm framework.
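For readers unfamiliar with genetic algorithms, the loop behind such a framework can be sketched roughly as below. This is only a minimal illustration, not our actual Evolution Algorithm: the function names, the truncation selection scheme, the single-point crossover, and the `toy_fitness` score (a count of hydrophobic residues) are all assumptions made for the example. In a real pipeline the fitness would come from a prediction model.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def mutate(seq, rate, rng):
    """Replace each residue with a random one with probability `rate`."""
    return "".join(rng.choice(AMINO_ACIDS) if rng.random() < rate else aa for aa in seq)


def crossover(a, b, rng):
    """Single-point crossover between two equal-length parents."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def evolve(fitness, length=10, pop_size=50, generations=100, rate=0.1, seed=0):
    """Return the best sequence found; `fitness` maps a sequence to a number."""
    rng = random.Random(seed)
    population = [
        "".join(rng.choice(AMINO_ACIDS) for _ in range(length)) for _ in range(pop_size)
    ]
    for _ in range(generations):
        # Truncation selection: the fitter half survives unchanged (elitism).
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        # Refill the population with mutated offspring of random parent pairs.
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rate, rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)


def toy_fitness(seq):
    """Hypothetical stand-in score: number of hydrophobic residues."""
    return sum(aa in "AILMFVW" for aa in seq)


best = evolve(toy_fitness)
print(best, toy_fitness(best))
```

Because the fitter half of each generation is carried over unchanged, the best score never decreases from one generation to the next.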
We expected to use a transformer-based architecture for the PPI model, so we studied how multi-head attention layers work.
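As a study note, multi-head self-attention can be written in a few lines of NumPy. The sketch below is a generic textbook version with randomly initialized weights, not the layer used in our model; the shapes (`seq_len=6`, `d_model=16`, `num_heads=4`) and all variable names are arbitrary choices for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Self-attention over x of shape (seq_len, d_model)."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    def split_heads(t):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split_heads(x @ w_q), split_heads(x @ w_k), split_heads(x @ w_v)
    # Scaled dot-product scores per head: (num_heads, seq_len, seq_len)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)
    context = weights @ v  # (num_heads, seq_len, d_head)
    # Concatenate the heads back into (seq_len, d_model), then project.
    merged = context.transpose(1, 0, 2).reshape(seq_len, d_model)
    return merged @ w_o, weights


rng = np.random.default_rng(0)
seq_len, d_model, num_heads = 6, 16, 4
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v, w_o = (rng.normal(size=(d_model, d_model)) for _ in range(4))
out, weights = multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads)
print(out.shape, weights.shape)
```

Each head attends over the whole sequence independently, so every row of `weights` sums to 1 within its head.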
We suspended our project for Mid Term Exams.
We used an encoder-decoder system to represent peptides. Part of the data from PeptideAtlas was used to train the model, which reached high performance at both 256 and 512 dimensions. We also discussed how to represent PPI complexes for training the PPI model. A structure-based dataset contains much more information per data pair than a sequence-based one, yet building a model to represent structural complexes is a bigger challenge, and we gladly accepted it.
Figure6
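Before a peptide can enter an encoder-decoder model, its sequence has to be turned into numbers. One common way to do this, shown here purely as an illustration, is padded one-hot encoding. The padding symbol, the `max_len=50` default, and the `encode`/`decode` helpers are assumptions for this sketch and do not describe the input format of our trained model.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "-"                             # hypothetical padding symbol
VOCAB = PAD + AMINO_ACIDS
INDEX = {ch: i for i, ch in enumerate(VOCAB)}


def encode(seq, max_len=50):
    """Return a max_len x 21 one-hot matrix (list of lists) for a peptide."""
    if len(seq) > max_len:
        raise ValueError(f"sequence longer than {max_len} residues")
    padded = seq.ljust(max_len, PAD)
    return [[1 if INDEX[ch] == j else 0 for j in range(len(VOCAB))] for ch in padded]


def decode(matrix):
    """Invert `encode`: take the arg-max of each row and strip the padding."""
    chars = [VOCAB[row.index(max(row))] for row in matrix]
    return "".join(chars).rstrip(PAD)


matrix = encode("ACDK")
print(len(matrix), len(matrix[0]), decode(matrix))
```

An autoencoder trained on such matrices would compress them into a fixed-size latent vector and reconstruct the sequence from it.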
Part of our team took part in the 1st Biological Computing Conference of China. We also adjusted the project to avoid overlapping with the work of drug companies such as XtalPi.
We built a framework based on ResNet and StyleGAN as our PPI prediction model.
Figure7: PPI prediction model
We drew a UML diagram describing how we deploy our models on servers.
Figure8
We suspended our project for Final Term Exams.
Our Evolution Algorithm worked. We tested its performance with the peptide representation model, and it found the longest sequence (50 aa) in only 18 s. However, the first PPI prediction model, based on ResNet, failed: its accuracy hovered around 50%. Our hypothesis for the low accuracy is that the model could learn nothing but noise from our poorly represented structural dataset. Based on this hypothesis, we found a Geometry-CNN-powered representation method: masif.
Figure9
We managed to deploy masif on the server and represented all target proteins as mesh data.
Figure10
We tried to embed mesh data into our PPI model.
Figure11
We attended the 8th CCiC at Fudan University, Shanghai, China.
We tested the performance of AlphaFold2 on peptide-protein interaction prediction.
The requirements for a good prediction result can be summarized as follows:
A peptide with a consistent secondary structure, or other intrinsic interactions such as disulfide bonds, in the complex conformation
A flat, stable and hydrophobic binding pocket
Figure12: A group photo of our team members
We updated the Evolution Algorithm and added a set of new features.
Figure13
We accelerated AlphaFold2 and named the result turbo-AF2. Combining it with the updated Evolution Algorithm, we tested Peplib Generator by optimizing a peptide sequence for better stability:
Figure14: Peplib Generator, combined with the updated Evolution Algorithm, optimizing a peptide sequence for better stability
Figure15: The mutations seemed sensible. With great excitement, we tested its ability to generate peptide ligands, but it failed
We improved its performance in generating stable ligands and switched to an easier target, and this time it succeeded.
Figure16: Part of the sequence data is provided in the Supplementary
We kept improving its performance and ran more tests. Our conclusions about its pros and cons are summarized in our Reports.
We attended the 1st Synbiopunk in Shanghai, China. We also made the length of the generated peptides variable.
Figure17