Mark III Systems Blog

The Hidden Threat: Analyzing protein sequences of animals to identify potential intermediate hosts of SARS-CoV-2 – Background and Approach (Part 1 of 2)

Summary:  My research on SARS-CoV-2 is currently focused on identifying animals that may serve as intermediate hosts.  This could directly affect our understanding of how the disease spreads and how we can limit transmission through a route that doesn’t get much attention.  I am examining the evolution of host protein sequences to see which animals’ ACE2 receptors could potentially bind to the Receptor Binding Domain (RBD) of the virus, based on structural information.

To watch an interview on this topic, please click here!

Some quick background… Part of the SARS-CoV-2 RNA genome serves as the template for a protein known as the spike (S) protein, which has a section known as the Receptor Binding Domain, or RBD.  In humans and other animals, some cells make a cell surface protein called angiotensin-converting enzyme 2 (ACE2).  The RBD of the virus binds to a part of the ACE2 protein, and a series of events follows that allows the virus to (1) gain entry into the cell, (2) hijack the cellular machinery, (3) get replicated repeatedly by the infected cell, then finally (4) be released by the cell and spread.  It’s a very elegant, yet frightening process in which the cell does all the work that will eventually lead to its own death without even realizing it. Such is the trickery of viruses.  By determining how the RBD binds to the ACE2 receptor at the amino acid level, researchers can design drugs that interfere with binding and prevent the virus from entering the cell.

I have significantly expanded the range of animals considered as possible intermediate hosts in my research.  I think that if you limit the search to just animals that have previously been found to carry SARS-like viruses, then a massive amount of data, as well as other viable candidate animals, can be overlooked. With the wealth of DNA and protein sequences in GenBank, the problem is difficult but tractable.  To speed up the work, I am utilizing NVIDIA V100 graphics processing units (GPUs) running on a couple of different GPU-accelerated server platforms, including NVIDIA DGX-1, for BLASTp searches and analysis of the sequences.  I don’t want to go into too much detail yet, but suffice it to say, the results have been quite interesting. Some of my results implicate animals already known to carry SARS-like viruses, but many new animals are also predicted.  I will be submitting the manuscript for publication this month and it will include all of the details.
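For readers who want a feel for the kind of sequence comparison described above, here is a minimal sketch in Python. It is not the pipeline used in this research. It assumes the sequences have already been aligned to the same length by some other tool, and it simply scores each candidate by how many chosen alignment columns match the reference. The sequences, species names, and positions are made-up toy values, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def identity_at_positions(reference: str, candidate: str,
                          positions: Iterable[int]) -> float:
    """Fraction of the given 1-based alignment columns where the
    candidate residue matches the reference residue (gaps never match)."""
    if len(reference) != len(candidate):
        raise ValueError("sequences must be pre-aligned to the same length")
    cols = list(positions)
    if not cols:
        raise ValueError("at least one position is required")
    matches = sum(
        1 for p in cols
        if reference[p - 1] == candidate[p - 1] and reference[p - 1] != "-"
    )
    return matches / len(cols)


def rank_candidates(reference: str, aligned: Dict[str, str],
                    positions: Iterable[int]) -> List[Tuple[str, float]]:
    """Sort candidates by score (highest first), breaking ties by name."""
    cols = list(positions)
    scores = [(name, identity_at_positions(reference, seq, cols))
              for name, seq in aligned.items()]
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Toy, made-up aligned fragments: NOT real ACE2 sequences.
REFERENCE_TOY = "MKTEDLQHGY"
CANDIDATES_TOY = {
    "species_a": "MKTEDLQHGY",
    "species_b": "MRTEDLNHGY",
    "species_c": "MKSE-LQYGF",
}
# Hypothetical columns of interest: NOT real binding-site positions.
KEY_POSITIONS_TOY = [2, 5, 7, 8]

ranking = rank_candidates(REFERENCE_TOY, CANDIDATES_TOY, KEY_POSITIONS_TOY)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In a real analysis, the aligned sequences would come from the search and alignment steps, and the positions of interest would be chosen from structural data on the RBD and ACE2 interface rather than picked by hand as they are here.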

Ultimately, SARS-CoV-2 research is aimed at stopping the spread of the pandemic and saving lives.  I’ll skip all the COVID-19 buzzwords and just say that by using the technology we possess, such as DNA sequencing, cryo-EM, X-ray crystallography, artificial intelligence, and ultra-high-powered computing resources, we can make a difference.  Multiple, varied disciplines - computer science, basic and clinical sciences, epidemiology, statistics, and even human intuition - working together are stronger than any one discipline.  Together, we can – and will – make a difference.


Read the full version of my background notes by clicking here, and stay tuned for my full findings and conclusions in Part 2 in the coming weeks.