Quick Primer: NVIDIA NeMo Framework Vocabulary
In this blog we’ll go over the basics of some terms related to NVIDIA's NeMo framework and explain how they relate to each other.
The NeMo (Neural Modules) framework was developed by NVIDIA to make it easier to use and train conversational AI models. Areas of focus for NeMo include speech processing, natural language processing, and text-to-speech. NeMo is built around the idea of reusable components that help users train and work with these models.
In NeMo a “model” is not only the neural network being used, but also all the supporting components that allow training and inference to happen. This is helpful because language-related models require a lot of preprocessing and postprocessing, and bundling these components together with the network gives you all the tools you need to train and perform inference. A NeMo “model” includes:
- A neural network architecture
- Dataset and data loaders
- Preprocessing and postprocessing scripts
- Optimizers and schedulers
- Any other supporting infrastructure
The notebook linked here (https://github.com/NVIDIA/NeMo/blob/main/tutorials/00_NeMo_Primer.ipynb) does a good job of describing the parts of a NeMo model in more detail.
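As that notebook also shows, NeMo models are typically described by YAML configuration files (composed with Hydra), with sections for the dataset, optimizer, and trainer. The fragment below is an illustrative sketch of that shape; the specific values are examples, not copied from a shipped config:

```yaml
# Illustrative sketch of a NeMo-style config; values are examples only.
model:
  train_ds:
    manifest_filepath: ???   # path to the training data (filled in by the user)
    batch_size: 32
  optim:
    name: adam               # the optimizer is just another swappable component
    lr: 0.001
trainer:
  devices: 1
  max_epochs: 10
```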
Because all these components are meant to be modular, they can easily be swapped out of or added into the model training process. This makes it easier to experiment with different components at every stage of training and inference.
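As a rough illustration of this bundling idea, consider the sketch below. The class and component names are hypothetical, chosen for illustration only, and are not NeMo's actual API; the point is simply that packaging preprocessing, the network, and postprocessing together makes inference a single call and each piece independently swappable:

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical sketch of the "model as a bundle" idea.
# These names are illustrative and are NOT NeMo's real API.

@dataclass
class BundledModel:
    network: Callable[[List[float]], List[float]]   # the neural network itself
    preprocess: Callable[[str], List[float]]        # e.g. tokenization
    postprocess: Callable[[List[float]], str]       # e.g. detokenization
    optimizer_name: str = "adam"                    # supporting training infrastructure

    def infer(self, text: str) -> str:
        # Bundling means inference is one call: preprocess -> network -> postprocess.
        return self.postprocess(self.network(self.preprocess(text)))

# Toy components that can each be swapped out independently of the network.
model = BundledModel(
    network=lambda xs: [x * 2.0 for x in xs],
    preprocess=lambda s: [float(ord(c)) for c in s],
    postprocess=lambda xs: " ".join(str(int(x)) for x in xs),
)
print(model.infer("ab"))  # "194 196": the code points of "a" and "b", doubled
```

Swapping in a different `postprocess` (or dataset, or optimizer) leaves the rest of the bundle untouched, which is the experimentation workflow the framework is aiming for.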
NeMo Megatron is an example of a pretrained model that can be used when developing with the NeMo framework. It is a large transformer that can support GPT-, BART-, or BERT-style models. The article linked here (https://www.width.ai/post/bart-text-summarization) explains the differences between these model types. NeMo Megatron is only one of many pretrained models available through NeMo. The documentation page linked here (https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/core/core.html) describes the pretrained models available for automatic speech recognition, natural language processing, and text-to-speech tasks.
BioNeMo is a collection of generative language models focused on drug discovery. The BioNeMo container I have experimented with contained the three models listed below:
- ESM-1nv: protein property prediction
- ProtT5nv: protein generation
- MegaMolBART: small molecule generation
However, a blog on NVIDIA's website (linked here: https://developer.nvidia.com/blog/build-generative-ai-pipelines-for-drug-discovery-with-bionemo-service/ ) lists more model types that will be available in BioNeMo. NVIDIA plans to offer BioNeMo as a cloud service for researchers looking to use these specialized models; one would interact with and fine-tune them using the NeMo framework. BioNeMo is currently still in early access.