Track 1 (ML): Cerebras modelzoo ML models
This track corresponds to models already present in version R2.4.0 of the Cerebras modelzoo software.
Good Match Criteria: You would be a good match for this track if your research already uses, or could potentially use, any of the following models, supported via PyTorch:
Model | Code Pointer |
---|---|
BERT | Code |
BERT (fine-tuning) Classifier | Code |
BERT (fine-tuning) Named Entity Recognition | Code |
BERT (fine-tuning) Summarization | Code |
BLOOM | Code |
BTLM | Code |
DiT | Code |
DPO | Code |
DPR | Code |
ESM-2 | Code |
Falcon | Code |
GPT-2 | Code |
GPT-3 | Code |
GPT-J | Code |
GPT-NeoX | Code |
GPT-J (fine-tuning) Summarization | Code |
JAIS | Code |
LLaMA, LLaMA-2 and LLaMA-3 | Code |
LLaVA | Code |
Mistral | Code |
Mixtral of Experts | Code |
Multimodal Simple | Code |
RoBERTa | Code |
SantaCoder | Code |
StarCoder | Code |
Transformer | Code |
T5 | Code |
Based on the Cerebras modelzoo R2.4.0 GitHub page.
Track-Specific Questions
If your project falls under this category, make sure to address the following questions in your application document:
- Please indicate which model(s) from the modelzoo you intend to use. Do you anticipate needing to adjust the model architecture?
- Please describe the dataset you intend to use.
- How big is the dataset of interest (total dataset size, number of samples, and sample size in MB)?
- Please elaborate on the readiness of the dataset of interest. Is it fully available at this time? If not, how soon will it be fully available?
- Please specify the shapes of the input and output tensors for your model(s).
- If possible, please specify the names of the dimensions of the input and output tensors from the previous question, for example (batch, input channels, height, width).
- Please specify the loss function that you would like to use.
- Please list the libraries, beyond the standard PyTorch and/or TensorFlow distributions, that you would need to train your model(s).
- Please list the key libraries that you would need for data preprocessing.
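The dataset-size and tensor-shape questions above can be answered with a few lines of scripting. The following is a minimal sketch only, not part of the application requirements: it assumes the dataset is stored as one file per sample under a single directory and takes 1 MB as 10^6 bytes. The function names `summarize_dataset` and `describe_shape`, and the dimension names in the usage note, are hypothetical and chosen for illustration.

```python
import os


def summarize_dataset(root):
    """Report sample count, total size, and mean sample size (in MB) under `root`.

    Assumption: every file below `root` is exactly one sample.
    Adjust this if your samples are packed into shards or archives.
    """
    bytes_per_mb = 1_000_000  # assumption: 1 MB = 10**6 bytes
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            sizes.append(os.path.getsize(os.path.join(dirpath, name)))
    total_bytes = sum(sizes)
    mean_bytes = total_bytes / len(sizes) if sizes else 0.0
    return {
        "num_samples": len(sizes),
        "total_size_mb": total_bytes / bytes_per_mb,
        "mean_sample_size_mb": mean_bytes / bytes_per_mb,
    }


def describe_shape(dim_names, dim_sizes):
    """Pair each dimension name with its size, e.g. 'batch=8, sequence=128'."""
    if len(dim_names) != len(dim_sizes):
        raise ValueError("dimension names and sizes must have the same length")
    return ", ".join(f"{name}={size}" for name, size in zip(dim_names, dim_sizes))
```

For instance, `describe_shape(("batch", "input channels", "height", "width"), (16, 3, 224, 224))` yields a string that names every dimension of an image-like input; the sizes here are illustrative, not a recommendation.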