Connecting to the Cerebras Cloud¶
Generating an SSH Key¶
If you do not already have an SSH key, generate one using the following command:
Follow the prompts:
- Accept the default location or specify a different one.
- Optionally, set a passphrase (if set, do not forget it).
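As a minimal sketch of the key-generation step: the key type (Ed25519), the file name `id_KEY`, and the helper name `generate_ssh_key` are assumptions made for illustration, so substitute whatever the Neocortex team asks for.

```shell
# Assumed location; override by setting KEY_FILE before calling the helper.
KEY_FILE="${KEY_FILE:-$HOME/.ssh/id_KEY}"

generate_ssh_key() {
  # Refuse to overwrite an existing private key.
  if [ -e "$KEY_FILE" ]; then
    echo "Key already exists: $KEY_FILE" >&2
    return 1
  fi
  mkdir -p "$(dirname "$KEY_FILE")"
  # -t: key type (assumed), -f: output file, -C: comment stored in the public key.
  ssh-keygen -t ed25519 -f "$KEY_FILE" -C "your_email@example.com"
}

# Usage: generate_ssh_key
```

The existence check is there because `ssh-keygen` would otherwise ask whether to overwrite a key you may already be using elsewhere.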
Example output
Generating public/private key pair.
Enter file in which to save the key ($HOME/.ssh/id_KEY): <<< The default location is fine.
Enter passphrase (empty for no passphrase): <<< If you set one, do not forget it.
Enter same passphrase again:
Your identification has been saved in $HOME/.ssh/id_KEY <<< Private key. Don’t share it.
Your public key has been saved in $HOME/.ssh/id_KEY.pub <<< Public key. Please share it.
The key fingerprint is:
SHA256:cGoO3HowD3pcViA74UMz1oGNrzkDssLJom8cdA27JlM your_email@example.com
The key's randomart image is:
+-----[ KEY ]------+
| B=o. |
| ++*o. |
| ==. o |
|. o.E+o= |
|o+.+*+* S |
|+++oB% |
|+..=+o+ |
|. o. . |
| o. |
+----[SHA256]-------+
After generation, display the public key:
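A minimal sketch of that step, assuming the key pair was saved under `$HOME/.ssh/id_KEY` as in the example output (the helper name `show_public_key` is hypothetical):

```shell
# Assumed location of the public half of the key pair.
PUB_KEY_FILE="${PUB_KEY_FILE:-$HOME/.ssh/id_KEY.pub}"

show_public_key() {
  if [ ! -r "$PUB_KEY_FILE" ]; then
    echo "Public key not found: $PUB_KEY_FILE" >&2
    return 1
  fi
  # Print the single-line public key so it can be copied.
  cat "$PUB_KEY_FILE"
}

# Usage: show_public_key
```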
Share only your public key with the Neocortex team while keeping your private key secure.
For more information, please visit the SSH Project webpage.
Connecting to the Cerebras system¶
Once your access is set up, you will receive:
- Cerebras Cloud Credentials
- VPN Configuration Instructions
Keep in mind that your Cerebras credentials are used to connect to the Cerebras VPN endpoint, while the SSH connection uses the private SSH key that you generated.
VPN Connection¶
- Download the GlobalProtect VPN client from: https://access01.vpn.cerebras.net
- Use your Cerebras-provided VPN credentials to log in.
- Configure the VPN with Portal Address: access01.vpn.cerebras.net
- Connect to the VPN.
SSH Connection¶
After establishing a VPN connection, access the system via SSH:
Replace <cerebras_username> with your assigned username and id_KEY with your private key.
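The connection step might look like the following sketch. It assumes the login host is the one used in this page's ping and TensorBoard examples (`cg3-us27.dfw1.cerebrascloud.com`) and that the private key is stored at `$HOME/.ssh/id_KEY`; the helper name `cerebras_ssh` is hypothetical. If your access instructions name a different host, use that one.

```shell
# Assumption: the SSH login host matches the host used in this page's examples.
CEREBRAS_HOST="${CEREBRAS_HOST:-cg3-us27.dfw1.cerebrascloud.com}"
# Assumption: the private key lives at the path from the key-generation example.
SSH_KEY="${SSH_KEY:-$HOME/.ssh/id_KEY}"

cerebras_ssh() {
  # $1: your assigned Cerebras username
  if [ -z "$1" ]; then
    echo "usage: cerebras_ssh <cerebras_username>" >&2
    return 2
  fi
  # -i selects the private key to authenticate with.
  ssh -i "$SSH_KEY" "$1@$CEREBRAS_HOST"
}

# Usage (with the VPN connected): cerebras_ssh <cerebras_username>
```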
To verify VPN connectivity:
ping cg3-us27.dfw1.cerebrascloud.com
Example output:
ping cg3-us27.dfw1.cerebrascloud.com
PING cg3-us27.dfw1.cerebrascloud.com (172.16.4.77): 56 data bytes
64 bytes from 172.16.4.77: icmp_seq=0 ttl=62 time=153.591 ms
64 bytes from 172.16.4.77: icmp_seq=1 ttl=62 time=157.990 ms
64 bytes from 172.16.4.77: icmp_seq=2 ttl=62 time=151.645 ms
64 bytes from 172.16.4.77: icmp_seq=3 ttl=62 time=151.368 ms
--- cg3-us27.dfw1.cerebrascloud.com ping statistics ---
4 packets transmitted, 4 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 151.368/153.649/157.990/2.649 ms
Explanation of Terms in the Cerebras Compile Report¶
When submitting jobs, the Cerebras Compile Report provides insights into:
- Model Compilation Time: Duration required for the model to be compiled.
- Resource Allocation: CS-3 systems allocated for the job.
- Memory Utilization: How efficiently memory is used.
- Execution Status: Whether the job is QUEUED, RUNNING, FAILED, or COMPLETED.
- Optimization Suggestions: Any recommendations to enhance efficiency.
Job Submission and Monitoring Procedures¶
Submitting a Job¶
Each project has a dedicated directory for training jobs, for example, /cra-XYZ/demo/trials.
To submit a job:
- Navigate to the directory of the desired model.
- Run the experiment script:
bash run.sh
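The two steps above can be wrapped in a small helper. This is only a sketch: the function name `submit_job` is hypothetical, and it assumes each model directory contains a `run.sh` as described.

```shell
submit_job() {
  # $1: directory of the desired model, e.g. a subdirectory of /cra-XYZ/demo/trials
  model_dir="$1"
  if [ ! -f "$model_dir/run.sh" ]; then
    echo "No run.sh found in: $model_dir" >&2
    return 1
  fi
  # Run in a subshell so the caller's working directory is left unchanged.
  ( cd "$model_dir" && bash run.sh )
}

# Usage: submit_job /cra-XYZ/demo/trials/<model_directory>
```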
Monitoring Jobs¶
To check job status:
csctl get jobs -a
Running jobs will have a 'RUNNING' status, and queued jobs will have a 'QUEUED' status.
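To narrow the listing to one status, the output can be filtered with `grep`. This sketch assumes only that the status appears as a plain word on each job's line; the exact column layout of `csctl get jobs -a` is not assumed, and the helper name `jobs_with_status` is hypothetical.

```shell
jobs_with_status() {
  # Reads a job listing on stdin and prints the lines that contain
  # the given status as a whole word (for example RUNNING or QUEUED).
  grep -w -- "$1"
}

# Usage:
#   csctl get jobs -a | jobs_with_status RUNNING
#   csctl get jobs -a | jobs_with_status QUEUED
```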
Monitoring with TensorBoard¶
To visualize training progress: tensorboard --logdir=. --bind_all --port 6006
Access TensorBoard from your browser:¶
- Default link: http://cg3-us27.dfw1.cerebrascloud.com:6006
- If inaccessible, try using the IP address: http://172.16.4.243:6006/
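If port 6006 is already in use, TensorBoard can be started on another port, and the browser link changes accordingly. A minimal sketch, with hypothetical helper names `tensorboard_url` and `start_tensorboard`, assuming the host name from the default link above:

```shell
# Host taken from the default link above.
TB_HOST="${TB_HOST:-cg3-us27.dfw1.cerebrascloud.com}"

tensorboard_url() {
  # $1: port (defaults to 6006)
  printf 'http://%s:%s\n' "$TB_HOST" "${1:-6006}"
}

start_tensorboard() {
  # $1: port (defaults to 6006); logs are read from the current directory.
  port="${1:-6006}"
  echo "Open $(tensorboard_url "$port") in your browser"
  tensorboard --logdir=. --bind_all --port "$port"
}

# Usage: start_tensorboard 6007
```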
Killing a Job¶
To terminate a running job: csctl cancel job <jobID>
To find the <jobID>, run csctl get jobs -a.
Resource Utilization Best Practices¶
- Use tmux to avoid job termination due to disconnection:
tmux new -s my_session
- Activate the Cerebras Virtual Environment before running jobs:
source /cra-XYZ/venvs/2.4.0/bin/activate
- Submit jobs in advance if running a model for the first time, as compilation may take time.
- Store data properly in /cra-XYZ to ensure access.
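The first two practices can be combined so that a job is launched inside a detached tmux session with the virtual environment already active. The following is a sketch under stated assumptions: the helper names `job_command` and `start_job_session` and the log file `run.log` are hypothetical, and the paths are assumed to contain no spaces.

```shell
# Virtual environment path from the list above.
VENV_ACTIVATE="${VENV_ACTIVATE:-/cra-XYZ/venvs/2.4.0/bin/activate}"

job_command() {
  # $1: model directory containing run.sh
  # Uses "." rather than "source" because tmux runs the command with /bin/sh.
  # Output is also copied to run.log (hypothetical name) in the model directory.
  printf '. %s && cd %s && bash run.sh 2>&1 | tee run.log' "$VENV_ACTIVATE" "$1"
}

start_job_session() {
  # $1: tmux session name, $2: model directory
  # -d starts the session detached, so it survives an SSH disconnection.
  tmux new-session -d -s "$1" "$(job_command "$2")"
}

# Usage:
#   start_job_session my_session /cra-XYZ/demo/trials
#   tmux attach -t my_session    # reattach to watch progress
```

A session started this way ends when `run.sh` exits, which is why the sketch also writes the output to a log file.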
Neocortex Slack¶
See the Neocortex System Slack section for how to join our Slack workspace and use it to advance your project. There you will find:
- Official updates from the Neocortex team.
- Private project channels for collaboration.
- Discussions for AI/ML projects.
- Discussions for SDK/HPC projects.
Process for Requesting Support or Additional Resources¶
Please refer to the Getting Help section. You can reach us by email (neocortex@psc.edu), on Slack, or by scheduling an office hour. We will be happy to help.
Support Channels¶
- Email: Reach out to the support team by emailing neocortex@psc.edu.
- Slack: Post in the appropriate channel or DM a team member.
- Office Hours: Schedule a session with the support team.
Requesting Additional Resources¶
To request additional compute resources, submit a formal request to neocortex@psc.edu including:
- Project Name
- Justification for Additional Resources
- Expected Usage Period
- Desired Configuration
Requests will be reviewed based on system availability.