Reference Compilation Example¶
Neocortex¶
Connect to the login node.
ssh researcher@neocortex.psc.edu
researcher@neocortex.psc.edu's password: ****************
********************************* W A R N I N G ********************************
You have connected to one of the Neocortex login nodes.
LOG OFF IMMEDIATELY if you do not agree to the conditions stated in this warning
********************************* W A R N I N G ********************************
For documentation on Neocortex, please see https://portal.neocortex.psc.edu/docs/
Please contact neocortex@psc.edu with any comments/concerns.
[researcher@neocortex-login023 ~]$
Take a look at the project grants available. There are two grants listed: one for a different research project and one for Neocortex. Since the latter is the one that holds the Neocortex SUs, it should be specified for the commands that follow.
[researcher@neocortex-login023 ~]$ projects | grep "Project\|Title"
Project: CIS000000P
Title: A Very Important Project
Project: CIS123456P # << This one
Title: P99-Neocortex Research Project # << This one
Now let's take a look at the output of the groups command. Note that the group names are all lowercase, while the projects output above is not.
[researcher@neocortex-login023 ~]$ groups
cis000000p cis123456p
"cis000000p" is showing as the first in that line (leftmost). That means that it's the primary group. What we want is to have the P## group to be the primary for all of the following commands, so let's run the "newgrp" command specifying it so that happens.
[researcher@neocortex-login023 ~]$ newgrp cis123456p
Now, by running the groups command one more time, we see that the "cis123456p" group shows as primary, just like we need.
[researcher@neocortex-login023 ~]$ groups
cis123456p cis000000p
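As an additional quick check, id -gn prints only the current primary group (a standard Linux command, shown here for illustration):
[researcher@neocortex-login023 ~]$ id -gn
cis123456p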
Since we have the correct group showing as primary, we can now proceed to start a job for copying files and running the actual compilation steps. This will start the SLURM job under the correct project allocation ID (--account=GROUPID).
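If you prefer to be explicit rather than relying on the primary group, the allocation can also be passed to SLURM directly with --account. A minimal sketch, using the allocation from this example (the srun_train and sbatch examples later on do exactly this):
srun --account=cis123456p ...
#SBATCH --account=cis123456p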
We should start by setting some variables for copying the data.
[researcher@neocortex-login023 ~]$ export CEREBRAS_DIR=/ocean/neocortex/cerebras/
[researcher@neocortex-login023 ~]$ echo $PROJECT
/ocean/projects/cis123456p/researcher
In this case, we copy the files with rsync, since it updates the target directory with any changes from the source path; a plain cp does not handle an already existing target directory as gracefully.
Also, if there are no new files under $CEREBRAS_DIR/modelzoo, the output will only show "sending incremental file list" and nothing else will be transferred, since the up-to-date files are already in place.
Additionally, please keep in mind that the "modelzoo" folder being copied should belong to the correct group after running the following commands: for this specific case, to "cis123456p" and not to "cis000000p".
[researcher@sdf-1 ~]$ rsync -PaL --chmod u+w $CEREBRAS_DIR/modelzoo $PROJECT/
sending incremental file list
modelzoo/
modelzoo/LICENSE
[--- OUTPUT SNIPPED FOR KEEPING THIS EXAMPLE SHORT ---]
[researcher@sdf-1 ~]$ ls $PROJECT/
modelzoo
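To double-check the group ownership mentioned above, you can list the copied folder and confirm that the group column shows "cis123456p" (a quick sanity check):
[researcher@sdf-1 ~]$ ls -ld $PROJECT/modelzoo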
Then change into the modelzoo folder of the model we want to evaluate/compile/train:
[researcher@neocortex-login023 ~]$ cd $PROJECT/modelzoo/fc_mnist/tf
This command will start a shell using the latest Cerebras container:
[researcher@neocortex-login023 tf]$ srun --pty --cpus-per-task=28 --kill-on-bad-exit singularity shell --cleanenv --bind /local1/cerebras/data,/local2/cerebras/data,/local3/cerebras/data,/local4/cerebras/data,$PROJECT /local1/cerebras/cbcore_latest.sif
Singularity>
Inside that shell, you can run the different validation and compilation commands. For example, to run a validate_only process:
Singularity> python run.py --mode train --validate_only --model_dir validate
INFO:tensorflow:TF_CONFIG environment variable: {}
Downloading and preparing dataset mnist (11.06 MiB) to cerebras/data/tfds/mnist/1.0.0...
Dl Completed...: 100%|██████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 12.50 url/s]
Extraction completed...: 100%|█████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 6.24 file/s]
Extraction completed...: 100%|█████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 5.84 file/s]
Dl Size...: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 15.59 MiB/s]
Dl Completed...: 100%|██████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 6.23 url/s]
0 examples [00:00, ? examples/s]2021-02-17 15:54:14.174234: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX512F
[--- OUTPUT SNIPPED FOR KEEPING THIS EXAMPLE SHORT ---]
XLA Extraction Complete
=============== Starting Cerebras Compilation ===============
Cerebras compilation completed: 100%|██████████████████████████████████████████████████████████████████████████| 2/2 [00:02s, 1.12s/stages]
=============== Cerebras Compilation Completed ===============
Singularity>
In the same way, a compile_only process looks like this:
Singularity> python run.py --mode train --compile_only --model_dir compile
INFO:tensorflow:TF_CONFIG environment variable: {}
[--- OUTPUT SNIPPED FOR KEEPING THIS EXAMPLE SHORT ---]
XLA Extraction Complete
=============== Starting Cerebras Compilation ===============
Cerebras compilation completed: | | 19/? [00:26s, 1.37s/stages]
=============== Cerebras Compilation Completed ===============
Singularity>
Now, different parameter files can be specified for the validation/compilation/training processes.
Let's say you want to use not the default "configs/params.yaml" file but one in a different (custom) directory (--params custom_configs/params.yaml). You can do this by copying the original "params.yaml" file and adjusting the values there; the output can also be written to a different path (--model_dir custom_output_dir):
Singularity> cp -r configs custom_configs
Singularity> vi custom_configs/params.yaml
Singularity> python run.py --mode train --compile_only --params custom_configs/params.yaml --model_dir custom_output_dir
INFO:tensorflow:TF_CONFIG environment variable: {}
[--- OUTPUT SNIPPED FOR KEEPING THIS EXAMPLE SHORT ---]
XLA Extraction Complete
=============== Starting Cerebras Compilation ===============
Cerebras compilation completed: | | 19/? [00:25s, 1.34s/stages]
=============== Cerebras Compilation Completed ===============
Singularity>
The custom_configs and custom_output_dir directories contain the parameters used and the output of this example compilation.
Please note that the group ownership still points to the correct group ("cis123456p" for this example), since the account information to use was automatically passed to SLURM.
Singularity> ls -lash | grep custom
4.0K drwxr-sr-x 2 researcher cis123456p 4.0K Feb 17 17:07 custom_configs
4.0K drwxr-sr-x 3 researcher cis123456p 4.0K Feb 17 17:07 custom_output_dir
Singularity> ls -lsh custom*
custom_configs:
total 4.0K
4.0K -rw-r--r-- 1 researcher cis123456p 1.3K Feb 17 17:07 params.yaml
custom_output_dir:
total 16K
12K drwxr-sr-x 4 researcher cis123456p 12K Feb 17 17:08 cs_518e82fcc3928d8e9da4ffc039506e6f0019b41b46bc53085af34c080de4054e
4.0K -rw-r--r-- 1 researcher cis123456p 534 Feb 17 17:07 params.txt
Now, to train the model (since it compiles without issues), we can create wrapper scripts that save time and make sure the right syntax is used for SLURM, Singularity, and the Python training command inside the container.
These wrappers are tailored to the Neocortex setup:
[researcher@neocortex-login023 tf]$ vim srun_train
#!/usr/bin/bash
srun --account=cis123456p --gres=cs:cerebras:1 --ntasks=7 --cpus-per-task=14 --kill-on-bad-exit singularity exec --bind /local1/cerebras/data,/local2/cerebras/data,/local3/cerebras/data,/local4/cerebras/data,$PROJECT /local1/cerebras/cbcore_latest.sif ./run_train "$@"
[researcher@neocortex-login023 tf]$ vim run_train
#!/usr/bin/bash
python run.py --cs_ip ${CS_IP_ADDR} --mode train "$@"
[researcher@neocortex-login023 tf]$ chmod +x srun_train run_train
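Because both wrapper scripts forward any extra arguments through "$@", additional run.py flags can be appended at launch time. For example, a hypothetical invocation reusing the custom parameter file from the compile step above:
[researcher@neocortex-login023 tf]$ ./srun_train --model_dir custom_output_dir --params custom_configs/params.yaml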
[researcher@neocortex-login023 tf]$ ./srun_train --model_dir training_example
INFO:tensorflow:TF_CONFIG environment variable: {'cluster': {'chief': ['sdf-1:23111'], 'worker': ['sdf-1:23112', 'sdf-1:23113', 'sdf-1:23114', 'sdf-1:23115', 'sdf-1:23116', 'sdf-1:23117']}, 'task': {'type': 'chief', 'index': 0}}
WARNING:tensorflow:From /cb/toolchains/buildroot/monolith-default/202010061651-75-61959232/rootfs-x86_64/usr/lib/python3.7/site-packages/tensorflow/python/ops/resource_variable_ops.py:1666: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
WARNING:tensorflow:From /cb/toolchains/buildroot/monolith-default/202010061651-75-61959232/rootfs-x86_64/usr/lib/python3.7/site-packages/tensorflow/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
2021-02-20 17:58:56.115455: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX512F
2021-02-20 17:58:56.132002: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 2700000000 Hz
2021-02-20 17:58:56.133795: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4800490 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-02-20 17:58:56.133815: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2021-02-20 17:58:56.133907: I tensorflow/core/common_runtime/process_util.cc:147] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
WARNING:root:[input_fn] - flat_map(): use map() instead of flat_map() to improve performance and parallelize reads. If you are not calling `flat_map` directly, check if you are using: from_generator, TextLineDataset, TFRecordDataset, or FixedLenthRecordDataset. If so, set `num_parallel_reads` to > 1 or tf.data.experimental.AUTOTUNE, and TF will use map() automatically.
2021-02-20 17:58:57.534881: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX512F
2021-02-20 17:58:57.561731: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 2700000000 Hz
2021-02-20 17:58:57.563606: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x6af0720 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-02-20 17:58:57.563627: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2021-02-20 17:58:57.622282: I tensorflow/tools/xla_extract/tf_graph_to_xla_lib.cc:267] number of function defs:1
2021-02-20 17:58:57.622307: I tensorflow/tools/xla_extract/tf_graph_to_xla_lib.cc:268] cluster_9063863211648629377
2021-02-20 17:58:57.622313: I tensorflow/tools/xla_extract/tf_graph_to_xla_lib.cc:269] xla args number:23
2021-02-20 17:58:57.622317: I tensorflow/tools/xla_extract/tf_graph_to_xla_lib.cc:270] fdef_args number:23
2021-02-20 17:58:57.622321: I tensorflow/tools/xla_extract/tf_graph_to_xla_lib.cc:275] fdef output mapping signature -> node_def:
2021-02-20 17:58:57.622325: I tensorflow/tools/xla_extract/tf_graph_to_xla_lib.cc:277] "mean_1_0_retval" -> "Mean_1:output:0"
2021-02-20 17:58:57.688709: W tensorflow/compiler/tf2xla/kernels/random_ops.cc:52] Warning: Using tf.random.uniform with XLA compilation will ignore seeds; consider using tf.random.stateless_uniform instead if reproducible behavior is desired.
XLA Extraction Complete
=============== Starting Cerebras Compilation ===============
Cerebras compilation completed: | | 19/? [00:25s, 1.35s/stages]2021-02-20 17:59:25.699705: I tensorflow/core/common_runtime/process_util.cc:147] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
WARNING:tensorflow:From /cbcore/py_root/cerebras/tf/cs_estimator.py:558: Variable.load (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Prefer Variable.assign which has equivalent behavior in 2.X.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
2021-02-20 17:59:25.862027: I tensorflow/core/common_runtime/process_util.cc:147] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...
INFO:tensorflow:Saving checkpoints for 0 into training_example/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...
INFO:tensorflow:Programming CS-2 fabric. This may take couple of minutes - please do not interrupt.
INFO:tensorflow:Fabric programmed
INFO:tensorflow:Coordinator fully up. Waiting for Streaming (using 0.42% out of 308274 cores on the fabric)
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Waiting for 6 streamer(s) to prime the data pipeline
INFO:tensorflow:Streamers are ready
INFO:tensorflow:global step 1: loss = 2.3671875 (1.19 steps/sec)
INFO:tensorflow:global step 100: loss = 0.2467041015625 (89.75 steps/sec)
INFO:tensorflow:global step 200: loss = 0.1527099609375 (167.0 steps/sec)
[--- OUTPUT SNIPPED FOR KEEPING THIS EXAMPLE SHORT ---]
INFO:tensorflow:global step 99700: loss = 0.0003674030303955078 (471.25 steps/sec)
INFO:tensorflow:global step 99800: loss = 0.038543701171875 (471.75 steps/sec)
INFO:tensorflow:global step 99900: loss = 0.0 (472.0 steps/sec)
INFO:tensorflow:Training finished with 25600000 samples in 211.863 seconds, 120832.56 samples / second
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 100000...
INFO:tensorflow:Saving checkpoints for 100000 into training_example/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 100000...
INFO:tensorflow:global step 100000: loss = 0.0823974609375 (471.75 steps/sec)
INFO:tensorflow:global step 100000: loss = 0.0823974609375 (471.75 steps/sec)
INFO:tensorflow:Loss for final step: 0.0824.
=============== Cerebras Compilation Completed ===============
[researcher@neocortex-login023 tf]$
Finally, if you want to perform these steps in batch mode instead of interactively, you can run all of them from a single sbatch file, like this:
[researcher@neocortex-login023 tf]$ vim mnist.sbatch
#!/usr/bin/bash
#SBATCH --gres=cs:cerebras:1
#SBATCH --ntasks=7
#SBATCH --cpus-per-task=14
#SBATCH --account=cis123456p
newgrp cis123456p
cp ${0} slurm-${SLURM_JOB_ID}.sbatch
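# Paths for the training data, the staged Model Zoo, the Singularity bind mounts, and the container image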
YOUR_DATA_DIR=${LOCAL}/cerebras/data
YOUR_MODEL_ROOT_DIR=${PROJECT}/modelzoo/
YOUR_ENTRY_SCRIPT_LOCATION=${YOUR_MODEL_ROOT_DIR}/fc_mnist/tf
BIND_LOCATIONS=/local1/cerebras/data,/local2/cerebras/data,/local3/cerebras/data,/local4/cerebras/data,${YOUR_DATA_DIR},${YOUR_MODEL_ROOT_DIR}
CEREBRAS_CONTAINER=/ocean/neocortex/cerebras/cbcore_latest.sif
cd ${YOUR_ENTRY_SCRIPT_LOCATION}
srun --ntasks=1 --kill-on-bad-exit singularity exec --bind ${BIND_LOCATIONS} ${CEREBRAS_CONTAINER} python run.py --mode train --validate_only --model_dir validate
srun --ntasks=1 --kill-on-bad-exit singularity exec --bind ${BIND_LOCATIONS} ${CEREBRAS_CONTAINER} python run.py --mode train --compile_only --model_dir compile
srun --kill-on-bad-exit singularity exec --bind ${BIND_LOCATIONS} ${CEREBRAS_CONTAINER} python run.py --mode train --model_dir training_example --cs_ip ${CS_IP_ADDR}
[researcher@neocortex-login023 tf]$ sbatch mnist.sbatch
Submitted batch job 345
[researcher@neocortex-login023 tf]$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
345 sdf mnist.sb researcher R 0:02 1 sdf-1
[researcher@neocortex-login023 tf]$ tail -f slurm-345.out
[--- OUTPUT SNIPPED FOR KEEPING THIS EXAMPLE SHORT ---]
INFO:tensorflow:Cached compilation found for this model configuration
[--- OUTPUT SNIPPED FOR KEEPING THIS EXAMPLE SHORT ---]
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 100000...
INFO:tensorflow:Saving checkpoints for 100000 into training_example/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 100000...
INFO:tensorflow:Programming CS-2 fabric. This may take couple of minutes - please do not interrupt.
INFO:tensorflow:Fabric programmed
INFO:tensorflow:Coordinator fully up. Waiting for Streaming (using 0.42% out of 308274 cores on the fabric)
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Waiting for 6 streamer(s) to prime the data pipeline
INFO:tensorflow:Streamers are ready
INFO:tensorflow:global step 100001: loss = 0.0 (0.43 steps/sec)
INFO:tensorflow:global step 100100: loss = 0.0 (37.72 steps/sec)
INFO:tensorflow:global step 100200: loss = 0.0019931793212890625 (72.94 steps/sec)
[--- OUTPUT SNIPPED FOR KEEPING THIS EXAMPLE SHORT ---]
INFO:tensorflow:global step 199700: loss = 0.0 (470.75 steps/sec)
INFO:tensorflow:global step 199800: loss = 3.814697265625e-06 (471.25 steps/sec)
INFO:tensorflow:global step 199900: loss = 3.814697265625e-06 (469.0 steps/sec)
INFO:tensorflow:Training finished with 25600000 samples in 213.044 seconds, 120162.88 samples / second
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 200000...
INFO:tensorflow:Saving checkpoints for 200000 into training_example/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 200000...
INFO:tensorflow:global step 200000: loss = 0.0 (469.0 steps/sec)
INFO:tensorflow:global step 200000: loss = 0.0 (469.0 steps/sec)
INFO:tensorflow:Loss for final step: 0.0.
As can be seen in the output above, the previous training was resumed: the earlier run saved its final checkpoint at step 100000, and the batch job was submitted with the same model output directory (--model_dir training_example).
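To confirm that the run resumed and then advanced to step 200000, the checkpoint files in the model directory can be listed (a sketch; TensorFlow names them model.ckpt-<step>.*, and the exact files present depend on the checkpointing settings):
[researcher@neocortex-login023 tf]$ ls training_example/ | grep ckpt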
Bridges-2¶
Connect to the login node.
ssh researcher@bridges2.psc.edu
researcher@bridges2.psc.edu's password: ****************
********************************* W A R N I N G ********************************
You have connected to br012.ib.bridges2.psc.edu, a login node of Bridges 2.
LOG OFF IMMEDIATELY if you do not agree to the conditions stated in this warning
********************************* W A R N I N G ********************************
[---OUTPUT SNIPPED---]
Projects
------------------------------------------------------------
Project: cis000000p PI: Paola Buitrago ***** default charging project *****
Extreme Memory 1,000 SU remain of 1,000 SU active: Yes
GPU AI 2,500 SU remain of 2,500 SU active: Yes
Regular Memory 49,999 SU remain of 50,000 SU active: Yes
Ocean /ocean/projects/cis000000p 14.43G used of 1000G
Project: cis123456p PI: Paola Buitrago
Extreme Memory 1,000 SU remain of 1,000 SU active: Yes
GPU AI 2,500 SU remain of 2,500 SU active: Yes
Regular Memory 50,000 SU remain of 50,000 SU active: Yes
Ocean /ocean/projects/cis123456p 26.97G used of 1000G
[researcher@bridges2-login012 ~]$
Note
Please keep in mind that your Neocortex allocation/account has access to more than one Bridges-2 partition: RM (Regular Memory) and EM (Extreme Memory).
RM should be the default choice, since it has more SUs available; only switch to EM if needed, and after testing everything on RM, so your SUs don't run out prematurely. Additionally, the EM nodes do not allow running commands interactively via the "interact" command, so you will need to either submit in batch mode or use the srun command shown below.
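For reference, the partition is selected with --partition (or -p) in both modes; a minimal sketch using the allocation from this example:
interact -A cis123456p -p RM    # interactive session on the RM partition
#SBATCH --partition=EM          # in a batch script, to target the EM nodes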
Take a look at the project grants available. There are two grants listed: one for a different research project and one for Neocortex. Since the latter is the one that holds the Neocortex SUs, it should be specified for the commands that follow.
[researcher@bridges2-login012 ~]$ projects | grep "Project\|Title"
Project: CIS000000P
Title: A Very Important Project
Project: CIS123456P # << This one
Title: P99-Neocortex Research Project # << This one
Now let's take a look at the output of the groups command. Note that the group names are all lowercase, while the projects output above is not.
[researcher@bridges2-login012 ~]$ groups
cis000000p cis123456p
"cis000000p" is showing as the first in that line (leftmost). That means that it's the primary group. What we want is to have the P## group to be the primary for all of the following commands, so let's run the "newgrp" command specifying it so that happens.
[researcher@bridges2-login012 ~]$ newgrp cis123456p
Now, by running the groups command one more time, we see that the "cis123456p" group shows as primary, just like we need.
[researcher@bridges2-login012 ~]$ groups
cis123456p cis000000p
Since we have the correct group showing as primary, we can now proceed to start a job for copying files and running the actual compilation steps. This will start the SLURM job under the correct project allocation ID (--account=GROUPID).
We should start by running a simple interact job while specifying the allocation to use, like this:
[researcher@bridges2-login012 ~]$ CEREBRAS_DIR=/ocean/neocortex/cerebras/
[researcher@bridges2-login012 ~]$ interact -A cis123456p -p RM
A command prompt will appear when your session begins
"Ctrl+d" or "exit" will end your session
--partition=RM
salloc -J Interact --partition=RM
salloc: Pending job allocation 312345
salloc: job 312345 queued and waiting for resources
salloc: job 312345 has been allocated resources
salloc: Granted job allocation 312345
salloc: Waiting for resource configuration
salloc: Nodes r051 are ready for job
[researcher@r051 ~]$
Note
Please remember that the interactive mode can only be used for RM nodes. For EM nodes, the batch mode has to be used.
As seen from the previous output, the prompt changed from the "bridges2-login012" login node to "r051" on the RM partition. It's now time to set some variables for copying the data.
[researcher@r051 ~]$ CEREBRAS_DIR=/ocean/neocortex/cerebras/
[researcher@r051 ~]$ echo $PROJECT
/ocean/projects/cis123456p/researcher
In this case, we copy the files with rsync, since it updates the target directory with any changes from the source path; a plain cp does not handle an already existing target directory as gracefully.
Also, if there are no new files under $CEREBRAS_DIR/modelzoo, the output will only show "sending incremental file list" and nothing else will be transferred, since the up-to-date files are already in place.
Additionally, please keep in mind that the "modelzoo" folder being copied should belong to the correct group after running the following commands: for this specific case, to "cis123456p" and not to "cis000000p".
[researcher@r051 ~]$ rsync -PaL --chmod u+w $CEREBRAS_DIR/modelzoo $PROJECT/
sending incremental file list
modelzoo/
modelzoo/LICENSE
[--- OUTPUT SNIPPED FOR KEEPING THIS EXAMPLE SHORT ---]
[researcher@r051 ~]$ ls $PROJECT/
modelzoo
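If you want to preview what a subsequent rsync would transfer without actually copying anything, the dry-run flag can be added (a sketch; -n makes no changes on disk):
[researcher@r051 ~]$ rsync -PaLn --chmod u+w $CEREBRAS_DIR/modelzoo $PROJECT/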
Since the files are already in place, we should exit this simple interactive session and start the actual compilation with more resources. You can exit the interactive mode by typing exit or pressing Ctrl+D.
[researcher@r051 ~]$ exit
exit
salloc: Relinquishing job allocation 312345
[researcher@bridges2-login012 ~]$
Then change into the modelzoo folder of the model we want to evaluate/compile/train:
[researcher@bridges2-login012 ~]$ cd $PROJECT/modelzoo/fc_mnist/tf
This command will start a shell using the latest Cerebras container. Please keep in mind that it might take a while for the job to start:
[researcher@bridges2-login012 tf]$ srun --pty --cpus-per-task=28 --account=cis123456p --partition=RM --kill-on-bad-exit singularity shell --cleanenv --bind $CEREBRAS_DIR/data,$PROJECT $CEREBRAS_DIR/cbcore_latest.sif
srun: job 345678 queued and waiting for resources
srun: job 345678 has been allocated resources
Singularity>
Inside that shell, you can run the different validation and compilation commands. For example, to run a validate_only process:
Singularity> python run.py --mode train --validate_only --model_dir validate
INFO:tensorflow:TF_CONFIG environment variable: {}
Downloading and preparing dataset mnist (11.06 MiB) to cerebras/data/tfds/mnist/1.0.0...
Dl Completed...: 100%|█████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 23.65 url/s]
Extraction completed...: 100%|█████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 6.29 file/s]
Extraction completed...: 100%|█████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 5.81 file/s]
Dl Size...: 100%|██████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 15.72 MiB/s]
Dl Completed...: 100%|█████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 6.28 url/s]
0 examples [00:00, ? examples/s]2021-03-01 17:53:53.757037: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 2245750000 Hz
[--- OUTPUT SNIPPED FOR KEEPING THIS EXAMPLE SHORT ---]
XLA Extraction Complete
=============== Starting Cerebras Compilation ===============
Cerebras compilation completed: 100%|██████████████████████████████████████████████████| 2/2 [00:04s, 2.03s/stages]
=============== Cerebras Compilation Completed ===============
Singularity>
In the same way, a compile_only process looks like this:
Singularity> python run.py --mode train --compile_only --model_dir compile
INFO:tensorflow:TF_CONFIG environment variable: {}
WARNING:root:[input_fn] - flat_map(): use map() instead of flat_map() to improve performance and parallelize reads. If you are not calling `flat_map` directly, check if you are using: from_generator, TextLineDataset, TFRecordDataset, or FixedLenthRecordDataset. If so, set `num_parallel_reads` to > 1 or tf.data.experimental.AUTOTUNE, and TF will use map() automatically.
WARNING:tensorflow:From /cb/toolchains/buildroot/monolith-default/202010061651-75-61959232/rootfs-x86_64/usr/lib/python3.7/site-packages/tensorflow/python/ops/resource_variable_ops.py:1666: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
2021-03-01 17:56:29.146050: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 2245750000 Hz
2021-03-01 17:56:29.151928: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x6308140 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-03-01 17:56:29.151990: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2021-03-01 17:56:29.182298: I tensorflow/tools/xla_extract/tf_graph_to_xla_lib.cc:267] number of function defs:1
2021-03-01 17:56:29.182327: I tensorflow/tools/xla_extract/tf_graph_to_xla_lib.cc:268] cluster_9063863211648629377
2021-03-01 17:56:29.182337: I tensorflow/tools/xla_extract/tf_graph_to_xla_lib.cc:269] xla args number:23
2021-03-01 17:56:29.182344: I tensorflow/tools/xla_extract/tf_graph_to_xla_lib.cc:270] fdef_args number:23
2021-03-01 17:56:29.182350: I tensorflow/tools/xla_extract/tf_graph_to_xla_lib.cc:275] fdef output mapping signature -> node_def:
2021-03-01 17:56:29.182357: I tensorflow/tools/xla_extract/tf_graph_to_xla_lib.cc:277] "mean_1_0_retval" -> "Mean_1:output:0"
2021-03-01 17:56:29.187951: W tensorflow/compiler/tf2xla/kernels/random_ops.cc:52] Warning: Using tf.random.uniform with XLA compilation will ignore seeds; consider using tf.random.stateless_uniform instead if reproducible behavior is desired.
XLA Extraction Complete
INFO:tensorflow:Cached compilation found for this model configuration
Singularity>
Now, different parameter files can be specified for the validation/compilation/training processes.
Let's say you want to use not the default "configs/params.yaml" file but one in a different (custom) directory (--params custom_configs/params.yaml). You can do this by copying the original "params.yaml" file and adjusting the values there; the output can also be written to a different path (--model_dir custom_output_dir):
Singularity> cp -r configs custom_configs
Singularity> vi custom_configs/params.yaml
Singularity> python run.py --mode train --compile_only --params custom_configs/params.yaml --model_dir custom_output_dir
INFO:tensorflow:TF_CONFIG environment variable: {}
[--- OUTPUT SNIPPED FOR KEEPING THIS EXAMPLE SHORT ---]
XLA Extraction Complete
=============== Starting Cerebras Compilation ===============
Cerebras compilation completed: | | 19/? [00:31s, 1.63s/stages]
=============== Cerebras Compilation Completed ===============
Singularity>
The custom_configs and custom_output_dir directories contain the parameters used and the output of this example compilation.
Please note that the group ownership still points to the correct group ("cis123456p" for this example), since the account information to use was automatically passed to SLURM.
Singularity> ls -lash | grep custom
4.0K drwxr-sr-x 2 researcher cis123456p 4.0K Mar 1 17:57 custom_configs
4.0K drwxr-sr-x 3 researcher cis123456p 4.0K Mar 1 17:58 custom_output_dir
Singularity> ls -lsh custom*
custom_configs:
total 4.0K
4.0K -rw-r--r-- 1 researcher cis123456p 1.3K Mar 1 17:57 params.yaml
custom_output_dir:
total 16K
12K drwxr-sr-x 4 researcher cis123456p 12K Mar 1 17:58 cs_518e82fcc3928d8e9da4ffc039506e6f0019b41b46bc53085af34c080de4054e
4.0K -rw-r--r-- 1 researcher cis123456p 534 Mar 1 17:58 params.txt
Now, regarding training the model (since it compiles without issues): training cannot be done on Bridges-2. You will have to connect to Neocortex and follow the training steps shown in the Neocortex section of this Reference Compilation Example.
Finally, if you want to perform these steps in batch mode instead of interactively via srun, you can run all of them from a single sbatch file. This also makes it possible to use the Extreme Memory nodes in the EM partition. (The Model Zoo can alternatively be obtained with git clone git@github.com:Cerebras/modelzoo.git, although the sbatch file below already stages it with rsync.) Like this:
[researcher@bridges2-login012 tf]$ vim mnist.sbatch
#!/usr/bin/bash
#SBATCH --cpus-per-task=28
#SBATCH --account=cis123456p
#SBATCH --partition=EM
#SBATCH --time=60:00
newgrp cis123456p
cp ${0} slurm-${SLURM_JOB_ID}.sbatch
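# Stage the Model Zoo into $PROJECT, then set the paths for the data, model code, bind mounts, and the container image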
CEREBRAS_DIR=/ocean/neocortex/cerebras/
rsync -PaL --chmod u+w $CEREBRAS_DIR/modelzoo $PROJECT/
YOUR_DATA_DIR=$CEREBRAS_DIR/data
YOUR_MODEL_ROOT_DIR=${PROJECT}/
YOUR_ENTRY_SCRIPT_LOCATION=${YOUR_MODEL_ROOT_DIR}/modelzoo/fc_mnist/tf
BIND_LOCATIONS=${YOUR_DATA_DIR},${YOUR_MODEL_ROOT_DIR},/local
CEREBRAS_CONTAINER=$CEREBRAS_DIR/cbcore_latest.sif
cd ${YOUR_ENTRY_SCRIPT_LOCATION}
srun --ntasks=1 --kill-on-bad-exit singularity exec --bind ${BIND_LOCATIONS} ${CEREBRAS_CONTAINER} python run.py --mode train --validate_only --model_dir validate
srun --ntasks=1 --kill-on-bad-exit singularity exec --bind ${BIND_LOCATIONS} ${CEREBRAS_CONTAINER} python run.py --mode train --compile_only --model_dir compile
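Submit and monitor the job the same way as in the Neocortex example above (the job ID and output file name will differ; <jobid> is a placeholder):
[researcher@bridges2-login012 tf]$ sbatch mnist.sbatch
[researcher@bridges2-login012 tf]$ squeue -u researcher
[researcher@bridges2-login012 tf]$ tail -f slurm-<jobid>.out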
Note
If you run into problems when running jobs on Bridges-2, please remember to also take a look at the Bridges-2 User Guide.