
Current Known Issues (technical)

These are technical issues our team is currently working on. We will update this guide as they are resolved, adding "SOLVED" to an issue's title once the fix is in place on Neocortex.

  • Error 1 (SOLVED 2021-02-14):

    Triggered when running the command for starting an interactive session from the Neocortex login node:

    # Any of the following: 
    srun --nodes=1 -w sdf-1 --pty bash -i
    srun --nodes=1 --pty bash -i
    srun --pty bash -i
    

    Error message:

    /usr/bin/id: cannot find name for group ID 24809
    /usr/bin/id: cannot find name for user ID 74742
    [I have no name!@sdf-1 ~]$
    

    Cause: a network communication issue prevented access to the LDAP server, so user and group IDs could not be mapped to names.
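
    A quick way to tell whether a session is affected is to check that the numeric user and group IDs map back to names. The sketch below wraps the standard id command in a hypothetical helper, check_identity (the function name is an assumption for illustration, not a Neocortex or Slurm tool), and assumes a bash-like shell on the node:

```shell
# check_identity: hypothetical helper, not a Neocortex or Slurm tool.
# Reports whether the current numeric UID and GID resolve to names.
check_identity() {
    local status=0 name
    if name=$(id -un 2>/dev/null); then
        echo "user ID $(id -u) resolves to $name"
    else
        echo "user ID $(id -u) has no name"
        status=1
    fi
    if name=$(id -gn 2>/dev/null); then
        echo "group ID $(id -g) resolves to $name"
    else
        echo "group ID $(id -g) has no name"
        status=1
    fi
    # Non-zero when either lookup failed.
    return "$status"
}
```

    Run inside the session started by srun, check_identity is expected to print two "resolves to" lines and return 0 on a healthy node; a non-zero status matches the symptom shown above.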

  • Error 2 (SOLVED 2021-02-14):

    Triggered when running the command for starting a Singularity shell from the Neocortex login node:

    srun --pty --nodelist sdf-1 --cpus-per-task=28 --kill-on-bad-exit singularity shell --cleanenv -B /local1/cerebras/data,/local2/cerebras/data,/local3/cerebras/data,/local4/cerebras/data,/jet/home/PSC_USERNAME/modelzoo /local1/cerebras/cbcore_latest.sif
    

    Error message:

    More processors requested than permitted
    
  • Error 3 (SOLVED 2021-02-14):

    Triggered when running the command for starting a Singularity shell from the Neocortex login node:

    srun --pty --nodelist sdf-1 --cpus-per-task=1 --kill-on-bad-exit singularity shell --cleanenv -B /local1/cerebras/data,/local2/cerebras/data,/local3/cerebras/data,/local4/cerebras/data,/jet/home/PSC_USERNAME/modelzoo /local1/cerebras/cbcore_latest.sif
    

    Error message:

    WARNING: Could not lookup the current user's information: user: unknown userid 73858
    FATAL:  Couldn't determine user account information: user: unknown userid 73858
    srun: error: sdf-1: tasks 0-1: Exited with exit code 255
    srun: launch/slurm: _step_signal: Terminating StepId=332.0
    

    Cause: a network communication issue prevented access to the LDAP server, so the user ID could not be looked up.

  • Error 4: Slowness when using files on Jet or Ocean (SOLVED 2021-02-15).

    Error message: none is shown, but operations take a long time to start or finish.

    Cause: this was a network communication issue for reaching the Jet and Ocean filesystem servers across multiple InfiniBand interfaces as opposed to using a single interface.
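
    Because no error is printed, the slowness is easiest to spot by timing a small file operation. The following is a minimal sketch under stated assumptions: time_io is a hypothetical probe (not a PSC tool), the 16 MiB size is arbitrary, and GNU dd is assumed to be available on the node. The read-back may be served from cache, so the figure mostly reflects write and metadata latency.

```shell
# time_io: hypothetical probe, not a PSC tool.
# Writes, reads back, and removes a 16 MiB scratch file in the given
# directory, then prints the elapsed time in whole seconds.
# Assumes GNU dd (bs=1M, conv=fsync), as found on typical Linux nodes.
time_io() {
    local dir=$1 probe start end
    probe="$dir/.io_probe_$$"
    start=$(date +%s)
    # conv=fsync forces the data to storage before dd exits.
    dd if=/dev/zero of="$probe" bs=1M count=16 conv=fsync 2>/dev/null \
        || { rm -f "$probe"; return 1; }
    cat "$probe" >/dev/null
    rm -f "$probe"
    end=$(date +%s)
    echo "$((end - start))"
}

# Example (assumes $HOME is on Jet, as in /jet/home/PSC_USERNAME):
# echo "home: $(time_io "$HOME") s, tmp: $(time_io /tmp) s"
```

    A large gap between the shared filesystem and node-local storage would be consistent with the behavior described above, though it does not by itself identify the cause.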

  • Error 5: Failure to log in to https://portal.neocortex.psc.edu/home

    Error message: the page takes about a minute to load, then the web server shows an error page.

    Cause: the authentication appears to be timing out, probably because something changed when the migration from Bridges to Bridges-2 started.

    Workaround: try logging in again.
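
    The login itself happens in the browser, but the "try again" pattern can be sketched for a scripted reachability check. The retry function below is a hypothetical helper (its name and arguments are assumptions for illustration); the commented curl call only tests whether the page loads within a time limit and does not perform a login.

```shell
# retry: hypothetical helper. Runs a command up to ATTEMPTS times,
# sleeping DELAY seconds between failed attempts.
# Usage: retry ATTEMPTS DELAY COMMAND [ARGS...]
retry() {
    local attempts=$1 delay=$2 n=1
    shift 2
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $n of $attempts failed" >&2
        if [ "$n" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        n=$((n + 1))
    done
    return 1
}

# Example: check that the portal page loads, allowing 90 s per attempt.
# retry 3 5 curl --fail --silent --show-error --max-time 90 \
#     --output /dev/null https://portal.neocortex.psc.edu/home
```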

  • [Continuous development]