Generic HPC Workflow
TL;DR
Build a GPU-compatible Singularity container with Jupyter locally, test it, then send it to Atlas for development.
Ceres vs Atlas
Most of our work is done on Atlas, so you'll use this URL: https://atlas-ood.hpc.msstate.edu/. SciNet may direct you to Ceres instead: https://ceres-ood.scinet.usda.gov/. If you log in to Ceres, you'll see the same project directories, but any data stored on Atlas will be missing.
To log in you'll need an authenticator code in addition to your login info. Your username is the same on both HPCs, but the passwords differ.
In this example I want a GPU-compatible container with Jupyter, allowing deep neural network development on Atlas.
Bare-bones .def file
Bootstrap: docker
From: nvcr.io/nvidia/pytorch:23.04-py3
Testing container:
Build sandbox container:
singularity build --sandbox jnb jupyter.def
Test for PyTorch & Jupyter locally:
singularity shell jnb
python -c "import torch; print(torch.cuda.is_available())"
jupyter-notebook    # then check in the browser
exit
Test for PyTorch on Lambda:
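On a GPU host like the Lambda workstation, add the --nv flag so Singularity binds the host's NVIDIA driver into the container. A minimal sketch using the jnb sandbox built above:

singularity shell --nv jnb
python -c "import torch; print(torch.cuda.is_available())"    # should print True on a GPU host
exit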
Add in Jupyter:
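The NGC PyTorch base image generally ships with Jupyter already, but if your base image doesn't, a %post section can install it. A minimal sketch of the extended jupyter.def:

Bootstrap: docker
From: nvcr.io/nvidia/pytorch:23.04-py3

%post
    # Assumption: install the classic notebook if the base image lacks it
    pip install --no-cache-dir notebook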
Finalizing container
Finalize
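Once the sandbox tests pass, rebuild the container as a compressed, read-only .sif image for the HPC. A minimal sketch (file names are assumptions):

singularity build jupyter.sif jupyter.def
# or convert the already-tested sandbox directly:
singularity build jupyter.sif jnb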
Add
“The cycle”
- Identify new needs (libraries, tools)
- Edit the .def
- Version control
- Build the container
- Send it to the HPC (see the sketch after this list)
- Development
  - Local or on Open OnDemand
- Run GPU code
  - Local
  - Export the notebook to a plain script and run it (see the sketch after this list)
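A sketch of the transfer and run steps, assuming the finalized image is jupyter.sif and the notebook is train.ipynb (host name, user, and paths are assumptions):

# Send the image to the HPC
scp jupyter.sif <user>@atlas-login.hpc.msstate.edu:/project/<your_project>/

# Export the notebook to a plain Python script and run it in the container
jupyter nbconvert --to script train.ipynb    # writes train.py
singularity exec --nv jupyter.sif python train.py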