Let’s define our very first job: a simple job that runs nvidia-smi.
1. Define the job config
Copy the following YAML into a show-gpus.yaml file:
resources:
  # 1x A10 GPU
  accelerators: A10:1

envs:
  # Set the environment variable MY_ENV_VAR to my-value
  MY_ENV_VAR: my-value

# Copy the contents of the current directory onto the remote machine
workdir: .

# Typical use: pip install -r requirements.txt
# Invoked under the workdir (i.e. can use its files)
setup: |
  echo "Running setup operations"

file_mounts:
  # Copy the contents of an S3 bucket to /dataset
  /dataset:
    source: s3://my-dataset
    mode: COPY
  # Mount an S3 bucket to /output
  # Any files written here (e.g. model checkpoints) will be uploaded back
  # to the S3 bucket
  /output:
    source: s3://my-output-bucket
    mode: MOUNT

# Typical use: python train.py
# Invoked under the workdir (i.e. can use its files)
# This will save a file called nvidia-smi-output.txt in /output,
# which will then get written back to the S3 bucket
run: |
  nvidia-smi > /output/nvidia-smi-output.txt
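Because /output is mounted in MOUNT mode, the nvidia-smi-output.txt file that the run command writes there ends up in the S3 bucket. As a quick check once the job has finished (a sketch assuming you have the AWS CLI configured and are using the placeholder bucket name from the config above), you can copy it back down locally:
# Fetch the output file that the job wrote to /output (synced to s3://my-output-bucket)
aws s3 cp s3://my-output-bucket/nvidia-smi-output.txt .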
2. Launch the job
To launch this job, simply run:
komo job launch show-gpus.yaml
You can also choose to give the job an optional name (if not set, Komodo will automatically generate one):
komo job launch show-gpus.yaml --name show-gpus
Note that names are not required to be unique across jobs; this requirement only exists for machines/services.
The logs will automatically be live streamed as the job runs. You can safely press Ctrl-C once the job has been submitted without terminating the job.
You can also launch the job and detach from the logs:
komo job launch show-gpus.yaml -d
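For example, a common pattern is to launch detached and reattach to the log stream later using the logs command from the next step (assuming you have the job's unique ID at hand):
komo job launch show-gpus.yaml -d
komo job logs --follow <job-id>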
3. Get the job logs
You can see the logs for the job in the dashboard, or get them using the CLI. When interacting with jobs through the CLI, you reference them by their unique ID.
For jobs that are currently running, you can live stream the logs:
komo job logs --follow <job-id>
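For a job that has already finished, the same command without --follow presumably prints the logs that were collected (this is an assumption about the CLI's behavior rather than something shown in this guide):
komo job logs <job-id>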
4. SSH into the job instance
You can also SSH into the instance that the job is running on.
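The exact command isn't shown here; as a sketch, assuming the CLI follows the same komo job <subcommand> pattern used above (the ssh subcommand name is an assumption):
# Hypothetical subcommand name: open an SSH session on the job's instance
komo job ssh <job-id>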
5. List all jobs
You can list all the jobs you have launched from the CLI.
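Again as an assumption (the listing subcommand's name isn't shown in this guide), a command in the same style might look like:
# Hypothetical subcommand name: list all launched jobs and their statuses
komo job list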