Let’s define our very first service - a simple HTTP server.

1. Define the service config

Copy the following YAML into a simple-http.yaml file:

resources:
  # 1x A10 GPU
  accelerators: A10:1
  # Open port 8000
  ports: 8000

envs:
  # set the environment variable MY_ENV_VAR to my-value
  MY_ENV_VAR: my-value

# Copy the contents of the current directory onto the remote machine
workdir: .

# Typical use: pip install -r requirements.txt
# Invoked under the workdir (ie. can use its files)
setup: |
  echo "Running setup operations"

# Typical uses:
#    torchserve ..
#    vllm serve ...
# Invoked under the workdir (ie. can use its files)
run: |
  python -m http.server --port 8000

service:
  replica_policy:
    min_replicas: 1
    max_replicas: 5
    # When the average QPS (queries per second) per replica goes above this number,
    # Komodo will dynamically scale up the number of replicas running your service,
    # up to a maximum of max_replicas.
    # Similarly, when the average QPS per replica goes below this number, Komodo will
    # scale down the number of replicas running your service, down to a minimum of min_replicas.
    target_qps_per_replica: 10
  
  readiness_probe:
    # this is the endpoint path within your service that Komodo uses to check if your service is running
    path: /

Scaling to Zero

You can set num_replicas to 0 in order to enable scale-to-zero. When there are no replicas running, and you try to make a request to the service, you will get back a response from the load balancer with a message that says ‘No ready replicas’.

If you are enabling scale-to-zero, you should add error-handling logic in your code to handle the case when there are no ready replicas. As soon as the first request is made when there are zero replicas, a replica will be immediately provisioned. The time it takes for the replica to be ready depends largely on the boot time of your cloud instances, as well as the time it takes to run your setup/run scripts.

2. Launch the service

Launch the service by simply running:

komo service launch simple-http.yaml --name simple-http
Machine names are required to be unique, because that is how you reference them.

This will wait for the service to be up and running. You can detach as soon as the service is submitted by running:

komo service launch simple-http.yaml --name simple-http -d

Listing services

You can list all the services you have running using the CLI:

komo service list

Service logs

You can stream the logs for the replicas right from the web dashboard.

4. Make a request

When you launch the service, the CLI will print out the endpoint for the service. Alternatively, you can find it on the web dashboard as well. It will look like: https://services.komodo.io/service-id

You can use this endpoint as you would any other URL. This points to our load balancer, which will redirect you to one of the replicas, so if you’re using curl, make sure to specify the -L flag to follow redirects:

curl -L https://services.komodo.io/service-id

3. Terminate the service

When interacting with services using the CLI, you reference them using their unique name rather than their ID.

You can terminate the service by running:

komo service terminate simple-http