Simple Example to run on Anvil-GPU at Purdue#

A short tutorial on setting up an example backend on the Anvil Notebook Service with a GPU, to understand the basics of the Triton backend and client

This is based on the NVIDIA example tutorial for the PyTorch backend, adapted to work on the Anvil Notebook Service

Follow this link to log in to the Anvil Notebook Service

Step 1: Set Up the Notebook#

  • Choose Apptainer Notebook - Run apptainer inside of this notebook

  • Choose the Launcher and open two terminal tabs. We will need one to set up the server and one for the client.

Get the PyTorch resnet50 model#

This step obtains the resnet50 model saved as a TorchScript model with the PyTorch .pt file extension.

Note

You can copy it from /shared-storage/pytorch/model.pt to the folder where you plan to store the PyTorch model.

cp /shared-storage/pytorch/model.pt .
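
If you prefer to produce model.pt yourself instead of copying the shared copy, the sketch below exports a pretrained ResNet-50 from torchvision as a TorchScript file. This is only an illustration under the assumption that torch and torchvision are available in your environment; the shared model.pt may have been produced differently.

import torch
import torchvision.models as models

# Load a pretrained ResNet-50 and switch to inference mode
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.eval()

# Trace with a dummy ImageNet-sized input so Triton's libtorch backend can load it
example_input = torch.randn(1, 3, 224, 224)
traced = torch.jit.trace(model, example_input)
traced.save("model.pt")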

Prepare the model configs#

For each model, you will need a model configuration file. Depending on the backend you are using, there are fixed rules for how the files need to be structured.

For the PyTorch model, the configuration looks like the following.

name: "resnet50"
platform: "pytorch_libtorch"
max_batch_size : 0
input [
 {
    name: "input__0"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
    reshape { shape: [ 1, 3, 224, 224 ] }
 }
]
output [
 {
    name: "output__0"
    data_type: TYPE_FP32
    dims: [ 1, 1000, 1, 1 ]
    reshape { shape: [ 1, 1000 ] }
 }
]
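
The input__0 / output__0 names follow the Triton libtorch backend convention of naming tensors <name>__<index>. If you want to sanity-check the shapes declared in the config against the model itself, a minimal sketch (assuming model.pt is in the current directory) is:

import torch

# Load the TorchScript model and run a dummy input through it
model = torch.jit.load("model.pt")
model.eval()

with torch.no_grad():
    out = model(torch.randn(1, 3, 224, 224))

# ResNet-50 returns a [batch, 1000] class-score tensor,
# which is what the reshape in the config maps the output to
print(out.shape)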

Prepare model folder and structure#

The model repository must follow a specific structure and naming convention, as shown below. More detailed information for other backends can be found in the official documentation.

models
|
+-- resnet50
    |
    +-- config.pbtxt
    +-- 1
        |
        +-- model.pt
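
You can create this layout by hand, or with a small helper like the sketch below. The paths are placeholders; adjust them to where your config.pbtxt and model.pt actually live.

from pathlib import Path
import shutil

# Hypothetical locations of the files prepared in the previous steps
config_file = Path("config.pbtxt")
model_file = Path("model.pt")

# Create models/resnet50/1 and place the files where Triton expects them
version_dir = Path("models/resnet50/1")
version_dir.mkdir(parents=True, exist_ok=True)

shutil.copy(config_file, version_dir.parent / "config.pbtxt")
shutil.copy(model_file, version_dir / "model.pt")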

Step 2: Set Up the Triton Inference Server#

Open a new terminal tab in the notebook.

# Set model folder 
export YOUR_MODEL_FOLDER="{YOUR_MODEL_FOLDER}"

# Run the container with the Triton server 
apptainer run --nv --unsquash -B /proc:/proc -B /shared-storage:/shared-storage /images/tritonserver:24.09-py3

# Spin up a triton server
tritonserver --model-repository=${YOUR_MODEL_FOLDER}

You should see the following printout on the terminal.

...
+----------+---------+--------+
| Model    | Version | Status |
+----------+---------+--------+
| resnet50 | 1       | READY  |
+----------+---------+--------+
...

Set Up the Client#

Open another terminal to run the client script and send the inference request.

We are cheating a bit here, since both terminals are on the same machine. In a real-life scenario, you would need the IP address of the server to forward the request to a remote machine, and you would have to deal with authentication.

# Launch the tutorial container image for the client
apptainer run --nv --unsquash -B /proc:/proc -B /shared-storage:/shared-storage /images/tritonserver-tutorial:24.08-py3

# Download an input image
wget -O img1.jpg "https://www.hakaimagazine.com/wp-content/uploads/header-gulf-birds.jpg"

# Get the example client script
cp /shared-storage/pytorch/client.py .

# Check that the connection is OK
curl -v localhost:8000/v2/health/ready

You should see the following response if the connection works and the server is healthy.

...
< HTTP/1.1 200 OK
< Content-Length: 0
< Content-Type: text/plain
...
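
You can run the same check from Python with the Triton client library. The sketch below assumes the tritonclient package is available in the tutorial container; if not, install it with pip install tritonclient[http].

import tritonclient.http as httpclient

# Connect to the local Triton HTTP endpoint
client = httpclient.InferenceServerClient(url="localhost:8000")

print("server ready:", client.is_server_ready())
print("model ready: ", client.is_model_ready("resnet50"))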

Now, we are ready to run the client script!

# Simple script that sends an image to the server
python3 client.py 

It may take some time, depending on GPU utilization, but if everything goes well you will see the following printout.

[b'12.474469:90' b'11.525709:92' b'9.660509:14' b'8.406358:136'
 b'8.220254:11']

Each output entry has the format <confidence_score>:<classification_index>
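
The exact contents of client.py live in shared storage, but an equivalent request can be written with tritonclient roughly as in the sketch below. The ImageNet preprocessing is an assumption here; the provided script may preprocess differently. Requesting output__0 with class_count=5 is what produces the score:index strings shown above.

import numpy as np
from PIL import Image
import tritonclient.http as httpclient

# Preprocess the image: resize, normalize with ImageNet statistics, HWC -> CHW
img = Image.open("img1.jpg").convert("RGB").resize((224, 224))
x = np.asarray(img, dtype=np.float32) / 255.0
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
x = ((x - mean) / std).transpose(2, 0, 1)

# Build the request using the names and dims from config.pbtxt
client = httpclient.InferenceServerClient(url="localhost:8000")
inp = httpclient.InferInput("input__0", list(x.shape), "FP32")
inp.set_data_from_numpy(x)
out = httpclient.InferRequestedOutput("output__0", class_count=5)

# Send the inference request and print the top-5 classes
result = client.infer("resnet50", inputs=[inp], outputs=[out])
print(result.as_numpy("output__0"))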

Now you have a Triton client and server talking to each other!