Model Zoo
Computer Vision Models - Pretrained Checkpoints
You can load any of our pretrained models in a few lines of code:
from super_gradients.training import models
from super_gradients.common.object_names import Models
model = models.get(Models.YOLOX_S, pretrained_weights="coco")
All available models are listed in the "Model name" column of the tables below.
Pretrained Classification PyTorch Checkpoints
Model | Model name | Dataset | Resolution | Top-1 | Top-5 | Latency (HW)* T4 | Latency (Production)** T4 | Latency (HW)* Jetson Xavier NX | Latency (Production)** Jetson Xavier NX | Latency Cascade Lake |
---|---|---|---|---|---|---|---|---|---|---|
ViT base | vit_base | ImageNet21K | 224x224 | 84.15 | - | 4.46ms | 4.60ms | - * | - | 57.22ms |
ViT large | vit_large | ImageNet21K | 224x224 | 85.64 | - | 12.81ms | 13.19ms | - * | - | 187.22ms |
BEiT | beit_base_patch16_224 | ImageNet21K | 224x224 | - | - | - | - | - * | - | - |
EfficientNet B0 | efficientnet_b0 | ImageNet | 224x224 | 77.62 | 93.49 | 0.93ms | 1.38ms | - * | - | 3.44ms |
RegNet Y200 | regnetY200 | ImageNet | 224x224 | 70.88 | 89.35 | 0.63ms | 1.08ms | 2.16ms | 2.47ms | 2.06ms |
RegNet Y400 | regnetY400 | ImageNet | 224x224 | 74.74 | 91.46 | 0.80ms | 1.25ms | 2.62ms | 2.91ms | 2.87ms |
RegNet Y600 | regnetY600 | ImageNet | 224x224 | 76.18 | 92.34 | 0.77ms | 1.22ms | 2.64ms | 2.93ms | 2.39ms |
RegNet Y800 | regnetY800 | ImageNet | 224x224 | 77.07 | 93.26 | 0.74ms | 1.19ms | 2.77ms | 3.04ms | 2.81ms |
ResNet 18 | resnet18 | ImageNet | 224x224 | 70.6 | 89.64 | 0.52ms | 0.95ms | 2.01ms | 2.30ms | 4.56ms |
ResNet 34 | resnet34 | ImageNet | 224x224 | 74.13 | 91.7 | 0.92ms | 1.34ms | 3.57ms | 3.87ms | 7.64ms |
ResNet 50 | resnet50 | ImageNet | 224x224 | 81.91 | 93.0 | 1.03ms | 1.44ms | 4.78ms | 5.10ms | 9.25ms |
MobileNet V3_large-300 epochs | mobilenet_v3_large | ImageNet | 224x224 | 74.52 | 91.92 | 0.67ms | 1.11ms | 2.42ms | 2.71ms | 1.76ms |
MobileNet V3_small | mobilenet_v3_small | ImageNet | 224x224 | 67.45 | 87.47 | 0.55ms | 0.96ms | 2.01ms * | 2.35ms | 1.06ms |
MobileNet V2_w1 | mobilenet_v2 | ImageNet | 224x224 | 73.08 | 91.1 | 0.46ms | 0.89ms | 1.65ms * | 1.90ms | 1.56ms |
NOTE:
- Latency (HW)* - Hardware performance (not including IO)
- Latency (Production)** - Production performance (including IO)
- Performance measured for T4 and Jetson Xavier NX with TensorRT, using FP16 precision and batch size 1
- Performance measured for Cascade Lake CPU with OpenVINO, using FP16 precision and batch size 1
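Since the HW figures exclude IO and the Production figures include it, subtracting one from the other gives a rough estimate of the IO overhead per inference. The sketch below does this for three T4 rows copied from the table above; the helper name `io_overhead_ms` is hypothetical, and the result is only an approximation that assumes both measurements are otherwise comparable.

```python
# (HW latency, Production latency) in ms on T4, copied from the classification table.
T4_LATENCY_MS = {
    "resnet18": (0.52, 0.95),
    "resnet34": (0.92, 1.34),
    "resnet50": (1.03, 1.44),
}


def io_overhead_ms(hw_ms: float, production_ms: float) -> float:
    """Estimate the IO share of latency as Production minus HW (hypothetical helper)."""
    return round(production_ms - hw_ms, 2)


overheads = {
    name: io_overhead_ms(hw, production)
    for name, (hw, production) in T4_LATENCY_MS.items()
}

for name, overhead in overheads.items():
    print(f"{name}: ~{overhead:.2f} ms spent on IO")
```

For these rows the estimated overhead stays close to 0.4 ms regardless of model size, which suggests that IO cost on T4 depends mostly on the input resolution rather than on the network.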
Pretrained Object Detection PyTorch Checkpoints
Model | Model Name | Dataset | Resolution | mAP val 0.5:0.95 | Latency (HW)* T4 | Latency (Production)** T4 | Latency (HW)* Jetson Xavier NX | Latency (Production)** Jetson Xavier NX | Latency Cascade Lake |
---|---|---|---|---|---|---|---|---|---|
SSD lite MobileNet v2 | ssd_lite_mobilenet_v2 | COCO | 320x320 | 21.5 | 0.77ms | 1.40ms | 5.28ms | 6.44ms | 4.13ms |
SSD lite MobileNet v1 | ssd_mobilenet_v1 | COCO | 320x320 | 24.3 | 1.55ms | 2.84ms | 8.07ms | 9.14ms | 22.76ms |
YOLOX nano | yolox_n | COCO | 640x640 | 26.77 | 2.47ms | 4.09ms | 11.49ms | 12.97ms | - |
YOLOX tiny | yolox_t | COCO | 640x640 | 37.18 | 3.16ms | 4.61ms | 15.23ms | 19.24ms | - |
YOLOX small | yolox_s | COCO | 640x640 | 40.47 | 3.58ms | 4.94ms | 18.88ms | 22.48ms | - |
YOLOX medium | yolox_m | COCO | 640x640 | 46.4 | 6.40ms | 7.65ms | 39.22ms | 44.5ms | - |
YOLOX large | yolox_l | COCO | 640x640 | 49.25 | 10.07ms | 11.12ms | 68.73ms | 77.01ms | - |
NOTE:
- Latency (HW)* - Hardware performance (not including IO)
- Latency (Production)** - Production performance (including IO)
- Latency measured for T4 and Jetson Xavier NX with TensorRT, using FP16 precision and batch size 1
- Latency measured for Cascade Lake CPU with OpenVINO, using FP16 precision and batch size 1
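One way to use the detection table is to pick the most accurate checkpoint that fits a latency budget. The sketch below is a minimal illustration using the mAP and Latency (Production) T4 values copied from the table; the function `best_under_budget` is a hypothetical helper, not part of the library.

```python
from typing import Optional

# (model name, mAP val 0.5:0.95, Production latency on T4 in ms), from the detection table.
DETECTORS = [
    ("ssd_lite_mobilenet_v2", 21.5, 1.40),
    ("ssd_mobilenet_v1", 24.3, 2.84),
    ("yolox_n", 26.77, 4.09),
    ("yolox_t", 37.18, 4.61),
    ("yolox_s", 40.47, 4.94),
    ("yolox_m", 46.4, 7.65),
    ("yolox_l", 49.25, 11.12),
]


def best_under_budget(budget_ms: float) -> Optional[str]:
    """Return the highest-mAP model whose T4 production latency fits the budget."""
    candidates = [
        (m_ap, name) for name, m_ap, latency_ms in DETECTORS if latency_ms <= budget_ms
    ]
    # max() compares tuples by mAP first; None signals that nothing fits.
    return max(candidates)[1] if candidates else None


print(best_under_budget(5.0))
print(best_under_budget(8.0))
```

The returned string matches the "Model Name" column, so it can be passed on to whatever loading code you use. The same approach applies to the Jetson Xavier NX or Cascade Lake columns if that is the target hardware.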
Pretrained Semantic Segmentation PyTorch Checkpoints
Model | Model Name | Dataset | Resolution | mIoU | Latency b1 T4 | Latency b1 T4 including IO | Latency (Production)** Jetson Xavier NX |
---|---|---|---|---|---|---|---|
PP-LiteSeg B50 | pp_lite_b_seg50 | Cityscapes | 512x1024 | 76.48 | 4.18ms | 31.22ms | 31.69ms |
PP-LiteSeg B75 | pp_lite_b_seg75 | Cityscapes | 768x1536 | 78.52 | 6.84ms | 33.69ms | 49.89ms |
PP-LiteSeg T50 | pp_lite_t_seg50 | Cityscapes | 512x1024 | 74.92 | 3.26ms | 30.33ms | 26.20ms |
PP-LiteSeg T75 | pp_lite_t_seg75 | Cityscapes | 768x1536 | 77.56 | 5.20ms | 32.28ms | 38.03ms |
DDRNet 23 slim | ddrnet_23_slim | Cityscapes | 1024x2048 | 79.41 | 5.74ms | 32.01ms | 45.18ms |
DDRNet 23 | ddrnet_23 | Cityscapes | 1024x2048 | 81.48 | 12.74ms | 39.01ms | 106.26ms |
DDRNet 39 | ddrnet_39 | Cityscapes | 1024x2048 | 81.32 | 23.57ms | 52.41ms | 145.79ms |
STDC 1-Seg50 | stdc1_seg50 | Cityscapes | 512x1024 | 75.11 | 3.34ms | 30.12ms | 27.54ms |
STDC 1-Seg75 | stdc1_seg75 | Cityscapes | 768x1536 | 77.8 | 5.53ms | 32.49ms | 43.88ms |
STDC 2-Seg50 | stdc2_seg50 | Cityscapes | 512x1024 | 76.44 | 4.12ms | 30.94ms | 32.03ms |
STDC 2-Seg75 | stdc2_seg75 | Cityscapes | 768x1536 | 78.93 | 6.95ms | 33.89ms | 54.48ms |
RegSeg (exp48) | regseg48 | Cityscapes | 1024x2048 | 78.15 | 12.03ms | 38.91ms | 78.20ms |
NOTE: Performance measured on a T4 GPU with TensorRT, using FP16 precision and batch size 1 (latency), not including IO.
NOTE: For resolutions below 1024x2048, we first resize the input to the inference resolution and then resize the predictions back to 1024x2048. The resizing time is included in the measurements, so the practical input size is 1024x2048.
NOTE: DDRNet23 and DDRNet23_Slim results were achieved with a channel-wise knowledge distillation training recipe.
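To make the resizing note concrete, the sketch below computes how far each inference resolution in the table is scaled down from the 1024x2048 reference. It assumes the resolutions are written as height x width, and the helper `inference_scale` is hypothetical. The resulting factors of 0.5 and 0.75 appear to line up with the 50 and 75 suffixes in the model names.

```python
FULL_RES = (1024, 2048)  # reference (height, width) from the notes; the ordering is an assumption


def inference_scale(resolution: str) -> float:
    """Return the per-side scale of an 'HxW' resolution relative to FULL_RES."""
    height, width = (int(value) for value in resolution.split("x"))
    scale_h = height / FULL_RES[0]
    scale_w = width / FULL_RES[1]
    if scale_h != scale_w:
        raise ValueError(f"{resolution} does not keep the reference aspect ratio")
    return scale_h


for resolution in ("512x1024", "768x1536", "1024x2048"):
    scale = inference_scale(resolution)
    # The network processes scale**2 of the pixels of a full-resolution frame.
    print(f"{resolution}: scale {scale}, pixel fraction {scale ** 2}")
```

A model running at scale 0.5 therefore processes a quarter of the pixels of a full-resolution frame, which helps explain why the Seg50 variants are faster than their Seg75 counterparts even with the resizing time included.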