Models
get_arch_params(config_name, overriding_params=None, recipes_dir_path=None)
Creates an architecture-parameters dictionary, taking defaults from the YAML files in src/super_gradients/recipes/arch_params.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| config_name | str | Name of the YAML file to load (e.g. "resnet18_cifar_arch_params"). | required |
| overriding_params | Dict | Dictionary-like object containing entries to override. | None |
| recipes_dir_path | Optional[str] | Optional. Main directory where all recipes are stored (e.g. ../super_gradients/recipes). This directory should include an "arch_params" folder, which itself should include the config file named after config_name. | None |

Source code in src/super_gradients/training/models/arch_params_factory.py
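A minimal usage sketch (the import path follows the source file shown above; the override key is illustrative):

```python
from super_gradients.training.models.arch_params_factory import get_arch_params

# Load the default ResNet18-CIFAR arch params, overriding a single entry.
arch_params = get_arch_params("resnet18_cifar_arch_params", overriding_params={"num_classes": 10})
print(arch_params["num_classes"])  # -> 10
```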
BaseClassifier
Bases: SgModule, HasPredict

Source code in src/super_gradients/training/models/classification_models/base_classifer.py
predict(images, batch_size=32, fuse_model=True, skip_image_resizing=False, fp16=True)
Predict an image or a list of images.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| images | ImageSource | Images to predict. | required |
| batch_size | int | Maximum number of images to process at the same time. | 32 |
| fuse_model | bool | If True, create a copy of the model and fuse some of its layers to increase performance. This increases memory usage. | True |
| skip_image_resizing | bool | If True, the image processor will not resize the images. | False |
| fp16 | bool | If True, use mixed precision for inference. | True |

Source code in src/super_gradients/training/models/classification_models/base_classifer.py
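A minimal inference sketch (model name, weights, and image path are illustrative; `predict` accepts file paths, URLs, numpy arrays and other ImageSource inputs):

```python
from super_gradients.training import models

model = models.get("resnet18", pretrained_weights="imagenet")
predictions = model.predict("path/to/image.jpg", batch_size=32, fp16=False)
predictions.show()  # visualize the predicted class
```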
predict_webcam(fuse_model=True, skip_image_resizing=False, fp16=True)
Predict using webcam.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| fuse_model | bool | If True, create a copy of the model and fuse some of its layers to increase performance. This increases memory usage. | True |
| skip_image_resizing | bool | If True, the image processor will not resize the images. | False |
| fp16 | bool | If True, use mixed precision for inference. | True |

Source code in src/super_gradients/training/models/classification_models/base_classifer.py
set_dataset_processing_params(class_names=None, image_processor=None)
Set the processing parameters for the dataset.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| class_names | Optional[List[str]] | (Optional) Names of the dataset the model was trained on. | None |
| image_processor | Optional[Processing] | (Optional) Image processing objects to reproduce the dataset preprocessing used for training. | None |

Source code in src/super_gradients/training/models/classification_models/base_classifer.py
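A short sketch of attaching class names to a model so that `predict` can label its outputs (model and class names are illustrative):

```python
from super_gradients.training import models

model = models.get("resnet18", num_classes=2)
model.set_dataset_processing_params(class_names=["cat", "dog"])
```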
BEIT: BERT Pre-Training of Image Transformers (https://arxiv.org/abs/2106.08254)
Model from official source: https://github.com/microsoft/unilm/tree/master/beit
At this point only the 1k fine-tuned classification weights and model configs have been added, see original source above for pre-training models and procedure.
Modifications by / Copyright 2021 Ross Wightman, original copyrights below
Beit
Bases: BaseClassifier
Vision Transformer with support for patch or hybrid CNN input stage
Source code in src/super_gradients/training/models/classification_models/beit.py
Mlp
Bases: nn.Module
MLP as used in Vision Transformer, MLP-Mixer and related networks
Source code in src/super_gradients/training/models/classification_models/beit.py
trunc_normal_(tensor, mean=0.0, std=1.0, a=-2.0, b=2.0)
Fills the input Tensor with values drawn from a truncated normal distribution. The values are effectively drawn from the normal distribution N(mean, std²), with values outside [a, b] redrawn until they are within the bounds. The method used for generating the random values works best when a ≤ mean ≤ b.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| tensor | | an n-dimensional torch.Tensor | required |
| mean | | the mean of the normal distribution | 0.0 |
| std | | the standard deviation of the normal distribution | 1.0 |
| a | | the minimum cutoff value | -2.0 |
| b | | the maximum cutoff value | 2.0 |

Examples:

>>> w = torch.empty(3, 5)
>>> nn.init.trunc_normal_(w)

Source code in src/super_gradients/training/models/classification_models/beit.py
DenseNet
Bases: BaseClassifier
Source code in src/super_gradients/training/models/classification_models/densenet.py
__init__(growth_rate, structure, num_init_features, bn_size, drop_rate, num_classes, in_channels=3)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| growth_rate | int | number of filters to add each layer (noted as 'k' in the paper) | required |
| structure | list | how many layers in each pooling block - sequentially | required |
| num_init_features | int | the number of filters to learn in the first convolutional layer | required |
| bn_size | int | multiplicative factor for the number of bottleneck layers (i.e. bn_size * k features in the bottleneck) | required |
| drop_rate | float | dropout rate after each dense layer | required |
| num_classes | int | number of classes in the classification task | required |
| in_channels | int | number of channels in the input image | 3 |

Source code in src/super_gradients/training/models/classification_models/densenet.py
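A construction sketch using the signature above (argument values follow the common DenseNet-121 layout and are illustrative):

```python
from super_gradients.training.models.classification_models.densenet import DenseNet

model = DenseNet(
    growth_rate=32,
    structure=[6, 12, 24, 16],  # layers per pooling block (DenseNet-121)
    num_init_features=64,
    bn_size=4,
    drop_rate=0.0,
    num_classes=1000,
)
```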
Dual Path Networks in PyTorch.
Credits: https://github.com/kuangliu/pytorch-cifar/blob/master/models/dpn.py
EfficientNet model class, based on "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks" (https://arxiv.org/abs/1905.11946). Code source: https://github.com/lukemelas/EfficientNet-PyTorch. Pre-trained checkpoints, converted to Deci's code base with the reported accuracy, can be found in the S3 repo.
BlockDecoder
Bases: object
Block Decoder for readability, straight from the official TensorFlow repository.
Source code in src/super_gradients/training/models/classification_models/efficientnet.py
decode(string_list)
staticmethod
Decode a list of string notations to specify blocks inside the network.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| string_list | List[str] | List of strings, each string is a notation of block. | required |

Returns:

| Type | Description |
|---|---|
| List[BlockArgs] | List of BlockArgs namedtuples of block args. |

Source code in src/super_gradients/training/models/classification_models/efficientnet.py
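A decoding sketch of the block-string notation (the attribute names assume the BlockArgs namedtuple of the reference implementation):

```python
from super_gradients.training.models.classification_models.efficientnet import BlockDecoder

# 'r1_k3_s11_e1_i32_o16_se0.25': repeat 1 time, 3x3 kernel, stride 1x1,
# expand ratio 1, 32 input channels, 16 output channels, SE ratio 0.25.
blocks_args = BlockDecoder.decode(["r1_k3_s11_e1_i32_o16_se0.25"])
print(blocks_args[0].num_repeat, blocks_args[0].kernel_size)  # -> 1 3
```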
encode(blocks_args)
staticmethod
Encode a list of BlockArgs to a list of strings.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| blocks_args | List | A list of BlockArgs namedtuples of block args. (list[namedtuple]) | required |

Returns:

| Type | Description |
|---|---|
| | block_strings: A list of strings, each string is a notation of block. |

Source code in src/super_gradients/training/models/classification_models/efficientnet.py
Conv2dDynamicSamePadding
Bases: nn.Conv2d
2D convolutions like TensorFlow's 'SAME' mode, for a dynamic image size. The padding is computed dynamically in the forward function.
Source code in src/super_gradients/training/models/classification_models/efficientnet.py
Conv2dStaticSamePadding
Bases: nn.Conv2d
2D convolutions like TensorFlow's 'SAME' mode, with the given input image size. The padding module is calculated in the constructor, then used in forward.
Source code in src/super_gradients/training/models/classification_models/efficientnet.py
EfficientNet
Bases: BaseClassifier
EfficientNet model.
References: [1] https://arxiv.org/abs/1905.11946 (EfficientNet)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| width_coefficient | float | model's width coefficient. Used as the multiplier. | required |
| depth_coefficient | float | model's depth coefficient. Used as the multiplier. | required |
| image_size | int | Size of input image. | required |
| dropout_rate | float | Dropout probability in final layer | required |
| num_classes | int | Number of classes. | required |
| batch_norm_momentum | Optional[float] | Value used for the running_mean and running_var computation | 0.99 |
| batch_norm_epsilon | Optional[float] | Value added to the denominator for numerical stability | 0.001 |
| drop_connect_rate | Optional[float] | Connection dropout probability | 0.2 |
| depth_divisor | Optional[int] | Model's depth divisor. Used as the divisor. | 8 |
| min_depth | Optional[int] | Model's minimal depth, if given. | None |
| backbone_mode | Optional[bool] | If True, drops the final linear layer | False |
| blocks_args | Optional[list] | List of BlockArgs to construct blocks. (list[namedtuple]) | None |

Source code in src/super_gradients/training/models/classification_models/efficientnet.py
extract_features(inputs)
Use convolution layer to extract feature.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| inputs | torch.Tensor | Input tensor. | required |

Returns:

| Type | Description |
|---|---|
| torch.Tensor | Output of the final convolution layer in the efficientnet model. |

Source code in src/super_gradients/training/models/classification_models/efficientnet.py
forward(inputs)
EfficientNet's forward function. Calls extract_features to extract features, applies final linear layer, and returns logits.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| inputs | | Input tensor. | required |

Returns:

| Type | Description |
|---|---|
| | Output of this model after processing. |

Source code in src/super_gradients/training/models/classification_models/efficientnet.py
load_state_dict(state_dict, strict=True)
load_state_dict - Overloads the base method and calls it to load a modified dict for usage as a backbone
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| state_dict | dict | The state_dict to load | required |
| strict | bool | strict loading (see super() docs) | True |

Source code in src/super_gradients/training/models/classification_models/efficientnet.py
Identity
Bases: nn.Module
Identity mapping. Send input to output directly.
Source code in src/super_gradients/training/models/classification_models/efficientnet.py
MBConvBlock
Bases: nn.Module
Mobile Inverted Residual Bottleneck Block.
References: [1] https://arxiv.org/abs/1704.04861 (MobileNet v1) [2] https://arxiv.org/abs/1801.04381 (MobileNet v2) [3] https://arxiv.org/abs/1905.02244 (MobileNet v3)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| block_args | BlockArgs | BlockArgs. | required |
| batch_norm_momentum | float | Batch norm momentum. | required |
| batch_norm_epsilon | float | Batch norm epsilon. | required |
| image_size | Union[Tuple, List] | [image_height, image_width]. | None |

Source code in src/super_gradients/training/models/classification_models/efficientnet.py
forward(inputs, drop_connect_rate=None)
MBConvBlock's forward function.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| inputs | torch.Tensor | Input tensor. | required |
| drop_connect_rate | Optional[float] | Drop connect rate (float, between 0 and 1). | None |

Returns:

| Type | Description |
|---|---|
| torch.Tensor | Output of this block after processing. |

Source code in src/super_gradients/training/models/classification_models/efficientnet.py
calculate_output_image_size(input_image_size, stride)
Calculates the output image size when using Conv2dSamePadding with a stride. Necessary for static padding. Thanks to mannatsingh for pointing this out.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| input_image_size | Union[int, Tuple, List] | Size of input image. | required |
| stride | Union[int, Tuple, List] | Conv2d operation's stride. | required |

Returns:

| Type | Description |
|---|---|
| Optional[List[int]] | output_image_size: A list [H,W]. |

Source code in src/super_gradients/training/models/classification_models/efficientnet.py
drop_connect(inputs, p, training)
Drop connect.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| inputs | | Input of this structure. (tensor: BCWH) | required |
| p | | Probability of the drop connection (float, between 0 and 1). | required |
| training | bool | Running mode. | required |

Returns:

| Type | Description |
|---|---|
| torch.Tensor | output: Output after drop connection. |

Source code in src/super_gradients/training/models/classification_models/efficientnet.py
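A self-contained sketch of the drop-connect (stochastic-depth) computation this function performs, following the reference EfficientNet implementation:

```python
import torch

def drop_connect_sketch(inputs: torch.Tensor, p: float, training: bool) -> torch.Tensor:
    """Drop whole samples with probability p; rescale survivors to keep the expectation."""
    if not training:
        return inputs
    keep_prob = 1.0 - p
    # One Bernoulli draw per sample; shape [B, 1, 1, 1] broadcasts over C, H, W.
    random_tensor = keep_prob + torch.rand(inputs.shape[0], 1, 1, 1, dtype=inputs.dtype, device=inputs.device)
    binary_mask = torch.floor(random_tensor)
    return inputs / keep_prob * binary_mask
```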
get_same_padding_conv2d(image_size=None)
Chooses static padding if you have specified an image size, and dynamic padding otherwise. Static padding is necessary for ONNX exporting of models.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| image_size | Optional[Union[int, Tuple[int, int]]] | Size of the image. | None |

Returns:

| Type | Description |
|---|---|
| | Conv2dDynamicSamePadding or Conv2dStaticSamePadding. |

Source code in src/super_gradients/training/models/classification_models/efficientnet.py
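A usage sketch (the returned class is a drop-in replacement for nn.Conv2d; passing an image size selects the static, ONNX-friendly variant):

```python
from super_gradients.training.models.classification_models.efficientnet import get_same_padding_conv2d

Conv2d = get_same_padding_conv2d(image_size=224)  # static padding -> exportable to ONNX
conv = Conv2d(in_channels=3, out_channels=32, kernel_size=3, stride=2)
```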
round_filters(filters, width_coefficient, depth_divisor, min_depth)
Calculate and round number of filters based on width multiplier. Use width_coefficient, depth_divisor and min_depth.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| filters | int | Filters number to be calculated. (The remaining params come from arch_params.) | required |
| width_coefficient | int | model's width coefficient. Used as the multiplier. | required |
| depth_divisor | int | model's depth divisor. Used as the divisor. | required |
| min_depth | int | model's minimal depth, if given. | required |

Returns:

| Type | Description |
|---|---|
| | new_filters: New filters number after calculating. |

Source code in src/super_gradients/training/models/classification_models/efficientnet.py
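A worked sketch of the standard EfficientNet rounding rule this function applies: scale by the width coefficient, snap to the nearest multiple of depth_divisor, and never round down by more than 10%:

```python
def round_filters_sketch(filters: int, width_coefficient: float, depth_divisor: int = 8, min_depth: int = None) -> int:
    filters *= width_coefficient
    min_depth = min_depth or depth_divisor
    # Snap to the nearest multiple of depth_divisor, but never below min_depth.
    new_filters = max(min_depth, int(filters + depth_divisor / 2) // depth_divisor * depth_divisor)
    if new_filters < 0.9 * filters:  # prevent rounding down by more than 10%
        new_filters += depth_divisor
    return int(new_filters)

print(round_filters_sketch(32, width_coefficient=1.1))  # 32 * 1.1 = 35.2 -> snapped to 32
```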
round_repeats(repeats, depth_coefficient)
Calculate module's repeat number of a block based on depth multiplier. Use depth_coefficient.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| repeats | int | num_repeat to be calculated. | required |
| depth_coefficient | int | the depth coefficient of the model. This func uses it as the multiplier. | required |

Returns:

| Type | Description |
|---|---|
| | new repeat: New repeat number after calculating. |

Source code in src/super_gradients/training/models/classification_models/efficientnet.py
Googlenet code based on https://pytorch.org/vision/stable/_modules/torchvision/models/googlenet.html
GoogLeNet
Bases: BaseClassifier
Source code in src/super_gradients/training/models/classification_models/googlenet.py
load_state_dict(state_dict, strict=True)
load_state_dict - Overloads the base method and calls it to load a modified dict for usage as a backbone
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| state_dict | | The state_dict to load | required |
| strict | | strict loading (see super() docs) | True |

Source code in src/super_gradients/training/models/classification_models/googlenet.py
MobileNet in PyTorch.
See the paper "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications" for more details.
Block
Bases: nn.Module
Depthwise conv + Pointwise conv
Source code in src/super_gradients/training/models/classification_models/mobilenet.py
MobileNet
Bases: BaseClassifier, SupportsReplaceInputChannels
Source code in src/super_gradients/training/models/classification_models/mobilenet.py
forward(x)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| up_to_layer | | forward through the net layers up to a specific layer. If None, run all layers. | required |

Source code in src/super_gradients/training/models/classification_models/mobilenet.py
This is a PyTorch implementation of MobileNetV2 architecture as described in the paper: Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation. https://arxiv.org/pdf/1801.04381
Code taken from https://github.com/tonylins/pytorch-mobilenet-v2 License: Apache Version 2.0, January 2004 http://www.apache.org/licenses/
Pre-trained ImageNet model: 'deci-model-repository/mobilenet_v2/ckpt_best.pth'
CustomMobileNetV2
Bases: MobileNetV2
Source code in src/super_gradients/training/models/classification_models/mobilenetv2.py
__init__(arch_params)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| arch_params | | HpmStruct that must contain: 'num_classes': int, 'width_mult': float, 'structure': list specifying the mobilenetv2 architecture | required |

Source code in src/super_gradients/training/models/classification_models/mobilenetv2.py
InvertedResidual
Bases: nn.Module
Source code in src/super_gradients/training/models/classification_models/mobilenetv2.py
__init__(inp, oup, stride, expand_ratio, grouped_conv_size=1)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| inp | | number of input channels | required |
| oup | | number of output channels | required |
| stride | | conv stride | required |
| expand_ratio | | expansion ratio of the hidden layer after pointwise conv | required |
| grouped_conv_size | | number of channels per grouped convolution; for depthwise-separable convolution, use grouped_conv_size=1 | 1 |

Source code in src/super_gradients/training/models/classification_models/mobilenetv2.py
MobileNetV2
Bases: MobileNetBase
Source code in src/super_gradients/training/models/classification_models/mobilenetv2.py
MobileNetV2Base
Bases: MobileNetV2
Source code in src/super_gradients/training/models/classification_models/mobilenetv2.py
__init__(arch_params)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| arch_params | | HpmStruct that must contain: 'num_classes': int | required |

Source code in src/super_gradients/training/models/classification_models/mobilenetv2.py
MobileNetV2_135
Bases: MobileNetV2
Source code in src/super_gradients/training/models/classification_models/mobilenetv2.py
__init__(arch_params)
This model achieves 75.73% on ImageNet - similar to ResNet50.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| arch_params | | HpmStruct that must contain: 'num_classes': int | required |

Source code in src/super_gradients/training/models/classification_models/mobilenetv2.py
Creates a MobileNetV3 Model as defined in: Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, Quoc V. Le, Hartwig Adam. (2019). Searching for MobileNetV3 arXiv preprint arXiv:1905.02244.
mobilenetv3_custom
Bases: MobileNetV3
Constructs a MobileNetV3-Customized model
Source code in src/super_gradients/training/models/classification_models/mobilenetv3.py
mobilenetv3_large
Bases: MobileNetV3
Constructs a MobileNetV3-Large model
Source code in src/super_gradients/training/models/classification_models/mobilenetv3.py
mobilenetv3_small
Bases: MobileNetV3
Constructs a MobileNetV3-Small model
Source code in src/super_gradients/training/models/classification_models/mobilenetv3.py
PNASNet in PyTorch.
Paper: Progressive Neural Architecture Search
https://github.com/kuangliu/pytorch-cifar/blob/master/models/pnasnet.py
SepConv
Bases: nn.Module
Separable Convolution.
Source code in src/super_gradients/training/models/classification_models/pnasnet.py
Pre-activation ResNet in PyTorch.
Reference: [1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun Identity Mappings in Deep Residual Networks. arXiv:1603.05027
Based on https://github.com/kuangliu/pytorch-cifar/blob/master/models/preact_resnet.py
PreActBlock
Bases: nn.Module
Pre-activation version of the BasicBlock.
Source code in src/super_gradients/training/models/classification_models/preact_resnet.py
PreActBottleneck
Bases: nn.Module
Pre-activation version of the original Bottleneck module.
Source code in src/super_gradients/training/models/classification_models/preact_resnet.py
RegNet - from the paper: Designing Network Design Spaces - https://arxiv.org/pdf/2003.13678.pdf. Implementation of the paradigm described in the paper published by Facebook AI Research (FAIR). @author: Signatrix GmbH. Code taken from: https://github.com/signatrix/regnet - MIT Licence.
CustomAnyNet
Bases: AnyNetX
Source code in src/super_gradients/training/models/classification_models/regnet.py
__init__(arch_params)
All parameters must be provided in arch_params other than SE
Source code in src/super_gradients/training/models/classification_models/regnet.py
CustomRegNet
Bases: RegNetX
Source code in src/super_gradients/training/models/classification_models/regnet.py
__init__(arch_params)
All parameters must be provided in arch_params other than SE
Source code in src/super_gradients/training/models/classification_models/regnet.py
NASRegNet
Bases: RegNetX
Source code in src/super_gradients/training/models/classification_models/regnet.py
__init__(arch_params)
All parameters are provided as a single structure list: arch_params.structure
Source code in src/super_gradients/training/models/classification_models/regnet.py
verify_correctness_of_parameters(ls_num_blocks, ls_block_width, ls_bottleneck_ratio, ls_group_width)
Verify that the given parameters fit the search space defined in the RegNet paper.
Source code in src/super_gradients/training/models/classification_models/regnet.py
RepVGG PyTorch implementation. This model trains a VGG with residual blocks, but during inference (in deployment mode) the model is converted to a plain VGG. Pretrained models: https://drive.google.com/drive/folders/1Avome4KvNp0Lqh2QwhXO6L5URQjzCjUq References: [1] https://github.com/DingXiaoH/RepVGG [2] https://arxiv.org/pdf/2101.03697.pdf
Based on https://github.com/DingXiaoH/RepVGG
RepVGG
Bases: BaseClassifier
Source code in src/super_gradients/training/models/classification_models/repvgg.py
__init__(struct, num_classes=1000, width_multiplier=None, build_residual_branches=True, use_se=False, backbone_mode=False, in_channels=3)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| struct | | list containing the number of blocks per repvgg stage | required |
| num_classes | | number of classes if not in backbone mode | 1000 |
| width_multiplier | | list of per-stage width multipliers, or a float if using a single value for all stages | None |
| build_residual_branches | | whether to add residual connections or not | True |
| use_se | | use squeeze and excitation layers | False |
| backbone_mode | | if True, drops the final linear layer | False |
| in_channels | | input channels | 3 |

Source code in src/super_gradients/training/models/classification_models/repvgg.py
ResNet in PyTorch. For Pre-activation ResNet, see 'preact_resnet.py'. Reference: [1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun Deep Residual Learning for Image Recognition. arXiv:1512.03385
Pre-trained ImageNet models: 'deci-model-repository/resnet?/ckpt_best.pth' => ? = the type of resnet (e.g. 18, 34...) Pre-trained CIFAR10 models: 'deci-model-repository/CIFAR_NAS_#??????/ckpt_best.pth' => ? = num of model, structure, width_mult
Code adapted from https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py
ResNet
Bases: BaseClassifier
Source code in src/super_gradients/training/models/classification_models/resnet.py
load_state_dict(state_dict, strict=True)
load_state_dict - Overloads the base method and calls it to load a modified dict for usage as a backbone
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| state_dict | | The state_dict to load | required |
| strict | | strict loading (see super() docs) | True |

Source code in src/super_gradients/training/models/classification_models/resnet.py
ResNeXt in PyTorch.
See the paper "Aggregated Residual Transformations for Deep Neural Networks" for more details.
Code adapted from https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py
GroupedConvBlock
Bases: nn.Module
Grouped convolution block.
Source code in src/super_gradients/training/models/classification_models/resnext.py
conv1x1(in_planes, out_planes, stride=1)
1x1 convolution
Source code in src/super_gradients/training/models/classification_models/resnext.py
conv3x3(in_planes, out_planes, stride=1, groups=1, dilation=1)
3x3 convolution with padding
Source code in src/super_gradients/training/models/classification_models/resnext.py
SENet in PyTorch.
SENet is the winner of ImageNet-2017. The paper is not released yet.
Code adapted from https://github.com/fastai/imagenet-fast/blob/master/cifar10/models/cifar10/senet.py
ShuffleNet in PyTorch.
See the paper "ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices" for more details.
https://github.com/kuangliu/pytorch-cifar/blob/master/models/shufflenet.py
ShuffleBlock
Bases: nn.Module
Source code in src/super_gradients/training/models/classification_models/shufflenet.py
forward(x)
Channel shuffle: [N,C,H,W] -> [N,g,C/g,H,W] -> [N,C/g,g,H,w] -> [N,C,H,W]
Source code in src/super_gradients/training/models/classification_models/shufflenet.py
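A minimal sketch of the shuffle described above (assumes groups divides the channel count evenly):

```python
import torch

def channel_shuffle_sketch(x: torch.Tensor, groups: int) -> torch.Tensor:
    n, c, h, w = x.size()
    x = x.view(n, groups, c // groups, h, w)  # [N,C,H,W] -> [N,g,C/g,H,W]
    x = x.transpose(1, 2).contiguous()        # -> [N,C/g,g,H,W]
    return x.view(n, c, h, w)                 # -> [N,C,H,W]
```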
ShuffleNetV2 in PyTorch.
See the paper "ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design" for more details. (https://arxiv.org/abs/1807.11164)
Code taken from torchvision/models/shufflenetv2.py
ChannelShuffleInvertedResidual
Bases: nn.Module
Implements the Inverted Residual block as in [https://arxiv.org/abs/1807.11164], Fig. 3 (c) & (d):

- When stride > 1:
  - the whole input goes through branch1,
  - the whole input goes through branch2, and an arbitrary number of output channels is produced.
- When stride == 1:
  - half of the input channels are passed as identity,
  - the other half goes through branch2, and the number of output channels after the block remains the same as in the input.

Channel shuffle is performed on a concatenation in both cases.
Source code in src/super_gradients/training/models/classification_models/shufflenetv2.py
channel_shuffle(x, groups)
staticmethod
From "ShuffleNet V2: Practical Guidelines for EfficientCNN Architecture Design" (https://arxiv.org/abs/1807.11164): A “channel shuffle” operation is then introduced to enable information communication between different groups of channels and improve accuracy.
The operation preserves x.size(), but shuffles its channels in the manner explained further in the example.
Example: If group = 2 (2 branches with the same # of activation maps were concatenated before channel_shuffle), then activation maps in x are: from_B1, from_B1, ... from_B2, from_B2 After channel_shuffle activation maps in x will be: from_B1, from_B2, ... from_B1, from_B2
Source code in src/super_gradients/training/models/classification_models/shufflenetv2.py
ShuffleNetV2Base
Bases: BaseClassifier
Source code in src/super_gradients/training/models/classification_models/shufflenetv2.py
load_state_dict(state_dict, strict=True)
load_state_dict - Overloads the base method and calls it to load a modified dict for usage as a backbone
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| state_dict | | The state_dict to load | required |
| strict | | strict loading (see super() docs) | True |

Source code in src/super_gradients/training/models/classification_models/shufflenetv2.py
VGG11/13/16/19 in Pytorch. Adapted from https://github.com/pytorch/vision/blob/master/torchvision/models/vgg.py
Vision Transformer in PyTorch. Reference: [1] Dosovitskiy, Alexey, et al. "An image is worth 16x16 words: Transformers for image recognition at scale." arXiv preprint arXiv:2010.11929 (2020)
Code adapted from https://github.com/lucidrains/vit-pytorch/blob/main/vit_pytorch/vit.py
Attention
Bases: nn.Module
self attention layer with residual connection
Source code in src/super_gradients/training/models/classification_models/vit.py
FeedForward
Bases: nn.Module
feed forward block with residual connection
Source code in src/super_gradients/training/models/classification_models/vit.py
PatchEmbed
Bases: nn.Module
2D Image to Patch Embedding Using Conv layers (Faster than rearranging + Linear)
Source code in src/super_gradients/training/models/classification_models/vit.py
ViT
Bases: BaseClassifier
Source code in src/super_gradients/training/models/classification_models/vit.py
__init__(image_size, patch_size, num_classes, hidden_dim, depth, heads, mlp_dim, in_channels=3, dropout_prob=0.0, emb_dropout_prob=0.0, backbone_mode=False)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| image_size | tuple | Image size tuple for data processing into patches done within the model. | required |
| patch_size | tuple | Patch size tuple for data processing into patches done within the model. | required |
| num_classes | int | Number of classes for the classification head. | required |
| hidden_dim | int | Output dimension of each transformer block. | required |
| depth | int | Number of transformer blocks | required |
| heads | int | Number of attention heads | required |
| mlp_dim | int | Intermediate dimension of the transformer block's feed forward | required |
| in_channels | | input channels | 3 |
| dropout_prob | | Dropout ratio between the feed forward layers. | 0.0 |
| emb_dropout_prob | | Dropout ratio after the embedding layer | 0.0 |
| backbone_mode | | If True, output after the pooling layer | False |

Source code in src/super_gradients/training/models/classification_models/vit.py
ConvertableCompletePipelineModel
Bases: torch.nn.Module
Exportable nn.Module that wraps the model, preprocessing and postprocessing.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| model | torch.nn.Module | torch.nn.Module, the main model. Takes input from pre_process' output, and feeds post_process. | required |
| pre_process | torch.nn.Module | torch.nn.Module, preprocessing module; its output will be the model's input. When None (default), set to Identity(). | None |
| post_process | torch.nn.Module | torch.nn.Module, postprocessing module; its input is the model's output. When None (default), set to Identity(). | None |
| **prep_model_for_conversion_kwargs | | for SgModules - args to be passed to model.prep_model_for_conversion prior to the torch.onnx.export call. | {} |

Source code in src/super_gradients/training/models/conversion.py
convert_from_config(cfg)
Exports model according to cfg.
See: super_gradients/recipes/conversion_params/default_conversion_params.yaml for the full cfg content documentation, and super_gradients/examples/convert_recipe_example/convert_recipe_example.py for usage.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| cfg | DictConfig | | required |

Returns:

| Type | Description |
|---|---|
| str | out_path, the path of the saved .onnx file. |

Source code in src/super_gradients/training/models/conversion.py
convert_to_coreml(model, out_path, input_size=None, pre_process=None, post_process=None, prep_model_for_conversion_kwargs=None, export_as_ml_program=False, torch_trace_kwargs=None)
Exports a given SG model to CoreML mlprogram or package.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| model | | torch.nn.Module, model to export to CoreML. | required |
| out_path | | str, destination path for the .mlmodel file. | required |
| input_size | | Input shape without batch dimensions ([C,H,W]). Batch size assumed to be 1. | None |
| pre_process | | torch.nn.Module, preprocessing pipeline, will be resolved by TransformsFactory() | None |
| post_process | | torch.nn.Module, postprocessing pipeline, will be resolved by TransformsFactory() | None |
| prep_model_for_conversion_kwargs | | dict, for SgModules - args to be passed to model.prep_model_for_conversion prior to the ct.convert call. Supported keys are: input_size - shape of inputs with batch dimension, [C,H,W] for image inputs. | None |
| export_as_ml_program | | Whether to convert to the new program format (better) or the legacy CoreML proto file (supports more iOS versions and devices, but this format will be deprecated at some point). | False |
| torch_trace_kwargs | | kwargs for torch.jit.trace | None |

Returns:

| Type | Description |
|---|---|
| Path | |

Source code in src/super_gradients/training/models/conversion.py
convert_to_onnx(model, out_path, input_shape=None, pre_process=None, post_process=None, prep_model_for_conversion_kwargs=None, torch_onnx_export_kwargs=None, simplify=True)
Exports model to ONNX.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| model | torch.nn.Module | torch.nn.Module, model to export to ONNX. | required |
| out_path | str | str, destination path for the .onnx file. | required |
| input_shape | tuple | Input shape without batch dimensions ([C,H,W]). Batch size assumed to be 1. DEPRECATED, USE input_size KWARG IN prep_model_for_conversion_kwargs INSTEAD. | None |
| pre_process | torch.nn.Module | torch.nn.Module, preprocessing pipeline, will be resolved by TransformsFactory() | None |
| post_process | torch.nn.Module | torch.nn.Module, postprocessing pipeline, will be resolved by TransformsFactory() | None |
| prep_model_for_conversion_kwargs | | dict, for SgModules - args to be passed to model.prep_model_for_conversion prior to the torch.onnx.export call. Supported keys are: input_size - shape of inputs with batch dimension, [C,H,W] for image inputs. | None |
| torch_onnx_export_kwargs | | kwargs (EXCLUDING the first 3 kwargs: model, f, args) to be unpacked in the torch.onnx.export call | None |
| simplify | bool | bool, whether to apply the onnx simplifier method, same as `python -m onnxsim onnx_path onnx_sim_path`. When True, the simplified model will be saved in out_path (default=True). | True |

Returns:

| Type | Description |
|---|---|
| | out_path |

Source code in src/super_gradients/training/models/conversion.py
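An export sketch (model choice, output path, and input size are illustrative):

```python
from super_gradients.training import models
from super_gradients.training.models.conversion import convert_to_onnx

model = models.get("resnet18", num_classes=10)
convert_to_onnx(
    model=model,
    out_path="resnet18.onnx",
    prep_model_for_conversion_kwargs={"input_size": [1, 3, 224, 224]},  # shape with batch dim
)
```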
onnx_simplify(onnx_path, onnx_sim_path)
onnx simplifier method, same as `python -m onnxsim onnx_path onnx_sim_path`
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| onnx_path | str | path to the onnx model | required |
| onnx_sim_path | str | path for the output simplified onnx model | required |

Source code in src/super_gradients/training/models/conversion.py
prepare_conversion_cfgs(cfg)
Builds the cfg (i.e conversion_params) and experiment_cfg (i.e recipe config according to cfg.experiment_name) to be used by convert_recipe_example
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| cfg | DictConfig | DictConfig, conversion_params config | required |

Returns:

| Type | Description |
|---|---|
| | cfg, experiment_cfg |

Source code in src/super_gradients/training/models/conversion.py
CSP Darknet
CSPLayer
Bases: nn.Module
CSP Bottleneck with 3 convolutions
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| in_channels | int | int, input channels. | required |
| out_channels | int | int, output channels. | required |
| num_bottlenecks | int | int, number of bottleneck conv layers. | required |
| act | Type[nn.Module] | Type[nn.Module], activation type. | required |
| shortcut | bool | bool, whether to apply shortcut (i.e. add input to result) in bottlenecks (default=True). | True |
| depthwise | bool | bool, whether to use GroupedConvBlock in the last conv in bottlenecks (default=False). | False |
| expansion | float | float, determines the number of hidden channels (default=0.5). | 0.5 |

Source code in src/super_gradients/training/models/detection_models/csp_darknet53.py
GroupedConvBlock
Bases: nn.Module
Grouped Conv KxK -> usual Conv 1x1
Source code in src/super_gradients/training/models/detection_models/csp_darknet53.py
__init__(input_channels, output_channels, kernel, stride, activation_type, padding=None, groups=None)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| groups | int | number of groups in the first conv; if None, depthwise separable conv will be used (groups = input channels) | None |

Source code in src/super_gradients/training/models/detection_models/csp_darknet53.py
SPP
Bases: BaseDetectionModule
Source code in src/super_gradients/training/models/detection_models/csp_darknet53.py
out_channels
property
Returns:

| Type | Description |
|---|---|
| | channels of tensor(s) that will be returned by the module in forward |
ViewModule
Bases: nn.Module
Returns a reshaped version of the input, to be used in non-backbone mode.
Source code in src/super_gradients/training/models/detection_models/csp_darknet53.py
CSPResNetBackbone
Bases: nn.Module, SupportsReplaceInputChannels
CSPResNet backbone
Source code in src/super_gradients/training/models/detection_models/csp_resnet.py
__init__(layers, channels, activation, return_idx, use_large_stem, width_mult, depth_mult, use_alpha, pretrained_weights=None, in_channels=3)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| layers | Tuple[int, ...] | Number of blocks in each stage | required |
| channels | Tuple[int, ...] | Number of channels [stem, stage 0, stage 1, stage 2, ...] | required |
| activation | Type[nn.Module] | Activation type used for all child modules. | required |
| return_idx | Tuple[int, int, int] | Indexes of returned feature maps | required |
| use_large_stem | bool | If True, uses 3 conv+bn+act instead of 2 in stem blocks | required |
| width_mult | float | Scaling factor for the number of channels | required |
| depth_mult | float | Scaling factor for the number of blocks in each stage | required |
| use_alpha | bool | If True, enables an additional learnable weighting parameter for the 1x1 branch in RepVGGBlock | required |
| pretrained_weights | Optional[str] | | None |
| in_channels | int | Number of input channels. Default: 3 | 3 |

Source code in src/super_gradients/training/models/detection_models/csp_resnet.py
prep_model_for_conversion(input_size=None, **kwargs)
Prepare the model to be converted to ONNX or other frameworks. Typically, this function will freeze the size of layers which is otherwise flexible, replace some modules with convertible substitutes and remove all auxiliary or training related parts.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| input_size | Union[tuple, list] | [H,W] | None |

Source code in src/super_gradients/training/models/detection_models/csp_resnet.py
CSPResNetBasicBlock
Bases: nn.Module
Source code in src/super_gradients/training/models/detection_models/csp_resnet.py
__init__(in_channels, out_channels, activation_type, use_residual_connection=True, use_alpha=False)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| in_channels | int | | required |
| out_channels | int | | required |
| activation_type | Type[nn.Module] | | required |
| use_residual_connection | bool | Whether to add the input x to the output | True |
| use_alpha | | If True, enables an additional learnable weighting parameter for the 1x1 branch in RepVGGBlock | False |

Source code in src/super_gradients/training/models/detection_models/csp_resnet.py
CSPResStage
Bases: nn.Module
Source code in src/super_gradients/training/models/detection_models/csp_resnet.py
__init__(in_channels, out_channels, num_blocks, stride, activation_type, use_attention=True, use_alpha=False)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| in_channels | int | Number of input channels | required |
| out_channels | int | Number of output channels | required |
| num_blocks | | Number of blocks in the stage | required |
| stride | int | Desired down-sampling for the stage (usually 2) | required |
| activation_type | Type[nn.Module] | Non-linearity type used in child modules. | required |
| use_attention | bool | If True, adds an EffectiveSEBlock at the end of each stage | True |
| use_alpha | bool | If True, enables an additional learnable weighting parameter for the 1x1 branch in the underlying RepVGG blocks (PP-YOLO-E Plus) | False |

Source code in src/super_gradients/training/models/detection_models/csp_resnet.py
A base for a detection network built according to the following scheme:

* constructed from nested arch_params;
* inside arch_params, each nested level (module) has an explicit type and its required parameters;
* each module accepts in_channels and other parameters;
* each module defines an out_channels property on construction.
CustomizableDetector
Bases: HasPredict, SgModule
A customizable detector with backbone -> neck -> heads Each submodule with its parameters must be defined explicitly. Modules should follow the interface of BaseDetectionModule
Source code in src/super_gradients/training/models/detection_models/customizable_detector.py
__init__(backbone, heads, neck=None, num_classes=None, bn_eps=None, bn_momentum=None, inplace_act=True, in_channels=3)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| backbone | Union[str, dict, HpmStruct, DictConfig] | Backbone configuration. | required |
| heads | Union[str, dict, HpmStruct, DictConfig] | Head configuration. | required |
| neck | Optional[Union[str, dict, HpmStruct, DictConfig]] | Neck configuration. | None |
| num_classes | int | Number of classes to predict. | None |
| bn_eps | Optional[float] | Epsilon for batch norm. | None |
| bn_momentum | Optional[float] | Momentum for batch norm. | None |
| inplace_act | Optional[bool] | If True, perform operations in-place when possible. | True |
| in_channels | int | Number of input channels. | 3 |

Source code in src/super_gradients/training/models/detection_models/customizable_detector.py
get_post_prediction_callback(*, conf, iou, nms_top_k, max_predictions, multi_label_per_box, class_agnostic_nms)
Get a post prediction callback for this model.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| conf | float | A minimum confidence threshold for predictions to be used in post-processing. | required |
| iou | float | An IoU threshold for boxes non-maximum suppression. | required |
| nms_top_k | int | The maximum number of detections to consider for NMS. | required |
| max_predictions | int | The maximum number of detections to return. | required |
| multi_label_per_box | bool | If True, each anchor can produce multiple labels of different classes. If False, each anchor can produce only one label of the class with the highest score. | required |
| class_agnostic_nms | bool | If True, perform class-agnostic NMS (i.e. IoU of boxes of different classes is checked). If False, NMS is performed separately for each class. | required |

Returns:

| Type | Description |
|---|---|
| DetectionPostPredictionCallback | |

Source code in src/super_gradients/training/models/detection_models/customizable_detector.py
predict(images, iou=None, conf=None, batch_size=32, fuse_model=True, skip_image_resizing=False, nms_top_k=None, max_predictions=None, multi_label_per_box=None, class_agnostic_nms=None, fp16=True)
Predict an image or a list of images.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
images | ImageSource | Images to predict. | required |
iou | Optional[float] | (Optional) IoU threshold for the NMS algorithm. If None, the default value associated with the training is used. | None |
conf | Optional[float] | (Optional) Predictions below the confidence threshold are discarded. If None, the default value associated with the training is used. | None |
batch_size | int | Maximum number of images to process at the same time. | 32 |
fuse_model | bool | If True, create a copy of the model, and fuse some of its layers to increase performance. This increases memory usage. | True |
skip_image_resizing | bool | If True, the image processor will not resize the images. | False |
nms_top_k | Optional[int] | (Optional) The maximum number of detections to consider for NMS. | None |
max_predictions | Optional[int] | (Optional) The maximum number of detections to return. | None |
multi_label_per_box | Optional[bool] | (Optional) If True, each anchor can produce multiple labels of different classes. If False, each anchor can produce only one label of the class with the highest score. | None |
class_agnostic_nms | Optional[bool] | (Optional) If True, perform class-agnostic NMS (i.e., IoU of boxes of different classes is checked). If False, NMS is performed separately for each class. | None |
fp16 | bool | If True, use mixed precision for inference. | True |

Source code in src/super_gradients/training/models/detection_models/customizable_detector.py, lines 266-308
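A minimal usage sketch (assuming a COCO-pretrained YoloNAS checkpoint is available through models.get; the image path and thresholds are placeholders):

```python
from super_gradients.training import models

# YoloNAS is a CustomizableDetector, so it exposes this predict() API.
model = models.get("yolo_nas_s", pretrained_weights="coco")

# Override the NMS thresholds stored with the checkpoint for this call only.
predictions = model.predict("path/to/image.jpg", conf=0.4, iou=0.6, fp16=False)
predictions.show()  # visualize boxes; use predictions.save(...) to write to disk
```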
predict_webcam(iou=None, conf=None, fuse_model=True, skip_image_resizing=False, nms_top_k=None, max_predictions=None, multi_label_per_box=None, class_agnostic_nms=None, fp16=True)
Predict using webcam.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
iou | Optional[float] | (Optional) IoU threshold for the NMS algorithm. If None, the default value associated with the training is used. | None |
conf | Optional[float] | (Optional) Predictions below the confidence threshold are discarded. If None, the default value associated with the training is used. | None |
fuse_model | bool | If True, create a copy of the model, and fuse some of its layers to increase performance. This increases memory usage. | True |
skip_image_resizing | bool | If True, the image processor will not resize the images. | False |
nms_top_k | Optional[int] | (Optional) The maximum number of detections to consider for NMS. | None |
max_predictions | Optional[int] | (Optional) The maximum number of detections to return. | None |
multi_label_per_box | Optional[bool] | (Optional) If True, each anchor can produce multiple labels of different classes. If False, each anchor can produce only one label of the class with the highest score. | None |
class_agnostic_nms | Optional[bool] | (Optional) If True, perform class-agnostic NMS (i.e., IoU of boxes of different classes is checked). If False, NMS is performed separately for each class. | None |
fp16 | bool | If True, use mixed precision for inference. | True |

Source code in src/super_gradients/training/models/detection_models/customizable_detector.py, lines 310-349
set_dataset_processing_params(class_names=None, image_processor=None, iou=None, conf=None, nms_top_k=None, max_predictions=None, multi_label_per_box=None, class_agnostic_nms=None)
Set the processing parameters for the dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
class_names | Optional[List[str]] | (Optional) Names of the dataset the model was trained on. | None |
image_processor | Optional[Processing] | (Optional) Image processing objects to reproduce the dataset preprocessing used for training. | None |
iou | Optional[float] | (Optional) IoU threshold for the NMS algorithm. | None |
conf | Optional[float] | (Optional) Predictions below the confidence threshold are discarded. | None |
nms_top_k | Optional[int] | (Optional) The maximum number of detections to consider for NMS. | None |
max_predictions | Optional[int] | (Optional) The maximum number of detections to return. | None |
multi_label_per_box | Optional[bool] | (Optional) If True, each anchor can produce multiple labels of different classes. If False, each anchor can produce only one label of the class with the highest score. | None |
class_agnostic_nms | Optional[bool] | (Optional) If True, perform class-agnostic NMS (i.e., IoU of boxes of different classes is checked). If False, NMS is performed separately for each class. | None |

Source code in src/super_gradients/training/models/detection_models/customizable_detector.py, lines 155-195
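A short sketch of how these defaults are typically attached after training or fine-tuning (`model` is an already-instantiated detector; the class names and thresholds below are illustrative placeholders):

```python
# Attach dataset metadata and default NMS settings so that subsequent
# predict() calls can run without passing them explicitly.
model.set_dataset_processing_params(
    class_names=["person", "car", "dog"],  # hypothetical 3-class dataset
    iou=0.65,
    conf=0.35,
)
```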
DarkResidualBlock
Bases: nn.Module
DarkResidualBlock - The Darknet Residual Block
Source code in src/super_gradients/training/models/detection_models/darknet53.py, lines 21-40
Darknet53
Bases: Darknet53Base
Source code in src/super_gradients/training/models/detection_models/darknet53.py, lines 81-110
forward(x)
Forward pass over the modules list. Takes the input data x and returns the output of either the backbone pass or the classification pass.

Source code in src/super_gradients/training/models/detection_models/darknet53.py, lines 104-110
ViewModule
Bases: nn.Module
Returns a reshaped version of the input; used in non-backbone mode.

Source code in src/super_gradients/training/models/detection_models/darknet53.py, lines 114-124
PPYoloEPostPredictionCallback
Bases: DetectionPostPredictionCallback
Non-Maximum Suppression (NMS) module
Source code in src/super_gradients/training/models/detection_models/pp_yolo_e/post_prediction_callback.py, lines 10-123
__init__(*, score_threshold, nms_threshold, nms_top_k, max_predictions, multi_label_per_box=True, class_agnostic_nms=False)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
score_threshold | float | Predictions confidence threshold. Predictions with score lower than score_threshold will not participate in Top-K & NMS. | required |
nms_threshold | float | IoU threshold for NMS step. | required |
nms_top_k | int | Number of predictions participating in NMS step. | required |
max_predictions | int | Maximum number of boxes to return after NMS step. | required |
multi_label_per_box | bool | Controls whether to decode multiple labels per box. True - each anchor can produce multiple labels of different classes that pass the confidence threshold check (default). False - each anchor can produce only one label of the class with the highest score. | True |
class_agnostic_nms | bool | If True, NMS is performed on all classes together; if False, NMS is performed for each class separately. | False |

Source code in src/super_gradients/training/models/detection_models/pp_yolo_e/post_prediction_callback.py, lines 13-40
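A minimal sketch of building and applying the callback (the thresholds are illustrative; `raw_outputs` stands for whatever the model's forward() returned):

```python
from super_gradients.training.models.detection_models.pp_yolo_e.post_prediction_callback import (
    PPYoloEPostPredictionCallback,
)

callback = PPYoloEPostPredictionCallback(
    score_threshold=0.25,   # drop low-confidence predictions before Top-K & NMS
    nms_threshold=0.65,     # IoU threshold for the NMS step
    nms_top_k=1000,         # predictions participating in NMS
    max_predictions=300,    # boxes kept after NMS
)

# Each element of `detections` is a [Ni, 6] tensor: [x1, y1, x2, y2, confidence, class].
detections = callback(raw_outputs)
```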
forward(outputs, device=None)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
outputs | Any | Outputs of model's forward() method. | required |
device | str | (Deprecated) No longer used; exists only to keep the same interface as the parent class. Will be removed in SG 3.7.0. | None |

Returns:
Type | Description |
---|---|
List[List[Tensor]] | List of lists of tensors of shape [Ni, 6], where Ni is the number of detections in the i-th image. Each row has the format [x1, y1, x2, y2, confidence, class]. |

Source code in src/super_gradients/training/models/detection_models/pp_yolo_e/post_prediction_callback.py, lines 42-98
PPYoloE
Bases: SgModule, ExportableObjectDetectionModel, HasPredict, SupportsInputShapeCheck

Source code in src/super_gradients/training/models/detection_models/pp_yolo_e/pp_yolo_e.py, lines 102-403
get_post_prediction_callback(*, conf, iou, nms_top_k, max_predictions, multi_label_per_box, class_agnostic_nms)
Get a post prediction callback for this model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
conf | float | A minimum confidence threshold for predictions to be used in post-processing. | required |
iou | float | An IoU threshold for boxes non-maximum suppression. | required |
nms_top_k | int | The maximum number of detections to consider for NMS. | required |
max_predictions | int | The maximum number of detections to return. | required |
multi_label_per_box | bool | If True, each anchor can produce multiple labels of different classes. If False, each anchor can produce only one label of the class with the highest score. | required |
class_agnostic_nms | bool | If True, perform class-agnostic NMS (i.e., IoU of boxes of different classes is checked). If False, NMS is performed separately for each class. | required |

Returns:
Type | Description |
---|---|
PPYoloEPostPredictionCallback | |

Source code in src/super_gradients/training/models/detection_models/pp_yolo_e/pp_yolo_e.py, lines 122-145
predict(images, iou=None, conf=None, batch_size=32, fuse_model=True, skip_image_resizing=False, nms_top_k=None, max_predictions=None, multi_label_per_box=None, class_agnostic_nms=None, fp16=True)
Predict an image or a list of images.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
images | ImageSource | Images to predict. | required |
iou | Optional[float] | (Optional) IoU threshold for the NMS algorithm. If None, the default value associated with the training is used. | None |
conf | Optional[float] | (Optional) Predictions below the confidence threshold are discarded. If None, the default value associated with the training is used. | None |
batch_size | int | Maximum number of images to process at the same time. | 32 |
fuse_model | bool | If True, create a copy of the model, and fuse some of its layers to increase performance. This increases memory usage. | True |
skip_image_resizing | bool | If True, the image processor will not resize the images. | False |
nms_top_k | Optional[int] | (Optional) The maximum number of detections to consider for NMS. | None |
max_predictions | Optional[int] | (Optional) The maximum number of detections to return. | None |
multi_label_per_box | Optional[bool] | (Optional) If True, each anchor can produce multiple labels of different classes. If False, each anchor can produce only one label of the class with the highest score. | None |
class_agnostic_nms | Optional[bool] | (Optional) If True, perform class-agnostic NMS (i.e., IoU of boxes of different classes is checked). If False, NMS is performed separately for each class. | None |
fp16 | bool | If True, the model will use mixed precision for inference. | True |

Source code in src/super_gradients/training/models/detection_models/pp_yolo_e/pp_yolo_e.py, lines 264-306
predict_webcam(iou=None, conf=None, fuse_model=True, skip_image_resizing=False, nms_top_k=None, max_predictions=None, multi_label_per_box=None, class_agnostic_nms=None, fp16=True)
Predict using webcam.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
iou | Optional[float] | (Optional) IoU threshold for the NMS algorithm. If None, the default value associated with the training is used. | None |
conf | Optional[float] | (Optional) Predictions below the confidence threshold are discarded. If None, the default value associated with the training is used. | None |
fuse_model | bool | If True, create a copy of the model, and fuse some of its layers to increase performance. This increases memory usage. | True |
skip_image_resizing | bool | If True, the image processor will not resize the images. | False |
nms_top_k | Optional[int] | (Optional) The maximum number of detections to consider for NMS. | None |
max_predictions | Optional[int] | (Optional) The maximum number of detections to return. | None |
multi_label_per_box | Optional[bool] | (Optional) If True, each anchor can produce multiple labels of different classes. If False, each anchor can produce only one label of the class with the highest score. | None |
class_agnostic_nms | Optional[bool] | (Optional) If True, perform class-agnostic NMS (i.e., IoU of boxes of different classes is checked). If False, NMS is performed separately for each class. | None |
fp16 | bool | If True, use mixed precision for inference. | True |

Source code in src/super_gradients/training/models/detection_models/pp_yolo_e/pp_yolo_e.py, lines 308-346
prep_model_for_conversion(input_size=None, **kwargs)
Prepare the model to be converted to ONNX or other frameworks. Typically, this function will freeze the size of layers which is otherwise flexible, replace some modules with convertible substitutes and remove all auxiliary or training related parts.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_size | Union[tuple, list] | [H, W] | None |

Source code in src/super_gradients/training/models/detection_models/pp_yolo_e/pp_yolo_e.py, lines 358-376
set_dataset_processing_params(class_names=None, image_processor=None, iou=None, conf=None, nms_top_k=None, max_predictions=None, multi_label_per_box=None, class_agnostic_nms=None)
Set the processing parameters for the dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
class_names | Optional[List[str]] | (Optional) Names of the dataset the model was trained on. | None |
image_processor | Optional[Processing] | (Optional) Image processing objects to reproduce the dataset preprocessing used for training. | None |
iou | Optional[float] | (Optional) IoU threshold for the NMS algorithm. | None |
conf | Optional[float] | (Optional) Predictions below the confidence threshold are discarded. | None |
nms_top_k | Optional[int] | (Optional) The maximum number of detections to consider for NMS. | None |
max_predictions | Optional[int] | (Optional) The maximum number of detections to return. | None |
multi_label_per_box | Optional[bool] | (Optional) If True, each anchor can produce multiple labels of different classes. If False, each anchor can produce only one label of the class with the highest score. | None |
class_agnostic_nms | Optional[bool] | (Optional) If True, perform class-agnostic NMS (i.e., IoU of boxes of different classes is checked). If False, NMS is performed separately for each class. | None |

Source code in src/super_gradients/training/models/detection_models/pp_yolo_e/pp_yolo_e.py, lines 157-197
PPYoloEDecodingModule
Bases: AbstractObjectDetectionDecodingModule
Decoding module for PPYoloE model. This module used only to export model to ONNX/TensorRT and is not used during training.
Takes in the output of the model and returns the decoded boxes in the format Tuple[Tensor, Tensor]:

* boxes [batch_size, number_boxes, 4], boxes are in format (x1, y1, x2, y2)
* scores [batch_size, number_boxes, number_classes]

Source code in src/super_gradients/training/models/detection_models/pp_yolo_e/pp_yolo_e.py, lines 31-99
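A sketch of consuming the decoded output, e.g. when validating an export graph (shapes follow the description above; `decoding_module` and `model_output` are placeholders for an instantiated module and the model's raw forward output):

```python
import torch

# boxes: [batch_size, number_boxes, 4] in (x1, y1, x2, y2) format
# scores: [batch_size, number_boxes, number_classes]
boxes, scores = decoding_module(model_output)

# Per-box best class and its confidence, prior to NMS.
confidences, labels = scores.max(dim=-1)
print(boxes.shape, confidences.shape, labels.shape)
```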
__init__(num_pre_nms_predictions=1000)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
num_pre_nms_predictions | int | Number of predictions to keep before NMS. This is mainly to reject low-confidence predictions and thus reduce the number of boxes to process in NMS. | 1000 |

Source code in src/super_gradients/training/models/detection_models/pp_yolo_e/pp_yolo_e.py, lines 42-52
forward(inputs)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
inputs | Tuple[Tuple[Tensor, Tensor], Tuple[Tensor, ...]] | Tuple [Tensor, Tensor]: boxes [B, N, 4] in format (x1, y1, x2, y2); scores [B, N, C]. | required |

Returns:
Type | Description |
---|---|
Tuple[Tensor, Tensor] | |

Source code in src/super_gradients/training/models/detection_models/pp_yolo_e/pp_yolo_e.py, lines 57-85
infer_total_number_of_predictions(predictions)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
predictions | | | required |

Returns:
Type | Description |
---|---|
int | |

Source code in src/super_gradients/training/models/detection_models/pp_yolo_e/pp_yolo_e.py, lines 87-99
PPYOLOEHead
Bases: nn.Module
Source code in src/super_gradients/training/models/detection_models/pp_yolo_e/pp_yolo_head.py, lines 95-296
__init__(num_classes, in_channels, activation=nn.SiLU, fpn_strides=(32, 16, 8), grid_cell_scale=5.0, grid_cell_offset=0.5, reg_max=16, eval_size=None, width_mult=1.0)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
num_classes | int | Number of classes. | required |
in_channels | Tuple[int, int, int] | Number of channels for each feature map (See width_mult) | required |
activation | Type[nn.Module] | Type of the activation used in module | nn.SiLU |
fpn_strides | Tuple[int, int, int] | Output strides of the feature maps from the neck | (32, 16, 8) |
grid_cell_scale | float | | 5.0 |
grid_cell_offset | float | | 0.5 |
reg_max | int | | 16 |
eval_size | Tuple[int, int] | (rows, cols) Size of the image for evaluation. Setting this value can be beneficial for inference speed, since anchors will not be regenerated for each forward call. | None |
width_mult | float | A scaling factor applied to in_channels. | 1.0 |

Source code in src/super_gradients/training/models/detection_models/pp_yolo_e/pp_yolo_head.py, lines 96-153
bias_init_with_prob(prior_prob=0.01)
Initialize conv/fc bias value according to a given probability value.

Source code in src/super_gradients/training/models/detection_models/pp_yolo_e/pp_yolo_head.py, lines 15-18
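This is most likely the standard focal-loss prior initialization (as used in RetinaNet); a minimal sketch of that formula, stated as an assumption rather than the verified implementation:

```python
import math

def bias_init_with_prob_sketch(prior_prob: float = 0.01) -> float:
    # Choose bias b so that sigmoid(b) == prior_prob, i.e. the head starts
    # out predicting the foreground class with probability prior_prob.
    return float(-math.log((1.0 - prior_prob) / prior_prob))

print(bias_init_with_prob_sketch(0.01))  # ~ -4.595
```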
generate_anchors_for_grid_cell(feats, fpn_strides, grid_cell_size=5.0, grid_cell_offset=0.5, dtype=torch.float)
Like ATSS, generate anchors based on grid size.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
feats | Tuple[Tensor, ...] | shape[s, (b, c, h, w)] | required |
fpn_strides | Tuple[int, ...] | shape[s], stride for each scale feature | required |
grid_cell_size | float | anchor size | 5.0 |
grid_cell_offset | float | The range is between 0 and 1. | 0.5 |
dtype | torch.dtype | Type of the anchors. | torch.float |

Returns:
Type | Description |
---|---|
Tuple[Tensor, Tensor, List[int], Tensor] | |

Source code in src/super_gradients/training/models/detection_models/pp_yolo_e/pp_yolo_head.py, lines 21-76
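A hedged usage sketch with dummy feature maps (shapes assume a 640x640 input and the (32, 16, 8) stride order used by PPYOLOEHead; the exact semantics of the four returned tensors should be checked against the source):

```python
import torch
from super_gradients.training.models.detection_models.pp_yolo_e.pp_yolo_head import (
    generate_anchors_for_grid_cell,
)

# One dummy feature map per FPN level: (b, c, h, w).
feats = (
    torch.zeros(1, 256, 20, 20),  # stride 32
    torch.zeros(1, 128, 40, 40),  # stride 16
    torch.zeros(1, 64, 80, 80),   # stride 8
)
anchors, anchor_points, num_anchors_list, stride_tensor = generate_anchors_for_grid_cell(
    feats, fpn_strides=(32, 16, 8), grid_cell_size=5.0, grid_cell_offset=0.5
)
print(num_anchors_list)  # anchors per level, e.g. [400, 1600, 6400]
```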
AbstractYoloBackbone
Bases: SupportsReplaceInputChannels
Source code in src/super_gradients/training/models/detection_models/yolo_base.py, lines 294-313
forward(x)
Source code in src/super_gradients/training/models/detection_models/yolo_base.py, lines 300-313
Concat
Bases: nn.Module
Concatenate a list of tensors along a dimension.

Source code in src/super_gradients/training/models/detection_models/yolo_base.py, lines 169-177
DetectX
Bases: nn.Module
Source code in src/super_gradients/training/models/detection_models/yolo_base.py, lines 180-291
__init__(num_classes, stride, activation_func_type, channels, depthwise=False, groups=None, inter_channels=None)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
stride | np.ndarray | Strides of each predicting level. | required |
channels | list | Input channels into all detecting layers (from all neck layers that will be used for predicting). | required |
depthwise | bool | Defines conv type in classification and regression branches (Conv or GroupedConvBlock); depthwise is False by default in favor of a usual Conv. | False |
groups | int | Number of groups in convs in classification and regression branches; if None, default groups will be used according to conv type (1 for Conv and depthwise for GroupedConvBlock). | None |
inter_channels | Union[int, List] | Channels in classification and regression branches; if None, channels[0] will be used by default. | None |

Source code in src/super_gradients/training/models/detection_models/yolo_base.py, lines 181-250
YoloBase
Bases: SgModule, ExportableObjectDetectionModel, HasPredict, SupportsInputShapeCheck

Source code in src/super_gradients/training/models/detection_models/yolo_base.py, lines 470-764
predict(images, iou=None, conf=None, batch_size=32, fuse_model=True, skip_image_resizing=False, fp16=True)
Predict an image or a list of images.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
images | ImageSource | Images to predict. | required |
iou | Optional[float] | (Optional) IoU threshold for the NMS algorithm. If None, the default value associated with the training is used. | None |
conf | Optional[float] | (Optional) Predictions below the confidence threshold are discarded. If None, the default value associated with the training is used. | None |
batch_size | int | Maximum number of images to process at the same time. | 32 |
fuse_model | bool | If True, create a copy of the model, and fuse some of its layers to increase performance. This increases memory usage. | True |
skip_image_resizing | bool | If True, the image processor will not resize the images. | False |
fp16 | bool | If True, use mixed precision for inference. | True |

Source code in src/super_gradients/training/models/detection_models/yolo_base.py, lines 577-599
predict_webcam(iou=None, conf=None, fuse_model=True, skip_image_resizing=False, fp16=True)
Predict using webcam.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
iou | Optional[float] | (Optional) IoU threshold for the NMS algorithm. If None, the default value associated with the training is used. | None |
conf | Optional[float] | (Optional) Predictions below the confidence threshold are discarded. If None, the default value associated with the training is used. | None |
fuse_model | bool | If True, create a copy of the model, and fuse some of its layers to increase performance. This increases memory usage. | True |
skip_image_resizing | bool | If True, the image processor will not resize the images. | False |
fp16 | bool | If True, use mixed precision for inference. | True |

Source code in src/super_gradients/training/models/detection_models/yolo_base.py, lines 601-612
prep_model_for_conversion(input_size=None, **kwargs)
A method for preparing the Yolo model for conversion to other frameworks (ONNX, CoreML, etc.)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_size | Union[tuple, list] | Expected input size. | None |

Source code in src/super_gradients/training/models/detection_models/yolo_base.py, lines 688-708
set_dataset_processing_params(class_names=None, image_processor=None, iou=None, conf=None)
Set the processing parameters for the dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
class_names | Optional[List[str]] | (Optional) Names of the dataset the model was trained on. | None |
image_processor | Optional[Processing] | (Optional) Image processing objects to reproduce the dataset preprocessing used for training. | None |
iou | Optional[float] | (Optional) IoU threshold for the NMS algorithm. | None |
conf | Optional[float] | (Optional) Predictions below the confidence threshold are discarded. | None |

Source code in src/super_gradients/training/models/detection_models/yolo_base.py, lines 518-536
YoloDarknetBackbone
Bases: AbstractYoloBackbone, CSPDarknet53
Implements the CSP_Darknet53 module and inherits the forward pass to extract layers indicated in arch_params.

Source code in src/super_gradients/training/models/detection_models/yolo_base.py, lines 316-331
YoloHead
Bases: nn.Module
Source code in src/super_gradients/training/models/detection_models/yolo_base.py, lines 373-467
forward(intermediate_output)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
intermediate_output | | A list of the intermediate predictions of the layers specified in self._inter_layer_idx_to_extract from the backbone. | required |

Source code in src/super_gradients/training/models/detection_models/yolo_base.py, lines 432-467
YoloRegnetBackbone
Bases: AbstractYoloBackbone, AnyNetX
Implements the Regnet module and inherits the forward pass to extract layers indicated in arch_params

Source code in src/super_gradients/training/models/detection_models/yolo_base.py, lines 334-370
add_spp_to_stage(anynetx_stage, spp_kernels, activation_type)
staticmethod
Add SPP at the end of an AnyNetX stage.

Source code in src/super_gradients/training/models/detection_models/yolo_base.py, lines 359-367
YoloXDecodingModule
Bases: AbstractObjectDetectionDecodingModule
Source code in src/super_gradients/training/models/detection_models/yolo_base.py, lines 767-815
infer_total_number_of_predictions(predictions)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
predictions | | | required |

Returns:
Type | Description |
---|---|
int | |

Source code in src/super_gradients/training/models/detection_models/yolo_base.py, lines 805-815
YoloXPostPredictionCallback
Bases: DetectionPostPredictionCallback
Post-prediction callback to decode YoloX model's output and apply Non-Maximum Suppression (NMS) to get the final predictions.
Source code in src/super_gradients/training/models/detection_models/yolo_base.py, lines 74-143
__init__(conf=0.001, iou=0.6, classes=None, nms_type=NMS_Type.ITERATIVE, max_predictions=300, with_confidence=True, class_agnostic_nms=False, multi_label_per_box=True)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
conf | float | Confidence threshold. | 0.001 |
iou | float | IoU threshold (used in NMS_Type.ITERATIVE). | 0.6 |
classes | List[int] | (Optional) List of classes to filter by (used in NMS_Type.ITERATIVE). | None |
nms_type | NMS_Type | The type of NMS to use (iterative or matrix). | NMS_Type.ITERATIVE |
max_predictions | int | Maximum number of boxes to output (used in NMS_Type.MATRIX). | 300 |
with_confidence | bool | In NMS, whether to multiply the objectness score with the class score (used in NMS_Type.ITERATIVE). | True |
class_agnostic_nms | bool | Indicates how boxes of different classes will be treated during the NMS step (used in NMS_Type.ITERATIVE and NMS_Type.MATRIX). True - NMS will be performed on all classes together. False - NMS will be performed on each class separately (default). | False |
multi_label_per_box | bool | Controls whether to decode multiple labels per box (used in NMS_Type.ITERATIVE). True - each anchor can produce multiple labels of different classes that pass the confidence threshold check (default). False - each anchor can produce only one label of the class with the highest score. | True |

Source code in src/super_gradients/training/models/detection_models/yolo_base.py, lines 79-115
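A minimal construction sketch (assuming NMS_Type is importable from super_gradients.training.utils.detection_utils, which is where SG keeps its detection utilities; verify the import path against your SG version):

```python
from super_gradients.training.models.detection_models.yolo_base import YoloXPostPredictionCallback
from super_gradients.training.utils.detection_utils import NMS_Type

callback = YoloXPostPredictionCallback(
    conf=0.25,                    # discard low-confidence predictions
    iou=0.6,                      # IoU threshold for iterative NMS
    nms_type=NMS_Type.ITERATIVE,
    max_predictions=300,
)
```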
forward(x, device=None)
Apply NMS to the raw output of the model and keep only the top max_predictions results.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Union[torch.Tensor, Tuple[torch.Tensor, List[torch.Tensor]]] | Raw output of the model, with x[0] expected to be a list of tensors whose rows have the format (cx, cy, w, h, confidence, cls0, cls1, ...). | required |

Returns:
Type | Description |
---|---|
 | List of tensors whose rows have the format (x1, y1, x2, y2, conf, cls). |

Source code in src/super_gradients/training/models/detection_models/yolo_base.py, lines 117-139
NDFLHeads
Bases: BaseDetectionModule, SupportsReplaceNumClasses

Source code in src/super_gradients/training/models/detection_models/yolo_nas/dfl_heads.py, lines 112-277
__init__(num_classes, in_channels, heads_list, grid_cell_scale=5.0, grid_cell_offset=0.5, reg_max=16, eval_size=None, width_mult=1.0)
Initializes the NDFLHeads module.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
num_classes | int | Number of detection classes | required |
in_channels | Tuple[int, int, int] | Number of channels for each feature map (See width_mult) | required |
grid_cell_scale | float | | 5.0 |
grid_cell_offset | float | | 0.5 |
reg_max | int | Number of bins in the regression head | 16 |
eval_size | Optional[Tuple[int, int]] | (rows, cols) Size of the image for evaluation. Setting this value can be beneficial for inference speed, since anchors will not be regenerated for each forward call. | None |
width_mult | float | A scaling factor applied to in_channels. | 1.0 |

Source code in src/super_gradients/training/models/detection_models/yolo_nas/dfl_heads.py, lines 114-163
YoloNASDFLHead
Bases: BaseDetectionModule, SupportsReplaceNumClasses

Source code in src/super_gradients/training/models/detection_models/yolo_nas/dfl_heads.py, lines 20-109
__init__(in_channels, inter_channels, width_mult, first_conv_group_size, num_classes, stride, reg_max, cls_dropout_rate=0.0, reg_dropout_rate=0.0)
Initialize the YoloNASDFLHead
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in_channels | int | Input channels | required |
inter_channels | int | Intermediate number of channels | required |
width_mult | float | Width multiplier | required |
first_conv_group_size | int | Group size | required |
num_classes | int | Number of detection classes | required |
stride | int | Output stride for this head | required |
reg_max | int | Number of bins in the regression head | required |
cls_dropout_rate | float | Dropout rate for the classification head | 0.0 |
reg_dropout_rate | float | Dropout rate for the regression head | 0.0 |

Source code in src/super_gradients/training/models/detection_models/yolo_nas/dfl_heads.py, lines 22-75
YoloNASPANNeckWithC2
Bases: BaseDetectionModule
A PAN (path aggregation network) neck with 4 stages (2 up-sampling and 2 down-sampling stages), where the up-sampling stages include a higher-resolution skip. Returns outputs of neck stage 2, stage 3, stage 4.

Source code in src/super_gradients/training/models/detection_models/yolo_nas/panneck.py, lines 12-64
__init__(in_channels, neck1, neck2, neck3, neck4)
Initialize the PAN neck
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in_channels | List[int] | Input channels of the 4 feature maps from the backbone | required |
neck1 | Union[str, HpmStruct, DictConfig] | First neck stage config | required |
neck2 | Union[str, HpmStruct, DictConfig] | Second neck stage config | required |
neck3 | Union[str, HpmStruct, DictConfig] | Third neck stage config | required |
neck4 | Union[str, HpmStruct, DictConfig] | Fourth neck stage config | required |

Source code in src/super_gradients/training/models/detection_models/yolo_nas/panneck.py, lines 20-50
YoloNAS
Bases: ExportableObjectDetectionModel, SupportsInputShapeCheck, CustomizableDetector
Export to ONNX/TRT support matrix. ONNX files were generated with PyTorch 2.0.1 for ONNX opset_version=14.
Batch Size | Export Engine | Format | OnnxRuntime 1.13.1 | TensorRT 8.4.2 | TensorRT 8.5.3 | TensorRT 8.6.1 |
---|---|---|---|---|---|---|
1 | ONNX | Flat | Yes | Yes | Yes | Yes |
>1 | ONNX | Flat | Yes | No | No | No |
1 | ONNX | Batch | Yes | No | Yes | Yes |
>1 | ONNX | Batch | Yes | No | No | Yes |
1 | TensorRT | Flat | No | No | Yes | Yes |
>1 | TensorRT | Flat | No | No | Yes | Yes |
1 | TensorRT | Batch | No | Yes | Yes | Yes |
>1 | TensorRT | Batch | No | Yes | Yes | Yes |
Source code in src/super_gradients/training/models/detection_models/yolo_nas/yolo_nas_variants.py, lines 75-146
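A minimal export sketch (model.export comes from ExportableObjectDetectionModel; the output filename is a placeholder, and the exact keyword options should be checked against your SG version):

```python
from super_gradients.training import models

model = models.get("yolo_nas_s", pretrained_weights="coco")

# Export an ONNX graph; consult the support matrix above for which
# batch size / engine / output format combinations are expected to work.
model.export("yolo_nas_s.onnx")
```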
get_post_prediction_callback(*, conf, iou, nms_top_k, max_predictions, multi_label_per_box, class_agnostic_nms)
Get a post prediction callback for this model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
conf | float | A minimum confidence threshold for predictions to be used in post-processing. | required |
iou | float | An IoU threshold for boxes non-maximum suppression. | required |
nms_top_k | int | The maximum number of detections to consider for NMS. | required |
max_predictions | int | The maximum number of detections to return. | required |
multi_label_per_box | bool | If True, each anchor can produce multiple labels of different classes. If False, each anchor can produce only one label of the class with the highest score. | required |
class_agnostic_nms | bool | If True, perform class-agnostic NMS (i.e., IoU of boxes of different classes is checked). If False, NMS is performed separately for each class. | required |

Returns:
Type | Description |
---|---|
PPYoloEPostPredictionCallback | |

Source code in src/super_gradients/training/models/detection_models/yolo_nas/yolo_nas_variants.py, lines 107-130
YoloNASDecodingModule
Bases: AbstractObjectDetectionDecodingModule
Source code in src/super_gradients/training/models/detection_models/yolo_nas/yolo_nas_variants.py, lines 26-72
infer_total_number_of_predictions(predictions)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
predictions | | | required |

Returns:
Type | Description |
---|---|
int | |

Source code in src/super_gradients/training/models/detection_models/yolo_nas/yolo_nas_variants.py, lines 36-48
SequentialWithIntermediates
Bases: nn.Sequential
A Sequential module that can return all intermediate values as a list of Tensors
Source code in src/super_gradients/training/models/detection_models/yolo_nas/yolo_stages.py, lines 66-82
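An illustrative re-implementation of the idea (not the SG source): a Sequential whose forward pass can also return every intermediate activation as a list of tensors.

```python
import torch
from torch import nn

class SequentialWithIntermediatesSketch(nn.Sequential):
    def forward_with_intermediates(self, x: torch.Tensor) -> list:
        # Run children in order, recording the output of each step.
        outputs = [x]
        for module in self:
            outputs.append(module(outputs[-1]))
        return outputs

seq = SequentialWithIntermediatesSketch(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 4))
intermediates = seq.forward_with_intermediates(torch.randn(2, 8))
print([t.shape for t in intermediates])
```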
YoloNASBottleneck
Bases: nn.Module
A bottleneck block for YoloNAS. Consists of two consecutive blocks and optional residual connection.
Source code in src/super_gradients/training/models/detection_models/yolo_nas/yolo_stages.py, lines 23-63
__init__(input_channels, output_channels, block_type, activation_type, shortcut, use_alpha, drop_path_rate=0.0)
Initialize the YoloNASBottleneck block
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_channels | int | Number of input channels | required |
output_channels | int | Number of output channels | required |
block_type | Type[nn.Module] | Type of the convolutional block | required |
activation_type | Type[nn.Module] | Activation type for the convolutional block | required |
shortcut | bool | If True, adds the residual connection from input to output. | required |
use_alpha | bool | If True, adds the learnable alpha parameter (multiplier for the residual connection). | required |
drop_path_rate | float | Drop path rate for the residual path of the block | 0.0 |

Source code in src/super_gradients/training/models/detection_models/yolo_nas/yolo_stages.py, lines 28-59
YoloNASCSPLayer
Bases: nn.Module
Cross-stage layer module for YoloNAS.
Source code in src/super_gradients/training/models/detection_models/yolo_nas/yolo_stages.py, lines 85-150
__init__(in_channels, out_channels, num_bottlenecks, block_type, activation_type, shortcut=True, use_alpha=True, expansion=0.5, hidden_channels=None, concat_intermediates=False, drop_path_rates=None, dropout_rate=0.0)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in_channels | int | Number of input channels. | required |
out_channels | int | Number of output channels. | required |
num_bottlenecks | int | Number of bottleneck blocks. | required |
block_type | Type[nn.Module] | Bottleneck block type. | required |
activation_type | Type[nn.Module] | Activation type for all blocks. | required |
shortcut | bool | If True, adds the residual connection from input to output. | True |
use_alpha | bool | If True, adds the learnable alpha parameter (multiplier for the residual connection). | True |
expansion | float | If hidden_channels is None, hidden_channels is set to in_channels * expansion. | 0.5 |
hidden_channels | int | If not None, sets the number of hidden channels used inside the bottleneck blocks. | None |
concat_intermediates | bool | | False |
drop_path_rates | Union[Iterable[float], None] | List of drop path probabilities for each bottleneck block. Must have length equal to num_bottlenecks, or be None. | None |
dropout_rate | float | Dropout probability before the last convolution in this layer. | 0.0 |

Source code in src/super_gradients/training/models/detection_models/yolo_nas/yolo_stages.py, lines 90-142
YoloNASDownStage
Bases: BaseDetectionModule
Source code in src/super_gradients/training/models/detection_models/yolo_nas/yolo_stages.py, lines 335-395
__init__(in_channels, out_channels, width_mult, num_blocks, depth_mult, activation_type, hidden_channels=None, concat_intermediates=False, drop_path_rates=None, dropout_rate=0.0)
Initializes a YoloNASDownStage.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in_channels | List[int] | Number of input channels. | required |
out_channels | int | Number of output channels. | required |
width_mult | float | Multiplier for the number of channels in the stage. | required |
num_blocks | int | Number of blocks in the stage. | required |
depth_mult | float | Multiplier for the number of blocks in the stage. | required |
activation_type | Type[nn.Module] | Type of activation to use inside the blocks. | required |
hidden_channels | int | If not None, sets the number of hidden channels used inside the bottleneck blocks. | None |
concat_intermediates | bool | | False |

Source code in src/super_gradients/training/models/detection_models/yolo_nas/yolo_stages.py, lines 337-384
YoloNASStage
Bases: BaseDetectionModule
A single stage module for YoloNAS. It consists of a downsample block (QARepVGGBlock) followed by YoloNASCSPLayer.
Source code in src/super_gradients/training/models/detection_models/yolo_nas/yolo_stages.py, lines 183-235
__init__(in_channels, out_channels, num_blocks, activation_type, hidden_channels=None, concat_intermediates=False, drop_path_rates=None, dropout_rate=0.0, stride=2)
Initialize the YoloNASStage module
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in_channels | int | Number of input channels | required |
out_channels | int | Number of output channels | required |
num_blocks | int | Number of bottleneck blocks in the YoloNASCSPLayer | required |
activation_type | Type[nn.Module] | Activation type for all blocks | required |
hidden_channels | int | If not None, sets the number of hidden channels used inside the bottleneck blocks. | None |
concat_intermediates | bool | If True, concatenates the intermediate values from the YoloNASCSPLayer. | False |
drop_path_rates | Union[Iterable[float], None] | List of drop path probabilities for each bottleneck block. Must have length equal to num_blocks, or be None. | None |
dropout_rate | float | Dropout probability before the last convolution in this layer. | 0.0 |

Source code in src/super_gradients/training/models/detection_models/yolo_nas/yolo_stages.py, lines 189-228
YoloNASStem
Bases: BaseDetectionModule, SupportsReplaceInputChannels
Stem module for YoloNAS. Consists of a single QARepVGGBlock with stride of two.

Source code in src/super_gradients/training/models/detection_models/yolo_nas/yolo_stages.py, lines 153-180
__init__(in_channels, out_channels, stride=2)
Initialize the YoloNASStem module
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in_channels | int | Number of input channels | required |
out_channels | int | Number of output channels | required |

Source code in src/super_gradients/training/models/detection_models/yolo_nas/yolo_stages.py, lines 159-167
YoloNASUpStage
Bases: BaseDetectionModule
Upsampling stage for YoloNAS.
Source code in src/super_gradients/training/models/detection_models/yolo_nas/yolo_stages.py, lines 238-332
__init__(in_channels, out_channels, width_mult, num_blocks, depth_mult, activation_type, hidden_channels=None, concat_intermediates=False, reduce_channels=False, drop_path_rates=None, dropout_rate=0.0, upsample_mode=UpsampleMode.CONV_TRANSPOSE)
Initialize the YoloNASUpStage module
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in_channels | List[int] | Number of input channels | required |
out_channels | int | Number of output channels | required |
width_mult | float | Multiplier for the number of channels in the stage. | required |
num_blocks | int | Number of bottleneck blocks | required |
depth_mult | float | Multiplier for the number of blocks in the stage. | required |
activation_type | Type[nn.Module] | Activation type for all blocks | required |
hidden_channels | int | If not None, sets the number of hidden channels used inside the bottleneck blocks | None |
concat_intermediates | bool | | False |
reduce_channels | bool | | False |

Source code in src/super_gradients/training/models/detection_models/yolo_nas/yolo_stages.py, lines 244-313
KDModule
Bases: SgModule
KDModule class implementing Knowledge Distillation logic as an SgModule.
Attributes:
* student: SgModule - the student model
* teacher: torch.nn.Module - the teacher model
* run_teacher_on_eval: bool - whether to run self.teacher in eval mode regardless of self.train(mode)
* arch_params: HpmStruct - architecture hyperparameters
Additionally, by passing teacher_input_adapter (torch.nn.Module) one can modify the teacher net to act as if teacher = torch.nn.Sequential(teacher_input_adapter, teacher). This is useful when the teacher net expects a different input format than the student (for example, different normalization). An equivalent argument for the student model can be passed through student_input_adapter. A sketch of this adapter idea follows below.

Source code in src/super_gradients/training/models/kd_modules/kd_module.py, lines 16-95
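An illustrative sketch of the input-adapter pattern described above (the normalization values are placeholders; this mimics what KDModule does internally when teacher_input_adapter is passed):

```python
import torch
from torch import nn

class TeacherInputAdapter(nn.Module):
    """Re-normalize student-style inputs to what the teacher expects."""

    def __init__(self, mean: float = 0.5, std: float = 0.5):
        super().__init__()
        self.mean = mean
        self.std = std

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return (x - self.mean) / self.std

teacher = nn.Linear(8, 4)  # stand-in for a real teacher network
# Equivalent to passing teacher_input_adapter=TeacherInputAdapter() to KDModule:
adapted_teacher = nn.Sequential(TeacherInputAdapter(), teacher)
print(adapted_teacher(torch.rand(2, 8)).shape)
```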
get(model_name, arch_params=None, num_classes=None, strict_load=StrictLoad.NO_KEY_MATCHING, checkpoint_path=None, pretrained_weights=None, load_backbone=False, download_required_code=True, checkpoint_num_classes=None, num_input_channels=None)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_name | str | Defines the model's architecture from models/ALL_ARCHITECTURES | required |
arch_params | Optional[dict] | Architecture hyper parameters. e.g.: block, num_blocks, etc. | None |
num_classes | Optional[int] | Number of classes (defines the net's structure). If None is given, will try to derive it from the pretrained weights' corresponding dataset. | None |
strict_load | Union[str, StrictLoad] | See super_gradients.common.data_types.enum.strict_load.StrictLoad class documentation for details (default=NO_KEY_MATCHING to support SG trained checkpoints). | StrictLoad.NO_KEY_MATCHING |
checkpoint_path | Optional[str] | The path to the external checkpoint to be loaded. Can be an absolute or relative path (e.g. path/to/checkpoint.pth) or a URL. If provided, will automatically attempt to load the checkpoint. | None |
pretrained_weights | Optional[str] | Describes the dataset of the pretrained weights (for example "imagenet"). | None |
load_backbone | bool | Load the provided checkpoint to model.backbone instead of model. | False |
download_required_code | bool | If the model is not found in SG and is downloaded from a remote client, overriding this parameter with False will prevent additional code from being downloaded. This affects only models from a remote client. | True |
checkpoint_num_classes | Optional[int] | num_classes of checkpoint_path / pretrained_weights, when checkpoint_path is not None. Used when num_classes != checkpoint_num_classes. In this case, the module will be initialized with checkpoint_num_classes, then the weights will be loaded. Finally, replace_head(new_num_classes=num_classes) is called (useful for transfer learning from a checkpoint outside of the ones offered in the SG model zoo). | None |
num_input_channels | Optional[int] | Number of input channels. If None, use the default model's input channels (most likely 3). NOTE: Passing both pretrained_weights and checkpoint_path is ill-defined and will raise an error. | None |

Source code in src/super_gradients/training/models/model_factory.py, lines 191-256
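A typical call (model and dataset names follow SG's model zoo conventions; "yolo_nas_s" and "coco" are common examples, but any registered pair works):

```python
from super_gradients.training import models

# Pretrained COCO detector; num_classes is derived from the pretrained weights.
model = models.get("yolo_nas_s", pretrained_weights="coco")

# Same architecture but a fresh 20-class head, e.g. for fine-tuning on VOC-like data.
finetune_model = models.get("yolo_nas_s", num_classes=20, pretrained_weights="coco")
```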
get_architecture(model_name, arch_params, download_required_code=True, download_platform_weights=True)
Get the corresponding architecture class.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_name | str | Define the model's architecture from models/ALL_ARCHITECTURES | required |
arch_params | HpmStruct | Architecture hyper parameters. e.g.: block, num_blocks, etc. | required |
download_required_code | bool | If the model is not found in SG and is downloaded from a remote client, overriding this parameter with False will prevent additional code from being downloaded. This affects only models from a remote client. | True |
download_platform_weights | bool | When getting a model from the platform, whether to download the pretrained weights as well. In any other case this parameter will be ignored (default=True). | True |

Returns:
Type | Description |
---|---|
Tuple[Type[torch.nn.Module], HpmStruct, str, bool] | |

Source code in src/super_gradients/training/models/model_factory.py, lines 31-94
get_model_name(model)
Get the name of a model loaded by SuperGradients' models.get(). If the model was not loaded using models.get(), return None.

Source code in src/super_gradients/training/models/model_factory.py, lines 186-188
instantiate_model(model_name, arch_params, num_classes, pretrained_weights=None, download_required_code=True)
Instantiates nn.Module according to architecture and arch_params, and handles pretrained weights and the required module manipulation (i.e., head replacement).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_name | str | Define the model's architecture from models/ALL_ARCHITECTURES | required |
arch_params | dict | Architecture hyper parameters. e.g.: block, num_blocks, etc. | required |
num_classes | int | Number of classes (defines the net's structure). If None is given, will try to derive it from the pretrained weights' corresponding dataset. | required |
pretrained_weights | str | Describes the dataset of the pretrained weights (for example "imagenet"). | None |
download_required_code | bool | If the model is not found in SG and is downloaded from a remote client, overriding this parameter with False will prevent additional code from being downloaded. This affects only models from a remote client. | True |

Returns:
Type | Description |
---|---|
Union[SgModule, torch.nn.Module] | Instantiated model, i.e. torch.nn.Module, and architecture_class (will be None when architecture is not a str). |

Source code in src/super_gradients/training/models/model_factory.py, lines 97-176
AdaptBlock
Bases: nn.Module
Residual block with deformable convolution
Source code in src/super_gradients/training/models/pose_estimation_models/dekr_hrnet.py, lines 123-168
BasicBlock
Bases: nn.Module
ResNet basic block
Source code in src/super_gradients/training/models/pose_estimation_models/dekr_hrnet.py, lines 45-78
Bottleneck
Bases: nn.Module
ResNet bottleneck block
Source code in src/super_gradients/training/models/pose_estimation_models/dekr_hrnet.py, lines 81-120
DEKRPoseEstimationModel
Bases: SgModule
, HasPredict
Implementation of HRNet model from DEKR paper (https://arxiv.org/abs/2104.02300).
The model takes an image of shape (B, C, H, W) and outputs two tensors (heatmap, offset) as predictions:
- heatmap: (B, NumJoints+1, H * upsample_factor, W * upsample_factor)
- offset: (B, NumJoints*2, H * upsample_factor, W * upsample_factor)
Source code in src/super_gradients/training/models/pose_estimation_models/dekr_hrnet.py, lines 297-666
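A shape-bookkeeping sketch of the two outputs (illustrative values; NumJoints=17 follows COCO and is an assumption here, as is upsample_factor=4):

```python
# Purely illustrative arithmetic, not the model's internals.
B, num_joints, up = 1, 17, 4
H, W = 128, 128                                       # assumed backbone output grid
heatmap_shape = (B, num_joints + 1, H * up, W * up)   # joints + 1 extra channel
offset_shape = (B, num_joints * 2, H * up, W * up)    # (dx, dy) offsets per joint
```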
predict(images, conf=None, batch_size=32, fuse_model=True, skip_image_resizing=False, fp16=True)
Predict an image or a list of images.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
images | ImageSource | Images to predict. | required |
conf | Optional[float] | (Optional) Predictions below this confidence threshold are discarded. If None, the default value associated with the training is used. | None |
batch_size | int | Maximum number of images to process at the same time. | 32 |
fuse_model | bool | If True, create a copy of the model, and fuse some of its layers to increase performance. This increases memory usage. | True |
skip_image_resizing | bool | If True, the image processor will not resize the images. | False |
fp16 | bool | If True, use mixed precision for inference. | True |

Source code in src/super_gradients/training/models/pose_estimation_models/dekr_hrnet.py, lines 630-650
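A hedged usage sketch, assuming the "dekr_w32_no_dc" model name and "coco_pose" pretrained weights are available in the SG model zoo:

```python
from super_gradients.training import models

model = models.get("dekr_w32_no_dc", pretrained_weights="coco_pose")
predictions = model.predict("path/to/image.jpg", conf=0.5, batch_size=8, fp16=False)
predictions.show()  # assuming the returned prediction object supports visualization
```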
predict_webcam(conf=None, fuse_model=True, skip_image_resizing=False, fp16=True)
Predict using webcam.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
conf | Optional[float] | (Optional) Predictions below this confidence threshold are discarded. If None, the default value associated with the training is used. | None |
fuse_model | bool | If True, create a copy of the model, and fuse some of its layers to increase performance. This increases memory usage. | True |
skip_image_resizing | bool | If True, the image processor will not resize the images. | False |
fp16 | bool | If True, use mixed precision for inference. | True |

Source code in src/super_gradients/training/models/pose_estimation_models/dekr_hrnet.py, lines 652-661
set_dataset_processing_params(edge_links, edge_colors, keypoint_colors, image_processor=None, conf=None)
Set the processing parameters for the dataset.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
image_processor | Optional[Processing] | (Optional) Image processing objects to reproduce the dataset preprocessing used for training. | None |
conf | Optional[float] | (Optional) Predictions below this confidence threshold are discarded. | None |

Source code in src/super_gradients/training/models/pose_estimation_models/dekr_hrnet.py, lines 565-583
DEKRW32NODC
Bases: DEKRPoseEstimationModel
DEKR-W32 model for pose estimation without deformable convolutions.
Source code in src/super_gradients/training/models/pose_estimation_models/dekr_hrnet.py, lines 669-680
PoseRescoringNet
Bases: SgModule
Rescoring network for pose estimation. It takes input features and predicts a single scalar score, which is a multiplication factor for the original score prediction. The model learns which joint configurations are reasonable/possible, so it may downweight the confidence of impossible joint configurations.
The model is a simple 3-layer MLP with ReLU activation. The input is the concatenation of the predicted poses and prior information in the form of the joint links. See RescoringNet.get_feature() for details. The output is a single scalar value.
Source code in src/super_gradients/training/models/pose_estimation_models/rescoring_net.py, lines 15-87
forward(poses)
Parameters:

Name | Type | Description | Default |
---|---|---|---|
poses | | Predicted poses of shape [N, J, 3] or [B, N, J, 3] | required |

Returns:

Type | Description |
---|---|
Tuple[Tensor, Tensor] | Tuple of input poses and corresponding scores |

Source code in src/super_gradients/training/models/pose_estimation_models/rescoring_net.py, lines 39-49
get_feature(poses, edge_links)
classmethod
Compute the feature vector input to the rescoring network.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
poses | Tensor | [N, J, 3] Predicted poses | required |
edge_links | Tensor | [L, 2] List of joint indices | required |

Returns:

Type | Description |
---|---|
Tensor | [N, L*2+L+J] Feature vector |

Source code in src/super_gradients/training/models/pose_estimation_models/rescoring_net.py, lines 57-87
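A worked size example (the COCO-style numbers, J=17 joints and L=19 edge links, are assumptions here):

```python
J, L = 17, 19
feature_dim = L * 2 + L + J  # per the [N, L*2+L+J] shape above
print(feature_dim)           # -> 74 features per pose
```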
YoloNASPoseDFLHead
Bases: BaseDetectionModule
, SupportsReplaceNumClasses
YoloNASPoseDFLHead is the head used in the YoloNASPose model. This class implements single-class object detection and keypoint regression on a single-scale feature map.
Source code in src/super_gradients/training/models/pose_estimation_models/yolo_nas_pose/yolo_nas_pose_dfl_head.py, lines 15-172
__init__(in_channels, bbox_inter_channels, pose_inter_channels, pose_regression_blocks, shared_stem, pose_conf_in_class_head, pose_block_use_repvgg, width_mult, first_conv_group_size, num_classes, stride, reg_max, cls_dropout_rate=0.0, reg_dropout_rate=0.0)
Initialize the YoloNASDFLHead
Parameters:

Name | Type | Description | Default |
---|---|---|---|
in_channels | int | Input channels | required |
bbox_inter_channels | int | Intermediate number of channels for box detection & regression | required |
pose_inter_channels | int | Intermediate number of channels for pose regression | required |
shared_stem | bool | Whether to share the stem between the pose and bbox heads | required |
pose_conf_in_class_head | bool | Whether to include the pose confidence in the classification head | required |
width_mult | float | Width multiplier | required |
first_conv_group_size | int | Group size | required |
num_classes | int | Number of keypoint classes for pose regression. The number of detection classes is always 1. | required |
stride | int | Output stride for this head | required |
reg_max | int | Number of bins in the regression head | required |
cls_dropout_rate | float | Dropout rate for the classification head | 0.0 |
reg_dropout_rate | float | Dropout rate for the regression head | 0.0 |

Source code in src/super_gradients/training/models/pose_estimation_models/yolo_nas_pose/yolo_nas_pose_dfl_head.py, lines 22-118
forward(x)
Parameters:

Name | Type | Description | Default |
---|---|---|---|
x | | Input feature map of shape [B, Cin, H, W] | required |

Returns:

Type | Description |
---|---|
Tuple[Tensor, Tensor, Tensor, Tensor] | Tuple of [reg_output, cls_output, pose_regression, pose_logits]: reg_output is a Tensor of [B, 4 * (reg_max + 1), H, W]; cls_output is a Tensor of [B, 1, H, W]; pose_regression is a Tensor of [B, num_classes, 2, H, W]; pose_logits is a Tensor of [B, num_classes, H, W] |

Source code in src/super_gradients/training/models/pose_estimation_models/yolo_nas_pose/yolo_nas_pose_dfl_head.py, lines 132-168
YoloNASPoseNDFLHeads
Bases: BaseDetectionModule
, SupportsReplaceNumClasses
Source code in src/super_gradients/training/models/pose_estimation_models/yolo_nas_pose/yolo_nas_pose_ndfl_heads.py, lines 22-239
__init__(num_classes, in_channels, heads_list, grid_cell_scale=5.0, grid_cell_offset=0.5, reg_max=16, inference_mode=False, eval_size=None, width_mult=1.0, pose_offset_multiplier=1.0, compensate_grid_cell_offset=True)
Initializes the NDFLHeads module.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
num_classes | int | Number of detection classes | required |
in_channels | Tuple[int, int, int] | Number of channels for each feature map (see width_mult) | required |
grid_cell_scale | float | A scaling factor applied to the grid cell coordinates. This scaling factor is used to define anchor boxes (see generate_anchors_for_grid_cell). | 5.0 |
grid_cell_offset | float | A fixed offset that is added to the grid cell coordinates. This offset represents the 'center' of the cell and is 0.5 by default. | 0.5 |
reg_max | int | Number of bins in the regression head | 16 |
eval_size | Optional[Tuple[int, int]] | (rows, cols) Size of the image for evaluation. Setting this value can be beneficial for inference speed, since anchors will not be regenerated for each forward call. | None |
width_mult | float | A scaling factor applied to in_channels. | 1.0 |
pose_offset_multiplier | float | A scaling factor applied to the pose regression offset. This multiplier is meant to reduce the absolute magnitude of the weights in the pose regression layers. Default value is 1.0. | 1.0 |
compensate_grid_cell_offset | bool | Controls whether to subtract the anchor cell offset from the pose regression. If True, predicted pose coordinates are decoded as (offsets + anchors - grid_cell_offset) * stride. If False, they are decoded as (offsets + anchors) * stride. Default value is True (see the sketch below). | True |

Source code in src/super_gradients/training/models/pose_estimation_models/yolo_nas_pose/yolo_nas_pose_ndfl_heads.py, lines 24-89
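A minimal sketch of the decoding rule described for compensate_grid_cell_offset (the function name and signature are illustrative, not the head's internals):

```python
def decode_pose_xy(offsets, anchors, stride, grid_cell_offset=0.5, compensate=True):
    """offsets/anchors: per-anchor (x, y) tensors in grid units; stride: pixels per grid cell."""
    if compensate:
        return (offsets + anchors - grid_cell_offset) * stride
    return (offsets + anchors) * stride
```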
forward(feats)
Runs the forward pass for all the underlying heads and concatenates the predictions into a single result.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
feats | Tuple[Tensor, ...] | List of feature maps from the neck, of different strides | required |

Returns:

Type | Description |
---|---|
Union[YoloNasPoseDecodedPredictions, Tuple[YoloNasPoseDecodedPredictions, YoloNasPoseRawOutputs]] | Return value depends on the mode. If tracing, a tuple of 4 tensors (decoded predictions) is returned: pred_bboxes [B, Num Anchors, 4] (predicted boxes in XYXY format), pred_scores [B, Num Anchors, 1] (predicted scores for each box), pred_pose_coords [B, Num Anchors, Num Keypoints, 2] (predicted poses in XY format), pred_pose_scores [B, Num Anchors, Num Keypoints] (predicted scores for each keypoint). In training/eval mode, a tuple of 2 elements is returned: the decoded predictions (same as in tracing mode) and the raw outputs, a tuple of 8 elements in total, needed for training the model. |

Source code in src/super_gradients/training/models/pose_estimation_models/yolo_nas_pose/yolo_nas_pose_ndfl_heads.py, lines 126-203
YoloNASPosePostPredictionCallback
Bases: AbstractPoseEstimationPostPredictionCallback
A post-prediction callback for YoloNASPose model. Performs confidence thresholding, Top-K and NMS steps.
Source code in src/super_gradients/training/models/pose_estimation_models/yolo_nas_pose/yolo_nas_pose_post_prediction_callback.py, lines 10-94
__call__(outputs)
Take YoloNASPose's predictions and decode them into usable pose predictions.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
outputs | Tuple[Tuple[Tensor, Tensor, Tensor, Tensor], ...] | Output of the model's forward() method | required |

Returns:

Type | Description |
---|---|
List[PoseEstimationPredictions] | List of decoded predictions for each image in the batch. |

Source code in src/super_gradients/training/models/pose_estimation_models/yolo_nas_pose/yolo_nas_pose_post_prediction_callback.py, lines 38-94
__init__(pose_confidence_threshold, nms_iou_threshold, pre_nms_max_predictions, post_nms_max_predictions)
Parameters:

Name | Type | Description | Default |
---|---|---|---|
pose_confidence_threshold | float | Pose detection confidence threshold | required |
nms_iou_threshold | float | IoU threshold for the NMS step. | required |
pre_nms_max_predictions | int | Number of predictions participating in the NMS step | required |
post_nms_max_predictions | int | Maximum number of boxes to return after the NMS step | required |

Source code in src/super_gradients/training/models/pose_estimation_models/yolo_nas_pose/yolo_nas_pose_post_prediction_callback.py, lines 16-36
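A hedged construction example; the import path is inferred from the source location above, and the threshold values are illustrative (they mirror the defaults shown in set_dataset_processing_params further below):

```python
from super_gradients.training.models.pose_estimation_models.yolo_nas_pose.yolo_nas_pose_post_prediction_callback import (
    YoloNASPosePostPredictionCallback,
)

callback = YoloNASPosePostPredictionCallback(
    pose_confidence_threshold=0.01,   # discard low-confidence poses before NMS
    nms_iou_threshold=0.7,
    pre_nms_max_predictions=300,
    post_nms_max_predictions=100,
)
```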
YoloNASPose
Bases: CustomizableDetector
, ExportablePoseEstimationModel
, SupportsInputShapeCheck
YoloNASPose model
Exported model support matrix
Batch Size | Format | OnnxRuntime 1.13.1 | TensorRT 8.4.2 | TensorRT 8.5.3 | TensorRT 8.6.1 |
---|---|---|---|---|---|
1 | Flat | Yes | Yes | Yes | Yes |
>1 | Flat | Yes | Yes | Yes | Yes |
1 | Batch | Yes | No | No | Yes |
>1 | Batch | Yes | No | No | Yes |
ONNX files generated with PyTorch 2.0.1 for ONNX opset_version=14
Source code in src/super_gradients/training/models/pose_estimation_models/yolo_nas_pose/yolo_nas_pose_variants.py, lines 93-318
get_input_shape_steps()
Returns the step size (multiple) that the input shape must conform to. For segmentation models the default is 32x32, which corresponds to the largest stride in the encoder part of the model.
Source code in src/super_gradients/training/models/pose_estimation_models/yolo_nas_pose/yolo_nas_pose_variants.py, lines 306-311
get_minimum_input_shape_size()
Returns the minimum input shape size that the model can accept. For segmentation models the default is 32x32, which corresponds to the largest stride in the encoder part of the model.
Source code in src/super_gradients/training/models/pose_estimation_models/yolo_nas_pose/yolo_nas_pose_variants.py, lines 313-318
predict(images, iou=None, conf=None, pre_nms_max_predictions=None, post_nms_max_predictions=None, batch_size=32, fuse_model=True, skip_image_resizing=False, fp16=True)
Predict an image or a list of images.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
images | ImageSource | Images to predict. | required |
iou | Optional[float] | (Optional) IoU threshold for the NMS algorithm. If None, the default value associated with the training is used. | None |
conf | Optional[float] | (Optional) Predictions below this confidence threshold are discarded. If None, the default value associated with the training is used. | None |
batch_size | int | Maximum number of images to process at the same time. | 32 |
fuse_model | bool | If True, create a copy of the model, and fuse some of its layers to increase performance. This increases memory usage. | True |
skip_image_resizing | bool | If True, the image processor will not resize the images. | False |
fp16 | bool | If True, use mixed precision for inference. | True |

Source code in src/super_gradients/training/models/pose_estimation_models/yolo_nas_pose/yolo_nas_pose_variants.py, lines 142-174
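A hedged usage sketch, assuming the "yolo_nas_pose_l" model name and "coco_pose" pretrained weights are available:

```python
from super_gradients.training import models

model = models.get("yolo_nas_pose_l", pretrained_weights="coco_pose")
result = model.predict(["img1.jpg", "img2.jpg"], conf=0.6, iou=0.7, fuse_model=False)
result.show()  # assuming the returned prediction object supports visualization
```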
predict_webcam(iou=None, conf=None, pre_nms_max_predictions=None, post_nms_max_predictions=None, batch_size=32, fuse_model=True, skip_image_resizing=False, fp16=True)
Predict using webcam.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
iou | Optional[float] | (Optional) IoU threshold for the NMS algorithm. If None, the default value associated with the training is used. | None |
conf | Optional[float] | (Optional) Predictions below this confidence threshold are discarded. If None, the default value associated with the training is used. | None |
batch_size | int | Maximum number of images to process at the same time. | 32 |
fuse_model | bool | If True, create a copy of the model, and fuse some of its layers to increase performance. This increases memory usage. | True |
skip_image_resizing | bool | If True, the image processor will not resize the images. | False |
fp16 | bool | If True, use mixed precision for inference. | True |

Source code in src/super_gradients/training/models/pose_estimation_models/yolo_nas_pose/yolo_nas_pose_variants.py, lines 176-207
set_dataset_processing_params(edge_links, edge_colors, keypoint_colors, image_processor=None, conf=None, iou=0.7, pre_nms_max_predictions=300, post_nms_max_predictions=100)
Set the processing parameters for the dataset.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
image_processor | Optional[Processing] | (Optional) Image processing objects to reproduce the dataset preprocessing used for training. | None |
conf | Optional[float] | (Optional) Predictions below this confidence threshold are discarded. | None |

Source code in src/super_gradients/training/models/pose_estimation_models/yolo_nas_pose/yolo_nas_pose_variants.py, lines 280-304
YoloNASPoseDecodingModule
Bases: AbstractPoseEstimationDecodingModule
Source code in src/super_gradients/training/models/pose_estimation_models/yolo_nas_pose/yolo_nas_pose_variants.py, lines 27-90
forward(inputs)
Decode YoloNASPose model outputs into bounding boxes, confidence scores and pose coordinates and scores
Parameters:

Name | Type | Description | Default |
---|---|---|---|
inputs | Tuple[Tuple[Tensor, Tensor], Tuple[Tensor, ...]] | YoloNASPose model outputs | required |

Returns:

Type | Description |
---|---|
| Tuple of (pred_bboxes, pred_scores, pred_joints): pred_bboxes [B, num_pre_nms_predictions, 4] are the bounding boxes associated with the poses, in XYXY format; pred_scores [B, num_pre_nms_predictions, 1] are confidence scores [0..1] for the entire pose; pred_joints [B, num_pre_nms_predictions, Num Joints, 3] are joints in (x, y, confidence) format |

Source code in src/super_gradients/training/models/pose_estimation_models/yolo_nas_pose/yolo_nas_pose_variants.py, lines 54-90
infer_total_number_of_predictions(inputs)
Parameters:

Name | Type | Description | Default |
---|---|---|---|
inputs | Any | YoloNASPose model outputs | required |

Returns:

Type | Description |
---|---|
int | |

Source code in src/super_gradients/training/models/pose_estimation_models/yolo_nas_pose/yolo_nas_pose_variants.py, lines 37-49
SegmentationHead
Bases: nn.Module
Source code in src/super_gradients/training/models/segmentation_models/common.py, lines 5-23
replace_num_classes(num_classes)
This method replaces the last Conv classification layer to output a different number of classes. Note that the weights of the new layer are randomly initialized.
Source code in src/super_gradients/training/models/segmentation_models/common.py, lines 17-23
ASPP
Bases: AbstractContextModule
ASPP bottleneck block. Splits the input into len(dilation_list) + 1 heads (one of them a 1x1 conv) of differently dilated convolutions. The heads are concatenated, and each head's output channels equal in_channels / (len(dilation_list) + 1), so the overall output channel count matches the input (see the sketch below).
Source code in src/super_gradients/training/models/segmentation_models/context_modules.py, lines 85-120
__init__(in_channels, dilation_list, in_out_ratio=1.0, use_bias=False, **kwargs)
Parameters:

Name | Type | Description | Default |
---|---|---|---|
dilation_list | List[int] | List of dilation rates. The number of dilation branches should be set so that the input channels divide evenly; see the assertion in the code. | required |
in_out_ratio | float | Output / input ratio of the number of channels. | 1.0 |
use_bias | bool | Legacy parameter to support PascalVOC frontier checkpoints that were trained by mistake with extra redundant biases before batchnorm operators. Should be set to False. | False |

Source code in src/super_gradients/training/models/segmentation_models/context_modules.py, lines 92-113
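Channel bookkeeping sketch for the whole-division constraint described above (the numbers are illustrative):

```python
in_channels, dilation_list = 256, [2, 4, 8]
num_heads = len(dilation_list) + 1          # 3 dilated heads + one 1x1 conv head
per_head = in_channels // num_heads         # 256 // 4 = 64 channels per head
assert per_head * num_heads == in_channels  # whole division required
```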
SPPM
Bases: AbstractContextModule
Simple Pyramid Pooling context Module.
Source code in src/super_gradients/training/models/segmentation_models/context_modules.py, lines 19-82
__init__(in_channels, inter_channels, out_channels, pool_sizes, upsample_mode=UpsampleMode.BILINEAR, align_corners=False)
Parameters:

Name | Type | Description | Default |
---|---|---|---|
inter_channels | int | Number of channels in each pooling branch. | required |
out_channels | int | The number of output channels after the pyramid pooling module. | required |
pool_sizes | List[Union[int, Tuple[int, int]]] | Spatial output sizes of the pooled feature maps. | required |

Source code in src/super_gradients/training/models/segmentation_models/context_modules.py, lines 24-52
prep_model_for_conversion(input_size, stride_ratio=32, **kwargs)
Replace global average pooling with fixed-kernel average pooling, since dynamic kernel sizes are not supported when compiling to ONNX: "Unsupported: ONNX export of operator adaptive_avg_pool2d, input size not accessible."
Source code in src/super_gradients/training/models/segmentation_models/context_modules.py, lines 67-82
DAPPMBranch
Bases: nn.Module
Source code in src/super_gradients/training/models/segmentation_models/ddrnet.py, lines 44-105
__init__(kernel_size, stride, in_planes, branch_planes, inter_mode='bilinear')
A DAPPM branch
Parameters:

Name | Type | Description | Default |
---|---|---|---|
kernel_size | int | The kernel size for the average pooling. When stride=0 this parameter is omitted, and AdaptiveAvgPool2d over the entire input is performed. | required |
stride | int | Stride for the average pooling. When stride=0, AdaptiveAvgPool2d over the entire input is performed (output is 1x1); when stride=1, no average pooling is performed; when stride>1, average pooling is performed (scaling the input down and up again). | required |
in_planes | int | | required |
branch_planes | int | Width after the first convolution | required |
inter_mode | str | Interpolation mode for upscaling | 'bilinear' |

Source code in src/super_gradients/training/models/segmentation_models/ddrnet.py, lines 45-82
forward(x)
All branches of the DAPPM but the first one receive the output of the previous branch as a second input.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
x | | In branch 0: the original input of the DAPPM. In other branches: a list containing the original input and the output of the previous branch. | required |

Source code in src/super_gradients/training/models/segmentation_models/ddrnet.py, lines 84-105
DDRBackBoneBase
Bases: nn.Module
, SupportsReplaceInputChannels
, ABC
A base class defining functions that must be supported by DDRBackBones
Source code in src/super_gradients/training/models/segmentation_models/ddrnet.py, lines 193-215
get_backbone_output_number_of_channels()
Return a dictionary of the shapes of each output of the backbone, to determine the in_channels of the skip and compress layers
Source code in src/super_gradients/training/models/segmentation_models/ddrnet.py, lines 201-215
DDRNet
Bases: SegmentationModule
, ExportableSegmentationModel
Source code in src/super_gradients/training/models/segmentation_models/ddrnet.py, lines 271-544
backbone
property
Create a fake backbone module to load backbone pre-trained weights.
__init__(backbone, additional_layers, upscale_module, num_classes, highres_planes, spp_width, head_width, use_aux_heads=False, ssp_inter_mode='bilinear', segmentation_inter_mode='bilinear', skip_block=None, layer5_block=Bottleneck, layer5_bottleneck_expansion=2, classification_mode=False, spp_kernel_sizes=[1, 5, 9, 17, 0], spp_strides=[1, 2, 4, 8, 0], layer3_repeats=1)
Parameters:

Name | Type | Description | Default |
---|---|---|---|
backbone | DDRBackBoneBase.__class__ | The low-resolution branch of DDR; expected to have specific attributes in the class | required |
additional_layers | list | List of num blocks for the high-resolution stage and layer5 | required |
upscale_module | nn.Module | Upscale module to use in the backbone (DAPPM and the segmentation head use bilinear interpolation) | required |
num_classes | int | Number of classes | required |
highres_planes | int | Number of channels in the high-resolution net | required |
use_aux_heads | bool | Add a second segmentation head (fed from after compress3 + upscale). This head can be used during training (see the paper https://arxiv.org/pdf/2101.06085.pdf for details). | False |
ssp_inter_mode | str | The interpolation used in the SPP block | 'bilinear' |
segmentation_inter_mode | str | The interpolation used in the segmentation head | 'bilinear' |
skip_block | nn.Module.__class__ | Allows specifying a different block (from 'block') for the skip layer | None |
layer5_block | nn.Module.__class__ | Type of block to use in layer5 and layer5_skip | Bottleneck |
layer5_bottleneck_expansion | int | Determines the expansion rate for the Bottleneck block | 2 |
spp_kernel_sizes | list | List of kernel sizes for the SPP module pooling | [1, 5, 9, 17, 0] |
spp_strides | list | List of strides for the SPP module pooling | [1, 2, 4, 8, 0] |
layer3_repeats | int | Number of times to repeat the 3rd stage of the DDR model, including the path-interchange modules. | 1 |

Source code in src/super_gradients/training/models/segmentation_models/ddrnet.py, lines 272-410
initialize_param_groups(lr, training_params)
Custom param groups for training: a different lr for the backbone and the rest, if the `multiply_head_lr` key is in `training_params`.
Source code in src/super_gradients/training/models/segmentation_models/ddrnet.py, lines 505-516
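A hedged sketch of the training_params usage implied above (the key name follows the description; the lr values are illustrative):

```python
# With multiply_head_lr present, the backbone and the rest of the network get
# different learning rates (assumption based on the description above).
training_params = {"initial_lr": 0.01, "multiply_head_lr": 10}
```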
DDRNetCustom
Bases: DDRNet
Source code in src/super_gradients/training/models/segmentation_models/ddrnet.py, lines 547-579
__init__(arch_params)
Parse arch_params and translate the parameters to build the original DDRNet architecture
Source code in src/super_gradients/training/models/segmentation_models/ddrnet.py, lines 548-579
RegnetDDRBackBone
Bases: DDRBackBoneBase
Translation of Regnet to fit DDR model
Source code in src/super_gradients/training/models/segmentation_models/ddrnet.py, lines 244-268
SegmentHead
Bases: nn.Module
Source code in src/super_gradients/training/models/segmentation_models/ddrnet.py, lines 140-176
__init__(in_planes, inter_planes, out_planes, scale_factor, inter_mode='bilinear')
Last stage of the segmentation network. Reduces the number of output planes (usually to num_classes) while increasing the spatial size by scale_factor.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
in_planes | int | Width of the input | required |
inter_planes | int | Width of the internal conv. Must be a multiple of scale_factor^2 when inter_mode=pixel_shuffle (see the sketch below). | required |
out_planes | int | Output width | required |
scale_factor | int | Scaling factor | required |
inter_mode | str | One of nearest, linear, bilinear, bicubic, trilinear, area or pixel_shuffle. When set to pixel_shuffle, an nn.PixelShuffle will be used for scaling. | 'bilinear' |

Source code in src/super_gradients/training/models/segmentation_models/ddrnet.py, lines 141-169
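A sanity-check sketch for the pixel_shuffle constraint above (values are illustrative):

```python
scale_factor, inter_planes = 8, 128
# inter_planes must be divisible by scale_factor ** 2 when inter_mode="pixel_shuffle"
assert inter_planes % scale_factor**2 == 0  # 128 % 64 == 0 -> OK
```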
UpscaleOnline
Bases: nn.Module
In some cases, the required scale/size for the scaling is known only when the input is received. This class supports such cases; only the interpolation mode is set in advance.
Source code in src/super_gradients/training/models/segmentation_models/ddrnet.py, lines 179-190
DDRNet39Backbone
Bases: DDRNet39
A somewhat frankenstein version of the DDRNet39 model that tries to be a feature extractor module.
Source code in src/super_gradients/training/models/segmentation_models/ddrnet_backbones.py, lines 12-66
LadderBottleneck
Bases: nn.Module
ResNet Bottleneck
Source code in src/super_gradients/training/models/segmentation_models/laddernet.py, lines 11-57
LadderResNet
Bases: nn.Module
Dilated pre-trained ResNet model, which produces stride-8 feature maps at conv5.
Parameters
block : Block
    Class for the residual block. Options are BasicBlockV1, BottleneckV1.
layers : list of int
    Numbers of layers in each block.
classes : int, default 1000
    Number of classification classes.
dilated : bool, default False
    Applying a dilation strategy to the pretrained ResNet, yielding a stride-8 model, typically used in semantic segmentation.
norm_layer : object
    Normalization layer used in the backbone network (default: :class:`mxnet.gluon.nn.BatchNorm`; for Synchronized Cross-GPU BatchNormalization).
Reference:
- He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
- Yu, Fisher, and Vladlen Koltun. "Multi-scale context aggregation by dilated convolutions."
Source code in src/super_gradients/training/models/segmentation_models/laddernet.py, lines 60-153
conv3x3(in_planes, out_planes, stride=1)
3x3 convolution with padding
Source code in src/super_gradients/training/models/segmentation_models/laddernet.py, lines 243-245
PPLiteSegBase
Bases: SegmentationModule
The PP_LiteSeg implementation based on PaddlePaddle. The original article refers to "Juncai Peng, Yi Liu, Shiyu Tang, Yuying Hao, Lutao Chu, Guowei Chen, Zewu Wu, Zeyu Chen, Zhiliang Yu, Yuning Du, Qingqing Dang,Baohua Lai, Qiwen Liu, Xiaoguang Hu, Dianhai Yu, Yanjun Ma. PP-LiteSeg: A Superior Real-Time Semantic Segmentation Model. https://arxiv.org/abs/2204.02681".
Source code in src/super_gradients/training/models/segmentation_models/ppliteseg.py, lines 144-318
backbone: nn.Module
property
Supports SG backbone loading when training.
__init__(num_classes, backbone, projection_channels_list, sppm_inter_channels, sppm_out_channels, sppm_pool_sizes, sppm_upsample_mode, align_corners, decoder_up_factors, decoder_channels, decoder_upsample_mode, head_scale_factor, head_upsample_mode, head_mid_channels, dropout, use_aux_heads, aux_hidden_channels, aux_scale_factors)
Parameters:

Name | Type | Description | Default |
---|---|---|---|
backbone | AbstractSTDCBackbone | Backbone nn.Module; should implement the abstract class | required |
projection_channels_list | List[int] | Channels list to project encoder features before fusing with the decoder stream. | required |
sppm_inter_channels | int | Number of channels in each SPPM pooling branch. | required |
sppm_out_channels | int | The number of output channels after the SPPM module. | required |
sppm_pool_sizes | List[int] | Spatial output sizes of the pooled feature maps. | required |
sppm_upsample_mode | Union[UpsampleMode, str] | Upsample mode to the original size after pooling. | required |
decoder_up_factors | List[int] | List of upsample factors per decoder stage. | required |
decoder_channels | List[int] | List of num_channels per decoder stage. | required |
decoder_upsample_mode | Union[UpsampleMode, str] | Upsample mode in the decoder stages; see UpsampleMode for valid options. | required |
head_scale_factor | int | Scale factor for the final segmentation head logits. | required |
head_upsample_mode | Union[UpsampleMode, str] | Upsample mode to the final prediction sizes; see UpsampleMode for valid options. | required |
head_mid_channels | int | Number of hidden channels in the segmentation head. | required |
use_aux_heads | bool | Set True when training to output extra auxiliary feature maps from the encoder module. | required |
aux_hidden_channels | List[int] | List of hidden channels in the auxiliary segmentation heads. | required |
aux_scale_factors | List[int] | List of upsample factors for the final auxiliary head logits. | required |

Source code in src/super_gradients/training/models/segmentation_models/ppliteseg.py, lines 153-237
initialize_param_groups(lr, training_params)
Custom param groups for training: a different lr for the backbone and the rest, if the `multiply_head_lr` key is in `training_params`.
Source code in src/super_gradients/training/models/segmentation_models/ppliteseg.py, lines 261-272
PPLiteSegDecoder
Bases: nn.Module
PPLiteSegDecoder using UAFM blocks to fuse feature maps.
Source code in src/super_gradients/training/models/segmentation_models/ppliteseg.py, lines 109-141
PPLiteSegEncoder
Bases: nn.Module
, SupportsReplaceInputChannels
Encoder for PPLiteSeg: includes a backbone followed by a context module.
Source code in src/super_gradients/training/models/segmentation_models/ppliteseg.py, lines 72-106
UAFM
Bases: nn.Module
Unified Attention Fusion Module, which uses mean and max values across the spatial dimensions.
Source code in src/super_gradients/training/models/segmentation_models/ppliteseg.py, lines 18-69
__init__(in_channels, skip_channels, out_channels, up_factor, upsample_mode=UpsampleMode.BILINEAR, align_corners=False)
Parameters:

Name | Type | Description | Default |
---|---|---|---|
in_channels | int | num_channels of the input feature map. | required |
skip_channels | int | num_channels of the skip-connection feature map. | required |
out_channels | int | Number of output channels after feature fusion. | required |
up_factor | int | Upsample scale factor of the input feature map. | required |
upsample_mode | Union[UpsampleMode, str] | See UpsampleMode for valid options. | UpsampleMode.BILINEAR |

Source code in src/super_gradients/training/models/segmentation_models/ppliteseg.py, lines 23-46
forward(x, skip)
Parameters:

Name | Type | Description | Default |
---|---|---|---|
x | | Input feature map to upsample before fusion. | required |
skip | | Skip-connection feature map. | required |

Source code in src/super_gradients/training/models/segmentation_models/ppliteseg.py, lines 48-62
Implementation of paper: "Rethink Dilated Convolution for Real-time Semantic Segmentation", https://arxiv.org/pdf/2111.09957.pdf Based on original implementation: https://github.com/RolandGao/RegSeg, cloned 23/12/2021, commit c07a833
AdaptiveShortcutBlock
Bases: nn.Module
The adaptive shortcut makes the following adaptations, if needed: it applies pooling if stride > 1, and applies a 1x1 conv if the in/out channels differ or if pooling was applied. If stride is 1 and the in/out channels are the same, the shortcut is just an identity.
Source code in src/super_gradients/training/models/segmentation_models/regseg.py, lines 74-92
DBlock
Bases: nn.Module
Source code in src/super_gradients/training/models/segmentation_models/regseg.py, lines 125-171
__init__(in_channels, out_channels, dilations, group_width, stride, se_ratio=4)
Parameters:

Name | Type | Description | Default |
---|---|---|---|
dilations | List[int] | A list specifying the required dilations. The input will be split into len(dilations) groups; group [i] will be convolved with a grouped dilated (dilations[i]) convolution. | required |
group_width | int | The group width for the dilated convolution(s) | required |
se_ratio | int | The ratio of the squeeze-and-excitation block w.r.t. in_channels (as in the paper); for example, a value of 4 translates to in_channels // 4. | 4 |

Source code in src/super_gradients/training/models/segmentation_models/regseg.py, lines 126-160
RegSegDecoder
Bases: nn.Module
This implementation follows the paper. There is no 'pattern' in this decoder, so it is specific to 3 stages.
Source code in src/super_gradients/training/models/segmentation_models/regseg.py, lines 174-202
SplitDilatedGroupConvBlock
Bases: nn.Module
Splits the input into "dilation groups", followed by a grouped convolution with a different dilation for each group.
Source code in src/super_gradients/training/models/segmentation_models/regseg.py, lines 95-122
__init__(in_channels, split_dilations, group_width_per_split, stride, bias)
Parameters:

Name | Type | Description | Default |
---|---|---|---|
split_dilations | List[int] | A list specifying the required dilations. The input will be split into len(dilations) groups; group [i] will be convolved with a grouped dilated (dilations[i]) convolution. | required |
group_width_per_split | int | The group width for the inner dilated convolution | required |

Source code in src/super_gradients/training/models/segmentation_models/regseg.py, lines 100-118
EfficientSelfAttention
Bases: nn.Module
Source code in src/super_gradients/training/models/segmentation_models/segformer.py, lines 67-105
__init__(dim, head, sr_ratio)
Efficient self-attention (https://arxiv.org/pdf/2105.15203.pdf)
Parameters:

Name | Type | Description | Default |
---|---|---|---|
dim | int | Embedding dimension | required |
head | int | Number of attention heads | required |
sr_ratio | int | The reduction ratio of the efficient self-attention | required |

Source code in src/super_gradients/training/models/segmentation_models/segformer.py, lines 68-87
EncoderBlock
Bases: nn.Module
Source code in src/super_gradients/training/models/segmentation_models/segformer.py, lines 135-160
__init__(dim, head, sr_ratio, dpr)
A single encoder block (https://arxiv.org/pdf/2105.15203.pdf)
Parameters:

Name | Type | Description | Default |
---|---|---|---|
dim | int | Embedding dimension | required |
head | int | Number of attention heads | required |
sr_ratio | int | The reduction ratio of the efficient self-attention | required |
dpr | float | Drop-path ratio | required |

Source code in src/super_gradients/training/models/segmentation_models/segformer.py, lines 136-154
MLP
Bases: nn.Module
Source code in src/super_gradients/training/models/segmentation_models/segformer.py, lines 269-285
__init__(dim, embed_dim)
A single Linear layer, with shape pre-processing
Parameters:

Name | Type | Description | Default |
---|---|---|---|
dim | int | Input dimension | required |
embed_dim | int | Output dimension | required |

Source code in src/super_gradients/training/models/segmentation_models/segformer.py, lines 270-279
MiTBackBone
Bases: nn.Module
, SupportsReplaceInputChannels
Source code in src/super_gradients/training/models/segmentation_models/segformer.py, lines 163-264
__init__(embed_dims, encoder_layers, eff_self_att_reduction_ratio, eff_self_att_heads, overlap_patch_size, overlap_patch_stride, overlap_patch_pad, in_channels)
Mixed Transformer backbone encoder (https://arxiv.org/pdf/2105.15203.pdf)
Parameters:

Name | Type | Description | Default |
---|---|---|---|
embed_dims | List[int] | The patch embedding dimensions (number of output channels in each encoder stage) | required |
encoder_layers | List[int] | The number of encoder layers in each encoder stage | required |
eff_self_att_reduction_ratio | List[int] | The reduction ratios of the efficient self-attention in each stage | required |
eff_self_att_heads | List[int] | Number of efficient self-attention heads in each stage | required |
overlap_patch_size | List[int] | The patch size of the overlapping patch embedding in each stage | required |
overlap_patch_stride | List[int] | The patch stride of the overlapping patch embedding in each stage | required |
overlap_patch_pad | List[int] | The patch padding of the overlapping patch embedding in each stage | required |
in_channels | int | Number of input channels | required |

Source code in src/super_gradients/training/models/segmentation_models/segformer.py, lines 164-240
MixFFN
Bases: nn.Module
Source code in src/super_gradients/training/models/segmentation_models/segformer.py, lines 108-132
__init__(in_dim, inter_dim)
MixFFN block (https://arxiv.org/pdf/2105.15203.pdf)
Parameters:

Name | Type | Description | Default |
---|---|---|---|
in_dim | int | Input dimension | required |
inter_dim | int | Intermediate dimension | required |

Source code in src/super_gradients/training/models/segmentation_models/segformer.py, lines 109-120
PatchEmbedding
Bases: nn.Module
Source code in src/super_gradients/training/models/segmentation_models/segformer.py, lines 31-62
__init__(in_channels, out_channels, patch_size, stride, padding)
Overlapped patch merging (https://arxiv.org/pdf/2105.15203.pdf)
Parameters:

Name | Type | Description | Default |
---|---|---|---|
in_channels | int | Number of input channels | required |
out_channels | int | Number of output channels (embedding dimension) | required |
patch_size | int | Patch size (k for size (k, k)); see the spatial-size sketch below | required |
stride | int | Patch stride (k for size (k, k)) | required |
padding | int | Patch padding (k for size (k, k)) | required |

Source code in src/super_gradients/training/models/segmentation_models/segformer.py, lines 32-45
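Spatial-size arithmetic for the overlapping patch embedding (the specific values, patch_size=7, stride=4, padding=3 on a 224x224 input, are an assumption taken from the SegFormer paper's first stage):

```python
H_in, k, s, p = 224, 7, 4, 3
H_out = (H_in + 2 * p - k) // s + 1  # standard conv output-size formula
print(H_out)  # -> 56: each stage downsamples by its stride
```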
SegFormer
Bases: SegmentationModule
Source code in src/super_gradients/training/models/segmentation_models/segformer.py, lines 325-474
__init__(num_classes, encoder_embed_dims, encoder_layers, eff_self_att_reduction_ratio, eff_self_att_heads, decoder_embed_dim, overlap_patch_size, overlap_patch_stride, overlap_patch_pad, in_channels=3, sliding_window_crop_size=(1024, 1024), sliding_window_stride=(768, 768))
Parameters:

Name | Type | Description | Default |
---|---|---|---|
num_classes | int | Number of classes | required |
encoder_embed_dims | List[int] | The patch embedding dimensions (number of output channels in each encoder stage) | required |
encoder_layers | List[int] | The number of encoder layers in each encoder stage | required |
eff_self_att_reduction_ratio | List[int] | The reduction ratios of the efficient self-attention in each stage | required |
eff_self_att_heads | List[int] | Number of efficient self-attention heads in each stage | required |
overlap_patch_size | List[int] | The patch size of the overlapping patch embedding in each stage | required |
overlap_patch_stride | List[int] | The patch stride of the overlapping patch embedding in each stage | required |
overlap_patch_pad | List[int] | The patch padding of the overlapping patch embedding in each stage | required |
in_channels | int | Number of input channels | 3 |
sliding_window_crop_size | Tuple[int, int] | (height, width) The crop size to take from the image for forward with sliding window | (1024, 1024) |
sliding_window_stride | Tuple[int, int] | (height, width) The stride size between crops for forward with sliding window | (768, 768) |

Source code in src/super_gradients/training/models/segmentation_models/segformer.py, lines 326-381
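A hedged usage sketch, assuming the "segformer_b0" architecture name is registered in SG:

```python
from super_gradients.training import models

model = models.get("segformer_b0", num_classes=19)  # e.g. Cityscapes' 19 classes
```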
initialize_param_groups(lr, training_params)
Custom param groups for training:
- Different lr for backbone and the rest, if multiply_head_lr
key is in training_params
.
Source code in src/super_gradients/training/models/segmentation_models/segformer.py
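A hedged sketch of how the multiply_head_lr key mentioned above might be passed when calling Trainer.train(); the remaining required training hyper-parameters are elided:

```python
# Illustrative only: the exact set of required training_params is broader.
training_params = {
    "initial_lr": 1e-4,
    "multiply_head_lr": 10,  # heads train with 10x the backbone lr
    # ... remaining training hyper-parameters ...
}
```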
SegFormerB0
Bases: SegFormerCustom
Source code in src/super_gradients/training/models/segmentation_models/segformer.py
__init__(arch_params)
SegFormer B0 architecture
Parameters:
Name | Type | Description | Default |
---|---|---|---|
arch_params | HpmStruct | architecture parameters | required |
Source code in src/super_gradients/training/models/segmentation_models/segformer.py
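A hedged usage sketch via the model zoo; the "segformer_b0" key is an assumption here, so check super_gradients.common.object_names.Models for the exact name:

```python
from super_gradients.training import models

# Assumed model-zoo key; verify against super_gradients.common.object_names.Models.
model = models.get("segformer_b0", num_classes=19)
```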
SegFormerB1
Bases: SegFormerCustom
Source code in src/super_gradients/training/models/segmentation_models/segformer.py
__init__(arch_params)
SegFormer B1 architecture
Parameters:
Name | Type | Description | Default |
---|---|---|---|
arch_params | HpmStruct | architecture parameters | required |
Source code in src/super_gradients/training/models/segmentation_models/segformer.py
SegFormerB2
Bases: SegFormerCustom
Source code in src/super_gradients/training/models/segmentation_models/segformer.py
__init__(arch_params)
SegFormer B2 architecture
Parameters:
Name | Type | Description | Default |
---|---|---|---|
arch_params | HpmStruct | architecture parameters | required |
Source code in src/super_gradients/training/models/segmentation_models/segformer.py
SegFormerB3
Bases: SegFormerCustom
Source code in src/super_gradients/training/models/segmentation_models/segformer.py
__init__(arch_params)
SegFormer B3 architecture
Parameters:
Name | Type | Description | Default |
---|---|---|---|
arch_params | HpmStruct | architecture parameters | required |
Source code in src/super_gradients/training/models/segmentation_models/segformer.py
SegFormerB4
Bases: SegFormerCustom
Source code in src/super_gradients/training/models/segmentation_models/segformer.py
__init__(arch_params)
SegFormer B4 architecture
Parameters:
Name | Type | Description | Default |
---|---|---|---|
arch_params | HpmStruct | architecture parameters | required |
Source code in src/super_gradients/training/models/segmentation_models/segformer.py
SegFormerB5
Bases: SegFormerCustom
Source code in src/super_gradients/training/models/segmentation_models/segformer.py
__init__(arch_params)
SegFormer B5 architecture
Parameters:
Name | Type | Description | Default |
---|---|---|---|
arch_params | HpmStruct | architecture parameters | required |
Source code in src/super_gradients/training/models/segmentation_models/segformer.py
SegFormerCustom
Bases: SegFormer
Source code in src/super_gradients/training/models/segmentation_models/segformer.py
__init__(arch_params)
Parse arch_params and translate the parameters to build the SegFormer architecture
Parameters:
Name | Type | Description | Default |
---|---|---|---|
arch_params | HpmStruct | architecture parameters | required |
Source code in src/super_gradients/training/models/segmentation_models/segformer.py
SegFormerHead
Bases: nn.Module
Source code in src/super_gradients/training/models/segmentation_models/segformer.py
__init__(encoder_dims, embed_dim, num_classes)
SegFormer decoder head (https://arxiv.org/pdf/2105.15203.pdf)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
encoder_dims | List[int] | list of encoder embedding dimensions | required |
embed_dim | int | unified embedding dimension | required |
num_classes | int | number of predicted classes | required |
Source code in src/super_gradients/training/models/segmentation_models/segformer.py
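A minimal sketch of the all-MLP decoder idea from the paper: project each encoder stage to a unified width, upsample everything to a common resolution, concatenate, fuse, and classify per pixel. The class name and exact layer choices below are illustrative, not the library's implementation:

```python
from typing import List

import torch
import torch.nn as nn
import torch.nn.functional as F


class SegFormerHeadSketch(nn.Module):
    """Illustrative sketch of the SegFormer all-MLP decoder head."""

    def __init__(self, encoder_dims: List[int], embed_dim: int, num_classes: int):
        super().__init__()
        # Per-stage 1x1 projections act as per-pixel linear (MLP) layers.
        self.proj = nn.ModuleList([nn.Conv2d(dim, embed_dim, kernel_size=1) for dim in encoder_dims])
        self.fuse = nn.Conv2d(embed_dim * len(encoder_dims), embed_dim, kernel_size=1)
        self.classify = nn.Conv2d(embed_dim, num_classes, kernel_size=1)

    def forward(self, features: List[torch.Tensor]) -> torch.Tensor:
        # Features are ordered from lowest to highest stride; upsample all to
        # the resolution of the first (highest-resolution) stage.
        target_hw = features[0].shape[2:]
        outs = [
            F.interpolate(p(f), size=target_hw, mode="bilinear", align_corners=False)
            for p, f in zip(self.proj, features)
        ]
        return self.classify(self.fuse(torch.cat(outs, dim=1)))
```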
SegmentationModule
Bases: SgModule, ABC, HasPredict, SupportsInputShapeCheck, ExportableSegmentationModel
Base SegmentationModule class
Source code in src/super_gradients/training/models/segmentation_models/segmentation_module.py
backbone: nn.Module (abstract property)
For SgTrainer load_backbone compatibility.
get_input_shape_steps()
Returns the step size by which valid input shapes may grow. For segmentation models the default is 32x32, which corresponds to the largest stride in the encoder part of the model
Source code in src/super_gradients/training/models/segmentation_models/segmentation_module.py
get_minimum_input_shape_size()
Returns the minimum input shape size that the model can accept. For segmentation models the default is 32x32, which corresponds to the largest stride in the encoder part of the model
Source code in src/super_gradients/training/models/segmentation_models/segmentation_module.py
predict(images, batch_size=32, fuse_model=True, fp16=True)
Predict an image or a list of images.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
images | ImageSource | Images to predict. | required |
batch_size | int | Maximum number of images to process at the same time. | 32 |
fuse_model | bool | If True, create a copy of the model, and fuse some of its layers to increase performance. This increases memory usage. | True |
fp16 | bool | If True, use mixed precision for inference. | True |
Source code in src/super_gradients/training/models/segmentation_models/segmentation_module.py
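A hedged usage sketch of the high-level predict API; the .show() and .save() helpers on the returned predictions object are assumed from the library's usual predict workflow:

```python
# Assumes `model` is a loaded segmentation model with processing params set.
predictions = model.predict("path/to/image.jpg", batch_size=16, fp16=True)
predictions.show()                      # visualize the predicted masks
predictions.save(output_folder="out/")  # assumed signature; or write to disk
```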
predict_webcam(fuse_model=True, fp16=True)
Predict using webcam.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
fuse_model | bool | If True, create a copy of the model, and fuse some of its layers to increase performance. This increases memory usage. | True |
fp16 | bool | If True, use mixed precision for inference. | True |
Source code in src/super_gradients/training/models/segmentation_models/segmentation_module.py
set_dataset_processing_params(class_names=None, image_processor=None)
Set the processing parameters for the dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
class_names | Optional[List[str]] | (Optional) Names of the dataset the model was trained on. | None |
image_processor | Optional[Processing] | (Optional) Image processing objects to reproduce the dataset preprocessing used for training. | None |
Source code in src/super_gradients/training/models/segmentation_models/segmentation_module.py
Shelfnet
Paper: https://arxiv.org/abs/1811.11254. Based on: https://github.com/juntang-zhuang/ShelfNet
DecoderHW
Bases: DecoderBase
DecoderHW - The Decoder for the Heavy-Weight ShelfNet Architecture
Source code in src/super_gradients/training/models/segmentation_models/shelfnet.py
DecoderLW
Bases: DecoderBase
DecoderLW - The Decoder for the Light-Weight ShelfNet Architecture
Source code in src/super_gradients/training/models/segmentation_models/shelfnet.py
LadderBlockHW
Bases: LadderBlockBase
LadderBlockHW - LadderBlock for the Heavy-Weight ShelfNet Architecture
Source code in src/super_gradients/training/models/segmentation_models/shelfnet.py
LadderBlockLW
Bases: LadderBlockBase
LadderBlockLW - LadderBlock for the Light-Weight ShelfNet Architecture
Source code in src/super_gradients/training/models/segmentation_models/shelfnet.py
ShelfBlock
Bases: nn.Module
Source code in src/super_gradients/training/models/segmentation_models/shelfnet.py
__init__(in_planes, planes, stride=1, dropout=0.25)
S-Block implementation from the ShelfNet paper
:param in_planes: input planes
:param planes: output planes
:param stride: convolution stride
:param dropout: dropout percentage
Source code in src/super_gradients/training/models/segmentation_models/shelfnet.py
ShelfNetBase
Bases: ShelfNetModuleBase
ShelfNetBase - ShelfNet Base Generic Architecture
Source code in src/super_gradients/training/models/segmentation_models/shelfnet.py
update_param_groups(param_groups, lr, epoch, iter, training_params, total_batch)
update_optimizer_for_param_groups - updates specific parameter groups with different learning rates
Source code in src/super_gradients/training/models/segmentation_models/shelfnet.py
ShelfNetHW
Bases: ShelfNetBase
ShelfNetHW - Heavy-Weight Version of ShelfNet
Source code in src/super_gradients/training/models/segmentation_models/shelfnet.py
initialize_param_groups(lr, training_params)
initialize_optimizer_for_model_param_groups - Initializes the optimizer parameter groups, initializing the backbone, the output head and the auxiliary head differently.
:param optimizer_cls: the optimizer class to initialize
:param lr: lr to set for the optimizer
:param training_params:
:return: list of dictionaries with named params and optimizer attributes
Source code in src/super_gradients/training/models/segmentation_models/shelfnet.py
ShelfNetLW
Bases: ShelfNetBase
ShelfNetLW - Light-Weight Implementation for ShelfNet
Source code in src/super_gradients/training/models/segmentation_models/shelfnet.py
initialize_param_groups(lr, training_params)
initialize_optimizer_for_model_param_groups - Initializes the optimizer parameter groups, with a 10x learning rate for all parts but the backbone.
:param lr: lr to set for the backbone
:param training_params:
:return: list of dictionaries with named params and optimizer attributes
Source code in src/super_gradients/training/models/segmentation_models/shelfnet.py
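A minimal sketch of the grouping described above, under the "named_params" group structure documented for SgModule.initialize_param_groups; this is an assumed reconstruction, not the library's implementation:

```python
def sketch_param_groups(model, lr):
    """Illustrative: base lr for the backbone, 10x lr for everything else."""
    backbone_named = list(model.backbone.named_parameters())
    backbone_ids = {id(p) for _, p in backbone_named}
    rest_named = [(n, p) for n, p in model.named_parameters() if id(p) not in backbone_ids]
    return [
        {"named_params": backbone_named, "lr": lr},   # backbone: base lr
        {"named_params": rest_named, "lr": lr * 10},  # decoder/heads: 10x lr
    ]
```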
ShelfNetModuleBase
Bases: SgModule
ShelfNetModuleBase - Base class for the different Modules of the ShelfNet Architecture
Source code in src/super_gradients/training/models/segmentation_models/shelfnet.py
ShelfResNetBackBone
Bases: ResNet
ShelfResNetBackBone - a class that inherits from the original ResNet class and manipulates the forward pass to create a backbone for the ShelfNet architecture
Source code in src/super_gradients/training/models/segmentation_models/shelfnet.py
Implementation of paper: "Rethinking BiSeNet For Real-time Semantic Segmentation", https://arxiv.org/abs/2104.13188 Based on original implementation: https://github.com/MichaelFan01/STDC-Seg, cloned 23/08/2021, commit 59ff37f
AbstractSTDCBackbone
Bases: nn.Module, SupportsReplaceInputChannels, ABC
All backbones for STDC segmentation models must implement this class.
Source code in src/super_gradients/training/models/segmentation_models/stdc.py
get_backbone_output_number_of_channels()
abstractmethod
Returns:
Type | Description |
---|---|
List[int] | list of stages num channels. |
Source code in src/super_gradients/training/models/segmentation_models/stdc.py
AttentionRefinementModule
Bases: nn.Module
AttentionRefinementModule to apply on the last two backbone stages.
Source code in src/super_gradients/training/models/segmentation_models/stdc.py
ContextEmbedding
Bases: nn.Module
ContextEmbedding module that uses global average pooling down to 1x1 to extract context information, and then upsamples back to the original input size.
Source code in src/super_gradients/training/models/segmentation_models/stdc.py
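A minimal sketch of that pool-embed-broadcast pattern; the class name and layer choices are illustrative assumptions, not the library's code:

```python
import torch.nn as nn
import torch.nn.functional as F


class ContextEmbeddingSketch(nn.Module):
    """Illustrative: squeeze to 1x1 with global average pooling, embed the
    pooled context, then broadcast it back to the input resolution."""

    def __init__(self, channels: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.embed = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        ctx = self.embed(self.pool(x))  # (B, C, 1, 1) global context vector
        return F.interpolate(ctx, size=x.shape[2:], mode="nearest")
```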
ContextPath
Bases: nn.Module
ContextPath in STDC outputs both the spatial path and the context path. This module includes an STDCBackbone and outputs the stage3 feature map with down_ratio = 8 as the spatial feature map, and a context feature map which is the result of upsampling and fusing the context embedding with stage5 and stage4 (after the ARM modules), at the same resolution as the spatial feature map (down_ratio = 8).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
backbone | AbstractSTDCBackbone | backbone of type AbstractSTDCBackbone that returns info about the backbone output channels. | required |
fuse_channels | int | num channels of the fused context path. | required |
use_aux_heads | bool | set True when training to output extra auxiliary feature maps from the two last stages of the backbone. | required |
Source code in src/super_gradients/training/models/segmentation_models/stdc.py
CustomSTDCSegmentation
Bases: STDCSegmentationBase
Fully customized STDC Segmentation factory module.
Source code in src/super_gradients/training/models/segmentation_models/stdc.py
FeatureFusionModule
Bases: nn.Module
Fuse features from the higher-resolution spatial feature map with features from the lower-resolution, semantically rich context feature map.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
spatial_channels | int | num channels of input from the spatial path. | required |
context_channels | int | num channels of input from the context path. | required |
out_channels | int | num channels of the feature fusion module. | required |
Source code in src/super_gradients/training/models/segmentation_models/stdc.py
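A sketch of the BiSeNet-style fusion this module follows: concatenate the two paths, project, then re-weight channels with a squeeze-and-excitation style attention branch. Layer choices here are assumed from the BiSeNet paper, not taken from the library source:

```python
import torch
import torch.nn as nn


class FeatureFusionSketch(nn.Module):
    """Illustrative sketch of BiSeNet-style feature fusion."""

    def __init__(self, spatial_channels: int, context_channels: int, out_channels: int):
        super().__init__()
        self.project = nn.Sequential(
            nn.Conv2d(spatial_channels + context_channels, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )
        self.attention = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                        # squeeze to 1x1
            nn.Conv2d(out_channels, out_channels, kernel_size=1),
            nn.Sigmoid(),                                   # per-channel weights
        )

    def forward(self, spatial, context):
        feat = self.project(torch.cat([spatial, context], dim=1))
        return feat + feat * self.attention(feat)           # re-weighted residual
```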
STDCBackbone
Bases: AbstractSTDCBackbone
Source code in src/super_gradients/training/models/segmentation_models/stdc.py
__init__(block_types, ch_widths, num_blocks, stdc_steps=4, stdc_downsample_mode='avg_pool', in_channels=3, out_down_ratios=(32))
Parameters:
Name | Type | Description | Default |
---|---|---|---|
block_types | list | list of block types for each stage, supported | required |
ch_widths | list | list of output num of channels for each stage. | required |
num_blocks | list | list of the number of repeating blocks in each stage. | required |
stdc_steps | int | num of conv steps in each block. | 4 |
stdc_downsample_mode | str | downsample mode in the stdc block, supported | 'avg_pool' |
in_channels | int | num channels of the input image. | 3 |
out_down_ratios | Union[tuple, list] | down ratios of the output feature maps required from the backbone, default (32,) for classification. | (32) |
Source code in src/super_gradients/training/models/segmentation_models/stdc.py
STDCBlock
Bases: nn.Module
STDC building block, known as the Short Term Dense Concatenate module. In the STDC module, the kernel size of the first block is 1, and the rest are simply set to 3.
Source code in src/super_gradients/training/models/segmentation_models/stdc.py
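A minimal sketch of the Short Term Dense Concatenate idea: a 1x1 conv followed by 3x3 convs whose widths halve at each step, with all intermediate outputs concatenated into the block output. This is an illustrative reconstruction from the paper, not the library's implementation (it omits stride/downsample handling):

```python
import torch
import torch.nn as nn


class STDCBlockSketch(nn.Module):
    """Illustrative STDC block: widths halve per step; concatenated outputs
    sum back to out_channels (e.g. steps=4 -> 1/2 + 1/4 + 1/8 + 1/8)."""

    def __init__(self, in_channels: int, out_channels: int, steps: int = 4):
        super().__init__()
        widths = [out_channels // (2 ** (i + 1)) for i in range(steps - 1)]
        widths.append(out_channels // (2 ** (steps - 1)))  # last step repeats the smallest width
        self.convs = nn.ModuleList()
        prev = in_channels
        for i, w in enumerate(widths):
            k = 1 if i == 0 else 3  # first conv is 1x1, the rest are 3x3
            self.convs.append(nn.Sequential(
                nn.Conv2d(prev, w, kernel_size=k, padding=k // 2, bias=False),
                nn.BatchNorm2d(w),
                nn.ReLU(inplace=True),
            ))
            prev = w

    def forward(self, x):
        outs = []
        for conv in self.convs:
            x = conv(x)
            outs.append(x)
        return torch.cat(outs, dim=1)  # channels sum back to out_channels
```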
__init__(in_channels, out_channels, steps, stdc_downsample_mode, stride)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
steps | int | the total number of convs in this module: one conv1x1 and (steps - 1) conv3x3. | required |
stdc_downsample_mode | str | downsample mode in the stdc block, supported | required |
Source code in src/super_gradients/training/models/segmentation_models/stdc.py
STDCClassificationBase
Bases: SgModule
Base module for classification models based on STDC backbones
Source code in src/super_gradients/training/models/segmentation_models/stdc.py
STDCSegmentationBase
Bases: SgModule, HasPredict, SupportsInputShapeCheck, ExportableSegmentationModel
Base STDC Segmentation Module.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
backbone | AbstractSTDCBackbone | backbone of type AbstractSTDCBackbone that returns info about the backbone output channels. | required |
num_classes | int | num of dataset classes, excluding the ignore label. | required |
context_fuse_channels | int | num of output channels in the ContextPath ARM feature fusion. | required |
ffm_channels | int | num of output channels of the Feature Fusion Module. | required |
aux_head_channels | int | num of hidden channels in the auxiliary segmentation heads. | required |
detail_head_channels | int | num of hidden channels in the detail segmentation heads. | required |
use_aux_heads | bool | set True when training to attach the auxiliary and detail heads. For compilation / inference mode set False. | required |
dropout | float | segmentation heads dropout. | required |
Source code in src/super_gradients/training/models/segmentation_models/stdc.py
backbone (property)
For Trainer load_backbone compatibility.
initialize_param_groups(lr, training_params)
Custom param groups for STDC training:
- Different lr for the context path and the heads, if the multiply_head_lr key is in training_params.
- Add extra detail loss params to the optimizer.
Source code in src/super_gradients/training/models/segmentation_models/stdc.py
predict(images, batch_size=32, fuse_model=True, fp16=True)
Predict an image or a list of images.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
images | ImageSource | Images to predict. | required |
batch_size | int | Maximum number of images to process at the same time. | 32 |
fuse_model | bool | If True, create a copy of the model, and fuse some of its layers to increase performance. This increases memory usage. | True |
fp16 | bool | If True, use mixed precision for inference. | True |
Source code in src/super_gradients/training/models/segmentation_models/stdc.py
predict_webcam(fuse_model=True, fp16=True)
Predict using webcam.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
fuse_model | bool | If True, create a copy of the model, and fuse some of its layers to increase performance. This increases memory usage. | True |
fp16 | bool | If True, use mixed precision for inference. | True |
Source code in src/super_gradients/training/models/segmentation_models/stdc.py
prep_model_for_conversion(input_size=None, **kwargs)
Prepare the model for conversion: force use_aux_heads mode to False and delete the auxiliary and detail heads. Replace ContextEmbeddingOnline, which causes compilation issues and is not supported in some compilation targets, with ContextEmbeddingFixedSize.
Source code in src/super_gradients/training/models/segmentation_models/stdc.py
set_dataset_processing_params(class_names=None, image_processor=None)
Set the processing parameters for the dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
class_names | Optional[List[str]] | (Optional) Names of the dataset the model was trained on. | None |
image_processor | Optional[Processing] | (Optional) Image processing objects to reproduce the dataset preprocessing used for training. | None |
Source code in src/super_gradients/training/models/segmentation_models/stdc.py
UNet
Bases: UNetCustom
Implementation of "U-Net: Convolutional Networks for Biomedical Image Segmentation", https://arxiv.org/pdf/1505.04597.pdf. The upsample operation is done using bilinear interpolation, which is reported to show better results.
Source code in src/super_gradients/training/models/segmentation_models/unet/unet.py
UNetBase
Bases: SegmentationModule
Source code in src/super_gradients/training/models/segmentation_models/unet/unet.py
__init__(num_classes, use_aux_heads, final_upsample_factor, head_hidden_channels, head_upsample_mode, align_corners, backbone_params, context_module, decoder_params, aux_heads_params, dropout)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
num_classes | int | num classes to predict. | required |
use_aux_heads | bool | whether to use auxiliary heads. | required |
final_upsample_factor | int | final upsample scale factor after the segmentation head. | required |
head_hidden_channels | Optional[int] | num channels before the last classification layer. | required |
head_upsample_mode | Union[UpsampleMode, str] | UpsampleMode of segmentation and auxiliary heads. | required |
align_corners | bool | align_corners arg of segmentation and auxiliary heads. | required |
backbone_params | dict | params to build the backbone. | required |
decoder_params | dict | params to build the decoder. | required |
aux_heads_params | dict | params to initiate auxiliary heads, including the following keys: use_aux_list (List[bool], whether to append an auxiliary head per encoder stage), aux_heads_factor (List[int], upsample factor per encoder stage), aux_hidden_channels (List[int], hidden num channels before the last classification layer, per encoder stage), aux_out_channels (List[int], output channels, which can be referred to as num_classes, of the auxiliary head per encoder stage). | required |
dropout | float | dropout probability of segmentation and auxiliary heads. | required |
Source code in src/super_gradients/training/models/segmentation_models/unet/unet.py
init_aux_heads(in_channels_list, use_aux_list, aux_heads_factor, aux_hidden_channels, aux_out_channels, dropout, upsample_mode, align_corners=None)
staticmethod
Parameters:
Name | Type | Description | Default |
---|---|---|---|
use_aux_list | List[bool] | whether to append an auxiliary head per encoder stage. | required |
in_channels_list | List[int] | list of input channels to the auxiliary segmentation heads. | required |
aux_heads_factor | List[int] | list of upsample scale factors to apply at the end of the auxiliary segmentation heads. | required |
aux_hidden_channels | List[int] | list of segmentation head hidden channels. | required |
aux_out_channels | List[int] | list of segmentation head out channels, usually set as num_classes or 1 for detail edge heads. | required |
dropout | float | dropout probability factor. | required |
upsample_mode | Union[str, UpsampleMode] | see UpsampleMode for supported options. | required |
Returns:
Type | Description |
---|---|
nn.ModuleList |
Source code in src/super_gradients/training/models/segmentation_models/unet/unet.py
initialize_param_groups(lr, training_params)
Custom param groups for training: a different lr is used for the head and the rest of the network when the multiply_head_lr key is present in training_params.
Source code in src/super_gradients/training/models/segmentation_models/unet/unet.py
AbstractUpFuseBlock
Bases: nn.Module, ABC
Abstract class for upsample and fuse UNet decoder building block.
Source code in src/super_gradients/training/models/segmentation_models/unet/unet_decoder.py
__init__(in_channels, skip_channels, out_channels, **kwargs)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in_channels | int | num_channels of the feature map to be upsampled. | required |
skip_channels | int | num_channels of the skip feature map from a higher resolution. | required |
out_channels | int | num_channels of the output features. | required |
Source code in src/super_gradients/training/models/segmentation_models/unet/unet_decoder.py
validate_upsample_mode(in_channels, up_factor, upsample_mode, fallback_mode=None)
staticmethod
Validate whether the upsample_mode is supported, and returns the upsample path output channels.
Returns:
Type | Description |
---|---|
Tuple[Union[UpsampleMode, str], int] | tuple of upsample_mode and out_channels of the upsample module |
Source code in src/super_gradients/training/models/segmentation_models/unet/unet_decoder.py
Decoder
Bases: nn.Module
Source code in src/super_gradients/training/models/segmentation_models/unet/unet_decoder.py
__init__(skip_channels_list, up_block_repeat_list, skip_expansion, decoder_scale, up_block_types, is_skip_list, min_decoder_channels=1, **up_block_kwargs)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
skip_channels_list | List[int] | num_channels list of skip feature maps from the encoder. | required |
up_block_repeat_list | List[int] |  | required |
skip_expansion | float | skip expansion ratio value; before fusing the skip features from the encoder with the decoder features, a projection convolution is applied to the encoder features to scale the num_channels by skip_expansion, as follows: num_channels = skip_channels * skip_expansion. | required |
decoder_scale | float | num_channels width ratio between encoder stages and decoder stages. | required |
min_decoder_channels | int | the minimum num_channels of decoder stages. Useful, e.g., if we want to keep the width above the num of classes. | 1 |
up_block_types | List[Type[AbstractUpFuseBlock]] | list of AbstractUpFuseBlock. | required |
is_skip_list | List[bool] | list of flags for whether to use the feature map from each encoder stage as a skip connection. Used to skip projection convolutions when a certain encoder feature is not aggregated with the decoder. | required |
up_block_kwargs |  | init parameters for fuse blocks. | {} |
Source code in src/super_gradients/training/models/segmentation_models/unet/unet_decoder.py
UpCatBlock
Bases: AbstractUpFuseBlock
Fuse features with concatenation followed by convolutions.
Source code in src/super_gradients/training/models/segmentation_models/unet/unet_decoder.py
UpFactorBlock
Bases: AbstractUpFuseBlock
Ignore skip features; simply apply upsampling and ConvBNRelu layers.
Source code in src/super_gradients/training/models/segmentation_models/unet/unet_decoder.py
UpSumBlock
Bases: AbstractUpFuseBlock
Fuse features with summation (element-wise addition) followed by convolutions.
Source code in src/super_gradients/training/models/segmentation_models/unet/unet_decoder.py
AbstractUNetBackbone
Bases: nn.Module, ABC
All backbones for UNet segmentation models must implement this class.
Source code in src/super_gradients/training/models/segmentation_models/unet/unet_encoder.py
forward(x)
Returns:
Type | Description |
---|---|
List[torch.Tensor] | list of skip features from different resolutions to be fused by the decoder. |
Source code in src/super_gradients/training/models/segmentation_models/unet/unet_encoder.py
get_all_number_of_channels()
abstractmethod
Returns:
Type | Description |
---|---|
List[int] | list of stages num channels. |
Source code in src/super_gradients/training/models/segmentation_models/unet/unet_encoder.py
get_backbone_output_number_of_channels()
abstractmethod
Returns:
Type | Description |
---|---|
List[int] | list of stages num channels. |
Source code in src/super_gradients/training/models/segmentation_models/unet/unet_encoder.py
BackboneStage
Bases: nn.Module, ABC
BackboneStage abstract class to define a stage in UnetBackbone. Each stage includes blocks whose number is defined by num_blocks.
Source code in src/super_gradients/training/models/segmentation_models/unet/unet_encoder.py
ConvBaseStage
Bases: BackboneStage, ABC
Base single conv block implementation for Conv, QARepVGG, and RepVGG stages. Optionally supports different downsample strategies: anti_alias with the AntiAliasDownsample module and max_pool with the nn.MaxPool2d module.
Source code in src/super_gradients/training/models/segmentation_models/unet/unet_encoder.py
ConvStage
Bases: ConvBaseStage
Conv stage with ConvBNReLU as building block.
Source code in src/super_gradients/training/models/segmentation_models/unet/unet_encoder.py
Encoder
Bases: nn.Module
Source code in src/super_gradients/training/models/segmentation_models/unet/unet_encoder.py
get_output_number_of_channels()
Return the list of encoder output channels: the backbone output channels, or the context module output channels in case the context module returns a different number of channels.
Source code in src/super_gradients/training/models/segmentation_models/unet/unet_encoder.py
QARepVGGStage
Bases: ConvBaseStage
QARepVGG stage with QARepVGGBlock as building block.
Source code in src/super_gradients/training/models/segmentation_models/unet/unet_encoder.py
RegnetXStage
Bases: BackboneStage
RegNetX stage with XBlock as building block.
Source code in src/super_gradients/training/models/segmentation_models/unet/unet_encoder.py
RepVGGStage
Bases: ConvBaseStage
RepVGG stage with RepVGGBlock as building block.
Source code in src/super_gradients/training/models/segmentation_models/unet/unet_encoder.py
STDCStage
Bases: BackboneStage
STDC stage with STDCBlock as building block.
Source code in src/super_gradients/training/models/segmentation_models/unet/unet_encoder.py
assert_divisible_channels(num_channels, steps)
staticmethod
The STDC block refactors the convolution operator by applying several smaller convolutions whose number of filters decreases with the number of steps. The ratio between the stage width and the smallest number of channels is 2 ** (steps - 1), so this method asserts that the stage number of channels is divisible by that ratio.
Source code in src/super_gradients/training/models/segmentation_models/unet/unet_encoder.py
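A worked example of that divisibility constraint:

```python
# For steps=4, the smallest branch has out_channels / 2**(steps - 1) channels.
steps = 4
ratio = 2 ** (steps - 1)   # = 8, so the stage width must be divisible by 8
assert 256 % ratio == 0    # 256 is a valid stage width for steps=4
assert 100 % ratio != 0    # 100 would be rejected by the assertion
```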
build_stage(in_channels, out_channels, stride, num_blocks, steps, stdc_downsample_mode, **kwargs)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
steps | int | the total number of convs in this module: one conv1x1 and (steps - 1) conv3x3. | required |
stdc_downsample_mode | str | downsample mode in the stdc block, supported | required |
Source code in src/super_gradients/training/models/segmentation_models/unet/unet_encoder.py
SgModule
Bases: nn.Module, SupportsReplaceInputChannels, SupportsFineTune
Source code in src/super_gradients/training/models/sg_module.py
get_exclude_attributes()
This function is used by the EMA. When updating the EMA model, some attributes of the main model (used in training) are updated on the EMA model along with the model weights. By default, all attributes are updated except for private attributes (starting with '_'). You can set either include_attributes or exclude_attributes. By returning a non-empty list from this function, you override the default behaviour, and the attributes named in this list will also be excluded from the update. Note: if get_include_attributes is not empty, it will override this list.
:return: list of attributes not to update from the main model to the EMA model
Source code in src/super_gradients/training/models/sg_module.py
get_finetune_lr_dict(lr)
Returns a dictionary mapping lr to the unfrozen part of the network, in the same fashion as using initial_lr in training_params when calling Trainer.train(). See the example sketch below.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
lr | float | learning rate for the part of the network to be tuned. | required |
Returns:
Type | Description |
---|---|
Dict[str, float] | learning rate mapping that can be used by super_gradients.training.utils.optimizer_utils.initialize_param_groups |
Source code in src/super_gradients/training/models/sg_module.py
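The docstring example above, shown as a standalone override sketch; the "head" group name assumes the model exposes a module named head:

```python
from typing import Dict


def get_finetune_lr_dict(self, lr: float) -> Dict[str, float]:
    # Freeze everything by default; tune only the head at the given lr.
    return {"default": 0, "head": lr}
```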
get_include_attributes()
This function is used by the EMA. When updating the EMA model, some attributes of the main model (used in training) are updated on the EMA model along with the model weights. By default, all attributes are updated except for private attributes (starting with '_'). You can set either include_attributes or exclude_attributes. By returning a non-empty list from this function, you override the default behaviour, and only the attributes named in this list will be updated. Note: this will also override the get_exclude_attributes list.
:return: list of attributes to update from the main model to the EMA model
Source code in src/super_gradients/training/models/sg_module.py
initialize_param_groups(lr, training_params)
Returns:
Type | Description |
---|---|
list | list of dictionaries containing the key 'named_params' with a list of named params |
Source code in src/super_gradients/training/models/sg_module.py
prep_model_for_conversion(input_size=None, **kwargs)
Prepare the model to be converted to ONNX or other frameworks. Typically, this function will freeze the size of layers which is otherwise flexible, replace some modules with convertible substitutes and remove all auxiliary or training related parts.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_size | Union[tuple, list] | [H, W] | None |
Source code in src/super_gradients/training/models/sg_module.py
replace_head(**kwargs)
Replace final layer for pretrained models. Since this varies between architectures, we leave it to the inheriting class to implement.
Source code in src/super_gradients/training/models/sg_module.py
update_param_groups(param_groups, lr, epoch, iter, training_params, total_batch)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
param_groups | list | list of dictionaries containing the params | required |
Returns:
Type | Description |
---|---|
list | list of dictionaries containing the params |
Source code in src/super_gradients/training/models/sg_module.py
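A hedged sketch of a custom override, using the documented signature; the epoch-based decay policy is illustrative, not the library default:

```python
def update_param_groups(self, param_groups, lr, epoch, iter, training_params, total_batch):
    # Illustrative policy: decay every group's lr by 10x from epoch 30 onward.
    # `iter` is the documented parameter name in this signature.
    for group in param_groups:
        group["lr"] = lr * (0.1 if epoch >= 30 else 1.0)
    return param_groups
```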