SG Loggers
AbstractSGLogger
Bases: ABC
A SGLogger handles all outputs of the training process. Every generated file, log, metric value, image, or other artifact produced by the trainer will be processed and saved.
Inheriting from SGLogger makes it possible to integrate an experiment management framework, a special storage setting, a specific logging library, etc.
Important: The BaseSGLogger class (inheriting from SGLogger) is used by the trainer by default. When defining your own SGLogger you will override all default output functionality: no files will be saved to disk and no data will be collected. Make sure you either implement this functionality yourself or use SGLoggers.Compose([BaseSGLogger(...), YourSGLogger(...)]) to build on top of it.
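For illustration, below is a minimal sketch (not an official example) of a custom logger that keeps scalars in memory and ignores everything else. The class name InMemorySGLogger is hypothetical; the method names and signatures follow the abstract methods documented on this page, and the import path is the one shown in the source references. As noted above, such a standalone logger disables the default file outputs unless it is composed with BaseSGLogger.

```python
# Hedged sketch of a custom SGLogger; every abstract method listed below must
# be implemented for the class to be instantiable, most are no-ops here.
from super_gradients.common.sg_loggers.abstract_sg_logger import AbstractSGLogger


class InMemorySGLogger(AbstractSGLogger):
    def __init__(self):
        self.scalars = {}  # tag -> list of (global_step, value)

    def add(self, tag, obj, global_step=None):
        pass

    def add_config(self, tag, config):
        pass

    def add_scalar(self, tag, scalar_value, global_step=None):
        # Keep scalars in memory instead of writing them anywhere.
        self.scalars.setdefault(tag, []).append((global_step, scalar_value))

    def add_scalars(self, tag_scalar_dict, global_step=None):
        for tag, value in tag_scalar_dict.items():
            self.add_scalar(tag, value, global_step)

    def add_image(self, tag, image, data_format="CHW", global_step=None):
        pass

    def add_images(self, tag, images, data_format="NCHW", global_step=None):
        pass

    def add_histogram(self, tag, values, bins="auto", global_step=None):
        pass

    def add_text(self, tag, text_string, global_step=None):
        pass

    def add_checkpoint(self, tag, state_dict, global_step=None):
        pass

    def add_file(self, file_name=None):
        pass

    def upload(self):
        pass

    def flush(self):
        pass

    def close(self):
        pass

    def local_dir(self) -> str:
        return ""
```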
Source code in V3_2/src/super_gradients/common/sg_loggers/abstract_sg_logger.py
add(tag, obj, global_step=None)
abstractmethod
A generic function for adding any type of data to the SGLogger. By default, this function is not called by the Trainer, and BaseSGLogger does nothing with this type of data. However, if you need to pass a data type that is not supported by any of the following abstract methods, use this method.
Source code in V3_2/src/super_gradients/common/sg_loggers/abstract_sg_logger.py
add_checkpoint(tag, state_dict, global_step=None)
abstractmethod
Add a checkpoint to the SGLogger. Typically, this function will write a torch file to disk, or upload it to remote storage or to an experiment management framework.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| tag | str | Data identifier | required |
| state_dict | dict | The state dict to save. The state dict includes more than just the model weights and may include any of: net (model weights), acc (current accuracy, depends on metrics), epoch (current epoch), optimizer_state_dict (optimizer state), scaler_state_dict (torch.amp scaler state) | required |
| global_step | int | Global step value to record | None |
Source code in V3_2/src/super_gradients/common/sg_loggers/abstract_sg_logger.py
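The interface does not prescribe any storage backend. As a hedged sketch, a helper that writes the checkpoint to disk the way an add_checkpoint implementation typically would could look like this (the helper name save_checkpoint_to_dir is hypothetical and not part of the library):

```python
import os

import torch


def save_checkpoint_to_dir(checkpoints_dir: str, tag: str, state_dict: dict, global_step: int = None) -> str:
    """Write a checkpoint the way an add_checkpoint implementation typically does."""
    # Fall back to the global step for the file name when no tag is given.
    name = tag if tag is not None else f"ckpt_{global_step}"
    if not name.endswith(".pth"):
        name += ".pth"
    path = os.path.join(checkpoints_dir, name)
    torch.save(state_dict, path)
    return path
```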
add_config(tag, config)
abstractmethod
Add the configuration (settings and hyperparameters) to the SGLogger. Typically, this function will add the configuration dictionary to logs, write it to tensorboard, send it to an experiment management framework, etc.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| tag | str | Data identifier | required |
| config | dict | A dictionary of the experiment config | required |
Source code in V3_2/src/super_gradients/common/sg_loggers/abstract_sg_logger.py
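A typical call passes the experiment hyperparameters as a plain dictionary. The keys below are illustrative, not required by the interface, and sg_logger is assumed to be an already-constructed concrete logger instance:

```python
# sg_logger: an already-constructed concrete SGLogger (e.g. BaseSGLogger)
sg_logger.add_config(
    tag="hyperparameters",
    config={
        "max_epochs": 100,
        "initial_lr": 0.01,
        "optimizer": "SGD",
        "batch_size": 64,
    },
)
```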
add_file(file_name=None)
abstractmethod
Add a file from the checkpoint directory to the logger (usually, uploads the file or adds it to an artifact).
Source code in V3_2/src/super_gradients/common/sg_loggers/abstract_sg_logger.py
add_histogram(tag, values, bins='auto', global_step=None)
abstractmethod
Add a histogram to the SGLogger. Typically, this function will add a histogram to tensorboard or to an experiment management framework.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| tag | str | Data identifier | required |
| values | Union[torch.Tensor, np.array] | Values to build the histogram from | required |
| bins | Union[str, np.array, list, int] | Determines how the bins are made. If bins is an int, it defines the number of equal-width bins in the given range. If bins is a sequence, it defines a monotonically increasing array of bin edges, including the rightmost edge, allowing for non-uniform bin widths. If bins is a string, it defines the method used to calculate the optimal bin width, as defined by https://numpy.org/doc/stable/reference/generated/numpy.histogram_bin_edges.html#numpy.histogram_bin_edges, one of ['sqrt', 'auto', 'fd', 'doane', 'scott', 'stone', ...] | 'auto' |
| global_step | int | Global step value to record | None |
Source code in V3_2/src/super_gradients/common/sg_loggers/abstract_sg_logger.py
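For example, assuming sg_logger is an already-constructed concrete logger instance, the distribution of a layer's weights could be logged once per epoch:

```python
import torch

# Stand-in for e.g. model.layer.weight.flatten(); bins="auto" lets numpy choose bin edges.
weights = torch.randn(1000)
# sg_logger: an already-constructed concrete SGLogger instance
sg_logger.add_histogram(tag="layer1/weights", values=weights, bins="auto", global_step=10)
```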
add_image(tag, image, data_format='CHW', global_step=None)
abstractmethod
Add a single image to the SGLogger. Typically, this function will add an image to tensorboard, save it to disk, or add it to an experiment management framework.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| tag | str | Data identifier | required |
| image | Union[torch.Tensor, np.array, Image.Image] | An image to be added. The values should lie in [0, 255] for type uint8 or [0, 1] for type float. | required |
| data_format | str | Image data format specification of the form CHW, HWC, HW, WH, etc. | 'CHW' |
| global_step | int | Global step value to record | None |
Source code in V3_2/src/super_gradients/common/sg_loggers/abstract_sg_logger.py
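For instance, assuming sg_logger is an already-constructed concrete logger instance, a float image in channels-first (CHW) layout with values in [0, 1] could be logged as follows (the tag name is illustrative):

```python
import numpy as np

# A random 3x224x224 float image in [0, 1], channels-first (CHW).
image = np.random.rand(3, 224, 224).astype(np.float32)
# sg_logger: an already-constructed concrete SGLogger instance
sg_logger.add_image(tag="val/sample_prediction", image=image, data_format="CHW", global_step=5)
```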
add_images(tag, images, data_format='NCHW', global_step=None)
abstractmethod
Add multiple images to the SGLogger. Typically, this function will add images to tensorboard, save them to disk, or add them to an experiment management framework.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| tag | str | Data identifier | required |
| images | Union[torch.Tensor, np.array] | Images to be added. The values should lie in [0, 255] for type uint8 or [0, 1] for type float. | required |
| data_format | str | Image data format specification of the form NCHW, NHWC, NHW, NWH, etc. | 'NCHW' |
| global_step | int | Global step value to record | None |
Source code in V3_2/src/super_gradients/common/sg_loggers/abstract_sg_logger.py
add_scalar(tag, scalar_value, global_step=None)
abstractmethod
Add scalar data to the SGLogger. Typically, this function will add a scalar to tensorboard or another experiment management framework.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| tag | str | Data identifier | required |
| scalar_value | float | Value to save | required |
| global_step | Union[int, TimeUnit] | Global step value to record | None |
Source code in V3_2/src/super_gradients/common/sg_loggers/abstract_sg_logger.py
add_scalars(tag_scalar_dict, global_step=None)
abstractmethod
Add multiple scalar values to the SGLogger. Typically, this function will add scalars to tensorboard or another experiment management framework.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| tag_scalar_dict | dict | A dictionary {tag (str): value (float)} of the scalars | required |
| global_step | int | Global step value to record | None |
Source code in V3_2/src/super_gradients/common/sg_loggers/abstract_sg_logger.py
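A hedged usage sketch, assuming sg_logger is an already-constructed concrete logger instance: single values go through add_scalar, while a group of values measured at the same step can be passed as a dict to add_scalars.

```python
# sg_logger: an already-constructed concrete SGLogger instance

# One scalar at a time...
sg_logger.add_scalar(tag="train/loss", scalar_value=0.314, global_step=100)

# ...or several scalars recorded at the same step.
sg_logger.add_scalars(
    tag_scalar_dict={"train/loss": 0.314, "train/accuracy": 0.87},
    global_step=100,
)
```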
add_text(tag, text_string, global_step=None)
abstractmethod
Add text to the SGLogger. Typically, this function will add the text to tensorboard or to an experiment management framework.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| tag | str | Data identifier | required |
| text_string | str | The text to be added | required |
| global_step | int | Global step value to record | None |
Source code in V3_2/src/super_gradients/common/sg_loggers/abstract_sg_logger.py
close()
abstractmethod
Close the SGLogger
Source code in V3_2/src/super_gradients/common/sg_loggers/abstract_sg_logger.py
flush()
abstractmethod
Flush the SGLogger's cache
Source code in V3_2/src/super_gradients/common/sg_loggers/abstract_sg_logger.py
local_dir()
abstractmethod
A getter for the full/absolute path where all files are saved locally
Returns:

| Type | Description |
|---|---|
| str | The full/absolute path where all files are saved locally |
Source code in V3_2/src/super_gradients/common/sg_loggers/abstract_sg_logger.py
upload()
abstractmethod
Upload any files which should be stored on remote storage
Source code in V3_2/src/super_gradients/common/sg_loggers/abstract_sg_logger.py
BaseSGLogger
Bases: AbstractSGLogger
Source code in V3_2/src/super_gradients/common/sg_loggers/base_sg_logger.py
__init__(project_name, experiment_name, storage_location, resumed, training_params, checkpoints_dir_path, tb_files_user_prompt=False, launch_tensorboard=False, tensorboard_port=None, save_checkpoints_remote=True, save_tensorboard_remote=True, save_logs_remote=True, monitor_system=True)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| experiment_name | str | Name used for logging and loading purposes | required |
| storage_location | str | If set to 's3' (i.e. s3://my-bucket), saves the checkpoints in AWS S3; otherwise saves the checkpoints locally | required |
| resumed | bool | If True, old tensorboard files will NOT be deleted when tb_files_user_prompt=True | required |
| training_params | TrainingParams | training_params for the experiment | required |
| checkpoints_dir_path | str | Local root directory path where all experiment logging directories will reside | required |
| tb_files_user_prompt | bool | Asks the user for a tensorboard-deletion prompt | False |
| launch_tensorboard | bool | Whether to launch a TensorBoard process | False |
| tensorboard_port | int | Specific port number for tensorboard to use when launched (when set to None, a free port number will be used) | None |
| save_checkpoints_remote | bool | Saves checkpoints in s3 | True |
| save_tensorboard_remote | bool | Saves tensorboard in s3 | True |
| save_logs_remote | bool | Saves log files in s3 | True |
| monitor_system | bool | Save the system statistics (GPU utilization, CPU, ...) in the tensorboard | True |
Source code in V3_2/src/super_gradients/common/sg_loggers/base_sg_logger.py
add_checkpoint(tag, state_dict, global_step=None)
Add checkpoint to experiment folder.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| tag | str | Identifier of the checkpoint. If None, global_step will be used to name the checkpoint. | required |
| state_dict | dict | Checkpoint state_dict | required |
| global_step | int | Epoch number | None |
Source code in V3_2/src/super_gradients/common/sg_loggers/base_sg_logger.py
add_figure(tag, figure, global_step=None)
Add a figure to the SGLogger. Typically, this function will add a figure to tensorboard or to an experiment management framework.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| tag | str | Data identifier | required |
| figure | plt.figure | The figure to add | required |
| global_step | int | Global step value to record | None |
Source code in V3_2/src/super_gradients/common/sg_loggers/base_sg_logger.py
add_images(tag, images, data_format='NCHW', global_step=None)
Add multiple images to the SGLogger. Typically, this function will add a set of images to tensorboard, save them to disk, or add them to an experiment management framework.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| tag | str | Data identifier | required |
| images | Union[torch.Tensor, np.array] | Images to be added. The values should lie in [0, 255] for type uint8 or [0, 1] for type float. | required |
| data_format | str | Image data format specification of the form NCHW, NHWC, CHW, HWC, HW, WH, etc. | 'NCHW' |
| global_step | int | Global step value to record | None |
Source code in V3_2/src/super_gradients/common/sg_loggers/base_sg_logger.py
add_model_graph(tag, model, dummy_input)
Add a pytorch model graph to the SGLogger. Only the model structure/architecture will be preserved and collected, NOT the model weights.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| tag | str | Data identifier | required |
| model | torch.nn.Module | The model to be added | required |
| dummy_input | torch.Tensor | An input to be used for a forward call on the model | required |
Source code in V3_2/src/super_gradients/common/sg_loggers/base_sg_logger.py
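For example, assuming sg_logger is an already-constructed BaseSGLogger instance, the graph of a small model can be recorded with a dummy input of the expected shape (model and shapes below are illustrative):

```python
import torch
from torch import nn

# A small illustrative model and a dummy input matching its expected shape.
model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10))
dummy_input = torch.zeros(1, 3, 224, 224)  # batch of one
# sg_logger: an already-constructed BaseSGLogger instance
sg_logger.add_model_graph(tag="model", model=model, dummy_input=dummy_input)
```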
add_scalars(tag_scalar_dict, global_step=None)
Add multiple scalars. Unlike the tensorboard implementation, this does not add all scalars under a single main tag (i.e., all scalars to the same chart). Instead, scalars are added to tensorboard individually, as in add_scalar, and are written to the log together.
Source code in V3_2/src/super_gradients/common/sg_loggers/base_sg_logger.py
add_video(tag, video, global_step=None)
Add a single video to SGLogger. Typically, this function will add a video to tensorboard, save it to disk or add it to experiment management framework.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| tag | str | Data identifier | required |
| video | Union[torch.Tensor, np.array] | The video to add, of shape (N,T,C,H,W) or (T,C,H,W). The values should lie in [0, 255] for type uint8 or [0, 1] for type float. | required |
| global_step | int | Global step value to record | None |
Source code in V3_2/src/super_gradients/common/sg_loggers/base_sg_logger.py
upload()
Upload the local tensorboard and log files to the remote system.
Source code in V3_2/src/super_gradients/common/sg_loggers/base_sg_logger.py
ClearMLSGLogger
Bases: BaseSGLogger
Source code in V3_2/src/super_gradients/common/sg_loggers/clearml_sg_logger.py
__init__(project_name, experiment_name, storage_location, resumed, training_params, checkpoints_dir_path, tb_files_user_prompt=False, launch_tensorboard=False, tensorboard_port=None, save_checkpoints_remote=True, save_tensorboard_remote=True, save_logs_remote=True, monitor_system=None)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| project_name | str | ClearML project name that can include many experiments | required |
| experiment_name | str | Name used for logging and loading purposes | required |
| storage_location | str | If set to 's3' (i.e. s3://my-bucket), saves the checkpoints in AWS S3; otherwise saves the checkpoints locally | required |
| resumed | bool | If True, old tensorboard files will NOT be deleted when tb_files_user_prompt=True | required |
| training_params | dict | training_params for the experiment | required |
| checkpoints_dir_path | str | Local root directory path where all experiment logging directories will reside | required |
| tb_files_user_prompt | bool | Asks the user for a tensorboard-deletion prompt | False |
| launch_tensorboard | bool | Whether to launch a TensorBoard process | False |
| tensorboard_port | int | Specific port number for tensorboard to use when launched (when set to None, a free port number will be used) | None |
| save_checkpoints_remote | bool | Saves checkpoints in s3 | True |
| save_tensorboard_remote | bool | Saves tensorboard in s3 | True |
| save_logs_remote | bool | Saves log files in s3 | True |
| monitor_system | bool | Not available for the ClearML logger. Save the system statistics (GPU utilization, CPU, ...) in the tensorboard | None |
Source code in V3_2/src/super_gradients/common/sg_loggers/clearml_sg_logger.py
DagsHubSGLogger
Bases: BaseSGLogger
Source code in V3_2/src/super_gradients/common/sg_loggers/dagshub_sg_logger.py
__init__(project_name, experiment_name, storage_location, resumed, training_params, checkpoints_dir_path, tb_files_user_prompt=False, launch_tensorboard=False, tensorboard_port=None, save_checkpoints_remote=True, save_tensorboard_remote=True, save_logs_remote=True, monitor_system=None, dagshub_repository=None, log_mlflow_only=False)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| experiment_name | str | Name used for logging and loading purposes | required |
| storage_location | str | If set to 's3' (i.e. s3://my-bucket), saves the checkpoints in AWS S3; otherwise saves the checkpoints locally | required |
| resumed | bool | If True, old tensorboard files will NOT be deleted when tb_files_user_prompt=True | required |
| training_params | dict | training_params for the experiment | required |
| checkpoints_dir_path | str | Local root directory path where all experiment logging directories will reside | required |
| tb_files_user_prompt | bool | Asks the user for a tensorboard-deletion prompt | False |
| launch_tensorboard | bool | Whether to launch a TensorBoard process | False |
| tensorboard_port | int | Specific port number for tensorboard to use when launched (when set to None, a free port number will be used) | None |
| save_checkpoints_remote | bool | Saves checkpoints in s3 and DagsHub | True |
| save_tensorboard_remote | bool | Saves tensorboard in s3 | True |
| save_logs_remote | bool | Saves log files in s3 and DagsHub | True |
| monitor_system | bool | Save the system statistics (GPU utilization, CPU, ...) in the tensorboard | None |
| dagshub_repository | Optional[str] | Format: | None |
| log_mlflow_only | bool | Skip logging to DVC, use MLflow for all artifacts being logged | False |
Source code in V3_2/src/super_gradients/common/sg_loggers/dagshub_sg_logger.py
DeciPlatformSGLogger
Bases: BaseSGLogger
Logger responsible for pushing logs and tensorboard artifacts to the Deci platform.
Source code in V3_2/src/super_gradients/common/sg_loggers/deci_platform_sg_logger.py
__init__(project_name, experiment_name, storage_location, resumed, training_params, checkpoints_dir_path, model_name, upload_model=True, tb_files_user_prompt=False, launch_tensorboard=False, tensorboard_port=None, save_checkpoints_remote=True, save_tensorboard_remote=True, save_logs_remote=True, monitor_system=True)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| experiment_name | str | Name used for logging and loading purposes | required |
| storage_location | str | If set to 's3' (i.e. s3://my-bucket), saves the checkpoints in AWS S3; otherwise saves the checkpoints locally | required |
| resumed | bool | If True, old tensorboard files will NOT be deleted when tb_files_user_prompt=True | required |
| training_params | dict | training_params for the experiment | required |
| checkpoints_dir_path | str | Local root directory path where all experiment logging directories will reside | required |
| model_name | str | Name of the model to be used for logging | required |
| upload_model | bool | Whether to upload the model to the Deci Platform or not | True |
| tb_files_user_prompt | bool | Asks the user for a tensorboard-deletion prompt | False |
| launch_tensorboard | bool | Whether to launch a TensorBoard process | False |
| tensorboard_port | int | Specific port number for tensorboard to use when launched (when set to None, a free port number will be used) | None |
| save_checkpoints_remote | bool | Saves checkpoints in s3 | True |
| save_tensorboard_remote | bool | Saves tensorboard in s3 | True |
| save_logs_remote | bool | Saves log files in s3 | True |
| monitor_system | bool | Save the system statistics (GPU utilization, CPU, ...) in the tensorboard | True |
Source code in V3_2/src/super_gradients/common/sg_loggers/deci_platform_sg_logger.py
upload()
Upload both to the destination specified by the user (base behavior) and to the Deci platform.
Source code in V3_2/src/super_gradients/common/sg_loggers/deci_platform_sg_logger.py
log_stdout()
Redirect stdout to DEBUG.
Source code in V3_2/src/super_gradients/common/sg_loggers/deci_platform_sg_logger.py
EpochNumber
dataclass
Bases: TimeUnit
A time unit for epoch number.
Source code in V3_2/src/super_gradients/common/sg_loggers/time_units.py
GlobalBatchStepNumber
dataclass
Bases: TimeUnit
A time unit representing the total number of batches processed, including both training and validation batches. Suppose the training loader has 320 batches and the validation loader has 80 batches. If the current epoch index is 2 (zero-based) and we are at index 50 (zero-based) of the validation loader, then the global batch step is (320 + 80) * 3 + 320 + 50 = 1570.
Source code in V3_2/src/super_gradients/common/sg_loggers/time_units.py
TimeUnit
Bases: abc.ABC
Abstract class for time units. This is used to explicitly log the time unit of a metric/loss.
Source code in V3_2/src/super_gradients/common/sg_loggers/time_units.py
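Since the abstract add_scalar accepts Union[int, TimeUnit] for global_step, a time unit can be passed instead of a raw integer to make the unit explicit. A hedged sketch, assuming sg_logger is an already-constructed concrete logger instance and that both dataclasses take the numeric value as their single constructor argument:

```python
from super_gradients.common.sg_loggers.time_units import EpochNumber, GlobalBatchStepNumber

# sg_logger: an already-constructed concrete SGLogger instance.
# Assumption: each TimeUnit dataclass is constructed from its numeric value.

# Log a metric against the epoch axis rather than a raw integer step.
sg_logger.add_scalar(tag="valid/mAP", scalar_value=0.52, global_step=EpochNumber(10))

# Log a loss against the global batch counter (training + validation batches).
sg_logger.add_scalar(tag="train/loss", scalar_value=0.31, global_step=GlobalBatchStepNumber(1570))
```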
WandBSGLogger
Bases: BaseSGLogger
Source code in V3_2/src/super_gradients/common/sg_loggers/wandb_sg_logger.py
__init__(project_name, experiment_name, storage_location, resumed, training_params, checkpoints_dir_path, tb_files_user_prompt=False, launch_tensorboard=False, tensorboard_port=None, save_checkpoints_remote=True, save_tensorboard_remote=True, save_logs_remote=True, entity=None, api_server=None, save_code=False, monitor_system=None, save_checkpoint_as_artifact=False, **kwargs)
:save_checkpoint_as_artifact: Save model checkpoints using a Weights & Biases Artifact. Note that setting this option to True saves a model checkpoint every epoch as a versioned artifact, which results in increased storage usage on Weights & Biases.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| experiment_name | str | Name used for logging and loading purposes | required |
| storage_location | str | If set to 's3' (i.e. s3://my-bucket), saves the checkpoints in AWS S3; otherwise saves the checkpoints locally | required |
| resumed | bool | If True, old tensorboard files will NOT be deleted when tb_files_user_prompt=True | required |
| training_params | dict | training_params for the experiment | required |
| checkpoints_dir_path | str | Local root directory path where all experiment logging directories will reside | required |
| tb_files_user_prompt | bool | Asks the user for a tensorboard-deletion prompt | False |
| launch_tensorboard | bool | Whether to launch a TensorBoard process | False |
| tensorboard_port | int | Specific port number for tensorboard to use when launched (when set to None, a free port number will be used) | None |
| save_checkpoints_remote | bool | Saves checkpoints in s3 | True |
| save_tensorboard_remote | bool | Saves tensorboard in s3 | True |
| save_logs_remote | bool | Saves log files in s3 | True |
| monitor_system | bool | Not available for the WandB logger. Save the system statistics (GPU utilization, CPU, ...) in the tensorboard | None |
| save_code | bool | Save current code to wandb | False |
Source code in V3_2/src/super_gradients/common/sg_loggers/wandb_sg_logger.py
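Concrete loggers are usually not constructed by hand; the trainer instantiates them from the training hyperparameters. A hedged sketch of what such a configuration might look like; the "sg_logger" / "sg_logger_params" keys follow the pattern used in the SuperGradients experiment-monitoring docs, and the values below are illustrative:

```python
# Hedged sketch: selecting the Weights & Biases logger through training
# hyperparameters instead of constructing WandBSGLogger directly.
training_hyperparams = {
    # ... other training hyperparameters ...
    "sg_logger": "wandb_sg_logger",
    "sg_logger_params": {
        "project_name": "my_project",      # illustrative W&B project name
        "save_checkpoints_remote": True,
        "save_tensorboard_remote": True,
        "save_logs_remote": True,
        "entity": "my_team",               # illustrative W&B entity (team or user)
    },
}
```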