Sg loggers
AbstractSGLogger
Bases: ABC
An SGLogger handles all outputs of the training process. Every generated file, log, metric value, image or other artifact produced by the trainer is processed and saved.
Inherit from SGLogger to integrate an experiment management framework, a special storage setting, a specific logging library, etc.
Important: The BaseSGLogger class (inheriting from SGLogger) is used by the trainer by default. When you define your own SGLogger, you override all default output functionality: no files will be saved to disk and no data will be collected. Make sure you either implement this functionality or use SGLoggers.Compose([BaseSGLogger(...), YourSGLogger(...)]) to build on top of it.
Source code in common/sg_loggers/abstract_sg_logger.py
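The override warning above can be illustrated with a small, self-contained sketch. It does not import the library: `MiniSGLogger`, `InMemoryLogger` and `FanOutLogger` are hypothetical stand-ins that copy only two of the abstract method signatures listed on this page, and the fan-out class is an assumption about what a composition helper could look like, not the actual Compose implementation.

```python
from abc import ABC, abstractmethod


class MiniSGLogger(ABC):
    """Hypothetical stand-in exposing two of the abstract methods from this page."""

    @abstractmethod
    def add_scalar(self, tag, scalar_value, global_step=None):
        ...

    @abstractmethod
    def add_text(self, tag, text_string, global_step=None):
        ...


class InMemoryLogger(MiniSGLogger):
    """Keeps every record in a list; nothing is written to disk."""

    def __init__(self):
        self.records = []

    def add_scalar(self, tag, scalar_value, global_step=None):
        self.records.append(("scalar", tag, float(scalar_value), global_step))

    def add_text(self, tag, text_string, global_step=None):
        self.records.append(("text", tag, text_string, global_step))


class FanOutLogger(MiniSGLogger):
    """Hypothetical composition helper: forwards each call to every wrapped logger."""

    def __init__(self, loggers):
        self.loggers = list(loggers)

    def add_scalar(self, tag, scalar_value, global_step=None):
        for logger in self.loggers:
            logger.add_scalar(tag, scalar_value, global_step)

    def add_text(self, tag, text_string, global_step=None):
        for logger in self.loggers:
            logger.add_text(tag, text_string, global_step)


first, second = InMemoryLogger(), InMemoryLogger()
combined = FanOutLogger([first, second])
combined.add_scalar("train/loss", 0.25, global_step=3)
combined.add_text("notes", "warm-up finished", global_step=3)
```

Because each wrapped logger receives every call, a custom logger combined this way adds to the default outputs instead of replacing them.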
add(tag, obj, global_step=None)
abstractmethod
A generic function for adding any type of data to the SGLogger. By default, this function is not called by the Trainer, and BaseSGLogger does nothing with this type of data. Use this method if you need to pass a data type that is not supported by any of the following abstract methods.
Source code in common/sg_loggers/abstract_sg_logger.py
add_checkpoint(tag, state_dict, global_step=None)
abstractmethod
Add a checkpoint to SGLogger. Typically, this function will write a torch file to disk, or upload it to remote storage or to an experiment management framework.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tag | str | Data identifier | required |
state_dict | dict | The state dict to save. It includes more than just the model weights and may contain any of: net (model weights), acc (current accuracy, depends on metrics), epoch (current epoch), optimizer_state_dict (optimizer state), scaler_state_dict (torch.amp.scaler state) | required |
global_step | int | Global step value to record | None |
Source code in common/sg_loggers/abstract_sg_logger.py
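As a rough illustration of the state dict described above, the sketch below builds a dictionary with the listed keys and writes it to a temporary directory. It uses `pickle` as a stand-in for a torch file so that it runs without PyTorch. The helper `save_checkpoint`, the placeholder values and the fallback file name are assumptions made for this example only.

```python
import pickle
import tempfile
from pathlib import Path

# Keys follow the parameter description above; the values are placeholders.
state_dict = {
    "net": {"layer.weight": [0.1, 0.2]},   # model weights
    "acc": 0.87,                            # current accuracy
    "epoch": 12,                            # current epoch
    "optimizer_state_dict": {"lr": 0.01},   # optimizer state
    "scaler_state_dict": None,              # AMP scaler state, if any
}


def save_checkpoint(directory, tag, state_dict, global_step=None):
    """Hypothetical helper: pickle state_dict to <directory>/<tag>.

    When tag is None, the file name is derived from global_step
    (the naming pattern is an assumption of this sketch).
    """
    name = tag if tag is not None else f"ckpt_{global_step}.pth"
    path = Path(directory) / name
    with open(path, "wb") as handle:
        pickle.dump(state_dict, handle)
    return path


with tempfile.TemporaryDirectory() as tmp:
    saved = save_checkpoint(tmp, "ckpt_latest.pth", state_dict, global_step=12)
    with open(saved, "rb") as handle:
        restored = pickle.load(handle)
```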
add_config(tag, config)
abstractmethod
Add the configuration (settings and hyperparameters) to the SGLogger. Typically, this function will add the configuration dictionary to logs, write it to tensorboard, send it to an experiment management framework, etc.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tag | str | Data identifier | required |
config | dict | a dictionary of the experiment config | required |
Source code in common/sg_loggers/abstract_sg_logger.py
add_file(file_name=None)
abstractmethod
Add a file from the checkpoint directory to the logger (usually, upload the file or add it to an artifact).
Source code in common/sg_loggers/abstract_sg_logger.py
add_histogram(tag, values, bins='auto', global_step=None)
abstractmethod
Add a histogram to SGLogger. Typically, this function will add a histogram to tensorboard or to an experiment management framework.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tag | str | Data identifier | required |
values | Union[torch.Tensor, np.array] | Values to build histogram | required |
bins | Union[str, np.array, list, int] | Determines how the bins are made. If bins is an int, it defines the number of equal-width bins in the given range. If bins is a sequence, it defines a monotonically increasing array of bin edges, including the rightmost edge, allowing for non-uniform bin widths. If bins is a string, it defines the method used to calculate the optimal bin width, as defined by https://numpy.org/doc/stable/reference/generated/numpy.histogram_bin_edges.html#numpy.histogram_bin_edges, one of ['sqrt', 'auto', 'fd', 'doane', 'scott', 'stone', ...] | 'auto' |
global_step | int | Global step value to record | None |
Source code in common/sg_loggers/abstract_sg_logger.py
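The three forms of `bins` described above can be sketched in plain Python. The function `resolve_bin_edges` is a hypothetical helper written for this page, not part of the library or of NumPy. Its 'sqrt' branch uses a simplified square-root rule (the number of bins is the ceiling of the square root of the sample count), which may differ from NumPy in edge cases, and the other string estimators are not implemented.

```python
import math


def resolve_bin_edges(values, bins):
    """Return bin edges for an int, a sequence of edges, or the 'sqrt' rule."""
    lo, hi = min(values), max(values)
    if isinstance(bins, str):
        if bins != "sqrt":
            raise ValueError("this sketch only implements the 'sqrt' rule")
        # Simplified square-root rule: ceil(sqrt(n)) equal-width bins.
        bins = max(1, math.ceil(math.sqrt(len(values))))
    if isinstance(bins, int):
        # Equal-width bins spanning the data range.
        width = (hi - lo) / bins
        return [lo + i * width for i in range(bins + 1)]
    # Otherwise treat bins as explicit edges, which must strictly increase.
    edges = list(bins)
    if any(b <= a for a, b in zip(edges, edges[1:])):
        raise ValueError("bin edges must increase monotonically")
    return edges


values = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
int_edges = resolve_bin_edges(values, 4)          # four equal-width bins
seq_edges = resolve_bin_edges(values, [0, 1, 8])  # two bins of unequal width
sqrt_edges = resolve_bin_edges(values, "sqrt")    # three bins for nine samples
```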
add_image(tag, image, data_format='CHW', global_step=None)
abstractmethod
Add a single image to SGLogger. Typically, this function will add an image to tensorboard, save it to disk or add it to an experiment management framework.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tag | str | Data identifier | required |
image | Union[torch.Tensor, np.array, Image.Image] | an image to be added. The values should lie in [0, 255] for type uint8 or [0, 1] for type float. | required |
data_format | str | Image data format specification of the form CHW, HWC, HW, WH, etc. | 'CHW' |
global_step | int | Global step value to record | None |
Source code in common/sg_loggers/abstract_sg_logger.py
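A `data_format` string such as HWC or CHW names the order of the array axes. The sketch below, using hypothetical helpers `axes_to` and `permuted_shape`, computes the axis permutation that reorders one layout into another. The resulting tuple is the kind of argument that `np.transpose` or `torch.Tensor.permute` accepts. How the loggers convert formats internally is not specified on this page.

```python
def axes_to(source_format, target_format="CHW"):
    """Return the axis permutation that reorders source_format into target_format."""
    if sorted(source_format) != sorted(target_format):
        raise ValueError(f"cannot map {source_format!r} onto {target_format!r}")
    # For each target axis, look up its position in the source layout.
    return tuple(source_format.index(axis) for axis in target_format)


def permuted_shape(shape, source_format, target_format="CHW"):
    """Apply the permutation to a shape tuple (stand-in for transposing an array)."""
    return tuple(shape[i] for i in axes_to(source_format, target_format))


hwc_to_chw = axes_to("HWC")              # single image, channels last
nhwc_to_nchw = axes_to("NHWC", "NCHW")   # batch of images
chw_shape = permuted_shape((32, 48, 3), "HWC")
```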
add_images(tag, images, data_format='NCHW', global_step=None)
abstractmethod
Add multiple images to SGLogger. Typically, this function will add images to tensorboard, save them to disk or add them to an experiment management framework.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tag | str | Data identifier | required |
images | Union[torch.Tensor, np.array] | images to be added. The values should lie in [0, 255] for type uint8 or [0, 1] for type float. | required |
data_format | str | Image data format specification of the form NCHW, NHWC, NHW, NWH, etc. | 'NCHW' |
global_step | int | Global step value to record | None |
Source code in common/sg_loggers/abstract_sg_logger.py
add_scalar(tag, scalar_value, global_step=None)
abstractmethod
Add scalar data to SGLogger. Typically, this function will add a scalar to tensorboard or to another experiment management framework.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tag | str | Data identifier | required |
scalar_value | float | Value to save | required |
global_step | int | Global step value to record | None |
Source code in common/sg_loggers/abstract_sg_logger.py
add_scalars(tag_scalar_dict, global_step=None)
abstractmethod
Add multiple scalars to SGLogger. Typically, this function will add scalars to tensorboard or to another experiment management framework.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tag_scalar_dict | dict | a dictionary {tag(str): value(float)} of the scalars. | required |
global_step | int | Global step value to record | None |
Source code in common/sg_loggers/abstract_sg_logger.py
add_text(tag, text_string, global_step=None)
abstractmethod
Add text to SGLogger. Typically, this function will add the text to tensorboard or to an experiment management framework.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tag | str | Data identifier | required |
text_string | str | the text to be added | required |
global_step | int | Global step value to record | None |
Source code in common/sg_loggers/abstract_sg_logger.py
close()
abstractmethod
Close the SGLogger
Source code in common/sg_loggers/abstract_sg_logger.py
flush()
abstractmethod
Flush the SGLogger's cache
Source code in common/sg_loggers/abstract_sg_logger.py
local_dir()
abstractmethod
A getter for the full/absolute path where all files are saved locally
Returns:
Type | Description |
---|---|
str | Full/absolute path where all files are saved locally |
Source code in common/sg_loggers/abstract_sg_logger.py
upload()
abstractmethod
Upload any files which should be stored on remote storage
Source code in common/sg_loggers/abstract_sg_logger.py
BaseSGLogger
Bases: AbstractSGLogger
Source code in common/sg_loggers/base_sg_logger.py
__init__(project_name, experiment_name, storage_location, resumed, training_params, checkpoints_dir_path, tb_files_user_prompt=False, launch_tensorboard=False, tensorboard_port=None, save_checkpoints_remote=True, save_tensorboard_remote=True, save_logs_remote=True, monitor_system=True)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
experiment_name | str | Name used for logging and loading purposes | required |
storage_location | str | If set to 's3' (i.e. s3://my-bucket), saves the checkpoints in AWS S3; otherwise saves the checkpoints locally | required |
resumed | bool | If true, then old tensorboard files will NOT be deleted when tb_files_user_prompt=True | required |
training_params | TrainingParams | training_params for the experiment. | required |
checkpoints_dir_path | str | Local root directory path where all experiment logging directories will reside. | required |
tb_files_user_prompt | bool | Asks user for Tensorboard deletion prompt. | False |
launch_tensorboard | bool | Whether to launch a TensorBoard process. | False |
tensorboard_port | int | Specific port number for the tensorboard to use when launched (when set to None, some free port number will be used) | None |
save_checkpoints_remote | bool | Saves checkpoints in s3. | True |
save_tensorboard_remote | bool | Saves tensorboard in s3. | True |
save_logs_remote | bool | Saves log files in s3. | True |
monitor_system | bool | Save the system statistics (GPU utilization, CPU, ...) in the tensorboard | True |
Source code in common/sg_loggers/base_sg_logger.py
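To see at a glance which constructor arguments are required and which have defaults, the signature above can be mirrored in a dataclass. `BaseSGLoggerArgs` is a hypothetical container written for this page, and the example values (project name, paths, training parameters) are made up. It does not construct a real logger.

```python
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class BaseSGLoggerArgs:
    """Hypothetical mirror of the __init__ signature shown above."""

    # Required arguments (no default in the signature).
    project_name: str
    experiment_name: str
    storage_location: str
    resumed: bool
    training_params: Any
    checkpoints_dir_path: str
    # Optional arguments with the defaults from the signature.
    tb_files_user_prompt: bool = False
    launch_tensorboard: bool = False
    tensorboard_port: Optional[int] = None
    save_checkpoints_remote: bool = True
    save_tensorboard_remote: bool = True
    save_logs_remote: bool = True
    monitor_system: bool = True


args = BaseSGLoggerArgs(
    project_name="demo_project",
    experiment_name="resnet18_baseline",
    storage_location="local",              # placeholder for a non-S3 location
    resumed=False,
    training_params={"max_epochs": 10},    # placeholder training parameters
    checkpoints_dir_path="/tmp/checkpoints",
)
logger_kwargs = asdict(args)
```

A dictionary like `logger_kwargs` could then be unpacked into the constructor, assuming the installed version matches the signature documented here.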
add_checkpoint(tag, state_dict, global_step=None)
Add checkpoint to experiment folder.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tag | str | Identifier of the checkpoint. If None, global_step will be used to name the checkpoint. | required |
state_dict | dict | Checkpoint state_dict. | required |
global_step | int | Epoch number. | None |
Source code in common/sg_loggers/base_sg_logger.py
add_figure(tag, figure, global_step=None)
Add a figure to SGLogger. Typically, this function will add a figure to tensorboard or to an experiment management framework.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tag | str | Data identifier | required |
figure | plt.figure | the figure to add | required |
global_step | int | Global step value to record | None |
Source code in common/sg_loggers/base_sg_logger.py
add_images(tag, images, data_format='NCHW', global_step=None)
Add multiple images to SGLogger. Typically, this function will add a set of images to tensorboard, save them to disk or add them to an experiment management framework.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tag | str | Data identifier | required |
images | Union[torch.Tensor, np.array] | images to be added. The values should lie in [0, 255] for type uint8 or [0, 1] for type float. | required |
data_format | str | Image data format specification of the form NCHW, NHWC, CHW, HWC, HW, WH, etc. | 'NCHW' |
global_step | int | Global step value to record | None |
Source code in common/sg_loggers/base_sg_logger.py
add_model_graph(tag, model, dummy_input)
Add a pytorch model graph to the SGLogger. Only the model structure/architecture will be preserved and collected, NOT the model weights.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tag | str | Data identifier | required |
model | torch.nn.Module | the model to be added | required |
dummy_input | torch.Tensor | an input to be used for a forward call on the model | required |
Source code in common/sg_loggers/base_sg_logger.py
add_scalars(tag_scalar_dict, global_step=None)
Add multiple scalars. Unlike the Tensorboard implementation, this does not group all scalars under a main tag (all scalars in the same chart). Instead, each scalar is added to tensorboard as in add_scalar, and all of them are written to the log together.
Source code in common/sg_loggers/base_sg_logger.py
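The behavior described above (one chart per tag, plus a single combined log entry) can be sketched with a small stand-in class. `ScalarSink` is hypothetical and keeps everything in memory. The format of the log line is an assumption of this example, not the library's actual log format.

```python
class ScalarSink:
    """Hypothetical in-memory stand-in for a tensorboard writer plus a text log."""

    def __init__(self):
        self.charts = {}      # tag -> list of (global_step, value)
        self.log_lines = []   # one line per add_scalars call

    def add_scalar(self, tag, scalar_value, global_step=None):
        self.charts.setdefault(tag, []).append((global_step, scalar_value))

    def add_scalars(self, tag_scalar_dict, global_step=None):
        # Each scalar goes to its own chart, exactly as add_scalar would do.
        for tag, value in tag_scalar_dict.items():
            self.add_scalar(tag, value, global_step)
        # All scalars of this call are written to the log in one entry.
        summary = ", ".join(f"{tag}={value}" for tag, value in tag_scalar_dict.items())
        self.log_lines.append(f"step {global_step}: {summary}")


sink = ScalarSink()
sink.add_scalars({"train/loss": 0.5, "valid/acc": 0.9}, global_step=5)
```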
add_video(tag, video, global_step=None)
Add a single video to SGLogger. Typically, this function will add a video to tensorboard, save it to disk or add it to an experiment management framework.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tag | str | Data identifier | required |
video | Union[torch.Tensor, np.array] | the video to add. shape (N,T,C,H,W) or (T,C,H,W). The values should lie in [0, 255] for type uint8 or [0, 1] for type float. | required |
global_step | int | Global step value to record | None |
Source code in common/sg_loggers/base_sg_logger.py
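The two accepted video shapes and the value ranges above can be checked before logging. The helpers below are hypothetical and work on plain tuples and lists so that they run without torch or NumPy. Whether the logger itself adds the batch dimension for a (T,C,H,W) input is an assumption of this sketch.

```python
def as_batched_video_shape(shape):
    """Return a (N, T, C, H, W) shape, adding N=1 for a single (T, C, H, W) video."""
    if len(shape) == 5:
        return tuple(shape)
    if len(shape) == 4:
        return (1, *shape)
    raise ValueError(f"expected 4 or 5 dimensions, got {len(shape)}")


def values_in_range(values, dtype):
    """Check the documented range: [0, 255] for uint8, [0, 1] for float."""
    upper = 255 if dtype == "uint8" else 1.0
    return all(0 <= value <= upper for value in values)


single = as_batched_video_shape((16, 3, 64, 64))     # one 16-frame RGB clip
batch = as_batched_video_shape((4, 16, 3, 64, 64))   # already batched
float_ok = values_in_range([0.0, 0.5, 1.0], "float")
uint8_bad = values_in_range([0, 128, 300], "uint8")
```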
upload()
Upload the local tensorboard and log files to the remote system.
Source code in common/sg_loggers/base_sg_logger.py
ClearMLSGLogger
Bases: BaseSGLogger
Source code in common/sg_loggers/clearml_sg_logger.py
__init__(project_name, experiment_name, storage_location, resumed, training_params, checkpoints_dir_path, tb_files_user_prompt=False, launch_tensorboard=False, tensorboard_port=None, save_checkpoints_remote=True, save_tensorboard_remote=True, save_logs_remote=True, monitor_system=None)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
project_name | str | ClearML project name that can include many experiments | required |
experiment_name | str | Name used for logging and loading purposes | required |
storage_location | str | If set to 's3' (i.e. s3://my-bucket), saves the checkpoints in AWS S3; otherwise saves the checkpoints locally | required |
resumed | bool | If true, then old tensorboard files will NOT be deleted when tb_files_user_prompt=True | required |
training_params | dict | training_params for the experiment. | required |
checkpoints_dir_path | str | Local root directory path where all experiment logging directories will reside. | required |
tb_files_user_prompt | bool | Asks user for Tensorboard deletion prompt. | False |
launch_tensorboard | bool | Whether to launch a TensorBoard process. | False |
tensorboard_port | int | Specific port number for the tensorboard to use when launched (when set to None, some free port number will be used) | None |
save_checkpoints_remote | bool | Saves checkpoints in s3. | True |
save_tensorboard_remote | bool | Saves tensorboard in s3. | True |
save_logs_remote | bool | Saves log files in s3. | True |
monitor_system | bool | Not available for the ClearML logger. Save the system statistics (GPU utilization, CPU, ...) in the tensorboard | None |
Source code in common/sg_loggers/clearml_sg_logger.py
DagsHubSGLogger
Bases: BaseSGLogger
Source code in common/sg_loggers/dagshub_sg_logger.py
__init__(project_name, experiment_name, storage_location, resumed, training_params, checkpoints_dir_path, tb_files_user_prompt=False, launch_tensorboard=False, tensorboard_port=None, save_checkpoints_remote=True, save_tensorboard_remote=True, save_logs_remote=True, monitor_system=None, dagshub_repository=None, log_mlflow_only=False)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
experiment_name | str | Name used for logging and loading purposes | required |
storage_location | str | If set to 's3' (i.e. s3://my-bucket), saves the checkpoints in AWS S3; otherwise saves the checkpoints locally | required |
resumed | bool | If true, then old tensorboard files will NOT be deleted when tb_files_user_prompt=True | required |
training_params | dict | training_params for the experiment. | required |
checkpoints_dir_path | str | Local root directory path where all experiment logging directories will reside. | required |
tb_files_user_prompt | bool | Asks user for Tensorboard deletion prompt. | False |
launch_tensorboard | bool | Whether to launch a TensorBoard process. | False |
tensorboard_port | int | Specific port number for the tensorboard to use when launched (when set to None, some free port number will be used) | None |
save_checkpoints_remote | bool | Saves checkpoints in s3 and DagsHub. | True |
save_tensorboard_remote | bool | Saves tensorboard in s3. | True |
save_logs_remote | bool | Saves log files in s3 and DagsHub. | True |
monitor_system | bool | Save the system statistics (GPU utilization, CPU, ...) in the tensorboard | None |
dagshub_repository | Optional[str] | Format: | None |
log_mlflow_only | bool | Skip logging to DVC, use MLflow for all artifacts being logged | False |
Source code in common/sg_loggers/dagshub_sg_logger.py
DeciPlatformSGLogger
Bases: BaseSGLogger
Logger responsible for pushing logs and tensorboard artifacts to the Deci platform.
Source code in common/sg_loggers/deci_platform_sg_logger.py
__init__(project_name, experiment_name, storage_location, resumed, training_params, checkpoints_dir_path, model_name, upload_model=True, tb_files_user_prompt=False, launch_tensorboard=False, tensorboard_port=None, save_checkpoints_remote=True, save_tensorboard_remote=True, save_logs_remote=True, monitor_system=True)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
experiment_name | str | Name used for logging and loading purposes | required |
storage_location | str | If set to 's3' (i.e. s3://my-bucket), saves the checkpoints in AWS S3; otherwise saves the checkpoints locally | required |
resumed | bool | If true, then old tensorboard files will NOT be deleted when tb_files_user_prompt=True | required |
training_params | dict | training_params for the experiment. | required |
checkpoints_dir_path | str | Local root directory path where all experiment logging directories will reside. | required |
model_name | str | Name of the model to be used for logging. | required |
upload_model | bool | Whether to upload the model to the Deci Platform or not. | True |
tb_files_user_prompt | bool | Asks user for Tensorboard deletion prompt. | False |
launch_tensorboard | bool | Whether to launch a TensorBoard process. | False |
tensorboard_port | int | Specific port number for the tensorboard to use when launched (when set to None, some free port number will be used) | None |
save_checkpoints_remote | bool | Saves checkpoints in s3. | True |
save_tensorboard_remote | bool | Saves tensorboard in s3. | True |
save_logs_remote | bool | Saves log files in s3. | True |
monitor_system | bool | Save the system statistics (GPU utilization, CPU, ...) in the tensorboard | True |
Source code in common/sg_loggers/deci_platform_sg_logger.py
upload()
Upload both to the destination specified by the user (base behavior), and to Deci platform.
Source code in common/sg_loggers/deci_platform_sg_logger.py
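The "base behavior plus platform" pattern described for upload() is a standard use of `super()`. The sketch below shows it with two hypothetical stand-in classes that only record where they would upload. It does not reflect the actual upload logic or destinations of the library.

```python
class BaseUploader:
    """Hypothetical stand-in for the base logger's upload behavior."""

    def __init__(self):
        self.destinations = []

    def upload(self):
        # Base behavior: upload to the destination chosen by the user.
        self.destinations.append("user_storage")


class PlatformUploader(BaseUploader):
    """Hypothetical subclass that keeps the base upload and adds a second target."""

    def upload(self):
        super().upload()                       # keep the base behavior
        self.destinations.append("platform")   # then push to the platform as well


uploader = PlatformUploader()
uploader.upload()
```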
log_stdout()
Redirect stdout to DEBUG.
Source code in common/sg_loggers/deci_platform_sg_logger.py
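One common way to redirect stdout to DEBUG-level logging uses `contextlib.redirect_stdout` with a small file-like object. The sketch below is an assumption about how such a redirect can be built from the standard library. The names `stdout_to_debug`, `_DebugWriter` and `_ListHandler` are hypothetical, and the actual log_stdout implementation may differ.

```python
import contextlib
import logging


class _DebugWriter:
    """File-like object that forwards each non-empty written line to logger.debug."""

    def __init__(self, logger):
        self.logger = logger

    def write(self, text):
        for line in text.splitlines():
            if line.strip():
                self.logger.debug(line)
        return len(text)

    def flush(self):
        pass


@contextlib.contextmanager
def stdout_to_debug(logger):
    """Hypothetical context manager: print() output becomes DEBUG log records."""
    with contextlib.redirect_stdout(_DebugWriter(logger)):
        yield


class _ListHandler(logging.Handler):
    """Collects (level, message) pairs so the result can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append((record.levelname, record.getMessage()))


demo_logger = logging.getLogger("stdout_demo")
demo_logger.setLevel(logging.DEBUG)
demo_logger.propagate = False
handler = _ListHandler()
demo_logger.addHandler(handler)

with stdout_to_debug(demo_logger):
    print("uploading model")
```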
WandBSGLogger
Bases: BaseSGLogger
Source code in common/sg_loggers/wandb_sg_logger.py
__init__(project_name, experiment_name, storage_location, resumed, training_params, checkpoints_dir_path, tb_files_user_prompt=False, launch_tensorboard=False, tensorboard_port=None, save_checkpoints_remote=True, save_tensorboard_remote=True, save_logs_remote=True, entity=None, api_server=None, save_code=False, monitor_system=None, **kwargs)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
experiment_name | str | Name used for logging and loading purposes | required |
storage_location | str | If set to 's3' (i.e. s3://my-bucket), saves the checkpoints in AWS S3; otherwise saves the checkpoints locally | required |
resumed | bool | If true, then old tensorboard files will NOT be deleted when tb_files_user_prompt=True | required |
training_params | dict | training_params for the experiment. | required |
checkpoints_dir_path | str | Local root directory path where all experiment logging directories will reside. | required |
tb_files_user_prompt | bool | Asks user for Tensorboard deletion prompt. | False |
launch_tensorboard | bool | Whether to launch a TensorBoard process. | False |
tensorboard_port | int | Specific port number for the tensorboard to use when launched (when set to None, some free port number will be used) | None |
save_checkpoints_remote | bool | Saves checkpoints in s3. | True |
save_tensorboard_remote | bool | Saves tensorboard in s3. | True |
save_logs_remote | bool | Saves log files in s3. | True |
monitor_system | bool | Not available for the WandB logger. Save the system statistics (GPU utilization, CPU, ...) in the tensorboard | None |
save_code | bool | Save current code to wandb | False |
Source code in common/sg_loggers/wandb_sg_logger.py