@yellowcap @oliverroick @alukach @sunu @AliceR
Hi there,
I ran into the problem below at the end of the first training epoch. I hope you can help me, thanks!
INFO:numexpr.utils:Note: detected 168 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
INFO:numexpr.utils:Note: NumExpr detected 168 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
INFO:numexpr.utils:NumExpr defaulting to 8 threads.
INFO:albumentations.check_version:A new version of Albumentations is available: 2.0.8 (you have 1.4.10). Upgrade using: pip install --upgrade albumentations
/mnt/scratch/users/quinnnew/multi-temporal-crop-classification_classes4_xjcrops/
INFO: Seed set to 0
INFO:lightning.fabric.utilities.seed:Seed set to 0
/mnt/scratch/users/quinnnew/anaconda3/lib/python3.12/site-packages/lightning/fabric/plugins/environments/slurm.py:204: The srun command is available on your system but is not used. HINT: If your intention is to run Lightning on SLURM, prepend your python command with srun like so: srun python example_multitemporalcrop_class4_linux.py ...
INFO: Using bfloat16 Automatic Mixed Precision (AMP)
INFO:lightning.pytorch.utilities.rank_zero:Using bfloat16 Automatic Mixed Precision (AMP)
INFO: GPU available: True (cuda), used: True
INFO:lightning.pytorch.utilities.rank_zero:GPU available: True (cuda), used: True
INFO: TPU available: False, using: 0 TPU cores
INFO:lightning.pytorch.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO: HPU available: False, using: 0 HPUs
INFO:lightning.pytorch.utilities.rank_zero:HPU available: False, using: 0 HPUs
INFO: Trainer(limit_predict_batches=1) was configured so 1 batch will be used.
INFO:lightning.pytorch.utilities.rank_zero:Trainer(limit_predict_batches=1) was configured so 1 batch will be used.
INFO:root:Loaded weights for HLSBands.BLUE in position 0 of patch embed
INFO:root:Loaded weights for HLSBands.GREEN in position 1 of patch embed
INFO:root:Loaded weights for HLSBands.RED in position 2 of patch embed
INFO:root:Loaded weights for HLSBands.NIR_NARROW in position 3 of patch embed
INFO:root:Loaded weights for HLSBands.SWIR_1 in position 4 of patch embed
INFO:root:Loaded weights for HLSBands.SWIR_2 in position 5 of patch embed
WARNING:root:Decoder UperNetDecoder does not have an includes_head attribute. Falling back to the value of the registry.
/mnt/scratch/users/quinnnew/anaconda3/lib/python3.12/site-packages/terratorch/models/decoders/upernet_decoder.py:37: UserWarning: DeprecationWarning: scale_modules is deprecated and will be removed in future versions. Use LearnedInterpolateToPyramidal neck instead.
warnings.warn(
INFO: You are using a CUDA device ('NVIDIA L4') that has Tensor Cores. To properly utilize them, you should set torch.set_float32_matmul_precision('medium' | 'high') which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
INFO:lightning.pytorch.utilities.rank_zero:You are using a CUDA device ('NVIDIA L4') that has Tensor Cores. To properly utilize them, you should set torch.set_float32_matmul_precision('medium' | 'high') which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
INFO: Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/2
INFO:lightning.fabric.utilities.distributed:Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/2
INFO:numexpr.utils:Note: detected 168 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
INFO:numexpr.utils:Note: NumExpr detected 168 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
INFO:numexpr.utils:NumExpr defaulting to 8 threads.
INFO:albumentations.check_version:A new version of Albumentations is available: 2.0.8 (you have 1.4.10). Upgrade using: pip install --upgrade albumentations
/mnt/scratch/users/quinnnew/multi-temporal-crop-classification_classes4_xjcrops/
INFO: [rank: 1] Seed set to 0
INFO:lightning.fabric.utilities.seed:[rank: 1] Seed set to 0
INFO:root:Loaded weights for HLSBands.BLUE in position 0 of patch embed
INFO:root:Loaded weights for HLSBands.GREEN in position 1 of patch embed
INFO:root:Loaded weights for HLSBands.RED in position 2 of patch embed
INFO:root:Loaded weights for HLSBands.NIR_NARROW in position 3 of patch embed
INFO:root:Loaded weights for HLSBands.SWIR_1 in position 4 of patch embed
INFO:root:Loaded weights for HLSBands.SWIR_2 in position 5 of patch embed
WARNING:root:Decoder UperNetDecoder does not have an includes_head attribute. Falling back to the value of the registry.
/mnt/scratch/users/quinnnew/anaconda3/lib/python3.12/site-packages/terratorch/models/decoders/upernet_decoder.py:37: UserWarning: DeprecationWarning: scale_modules is deprecated and will be removed in future versions. Use LearnedInterpolateToPyramidal neck instead.
warnings.warn(
INFO: Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/2
INFO:lightning.fabric.utilities.distributed:Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/2
INFO: ----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 2 processes
INFO:lightning.pytorch.utilities.rank_zero:----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 2 processes
INFO: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
INFO:lightning.pytorch.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
INFO: LOCAL_RANK: 1 - CUDA_VISIBLE_DEVICES: [0,1]
INFO:lightning.pytorch.accelerators.cuda:LOCAL_RANK: 1 - CUDA_VISIBLE_DEVICES: [0,1]
INFO:
INFO:
| Name | Type | Params | Mode
0 | model | PixelWiseModel | 364 M | train
1 | criterion | CrossEntropyLoss | 0 | train
2 | train_metrics | MetricCollection | 0 | train
3 | val_metrics | MetricCollection | 0 | train
4 | test_metrics | ModuleList | 0 | train
364 M Trainable params
0 Non-trainable params
364 M Total params
1,457.832 Total estimated model params size (MB)
625 Modules in train mode
0 Modules in eval mode
INFO:lightning.pytorch.callbacks.model_summary:
| Name | Type | Params | Mode
0 | model | PixelWiseModel | 364 M | train
1 | criterion | CrossEntropyLoss | 0 | train
2 | train_metrics | MetricCollection | 0 | train
3 | val_metrics | MetricCollection | 0 | train
4 | test_metrics | ModuleList | 0 | train
364 M Trainable params
0 Non-trainable params
364 M Total params
1,457.832 Total estimated model params size (MB)
625 Modules in train mode
0 Modules in eval mode
Epoch 0: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 772/772 [04:15<00:00, 3.02it/s, v_num=4]
[rank1]: Traceback (most recent call last):
[rank1]: File "/mnt/scratch/users/quinnnew/Prithvi-EO-2.0-main/examples/example_multitemporalcrop_class4_linux.py", line 190, in
[rank1]: trainer.fit(model, datamodule=data_module)
[rank1]: File "/mnt/scratch/users/quinnnew/anaconda3/lib/python3.12/site-packages/lightning/pytorch/trainer/trainer.py", line 538, in fit
[rank1]: call._call_and_handle_interrupt(
[rank1]: File "/mnt/scratch/users/quinnnew/anaconda3/lib/python3.12/site-packages/lightning/pytorch/trainer/call.py", line 46, in _call_and_handle_interrupt
[rank1]: return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/mnt/scratch/users/quinnnew/anaconda3/lib/python3.12/site-packages/lightning/pytorch/strategies/launchers/subprocess_script.py", line 105, in launch
[rank1]: return function(*args, **kwargs)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/mnt/scratch/users/quinnnew/anaconda3/lib/python3.12/site-packages/lightning/pytorch/trainer/trainer.py", line 574, in _fit_impl
[rank1]: self._run(model, ckpt_path=ckpt_path)
[rank1]: File "/mnt/scratch/users/quinnnew/anaconda3/lib/python3.12/site-packages/lightning/pytorch/trainer/trainer.py", line 981, in _run
[rank1]: results = self._run_stage()
[rank1]: ^^^^^^^^^^^^^^^^^
[rank1]: File "/mnt/scratch/users/quinnnew/anaconda3/lib/python3.12/site-packages/lightning/pytorch/trainer/trainer.py", line 1025, in _run_stage
[rank1]: self.fit_loop.run()
[rank1]: File "/mnt/scratch/users/quinnnew/anaconda3/lib/python3.12/site-packages/lightning/pytorch/loops/fit_loop.py", line 206, in run
[rank1]: self.on_advance_end()
[rank1]: File "/mnt/scratch/users/quinnnew/anaconda3/lib/python3.12/site-packages/lightning/pytorch/loops/fit_loop.py", line 378, in on_advance_end
[rank1]: call._call_callback_hooks(trainer, "on_train_epoch_end", monitoring_callbacks=True)
[rank1]: File "/mnt/scratch/users/quinnnew/anaconda3/lib/python3.12/site-packages/lightning/pytorch/trainer/call.py", line 218, in _call_callback_hooks
[rank1]: fn(trainer, trainer.lightning_module, *args, **kwargs)
[rank1]: File "/mnt/scratch/users/quinnnew/anaconda3/lib/python3.12/site-packages/lightning/pytorch/callbacks/model_checkpoint.py", line 325, in on_train_epoch_end
[rank1]: self._save_topk_checkpoint(trainer, monitor_candidates)
[rank1]: File "/mnt/scratch/users/quinnnew/anaconda3/lib/python3.12/site-packages/lightning/pytorch/callbacks/model_checkpoint.py", line 383, in _save_topk_checkpoint
[rank1]: raise MisconfigurationException(m)
[rank1]: lightning.fabric.utilities.exceptions.MisconfigurationException: ModelCheckpoint(monitor='val/Multiclass_Jaccard_Index') could not find the monitored key in the returned metrics: ['train/loss', 'val/loss', 'val/Accuracy', 'val/multiclassaccuracy_0', 'val/multiclassaccuracy_1', 'val/multiclassaccuracy_2', 'val/multiclassaccuracy_3', 'val/multiclassaccuracy_4', 'val/multiclassaccuracy_5', 'val/multiclassaccuracy_6', 'val/multiclassaccuracy_7', 'val/multiclassaccuracy_8', 'val/multiclassaccuracy_9', 'val/multiclassaccuracy_10', 'val/multiclassaccuracy_11', 'val/multiclassaccuracy_12', 'val/F1_Score', 'val/multiclassjaccardindex_0', 'val/multiclassjaccardindex_1', 'val/multiclassjaccardindex_2', 'val/multiclassjaccardindex_3', 'val/multiclassjaccardindex_4', 'val/multiclassjaccardindex_5', 'val/multiclassjaccardindex_6', 'val/multiclassjaccardindex_7', 'val/multiclassjaccardindex_8', 'val/multiclassjaccardindex_9', 'val/multiclassjaccardindex_10', 'val/multiclassjaccardindex_11', 'val/multiclassjaccardindex_12', 'val/Pixel_Accuracy', 'val/mIoU', 'val/mIoU_Micro', 'train/Accuracy', 'train/multiclassaccuracy_0', 'train/multiclassaccuracy_1', 'train/multiclassaccuracy_2', 'train/multiclassaccuracy_3', 'train/multiclassaccuracy_4', 'train/multiclassaccuracy_5', 'train/multiclassaccuracy_6', 'train/multiclassaccuracy_7', 'train/multiclassaccuracy_8', 'train/multiclassaccuracy_9', 'train/multiclassaccuracy_10', 'train/multiclassaccuracy_11', 'train/multiclassaccuracy_12', 'train/F1_Score', 'train/multiclassjaccardindex_0', 'train/multiclassjaccardindex_1', 'train/multiclassjaccardindex_2', 'train/multiclassjaccardindex_3', 'train/multiclassjaccardindex_4', 'train/multiclassjaccardindex_5', 'train/multiclassjaccardindex_6', 'train/multiclassjaccardindex_7', 'train/multiclassjaccardindex_8', 'train/multiclassjaccardindex_9', 'train/multiclassjaccardindex_10', 'train/multiclassjaccardindex_11', 'train/multiclassjaccardindex_12', 'train/Pixel_Accuracy', 'train/mIoU', 
'train/mIoU_Micro', 'epoch', 'step']. HINT: Did you call log('val/Multiclass_Jaccard_Index', value) in the LightningModule?
[rank0]: Traceback (most recent call last):
[rank0]: File "/mnt/scratch/users/quinnnew/Prithvi-EO-2.0-main/examples/example_multitemporalcrop_class4_linux.py", line 190, in
[rank0]: trainer.fit(model, datamodule=data_module)
[rank0]: File "/mnt/scratch/users/quinnnew/anaconda3/lib/python3.12/site-packages/lightning/pytorch/trainer/trainer.py", line 538, in fit
[rank0]: call._call_and_handle_interrupt(
[rank0]: File "/mnt/scratch/users/quinnnew/anaconda3/lib/python3.12/site-packages/lightning/pytorch/trainer/call.py", line 46, in _call_and_handle_interrupt
[rank0]: return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/mnt/scratch/users/quinnnew/anaconda3/lib/python3.12/site-packages/lightning/pytorch/strategies/launchers/subprocess_script.py", line 105, in launch
[rank0]: return function(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/mnt/scratch/users/quinnnew/anaconda3/lib/python3.12/site-packages/lightning/pytorch/trainer/trainer.py", line 574, in _fit_impl
[rank0]: self._run(model, ckpt_path=ckpt_path)
[rank0]: File "/mnt/scratch/users/quinnnew/anaconda3/lib/python3.12/site-packages/lightning/pytorch/trainer/trainer.py", line 981, in _run
[rank0]: results = self._run_stage()
[rank0]: ^^^^^^^^^^^^^^^^^
[rank0]: File "/mnt/scratch/users/quinnnew/anaconda3/lib/python3.12/site-packages/lightning/pytorch/trainer/trainer.py", line 1025, in _run_stage
[rank0]: self.fit_loop.run()
[rank0]: File "/mnt/scratch/users/quinnnew/anaconda3/lib/python3.12/site-packages/lightning/pytorch/loops/fit_loop.py", line 206, in run
[rank0]: self.on_advance_end()
[rank0]: File "/mnt/scratch/users/quinnnew/anaconda3/lib/python3.12/site-packages/lightning/pytorch/loops/fit_loop.py", line 378, in on_advance_end
[rank0]: call._call_callback_hooks(trainer, "on_train_epoch_end", monitoring_callbacks=True)
[rank0]: File "/mnt/scratch/users/quinnnew/anaconda3/lib/python3.12/site-packages/lightning/pytorch/trainer/call.py", line 218, in _call_callback_hooks
[rank0]: fn(trainer, trainer.lightning_module, *args, **kwargs)
[rank0]: File "/mnt/scratch/users/quinnnew/anaconda3/lib/python3.12/site-packages/lightning/pytorch/callbacks/model_checkpoint.py", line 325, in on_train_epoch_end
[rank0]: self._save_topk_checkpoint(trainer, monitor_candidates)
[rank0]: File "/mnt/scratch/users/quinnnew/anaconda3/lib/python3.12/site-packages/lightning/pytorch/callbacks/model_checkpoint.py", line 383, in _save_topk_checkpoint
[rank0]: raise MisconfigurationException(m)
[rank0]: lightning.fabric.utilities.exceptions.MisconfigurationException: ModelCheckpoint(monitor='val/Multiclass_Jaccard_Index') could not find the monitored key in the returned metrics: ['train/loss', 'val/loss', 'val/Accuracy', 'val/multiclassaccuracy_0', 'val/multiclassaccuracy_1', 'val/multiclassaccuracy_2', 'val/multiclassaccuracy_3', 'val/multiclassaccuracy_4', 'val/multiclassaccuracy_5', 'val/multiclassaccuracy_6', 'val/multiclassaccuracy_7', 'val/multiclassaccuracy_8', 'val/multiclassaccuracy_9', 'val/multiclassaccuracy_10', 'val/multiclassaccuracy_11', 'val/multiclassaccuracy_12', 'val/F1_Score', 'val/multiclassjaccardindex_0', 'val/multiclassjaccardindex_1', 'val/multiclassjaccardindex_2', 'val/multiclassjaccardindex_3', 'val/multiclassjaccardindex_4', 'val/multiclassjaccardindex_5', 'val/multiclassjaccardindex_6', 'val/multiclassjaccardindex_7', 'val/multiclassjaccardindex_8', 'val/multiclassjaccardindex_9', 'val/multiclassjaccardindex_10', 'val/multiclassjaccardindex_11', 'val/multiclassjaccardindex_12', 'val/Pixel_Accuracy', 'val/mIoU', 'val/mIoU_Micro', 'train/Accuracy', 'train/multiclassaccuracy_0', 'train/multiclassaccuracy_1', 'train/multiclassaccuracy_2', 'train/multiclassaccuracy_3', 'train/multiclassaccuracy_4', 'train/multiclassaccuracy_5', 'train/multiclassaccuracy_6', 'train/multiclassaccuracy_7', 'train/multiclassaccuracy_8', 'train/multiclassaccuracy_9', 'train/multiclassaccuracy_10', 'train/multiclassaccuracy_11', 'train/multiclassaccuracy_12', 'train/F1_Score', 'train/multiclassjaccardindex_0', 'train/multiclassjaccardindex_1', 'train/multiclassjaccardindex_2', 'train/multiclassjaccardindex_3', 'train/multiclassjaccardindex_4', 'train/multiclassjaccardindex_5', 'train/multiclassjaccardindex_6', 'train/multiclassjaccardindex_7', 'train/multiclassjaccardindex_8', 'train/multiclassjaccardindex_9', 'train/multiclassjaccardindex_10', 'train/multiclassjaccardindex_11', 'train/multiclassjaccardindex_12', 'train/Pixel_Accuracy', 'train/mIoU', 
'train/mIoU_Micro', 'epoch', 'step']. HINT: Did you call log('val/Multiclass_Jaccard_Index', value) in the LightningModule?
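
For reference, the exception says that the key passed to `ModelCheckpoint(monitor=...)` is not among the logged metric names. A minimal sketch of that check, using only the standard library: the helper name `check_monitor_key` is hypothetical (it is not part of Lightning or TerraTorch), and the `logged` list is a shortened copy of the keys printed in the traceback above.

```python
import difflib


def check_monitor_key(monitor, logged_keys):
    """Return (found, suggestions) for a checkpoint monitor key.

    Hypothetical helper for illustration only; it mimics the membership
    test that fails in the traceback and proposes similar logged keys.
    """
    if monitor in logged_keys:
        return True, []
    # Compare case-insensitively so that differences in capitalization
    # do not hide an otherwise similar key.
    lowered = {key.lower(): key for key in logged_keys}
    matches = difflib.get_close_matches(
        monitor.lower(), list(lowered), n=3, cutoff=0.4
    )
    return False, [lowered[m] for m in matches]


# Shortened copy of the metric names listed in the exception message.
logged = [
    "train/loss", "val/loss", "val/Accuracy", "val/F1_Score",
    "val/multiclassjaccardindex_0", "val/Pixel_Accuracy",
    "val/mIoU", "val/mIoU_Micro", "epoch", "step",
]

found, suggestions = check_monitor_key("val/Multiclass_Jaccard_Index", logged)
print(found)        # the monitored key is not in the logged list
print(suggestions)  # similar keys that were actually logged, if any
```

If an aggregate Jaccard score is what should be monitored, one of the keys that does appear in the list (for example `val/mIoU`) could presumably be passed instead, e.g. `ModelCheckpoint(monitor="val/mIoU", mode="max")`. Which key is the intended one is an assumption here, not something the log confirms.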