How to fine tune the model
Hi, I need to adapt the model to detect food-related objects. I would like to know whether it is possible to train on top of the pretrained model, and how to do it. It would also be helpful to know how my dataset has to be labeled in order to feed it to the model. Thanks.
Just use an MLP classifier. I use YOLO v.whatever
Hi there, here are some useful resources on how to fine-tune DETR:
https://huggingface.co/docs/transformers/main/en/model_doc/detr#resources
I would like to know whether it is possible to train on top of the pretrained model, and how to do it.
To fine-tune this model, I personally used the Jupyter Notebook "How to Train DETR with 🤗 Transformers on a Custom Dataset" as a guide.
It would also be helpful to know how my dataset has to be labeled in order to feed it to the model.
There are many ways to label your dataset. The approach I've taken is to use Label Studio, an open-source tool for labeling data collaboratively. You can export the labels in whatever format suits you. COCO works best for the notebook I've shared.
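For readers unfamiliar with the COCO layout mentioned above, here is a minimal sketch of what a detection annotation file contains. The file name, ids, class name, and box values are made up for illustration, and `bbox_to_corners` is a hypothetical helper, not part of any library.

```python
import json

# Hypothetical single-image dataset; every value here is illustrative only.
coco = {
    "images": [
        {"id": 0, "file_name": "pizza_001.jpg", "width": 512, "height": 512},
    ],
    "categories": [
        {"id": 0, "name": "pizza"},
    ],
    "annotations": [
        {
            "id": 0,
            "image_id": 0,        # refers to images[].id
            "category_id": 0,     # refers to categories[].id
            # COCO boxes are [x_min, y_min, width, height] in pixels.
            "bbox": [100.0, 120.0, 200.0, 150.0],
            "area": 200.0 * 150.0,
            "iscrowd": 0,
        },
    ],
}


def bbox_to_corners(bbox):
    """Convert a COCO [x, y, w, h] box to [x_min, y_min, x_max, y_max]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


# Serialize the way an exported annotation file would be stored.
coco_json = json.dumps(coco, indent=2)
corners = bbox_to_corners(coco["annotations"][0]["bbox"])
print(corners)
```

The point to remember is that COCO stores the top-left corner plus width and height, not two corners, which is a common source of mislabeled boxes when converting from other formats.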
Thanks for providing the Jupyter Notebook link. It is nice to run code in the cloud, but if the dataset is too large, Google doesn't allow long training runs.
Is it possible to just copy and paste all the code to a local PC and train the model according to personal needs?
Yup! That's what I did.
I'm trying to fine-tune with my local COCO dataset.
It has only one class, and the image size is 512×512.
The same notebook as above gives this error:
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
Hi @martiannomad! Can you provide a full traceback and minimal example? Also try to make your image contiguous, that might help (can't say more without additional information)
https://pytorch.org/docs/stable/generated/torch.Tensor.contiguous.html#torch.Tensor.contiguous
https://numpy.org/doc/stable/reference/generated/numpy.ascontiguousarray.html
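As a rough illustration of what that error message means in general, the sketch below reproduces it on a small transposed tensor and shows the two usual workarounds. It only demonstrates the contiguity concept; it is an assumption that this is related to the failure in this thread, which may have a different cause.

```python
import torch

x = torch.arange(12).reshape(3, 4)

# Transposing returns a tensor that shares storage with x but has
# permuted strides, so it is no longer contiguous in memory.
xt = x.t()
print(xt.is_contiguous())

# .view() cannot flatten this layout without copying, so it raises.
try:
    xt.view(-1)
    view_failed = False
except RuntimeError as err:
    view_failed = True
    print(err)

# Workaround 1: copy into contiguous memory first, then view.
flat_a = xt.contiguous().view(-1)

# Workaround 2: reshape, which copies only when it has to.
flat_b = xt.reshape(-1)
```

Both workarounds give the same flattened values; `reshape` is usually the less intrusive change when you control the code that calls `view`.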
Here is the complete traceback. I have only used the same notebook as above. Also, the images are 512×512 with only one class. Sure, I'll look into the images, but if you think of something while looking at the traceback, please share.
Line of Code
from pytorch_lightning import Trainer
%cd {HOME}
# settings
MAX_EPOCHS = 1
# pytorch_lightning < 2.0.0
# trainer = Trainer(gpus=1, max_epochs=MAX_EPOCHS, gradient_clip_val=0.1, accumulate_grad_batches=8, log_every_n_steps=5)
# pytorch_lightning >= 2.0.0
#
trainer = Trainer(devices=1, accelerator="mps", max_epochs=MAX_EPOCHS, gradient_clip_val=0.1, accumulate_grad_batches=8, log_every_n_steps=5)
trainer.fit(model)
: UserWarning: This is now an optional IPython functionality, setting dhist requires you to install the `pickleshare` library.
self.shell.db['dhist'] = compress_dhist(dhist)[-100:]
GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/defect_detection
| Name | Type | Params | Mode
--------------------------------------------------------
0 | model | DetrForObjectDetection | 41.5 M | eval
--------------------------------------------------------
41.3 M Trainable params
222 K Non-trainable params
41.5 M Total params
166.007 Total estimated model params size (MB)
0 Modules in train mode
399 Modules in eval mode
{
"name": "RuntimeError",
"message": "view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.",
"stack": "---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[26], line 15
8 # pytorch_lightning < 2.0.0
9 # trainer = Trainer(gpus=1, max_epochs=MAX_EPOCHS, gradient_clip_val=0.1, accumulate_grad_batches=8, log_every_n_steps=5)
10
11 # pytorch_lightning >= 2.0.0
12 #
13 trainer = Trainer(devices=1, accelerator=\"mps\", max_epochs=MAX_EPOCHS, gradient_clip_val=0.1, accumulate_grad_batches=8, log_every_n_steps=5)
---> 15 trainer.fit(model)
File /.env/lib/python3.11/site-packages/pytorch_lightning/trainer/trainer.py:538, in Trainer.fit(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
536 self.state.status = TrainerStatus.RUNNING
537 self.training = True
--> 538 call._call_and_handle_interrupt(
539 self, self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
540 )
File /.env/lib/python3.11/site-packages/pytorch_lightning/trainer/call.py:47, in _call_and_handle_interrupt(trainer, trainer_fn, *args, **kwargs)
45 if trainer.strategy.launcher is not None:
46 return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
---> 47 return trainer_fn(*args, **kwargs)
49 except _TunerExitException:
50 _call_teardown_hook(trainer)
File /.env/lib/python3.11/site-packages/pytorch_lightning/trainer/trainer.py:574, in Trainer._fit_impl(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
567 assert self.state.fn is not None
568 ckpt_path = self._checkpoint_connector._select_ckpt_path(
569 self.state.fn,
570 ckpt_path,
571 model_provided=True,
572 model_connected=self.lightning_module is not None,
573 )
--> 574 self._run(model, ckpt_path=ckpt_path)
576 assert self.state.stopped
577 self.training = False
File /.env/lib/python3.11/site-packages/pytorch_lightning/trainer/trainer.py:981, in Trainer._run(self, model, ckpt_path)
976 self._signal_connector.register_signal_handlers()
978 # ----------------------------
979 # RUN THE TRAINER
980 # ----------------------------
--> 981 results = self._run_stage()
983 # ----------------------------
984 # POST-Training CLEAN UP
985 # ----------------------------
986 log.debug(f\"{self.__class__.__name__}: trainer tearing down\")
File /.env/lib/python3.11/site-packages/pytorch_lightning/trainer/trainer.py:1025, in Trainer._run_stage(self)
1023 self._run_sanity_check()
1024 with torch.autograd.set_detect_anomaly(self._detect_anomaly):
-> 1025 self.fit_loop.run()
1026 return None
1027 raise RuntimeError(f\"Unexpected state {self.state}\")
File /.env/lib/python3.11/site-packages/pytorch_lightning/loops/fit_loop.py:205, in _FitLoop.run(self)
203 try:
204 self.on_advance_start()
--> 205 self.advance()
206 self.on_advance_end()
207 self._restarting = False
File /.env/lib/python3.11/site-packages/pytorch_lightning/loops/fit_loop.py:363, in _FitLoop.advance(self)
361 with self.trainer.profiler.profile(\"run_training_epoch\"):
362 assert self._data_fetcher is not None
--> 363 self.epoch_loop.run(self._data_fetcher)
File /.env/lib/python3.11/site-packages/pytorch_lightning/loops/training_epoch_loop.py:140, in _TrainingEpochLoop.run(self, data_fetcher)
138 while not self.done:
139 try:
--> 140 self.advance(data_fetcher)
141 self.on_advance_end(data_fetcher)
142 self._restarting = False
File /.env/lib/python3.11/site-packages/pytorch_lightning/loops/training_epoch_loop.py:250, in _TrainingEpochLoop.advance(self, data_fetcher)
247 with trainer.profiler.profile(\"run_training_batch\"):
248 if trainer.lightning_module.automatic_optimization:
249 # in automatic optimization, there can only be one optimizer
--> 250 batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs)
251 else:
252 batch_output = self.manual_optimization.run(kwargs)
File /.env/lib/python3.11/site-packages/pytorch_lightning/loops/optimization/automatic.py:183, in _AutomaticOptimization.run(self, optimizer, batch_idx, kwargs)
172 if (
173 # when the strategy handles accumulation, we want to always call the optimizer step
174 not self.trainer.strategy.handles_gradient_accumulation and self.trainer.fit_loop._should_accumulate()
(...)
180 # -------------------
181 # automatic_optimization=True: perform ddp sync only when performing optimizer_step
182 with _block_parallel_sync_behavior(self.trainer.strategy, block=True):
--> 183 closure()
185 # ------------------------------
186 # BACKWARD PASS
187 # ------------------------------
188 # gradient update with accumulated gradients
189 else:
190 self._optimizer_step(batch_idx, closure)
File /.env/lib/python3.11/site-packages/pytorch_lightning/loops/optimization/automatic.py:144, in Closure.__call__(self, *args, **kwargs)
142 @override
143 def __call__(self, *args: Any, **kwargs: Any) -> Optional[Tensor]:
--> 144 self._result = self.closure(*args, **kwargs)
145 return self._result.loss
File /.env/lib/python3.11/site-packages/torch/utils/_contextlib.py:116, in context_decorator.<locals>.decorate_context(*args, **kwargs)
113 @functools.wraps(func)
114 def decorate_context(*args, **kwargs):
115 with ctx_factory():
--> 116 return func(*args, **kwargs)
File /.env/lib/python3.11/site-packages/pytorch_lightning/loops/optimization/automatic.py:138, in Closure.closure(self, *args, **kwargs)
135 self._zero_grad_fn()
137 if self._backward_fn is not None and step_output.closure_loss is not None:
--> 138 self._backward_fn(step_output.closure_loss)
140 return step_output
File /.env/lib/python3.11/site-packages/pytorch_lightning/loops/optimization/automatic.py:239, in _AutomaticOptimization._make_backward_fn.<locals>.backward_fn(loss)
238 def backward_fn(loss: Tensor) -> None:
--> 239 call._call_strategy_hook(self.trainer, \"backward\", loss, optimizer)
File /.env/lib/python3.11/site-packages/pytorch_lightning/trainer/call.py:319, in _call_strategy_hook(trainer, hook_name, *args, **kwargs)
316 return None
318 with trainer.profiler.profile(f\"[Strategy]{trainer.strategy.__class__.__name__}.{hook_name}\"):
--> 319 output = fn(*args, **kwargs)
321 # restore current_fx when nested context
322 pl_module._current_fx_name = prev_fx_name
File /.env/lib/python3.11/site-packages/pytorch_lightning/strategies/strategy.py:212, in Strategy.backward(self, closure_loss, optimizer, *args, **kwargs)
209 assert self.lightning_module is not None
210 closure_loss = self.precision_plugin.pre_backward(closure_loss, self.lightning_module)
--> 212 self.precision_plugin.backward(closure_loss, self.lightning_module, optimizer, *args, **kwargs)
214 closure_loss = self.precision_plugin.post_backward(closure_loss, self.lightning_module)
215 self.post_backward(closure_loss)
File /.env/lib/python3.11/site-packages/pytorch_lightning/plugins/precision/precision.py:72, in Precision.backward(self, tensor, model, optimizer, *args, **kwargs)
52 @override
53 def backward( # type: ignore[override]
54 self,
(...)
59 **kwargs: Any,
60 ) -> None:
61 r\"\"\"Performs the actual backpropagation.
62
63 Args:
(...)
70
71 \"\"\"
---> 72 model.backward(tensor, *args, **kwargs)
File /.env/lib/python3.11/site-packages/pytorch_lightning/core/module.py:1101, in LightningModule.backward(self, loss, *args, **kwargs)
1099 self._fabric.backward(loss, *args, **kwargs)
1100 else:
-> 1101 loss.backward(*args, **kwargs)
File /.env/lib/python3.11/site-packages/torch/_tensor.py:581, in Tensor.backward(self, gradient, retain_graph, create_graph, inputs)
571 if has_torch_function_unary(self):
572 return handle_torch_function(
573 Tensor.backward,
574 (self,),
(...)
579 inputs=inputs,
580 )
--> 581 torch.autograd.backward(
582 self, gradient, retain_graph, create_graph, inputs=inputs
583 )
File /.env/lib/python3.11/site-packages/torch/autograd/__init__.py:347, in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
342 retain_graph = create_graph
344 # The reason we repeat the same comment below is that
345 # some Python versions print out the first line of a multi-line function
346 # calls in the traceback and some print out the last line
--> 347 _engine_run_backward(
348 tensors,
349 grad_tensors_,
350 retain_graph,
351 create_graph,
352 inputs,
353 allow_unreachable=True,
354 accumulate_grad=True,
355 )
File /.env/lib/python3.11/site-packages/torch/autograd/graph.py:825, in _engine_run_backward(t_outputs, *args, **kwargs)
823 unregister_hooks = _register_logging_hooks_on_whole_graph(t_outputs)
824 try:
--> 825 return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
826 t_outputs, *args, **kwargs
827 ) # Calls into the C++ engine to run the backward pass
828 finally:
829 if attach_logging_hooks:
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead."
}
The traceback is not that useful; I can't identify the cause from it... Let me know if you identify the reason. Did you try other models, e.g. RT-DETR? Or other transformers / lightning versions?
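When comparing library versions as suggested, it can help to post the exact versions installed. A minimal sketch using only the standard library follows; `installed_versions` is a hypothetical helper, and the distribution names passed to it are assumptions about how the packages were installed.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names are assumptions; adjust them to your environment.
report = installed_versions(["torch", "transformers", "pytorch-lightning"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Pasting this output alongside a traceback makes it easier for others to reproduce the problem.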
I tried YOLO with a YOLO-format dataset and a YOLO OBB dataset, and YOLO works fine.
The data was labeled in Label Studio with orientation and exported from there in COCO format. Nonetheless, the annotations work perfectly in the other notebook cells, which suggests the dataset and annotations are completely aligned. What are your thoughts?
Also, no, I have not tried RT-DETR. What models do you recommend trying other than YOLO? Any material or code that already does the fine-tuning, so I don't have to write it myself, would be helpful.
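Since the labels were created with orientation, one thing that may be worth ruling out is whether the exported boxes are plain axis-aligned `[x, y, width, height]` boxes that fit inside each image. The sketch below is a generic sanity check, not a diagnosis of the error above; `check_coco` is a hypothetical helper and the sample data is made up.

```python
def check_coco(coco):
    """Return a list of human-readable problems found in a COCO-style dict."""
    problems = []
    images = {img["id"]: img for img in coco.get("images", [])}
    category_ids = {cat["id"] for cat in coco.get("categories", [])}

    for ann in coco.get("annotations", []):
        ann_id = ann.get("id")
        image = images.get(ann.get("image_id"))
        if image is None:
            problems.append(f"annotation {ann_id}: unknown image_id")
            continue
        if ann.get("category_id") not in category_ids:
            problems.append(f"annotation {ann_id}: unknown category_id")
        bbox = ann.get("bbox", [])
        if len(bbox) != 4:
            problems.append(f"annotation {ann_id}: bbox must have 4 values")
            continue
        x, y, w, h = bbox
        if w <= 0 or h <= 0:
            problems.append(f"annotation {ann_id}: non-positive width or height")
        if x < 0 or y < 0 or x + w > image["width"] or y + h > image["height"]:
            problems.append(f"annotation {ann_id}: bbox outside image bounds")
    return problems


# Made-up sample: the second annotation extends past the 512x512 image.
sample = {
    "images": [{"id": 0, "file_name": "a.jpg", "width": 512, "height": 512}],
    "categories": [{"id": 0, "name": "defect"}],
    "annotations": [
        {"id": 0, "image_id": 0, "category_id": 0, "bbox": [10, 10, 100, 50]},
        {"id": 1, "image_id": 0, "category_id": 0, "bbox": [500, 500, 40, 40]},
    ],
}
issues = check_coco(sample)
print(issues)
```

To run it on a real export, load the annotation file with `json.load` and pass the resulting dict to `check_coco`.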