Unsatisfactory training results on SwinIR #166

Open
DX3906ghh opened this issue Nov 13, 2024 · 4 comments

Comments

@DX3906ghh

After preparing the dataset and environment, I followed the BasicSR steps to train SwinIR, but I didn't get good results. On x4, the PSNR on Set5 is only 30.6348, whereas the paper reports 32.72.
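
When comparing a number like this against a paper, it can help to confirm that the evaluation protocol matches. Below is a minimal sketch of PSNR for a single-channel image, assuming an 8-bit value range. The `border` parameter is a hypothetical knob added for illustration: super-resolution benchmarks often crop a few border pixels and evaluate on the luminance channel only, and a mismatch there can shift the score. Whether that explains the gap reported here is not established by this thread.

```python
import math


def psnr(reference, restored, max_value=255.0, border=0):
    """PSNR in dB between two equally sized 2-D images given as lists of rows.

    `border` pixels are skipped on every side (illustrative assumption).
    """
    height, width = len(reference), len(reference[0])
    squared_error = 0.0
    count = 0
    for y in range(border, height - border):
        for x in range(border, width - border):
            diff = float(reference[y][x]) - float(restored[y][x])
            squared_error += diff * diff
            count += 1
    if count == 0:
        raise ValueError("border crop removed every pixel")
    mse = squared_error / count
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Running the same function with and without the crop on the same pair of images shows how much the protocol alone moves the result.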

@clodiamery

@DX3906ghh, could you please help me with how to train the code on my own data?

@DX3906ghh
Author

@DX3906ghh, could you please help me with how to train the code on my own data?

Of course, OK.

@clodiamery

clodiamery commented Jan 14, 2025

Thank you so much for your reply.

1- What computing resources do I need to train it on my own dataset of 800 grayscale images, 250*250 each?

2- I uploaded the code to my Drive to run it in Google Colab.
I updated the JSON file to look like this:

"datasets": {
    "train": {
      "name": "train_dataset"           // just name
      , "dataset_type": "jpeg"         // "dncnn" | "dnpatch" | "fdncnn" | "ffdnet" | "sr" | "srmd" | "dpsr" | "plain" | "plainpatch" | "jpeg"
      , "dataroot_H": "train/hr"// path of H training dataset. DIV2K (800 training images) + Flickr2K (2650 images) + BSD500 (400 training&testing images) + WED(4744 images) in SwinIR
      , "dataroot_L": "train/lr"   //null      // path of L training dataset

And I used this command to run it:
!python -m torch.distributed.launch --nproc_per_node=1 --master_port=1234 /content/drive/MyDrive/colab_notebooks/SwinIR-main/main_train_psnr.py --opt /content/drive/MyDrive/colab_notebooks/SwinIR-main/options/swinir/train_swinir_car_jpeg.json --dist True

But I got this error:

/usr/local/lib/python3.10/dist-packages/torch/distributed/launch.py:208: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use-env is set by default in torchrun.
If your script expects `--local-rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See 
https://pytorch.org/docs/stable/distributed.html#launch-utility for 
further instructions

  main()
----project SwinIR------
usage: main_train_psnr.py [-h] [--opt OPT] [--launcher LAUNCHER] [--local_rank LOCAL_RANK]
                          [--dist DIST] [--task TASK] [--scale SCALE] [--noise NOISE]
                          [--jpeg JPEG] [--folder_lq FOLDER_LQ] [--folder_gt FOLDER_GT]
                          [--model_save_dir MODEL_SAVE_DIR] [--chart_save_dir CHART_SAVE_DIR]
main_train_psnr.py: error: unrecognized arguments: --local-rank=0
E0114 10:51:36.380000 3268 torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: 2) local_rank: 0 (pid: 3280) of binary: /usr/bin/python3
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launch.py", line 208, in <module>
    main()
  File "/usr/local/lib/python3.10/dist-packages/typing_extensions.py", line 2853, in wrapper
    return arg(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launch.py", line 204, in main
    launch(args)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launch.py", line 189, in launch
    run(args)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 910, in run
    elastic_launch(
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 138, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
/content/drive/MyDrive/colab_notebooks/SwinIR-main/main_train_psnr.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2025-01-14_10:51:36
  host      : 443d37f8eccf
  rank      : 0 (local_rank: 0)
  exitcode  : 2 (pid: 3280)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
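
The usage line in the log lists `--local_rank` (underscore), while the launcher passed `--local-rank=0` (hyphen), which is why argparse rejects it. A minimal sketch of one possible workaround follows, assuming the script builds its options with `argparse`: register both spellings for the same destination, and fall back to the `LOCAL_RANK` environment variable that the deprecation warning mentions. The `parse_args` helper and the reduced option set are illustrative, not the actual contents of `main_train_psnr.py`.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse a reduced, illustrative subset of the training options."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--opt", type=str, default=None,
                        help="path to the option JSON file")
    # Accept both spellings: older launchers pass --local_rank,
    # newer ones pass --local-rank. Default to the LOCAL_RANK
    # environment variable, or 0 when it is not set.
    parser.add_argument("--local_rank", "--local-rank", dest="local_rank",
                        type=int,
                        default=int(os.environ.get("LOCAL_RANK", 0)))
    parser.add_argument("--dist", default=False)
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args(["--opt", "options.json", "--local-rank=0"])
    print(args.local_rank)
```

With the environment-variable fallback in place, the same parser should also work when no rank flag is passed on the command line at all, as the warning says is the case with `torchrun`.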

@mearcla

mearcla commented Feb 22, 2025

@DX3906ghh, I have a question. I trained with train_swinir_car_jpeg.json on my dataset, and the models I got are:

  • m1_E.pth
  • m2_G.pth
  • m3_optimizerG.pth

What is the difference between them, and which one should I use to enhance my images?
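
The suffixes look like the ones used by KAIR-style training code, so the following is an assumption rather than something confirmed in this thread: `_G` would be the current network weights, `_E` an exponential moving average (EMA) of those weights, and `_optimizerG` the optimizer state, which is only needed to resume training. Under that assumption, either `_E` or `_G` would be the file to load for inference. A minimal sketch of what an EMA update does, using plain floats in place of tensors and a hypothetical `ema_update` helper:

```python
def ema_update(ema_weights, current_weights, decay=0.999):
    """Blend current weights into the running average, one step.

    Both arguments are dicts mapping parameter names to floats
    (stand-ins for tensors in this illustration).
    """
    return {
        name: decay * ema_weights[name] + (1.0 - decay) * current_weights[name]
        for name in current_weights
    }


if __name__ == "__main__":
    ema = {"w": 0.0}
    for step_weights in ({"w": 1.0}, {"w": 1.0}, {"w": 1.0}):
        ema = ema_update(ema, step_weights, decay=0.9)
    print(ema)
```

Because the average changes slowly, it smooths out step-to-step noise in the weights, which is the usual reason for keeping a separate averaged copy.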
