
Bug with the LSTMCell's inside the decoder part of the SAR_Resnet31 model in both backends (TF and PT) #1411

Closed
Tracked by #1074
bowentkruse opened this issue Dec 20, 2023 · 0 comments · Fixed by #1513
Assignees
Labels
critical High priority framework: pytorch Related to PyTorch backend framework: tensorflow Related to TensorFlow backend module: models Related to doctr.models topic: text recognition Related to the task of text recognition type: bug Something isn't working
Milestone

Comments

@bowentkruse

Bug description

This bug report relates to Q&A Discussion Post 1410.

A sar_resnet31 model was trained on a custom dataset using train_pytorch.py. During training validation, the model reached the low 90% range for both exact and partial match. Using the same script's --test-only flag, the model achieved 89% on both partial and exact match. However, when the model was used for inference in several different ways (see the code snippet section below), it was unable to produce any complete matches.

I posted this topic in Q&A Discussion Post 1410, and @felixT2K recommended that I create a bug report. @felixT2K suspected a bug with the LSTMCells inside the decoder part of the model in both backends (TF and PT).

Code snippet to reproduce the bug

Using the --test-only flag of train_pytorch.py:

python train_pytorch.py sar_resnet31 --test-only --resume sar_resnet31_20231208-145408.pt --val_path=stencils-1/test --vocab english --pretrained --input_size 48 -b 64

Output: Validation loss: 0.0948886 (Exact: 92.81% | Partial: 92.81%)

Using the method recommended in the mindee docs (see the original Q&A post for more detail):

import torch
from doctr.datasets import VOCABS
from doctr.models import sar_resnet31
from doctr.models.preprocessor import PreProcessor
from doctr.models.recognition.predictor import RecognitionPredictor

# Build the architecture and load the trained checkpoint
reco_model = sar_resnet31(pretrained=False, pretrained_backbone=False, vocab=VOCABS['ascii_letters'])
reco_params = torch.load("Weights/sar_resnet31_20231212-132905.pt", map_location="cpu")
reco_model.load_state_dict(reco_params)

# Wrap the model in a predictor with the same preprocessing as training
reco_predictor = RecognitionPredictor(
    PreProcessor(
        (48, 48 * 4),
        preserve_aspect_ratio=True,
        batch_size=16,
        mean=(0.694, 0.695, 0.693),
        std=(0.299, 0.296, 0.301),
    ),
    reco_model,
)
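For context, preserve_aspect_ratio=True means each crop is scaled to fit inside the (48, 192) target without distortion before padding. A minimal sketch of the implied resize math (the function name is hypothetical; this is not doctr's exact implementation):

```python
def resized_shape(h: int, w: int, target_h: int = 48, target_w: int = 192) -> tuple:
    """Return the (height, width) an image is scaled to before padding,
    keeping its aspect ratio while fitting inside the target box."""
    scale = min(target_h / h, target_w / w)
    return max(1, round(h * scale)), max(1, round(w * scale))

print(resized_shape(32, 100))  # a 32x100 crop -> (48, 150), then padded to 48x192
```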

Attempting to recreate the way train_pytorch.py loads and uses the model (see the original Q&A post for more detail):

from typing import List

import numpy as np
import torch
from torchvision.transforms import Compose, Normalize

from doctr import transforms as T
from doctr.datasets import VOCABS
from doctr.io import read_img_as_tensor, tensor_from_numpy
from doctr.models import recognition

transform_pipeline = Compose([
    T.Resize((48, 48 * 4), preserve_aspect_ratio=True),
    Normalize(mean=(0.694, 0.695, 0.693), std=(0.299, 0.296, 0.301)),
])

# Load model architecture and state of given checkpoint
def load_model(arch, vocab, checkpoint):
    model = recognition.__dict__[arch](pretrained=False, pretrained_backbone=False, vocab=VOCABS[vocab])
    model_checkpoint = torch.load(checkpoint, map_location='cpu')
    model.load_state_dict(model_checkpoint)
    if torch.cuda.is_available():
        # Map to single GPU
        torch.cuda.set_device(0)
        model = model.cuda()
    return model

def infer(batch, model):
    model.eval()
    if torch.cuda.is_available():
        batch = batch.cuda()
    with torch.no_grad():
        predictions = model(batch)
    return predictions

def preprocess_images(image_paths: List) -> torch.Tensor:
    processed_images = []
    for image_path in image_paths:
        # Accept either an in-memory array or a path on disk
        img = (tensor_from_numpy(image_path, dtype=torch.float32)
               if isinstance(image_path, np.ndarray)
               else read_img_as_tensor(image_path, dtype=torch.float32))
        img = transform_pipeline(img)
        processed_images.append(img)
    return torch.stack(processed_images)
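For reference, turning raw recognition output into strings requires a decoding step, which is where a broken decoder LSTMCell would surface as garbled predictions. A minimal, self-contained sketch of greedy decoding over per-step logits (toy vocab and values; not doctr's implementation):

```python
VOCAB = "abc"
EOS = len(VOCAB)  # index reserved for the <eos> token

def greedy_decode(step_logits):
    """step_logits: one score list per decoding step, each of size len(VOCAB) + 1.
    Pick the argmax at every step and stop at the first <eos>."""
    chars = []
    for scores in step_logits:
        idx = max(range(len(scores)), key=scores.__getitem__)
        if idx == EOS:
            break
        chars.append(VOCAB[idx])
    return "".join(chars)

toy_logits = [
    [0.1, 0.8, 0.05, 0.05],  # argmax 1 -> 'b'
    [0.7, 0.1, 0.1, 0.1],    # argmax 0 -> 'a'
    [0.05, 0.05, 0.1, 0.8],  # argmax 3 -> <eos>, stop
]
print(greedy_decode(toy_logits))  # -> "ba"
```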

Error traceback

No specific error, just unexpected behavior from model implementations.

Environment

Collecting environment information...

DocTR version: v0.7.0
TensorFlow version: N/A
PyTorch version: 2.1.1+cu121 (torchvision 0.16.1+cu121)
OpenCV version: 4.8.1
OS: Ubuntu 22.04.3 LTS
Python version: 3.10.12
Is CUDA available (TensorFlow): N/A
Is CUDA available (PyTorch): Yes
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: NVIDIA RTX A4000 Laptop GPU
Nvidia driver version: 530.30.02
cuDNN version: Could not collect

Deep Learning backend

>>> from doctr.file_utils import is_tf_available, is_torch_available
>>> print(f"is_tf_available: {is_tf_available()}")
is_tf_available: False
>>> print(f"is_torch_available: {is_torch_available()}")
is_torch_available: True
@bowentkruse bowentkruse added the type: bug Something isn't working label Dec 20, 2023
@felixdittrich92 felixdittrich92 added critical High priority module: models Related to doctr.models framework: pytorch Related to PyTorch backend framework: tensorflow Related to TensorFlow backend topic: text recognition Related to the task of text recognition labels Dec 20, 2023
@felixdittrich92 felixdittrich92 added this to the 0.9.0 milestone Dec 20, 2023
@felixdittrich92 felixdittrich92 changed the title Bug with the LSTMCell's inside the decoder part of the model in both backends (TF and PT) Bug with the LSTMCell's inside the decoder part of the SAR_Resnet31 model in both backends (TF and PT) Dec 21, 2023
@felixdittrich92 felixdittrich92 self-assigned this Feb 9, 2024