
Bug with the LSTMCell's inside the decoder part of the SAR_Resnet31 model in both backends (TF and PT) #1411

Closed
Tracked by #1074
bowentkruse opened this issue Dec 20, 2023 · 0 comments · Fixed by #1513
Assignees
Labels
critical High priority framework: pytorch Related to PyTorch backend framework: tensorflow Related to TensorFlow backend module: models Related to doctr.models topic: text recognition Related to the task of text recognition type: bug Something isn't working
Milestone

Comments

@bowentkruse

Bug description

This bug report relates to Q&A Discussion Post 1410.

A sar_resnet31 model was trained on a custom dataset using train_pytorch.py. During training validation, the model reached the low 90% range for both exact and partial match. Using the same script's --test-only flag, the model achieved 89% on both partial and exact match. However, when the model was used for inference in several different ways (see the code snippet section below), it was unable to produce any complete matches.

I posted this topic in Q&A Discussion Post 1410, and @felixT2K recommended that I create a bug report. @felixT2K suspected a bug with the LSTMCells inside the decoder part of the model in both backends (TF and PT).

Code snippet to reproduce the bug

Using the --test-only flag of train_pytorch.py:

python train_pytorch.py sar_resnet31 --test-only --resume sar_resnet31_20231208-145408.pt --val_path=stencils-1/test --vocab english --pretrained --input_size 48 -b 64

Output: Validation loss: 0.0948886 (Exact: 92.81% | Partial: 92.81%)

Using the method recommended in the mindee docs (see the original Q&A post for more detail):

import torch
from doctr.datasets import VOCABS
from doctr.models import sar_resnet31
from doctr.models.preprocessor import PreProcessor
from doctr.models.recognition.predictor import RecognitionPredictor

# Build the architecture and load the trained checkpoint
reco_model = sar_resnet31(pretrained=False, pretrained_backbone=False, vocab=VOCABS['ascii_letters'])
reco_params = torch.load("Weights/sar_resnet31_20231212-132905.pt", map_location="cpu")
reco_model.load_state_dict(reco_params)

# Wrap the model in a predictor with the same preprocessing as training
reco_predictor = RecognitionPredictor(
    PreProcessor(
        (48, 48 * 4),
        preserve_aspect_ratio=True,
        batch_size=16,
        mean=(0.694, 0.695, 0.693),
        std=(0.299, 0.296, 0.301),
    ),
    reco_model,
)
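For context, preserve_aspect_ratio=True means each crop is scaled to fit inside the (48, 192) target without distortion before padding. A minimal sketch of the implied resize math (the function name is hypothetical; this is not doctr's exact implementation):

```python
def resized_shape(h: int, w: int, target_h: int = 48, target_w: int = 192) -> tuple:
    """Return the (height, width) an image is scaled to before padding,
    keeping its aspect ratio while fitting inside the target box."""
    scale = min(target_h / h, target_w / w)
    return max(1, round(h * scale)), max(1, round(w * scale))

print(resized_shape(32, 100))  # a 32x100 crop -> (48, 150), then padded to 48x192
```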

Attempting to recreate the way train_pytorch.py loads and uses the model (see the original Q&A post for more detail):

from typing import List

import numpy as np
import torch
from torchvision.transforms import Compose, Normalize

from doctr import transforms as T
from doctr.datasets import VOCABS
from doctr.io import read_img_as_tensor, tensor_from_numpy
from doctr.models import recognition

transform_pipeline = Compose([
    T.Resize((48, 48 * 4), preserve_aspect_ratio=True),
    Normalize(mean=(0.694, 0.695, 0.693), std=(0.299, 0.296, 0.301)),
])

# Load model architecture and state of given checkpoint
def load_model(arch, vocab, checkpoint):
    model = recognition.__dict__[arch](pretrained=False, pretrained_backbone=False, vocab=VOCABS[vocab])
    model_checkpoint = torch.load(checkpoint, map_location='cpu')
    model.load_state_dict(model_checkpoint)
    if torch.cuda.is_available():
        # Map to single GPU
        torch.cuda.set_device(0)
        model = model.cuda()
    return model

def infer(batch, model):
    model.eval()
    if torch.cuda.is_available():
        batch = batch.cuda()
    with torch.no_grad():
        predictions = model(batch)
    return predictions

def preprocess_images(image_paths: List) -> torch.Tensor:
    processed_images = []
    for image_path in image_paths:
        # Accept either an in-memory array or a path on disk
        img = (tensor_from_numpy(image_path, dtype=torch.float32)
               if isinstance(image_path, np.ndarray)
               else read_img_as_tensor(image_path, dtype=torch.float32))
        img = transform_pipeline(img)
        processed_images.append(img)
    return torch.stack(processed_images)
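For reference, turning raw recognition output into strings requires a decoding step, which is where a broken decoder LSTMCell would surface as garbled predictions. A minimal, self-contained sketch of greedy decoding over per-step logits (toy vocab and values; not doctr's implementation):

```python
VOCAB = "abc"
EOS = len(VOCAB)  # index reserved for the <eos> token

def greedy_decode(step_logits):
    """step_logits: one score list per decoding step, each of size len(VOCAB) + 1.
    Pick the argmax at every step and stop at the first <eos>."""
    chars = []
    for scores in step_logits:
        idx = max(range(len(scores)), key=scores.__getitem__)
        if idx == EOS:
            break
        chars.append(VOCAB[idx])
    return "".join(chars)

toy_logits = [
    [0.1, 0.8, 0.05, 0.05],  # argmax 1 -> 'b'
    [0.7, 0.1, 0.1, 0.1],    # argmax 0 -> 'a'
    [0.05, 0.05, 0.1, 0.8],  # argmax 3 -> <eos>, stop
]
print(greedy_decode(toy_logits))  # -> "ba"
```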

Error traceback

No specific error, just unexpected behavior from model implementations.

Environment

Collecting environment information...

DocTR version: v0.7.0
TensorFlow version: N/A
PyTorch version: 2.1.1+cu121 (torchvision 0.16.1+cu121)
OpenCV version: 4.8.1
OS: Ubuntu 22.04.3 LTS
Python version: 3.10.12
Is CUDA available (TensorFlow): N/A
Is CUDA available (PyTorch): Yes
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: NVIDIA RTX A4000 Laptop GPU
Nvidia driver version: 530.30.02
cuDNN version: Could not collect

Deep Learning backend

>>> from doctr.file_utils import is_tf_available, is_torch_available
>>> print(f"is_tf_available: {is_tf_available()}")
is_tf_available: False
>>> print(f"is_torch_available: {is_torch_available()}")
is_torch_available: True
@bowentkruse bowentkruse added the type: bug Something isn't working label Dec 20, 2023
@felixdittrich92 felixdittrich92 added critical High priority module: models Related to doctr.models framework: pytorch Related to PyTorch backend framework: tensorflow Related to TensorFlow backend topic: text recognition Related to the task of text recognition labels Dec 20, 2023
@felixdittrich92 felixdittrich92 added this to the 0.9.0 milestone Dec 20, 2023
@felixdittrich92 felixdittrich92 changed the title Bug with the LSTMCell's inside the decoder part of the model in both backends (TF and PT) Bug with the LSTMCell's inside the decoder part of the SAR_Resnet31 model in both backends (TF and PT) Dec 21, 2023
@felixdittrich92 felixdittrich92 self-assigned this Feb 9, 2024