Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pages in Image-list.txt are not guaranteed to be in order. #76

Closed
callisto-jovy opened this issue Jun 19, 2024 · 10 comments
Closed

Pages in Image-list.txt are not guaranteed to be in order. #76

callisto-jovy opened this issue Jun 19, 2024 · 10 comments

Comments

@callisto-jovy
Copy link

After finding that a lot of my OCR'd PDFs had pages that were out of oder, I checked the output generated by the plugin and found this:

Example of the output generated:

/Users/username/Zotero/storage/2GESJB9C/page-15.png
/Users/username/Zotero/storage/2GESJB9C/page-01.png
/Users/username/Zotero/storage/2GESJB9C/page-14.png
/Users/username/Zotero/storage/2GESJB9C/page-02.png
/Users/username/Zotero/storage/2GESJB9C/page-16.png
/Users/username/Zotero/storage/2GESJB9C/page-17.png
...

As you can clearly see, the pages are completely out of order.
The issue is caused by iterating over the directorie's contents and simply performing a pattern-match. This, however, does not guarantee for the items to be input into the array in order. Now there are two possibilities: Sort the array after the fact or populate the array while extracting the PNGs. If you'd like me to, I will open a PR to resolve this issue.

I'm just surprised that seemingly noone has had this issue before me.

@stweil
Copy link
Member

stweil commented Jun 19, 2024

As long as the new png files are created in the right order, reading the directory will typically return the same order. But you are right, there is no gurantee. That only might explain why it does not occur often and why nobody reported it up to now.

A PR is very welcome.

@stweil
Copy link
Member

stweil commented Jun 19, 2024

It looks like src/chrome/content/zoteroocr.js for Zotero 6 uses a different implementation which should work fine, so only src/zotero-ocr.js for Zotero 7 needs a patch (either sort the directory entries or use the same code as for Zotero 6).

@aborel
Copy link
Collaborator

aborel commented Jun 19, 2024

Thanks for the PR!
I don't have time to review right now, and there are more changes than I would like due to different JS style conventions, but the proposed logic is probably fine. If another admin wants to accept the PR as is and deploy the bugfix ASAP, I can live with that.

@callisto-jovy
Copy link
Author

As long as the new png files are created in the right order, reading the directory will typically return the same order. But you are right, there is no gurantee. That only might explain why it does not occur often and why nobody reported it up to now.

A PR is very welcome.

You're right, at any rate, the PNGs should be created one-by-one. Seems like IOUtils.getChildren isn't concerned with any order. Might also differ based on the platform-implementation.

@aaajjjsss
Copy link

Hiya,
Wondering if anyone has any insights on how to fix this? I too am finding pdf pages out of order when using the plugin with Zotero 7. It worked so well with Zotero 6.
I am very out of my element with all of this stuff, so apologies if the fix is noted above and I just can't understand it.

@callisto-jovy
Copy link
Author

Hiya, Wondering if anyone has any insights on how to fix this? I too am finding pdf pages out of order when using the plugin with Zotero 7. It worked so well with Zotero 6. I am very out of my element with all of this stuff, so apologies if the fix is noted above and I just can't understand it.

There is do direct fix for this behaviour, you'd have to apply the code changes I submitted in #77 and rebuild the plugin yourself. The easiest way -- if you are not tech-savy -- would be to fork the repo on my profile and just follow the instructions in the README on how to build and install the plugin.

@aborel
Copy link
Collaborator

aborel commented Jul 8, 2024

I have pushed a simpler fix (I hope) for this problem. @callisto-jovy could you test 28474b5 and tell us if it works on your system?

@stweil
Copy link
Member

stweil commented Jul 8, 2024

@aaajjjsss, maybe you also want to test whether aborel's latest fix works for you and report your results here before I create a bugfix release.

@aaajjjsss
Copy link

It works beautifully! Thank you all so much. :)

@stweil
Copy link
Member

stweil commented Jul 8, 2024

The issue should be fixed with our new release 0.7.3.

@stweil stweil closed this as completed Jul 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants