-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pages in Image-list.txt are not guaranteed to be in order. #76
Comments
As long as the new png files are created in the right order, reading the directory will typically return the same order. But you are right, there is no gurantee. That only might explain why it does not occur often and why nobody reported it up to now. A PR is very welcome. |
It looks like src/chrome/content/zoteroocr.js for Zotero 6 uses a different implementation which should work fine, so only src/zotero-ocr.js for Zotero 7 needs a patch (either sort the directory entries or use the same code as for Zotero 6). |
Thanks for the PR! |
You're right, at any rate, the PNGs should be created one-by-one. Seems like IOUtils.getChildren isn't concerned with any order. Might also differ based on the platform-implementation. |
Hiya, |
There is do direct fix for this behaviour, you'd have to apply the code changes I submitted in #77 and rebuild the plugin yourself. The easiest way -- if you are not tech-savy -- would be to fork the repo on my profile and just follow the instructions in the README on how to build and install the plugin. |
I have pushed a simpler fix (I hope) for this problem. @callisto-jovy could you test 28474b5 and tell us if it works on your system? |
@aaajjjsss, maybe you also want to test whether aborel's latest fix works for you and report your results here before I create a bugfix release. |
It works beautifully! Thank you all so much. :) |
The issue should be fixed with our new release 0.7.3. |
After finding that a lot of my OCR'd PDFs had pages that were out of oder, I checked the output generated by the plugin and found this:
Example of the output generated:
As you can clearly see, the pages are completely out of order.
The issue is caused by iterating over the directorie's contents and simply performing a pattern-match. This, however, does not guarantee for the items to be input into the array in order. Now there are two possibilities: Sort the array after the fact or populate the array while extracting the PNGs. If you'd like me to, I will open a PR to resolve this issue.
I'm just surprised that seemingly noone has had this issue before me.
The text was updated successfully, but these errors were encountered: