Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New and improved Python bindings #1629

Merged
merged 27 commits into from
Mar 8, 2024
Merged

Conversation

elicn
Copy link
Contributor

@elicn elicn commented Jun 8, 2022

Re-implemented Unicorn Python bindings while remaining compatible with the original bindings API.

Highlights:

  • OOP design for improved efficiency and extensibility
  • Pythonic code featuring type annotations to facilitate its usage
  • Cleaner code and documented methods
  • Lower memory footprint
  • Improved Uc resources deallocation

Changes:

  • UcError now provides additional information, when applicable, to provide the users with better error context (e.g. the target address and access size for invalid memory accesses)
  • Intel arch now supports reading and writing ZMM registers
  • ARM arch now supports reading and writing Q registers

Copy link
Member

@wtdcode wtdcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all, thanks for your brilliant work!

The overall design fits my taste but there are a few problems to be fixed. See my comments below.

if dllname not in loaded_dlls:
lib_file = path / dllname

if str(path.parent) == '.' or lib_file.exists():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

str(path.parent) == '.' seems not equavelent to original logic? I suppose it should be str(path).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understood it, the original logic was trying to figure out whether this is a relative library path (i.e. either a filename without a parent directory, or just not an absolute path). Testing for parent == "." is the way to determine this. Did you encounter any problem loading the library this way? (I admit I haven't tested it thoroughtly)

I am not fully comfortable with how the library lookup code ended up, but considering the previous implementation I think it looks very nice after all..


def __new__(cls, arch: int, mode: int):
# prevent direct instantiation of unicorn subclasses
assert cls is Uc, f'{cls.__name__} is not meant to be instantiated directly'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kills the possibility of inheriting Uc class, which I did in implementing multitask. A quick PoC:

from unicorn import *

class A(Uc):
    
    def __init__(self, arch: int, mode: int):
        super().__init__(arch, mode)
        

a = A(UC_ARCH_X86, UC_MODE_32)

I understand you are trying to return a subclass to handle arch-specific things, however I doubt why not choose pimpl design? e.g. Uc holds a reference to an arch-specific object instead of being a subclass. IMHO, hacking with __new__ is a bit dangerous, especially we care about the compatibility.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not think of any reasonable usecase for extending the Uc class, but if you think it is necessary I can patch it to allow user-defined subclasses. Regarding the multitask implementation, I don't know how it looks like - but perhaps we could integrate this into this model, still "hiding" the subclasses from the user..?

About using __new__ to create the required subclass: that is not a hack - that is a common way to use this special method. The idea here was to organize everything in OOP (with all the benefits coming from it) and at the same time don't let the user be bothered about which class to instantiate - they just init Uc, which picks the right subclass for them. The user should not be aware (or care) of the specific subclass.

The incapsulation in the PIMPL design makes things more complicated and packs the code with trivial wrapper calls (i.e. forwarding the the exact same parameters to a method with exact same name). This design pattern is more reasonable for inflexible languages like C++, rather than Python. However, I don't see how it helps with the multithreading implementation because we'll still have a predefined list of subclasses to incapsulate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This multitask code is here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I played around with this code quite a bit in order to find a way to keep the clean OOP approach and still allow users to extend the Uc class. As far as I can tell we have two options for that:

  1. Require external Uc subclasses to be implemented as "pimpl", assuming this is probably a rare use case anyway. Looking at the multitask code, it seems that such a change would not affect it too much.
  2. Require external Uc subclasses to be decorated with something like @ucsubclass. That way is a bit hacky and involves some meta-programming black magic, but it looks clean from user perspective. Taking your example above:
@ucsubclass
class A(Uc):
    def __init__(self, arch: int, mode: int):
        super().__init__(arch, mode)

I lean towards the 2nd option; what do you think?

)


__all__ = ['Uc', 'UcContext']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be a bit aggressive as stated above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Importing * from a library is considered a bad practice and should be avoided.
As far as I can tell the Unicorn bindings library is organized in way that __init__.py already imports all the needed modules and "flatten" the import path (i.e. you can import everything directly from unicorn; if you can't import - it means you shouldn't).

Keep in mind that the previous implementation kept everything in one big file, so there was no consideration of a clean namespace. Having said that, I think that moving to a new major version (uc2) is a good oportunity to make such changes that could break some scripts - but at the same time will be trivial to fix by the users by adding a few more imports.

return cls(*param)


class UcAArch32(Uc):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As stated further in the review below, my thought is that, such the arch-specific class might be used as a member of the Uc object. This in fact is pimpl design instead of your factory design if I understand it correctly.

@wtdcode
Copy link
Member

wtdcode commented Jun 26, 2022

See my PoC in #1633

@github-actions github-actions bot added the stale label Aug 28, 2022
@github-actions github-actions bot closed this Sep 12, 2022
@wtdcode wtdcode reopened this Sep 12, 2022
@wtdcode wtdcode removed the stale label Sep 12, 2022
@elicn
Copy link
Contributor Author

elicn commented Oct 9, 2022

@wtdcode, any thoughts or comments, considering this comment and this commit?

@aquynh
Copy link
Member

aquynh commented Oct 17, 2022

we have been thinking about merging this after 2.0.1 release.

can you add some sample code under bindings/python/ to demonstrate the new API?

@elicn
Copy link
Contributor Author

elicn commented Oct 19, 2022

can you add some sample code under bindings/python/ to demonstrate the new API?

The external facing API was not modified, it is only the internal implementation that was changed. Except minor additions (e.g. some archs now support more regs) nothing really changed in terms of functionality. UcError now provides more info, so that could be one possible feature to show. Do you have anything specific on your mind..?

@aquynh
Copy link
Member

aquynh commented Oct 19, 2022 via email

@wtdcode wtdcode added enhancement and removed stale labels Mar 13, 2023
@wtdcode wtdcode added this to the Unicorn 2.1.0 milestone Aug 6, 2023
@wtdcode wtdcode mentioned this pull request Feb 11, 2024
3 tasks
@wtdcode wtdcode merged commit 399f204 into unicorn-engine:dev Mar 8, 2024
@wtdcode
Copy link
Member

wtdcode commented Mar 8, 2024

merged and thanks for your patient.

I'm doing a careful comparison to ensure nothing breaks.

@elicn
Copy link
Contributor Author

elicn commented Mar 8, 2024

It passes all local tests, if that counts.

@wtdcode
Copy link
Member

wtdcode commented Mar 8, 2024

Our tests for Python bindings are not complete and thus I'm looking into the history to manually port missing features.

@elicn
Copy link
Contributor Author

elicn commented Mar 8, 2024

Well, every time there was an update to the Python binding I changed my code to accommodate. Anyway, if something is missing or incompatible, let me know and I'll fix it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants