Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IR: Mutable IR nodes and default in-place updates in Transformers #520

Open
mlange05 opened this issue Mar 31, 2025 · 1 comment
Open

IR: Mutable IR nodes and default in-place updates in Transformers #520

mlange05 opened this issue Mar 31, 2025 · 1 comment
Labels
enhancement New feature or request

Comments

@mlange05
Copy link
Collaborator

Is your feature request related to a problem? Please describe.

One of the core behavioural default of the Loki IR is that IR nodes are immutable (except for a few internal exceptions) and that Transformer classes by default replicate/clone nodes. As the use cases of Loki have expanded and toolchains have become more complex, this default is creating limitations with regards to cleaner interior node updates and extending transformer behaviour - eg. for true deep-copies or smarter updates to the associated .source property.

In particular, the internal hack to bypass pydantic's by explicitly setting "private" via the node.__dict__ is restrictive, as it means that constructor arguments like source cannot be privatised or made transient, as pydantic onlye permits "extra" constructor arguments for mutable dataclasses (https://docs.pydantic.dev/latest/concepts/models/#extra-data)

All the above creates the "uniqueness issue" first tripped in issue #123 where the inclusion of the source property (and it's non-identity with regards to line numbers) means that two otherwise identical nodes are unique when parsed by the fparser frontend from source, but not if they have been created identically via the IR constructor API. Or in other words, the following currently does not work:

node = ir.Assignment(lhs=..., rhs=...)
assert node == node.clone()  # This is fine
assert not node is node.clone()  # This is also fine, and works as expected
assert node == node.clone(source=Source(...))  # Not ok, as source is part of identity

Contrary to the suggestion in issue #123 adding a UID to the nodes might work around this issue for now, but it does not fix the issue that "source" (a constructor argument), should not part of the interior node hash and equality definition. However, this has some far-reaching consequences for the interior design...

Describe the solution you'd like

This issues track the interior re-write that would enable fixing this issue for good by changing some of the interior design assumptions:

  • Make IR nodes mutable and clonable in the obvious ways
  • Clearly define equality (==) and identity (is) of two nodes and use native pydantic semnatics to define core attributes and "private" decorative or transient properties (source, dataflow symbols, etc.) of nodes
  • If nodes are mutable, we should also shift default Transformer to "in-place" updates, with additional testing for "shallow clone" (current default) and true "deep clone" replication in transformers

The above of course constitutes quite some major surgery and must be implemented with care as there might be a few places that implicitly assume current default behaviour.

Describe alternatives you've considered

Issue #123 was created as a faster workaround, but when integrating the respective PR #153 I struggled to fix the underlying issue properly - hence this new issue. I've also considered making source a true interior property and remove it from the constructor, but this is also quite invasive, and after offline discussions, I think I'd prefer fixing this "properly" at the root cause.

Additional context

This is a long-standing issue that might need addressing in multiple PRs. This issue is intended to track those efforts. Due to the invasive nature of the proposed change we might also delay this for a little bit yet.

Organisation

ECMWF

@mlange05 mlange05 added the enhancement New feature or request label Mar 31, 2025
@mlange05
Copy link
Collaborator Author

⬆ To clarify, I use the term "private" or "transient" in the above to mean object attributes, or "fields" in pydantic speak, that are not part of the formal type-definition of the class and therefore do not count towards the implicit definition of equality (==) for pydantic dataclasses (eg. _live_symbols and the other dataflow fields).

This was referenced Mar 31, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant