Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[red-knot] Break up call binding into two phases #16546

Merged
merged 37 commits into from
Mar 21, 2025

Conversation

dcreager
Copy link
Member

@dcreager dcreager commented Mar 7, 2025

This breaks up call binding into two phases:

  • Matching parameters just looks at the names and kinds (positional/keyword) of each formal and actual parameters, and matches them up. Most of the current call binding errors happen during this phase.

  • Once we have matched up formal and actual parameters, we can infer types of each actual parameter, and check that each one is assignable to the corresponding formal parameter type.

As part of this, we add information to each formal parameter about whether it is a type form or not. Once PEP 747 is finalized, we can hook that up to this internal type form representation. This replaces the ParameterExpectations type, which did the same thing in a more ad hoc way.

While we're here, we add a new fluent API for building Parameters, which makes our signature constructors a bit nicer to read. We also eliminate a TODO where we were consuming types from the argument list instead of the bound parameter list when evaluating our special-case known functions.

Closes #15460

@AlexWaygood AlexWaygood added the red-knot Multi-file analysis & type inference label Mar 7, 2025
@dcreager dcreager force-pushed the dcreager/two-phase-binding branch from 02c3920 to 6a3bc70 Compare March 18, 2025 18:10
* main:
  [playground] Avoid concurrent deployments (#16834)
  [red-knot] Infer `lambda` return type as `Unknown` (#16695)
  [red-knot] Move `name` field on parameter kind (#16830)
  [red-knot] Emit errors for more AST nodes that are invalid (or only valid in specific contexts) in type expressions (#16822)
  [playground] Use cursor for clickable elements (#16833)
  [red-knot] Deploy playground on main (#16832)
  Red Knot Playground (#12681)
  [syntax-errors] PEP 701 f-strings before Python 3.12 (#16543)
Copy link
Contributor

github-actions bot commented Mar 18, 2025

mypy_primer results

No ecosystem changes detected ✅

Signature::new(
Parameters::new([
Parameter::positional_only(Some(Name::new_static("a")))
.type_form()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These type_form calls are the magic lines that replace ParameterExpectations

Comment on lines -2676 to -2690
// For certain known callables, we have special-case logic to determine the return type
// in a way that isn't directly expressible in the type system. Each special case
// listed here should have a corresponding clause above in `signatures`.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This special-case return type logic moves over to a private method on Bindings, which we guarantee is always called after checking argument types.

pub(crate) struct CallArguments<'a, 'db>(Vec<Argument<'a, 'db>>);
pub(crate) struct CallArguments<'a, 'db> {
bound_self: Option<Argument<'a, 'db>>,
arguments: Rc<[Argument<'a, 'db>]>,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This representation lets us handle different union elements/overloads that have different bound self parameters without having to make clones of all of the other arguments. That's not just an optimization, it also ensures that there's only a single slice of non-bound arguments that we perform type inference on. (That type inference now happens after we've matched up arguments and parameters — which is when we prepend bound self parameters)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's worth making this a comment on arguments. The reason of Rc is definetely not "appearant" (and one of our first uses in Red Knot?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I fully understand

That's not just an optimization, it also ensures that there's only a single slice of non-bound arguments that we perform type inference on.

Or, I'm not sure I fully understand its implication.

Once we have matched up formal and actual parameters, we can infer types of each actual parameter, and check that each one is assignable to the corresponding formal parameter type.

What I understand from your sentence in the PR summary is that we only perform type inference once we matched all parameters? Does this happen exactly once or is it possible that this operation is performed more than once (e.g. once for every matching signature?)

I'm asking because I'm a bit concerned about the Cell use in Argument because we then loose the static assertion that type inference can happen exactly once (you could hold on to multiple Arguments that all point to the same shared slice but then infer the type with differently matched parameters). Would it be possible to use Rc::get_mut (or try_mut) instead of using interior mutability to enforce that all other references to the Argument slice got dropped or are they still alive at that point? If they're still alive, isn't the state of their Arguments now misleading because the types are inferred for another matched binding?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I understand from your sentence in the PR summary is that we only perform type inference once we matched all parameters?

Yes that's right

Does this happen exactly once or is it possible that this operation is performed more than once (e.g. once for every matching signature?)

This is related to @carljm comment #16546 (comment)

As of right now, we peform type inference on each argument at most once. We do not infer a different type for an argument for each matching signature. As we move towards supporting generics, we will have to do more work per argument for each matching signature. But I am purposefully trying to keep this PR simpler by not bringing that into play yet.

I'm asking because I'm a bit concerned about the Cell use in Argument because we then loose the static assertion that type inference can happen exactly once (you could hold on to multiple Arguments that all point to the same shared slice but then infer the type with differently matched parameters).

Before we had that as a static guarantee because Argument had no mutuator methods — you had to provide the inferred type when you created it. If I understand your concern, it's less about using Cell in particular and more about having a mutator method at all, is that right? Would you have the same worry if it was a more normal &mut self mutator method? Because there would still be no static guarantee that you didn't call it twice.

I could pull the argument types out into a separate Rust type, which you would create after match_parameters and then pass in to check_types. But I think then we'd lose the static guarantee that the ArgumentTypes that you pass in in step two was created for the CallArguments that you pass in in step one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand your concern, it's less about using Cell in particular and more about having a mutator method at all, is that right?

My main concern is how we avoid that we don't accidentally end up mutating the same underlying Argument that is shared by using Rc<[Argument]>. The benefit I see of using Rc::get_mut and panicking if it is shared is that we would detect that mistake (at least at runtime).

I don't think that using &mut helps here because my concern is about having two CallArguments that both share the same arguments. Requiring a &mut CallArguments doesn't prevent mutating a when b points to the same arguments.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can use Rc::get_mut because with how it's formulated right now, we need that sharing. Each signature might have to bind a different self/cls argument to the front of the argument list, but we also only want to perform type inference once on the (non-bound) arguments that are the same for each signature. The bindings themselves have to store the arguments so that they can be type-checked in the later call, and the outer code in infer.rs also needs to store the arguments so that it can do the actual type inference.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how we avoid that we don't accidentally end up mutating the same underlying Argument that is shared by using Rc<[Argument]>

My understanding is that the whole point of this structure is that we do want to mutate the same underlying shared Argument; it gives us a place to store the inferred type for that argument, to avoid inferring it multiple times.

Comment on lines 65 to 74
form: Cell<ArgumentForm>,

/// The inferred type of this argument. Will be `Type::Unknown` if we haven't inferred a type
/// for this argument yet.
#[allow(clippy::struct_field_names)]
argument_type: Cell<Type<'db>>,

/// The inferred type of this argument when/if it is used as a `TypeForm`. Will be
/// `Type::Unknown` if we haven't inferred a type-form type for this argument yet.
type_form_type: Cell<Type<'db>>,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We either need these cells, or the Rc slice up above needs to be an Rc<RefCell<[_]>>, so that the type inference code can update these fields with the inferred types after we've already matched parameters. Since these types are both Copy this seemed the cleanest.

Comment on lines 176 to 177
const VALUE = 1 << 0;
const TYPE_FORM = 1 << 1;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if it's really needed, but using bitflags here instead of an enum or bool means that we support an argument that might be used as a type form for one union element/overload, and a value for another.

Comment on lines 32 to 33
pub(crate) struct Bindings<'call, 'db> {
signatures: Signatures<'db>,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bindings now takes ownership of both the signatures and (a clone of the) arguments of its call site, since that information needs to carry over from match_parameters to check_types.

if function.has_known_class_decorator(db, KnownClass::Classmethod)
&& function.decorators(db).len() == 1
{
match overload.parameter_types() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're using parameter_types everywhere now, which lets close off a TODO and remove CallArguments::{first,second,third}_argument

@dcreager dcreager marked this pull request as ready for review March 18, 2025 20:18
@MichaReiser
Copy link
Member

Is this PR WIP or ready for review?

@dcreager dcreager changed the title [red-knot] WIP: Break up call binding into two phases [red-knot] Break up call binding into two phases Mar 18, 2025
/// The inferred type of this argument. Will be `Type::Unknown` if we haven't inferred a type
/// for this argument yet.
#[allow(clippy::struct_field_names)]
argument_type: Cell<Type<'db>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What were your considerations in using Unknown to represent "no type inferred yet" vs using an Option<Type> here? Given that Unknown is also a valid inferred type for an argument, the latter seems semantically clearer about whether we've inferred a type for this argument yet. But maybe in practice we don't need that clarity?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is moot with the new represenation, where you have to provide a callback to infer each argument type when constructing a CallArgumentTypes. You now either (a) have all argument types uninferred, in which case you have a CallArguments, or (b) have inferred all argument types, in which case you have a CallArgumentTypes.

pub(crate) struct CallArguments<'a, 'db>(Vec<Argument<'a, 'db>>);
pub(crate) struct CallArguments<'a, 'db> {
bound_self: Option<Argument<'a, 'db>>,
arguments: Rc<[Argument<'a, 'db>]>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's worth making this a comment on arguments. The reason of Rc is definetely not "appearant" (and one of our first uses in Red Knot?)

pub(crate) struct CallArguments<'a, 'db>(Vec<Argument<'a, 'db>>);
pub(crate) struct CallArguments<'a, 'db> {
bound_self: Option<Argument<'a, 'db>>,
arguments: Rc<[Argument<'a, 'db>]>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I fully understand

That's not just an optimization, it also ensures that there's only a single slice of non-bound arguments that we perform type inference on.

Or, I'm not sure I fully understand its implication.

Once we have matched up formal and actual parameters, we can infer types of each actual parameter, and check that each one is assignable to the corresponding formal parameter type.

What I understand from your sentence in the PR summary is that we only perform type inference once we matched all parameters? Does this happen exactly once or is it possible that this operation is performed more than once (e.g. once for every matching signature?)

I'm asking because I'm a bit concerned about the Cell use in Argument because we then loose the static assertion that type inference can happen exactly once (you could hold on to multiple Arguments that all point to the same shared slice but then infer the type with differently matched parameters). Would it be possible to use Rc::get_mut (or try_mut) instead of using interior mutability to enforce that all other references to the Argument slice got dropped or are they still alive at that point? If they're still alive, isn't the state of their Arguments now misleading because the types are inferred for another matched binding?

@dcreager
Copy link
Member Author

I'm reworking this with a different way of representing arguments and types per @MichaReiser and @carljm's comments. Putting this back to draft while I do that.

@dcreager dcreager marked this pull request as draft March 19, 2025 15:02
Copy link
Contributor

github-actions bot commented Mar 19, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

Copy link
Member Author

@dcreager dcreager left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I've pushed up a new representation that I think is nicer, and addresses most of the concerns.

In summary:

CallArguments goes back to being a simple list of Arguments, which look exactly like they did before. This does not store argument types.

Once you are ready to perform type inference on an argument list, you transform the CallArguments into a CallArgumentTypes. You provide a callback to infer each argument type. That makes it impossible to have incomplete partially inferred state.

This also makes it clearer which information each phase requires, since match_parameters takes in a CallArguments, and check_types takes in a CallArgumentTypes.

We don't store CallArguments in any of the bindings types anymore, which means we don't need to make them cheap to clone, and we don't need any interior mutability tricks to let the type inference code update shared state.

To handle bound self/cls parameters, we now use VecDeque instead of Vec for both CallArguments and CallArgumentTypes. That lets us cheaply push the bound parameter onto the front of the argument list when needed, and pop it back off when we're done.

/// The inferred type of this argument. Will be `Type::Unknown` if we haven't inferred a type
/// for this argument yet.
#[allow(clippy::struct_field_names)]
argument_type: Cell<Type<'db>>,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is moot with the new represenation, where you have to provide a callback to infer each argument type when constructing a CallArgumentTypes. You now either (a) have all argument types uninferred, in which case you have a CallArguments, or (b) have inferred all argument types, in which case you have a CallArgumentTypes.

Comment on lines 67 to 74
/// The inferred type of this argument. Will be `Type::Unknown` if we haven't inferred a type
/// for this argument yet.
#[allow(clippy::struct_field_names)]
argument_type: Cell<Type<'db>>,

/// The inferred type of this argument when/if it is used as a `TypeForm`. Will be
/// `Type::Unknown` if we haven't inferred a type-form type for this argument yet.
type_form_type: Cell<Type<'db>>,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My more pressing (and mundane) concern was that store_expression_type currently panics if you try to store a type for an expression more than once.

In working on a test case, I realized that this is an issue already, even without inferring argument types separately for each signature — if you were to use an argument as both a value and a type, we would infer its expression twice, and end up trying to store a type for that expression twice.

So for now I've changed this PR to say that an argument must be used either as a value or as a type form for all of the signatures in any particular call site.

.try_call(db, &CallArguments::positional([self, instance, owner]))
.try_call(db, CallArgumentTypes::positional([self, instance, owner]))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a lot of cases like this where we make a call "internally". Here we're not really inferring the types of the arguments; they're provided directly. This is also handled nicely with the new representation: you construct CallArgumentTypes, since you have argument types ready to specify.

@dcreager dcreager marked this pull request as ready for review March 19, 2025 20:45
Comment on lines 11 to 19
/// Push an extra synthetic argument (for a `self` or `cls` parameter) to the front of this
/// argument list.
pub(crate) fn push_self(&mut self) {
self.0.push_front(Argument::Synthetic);
}

/// Create a [`CallArguments`] from an iterator over non-variadic positional argument types.
pub(crate) fn positional(positional_tys: impl IntoIterator<Item = Type<'db>>) -> Self {
positional_tys
.into_iter()
.map(Argument::Positional)
.collect()
/// Pop the extra synthetic argument from the front of this argument list.
pub(crate) fn pop_self(&mut self) {
self.0.pop_front();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need an assertion that the front attribute is Synthetic to avoid cases where someone calls pop_self without having called push_self before?

Or should we use an Option for the self argument (I don't know if that's the approach that you had initially), as it seems that we never need to return a slice (which wouldn't work with VecDeque anyway) and we only ever need to mutate the first element.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this to a with_self method that does the pushing/popping in the right pattern, instead of adding assertions or more complex types to ensure the caller does the right thing.

Copy link
Member

@dhruvmanila dhruvmanila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't done a thorough review, I'm mainly looking at the PR to keep up with the changes happening on main. I think the changes looks good, will defer it to Carl/Micha for the final review.

Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love this. Love the Parameter fluent builder, love the new type-safe distinction between CallArguments and CallArgumentTypes. Fantastic!

Comment on lines 40 to 43
///
/// TODO: We will eventually infer completely different argument types for each signature, once
/// we are able to use the annotated parameter types as type contexts for that inference. At
/// that point, this field will move down into `CallBinding` or `Binding`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This TODO may not be accurate, given our conversation today? Totally up to you how you want to handle this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

* main: (26 commits)
  Use the common `OperatorPrecedence` for the parser (#16747)
  [red-knot] Check subtype relation between callable types (#16804)
  [red-knot] Check whether two callable types are equivalent (#16698)
  [red-knot] Ban most `Type::Instance` types in type expressions (#16872)
  Special-case value-expression inference of special form subscriptions (#16877)
  [syntax-errors] Fix star annotation before Python 3.11 (#16878)
  Recognize `SyntaxError:` as an error code for ecosystem checks (#16879)
  [red-knot] add test cases result in false positive errors (#16856)
  Bump 0.11.1 (#16871)
  Allow discovery of venv in VIRTUAL_ENV env variable (#16853)
  Split git pathspecs in change determination onto separate lines (#16869)
  Use the correct base commit for change determination (#16857)
  Separate `BitXorOr` into `BitXor` and `BitOr` precedence (#16844)
  Server: Allow `FixAll` action in presence of version-specific syntax errors (#16848)
  [`refurb`] Fix starred expressions fix (`FURB161`) (#16550)
  [`flake8-executable`] Add pytest and uv run to help message for `shebang-missing-python` (`EXE003`) (#16855)
  Show more precise messages in invalid type expressions (#16850)
  [`flake8-executables`] Allow `uv run` in shebang line for `shebang-missing-python` (`EXE003`) (#16849)
  Add `--exit-non-zero-on-format` (#16009)
  [red-knot] Ban list literals in most contexts in type expressions (#16847)
  ...
@dcreager dcreager merged commit c03c28d into main Mar 21, 2025
23 checks passed
@dcreager dcreager deleted the dcreager/two-phase-binding branch March 21, 2025 13:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
red-knot Multi-file analysis & type inference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[red-knot] match arguments to parameters before inferring argument types
5 participants