[red-knot] Break up call binding into two phases #16546

dcreager · 2025-03-07T01:55:50Z

This breaks up call binding into two phases:

- Matching parameters just looks at the names and kinds (positional/keyword) of each formal and actual parameter, and matches them up. Most of the current call binding errors happen during this phase.
- Once we have matched up formal and actual parameters, we can infer types of each actual parameter, and check that each one is assignable to the corresponding formal parameter type.
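
The two phases can be sketched in isolation as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `Ty`, `Parameter`, `Argument`, `BindError`, and `is_assignable` are simplified stand-ins invented for the example, and defaults, variadics, and keyword-only parameters are left out. Only the names `match_parameters` and `check_types` come from the PR.

```rust
/// Toy type lattice for the example: everything is assignable to `Object`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Ty {
    Int,
    Str,
    Object,
}

/// A formal parameter (hypothetical, simplified).
#[derive(Debug)]
struct Parameter {
    name: &'static str,
    positional_only: bool,
    annotated_type: Ty,
}

/// An actual argument, described only by its kind and name. No type yet.
enum Argument {
    Positional,
    Keyword(&'static str),
}

#[derive(Debug, PartialEq)]
enum BindError {
    TooManyPositional,
    UnknownKeyword(&'static str),
    ParameterAlreadyAssigned(&'static str),
    MissingArgument(&'static str),
    InvalidArgumentType { parameter: &'static str },
}

/// Phase 1: look only at names and kinds. Returns, for each argument, the
/// index of the parameter it was matched to.
fn match_parameters(params: &[Parameter], args: &[Argument]) -> Result<Vec<usize>, BindError> {
    let mut assigned = vec![false; params.len()];
    let mut matching = Vec::with_capacity(args.len());
    let mut next_positional = 0;
    for arg in args {
        let index = match arg {
            Argument::Positional => {
                if next_positional >= params.len() {
                    return Err(BindError::TooManyPositional);
                }
                next_positional += 1;
                next_positional - 1
            }
            Argument::Keyword(name) => params
                .iter()
                .position(|p| !p.positional_only && p.name == *name)
                .ok_or(BindError::UnknownKeyword(*name))?,
        };
        if assigned[index] {
            return Err(BindError::ParameterAlreadyAssigned(params[index].name));
        }
        assigned[index] = true;
        matching.push(index);
    }
    if let Some(missing) = assigned.iter().position(|done| !done) {
        return Err(BindError::MissingArgument(params[missing].name));
    }
    Ok(matching)
}

fn is_assignable(from: Ty, to: Ty) -> bool {
    to == Ty::Object || from == to
}

/// Phase 2: argument types are available now; check each one against the
/// parameter that phase 1 matched it to.
fn check_types(params: &[Parameter], matching: &[usize], arg_types: &[Ty]) -> Result<(), BindError> {
    for (&param_index, &arg_type) in matching.iter().zip(arg_types) {
        let param = &params[param_index];
        if !is_assignable(arg_type, param.annotated_type) {
            return Err(BindError::InvalidArgumentType { parameter: param.name });
        }
    }
    Ok(())
}
```

Note that every error except `InvalidArgumentType` is reported by the first function, which matches the observation that most binding errors happen while matching.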

As part of this, we add information to each formal parameter about whether it is a type form or not. Once PEP 747 is finalized, we can hook that up to this internal type form representation. This replaces the ParameterExpectations type, which did the same thing in a more ad hoc way.

While we're here, we add a new fluent API for building Parameters, which makes our signature constructors a bit nicer to read. We also eliminate a TODO where we were consuming types from the argument list instead of the bound parameter list when evaluating our special-case known functions.

Closes #15460

* main:
  [playground] Avoid concurrent deployments (#16834)
  [red-knot] Infer `lambda` return type as `Unknown` (#16695)
  [red-knot] Move `name` field on parameter kind (#16830)
  [red-knot] Emit errors for more AST nodes that are invalid (or only valid in specific contexts) in type expressions (#16822)
  [playground] Use cursor for clickable elements (#16833)
  [red-knot] Deploy playground on main (#16832)
  Red Knot Playground (#12681)
  [syntax-errors] PEP 701 f-strings before Python 3.12 (#16543)

github-actions · 2025-03-18T18:49:41Z

`mypy_primer` results

No ecosystem changes detected ✅

dcreager · 2025-03-18T18:45:53Z

crates/red_knot_python_semantic/src/types.rs

+                        Signature::new(
+                            Parameters::new([
+                                Parameter::positional_only(Some(Name::new_static("a")))
+                                    .type_form()


These type_form calls are the magic lines that replace ParameterExpectations
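
For readers unfamiliar with the pattern, a fluent builder of this shape might look like the sketch below. Only `Parameter::positional_only(...)` and `.type_form()` appear in the diff above; the `ParameterForm` and `ParameterKind` enums, the `keyword_only` and `with_annotated_type` methods, and the use of plain strings for names and types are assumptions made for the example.

```rust
/// Whether an argument bound to this parameter is read as a value
/// expression or as a type expression (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ParameterForm {
    Value,
    Type,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ParameterKind {
    PositionalOnly,
    KeywordOnly,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct Parameter {
    name: Option<String>,
    kind: ParameterKind,
    annotated_type: Option<String>,
    form: ParameterForm,
}

impl Parameter {
    /// Constructors pick the kind and fill in defaults for everything else.
    fn positional_only(name: Option<&str>) -> Self {
        Self {
            name: name.map(str::to_owned),
            kind: ParameterKind::PositionalOnly,
            annotated_type: None,
            form: ParameterForm::Value,
        }
    }

    fn keyword_only(name: &str) -> Self {
        Self {
            name: Some(name.to_owned()),
            kind: ParameterKind::KeywordOnly,
            annotated_type: None,
            form: ParameterForm::Value,
        }
    }

    /// Each modifier consumes `self` and returns it, so calls can be chained.
    fn with_annotated_type(mut self, ty: &str) -> Self {
        self.annotated_type = Some(ty.to_owned());
        self
    }

    /// Mark the parameter as expecting a type form rather than a value.
    fn type_form(mut self) -> Self {
        self.form = ParameterForm::Type;
        self
    }
}
```

With this shape, the "is it a type form" fact lives on the parameter itself, so nothing has to be tracked in a side table keyed by function and argument position.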

dcreager · 2025-03-18T18:47:08Z

crates/red_knot_python_semantic/src/types.rs

-            // For certain known callables, we have special-case logic to determine the return type
-            // in a way that isn't directly expressible in the type system. Each special case
-            // listed here should have a corresponding clause above in `signatures`.


This special-case return type logic moves over to a private method on Bindings, which we guarantee is always called after checking argument types.

dcreager · 2025-03-18T18:55:27Z

crates/red_knot_python_semantic/src/types/call/arguments.rs

-pub(crate) struct CallArguments<'a, 'db>(Vec<Argument<'a, 'db>>);
+pub(crate) struct CallArguments<'a, 'db> {
+    bound_self: Option<Argument<'a, 'db>>,
+    arguments: Rc<[Argument<'a, 'db>]>,


This representation lets us handle different union elements/overloads that have different bound self parameters without having to make clones of all of the other arguments. That's not just an optimization, it also ensures that there's only a single slice of non-bound arguments that we perform type inference on. (That type inference now happens after we've matched up arguments and parameters — which is when we prepend bound self parameters)

MichaReiser · 2025-03-19T07:12:34Z

I think it's worth making this a comment on arguments. The reason for the Rc is definitely not apparent (and it's one of our first uses in Red Knot?)

MichaReiser · 2025-03-19T07:25:21Z

I don't think I fully understand

> That's not just an optimization, it also ensures that there's only a single slice of non-bound arguments that we perform type inference on.

Or, I'm not sure I fully understand its implication.

> Once we have matched up formal and actual parameters, we can infer types of each actual parameter, and check that each one is assignable to the corresponding formal parameter type.

What I understand from your sentence in the PR summary is that we only perform type inference once we matched all parameters? Does this happen exactly once or is it possible that this operation is performed more than once (e.g. once for every matching signature?)

I'm asking because I'm a bit concerned about the Cell use in Argument because we then lose the static assertion that type inference can happen exactly once (you could hold on to multiple Arguments that all point to the same shared slice but then infer the type with differently matched parameters). Would it be possible to use Rc::get_mut (or try_mut) instead of using interior mutability to enforce that all other references to the Argument slice got dropped or are they still alive at that point? If they're still alive, isn't the state of their Arguments now misleading because the types are inferred for another matched binding?

dcreager

> What I understand from your sentence in the PR summary is that we only perform type inference once we matched all parameters?

Yes that's right

> Does this happen exactly once or is it possible that this operation is performed more than once (e.g. once for every matching signature?)

This is related to @carljm's comment #16546 (comment)

As of right now, we perform type inference on each argument at most once. We do not infer a different type for an argument for each matching signature. As we move towards supporting generics, we will have to do more work per argument for each matching signature. But I am purposefully trying to keep this PR simpler by not bringing that into play yet.

> I'm asking because I'm a bit concerned about the Cell use in Argument because we then lose the static assertion that type inference can happen exactly once (you could hold on to multiple Arguments that all point to the same shared slice but then infer the type with differently matched parameters).

Before we had that as a static guarantee because Argument had no mutator methods: you had to provide the inferred type when you created it. If I understand your concern, it's less about using Cell in particular and more about having a mutator method at all, is that right? Would you have the same worry if it was a more normal &mut self mutator method? Because there would still be no static guarantee that you didn't call it twice.

I could pull the argument types out into a separate Rust type, which you would create after match_parameters and then pass in to check_types. But I think then we'd lose the static guarantee that the ArgumentTypes that you pass in in step two was created for the CallArguments that you pass in in step one.

MichaReiser

> If I understand your concern, it's less about using Cell in particular and more about having a mutator method at all, is that right?

My main concern is how we avoid accidentally mutating the same underlying Argument that is shared by using Rc<[Argument]>. The benefit I see of using Rc::get_mut and panicking if it is shared is that we would detect that mistake (at least at runtime).

I don't think that using &mut helps here because my concern is about having two CallArguments that both share the same arguments. Requiring a &mut CallArguments doesn't prevent mutating a when b points to the same arguments.

dcreager

I don't think we can use Rc::get_mut because with how it's formulated right now, we need that sharing. Each signature might have to bind a different self/cls argument to the front of the argument list, but we also only want to perform type inference once on the (non-bound) arguments that are the same for each signature. The bindings themselves have to store the arguments so that they can be type-checked in the later call, and the outer code in infer.rs also needs to store the arguments so that it can do the actual type inference.

carljm

> how we avoid accidentally mutating the same underlying Argument that is shared by using Rc<[Argument]>

My understanding is that the whole point of this structure is that we do want to mutate the same underlying shared Argument; it gives us a place to store the inferred type for that argument, to avoid inferring it multiple times.
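
The trade-off discussed in this thread can be reproduced with a small standalone sketch. It is an illustration only: `Ty`, the reduced `Argument`, and the `new` and `with_bound_self` helpers are hypothetical, and only the field layout (`bound_self` plus an `Rc` slice whose elements hold a `Cell`) follows the diff above. It shows that a type stored through one handle is visible through every other handle, and that `Rc::get_mut` yields `None` for as long as the slice is shared.

```rust
use std::cell::Cell;
use std::rc::Rc;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Ty {
    Unknown,
    Int,
}

/// Reduced argument: just the slot for the inferred type.
struct Argument {
    argument_type: Cell<Ty>,
}

struct CallArguments {
    bound_self: Option<Ty>,
    arguments: Rc<[Argument]>,
}

impl CallArguments {
    /// Build an argument list with `count` not-yet-inferred arguments.
    fn new(count: usize) -> Self {
        let arguments: Rc<[Argument]> = (0..count)
            .map(|_| Argument { argument_type: Cell::new(Ty::Unknown) })
            .collect();
        Self { bound_self: None, arguments }
    }

    /// Cheap per-signature copy: only the `Rc` pointer is cloned, so both
    /// values keep pointing at the same underlying arguments.
    fn with_bound_self(&self, self_type: Ty) -> Self {
        Self {
            bound_self: Some(self_type),
            arguments: Rc::clone(&self.arguments),
        }
    }
}
```

Because the per-signature copy keeps the slice alive, exclusive access through `Rc::get_mut` is unavailable at the point where inference runs, which is why this design needed interior mutability.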

dcreager · 2025-03-18T18:57:03Z

crates/red_knot_python_semantic/src/types/call/arguments.rs

+    form: Cell<ArgumentForm>,
+
+    /// The inferred type of this argument. Will be `Type::Unknown` if we haven't inferred a type
+    /// for this argument yet.
+    #[allow(clippy::struct_field_names)]
+    argument_type: Cell<Type<'db>>,
+
+    /// The inferred type of this argument when/if it is used as a `TypeForm`. Will be
+    /// `Type::Unknown` if we haven't inferred a type-form type for this argument yet.
+    type_form_type: Cell<Type<'db>>,


We either need these cells, or the Rc slice up above needs to be an Rc<RefCell<[_]>>, so that the type inference code can update these fields with the inferred types after we've already matched parameters. Since these types are both Copy this seemed the cleanest.

dcreager · 2025-03-18T18:58:14Z

crates/red_knot_python_semantic/src/types/call/arguments.rs

+        const VALUE = 1 << 0;
+        const TYPE_FORM = 1 << 1;


I'm not sure if it's really needed, but using bitflags here instead of an enum or bool means that we support an argument that might be used as a type form for one union element/overload, and a value for another.
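
To make the point concrete, here is a dependency-free sketch of the same idea. The diff uses the `bitflags` crate; this version hand-rolls the constants on a `u8` newtype so that it compiles on its own. The type name `ArgumentForms`, the `NONE` constant, and the `insert`, `contains`, and `is_conflicting` helpers are assumptions for the example; only `VALUE` and `TYPE_FORM` and their bit positions come from the diff.

```rust
/// Set of forms in which a single argument is used across all signatures.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ArgumentForms(u8);

impl ArgumentForms {
    const NONE: Self = Self(0);
    const VALUE: Self = Self(1 << 0);
    const TYPE_FORM: Self = Self(1 << 1);

    /// Record one more use of the argument.
    fn insert(&mut self, other: Self) {
        self.0 |= other.0;
    }

    /// True if every bit of `other` is set in `self`.
    fn contains(self, other: Self) -> bool {
        (self.0 & other.0) == other.0
    }

    /// An enum or bool could not represent this state: one signature wants
    /// a value while another wants a type form.
    fn is_conflicting(self) -> bool {
        self.contains(Self::VALUE) && self.contains(Self::TYPE_FORM)
    }
}
```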

dcreager · 2025-03-18T18:58:47Z

crates/red_knot_python_semantic/src/types/call/bind.rs

+pub(crate) struct Bindings<'call, 'db> {
+    signatures: Signatures<'db>,


Bindings now takes ownership of both the signatures and (a clone of the) arguments of its call site, since that information needs to carry over from match_parameters to check_types.

dcreager · 2025-03-18T19:00:55Z

crates/red_knot_python_semantic/src/types/call/bind.rs

+                    if function.has_known_class_decorator(db, KnownClass::Classmethod)
+                        && function.decorators(db).len() == 1
+                    {
+                        match overload.parameter_types() {


We're using parameter_types everywhere now, which lets us close off a TODO and remove CallArguments::{first,second,third}_argument

MichaReiser · 2025-03-18T21:26:26Z

Is this PR WIP or ready for review?

carljm · 2025-03-19T00:40:18Z

crates/red_knot_python_semantic/src/types/call/arguments.rs

+    /// The inferred type of this argument. Will be `Type::Unknown` if we haven't inferred a type
+    /// for this argument yet.
+    #[allow(clippy::struct_field_names)]
+    argument_type: Cell<Type<'db>>,


What were your considerations in using Unknown to represent "no type inferred yet" vs using an Option<Type> here? Given that Unknown is also a valid inferred type for an argument, the latter seems semantically clearer about whether we've inferred a type for this argument yet. But maybe in practice we don't need that clarity?

dcreager · 2025-03-19T20:26:09Z

This is moot with the new representation, where you have to provide a callback to infer each argument type when constructing a CallArgumentTypes. You now either (a) have all argument types uninferred, in which case you have a CallArguments, or (b) have inferred all argument types, in which case you have a CallArgumentTypes.

dcreager · 2025-03-19T15:01:58Z

I'm reworking this with a different way of representing arguments and types per @MichaReiser and @carljm's comments. Putting this back to draft while I do that.

github-actions · 2025-03-19T20:12:29Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

dcreager

Okay, I've pushed up a new representation that I think is nicer, and addresses most of the concerns.

In summary:

- CallArguments goes back to being a simple list of Arguments, which look exactly like they did before. This does not store argument types.

- Once you are ready to perform type inference on an argument list, you transform the CallArguments into a CallArgumentTypes. You provide a callback to infer each argument type. That makes it impossible to have incomplete partially inferred state.

- This also makes it clearer which information each phase requires, since match_parameters takes in a CallArguments, and check_types takes in a CallArgumentTypes.

- We don't store CallArguments in any of the bindings types anymore, which means we don't need to make them cheap to clone, and we don't need any interior mutability tricks to let the type inference code update shared state.

- To handle bound self/cls parameters, we now use VecDeque instead of Vec for both CallArguments and CallArgumentTypes. That lets us cheaply push the bound parameter onto the front of the argument list when needed, and pop it back off when we're done.
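
A rough sketch of the representation described in this summary is shown below. It is written under assumptions: the toy `Ty` and `Argument` types, the `(index, argument)` callback signature, and the `iter` helper are invented for the example. The type names `CallArguments` and `CallArgumentTypes`, the use of `VecDeque`, the inference callback, and the `positional` constructor are taken from the discussion.

```rust
use std::collections::VecDeque;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Ty {
    Int,
    Str,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Argument {
    Positional,
    Keyword(&'static str),
}

/// Phase 1 input: kinds and names only, no types.
struct CallArguments(VecDeque<Argument>);

/// Phase 2 input: the same arguments plus one inferred type per argument.
struct CallArgumentTypes {
    arguments: CallArguments,
    types: VecDeque<Ty>,
}

impl CallArgumentTypes {
    /// The only way to get from arguments to typed arguments is to supply a
    /// callback that is run once per argument, so a half-inferred list
    /// cannot be constructed.
    fn new(arguments: CallArguments, mut infer: impl FnMut(usize, Argument) -> Ty) -> Self {
        let types = arguments
            .0
            .iter()
            .enumerate()
            .map(|(index, argument)| infer(index, *argument))
            .collect();
        Self { arguments, types }
    }

    /// For internal calls where the types are already known.
    fn positional(types: impl IntoIterator<Item = Ty>) -> Self {
        let types: VecDeque<Ty> = types.into_iter().collect();
        let arguments = CallArguments(types.iter().map(|_| Argument::Positional).collect());
        Self { arguments, types }
    }

    /// Iterate over each argument together with its inferred type.
    fn iter(&self) -> impl Iterator<Item = (Argument, Ty)> + '_ {
        self.arguments.0.iter().copied().zip(self.types.iter().copied())
    }
}
```

The design choice here is that "types have been inferred" is encoded in which Rust type you hold, rather than in a sentinel value stored inside each argument.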

dcreager · 2025-03-19T20:29:23Z

crates/red_knot_python_semantic/src/types/call/arguments.rs

+    /// The inferred type of this argument. Will be `Type::Unknown` if we haven't inferred a type
+    /// for this argument yet.
+    #[allow(clippy::struct_field_names)]
+    argument_type: Cell<Type<'db>>,
+
+    /// The inferred type of this argument when/if it is used as a `TypeForm`. Will be
+    /// `Type::Unknown` if we haven't inferred a type-form type for this argument yet.
+    type_form_type: Cell<Type<'db>>,


My more pressing (and mundane) concern was that store_expression_type currently panics if you try to store a type for an expression more than once.

In working on a test case, I realized that this is an issue already, even without inferring argument types separately for each signature — if you were to use an argument as both a value and a type, we would infer its expression twice, and end up trying to store a type for that expression twice.

So for now I've changed this PR to say that an argument must be used either as a value or as a type form for all of the signatures in any particular call site.

dcreager · 2025-03-19T20:39:38Z

crates/red_knot_python_semantic/src/types.rs

-                .try_call(db, &CallArguments::positional([self, instance, owner]))
+                .try_call(db, CallArgumentTypes::positional([self, instance, owner]))


There are a lot of cases like this where we make a call "internally". Here we're not really inferring the types of the arguments; they're provided directly. This is also handled nicely with the new representation: you construct CallArgumentTypes, since you have argument types ready to specify.

MichaReiser · 2025-03-20T07:27:21Z

crates/red_knot_python_semantic/src/types/call/arguments.rs

+    /// Push an extra synthetic argument (for a `self` or `cls` parameter) to the front of this
+    /// argument list.
+    pub(crate) fn push_self(&mut self) {
+        self.0.push_front(Argument::Synthetic);
    }

-    /// Create a [`CallArguments`] from an iterator over non-variadic positional argument types.
-    pub(crate) fn positional(positional_tys: impl IntoIterator<Item = Type<'db>>) -> Self {
-        positional_tys
-            .into_iter()
-            .map(Argument::Positional)
-            .collect()
+    /// Pop the extra synthetic argument from the front of this argument list.
+    pub(crate) fn pop_self(&mut self) {
+        self.0.pop_front();


Do we need an assertion that the front element is Synthetic to avoid cases where someone calls pop_self without having called push_self before?

Or should we use an Option for the self argument (I don't know if that's the approach that you had initially), as it seems that we never need to return a slice (which wouldn't work with VecDeque anyway) and we only ever need to mutate the first element.

dcreager

I changed this to a with_self method that does the pushing/popping in the right pattern, instead of adding assertions or more complex types to ensure the caller does the right thing.
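
One possible shape for such a method is sketched below, assuming a closure-taking signature. The `has_bound_self` flag, the generic return type, and the reduced `Argument` enum are guesses made for illustration; the actual `with_self` signature is not shown in this conversation.

```rust
use std::collections::VecDeque;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Argument {
    Synthetic,
    Positional,
}

#[derive(Debug, PartialEq, Eq)]
struct CallArguments(VecDeque<Argument>);

impl CallArguments {
    /// Run `f` with a synthetic `self`/`cls` argument temporarily pushed to
    /// the front. The push and the pop live in one place, so callers cannot
    /// pop without a matching push.
    fn with_self<R>(&mut self, has_bound_self: bool, f: impl FnOnce(&mut Self) -> R) -> R {
        if has_bound_self {
            self.0.push_front(Argument::Synthetic);
        }
        let result = f(self);
        if has_bound_self {
            self.0.pop_front();
        }
        result
    }
}
```

In this sketch the pop is skipped if `f` panics; a guard type with a `Drop` implementation would be needed if unwinding had to leave the list untouched.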

dhruvmanila

I haven't done a thorough review; I'm mainly looking at the PR to keep up with the changes happening on main. I think the changes look good, and I will defer to Carl/Micha for the final review.

carljm

Love this. Love the Parameter fluent builder, love the new type-safe distinction between CallArguments and CallArgumentTypes. Fantastic!

carljm · 2025-03-21T00:22:33Z

crates/red_knot_python_semantic/src/types/call/bind.rs

+    ///
+    /// TODO: We will eventually infer completely different argument types for each signature, once
+    /// we are able to use the annotated parameter types as type contexts for that inference. At
+    /// that point, this field will move down into `CallBinding` or `Binding`.


This TODO may not be accurate, given our conversation today? Totally up to you how you want to handle this.

* main: (26 commits)
  Use the common `OperatorPrecedence` for the parser (#16747)
  [red-knot] Check subtype relation between callable types (#16804)
  [red-knot] Check whether two callable types are equivalent (#16698)
  [red-knot] Ban most `Type::Instance` types in type expressions (#16872)
  Special-case value-expression inference of special form subscriptions (#16877)
  [syntax-errors] Fix star annotation before Python 3.11 (#16878)
  Recognize `SyntaxError:` as an error code for ecosystem checks (#16879)
  [red-knot] add test cases result in false positive errors (#16856)
  Bump 0.11.1 (#16871)
  Allow discovery of venv in VIRTUAL_ENV env variable (#16853)
  Split git pathspecs in change determination onto separate lines (#16869)
  Use the correct base commit for change determination (#16857)
  Separate `BitXorOr` into `BitXor` and `BitOr` precedence (#16844)
  Server: Allow `FixAll` action in presence of version-specific syntax errors (#16848)
  [`refurb`] Fix starred expressions fix (`FURB161`) (#16550)
  [`flake8-executable`] Add pytest and uv run to help message for `shebang-missing-python` (`EXE003`) (#16855)
  Show more precise messages in invalid type expressions (#16850)
  [`flake8-executables`] Allow `uv run` in shebang line for `shebang-missing-python` (`EXE003`) (#16849)
  Add `--exit-non-zero-on-format` (#16009)
  [red-knot] Ban list literals in most contexts in type expressions (#16847)
  ...

AlexWaygood added the red-knot Multi-file analysis & type inference label Mar 7, 2025

dcreager added 16 commits March 18, 2025 09:57

Separate Argument and ArgumentKind

e42260c

ty → argument_type

7d360cf

annotated_ty → annotated_type

4dad1e4

Nicer Parameter constructors

c1cc486

default_ty → default_type

09e02c1

Separate mutator for argument type

3c4d4d1

Bindings takes ownership of signatures

4afde56

Bindings takes ownership of arguments

6cf7aa1

Separate bind into two methods

301df2f

Move special case return types into Bindings

1b6da5c

Better representation for bound self arguments

f9319fe

Infer argument types in between two binding steps

30a6879

Argument type form type

93bcc5a

Add type_form to parameters

e6e5a10

Use type_form params/args instead of ParameterExpectations

b390eb6

Use parameter_types for all special cases

6a3bc70

dcreager force-pushed the dcreager/two-phase-binding branch from 02c3920 to 6a3bc70 Compare March 18, 2025 18:10

dcreager added 2 commits March 18, 2025 14:51

Add some comments

7288b9c

Add comment

fef96e9

dcreager commented Mar 18, 2025

Fix tests

202ad82

dcreager marked this pull request as ready for review March 18, 2025 20:18

dcreager requested review from carljm, AlexWaygood and sharkdp as code owners March 18, 2025 20:18

dcreager changed the title ~~[red-knot] WIP: Break up call binding into two phases~~ [red-knot] Break up call binding into two phases Mar 18, 2025

carljm reviewed Mar 19, 2025

MichaReiser reviewed Mar 19, 2025

dcreager marked this pull request as draft March 19, 2025 15:02

Use VecDeques for arguments and types

d3850ee

dcreager added 5 commits March 19, 2025 16:17

Remove old comment

77190a0

Infer types when constructing CallArgumentTypes

3d88d74

evaluate_known_cases

01be7d5

Remove some unneeded clones

51fcef9

Use option<usize>

14f6b01

dcreager commented Mar 19, 2025

dcreager marked this pull request as ready for review March 19, 2025 20:45

dcreager added 3 commits March 19, 2025 16:47

Add conflicting form test case

338f6b9

Report conflicting forms for call as a whole, not for any signature

cc1ff0f

lint

56de906

MichaReiser approved these changes Mar 20, 2025

dhruvmanila reviewed Mar 20, 2025

dcreager and others added 5 commits March 20, 2025 09:13

Update crates/red_knot_python_semantic/src/types/call/arguments.rs

82da3d3

Co-authored-by: Micha Reiser <[email protected]>

with_self instead of push/pop_self

c1ab7c9

Fix docs

cdc6888

clippy

bae5aa5

Update docs

2794b03

dcreager mentioned this pull request Mar 20, 2025

[red-knot] type context (bidirectional checking) #16838

Open

carljm approved these changes Mar 21, 2025

dcreager added 3 commits March 21, 2025 09:14

Fix merge conflicts

55f6ecd

Remove moot todo

b19628b

dcreager merged commit c03c28d into main Mar 21, 2025
23 checks passed

dcreager deleted the dcreager/two-phase-binding branch March 21, 2025 13:38
