-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[red-knot] Break up call binding into two phases #16546
Conversation
02c3920
to
6a3bc70
Compare
* main: [playground] Avoid concurrent deployments (#16834) [red-knot] Infer `lambda` return type as `Unknown` (#16695) [red-knot] Move `name` field on parameter kind (#16830) [red-knot] Emit errors for more AST nodes that are invalid (or only valid in specific contexts) in type expressions (#16822) [playground] Use cursor for clickable elements (#16833) [red-knot] Deploy playground on main (#16832) Red Knot Playground (#12681) [syntax-errors] PEP 701 f-strings before Python 3.12 (#16543)
|
Signature::new( | ||
Parameters::new([ | ||
Parameter::positional_only(Some(Name::new_static("a"))) | ||
.type_form() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These type_form
calls are the magic lines that replace ParameterExpectations
// For certain known callables, we have special-case logic to determine the return type | ||
// in a way that isn't directly expressible in the type system. Each special case | ||
// listed here should have a corresponding clause above in `signatures`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This special-case return type logic moves over to a private method on Bindings
, which we guarantee is always called after checking argument types.
pub(crate) struct CallArguments<'a, 'db>(Vec<Argument<'a, 'db>>); | ||
pub(crate) struct CallArguments<'a, 'db> { | ||
bound_self: Option<Argument<'a, 'db>>, | ||
arguments: Rc<[Argument<'a, 'db>]>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This representation lets us handle different union elements/overloads that have different bound self
parameters without having to make clones of all of the other arguments. That's not just an optimization, it also ensures that there's only a single slice of non-bound arguments that we perform type inference on. (That type inference now happens after we've matched up arguments and parameters — which is when we prepend bound self
parameters)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's worth making this a comment on arguments
. The reason of Rc
is definetely not "appearant" (and one of our first uses in Red Knot?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I fully understand
That's not just an optimization, it also ensures that there's only a single slice of non-bound arguments that we perform type inference on.
Or, I'm not sure I fully understand its implication.
Once we have matched up formal and actual parameters, we can infer types of each actual parameter, and check that each one is assignable to the corresponding formal parameter type.
What I understand from your sentence in the PR summary is that we only perform type inference once we matched all parameters? Does this happen exactly once or is it possible that this operation is performed more than once (e.g. once for every matching signature?)
I'm asking because I'm a bit concerned about the Cell
use in Argument
because we then loose the static assertion that type inference can happen exactly once (you could hold on to multiple Arguments
that all point to the same shared slice but then infer the type with differently matched parameters). Would it be possible to use Rc::get_mut
(or try_mut
) instead of using interior mutability to enforce that all other references to the Argument
slice got dropped or are they still alive at that point? If they're still alive, isn't the state of their Argument
s now misleading because the types are inferred for another matched binding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I understand from your sentence in the PR summary is that we only perform type inference once we matched all parameters?
Yes that's right
Does this happen exactly once or is it possible that this operation is performed more than once (e.g. once for every matching signature?)
This is related to @carljm comment #16546 (comment)
As of right now, we peform type inference on each argument at most once. We do not infer a different type for an argument for each matching signature. As we move towards supporting generics, we will have to do more work per argument for each matching signature. But I am purposefully trying to keep this PR simpler by not bringing that into play yet.
I'm asking because I'm a bit concerned about the
Cell
use inArgument
because we then loose the static assertion that type inference can happen exactly once (you could hold on to multipleArguments
that all point to the same shared slice but then infer the type with differently matched parameters).
Before we had that as a static guarantee because Argument
had no mutuator methods — you had to provide the inferred type when you created it. If I understand your concern, it's less about using Cell
in particular and more about having a mutator method at all, is that right? Would you have the same worry if it was a more normal &mut self
mutator method? Because there would still be no static guarantee that you didn't call it twice.
I could pull the argument types out into a separate Rust type, which you would create after match_parameters
and then pass in to check_types
. But I think then we'd lose the static guarantee that the ArgumentTypes
that you pass in in step two was created for the CallArguments
that you pass in in step one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand your concern, it's less about using Cell in particular and more about having a mutator method at all, is that right?
My main concern is how we avoid that we don't accidentally end up mutating the same underlying Argument
that is shared by using Rc<[Argument]>
. The benefit I see of using Rc::get_mut
and panicking if it is shared is that we would detect that mistake (at least at runtime).
I don't think that using &mut
helps here because my concern is about having two CallArguments
that both share the same arguments
. Requiring a &mut CallArguments
doesn't prevent mutating a
when b
points to the same arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can use Rc::get_mut
because with how it's formulated right now, we need that sharing. Each signature might have to bind a different self
/cls
argument to the front of the argument list, but we also only want to perform type inference once on the (non-bound) arguments that are the same for each signature. The bindings themselves have to store the arguments so that they can be type-checked in the later call, and the outer code in infer.rs
also needs to store the arguments so that it can do the actual type inference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how we avoid that we don't accidentally end up mutating the same underlying
Argument
that is shared by usingRc<[Argument]>
My understanding is that the whole point of this structure is that we do want to mutate the same underlying shared Argument
; it gives us a place to store the inferred type for that argument, to avoid inferring it multiple times.
form: Cell<ArgumentForm>, | ||
|
||
/// The inferred type of this argument. Will be `Type::Unknown` if we haven't inferred a type | ||
/// for this argument yet. | ||
#[allow(clippy::struct_field_names)] | ||
argument_type: Cell<Type<'db>>, | ||
|
||
/// The inferred type of this argument when/if it is used as a `TypeForm`. Will be | ||
/// `Type::Unknown` if we haven't inferred a type-form type for this argument yet. | ||
type_form_type: Cell<Type<'db>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We either need these cells, or the Rc
slice up above needs to be an Rc<RefCell<[_]>>
, so that the type inference code can update these fields with the inferred types after we've already matched parameters. Since these types are both Copy
this seemed the cleanest.
const VALUE = 1 << 0; | ||
const TYPE_FORM = 1 << 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if it's really needed, but using bitflags here instead of an enum or bool means that we support an argument that might be used as a type form for one union element/overload, and a value for another.
pub(crate) struct Bindings<'call, 'db> { | ||
signatures: Signatures<'db>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bindings
now takes ownership of both the signatures and (a clone of the) arguments of its call site, since that information needs to carry over from match_parameters
to check_types
.
if function.has_known_class_decorator(db, KnownClass::Classmethod) | ||
&& function.decorators(db).len() == 1 | ||
{ | ||
match overload.parameter_types() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're using parameter_types
everywhere now, which lets close off a TODO and remove CallArguments::{first,second,third}_argument
Is this PR WIP or ready for review? |
/// The inferred type of this argument. Will be `Type::Unknown` if we haven't inferred a type | ||
/// for this argument yet. | ||
#[allow(clippy::struct_field_names)] | ||
argument_type: Cell<Type<'db>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What were your considerations in using Unknown to represent "no type inferred yet" vs using an Option<Type>
here? Given that Unknown
is also a valid inferred type for an argument, the latter seems semantically clearer about whether we've inferred a type for this argument yet. But maybe in practice we don't need that clarity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is moot with the new represenation, where you have to provide a callback to infer each argument type when constructing a CallArgumentTypes
. You now either (a) have all argument types uninferred, in which case you have a CallArguments
, or (b) have inferred all argument types, in which case you have a CallArgumentTypes
.
pub(crate) struct CallArguments<'a, 'db>(Vec<Argument<'a, 'db>>); | ||
pub(crate) struct CallArguments<'a, 'db> { | ||
bound_self: Option<Argument<'a, 'db>>, | ||
arguments: Rc<[Argument<'a, 'db>]>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's worth making this a comment on arguments
. The reason of Rc
is definetely not "appearant" (and one of our first uses in Red Knot?)
pub(crate) struct CallArguments<'a, 'db>(Vec<Argument<'a, 'db>>); | ||
pub(crate) struct CallArguments<'a, 'db> { | ||
bound_self: Option<Argument<'a, 'db>>, | ||
arguments: Rc<[Argument<'a, 'db>]>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I fully understand
That's not just an optimization, it also ensures that there's only a single slice of non-bound arguments that we perform type inference on.
Or, I'm not sure I fully understand its implication.
Once we have matched up formal and actual parameters, we can infer types of each actual parameter, and check that each one is assignable to the corresponding formal parameter type.
What I understand from your sentence in the PR summary is that we only perform type inference once we matched all parameters? Does this happen exactly once or is it possible that this operation is performed more than once (e.g. once for every matching signature?)
I'm asking because I'm a bit concerned about the Cell
use in Argument
because we then loose the static assertion that type inference can happen exactly once (you could hold on to multiple Arguments
that all point to the same shared slice but then infer the type with differently matched parameters). Would it be possible to use Rc::get_mut
(or try_mut
) instead of using interior mutability to enforce that all other references to the Argument
slice got dropped or are they still alive at that point? If they're still alive, isn't the state of their Argument
s now misleading because the types are inferred for another matched binding?
I'm reworking this with a different way of representing arguments and types per @MichaReiser and @carljm's comments. Putting this back to draft while I do that. |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I've pushed up a new representation that I think is nicer, and addresses most of the concerns.
In summary:
CallArguments
goes back to being a simple list of Argument
s, which look exactly like they did before. This does not store argument types.
Once you are ready to perform type inference on an argument list, you transform the CallArguments
into a CallArgumentTypes
. You provide a callback to infer each argument type. That makes it impossible to have incomplete partially inferred state.
This also makes it clearer which information each phase requires, since match_parameters
takes in a CallArguments
, and check_types
takes in a CallArgumentTypes
.
We don't store CallArguments
in any of the bindings types anymore, which means we don't need to make them cheap to clone, and we don't need any interior mutability tricks to let the type inference code update shared state.
To handle bound self
/cls
parameters, we now use VecDeque
instead of Vec
for both CallArguments
and CallArgumentTypes
. That lets us cheaply push the bound parameter onto the front of the argument list when needed, and pop it back off when we're done.
/// The inferred type of this argument. Will be `Type::Unknown` if we haven't inferred a type | ||
/// for this argument yet. | ||
#[allow(clippy::struct_field_names)] | ||
argument_type: Cell<Type<'db>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is moot with the new represenation, where you have to provide a callback to infer each argument type when constructing a CallArgumentTypes
. You now either (a) have all argument types uninferred, in which case you have a CallArguments
, or (b) have inferred all argument types, in which case you have a CallArgumentTypes
.
/// The inferred type of this argument. Will be `Type::Unknown` if we haven't inferred a type | ||
/// for this argument yet. | ||
#[allow(clippy::struct_field_names)] | ||
argument_type: Cell<Type<'db>>, | ||
|
||
/// The inferred type of this argument when/if it is used as a `TypeForm`. Will be | ||
/// `Type::Unknown` if we haven't inferred a type-form type for this argument yet. | ||
type_form_type: Cell<Type<'db>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My more pressing (and mundane) concern was that
store_expression_type
currently panics if you try to store a type for an expression more than once.
In working on a test case, I realized that this is an issue already, even without inferring argument types separately for each signature — if you were to use an argument as both a value and a type, we would infer its expression twice, and end up trying to store a type for that expression twice.
So for now I've changed this PR to say that an argument must be used either as a value or as a type form for all of the signatures in any particular call site.
.try_call(db, &CallArguments::positional([self, instance, owner])) | ||
.try_call(db, CallArgumentTypes::positional([self, instance, owner])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a lot of cases like this where we make a call "internally". Here we're not really inferring the types of the arguments; they're provided directly. This is also handled nicely with the new representation: you construct CallArgumentTypes
, since you have argument types ready to specify.
/// Push an extra synthetic argument (for a `self` or `cls` parameter) to the front of this | ||
/// argument list. | ||
pub(crate) fn push_self(&mut self) { | ||
self.0.push_front(Argument::Synthetic); | ||
} | ||
|
||
/// Create a [`CallArguments`] from an iterator over non-variadic positional argument types. | ||
pub(crate) fn positional(positional_tys: impl IntoIterator<Item = Type<'db>>) -> Self { | ||
positional_tys | ||
.into_iter() | ||
.map(Argument::Positional) | ||
.collect() | ||
/// Pop the extra synthetic argument from the front of this argument list. | ||
pub(crate) fn pop_self(&mut self) { | ||
self.0.pop_front(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need an assertion that the front attribute is Synthetic
to avoid cases where someone calls pop_self
without having called push_self
before?
Or should we use an Option
for the self
argument (I don't know if that's the approach that you had initially), as it seems that we never need to return a slice (which wouldn't work with VecDeque
anyway) and we only ever need to mutate the first element.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed this to a with_self
method that does the pushing/popping in the right pattern, instead of adding assertions or more complex types to ensure the caller does the right thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't done a thorough review, I'm mainly looking at the PR to keep up with the changes happening on main. I think the changes looks good, will defer it to Carl/Micha for the final review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love this. Love the Parameter fluent builder, love the new type-safe distinction between CallArguments
and CallArgumentTypes
. Fantastic!
/// | ||
/// TODO: We will eventually infer completely different argument types for each signature, once | ||
/// we are able to use the annotated parameter types as type contexts for that inference. At | ||
/// that point, this field will move down into `CallBinding` or `Binding`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This TODO may not be accurate, given our conversation today? Totally up to you how you want to handle this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
* main: (26 commits) Use the common `OperatorPrecedence` for the parser (#16747) [red-knot] Check subtype relation between callable types (#16804) [red-knot] Check whether two callable types are equivalent (#16698) [red-knot] Ban most `Type::Instance` types in type expressions (#16872) Special-case value-expression inference of special form subscriptions (#16877) [syntax-errors] Fix star annotation before Python 3.11 (#16878) Recognize `SyntaxError:` as an error code for ecosystem checks (#16879) [red-knot] add test cases result in false positive errors (#16856) Bump 0.11.1 (#16871) Allow discovery of venv in VIRTUAL_ENV env variable (#16853) Split git pathspecs in change determination onto separate lines (#16869) Use the correct base commit for change determination (#16857) Separate `BitXorOr` into `BitXor` and `BitOr` precedence (#16844) Server: Allow `FixAll` action in presence of version-specific syntax errors (#16848) [`refurb`] Fix starred expressions fix (`FURB161`) (#16550) [`flake8-executable`] Add pytest and uv run to help message for `shebang-missing-python` (`EXE003`) (#16855) Show more precise messages in invalid type expressions (#16850) [`flake8-executables`] Allow `uv run` in shebang line for `shebang-missing-python` (`EXE003`) (#16849) Add `--exit-non-zero-on-format` (#16009) [red-knot] Ban list literals in most contexts in type expressions (#16847) ...
This breaks up call binding into two phases:
Matching parameters just looks at the names and kinds (positional/keyword) of each formal and actual parameters, and matches them up. Most of the current call binding errors happen during this phase.
Once we have matched up formal and actual parameters, we can infer types of each actual parameter, and check that each one is assignable to the corresponding formal parameter type.
As part of this, we add information to each formal parameter about whether it is a type form or not. Once PEP 747 is finalized, we can hook that up to this internal type form representation. This replaces the
ParameterExpectations
type, which did the same thing in a more ad hoc way.While we're here, we add a new fluent API for building
Parameter
s, which makes our signature constructors a bit nicer to read. We also eliminate a TODO where we were consuming types from the argument list instead of the bound parameter list when evaluating our special-case known functions.Closes #15460