Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[red-knot] Introduce ArbitraryRule for property tests #16862

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

cake-monotone
Copy link
Contributor

Summary

This feature was suggested by @AlexWaygood.

Problem with the Previous Approach

In property tests with a premise, the test could not perform valid checks until it randomly generated a type that satisfied the premise. This led to unnecessary computational waste.

This was particularly problematic in tests with three arguments, significantly reducing the efficiency of generating valid test cases.

For example:

type_property_test!(
    union_equivalence_not_order_dependent, db,
    forall types s, t, u.
        s.is_fully_static(db) && t.is_fully_static(db) && u.is_fully_static(db) => <property>
);

Improvements

The type_property_test macro now explicitly requires both the identifiers and their corresponding ArbitraryRule.

type_property_test!(
    intersection_equivalence_not_order_dependent, db,
    forall (s: FullyStaticTy, t: FullyStaticTy, u: FullyStaticTy),
        <property>
);

Now, s, t, and u—which are declared as FullyStaticTy—will only be generated from types that satisfy ty.is_fully_static().

The rules AnyTy, FullyStaticTy, and SingletonTy each maintain their own random pools for generating Ty instances.
For more details, please refer to the changes in arbitrary.rs.

Test Plan

A new ensure_rules module has been added to verify that the types generated under each rule satisfy the expected conditions.

You can run the following command to validate the implementation:

cargo test -p red_knot_python_semantic -- --ignored ensure_rules

@cake-monotone cake-monotone changed the title Introduce ArbitraryRule for property tests [red-knot] Introduce ArbitraryRule for property tests Mar 20, 2025
Copy link
Contributor

mypy_primer results

No ecosystem changes detected ✅

@MichaReiser MichaReiser added the red-knot Multi-file analysis & type inference label Mar 20, 2025
Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR!

I see a significant impact on an individual test using SingletonTy (~75% improvement), a smaller impact on tests using FullyStaticTy (~15% improvement), and a fairly small impact on overall runtime of the stable tests (~4% improvement).

This perf improvement is nice, but perhaps not as large overall as I might have hoped for from this change, I think mostly because most of the tests still use AnyTy anyway.

Personally I am not sure this perf improvement is worth the coverage drawbacks I discuss in my inline comment. Would like to hear other opinions.

Comment on lines +244 to +266
Ty::Never,
Ty::BuiltinsFunction("chr"),
Ty::BuiltinsFunction("ascii"),
Ty::BuiltinsBoundMethod {
class: "str",
method: "isascii",
},
Ty::BuiltinsBoundMethod {
class: "int",
method: "bit_length",
},
int_lit,
bool_lit,
Ty::StringLiteral(""),
Ty::StringLiteral("a"),
Ty::LiteralString,
Ty::BytesLiteral(""),
Ty::BytesLiteral("\x00"),
Ty::KnownClassInstance(KnownClass::Object),
Ty::KnownClassInstance(KnownClass::Str),
Ty::KnownClassInstance(KnownClass::Int),
Ty::AlwaysFalsy,
Ty::AlwaysTruthy,
Copy link
Contributor

@carljm carljm Mar 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks to me like there are some fully-static types in the above list that are not included here, silently reducing our type coverage for tests using FullyStaticTy. (To pick just one example, BuiltinClassLiteral('str') is both fully-static and a singleton, but is not included either here or below.)

I think this is a significant drawback of this PR. Rather than having a single pool of candidate types, with the actual implemention of is_fully_static and is_singleton deciding which types fall into which categories, we now have to manually categorize the candidate types. And while the new ensure_rules tests help validate that we don't include any type in a category to which it doesn't belong, there's nothing helping us validate that we aren't silently losing coverage by omitting types wrongly.

If we do decide to go ahead with this PR, I would at least prefer if we could not repeat ourselves. That is, have a single list of non-fully-static types, a single list of static types, and a single list of singleton types. FullyStaticTy would combine the latter two lists, AnyTy would combine all three, SingletonTy would use only the latter list.

Copy link
Contributor Author

@cake-monotone cake-monotone Mar 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all, I think I should explain that my initial intuition was off. I had assumed that using is_fully_static would make it very unlikely for valid tests—but it turns out that’s not really the case.

From my simulation, the probability that all three variables of AnyTy satisfy the is_fully_static premise is around 65%. That means, out of 100 test runs, about 65 actually test the property, which is higher than I expected.

The key point I was trying to focus on wasn’t performance—it was to remove meaningless tests. If the probability of satisfying a property’s premise approaches zero, the test becomes meaningless. This PR was based on that assumption, so the exact ratio of AnyTy to FullyStaticTy wasn’t my main concern.
But as you've pointed out, FullyStaticTy doesn’t seem to be as effective as I initially thought.

In conclusion, I think it's totally fine to close this PR.


Here are a few more stats from the simulation, which might be helpful for future property test reviews:

  • Probability that t.is_singleton() holds: ~9%
  • Probability that !t.is_fully_static() holds: ~10%
  • Probability that s.is_equivalent_to(db, t) && t.is_equivalent_to(db, u) holds: ~0.1%
  • Probability that s.is_gradual_equivalent_to(db, t) holds: ~1%

These probabilities will likely decrease as we add more types to the random pool. But for now, it seems we can compensate by increasing the number of tests. If we move forward with similar PRs in the future, I think we should aim to eliminate premises with very low probability, not fully static.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, I hadn't fully processed the fact that if we have a premise of low probability, that doesn't just impact performance of the test, it also impacts usefulness of the test; a premise of 10% probability means we need 10x the QUICKCHECK_TESTS value to get the same number of actual tests of the implication. I agree this is a stronger motivation for the PR, and could be worth the risk of missing coverage of some type that we fail to include in the right list.

I would consider going ahead with this PR if we implement the above suggestion to avoid repetition in the type lists (so each type is listed exactly once, making it easier to evaluate whether each type is in the right place). We could also consider eliminating FullyStaticTy and just separating SingletonTy, but I kind of think if we go down this route we may as well do it fully.

But I would also like to hear @sharkdp 's opinion, so maybe no need to do any updates here until he is back.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR looks great! It's something I've been meaning to implement as well. I kind of liked the generic and simple premise => property form, but with the syntactic change to the macro, this reads even better, I think.

with the actual implemention of is_fully_static and is_singleton deciding which types fall into which categories, we now have to manually categorize the candidate types.

Maybe we can maintain that property? We could have a single pool of types and then categorize them into (static, LazyLock) specialized lists for usage in those dedicated Arbitrary implementations?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! I’ll go ahead with the PR.

As sharkdp suggested, I think keeping a single pool and letting the Arbitrary implementations handle the categorization makes sense — it should make it easier to support more premises in the future.

Copy link
Contributor

@sharkdp sharkdp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this!

///
macro_rules! type_property_test {
($test_name:ident, $db:ident, forall types $($types:ident),+ . $property:expr) => {
($test_name:ident, $db:ident, forall ($($type_idents:ident: $type_rule:ty),+), $property:expr) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I liked the mathematical . here in the syntax, but it's not like it's particularly important, if it's difficult to keep.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, no problem! I’ll go ahead and change it.

/// where
/// - `t1`, `t2`, ..., `tn` are identifiers that represent arbitrary types
/// - `Rule1`, `Rule2`, ..., `RuleN` specify constraints on the generated types.
/// - `<property>`is an expression using these identifiers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// - `<property>`is an expression using these identifiers
/// - `<property>` is an expression using these identifiers

Comment on lines +56 to +60
// Some `$property` expressions use iterators, which may be lazily evaluated.
// This can lead to lifetime issues with `$db` if `$property` captures references to it.
// To avoid this, we force `$property` to be evaluated eagerly using a closure.
#[allow(clippy::redundant_closure_call)]
(|| $property)()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this become a problem just now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It’s because of the macro for $premise.
Since it has been generating !($premise) || ($conclusion), I think the || operator affects the evaluation.

/// A wrapper for a type that implements the `quickcheck::Arbitrary` trait.
/// This struct enables the generation of arbitrary types for property test.
#[derive(Clone)]
pub(crate) struct QuickcheckArgument<T: ArbitraryRule> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe call this BoundedType or similar? It represents an arbitrary type that is bound by a constraint. And ArbitraryRule could be a TypeBound. There might be a small risk for confusion with type variable bounds, but they're quite similar concepts, so it might be fine?

const MAX_SIZE: u32 = 2;
QuickcheckArgument {
ty: Rule::generate_type_recursively(g, MAX_SIZE),
_marker: std::marker::PhantomData,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a constructor so we can avoid these _marker: … lines?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
red-knot Multi-file analysis & type inference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants