Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

eth/catalyst: avoid a race in the SimulatedBeacon Stop call #31328

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

eljobe
Copy link

@eljobe eljobe commented Mar 6, 2025

Fixes: #31327

@fjl fjl changed the title Avoid a race in the SimulatedBeacon Stop call eth/catalyst: avoid a race in the SimulatedBeacon Stop call Mar 7, 2025
eljobe added 2 commits March 7, 2025 13:13
This allows the inner loop in the case of "on-demand" commits to bail out if the
txPool has been terminated before the doCommit channel is closed.

This has been tested with a known-flaky test which, before this commit was
spinlooping and logging unfathomable reams of warning messages. Now, when the
race occurs, only a single instance of the warning is logged.
@eljobe
Copy link
Author

eljobe commented Mar 7, 2025

The irony of requiring two commits instead of one to introduce a function called fallibleCommit is not lost on me.

@jwasinger
Copy link
Contributor

So the thing is... a failure from sealBlock doesn't necessarily indicate that that the client is closed/closing. For example, if we rewind via debug_setHead concurrently while trying to commit, we can fail attempting to build a payload.

@eljobe
Copy link
Author

eljobe commented Mar 7, 2025

Well-spotted. So, this makes me want to go back to calling Sync() because that's the code that actually can only error when the txPool has already terminated.

Another option would be to create a special error type to be returned from Sync() when the pool is already terminated, then, conditionally break from the spinloop only if the error from fallibleCommit is of that new type. Sound like a plan? If so, I'll get to coding.

@jwasinger
Copy link
Contributor

Let's just use a call to Sync. That should be fine.

@jwasinger
Copy link
Contributor

jwasinger commented Mar 7, 2025

actually, I would:

  • create a function commit which returns (common.Hash, error) (same as fallableCommit but less of a mouthful)
  • return a special package-private error type if the invocation to Sync within returns an error (I think Sync only errors if the pool is terminated)
  • if the result of the commit indicates that the pool terminated, use that as the break condition for the loop.

This way, we only break the spinloop when it is definitely because the
transaction pool has already been terminated.
@eljobe
Copy link
Author

eljobe commented Mar 7, 2025

Okay. Now, it's doing what you suggested. Thanks for the advice.

This addresses some review feedback.
@@ -181,7 +192,7 @@ func (c *SimulatedBeacon) sealBlock(withdrawals []*types.Withdrawal, timestamp u
// behavior, the pool will be explicitly blocked on its reset before
// continuing to the block production below.
if err := c.eth.APIBackend.TxPool().Sync(); err != nil {
return fmt.Errorf("failed to sync txpool: %w", err)
return &errTxPoolTerminated{fmt.Errorf("failed to sync txpool: %w", err)}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just turn it into, not really a need for wrapping another error type errTxPoolTerminated error

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fear that I have not clearly undrestood this suggestion. But, I did something to move the "failed to sync txpool:" part of the message into the errTxPoolTerminated.Error implementation.

@jwasinger
Copy link
Contributor

I think the general approach here is fine, and we should merge this (after cleaning up the error checking, RE comments from Marius). But just a thought: maybe a cleaner solution that accomplishes the same thing would be to add a member method Closed that returns if the txpool has been closed, and we poll that instead of checking for a specific error returned when committing.

Use `errors.Is` instead of `errors.As` since we don't need to access special
fields in the custom error type.

Rather than generating a new error to wrap, just capture the exisitng error.
@eljobe
Copy link
Author

eljobe commented Mar 10, 2025

I think the general approach here is fine, and we should merge this (after cleaning up the error checking, RE comments from Marius). But just a thought: maybe a cleaner solution that accomplishes the same thing would be to add a member method Closed that returns if the txpool has been closed, and we poll that instead of checking for a specific error returned when committing.

I'm definitely open to switching to that implementation instead. But, I "slightly" prefer the existing one. I think library APIs are best when you don't need to remember to make multiple calls to the library to get the information the calling code needs to make progress. With the error-handling approach, it is clear that the work we're trying to do is commit and that there is a special error condition of committing in which we want to stop. I find it slightly less clear if we check TxPool.Closed() or !TxPool.Running() in each pass through the inner loop. Because it feels like this is somehow a completely separate state that can transition without affecting the outcome of the call to commit (which it isn't.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

eth/catalyst SimulatedBeacon can spinloop when txpool closing and doCommit closing race
3 participants