Fix replica not able to initiate election in time when epoch fails #1009
…poch If multiple primary nodes go down at the same time, their replica nodes will initiate elections at the same time. There is a certain probability that the replicas will initiate their elections in the same epoch. In the current election mechanism, only one replica node can eventually get enough votes; the other replica node will fail to win because it cannot reach a majority. Its election will then time out and it has to wait for the retry, which results in a long failure time. If another node has won the election in the failover epoch, we can assume that our own election has failed and retry as soon as possible. Signed-off-by: Binbin <[email protected]>
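To make the idea concrete, here is a minimal sketch of the early-reset rule described above. It is a toy model, not the actual Valkey implementation: the `ReplicaElection` class and its `start` and `on_primary_claim` methods are hypothetical names, and only `failover_auth_time` and `failover_auth_epoch` echo identifiers mentioned in this thread. It assumes the losing replica learns about the winner from a message in which a node claims to be primary with a given epoch.

```python
from dataclasses import dataclass


@dataclass
class ReplicaElection:
    """Hypothetical model of a replica's election state (not Valkey's real struct)."""

    failover_auth_time: int = 0   # start time of the election in ms; 0 means none pending
    failover_auth_epoch: int = 0  # epoch in which this replica is asking for votes

    def start(self, now_ms: int, epoch: int) -> None:
        """Begin an election at `now_ms` in the given epoch."""
        self.failover_auth_time = now_ms
        self.failover_auth_epoch = epoch

    def on_primary_claim(self, claimed_epoch: int) -> bool:
        """Handle a node claiming to be primary with `claimed_epoch`.

        Returns True if our own election was abandoned, so the caller can
        schedule a retry right away instead of waiting for the timeout.
        """
        in_progress = self.failover_auth_time != 0
        if in_progress and claimed_epoch >= self.failover_auth_epoch:
            # Someone already won in our epoch (or a later one): we cannot win.
            self.failover_auth_time = 0
            return True
        return False


# Two replicas race in epoch 5; the other one wins and announces itself.
loser = ReplicaElection()
loser.start(now_ms=1000, epoch=5)
abandoned = loser.on_primary_claim(claimed_epoch=5)
print(abandoned, loser.failover_auth_time)
```

The point of the sketch is the comparison: a claim with an epoch lower than ours says nothing about our election, while an equal or higher one means the votes for our epoch are already spent.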
Codecov Report
All modified and coverable lines are covered by tests ✅

Coverage diff (unstable vs. #1009):
Coverage: 70.70% -> 70.73% (+0.02%)
Files: 114 -> 114
Lines: 63147 -> 63151 (+4)
Hits: 44648 -> 44669 (+21)
Misses: 18499 -> 18482 (-17)
This is a great bug, @enjoy-binbin! The fix LGTM overall.
Co-authored-by: Ping Xie <[email protected]> Signed-off-by: Binbin <[email protected]>
LGTM!
@madolson @zuiderkwast do you want to take a look at this?
Fix looks good!
The PR title says "Optimize ..." but it is more than an optimization. Actually a bug fix? Please improve the title. :)
Co-authored-by: Viktor Söderqvist <[email protected]> Signed-off-by: Binbin <[email protected]>
I think it is indeed more of an optimization: it makes the election fail as soon as possible and retry as soon as possible. But it can also be considered a bug fix. Maybe: "Fix replica not able to initiate election in time when epoch fails"?
The macos job failed. It's probably not related to this PR. It's a fascinating crash log though:
I'm trying to fix the test in #1288.
After valkey-io#1009, we reset the election when we receive a claim with an equal or higher epoch, since a node may already have won an election. However, we need to consider the window before the node actually obtains the failover_auth_epoch. failover_auth_epoch defaults to 0, so before the node actually gets the failover epoch, we might wrongly reset the election. This is probably harmless, but it produces misleading log output and may delay the election by a cron cycle or beforesleep. Now we only reset the election once the node has actually obtained the failover epoch. Signed-off-by: Binbin <[email protected]>
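The pitfall described here can be illustrated with a small sketch. It is a toy model under stated assumptions, not the actual patch: `reset_unguarded`, `reset_guarded` and the `epoch_obtained` flag are hypothetical names, and the real change may track "epoch obtained" differently. It shows why a `failover_auth_epoch` still at its default of 0 makes the plain "equal or higher epoch" comparison fire for any claim.

```python
def reset_unguarded(auth_time: int, auth_epoch: int, claimed_epoch: int) -> bool:
    """Reset rule without the guard: an election is scheduled and the
    claimed epoch is equal to or higher than our failover epoch."""
    return auth_time != 0 and claimed_epoch >= auth_epoch


def reset_guarded(auth_time: int, epoch_obtained: bool,
                  auth_epoch: int, claimed_epoch: int) -> bool:
    """Same rule, but only once this replica has really obtained its
    failover epoch (`epoch_obtained` is a hypothetical flag)."""
    return auth_time != 0 and epoch_obtained and claimed_epoch >= auth_epoch


# Election scheduled at t=1000 ms, but the failover epoch is still the default 0.
# Any primary's claim (here epoch 3) satisfies 3 >= 0 and wrongly resets it.
print(reset_unguarded(auth_time=1000, auth_epoch=0, claimed_epoch=3))
# With the guard, the same claim is ignored until the epoch is obtained.
print(reset_guarded(auth_time=1000, epoch_obtained=False,
                    auth_epoch=0, claimed_epoch=3))
```

Once the epoch has been obtained, both variants behave the same, so the guard only changes behavior in the short window before the vote request goes out.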
We may rely on auth_time to determine whether a failover is in progress, like valkey-io#1009, so it is best to reset it. Signed-off-by: Binbin <[email protected]>