
keep collecting status when apply failed #2991

Open · wants to merge 2 commits into master

Conversation

@Garrybest (Member) commented Dec 26, 2022

Signed-off-by: Garrybest [email protected]

What type of PR is this?
/kind bug

What this PR does / why we need it:
If something is wrong with the applied condition, we should not erase the status in the binding status. The applied information comes from the execution controller, whereas the status comes from the work status controller.

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Keep aggregating status from work to binding even if applying from binding to work fails.

@karmada-bot karmada-bot added the kind/bug Categorizes issue or PR as related to a bug. label Dec 26, 2022
@karmada-bot (Collaborator)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign kevin-wangzefeng after the PR has been reviewed.
You can assign the PR to them by writing /assign @kevin-wangzefeng in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karmada-bot karmada-bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Dec 26, 2022
@codecov-commenter commented Dec 26, 2022

Codecov Report

Merging #2991 (bcae500) into master (c8954de) will increase coverage by 0.44%.
The diff coverage is 72.00%.

@@            Coverage Diff             @@
##           master    #2991      +/-   ##
==========================================
+ Coverage   38.58%   39.02%   +0.44%     
==========================================
  Files         206      206              
  Lines       18798    18807       +9     
==========================================
+ Hits         7253     7340      +87     
+ Misses      11117    11032      -85     
- Partials      428      435       +7     
Flag        Coverage Δ
unittests   39.02% <72.00%> (+0.44%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
pkg/controllers/status/workstatus_controller.go 0.00% <0.00%> (ø)
pkg/util/helper/workstatus.go 41.90% <75.00%> (+31.29%) ⬆️
pkg/search/proxy/store/util.go 92.89% <0.00%> (-1.43%) ⬇️
pkg/util/names/names.go 96.38% <0.00%> (-0.17%) ⬇️
pkg/util/namespace.go 100.00% <0.00%> (ø)
...scheduler/core/spreadconstraint/select_clusters.go 86.04% <0.00%> (+0.33%) ⬆️
pkg/controllers/binding/common.go 26.47% <0.00%> (+0.45%) ⬆️
.../scheduler/core/spreadconstraint/group_clusters.go 95.20% <0.00%> (+2.01%) ⬆️
pkg/util/serviceaccount.go 90.27% <0.00%> (+2.55%) ⬆️
... and 1 more


@jwcesign (Member)

Can we add a unit test for this? Also, we need release notes.

@karmada-bot karmada-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Dec 28, 2022
@jwcesign (Member)

/lgtm
cc @XiShanYongYe-Chang for checking

@XiShanYongYe-Chang (Member)

/assign

@RainbowMango (Member)

If something is wrong with the applied condition, we should not erase the status in the binding status.

Can you explain in more detail? Where do we erase the status, and why shouldn't we do that?

@Garrybest (Member, Author)

Where do we erase the status?

If the apply failed, the aggregated status will not be set even though a status exists in the work status; this function just records the applied condition and continues:

if !applied {
	// This early continue drops any status the work status controller
	// has already collected from the member cluster.
	aggregatedStatus := workv1alpha2.AggregatedStatusItem{
		ClusterName:    clusterName,
		Applied:        applied,
		AppliedMessage: appliedMsg,
		Health:         workv1alpha2.ResourceUnknown,
	}
	statuses = append(statuses, aggregatedStatus)
	continue
}

Why shouldn't we do that?

Actually, apply is a kind of synchronization from Karmada to the member clusters, and it only takes effect on the resource's metadata and spec. Status collection, by contrast, is a synchronization from the member clusters back to Karmada. So it is unreasonable to skip status aggregation when the apply fails, because the two actions are independent of each other.
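
For illustration, a minimal, self-contained sketch of the proposed behavior; the types and function names below are hypothetical stand-ins, not Karmada's actual workv1alpha2 API or the exact patch:

package main

import "fmt"

// AggregatedStatusItem is a trimmed stand-in for workv1alpha2.AggregatedStatusItem
// (illustrative only).
type AggregatedStatusItem struct {
	ClusterName    string
	Applied        bool
	AppliedMessage string
	Health         string
	Status         []byte // raw status reported by the work status controller
}

// aggregate records the applied condition and the collected status
// independently, so a failed apply no longer erases a status that was
// already collected from the member cluster.
func aggregate(clusterName string, applied bool, appliedMsg string, collected []byte) AggregatedStatusItem {
	item := AggregatedStatusItem{
		ClusterName:    clusterName,
		Applied:        applied,
		AppliedMessage: appliedMsg,
		Health:         "Unknown",
	}
	// Key difference from the snippet above: no early continue.
	// The collected status is attached whether or not the apply succeeded.
	if collected != nil {
		item.Status = collected
	}
	return item
}

func main() {
	item := aggregate("member1", false, "denied by validating webhook", []byte(`{"readyReplicas":2}`))
	fmt.Printf("%+v\n", item)
}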

@XiShanYongYe-Chang (Member) left a comment

Hi @Garrybest, I think the modification is reasonable.

Did you encounter this problem: after a resource fails to be applied, the status of the work still exists?

In addition, we have similar logic elsewhere. Please confirm whether to modify it as well:

@XiShanYongYe-Chang (Member)

This one looks more like a cleanup; what do you think, @Garrybest?

@Garrybest (Member, Author)

Did you encounter this problem: after a resource fails to be applied, the status of the work still exists?

Yes, I have a validating webhook in a member cluster that rejects the modification, while it has no effect on status collection.

In addition, we have similar logic elsewhere.

Cool.

This one looks more like a cleanup; what do you think, @Garrybest?

I'm not sure about the category, cc @RainbowMango for help😄.

Verified: 2 commits, created on GitHub.com and signed with GitHub's verified signature. Signed-off-by: Garrybest <[email protected]>
@RainbowMango RainbowMango added this to the v1.5 milestone Dec 30, 2022
@RainbowMango (Member)

So it is unreasonable to skip status aggregation when the apply fails, because the two actions are independent of each other.

The reason we skip status aggregation is that the status is probably ambiguous: it is not the right status for the current resource template.
For instance:

  1. We deploy a Deployment (v1) and get the status back; it works as expected.
  2. Then we update the Deployment to v2. If we can't apply it to the member clusters, we still get the v1 status, which brings ambiguity.
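
To make the ambiguity concrete, here is a toy, self-contained sketch of that timeline (all names are illustrative, not Karmada APIs):

package main

import "fmt"

// binding is a toy model: the spec moves to a new generation while the
// collected status still reflects the old one.
type binding struct {
	specGeneration   int64 // generation of the current resource template
	statusGeneration int64 // generation at which the status was observed
	readyReplicas    int32
}

func main() {
	// Step 1: Deployment v1 applies cleanly; status reflects generation 1.
	b := binding{specGeneration: 1, statusGeneration: 1, readyReplicas: 3}

	// Step 2: Deployment v2 fails to apply (e.g. rejected by a webhook).
	// If aggregation keeps going, the status still reflects generation 1.
	b.specGeneration = 2

	if b.statusGeneration < b.specGeneration {
		fmt.Printf("ambiguous: %d ready replicas were observed at generation %d, but the spec is at generation %d\n",
			b.readyReplicas, b.statusGeneration, b.specGeneration)
	}
}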

@Garrybest (Member, Author)

For instance:

  1. We deploy a Deployment (v1) and get the status back; it works as expected.
  2. Then we update the Deployment to v2. If we can't apply it to the member clusters, we still get the v1 status, which brings ambiguity.

Got it, that makes sense. However, in this case the Deployment's status will be erased here, so users will see a Deployment with 0 ready replicas and 0 available replicas in its status; this situation may alarm users.

These two behaviors reflect different considerations and scenarios; which one is more sensible? Glad to hear more opinions😊.

@RainbowMango (Member)

So users will see a Deployment with 0 ready replicas and 0 available replicas in its status; this situation may alarm users.

Yes, I agree that simply erasing the previous status is not ideal behavior, but I think we have already sent an event to the resource template, haven't we? That should be some relief.

@Garrybest (Member, Author)

Actually, I'm just concerned that users are likely to have their own controllers in the Karmada control plane that depend on the object status to reconcile, and their handling could be disrupted since the status is not collected correctly under this circumstance.
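
As an illustration of that concern, a hypothetical downstream controller (not a Karmada API) that decides based only on the aggregated status; when a failed apply erases the status, readyReplicas reads as zero and the controller reacts to a workload that is actually still healthy:

package main

import "fmt"

// reconcile is a hypothetical user-controller decision based solely on the
// aggregated status of a workload.
func reconcile(readyReplicas, desiredReplicas int32) string {
	if readyReplicas < desiredReplicas {
		return "alert: workload unhealthy, take corrective action"
	}
	return "healthy, nothing to do"
}

func main() {
	// Status preserved: the workload is correctly seen as healthy.
	fmt.Println(reconcile(3, 3))

	// Status erased after a failed apply: the same running workload
	// reads as 0 ready replicas and suddenly looks unhealthy.
	fmt.Println(reconcile(0, 3))
}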

Well, these are just some ideas, sort of a design tradeoff😄. We may need more evidence and scenarios for this patch.

@RainbowMango (Member)

Well, these are just some ideas, sort of a design tradeoff😄. We may need more evidence and scenarios for this patch.

Yes, kind of a tradeoff here.

@RainbowMango RainbowMango modified the milestones: v1.5, v1.6 Mar 7, 2023
@RainbowMango RainbowMango modified the milestones: v1.6, v1.7 May 29, 2023
@RainbowMango (Member)

I'm trying to figure out which PRs/issues should be included in the coming v1.7 release, which is planned for the end of this month.
I guess we don't have enough time for this feature, so I'm moving this to v1.8.

@XiShanYongYe-Chang Could you please confirm whether we are still facing this issue? I feel we solved a similar issue somewhere, but I'm not sure.

@RainbowMango RainbowMango modified the milestones: v1.7, v1.8 Aug 25, 2023
@RainbowMango RainbowMango modified the milestones: v1.8, v1.9 Nov 30, 2023
@RainbowMango RainbowMango removed this from the v1.9 milestone Feb 29, 2024
Labels: kind/bug (Categorizes issue or PR as related to a bug.), size/L (Denotes a PR that changes 100-499 lines, ignoring generated files.)
Projects: None yet
Linked issues: None yet
6 participants