Update pullsecret checks to include auth tokens and hive checks #653

Draft: nephomaniac wants to merge 7 commits into master


nephomaniac commented Jan 23, 2025

This is related to OSD-21669. It updates the existing pull secret validations to include:

  • Backplane config/connection validation (prevents working against the wrong cluster)
  • OCM account email validation for all checked auths
  • OCM auth token validation for all checked auths
  • OCM registry_credentials checks against the cluster's openshift-config/pull-secret and the Hive CD pull-secret
  • OCM access_token checks against the cluster's openshift-config/pull-secret and the Hive CD pull-secret
  • Running access_token checks without impersonation when the tool detects that the cluster is owned by the current OCM account/user
  • Support for clusters in different OCM environments, which is needed for integration and stage cluster testing (Hive always runs in Prod)

```
./osdctl -S cluster validate-pull-secret -h

Attempts to validate if a cluster's pull-secret auth and email values are in sync with account,
registry_credential, and access token data stored in OCM.

Usage:
  osdctl cluster validate-pull-secret [CLUSTER_ID] [flags]

Examples:

# Compare OCM Access-Token, Registry-Credentials, and Account Email against cluster's secret
# Note: OCM permissions may generate prompt to exclude AccessToken checks
osdctl cluster validate-pull-secret ${CLUSTER_ID} --reason "OSD-XYZ"

# Compare both Cluster and Hive secrets to OCM
osdctl cluster validate-pull-secret ${CLUSTER_ID} --reason "OSD-XYZ" --hive

# Run against STAGE or INTEGRATION Cluster + Hive
# Note: Current OCM env vars are assumed to be STAGE/INT,
#        and a production OCM config must be provided in this case
osdctl cluster validate-pull-secret ${CLUSTER_ID} --reason "OSD-XYZ" --hive-ocm ~/.config/ocm/ocm.prod.json --hive

# Exclude Access-Token, and Registry-Credential checks...
osdctl cluster validate-pull-secret ${CLUSTER_ID} --reason "OSD-XYZ" --no-token --no-regcreds

# Check Hive only...
osdctl cluster validate-pull-secret ${CLUSTER_ID} --reason "OSD-XYZ" --hive-only

Flags:
  -h, --help              help for validate-pull-secret
      --hive              Check secret values on Hive against OCM for this target cluster
      --hive-ocm string   Path to OCM 'prod' config used to connect to hive when target cluster is using 'stage' or 'integration' envs
      --hive-only         Exclude checks against target cluster. Check Hive only.
      --no-regcreds       Exclude OCM Registry Credentials checks against cluster secret
      --no-token          Exclude OCM Access Token checks against cluster secret
      --reason string     The reason for this command to be run (usually an OHSS or PD ticket).
  -v, --verbose int       debug=4, (default)info=3, warn=2, error=1 (default 3)
```

Example run:

```
./osdctl -S cluster validate-pull-secret 99b7ceb2-89a9-4d44-ba82-322f4c86386c --reason "testing ps validations OSD-21669" --hive-ocm ~/.config/ocm/ocm.prod.json --hive

TARGET CLUSTER:
----------          ----                               ---------                                    ------      ----  ------
OCM_SOURCE          AUTH                               NAMESPACE                                    SECRET      ATTR  RESULT
----------          ----                               ---------                                    ------      ----  ------
Account             cloud.openshift.com                openshift-config                             pull-secret email PASS
registry_credential Redhat_registry.connect.redhat.com openshift-config                             pull-secret email PASS
registry_credential Redhat_registry.connect.redhat.com openshift-config                             pull-secret token PASS
registry_credential Redhat_registry.redhat.io          openshift-config                             pull-secret email PASS
registry_credential Redhat_registry.redhat.io          openshift-config                             pull-secret token PASS
registry_credential Quay_quay.io                       openshift-config                             pull-secret email PASS
registry_credential Quay_quay.io                       openshift-config                             pull-secret token PASS
access_token        cloud.openshift.com                openshift-config                             pull-secret token PASS
access_token        cloud.openshift.com                openshift-config                             pull-secret email PASS
access_token        quay.io                            openshift-config                             pull-secret token PASS
access_token        quay.io                            openshift-config                             pull-secret email PASS
access_token        registry.connect.redhat.com        openshift-config                             pull-secret token PASS
access_token        registry.connect.redhat.com        openshift-config                             pull-secret email PASS
access_token        registry.redhat.io                 openshift-config                             pull-secret token PASS
access_token        registry.redhat.io                 openshift-config                             pull-secret email PASS

HIVE:
----------          ----                               ---------                                    ------      ----  ------
OCM_SOURCE          AUTH                               NAMESPACE                                    SECRET      ATTR  RESULT
----------          ----                               ---------                                    ------      ----  ------
Account             cloud.openshift.com                uhc-staging-2gpb72u2f1l001618e7utfvuhgv97mef pull        email PASS
registry_credential Redhat_registry.connect.redhat.com uhc-staging-2gpb72u2f1l001618e7utfvuhgv97mef pull        email PASS
registry_credential Redhat_registry.connect.redhat.com uhc-staging-2gpb72u2f1l001618e7utfvuhgv97mef pull        token PASS
registry_credential Redhat_registry.redhat.io          uhc-staging-2gpb72u2f1l001618e7utfvuhgv97mef pull        email PASS
registry_credential Redhat_registry.redhat.io          uhc-staging-2gpb72u2f1l001618e7utfvuhgv97mef pull        token PASS
registry_credential Quay_quay.io                       uhc-staging-2gpb72u2f1l001618e7utfvuhgv97mef pull        email PASS
registry_credential Quay_quay.io                       uhc-staging-2gpb72u2f1l001618e7utfvuhgv97mef pull        token PASS
access_token        cloud.openshift.com                uhc-staging-2gpb72u2f1l001618e7utfvuhgv97mef pull        token PASS
access_token        cloud.openshift.com                uhc-staging-2gpb72u2f1l001618e7utfvuhgv97mef pull        email PASS
access_token        quay.io                            uhc-staging-2gpb72u2f1l001618e7utfvuhgv97mef pull        token PASS
access_token        quay.io                            uhc-staging-2gpb72u2f1l001618e7utfvuhgv97mef pull        email PASS
access_token        registry.connect.redhat.com        uhc-staging-2gpb72u2f1l001618e7utfvuhgv97mef pull        token PASS
access_token        registry.connect.redhat.com        uhc-staging-2gpb72u2f1l001618e7utfvuhgv97mef pull        email PASS
access_token        registry.redhat.io                 uhc-staging-2gpb72u2f1l001618e7utfvuhgv97mef pull        token PASS
```

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 23, 2025
Contributor

openshift-ci bot commented Jan 23, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Contributor

openshift-ci bot commented Jan 23, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: nephomaniac
Once this PR has been reviewed and has the lgtm label, please assign devppratik for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 31, 2025
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 7, 2025
@nephomaniac nephomaniac marked this pull request as ready for review February 7, 2025 20:32
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 7, 2025
Contributor

openshift-ci bot commented Feb 8, 2025

@nephomaniac: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

```go
emailOCM := o.account.Email()
o.log.Info(o.ctx, "Found email for cluster's OCM account: %s\n", emailOCM)

if !o.hiveOnly {
```
Contributor

The flag logic is pretty unintuitive. Is there a way to structure the code so that it's clearer which operations are performed for each flag combination?
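One possible shape for that, as a hypothetical sketch: resolve every flag interaction once into an explicit plan, then branch on the plan instead of on raw flags. The flag semantics come from the help output (`--hive-only` skips the target cluster and implies Hive checks); the `cliFlags` and `checkPlan` types are illustrative names, and always running the account email check is an assumption.

```go
package main

import "fmt"

// cliFlags mirrors the boolean flags in the help output (--hive, --hive-only,
// --no-token, --no-regcreds); the struct itself is hypothetical.
type cliFlags struct {
	Hive, HiveOnly, NoToken, NoRegCreds bool
}

// checkPlan spells out which targets and OCM sources a run will cover.
type checkPlan struct {
	Cluster, Hive                       bool // where secrets are read from
	AccountEmail, AccessToken, RegCreds bool // which OCM sources are compared
}

// planFromFlags resolves all flag interactions in one place.
func planFromFlags(f cliFlags) checkPlan {
	return checkPlan{
		Cluster:      !f.HiveOnly,          // --hive-only skips the target cluster
		Hive:         f.Hive || f.HiveOnly, // --hive-only implies Hive checks
		AccountEmail: true,                 // assumed to always run
		AccessToken:  !f.NoToken,
		RegCreds:     !f.NoRegCreds,
	}
}

func main() {
	// Print the plan for a sample flag combination.
	fmt.Printf("%+v\n", planFromFlags(cliFlags{HiveOnly: true, NoToken: true}))
}
```

The rest of the command would then read only `checkPlan` fields, so each flag combination's behavior is visible in one function.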


```go
if !o.hiveOnly {
	// get the pull secret in cluster
	err = getPullSecretElevated(o.clusterID, o.reason, pullSecret)
```
Contributor

The current validate-pull-secret-email command has an integration with managed-scripts to retrieve the email without admin elevation. Is there a reason to discard that and go back to generating Compliance Alerts?

Author

This appears to be the same behavior as the existing osdctl command. I believe it always runs as Backplane Cluster Admin when fetching the pull secret from the target cluster?

Contributor

We did have an integration with managed-scripts in the past (#488), but it was removed due to a bug in the implementation; I was wrongly under the impression that it was still there. I think we can revisit this idea with a new implementation. I opened a ticket for it: https://issues.redhat.com/browse/OSD-28292

```go
if len(userName) <= 0 {
	err = fmt.Errorf("found empty 'username' for account:'%s', needed for accessToken", o.account.HREF())
} else {
	accessToken, err = o.getAccessTokenFromOCM(userName)
```
Contributor

This check requires Region Lead permissions to pull the secret from OCM, but I don't see this stated anywhere, either in a message to the CLI user or in the help output. Ideally it should require user confirmation when the flag is active, e.g.:
"Retrieving credentials from OCM requires Region Lead permissions. Continue? [y/N]"

Author

For testing purposes, the command 'may' not always require RL permissions: if the cluster is owned by the current OCM user executing the request, the command does not 'impersonate'.
If the command fails due to permissions, this error is returned:

```go
"AccessToken ops may require 'region lead' permissions to execute.\n"+
```

and the user is then asked whether to continue:

```go
fmt.Printf("Error fetching OCM AccessToken: '%s'.\nWould you like to continue with validations? ", err)
```

We could also add a prompt during prerun() when the CLI args indicate the user will want to fetch the AccessToken?
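A minimal sketch of making that decision up front, e.g. in prerun(), assuming the cluster owner's and the current user's OCM IDs are already known at that point. `chooseAccessTokenMode` and its modes are hypothetical names, not existing osdctl code; the idea is to warn before any OCM call rather than only after a permission failure.

```go
package main

import "fmt"

// accessTokenMode is a hypothetical enum describing how the AccessToken
// would be fetched for a run.
type accessTokenMode int

const (
	skipToken         accessTokenMode = iota // --no-token was given
	fetchAsSelf                              // current OCM user owns the cluster
	fetchImpersonated                        // impersonate the owner (Region Lead)
)

// chooseAccessTokenMode picks a mode from the --no-token flag and ownership.
// An unknown (empty) owner ID is treated as "not the owner".
func chooseAccessTokenMode(noToken bool, clusterOwnerID, currentUserID string) accessTokenMode {
	switch {
	case noToken:
		return skipToken
	case clusterOwnerID != "" && clusterOwnerID == currentUserID:
		return fetchAsSelf
	default:
		return fetchImpersonated
	}
}

func main() {
	// Sample IDs; they are made up.
	if chooseAccessTokenMode(false, "owner-123", "user-456") == fetchImpersonated {
		// Warn before calling OCM instead of only after a permissions error.
		fmt.Println("Fetching the AccessToken from OCM requires Region Lead permissions unless you're the cluster owner.")
	}
}
```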

Contributor

We could be more upfront by stating this in the command description and at runtime. The message could also be less ambiguous:

AccessToken ops may require 'region lead' permissions to execute.

This leaves doubt about when the permission is required; the user has to find that information elsewhere.

I suggest something like:

"Fetching the AccessToken from OCM requires Region Lead permissions unless you're the cluster owner. Would you like to continue?"

```go
}

/* Checks against hive to confirm hive is sync'd with OCM */
if o.checkHive {
```
Contributor

@zmird-r zmird-r Feb 10, 2025

I don't see why we need to check the credentials stored in Hive.
We don't have a formal process to rotate the entire pull secret for managed OpenShift. Once the credentials are stored in Hive after cluster creation, AFAIK no other process changes them or would cause them to get out of sync (except owner transfer, but those are manual processes with their own tooling).

Updates to the pull secret are not supported in managed OpenShift, so I don't see why we should complicate the code with this check for a situation we can already detect from the pull secret on the cluster. It seems redundant.

Contributor

@zmird-r zmird-r Feb 10, 2025

This also generates an additional Compliance Alert ticket.

Author

@nephomaniac nephomaniac Feb 14, 2025

This part was added to help a user validate the Hive data and SyncSets. The Hive checks do not run by default and can be run independently of the target cluster checks. They would only need to be run if the checks on the target cluster fail, in which case the compliance ticket is likely justified?
This is currently waiting for feedback on the actual secret update process and tooling. If Hive is not a touch point there, then validating it would be misleading.

Contributor

@zmird-r zmird-r Feb 17, 2025

The Hive team confirmed here https://redhat-internal.slack.com/archives/CE3ETN3J8/p1739453387930049 that the secret is not used in any capacity after provisioning.

Having that validation is indeed misleading; it raises doubts about:

  • The role of that secret (is it actually important?)
  • Follow-up action if it is found not up to date: should it be updated and how?
  • When this validation should be run and as part of which investigation

I think we should remove this check, but if you still have doubts, could you please open a thread in the SRE channel so we can gather more feedback from the team?

@zmird-r
Contributor

zmird-r commented Feb 10, 2025

/label tide/merge-method-squash

@openshift-ci openshift-ci bot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Feb 10, 2025
@zmird-r
Contributor

zmird-r commented Feb 10, 2025

Wouldn't it be easier to rename the command we currently have to validate-pull-secret-email and create a new command named validate-pull-secret-credentials? Even though the two checks are related, they involve different steps, so splitting them would simplify the code.

@joshbranham
Contributor

Filed https://issues.redhat.com/browse/OSD-28215 for better tracking of the extended validation

@nephomaniac nephomaniac marked this pull request as draft February 14, 2025 17:30
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 14, 2025
Labels
do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
4 participants