Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs(MADR): resource identifier format #12756

Open
wants to merge 23 commits into
base: master
Choose a base branch
from

Conversation

lobkovilya
Copy link
Contributor

@lobkovilya lobkovilya commented Feb 5, 2025

Motivation

The goal is to improve Inspect API and introduce an identifier as part of the URL path, i.e :5681/_rules/<identifier>. See the discussion

Better to review the rendered version as it contains tables.

Fix #12044

@lobkovilya lobkovilya requested a review from a team as a code owner February 5, 2025 10:16
@lobkovilya lobkovilya added the ci/skip-test PR: Don't run unit and e2e tests (maybe this is just a doc change) label Feb 5, 2025
Copy link
Contributor

github-actions bot commented Feb 5, 2025

Reviewer Checklist

🔍 Each of these sections need to be checked by the reviewer of the PR 🔍:
If something doesn't apply please check the box and add a justification if the reason is non obvious.

  • Is the PR title satisfactory? Is this part of a larger feature and should be grouped using > Changelog?
  • PR description is clear and complete. It Links to relevant issue as well as docs and UI issues
  • This will not break child repos: it doesn't hardcode values (.e.g "kumahq" as an image registry)
  • IPv6 is taken into account (.e.g: no string concatenation of host port)
  • Tests (Unit test, E2E tests, manual test on universal and k8s)
    • Don't forget ci/ labels to run additional/fewer tests
  • Does this contain a change that needs to be notified to users? In this case, UPGRADE.md should be updated.
  • Does it need to be backported according to the backporting policy? (this GH action will add "backport" label based on these file globs, if you want to prevent it from adding the "backport" label use no-backport-autolabel label)


#### [Issue #12093](https://github.com/kumahq/kuma/issues/12093): xds configs, outbound listeners should use the clustername instead of an IP/port combo

We name outbounds like `outbound:10.43.205.116:6379` where IP address doesn't give any useful information.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even more when there are multiple IPs right?

Co-authored-by: Charly Molter <[email protected]>
Signed-off-by: Ilya Lobkov <[email protected]>
Copy link
Contributor

@lahabana lahabana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great stuff

| Inbound Cluster | `localhost:<port>` | Dataplane (with sectionName to select port) |
| Outbound Cluster | legacy clusters - `<kuma.io/service>-hash(dst.tags)`<br>legacy clusters cross-mesh - `<kuma.io/service>-hash(dst.tags)_<mesh>`<br>new clusters - `<mesh>_<name>_<namespace>_<zone>_<short-name>_<port>` | Mesh*Service (with sectionName to select port) |
| Route | Routes are set on Listener on VirtualHost.<br>On inbound - `inbound:<kuma.io/service>`<br>On outbound - `<hash_sha256([]Match{...})>` | Correlates with a set of MeshHTTPRoutes |
| Secret | `name = <category>:<scope>:<identifier>`<br>`category = "mesh_ca" \| "identity_cert"`<br>`scope = "secret"`<br>`identifier = "all" \| <mesh_name>` | TODO |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That'd be the kri no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turned out to be less straightforward with Secrets. Only mesh_ca in builtin mode has a real secret resource in the store we could use for kri. In all other cases, secrets are either generated in memory (i.e. identity_cert), or obtained using DataSource (i.e. CA provided), or fetched from external places (i.e. Vault). As we don't have problems with secret naming at that moment and also naming requirements are not yet clear for secrets, I'd move them out of the scope of this MADR. I've added a section:

Secrets naming in Envoy

Renaming secrets in Envoy is out of the scope of this MADR. Only mesh_ca secret in builtin mode has a corresponding Secret resource in the store. In all other cases, secrets are either stored in memory, or obtained using DataSource or fetched from external systems, so it's not straightforward how to apply kri concept. Also, it's not clear what problems and limitations do we have when it comes to naming secrets in Envoy.

|-------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------|
| Listener | `<gateway-name>:<protocol>:<port>` where `gateway-name` is `MeshGatewayResource.Meta.Name` | MeshGateway (with sectionName to select the listener) |
| VirtualHost | `<hostname>` | |
| Cluster | `<kuma.io/service>-hash(merge(dpp.tags, gateway.tags, listener.tags))` | Pair of MeshGateway and Mesh*Service |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what does this actually look like?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not much we can do right now as we generate a unique cluster per MeshGateway listener. I looked a bit into this and I think we should be able to stop doing that. I've created an issue #13129 and referenced it in the MADR. Once we get rid of hash-suffixes in MeshGateway cluster names we can go with kri from Mesh*Service.

Copy link
Contributor

@lahabana lahabana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good. I think the only gap for getting an approval are:

  1. showing actual new names in your table once the whole migration is done
  2. telling the future of stat_prefix/alt_stat_name

The rest are mostly small questions/nits

```
envoy_cluster_upstream_cx_total{envoy_cluster_name="localhost_5000"} 0
```
so there is no reason to replace `:` with `_`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the past there were stuff in the metric name, I remember we even had some code to change things around. My is that this is the main reason no?

Also when not using prom format doesn't it actually use : as a separator see:

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the past there were stuff in the metric name, I remember we even had some code to change things around. My is that this is the main reason no?

Yeah maybe, but then it's still unclear, why don't they replace : in stat_prefix? IMO it's just odd that stat_prefix and alt_stat_name don't have consistent behavior.

Also when not using prom format doesn't it actually use : as a separator see:

Even though envoy uses : as a separator in /clusters endpoint, it still takes cluster.name, not cluster.alt_stat_name:

image

The cluster in the example named kuma:readiness

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it's more reason for not using stat_prefix and all_stat_name anymore no?
^ saw your note later

Should you ask Envoy slack? Maybe you can get an answer...

}
```

I can't find explanation why `alt_stat_name` and `stat_prefix` are treated differently by Envoy.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A use case could be that stat_prefix and alt_stat_name parameters can be used to consolidate statistics from multiple similar entities under a single name.

For example, if you have separate IPv4 and IPv6 clusters named my-cluster:v4 and my-cluster:v6, you can use alt_stat_name: my-cluster for both clusters to aggregate their statistics under the common name my-cluster when exporting metrics. This allows you to track related entities as a unified group while maintaining distinct names for configuration purposes.

WDYT?

Copy link
Contributor Author

@lobkovilya lobkovilya Feb 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah no, I don't question why there are an actual name and stats name, I just don't understand why stat_prefix doesn't behave the same way as alt_stat_name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw is there a specific reason for us to use state_prefix and alt_stat_name or will these become unecessary now that we have kri?

Seems unecessary from your note later

@lobkovilya lobkovilya added this to the 2.10.x milestone Feb 17, 2025
@johncowen
Copy link
Contributor

I just had a thought/question which is maybe just for my understanding, but how come something like SPIFFE IDs aren't an option here? I think that even from my level of understanding, I guess the /'s etc in SPIFFE URIs are problematic for all the reasons why we've landed on a _? Are there more reasons?

@lahabana
Copy link
Contributor

I just had a thought/question which is maybe just for my understanding, but how come something like SPIFFE IDs aren't an option here? I think that even from my level of understanding, I guess the /'s etc in SPIFFE URIs are problematic for all the reasons why we've landed on a _? Are there more reasons?

Spiffe Id only exist for Dataplane identity.

lobkovilya and others added 3 commits February 20, 2025 17:23
Co-authored-by: Charly Molter <[email protected]>
Signed-off-by: Ilya Lobkov <[email protected]>
Co-authored-by: Charly Molter <[email protected]>
Signed-off-by: Ilya Lobkov <[email protected]>
Co-authored-by: Charly Molter <[email protected]>
Signed-off-by: Ilya Lobkov <[email protected]>
```

**Pros:**
- Better handling of gaps; no need for `;;` when a value is not defined
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This format is definitely more readable by human.

@jijiechen
Copy link
Member

This MADR is so well authored.
I approve it with the preference of option 1 as it's more short and most usage scenarios are in intermediate software layers instead of by human.

jijiechen
jijiechen previously approved these changes Feb 26, 2025
@lahabana
Copy link
Contributor

Should we cover this #12147 in the MADR?

Signed-off-by: Ilya Lobkov <[email protected]>
…o docs/madr-70

Signed-off-by: Ilya Lobkov <[email protected]>

# Conflicts:
#	docs/madr/decisions/070-resource-identifier.md
Comment on lines 182 to 183
| FilterChain | for external services - `<kuma.io/service>_<mesh>` | MeshExternalService | kri_meshexternalservice_mesh-1___es1 |
| Route | for external services - `outbound:<kuma.io/service>` | MeshExternalService | kri_meshexternalservice_mesh-1___es1 |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MeshExternalService resource has a namespace right? So kri_meshexternalservice_mesh-1___es1 is a typo ?

Comment on lines 334 to 335
`resource-type` is a lowercased singular resource type, i.e. `meshservice`, `meshtimeout`, etc.
We use this name in [kumactl](https://github.com/kumahq/kuma/blob/d7ec0a2b1ac19208fb7dd9726309e3cf8cdc5848/pkg/core/resources/apis/meshservice/api/v1alpha1/zz_generated.resource.go#L153).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just one open thought here, should we consider using the resource type short name? Like mtp for meshtrafficpermission

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for the shortName I think there's already code to make sure all short names are unique. I think the brievity will be welcomed

Copy link
Contributor

@lahabana lahabana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add at the beginning this what's out of scope.

Because here you talk about a lot of things because we plan to use this in a lot of places but we don't clarify that the only thing we do here is to define the format.
We don't define how it's going to be rolled out to the Envoy config or what the http api looks like


#### URL path

For example `:5681/path-prefix/<identifier>/path-suffix`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

path-prefix is not compliant with our specs. So I'd clearly state we're defining the http API for retrieving by resource identifier is out of scope here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'll add that it's out of scope, here it's just an example for clarity. Why isn't path-prefix compliant with our specs?

| | Name | Correlated Resources |
|-------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------|
| Cluster | `meshpassthrough_<protocol>_<match-value>_<port>`<br>when `<port> == 0` Kuma sets port equal to `*`<br>`match-value = CIDR \| IP \| Domain`<br>`CIDR = i.e. "192.0.2.0/24" or "2001:db8::/32"`<br>`IP = i.e. "192.0.2.1", or 2001:db8::68", or ::ffff:192.0.2.1"`<br>`Domain = <dns-name> \| *.<dns-name>`<br>`dns-name = ^([a-zA-Z0-9_]{1}[a-zA-Z0-9_-]{0,62}){1}(\.[a-zA-Z0-9_]{1}[a-zA-Z0-9_-]{0,62})*[\._]?$`<br> | – |
| FilterChain | when `<match-value> == <protocol>` -> `meshpassthrough_<protocol>_<port>`<br>when `<match-value> != <protocol>` -> `meshpassthrough_<protocol>_<match-value>_<port>`<br>when port is 0 we put `*`, i.e. `meshpassthrough_http_*`<br>`<match-value>` is defined above | – |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So was are the names in envoy here?

|------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------|
| Listener | `kuma:dns`<br>`kuma:envoy:admin`<br>`kuma:metrics:prometheus`<br>`_kuma:metrics:opentelemetry:<backendName>` | – |
| VirtualHost | listener's name | – |
| Internal Cluster | `kuma:readiness`<br>`kuma:envoy:admin`<br>`kuma:metrics:hijacker`<br>`_kuma:metrics:opentelemetry:<backendName>`<br>`tracing:`<br>`access_log_sink`<br>`ads_cluster` | – |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we list the ones that will stay in the future and the ones we can get rid of once we deprecated old policies?
If there are we keep around, could we migrate them to new names?

}
```

I can't find explanation why `alt_stat_name` and `stat_prefix` are treated differently by Envoy.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw is there a specific reason for us to use state_prefix and alt_stat_name or will these become unecessary now that we have kri?

Seems unecessary from your note later

```
envoy_cluster_upstream_cx_total{envoy_cluster_name="localhost_5000"} 0
```
so there is no reason to replace `:` with `_`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it's more reason for not using stat_prefix and all_stat_name anymore no?
^ saw your note later

Should you ask Envoy slack? Maybe you can get an answer...

resource-identifier = *(ALPHA / DIGIT / "-" / "." / "_" / "~" / "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "=" / "@")
```

_Note: if we'd like to use resource identifier in URL query the charset is significantly smaller:_
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should require this.

Comment on lines 334 to 335
`resource-type` is a lowercased singular resource type, i.e. `meshservice`, `meshtimeout`, etc.
We use this name in [kumactl](https://github.com/kumahq/kuma/blob/d7ec0a2b1ac19208fb7dd9726309e3cf8cdc5848/pkg/core/resources/apis/meshservice/api/v1alpha1/zz_generated.resource.go#L153).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for the shortName I think there's already code to make sure all short names are unique. I think the brievity will be welcomed

There is an identifier format from Amazon called [ARN](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference-arns.html). We can adopt a similar approach, but using `_`:

```
kri_<resource-type>_<mesh>_<zone>_<namespace>_<resource-name>_<section-name>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we have an optional suffix ? I already envision use cases where actually we can't sumup with a single kri even with a sectionName and we still need to append some hash at the end?

I'd argue strongly against the hashes if it came up but I'm sure it's gonna come up.

Another probably better option is to explicitly state that if needs for a suffix was required we will need to create a MADR. This enable us to leave this door open but clearly say it should never be required. From looking at the discussion in: #13129 clearly it seems that it'd be necessary

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reminded me we should probably treat sectionName as an optional suffix, because when it's missing, we don't want to have a trailing _, like

kri_meshservice_mesh-1_us-east-1_kuma-demo_mt1_

So maybe we can put sectionName after some delimiter, because strictly speaking sectionName is not a resource identifier, it's a reference inside the specific resource:

kri_<resource-type>_<mesh>_<zone>_<namespace>_<resource-name>&sectionName=httpport
OR
kri_<resource-type>_<mesh>_<zone>_<namespace>_<resource-name>&sn=httpport

and then like you said if we need another suffix, we will create MADR to add another KV pair behind &, i.e. hash

kri_<resource-type>_<mesh>_<zone>_<namespace>_<resource-name>&sn=httpport,hash=e3a0069bda479bd1

WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather have section be the default with a trailing _ the rest can wait for a future MADR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci/skip-test PR: Don't run unit and e2e tests (maybe this is just a doc change)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add endpoint to retrieve resources by resourceIdentifier
7 participants