Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A proposal of cached health checker extension for Envoy. #24757

Closed
hudayou opened this issue Jan 4, 2023 · 5 comments
Closed

A proposal of cached health checker extension for Envoy. #24757

hudayou opened this issue Jan 4, 2023 · 5 comments
Labels
area/health_checking stale stalebot believes this issue/PR has not been touched recently

Comments

@hudayou
Copy link
Contributor

hudayou commented Jan 4, 2023

Background

Envoy is an excellent Sidecar in the field of Service Mesh. The use of modern C++ guarantees its high performance, but also increases the difficulty for developers. Although Envoy also supports redis and thrift customized health checkers. It would be nice to have another custom health checker to enhance its flexibility. We introduce a proposal that allows Envoy to support a cached health checker extension to avoid aggressive and unnecessary health checks.

Design

By adding the cached health checker extension for Envoy. Users can avoid aggressive and unnecessary health checks.

The specific architecture is as follows:

       +---------------------+
       | Envoy               |
       +---------------------+
       |    Singleton        |             psubscribe------------>+-----+
       | CachedHealthChecker +<--------------------------tls----->+REDIS+
       |    HealthStatusMap  |<------------pmessage               +-+---+
       +--------+------------+             get--------------------->+   ^
     OnInterval |            |                                          |set
       +--------+------------+                                +---------+---+
       | ActiveHealthCheck   |                                |HealthChecker|
       +---------------------+                                +-------------+

Looking forward to your ideas/comments on this proposal! @mattklein123 @htuch @alyssawilk @lizan thanks.

@mattklein123
Copy link
Member

Thanks for opening the tracking issue. There isn't enough information in here to comment much one way or the other as to the technical design and whether there will be support for adding this. A few high level questions:

  1. Are you targeting this for contrib or core extension? I assume you have no sponsor for core so you are going for contrib? Can you clarify?
  2. I skimmed the PR and a few high level comments: a) there is going to be little interest to bring in a new third party dep for redis. Envoy already supports Redis as a protocol so if we move forward with this we will want to use the built-in redis support for talking to redis.
  3. Some type of caching system generally makes sense to me, but I would like to see a design that has pluggable caching implementations. Redis might be one, but perhaps someone could also built a gRPC API for example.
  4. More fundamentally, is this even needed as opposed to just using EDS to push health state changes directly to each envoy node? It seems like your control plane could actually do all of this redis intregration and push the updates.
  5. Please flesh out the overall design much more. It's not really clear to me how the system is primed, what the failure modes are, etc. I would recommend a more complete gdoc with your proposal.

Thanks

@mattklein123 mattklein123 added area/health_checking and removed triage Issue requires triage labels Jan 4, 2023
@hudayou
Copy link
Contributor Author

hudayou commented Jan 4, 2023

  1. We are targeting this for core extension like redis and thrift custom health checker. Can you sponse us for this? Otherwise we can do self-sponsor extensions.
  2. The builtin redis is specially designed for load balancing upstream redis servers, it is lacking support of pubsub and select command which is required by us.
    Redis PubSub primitives support #4424
  3. Yep, we can definitely allow more caching solutions to be plugged into it down the road.
  4. EDS is good, but directly using a cache will require less indirection in between. We are not replacing EDS, it's just another extension like redis and thrift.
  5. I'll do a gdoc as requested.

@hudayou
Copy link
Contributor Author

hudayou commented Jan 5, 2023

@mattklein123 Here is a brief google doc for the basic idea:

https://docs.google.com/document/d/1YH68XF12FkbxroRkWsVA2JaZ5Rd--k4p/edit#

Thanks for the suggestions

@github-actions
Copy link

github-actions bot commented Feb 5, 2023

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale stalebot believes this issue/PR has not been touched recently label Feb 5, 2023
@github-actions
Copy link

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Feb 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/health_checking stale stalebot believes this issue/PR has not been touched recently
Projects
None yet
Development

No branches or pull requests

2 participants