[GSoC] Develop a caching library for etcd #19371

Open
serathius opened this issue Feb 10, 2025 · 27 comments

Comments

@serathius
Member

serathius commented Feb 10, 2025

Submitted as a Google Summer of Code project, with @MadhavJivrajani as the second mentor.

While etcd is a powerful distributed key-value store, building scalable infrastructure management systems directly on top of it can be challenging. Kubernetes has demonstrated the effectiveness of the reconciliation pattern for managing complex deployments, and its watch cache plays a crucial role in achieving scalability. However, this caching mechanism is tightly coupled with Kubernetes and not readily available for general etcd usage. Projects like Cilium and Calico Typha, while successfully using etcd for control planes, have had to implement custom solutions to address this gap.

This project addresses the need for a standardized, performant caching solution for etcd, enabling easier adoption of the reconciliation pattern and simplifying the development of scalable etcd-based systems. By providing a generic watch cache implementation, we aim to lower the barrier to entry for building robust and efficient infrastructure management tools on etcd.

Goals:

  • Develop a generic proxy that provides feature parity with the K8s watch cache.
  • Make it possible to integrate the cache into K8s and Cilium.

Milestones:

  • Cache for watch requests that stores a history of watch events and demultiplexes requests.
  • Cache for non-consistent list requests that stores the latest state of etcd in a B-tree cache. The cache is populated from a Range response and later kept up to date by subscribing to updates from the watch cache.
  • Handling requests during cache initialization and re-initialization.
  • Testing, including e2e and robustness tests.
  • Metrics for cache size, cache latency, etc.
  • Benchmarks for watch and read throughput.
  • Support for custom encoder/decoder
  • Support for custom indexing
  • Support for consistent reads
  • Support for exact stale reads, by storing snapshots of the B-tree.
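The list-cache milestone (state seeded by a Range response, then kept current from the watch stream) can be sketched roughly as follows. This is a hedged illustration, not the planned design: `KeyValue`, `Event`, and `StateCache` are hypothetical simplified types rather than etcd's `mvccpb` or `clientv3` types, and a sorted slice stands in for the B-tree so the example needs only the standard library (a real implementation would more likely use a B-tree package such as `github.com/google/btree`).

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// KeyValue is a hypothetical, simplified stand-in for an etcd key-value pair.
type KeyValue struct {
	Key         string
	Value       string
	ModRevision int64
}

// Event is a hypothetical, simplified watch event: a put or a delete.
type Event struct {
	Delete bool
	KV     KeyValue
}

// StateCache holds the latest state of a key space. The sorted key slice is a
// stand-in for a B-tree and supports ordered prefix scans.
type StateCache struct {
	keys     []string            // sorted keys
	data     map[string]KeyValue // latest value per key
	revision int64               // revision the cached state reflects
}

// NewStateCache seeds the cache from the result of a Range taken at rangeRev.
func NewStateCache(kvs []KeyValue, rangeRev int64) *StateCache {
	c := &StateCache{data: make(map[string]KeyValue), revision: rangeRev}
	for _, kv := range kvs {
		c.put(kv)
	}
	return c
}

func (c *StateCache) put(kv KeyValue) {
	if _, ok := c.data[kv.Key]; !ok {
		// Insert the new key at its sorted position.
		i := sort.SearchStrings(c.keys, kv.Key)
		c.keys = append(c.keys, "")
		copy(c.keys[i+1:], c.keys[i:])
		c.keys[i] = kv.Key
	}
	c.data[kv.Key] = kv
}

func (c *StateCache) del(key string) {
	if _, ok := c.data[key]; !ok {
		return
	}
	i := sort.SearchStrings(c.keys, key)
	c.keys = append(c.keys[:i], c.keys[i+1:]...)
	delete(c.data, key)
}

// Apply updates the cache from a watch event. Events older than the cached
// revision are skipped. Events from one transaction share a revision, so an
// event at the current revision is still applied; re-applying one is harmless
// because puts and deletes are idempotent.
func (c *StateCache) Apply(ev Event) {
	if ev.KV.ModRevision < c.revision {
		return
	}
	if ev.Delete {
		c.del(ev.KV.Key)
	} else {
		c.put(ev.KV)
	}
	c.revision = ev.KV.ModRevision
}

// List serves a non-consistent list: every cached pair under prefix, in key
// order, plus the (possibly stale) revision the answer reflects.
func (c *StateCache) List(prefix string) ([]KeyValue, int64) {
	var out []KeyValue
	for i := sort.SearchStrings(c.keys, prefix); i < len(c.keys) && strings.HasPrefix(c.keys[i], prefix); i++ {
		out = append(out, c.data[c.keys[i]])
	}
	return out, c.revision
}

func main() {
	c := NewStateCache([]KeyValue{
		{Key: "/pods/a", Value: "v1", ModRevision: 3},
		{Key: "/nodes/x", Value: "n1", ModRevision: 4},
	}, 5)
	c.Apply(Event{KV: KeyValue{Key: "/pods/b", Value: "v1", ModRevision: 6}})
	c.Apply(Event{Delete: true, KV: KeyValue{Key: "/pods/a", ModRevision: 7}})
	pods, rev := c.List("/pods/")
	fmt.Println(len(pods), pods[0].Key, rev) // 1 /pods/b 7
}
```

Because `List` returns the revision the cache reflects, a caller can tell how stale a non-consistent read is; exact stale reads at an arbitrary past revision would additionally require keeping snapshots of earlier states, as the last milestone notes.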

I'm proposing to locate the project within the etcd mono repo, but as a separate package, that will not be released/tagged until it's ready. Proposed package name: go.etcd.io/cache. Client library would be developed under go.etcd.io/cache/client.

/cc @fuweid @MadhavJivrajani @ahrtr @henrybear327

@ahrtr
Member

ahrtr commented Feb 10, 2025

High level I agree with the improvement & direction, as performance should be one of the key areas that we should spend more effort on. It will definitely ensure the long-term success of etcd.

@fuweid
Member

fuweid commented Feb 10, 2025

We need to develop a generic cache for etcd that allows users to easily adopt a multi-layered caching architecture similar to K8s. Having an official library would allow us to properly test it, ensuring its correctness and performance.

Sounds great. It could be more efficient to make and evaluate changes as an official library. +1, happy to help if needed.

serathius changed the title from "Develop a caching library for etcd" to "[GSoC] Develop a caching library for etcd" on Feb 10, 2025
@serathius
Member Author

cc @ahrtr @ivanvc, any preference on where development should happen? My proposal:

I'm proposing to locate the project within the etcd mono repo, but as a separate package, that will not be released/tagged until it's ready. Proposed package name: go.etcd.io/cache. Client library would be developed under go.etcd.io/cache/client.

@ahrtr
Member

ahrtr commented Feb 10, 2025

I'm proposing to locate the project within the etcd mono repo, but as a separate package

It should be OK.

go.etcd.io/cache/client

I think all packages in the etcd mono repo should have the same prefix go.etcd.io/etcd/. Also, is the cache dedicated to the watch scenario, or could it potentially be used for other cases as well? Could you provide more context or details before we make any detailed decisions?

@serathius
Member Author

I think all packages in the etcd mono repo should have the same prefix go.etcd.io/etcd/

OK, I don't think that should be a problem.

I expect that at the top level of the hierarchy we will want a client cache and a standalone cache server (like a gRPC proxy, but based on the new cache library, with configurable caching, covering all Range types, and with proper guarantees). Within the client cache we will have a separate watch de-multiplexer and a cache for range requests.
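A minimal sketch of the watch de-multiplexer idea: one upstream watch stream fanned out in memory to many local watchers filtered by key prefix, so N clients cost one etcd watch instead of N. `Demux`, `Watch`, `Dispatch`, and `Event` are hypothetical names; in practice the events would come from a single `clientv3` `Watch` call. Closing a watcher whose buffer is full is an assumption made here so that one slow consumer cannot stall the rest; the K8s watch cache similarly ends up closing watchers that cannot keep up, though its exact policy differs.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Event is a hypothetical, simplified watch event.
type Event struct {
	Key      string
	Value    string
	Revision int64
}

type watcher struct {
	prefix string
	ch     chan Event
}

// Demux fans out a single upstream event stream to many local watchers.
type Demux struct {
	mu       sync.Mutex
	watchers map[int]*watcher
	nextID   int
}

func NewDemux() *Demux { return &Demux{watchers: make(map[int]*watcher)} }

// Watch registers a watcher for keys under prefix. The buffer size bounds how
// far the watcher may lag before it is dropped. The returned func cancels it.
func (d *Demux) Watch(prefix string, buffer int) (<-chan Event, func()) {
	d.mu.Lock()
	defer d.mu.Unlock()
	id := d.nextID
	d.nextID++
	w := &watcher{prefix: prefix, ch: make(chan Event, buffer)}
	d.watchers[id] = w
	cancel := func() {
		d.mu.Lock()
		defer d.mu.Unlock()
		if _, ok := d.watchers[id]; ok { // avoid double close if already dropped
			delete(d.watchers, id)
			close(w.ch)
		}
	}
	return w.ch, cancel
}

// Dispatch delivers one upstream event to every matching watcher without
// blocking; a watcher whose buffer is full is removed and its channel closed.
func (d *Demux) Dispatch(ev Event) {
	d.mu.Lock()
	defer d.mu.Unlock()
	for id, w := range d.watchers {
		if !strings.HasPrefix(ev.Key, w.prefix) {
			continue
		}
		select {
		case w.ch <- ev:
		default:
			delete(d.watchers, id)
			close(w.ch)
		}
	}
}

func main() {
	d := NewDemux()
	pods, cancelPods := d.Watch("/pods/", 10)
	defer cancelPods()
	nodes, cancelNodes := d.Watch("/nodes/", 1)
	defer cancelNodes()

	d.Dispatch(Event{Key: "/pods/a", Value: "v1", Revision: 6})
	d.Dispatch(Event{Key: "/nodes/x", Value: "n1", Revision: 7})
	d.Dispatch(Event{Key: "/nodes/y", Value: "n2", Revision: 8}) // overflows the nodes buffer

	fmt.Println((<-pods).Key) // /pods/a
	ev, ok := <-nodes
	fmt.Println(ev.Key, ok) // /nodes/x true
	_, ok = <-nodes
	fmt.Println(ok) // false: the lagging watcher was dropped
}
```

A dropped watcher would then re-establish its watch, ideally from the cached event history rather than from etcd.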

@ahrtr
Copy link
Member

ahrtr commented Feb 10, 2025

I expect that at the top level of the hierarchy we will want a client cache and a standalone cache server (like a gRPC proxy, but based on the new cache library, with configurable caching, covering all Range types, and with proper guarantees). Within the client cache we will have a separate watch de-multiplexer and a cache for range requests.

Can we have a spec & design doc for these?

@serathius
Member Author

No, I was just providing more context. Using go.etcd.io/etcd/cache package should be ok.

@abdurrehman107
Member

This sounds exciting, and I’d love to take it up as part of Google Summer of Code. The idea of a standardized caching solution for etcd is impactful and I'd love to implement this as my project.

I'm currently exploring how we've implemented caching in k8s and I look forward to mirroring something similar for etcd in this project. Looking forward to the opportunity to contribute and collaborate with everyone on this.

@mutokrm

mutokrm commented Feb 28, 2025

Hi, I have a question!

It seems like the B-tree structure in the API server was introduced recently. May I ask what has motivated the community to strengthen the etcd caching logic?

I'd like to know whether there were specific team goals behind the recent activity on the API server and this proposal :)

@MadhavJivrajani
Contributor

@mutokrm the main motivations can be found in this KEP: https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/4988-snapshottable-api-server-cache

@MadhavJivrajani
Contributor

For folks following along, here are a few pointers to look at for some context:

@burhanuddin6

burhanuddin6 commented Mar 9, 2025

Hello Everyone,
My name is Burhanuddin and I am a CS undergrad.
I am looking forward to contributing to etcd. To get started, I have learned about Kubernetes up to the level of building CRDs. I have also started learning the Go programming language, since I am new to it. Next, I plan to:

  1. do a quickstart tutorial to understand the etcd project better
  2. follow the developer guide here: https://etcd.io/docs/v3.5/

Here is what I need help with, @serathius @MadhavJivrajani:

  1. I can't find any contributor guide in the docs to be able to set up etcd in my local environment and test changes locally.
  2. Since this project seems fairly complex (which is my motivation for contributing to this repo), I would appreciate guidance on what the development setup for the project would look like and which parts of the project will be most involved.
  3. I have a lot of questions around the project idea but I will defer them till I can get a working setup locally.

So I will start with the docs, set up the project locally, and see what I need help with. However, to avoid getting overwhelmed by the complexity of the project, I will need guidance on the local setup.

@serathius @MadhavJivrajani is there some other channel for communication?

@Symorglass

Hello @serathius and @MadhavJivrajani,

My name is Sywen, and I am an early-career software engineer. I found this opportunity through the GSoC page and I am really excited about contributing to a generic, lower-level caching library. I've been diving into both the Kubernetes watch cache implementation and the etcd codebase; definitely a lot to unpack!

I understand that the eventual goal of this project is to replace the K8s built-in library, and I'm very interested in contributing, not just during GSoC but potentially as a long-term contributor. I wanted to clarify a few aspects of the project scope and design:

  • I noticed that the project is labeled small, but given the complexity (watch multiplexing, B-tree indexing, stale reads, and cache consistency), it looks quite ambitious. I'd love more details on how much of the architecture is predefined versus how much contributors are expected to shape, and at what level of granularity, in their applications.
  • I’d love your advice on whether we are targeting a PoC or aiming for a production ready implementation within the GSoC timeline.

Thanks!!

@serathius
Member Author

I noticed that the project is labeled small, but given the complexity (watch multiplexing, B-tree indexing, stale reads, and cache consistency), it looks quite ambitious. I'd love more details on how much of the architecture is predefined versus how much contributors are expected to shape, and at what level of granularity, in their applications.

The small size was based on two factors: there is a reference implementation in K8s that matches what we want to do one-to-one, and the code will be independent of the rest of the codebase, meaning there is no legacy code to learn or integrate with.

I’d love your advice on whether we are targeting a PoC or aiming for a production ready implementation within the GSoC timeline.

Production ready in K8s takes at least 1 year :P

@K-minutti

Hi, I also came across this project through GSoC, and I’m excited about the potential of a generic caching library and proxy for etcd. I’d love to contribute for the long haul and help make it production-ready.

@MadhavJivrajani, thanks for the links to the additional context.

@POABOB

POABOB commented Mar 19, 2025

Hi @MadhavJivrajani and @serathius, my name is Bob and I am a software engineer. I found this opportunity through the GSoC page, and it would be my pleasure to contribute to the etcd caching and proxy features.

Here is what I have done to prepare:

  1. Learned about the etcd project.
  2. Learned how Kubernetes uses etcd.
  3. Read the linked material for more context.

I am looking forward to this chance!

@FouoF

FouoF commented Mar 20, 2025

Hi, I'm Jeff. I'm an undergraduate student interested in this GSoC project.
I have gone through the related information. Before going further, I have some assumptions to check and some questions to clarify.

Assumptions

  1. The goal of the project is a client library that has the same interface as etcd client v3 but provides cached list (with a B-tree) and cached watch (with the watch cache) APIs.
  2. All reference code is located in "k8s.io/apiserver/pkg/storage".

Questions

  1. Should the client library be completely independent of k8s (I guess this is the original goal), or is partial dependence on k8s allowed (e.g. runtime.Object may be useful for projects like Cilium and Calico mentioned above)?
  2. I noticed that some components of the cache in k8s are under active development (e.g. the delegator from KEP-4568, whose checklist is empty, does not seem ready yet). Should I wait for them to be stable before going further?

I'm looking forward to further discussion on this project!

@MadhavJivrajani
Contributor

MadhavJivrajani commented Mar 22, 2025

Hi all,
Replying to questions in this comment, and as a reminder, public communication is far more appreciated than DMs! So feel free to ping us here on this issue, or on the sig-etcd channel on the Kubernetes Slack.

Please also note that responses may be delayed due to a high volume of queries and KubeCon taking place in the first week of April. You are strongly encouraged to bring your questions to the etcd slack channel in order to get them answered.


@burhanuddin6

is there some other channel for communication?

We have a Slack channel on the Kubernetes Slack (slack.k8s.io) called #sig-etcd.

Please also see: https://github.com/kubernetes/community/tree/master/sig-etcd


@FouoF

Should the client library be completely independent of k8s (I guess this is the original goal), or is partial dependence on k8s allowed (e.g. runtime.Object may be useful for projects like Cilium and Calico mentioned above)?

That is correct. It is completely okay to use dependencies if needed. However, it should not exist as part of the Kubernetes codebase for the reasons mentioned in the issue.

I noticed that some components of the cache in k8s are under active development (e.g. the delegator from KEP-4568, whose checklist is empty, does not seem ready yet). Should I wait for them to be stable before going further?

You won't need to wait for these. Ideally in the long run, features like KEP-4568 will simply call into the library that we build and we don't necessarily need to rely on their implementation.

@Ikenna-Okpala

Hello 👋,

I am Ikenna, a senior computer science student with an interest in distributed systems. I have taken classes in distributed systems, networking, and databases, and I enjoy exploring these domains outside the classroom. This will be a fun project to work on.

@kriyanshii

Hi everyone, I'm applying for GSoC under CNCF to work on developing a generic watch cache for etcd. My proposal aims to create a caching layer similar to Kubernetes' watch cache but as a standalone package (go.etcd.io/cache) to improve scalability and simplify infrastructure management on etcd. This will help projects relying on etcd, like Cilium and Calico, by providing a standardized solution for caching watch events and list requests.

A bit about me—I’m a software engineer primarily working with Go, and I enjoy building scalable backend systems and efficient algorithms. I’ve previously worked on distributed systems concepts like MapReduce and have been exploring geospatial data processing. I’m excited about this project as it aligns with my interest in making infrastructure tools more efficient and developer-friendly.

@serathius
Member Author

serathius commented Mar 26, 2025

CCing @marseel, who offered to help and give feedback on a potential Cilium integration. Marcel works at Isovalent on Cilium scalability and is the Chair of Kubernetes SIG-scalability.

/cc @marseel

@davvyin

davvyin commented Mar 27, 2025

Hi, this seems like a very interesting challenge to take on.

A little about me: I am a recent CS graduate with some experience working with k8s during an internship and studying distributed systems (key-value stores and Paxos, to be specific).

I'm particularly interested in the challenge of making the cache reusable without losing efficiency. Balancing generalization (custom indexing) with performance (like fast watch demultiplexing and low list latency) seems important, and I'd love to help quantify it.

A quick question: given the complexity, are you envisioning a thin compatibility layer over K8s internals, or a ground-up reimplementation guided by its design?

Looking forward to contributing and learning through this!

@FouoF

FouoF commented Mar 28, 2025

Hi @serathius and @marseel, I checked the kvstore implementation in Cilium: it uses a simple map as its watch cache, so integration should not be hard as long as the new cache library is compatible with etcd client v3. For Calico, the etcd v3 client is recommended as a replacement for Calico Typha.
Since the cache in the K8s codebase is naturally compatible with K8s, could we use successful integration with common users such as Cilium and Calico as a milestone?

@serathius
Member Author

This is a project in the etcd repository, and the mentors' approval rights are limited to etcd. Milestones should not depend on other projects where we don't have merge rights. We might collaborate, collect feedback, or propose a PoC, but we cannot take a dependency on them.

@FouoF

FouoF commented Mar 29, 2025

This is a project in the etcd repository, and the mentors' approval rights are limited to etcd. Milestones should not depend on other projects where we don't have merge rights. We might collaborate, collect feedback, or propose a PoC, but we cannot take a dependency on them.

Thanks for your reply. I will focus on the etcd repository itself first and keep the needs of potential users in mind during design.

@Lumen-jane

Lumen-jane commented Mar 31, 2025

Hello everyone, I only learned about GSoC a few days ago, so I don't know if it's too late for me to contribute.

I'm a DevOps engineer, and I haven't contributed to open source before, but I'm willing to learn.

I'm still going through the whole document to figure out where I can contribute.

However, I can't find the link to the Slack channel. Could someone share it?

@serathius
@MadhavJivrajani

@MadhavJivrajani
Contributor

@Lumen-jane please see #19371 (comment)
