etcd reading member-k0-key.pem every second (again) #19505
Comments
You raised this as a bug; what is the exact issue that impacts your business? You also mention the performance impact. If it is only related to performance, this ticket should be a feature request. It would also be better to demonstrate how much performance is affected with and without the dynamic certificate loading feature.
I don't have a business. It was just an observation: disks are slow, RAM is fast. That is not a guess, that is a fact. I never mentioned disabling dynamic loading of the certificate. I only suggested doing it another way, to avoid reading the certificates from disk on every call, which is expensive. You are not expecting the certificate to be replaced multiple times per second, are you? Why not keep it in memory and reload it from disk on a schedule, once every 10 seconds, for instance? As for the bug, you are right. This can be a feature request. A request could be:
My intention was to make a contribution. Nothing is broken, but there is nothing that can't be improved. I have included my evidence that the file is being read from disk multiple times a second on an idle k8s cluster. I don't think that writing code to prove that keeping the certificate in memory is quicker than reading it from disk would discover anything that can't already be predicted. I am not well versed in Go or the etcd code, but if someone gets interested and agrees with me that a small performance gain can be achieved here, I am more than happy to have contributed and perhaps, with some help, to make a code contribution.
Please note that if rereading the private key had a significant footprint on etcd, the issue would have been discovered earlier. I suspect that after the first call the file is simply cached by the operating system and we never need to hit the disk. The overhead would then come only from the syscall, which is much smaller. There are always things that can be optimized, but doing so is always a tradeoff: with complexity, with code readability, or with other execution paths. Before we optimize here we need to:
To measure the overhead, it would be good to see some profiles showing that this cost is not negligible. In this case I expect that caching the private key might have two consequences: a delay in dynamic certificate reload, and a potential inconsistency with the reload of the certificate. We would need to cache not only the private key but the certificate too, to ensure they match.
Bug report criteria
What happened?
I have observed that etcd reads the private key from disk continuously. The issue was observed when an audit rule was applied to catch reads of the private keys.
The issue was discovered while running a three-node Kubernetes cluster deployed with kubespray on three virtual machines managed by Vagrant.
What did you expect to happen?
etcd should read the key at start up and keep it in memory for further use.
My previous ticket was closed without any chance of justifying my point of view.
Here is my reply:
Supporting dynamic loading of the certificate does not imply reading it from disk on every call.
The certificate can be held for a specific amount of time and replaced when that time expires. That could even be configurable.
Reading the file on every call can cause a bottleneck and harm performance.
How can we reproduce it (as minimally and precisely as possible)?
On a k8s control plane node, install auditd.
Create an audit rule to monitor the key file.
Inspect the audit logs.
Anything else we need to know?
No response
Etcd version (please run commands below)
Etcd configuration (command line flags or environment variables)
Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)
Relevant log output