dnsdist: cache miss when all tcp only backends are down #15337

Open
pizchen opened this issue Mar 21, 2025 · 9 comments

Comments

@pizchen (Contributor) commented Mar 21, 2025

  • Program: dnsdist
  • Issue type: Bug report

Short description

When all backends in a pool are TCP-only and are marked down, queries that previously produced cache hits become cache misses.

Environment

  • Operating system: Ubuntu 22.04.2 LTS with podman 3.4.4
  • Software version: dnsdist 2.0.0-alpha
  • Software source:
    Compiled by myself, following the steps in the repository's builder directory

Steps to reproduce

  1. s = newServer({address="192.168.1.10:53",pool="test",name="test1",tcpOnly=true})
  2. getPool("test"):newPacketCache(100)
  3. setVerbose(true)
  4. Run a dig query from a client over UDP
  5. Ensure the query returns successfully
  6. Run the same query a few more times and ensure cache hits (the TTL must be long enough)
  7. s:setDown()
  8. Perform the same query from the client
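
For reference, steps 1-3 boil down to the following configuration (a minimal sketch: the listen address and the rule routing queries to the "test" pool are assumptions, since the report only shows the three lines above):

    -- minimal dnsdist configuration reproducing steps 1-3
    setLocal("0.0.0.0:53")                      -- assumption: listen address not given in the report
    addAction(AllRule(), PoolAction("test"))    -- assumption: route every query to the "test" pool

    s = newServer({address="192.168.1.10:53", pool="test", name="test1", tcpOnly=true})
    getPool("test"):newPacketCache(100)         -- packet cache shared by the "test" pool
    setVerbose(true)                            -- log cache hits and misses

Steps 4-8 are then a dig query from a client, repeated a few times to confirm cache hits, followed by s:setDown() in the dnsdist console and one more identical dig.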

Expected behaviour

A cache hit is expected at step 8.

Actual behaviour

Currently a cache miss happens instead.

Other information

After some digging, this seems to be related to the cache key calculation strategy in dnsdist. More details can be found in #15335.

@rgacogne (Member)

I'll think about it a bit more, but right now I think what we are currently doing is the only safe approach with the information we have: the query has been received over UDP, we have no information about whether the server is TCP-only, so we do a UDP-based lookup.
We could perhaps improve the situation:

  • having more information about the server, perhaps via a TCP-only flag that could be applied to the pool. At the moment it feels like a can of worms.
  • when we know that we have no server available, we could do a "second-chance" cache lookup for matching TCP-based answers, but we would need to be very careful about not sending a response bigger than the client EDNS UDP payload size (or 512 if there is no EDNS).
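
A rough sketch of that second-chance idea, as pseudocode (the helpers lookupPacketCache, poolHasNoUsableServer, clientMaxPayloadSize and truncateForUDP are hypothetical, not part of dnsdist; the point is only the size check before serving a TCP-cached answer over UDP):

    -- hypothetical pseudocode, not actual dnsdist code
    local response = lookupPacketCache(query, { receivedOverTCP = false })  -- normal UDP-based lookup

    if response == nil and poolHasNoUsableServer(pool) then
      -- second chance: look for an answer that was cached from a TCP exchange
      response = lookupPacketCache(query, { receivedOverTCP = true })
      if response ~= nil then
        -- never send more than the client can accept over UDP: the EDNS UDP
        -- payload size from the query, or 512 if the query has no EDNS
        local maxSize = clientMaxPayloadSize(query)
        if #response > maxSize then
          response = truncateForUDP(response)  -- set TC=1 so the client retries over TCP
        end
      end
    end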

@omoerbeek (Member) commented Mar 21, 2025

I can see how a manual flag could go wrong, but does an automatically set flag ("all servers in this pool are TCP-only") have the same drawback? This flag would be re-computed on each pool modification.
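
A sketch of how such an automatically maintained flag could work (hypothetical pseudocode, not dnsdist code; it would have to run every time a server is added to or removed from the pool):

    -- hypothetical pseudocode: recompute a per-pool "all backends are TCP-only"
    -- flag whenever the pool membership changes
    function recomputeTcpOnlyFlag(pool)
      local allTcpOnly = true
      for _, server in ipairs(pool.servers) do
        if not server.tcpOnly then
          allTcpOnly = false
          break
        end
      end
      pool.allServersTcpOnly = allTcpOnly
    end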

@pizchen (Contributor, Author) commented Mar 21, 2025

A few points on my mind for your consideration:

  • is it a safe assumption that all servers in the same pool serve the same set of RRs? The following points assume yes.
  • in a single pool, some servers could be TCP-only and some may support both UDP and TCP
  • for a given RR with a large answer, TCP servers would return the complete result (smaller than the cache entry maximum size, but larger than 512 bytes or the EDNS payload size of a future query)
  • UDP servers may return a truncated (TC) answer for that RR if the EDNS payload size in the received query is not large enough
  • do we really need to cache the UDP-based result if TC is set? For what purpose?
  • if we only cache the TCP result (the complete one), then even when the incoming query is UDP we can use the EDNS size from the query and the cache entry size to decide whether to send back a TC or not. There is no need to depend on a UDP server result.

I guess the main question is: if the incoming query is over UDP, is it safe to use a cache entry that was inserted from a TCP backend? I would think it is safe, as long as we are not blindly sending the cached result back as the response.
On the other hand, if the incoming query is over TCP, using a cache entry that came from a UDP backend looks safe.

@rgacogne (Member)

do we really need to cache the UDP-based result if TC is set? For what purpose?

Yes, otherwise we need to go to the backend for every UDP query. Even if we assume that the backend is eventually going to give us a response over TCP (and that might never happen), if it is larger than the EDNS UDP payload size (or 512 if there is no EDNS in the query) we will have to serve the truncated answer anyway. We used to not cache empty responses and it caused a lot of problems.

if we only cache the TCP result (the complete one), then even when the incoming query is UDP we can use the EDNS size from the query and the cache entry size to decide whether to send back a TC or not. There is no need to depend on a UDP server result.

Once we get the TCP answer, which might never happen. Also note that we might get a UDP response from the backend that is not truncated but differs from the TCP one, because all required records fit but some optional ones did not. In that case we need to keep the UDP answer, as otherwise we would have to tell UDP clients to retry over TCP because the (larger) cached response does not fit, degrading performance.

@pizchen (Contributor, Author) commented Mar 29, 2025

I can see that things get too complicated once additional RR parts are included in the cache key calculation.

I ran into another issue: when I run a query with an EDNS payload size of 800 while the actual result is 1000 bytes, the downstream server returns a result with TC set. The client then automatically switches to TCP and gets the correct result. However, every subsequent identical query always gets a cache miss and has to be relayed to the downstream server first; only after getting a TC from the server and switching to TCP does it hit the cache and get a result back. Is this reasonable?

In that case, if it is too complicated and error-prone to provide a single solution for every possible combination of use cases, then we may consider providing configurable options so users can choose a cache key strategy for their own use case. For example, the most basic strategy would be to use only the DNS header, qname, class and type; other parts could then be added based on the user's configuration, so additional RRs or the transport type become configurable.
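
Purely as an illustration of the idea (the keyComponents option below does not exist in dnsdist; it is only a sketch of what a configurable key strategy could look like):

    -- hypothetical configuration, not an existing dnsdist option
    getPool("test"):newPacketCache(100, {
      -- minimal, transport-agnostic key:
      keyComponents = { "dnsheader", "qname", "qclass", "qtype" },
      -- stricter matching could add e.g. "ednsOptions" or "transport"
    })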

@pizchen (Contributor, Author) commented Mar 29, 2025

I also did a quick test: for the same UDP query, I tried the following:

  • do not include an EDNS0 payload size
  • use an EDNS0 payload size of 512
  • use an EDNS0 payload size of 513
  • use an EDNS0 payload size of 514

Each of the above queries creates a new cache entry. I don't think this is very reasonable behaviour.
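
(As a side note: if I am not mistaken, recent dnsdist releases added a payloadRanges option to newPacketCache that groups the advertised EDNS payload sizes into buckets for the cache key; the syntax below is from memory and should be checked against the documentation.)

    -- assumption: option name and format to be verified against the dnsdist docs;
    -- the idea is that e.g. 513 and 514 fall into the same bucket and share a cache entry
    getPool("test"):newPacketCache(100, { payloadRanges = { 512, 1232 } })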

@rgacogne (Member)

I also did a quick test: for the same UDP query, I tried the following:

  • do not include an EDNS0 payload size
  • use an EDNS0 payload size of 512
  • use an EDNS0 payload size of 513
  • use an EDNS0 payload size of 514

Each of the above queries creates a new cache entry. I don't think this is very reasonable behaviour.

What dnsdist has is a packet cache, which has some caveats like this one. On the other hand it is very fast, and it doesn't aim to replace the more sophisticated caches that the backend pretty much always has, especially since the backend has much more information about the response. So we need to weigh the benefits of making the cache more complex, because if in a non-negligible percentage of cases it slows down processing compared to no packet cache at all, it's a net loss.

So, correctness issues are not acceptable and we will fix them. Misses that could be turned into hits are possible improvements that will be implemented if, and only if, they don't significantly increase the complexity.

@pizchen (Contributor, Author) commented Mar 30, 2025

I ran into another issue: when I run a query with an EDNS payload size of 800 while the actual result is 1000 bytes, the downstream server returns a result with TC set. The client then automatically switches to TCP and gets the correct result. However, every subsequent identical query always gets a cache miss and has to be relayed to the downstream server first; only after getting a TC from the server and switching to TCP does it hit the cache and get a result back. Is this reasonable?

This is because the original UDP query only carries a client cookie, while the fallback TCP query contains both the client and the server cookie. Although the cookie content is skipped in the cache key calculation, the RR data length and the cookie option length are both used, so the fallback TCP query always has a different key than the original UDP query.

@pizchen (Contributor, Author) commented Mar 30, 2025

What dnsdist has is a packet cache, which has some caveats like this one. On the other hand it is very fast, and it doesn't aim to replace the more sophisticated caches that the backend pretty much always has, especially since the backend has much more information about the response. So we need to weigh the benefits of making the cache more complex, because if in a non-negligible percentage of cases it slows down processing compared to no packet cache at all, it's a net loss.

So, correctness issues are not acceptable and we will fix them. Misses that could be turned into hits are possible improvements that will be implemented if, and only if, they don't significantly increase the complexity.

OK, I agree that my purposely designed test cases with different payload sizes are probably not realistic. In the real world, a single client is unlikely to change its payload size dynamically. Multiple clients might use different sizes, but usually different client programs are interested in different QNAMEs as well.
