Jetstream KV watcher does not watch after NATS server restart #1094
Comments
Once reconnected, it will just log "consumer not active" forever rather than recover.
I am not sure what you mean @ripienaar. As long as I don't use option
sub, err := js.Subscribe("$KV.bucket.>", func(msg *nats.Msg) {
	spew.Dump(msg.Data)
})
if err != nil {
	return err
}
defer sub.Unsubscribe()
<-ctx.Done()
I mean a normal KV watch fails in that way.
Ah, yeah, right. A failure would also be good, so that it can be handled and the watch restarted.
Here I restarted the server mid-watch; the watcher never worked again and the consumer was not recreated. Something is not happy with the ordered consumer recovery logic.
@ripienaar do we feel confident this is a client issue that does not involve the server, or are we unsure?
I see the problem happen simply by creating an ordered push consumer between version 0.0.33 (works) of natscli and version 0.0.34 (current version, doesn't work), regardless of server version 2.8.4 or 2.9. Given the code path of my test (using
Symptom is: you create an ephemeral ordered push consumer on a stream in a client application, you then kill and restart the server, the client app reconnects but the ordered push consumer doesn't receive any messages from that point on (and you see those 'consumer not active' errors). Looking at the traffic when triggering this problem between the two versions of
After the restart, the watch's consumer is gone. nats.go never recreates it, nor does it issue any consumer info requests or anything else (checked with a -DVV trace), so this seems like a client issue.
It turned out this happens because of the recent change in ordered consumer (#989). Now, ordered consumer always has
@ripienaar @derekcollison not sure what the approach here should be - at the very least I think we should allow the customer to change the value of
My question is - which setting should be the default one? This was a breaking change in the client so we might want to revert and do a patch release, unless there is a good reason to change this behavior.
Seems to me that during a single invocation of an ordered consumer it would know its state: which message it received last. So if the consumer is lost, it can recreate it and continue from the last known position. It would have to do this anyway regardless of storage type, since many things can happen to consumers. If it did that, it would be reliable even with memory storage.
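The idea in the comment above can be sketched without any NATS dependency. This is only an illustration under stated assumptions, not the nats.go implementation: `orderedState`, `observe`, `resumeSeq` and `consume` are hypothetical names, and the slice stands in for a stream whose first message has stream sequence 1.

```go
package main

import "fmt"

// orderedState is a hypothetical client-side record of how far an
// ordered consumer has progressed through the stream.
type orderedState struct {
	lastStreamSeq uint64 // stream sequence of the last message handed to the app
}

// observe records the stream sequence of a delivered message.
func (s *orderedState) observe(streamSeq uint64) {
	if streamSeq > s.lastStreamSeq {
		s.lastStreamSeq = streamSeq
	}
}

// resumeSeq is the stream sequence a replacement consumer should start at.
func (s *orderedState) resumeSeq() uint64 {
	return s.lastStreamSeq + 1
}

// consume simulates one consumer instance: it delivers up to limit messages
// starting at startSeq and updates the state. stream[i] has sequence i+1.
func consume(stream []string, startSeq uint64, limit int, st *orderedState) []string {
	if startSeq == 0 {
		startSeq = 1
	}
	var out []string
	for seq := startSeq; seq <= uint64(len(stream)) && len(out) < limit; seq++ {
		out = append(out, stream[seq-1])
		st.observe(seq)
	}
	return out
}

func main() {
	stream := []string{"a", "b", "c", "d", "e"}
	st := &orderedState{}

	// First consumer delivers two messages, then is lost (e.g. server restart).
	got := consume(stream, st.resumeSeq(), 2, st)

	// A replacement consumer is created from the last known position.
	got = append(got, consume(stream, st.resumeSeq(), len(stream), st)...)

	fmt.Println(got, st.lastStreamSeq)
}
```

Because the resume point lives in the client, the sketch does not care whether the lost consumer was stored in memory or on disk.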
I think more clients may be vulnerable to this issue. @ripienaar That's basically applying what we do on a sequence mismatch to reconnects as well, so it should be a pretty straightforward fix.
Yes, since everyone copies Go, I suspect you are right.
You would recreate it any time the consumer is gone, right, not just on reconnects?
Well, as we know, the server does not notify the client that a consumer is gone, so I would assume two scenarios:
Those scenarios will happen much more often with the recent change to Ordered Consumer being R1 and in memory, so we should make sure all clients recreate ordered consumers properly. It's not only about copying Go; the ADR also needs some additional info about recreation.
Sounds good to me. I also noticed we treat timeouts on create (for example, you are connected but a foreign JetStream is down) as a critical failure that returns a timeout error to the ordered consumer. We should probably make sure that kind of failure is also handled with a retry rather than a hard failure.
Thanks for the comments on this. Yeah, recreating the consumer sounds good. @ripienaar When working on this I'll also change the timeout-on-create behavior.
IIRC ordered consumers create new consumers when they miss a sequence number or fail to receive heartbeats, so that should continue to work. Obviously it does not, but I am not sure why the missed-heartbeat code that kicks in to create a new consumer is not working.
@derekcollison Because this was never implemented: #789. The activityCheck would simply send an error to the async error callback (if one is provided). There was no special handling for ordered consumers (as you can see, the activityCheck was introduced in the OrderedConsumer PR, but recreating the consumer was not).
Ah! Thank you. Got it now. I think we keep them in memory but add logic to recreate on missed heartbeats. Apologies that this was not done.
The KV watcher stops delivering updates after the NATS server is restarted.
nats.go version v1.17.0
nats server v2.9.1
Steps or code to reproduce the issue:
Expected result:
After the NATS server restarts, the watcher goes back to printing the latest KV updates
Actual result:
Watcher gives no updates
The following error appears from time to time: nats: consumer not active on connection [101] for subscription on "$KV.bucket.>"
js.Subscribe() without the nats.OrderedConsumer() option behaves correctly, but this option is built into kv.Watch... Thanks.