Facing issue with c# client in 3 node partitioned k8s cluster #763
-
On your services, what is the number of threads and how many CPUs are available? I suspect the thread pool may be starving if you're seeing a lot of timeouts on JetStream publish. I recommend awaiting each publish rather than collecting them as tasks, or using concurrent publishing if you need to publish fast. There is an example here.
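To make the two suggestions above concrete (awaiting each publish, or publishing concurrently with a cap), here is a minimal C# sketch. It deliberately does not call the NATS client: `FakePublishAsync` is a hypothetical stand-in for a JetStream publish, and `MessageCount`, `MaxConcurrent`, and the in-flight counters are assumptions chosen only for illustration. The bounded variant uses a `SemaphoreSlim`, which is one common way to cap concurrency and is not necessarily what the linked example does.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

const int MessageCount = 20;   // assumption: number of messages per run
const int MaxConcurrent = 4;   // assumption: cap on in-flight publishes
const string Subject = "LRPS.Test.TestTopic1";

var gate = new object();
int inFlight = 0, peakInFlight = 0, published = 0;

// Hypothetical stand-in for a JetStream publish. A real service would call
// the client's publish method here and check the returned acknowledgement.
async Task FakePublishAsync(string subject, int payload)
{
    lock (gate) { inFlight++; peakInFlight = Math.Max(peakInFlight, inFlight); }
    await Task.Delay(5); // simulated server round trip
    lock (gate) { inFlight--; published++; }
}

// Variant 1: await each publish before starting the next one.
for (int i = 0; i < MessageCount; i++)
    await FakePublishAsync(Subject, i);
int sequentialPeak = peakInFlight;

// Variant 2: publish concurrently, but never exceed MaxConcurrent in flight.
lock (gate) { peakInFlight = 0; }
using var limiter = new SemaphoreSlim(MaxConcurrent);

async Task PublishThenReleaseAsync(int payload)
{
    try { await FakePublishAsync(Subject, payload); }
    finally { limiter.Release(); } // free the slot even if the publish throws
}

var pending = new List<Task>();
for (int i = 0; i < MessageCount; i++)
{
    await limiter.WaitAsync(); // wait for a free slot before starting another publish
    pending.Add(PublishThenReleaseAsync(i));
}
await Task.WhenAll(pending);
int boundedPeak = peakInFlight;

Console.WriteLine($"sequential peak in flight: {sequentialPeak}");
Console.WriteLine($"bounded peak in flight: {boundedPeak} (cap {MaxConcurrent})");
Console.WriteLine($"published: {published}");
```

The sequential loop never has more than one publish outstanding, while the bounded loop trades some ordering for throughput without letting an unbounded list of tasks pile up behind a slow server.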
-
Hello @mtmk,
-
Hey @mtmk
This doesn't work either, and I observe the same exceptions as described above. Oddly, the code I shared before (collecting all the produce tasks into a list and executing them together) works very well in my local environment. Since the full application code is proprietary, I can't share it directly, but I'm working on a sample app to reproduce the issue and will share it once it's ready. In the meantime, I'd really appreciate any insights or suggestions on what might be causing these timeouts. This has become a significant roadblock for our NATS implementation, and I'd be grateful for any guidance to help us resolve it. If you need more specifics about our setup (e.g., metrics, logs), just let me know. I'm happy to provide what I can.
-
Hi @romitkmehta, I've also been experimenting with a new JetStream publish API here. If you have some time to have a look, I would appreciate the feedback. Thank you.
-
We have a 3-node NATS.io JetStream setup in k8s with the configuration below.
We have 3 partitioned streams, and this is one of them:
NATS node config: 2 vCPU, 4 GB RAM
Note: we are not using Helm for deployment directly; instead we render the Helm template files and deploy them ourselves, with GOMEMLIMIT set to 3GB (the file is added at the end of the question).
Also please note: I have verified the logs of all 3 broker nodes, and they show no errors or exceptions on the NATS broker side.
nats stream info:
This is one of the consumer configurations for LRPS stream:
As per #654, I have created a single connection instance per microservice, and multiple application threads produce on the same connection. This is the producer logic:
We consistently get these exceptions in the microservice application logs, and the application is not working smoothly:
7:34:31 ~ Error ~ Exception while producing to topic LRPS.Test.TestTopic1 Exception : NATS.Client.JetStream.NatsJSPublishNoResponseException: No response received from the server
   at NATS.Client.JetStream.NatsJSContext.PublishAsync[T](String subject, T data, INatsSerialize`1 serializer, NatsJSPubOpts opts, NatsHeaders headers, CancellationToken cancellationToken)
   at NatsUtility.Core.Services.NatsService.ProduceAsync(String topic, Message`2 message) in /app/NatsUtility.Core/Services/NatsService.cs:line 247
We have over 1000 occurrences of this exception. In our test cases, around 70% of operations work, but about 30% fail.
Sometimes when starting consumers we get this exception too
NATS.Client.JetStream.NatsJSApiNoResponseException: No API response received from the server: Timeout
   at NATS.Client.JetStream.NatsJSContext.JSRequestAsync[TRequest,TResponse](String subject, TRequest request, CancellationToken cancellationToken)
   at NATS.Client.JetStream.NatsJSContext.JSRequestResponseAsync[TRequest,TResponse](String subject, TRequest request, CancellationToken cancellationToken)
   at NATS.Client.JetStream.NatsJSContext.CreateOrUpdateConsumerInternalAsync(String stream, ConsumerConfig config, ConsumerCreateAction action, CancellationToken cancellationToken)
   at NATS.Client.JetStream.NatsJSContext.CreateOrUpdateConsumerAsync(String stream, ConsumerConfig config, CancellationToken cancellationToken)
   at NatsUtility.Core.Services.NatsConsumer.StartJetstreamConsumerAsync(Boolean isPartitioned, Int32 partitionId) in /app/NatsUtility.Core/Services/NatsConsumer.cs:line 169
Following #349, we initially assumed our nodes were corrupted, so we checked by running the same test against only 1 broker, with a single broker connection (as shown below) and a replication factor of 1.
Original Connection String: "nats://nats-0.nats-headless.nats.svc.cluster.local:4222,nats://nats-1.nats-headless.nats.svc.cluster.local:4222,nats://nats-2.nats-headless.nats.svc.cluster.local:4222"
New Connection String: "nats://nats-0.nats-headless.nats.svc.cluster.local:4222" (Tested 1 by 1 from applications)
We see the same exception even in the single-broker connection tests. Can anyone please provide feedback on what we are doing wrong?
Nats.yaml