Inferencing not working with P2P in latest version. #3968
I'm using docker compose; here's the config: https://github.com/j4ys0n/local-ai-stack
It is related to edgevpn: it somehow sees peers and their addresses but cannot connect to them. I have tried NAT traversal using libp2p + pubsub and managed to get peer discovery working and establish a p2p connection via a rendezvous point. In your case, if you know your worker's address, you can just put it into the LocalAI ENV as a gRPC external backend address.
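The workaround above can be sketched as follows. This is only an illustration: the variable name and the worker address are assumptions, so verify both against the LocalAI documentation for your version before using them.

```shell
# Sketch: point the main LocalAI instance directly at a known worker,
# bypassing P2P auto-discovery entirely.
# ASSUMPTIONS: the env var name and the worker address 192.168.1.50:50052
# are placeholders -- check the LocalAI docs for your version.
export LLAMACPP_GRPC_SERVERS="192.168.1.50:50052"  # comma-separated list of worker addresses
local-ai run
```

In docker compose, the same variable would go under the service's `environment:` section.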
@pratikbin can you share the server logs with debug enabled? Also, could you try with the master images to double check? That'd be helpful. Thank you!
There you go. Let me know if you need anything else.
I cannot say if I have the EXACT same issue; I haven't debugged the libp2p/edgevpn code yet. But P2P doesn't work. HOWEVER, if I run without the P2P worker, it DOES work. I run a small 1B model just to verify that something runs as a baseline for this functionality. I can manage with this, though the P2P stuff is really helpful for auto-discovery. The host LocalAI instance also detects peers fine; the gRPC side just never seems to get the request. Oh, and an obvious thing I notice is the defaulting to binding to 127.0.0.1, but IDK if the VPN somehow bypasses the loopback limitation.
Mmh, ok, that looks weird: what's the environment? It looks like they can auto-discover correctly, but something exhausts resource limits. Typically that is set by looking at the system env. Did you also try bumping the UDP buffer sizes? https://github.com/quic-go/quic-go/wiki/UDP-Buffer-Sizes#non-bsd
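For reference, on Linux the UDP buffer bump from the linked quic-go wiki comes down to raising two sysctls (the exact values below are the wiki's suggestion; tune them as needed):

```shell
# Raise the kernel's maximum UDP receive/send buffer sizes so quic-go
# (used by libp2p's QUIC transport) can allocate larger buffers.
# Run as root; add the same keys to /etc/sysctl.conf to persist across reboots.
sysctl -w net.core.rmem_max=7500000
sysctl -w net.core.wmem_max=7500000
```

Note that inside an unprivileged container (e.g. a Proxmox LXC), these usually have to be set on the host rather than in the guest.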
shall try this one |
LocalAI version: v2.22.1 (015835d)
Image: localai/localai:latest-gpu-nvidia-cuda-12
Environment, CPU architecture, OS, and Version:
Linux localai3 6.8.12-2-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.12-2 (2024-09-05T10:03Z) x86_64 GNU/Linux
Proxmox LXC (Debian), AMD EPYC 7302P (16 cores allocated), 64 GB RAM
Describe the bug
When testing distributed inferencing, I select a model (qwen 2.5 14b) and send a chat message; the model loads on both instances (main and worker), but then it does not respond and unloads on the worker (watching with nvitop).
To Reproduce
The description above should reproduce the issue; I tried a few times.
Expected behavior
The model should not unload, and the chat should complete.
Logs
worker logs
main logs
Additional context
This worked in the last version, though I'm not sure which one that was at this point (~2 weeks ago).
The model loads and works fine without the worker.