Inferencing not working with P2P in latest version. #3968

j4ys0n · 2024-10-26T05:44:29Z

LocalAI version:

localai/localai:latest-gpu-nvidia-cuda-12
LocalAI version: v2.22.1 (015835d)

Environment, CPU architecture, OS, and Version:

Linux localai3 6.8.12-2-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.12-2 (2024-09-05T10:03Z) x86_64 GNU/Linux
(Proxmox LXC, Debian. AMD EPYC 7302P (16 cores allocated)/64GB RAM

Describe the bug

When testing distributed inferencing, i select a model (qwen 2.5 14b), send a chat message, the model loads on both instances (main and worker) and then the model does not respond and the model unloads on the worker. (watching with nvitop)

To Reproduce

description above should reproduce, i tried a few times.

Expected behavior

model should not unload & chat should complete

Logs

worker logs

{"level":"INFO","time":"2024-10-26T05:07:23.924Z","caller":"discovery/dht.go:115","message":" Bootstrapping DHT"}
create_backend: using CUDA backend
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA RTX 2000 Ada Generation, compute capability 8.9, VMM: yes
Starting RPC server on 127.0.0.1:46609, backend memory: 16380 MB
Accepted client connection, free_mem=17175674880, total_mem=17175674880
Client connection closed
Accepted client connection, free_mem=17175674880, total_mem=17175674880
Client connection closed
Accepted client connection, free_mem=17175674880, total_mem=17175674880
Client connection closed
Accepted client connection, free_mem=17175674880, total_mem=17175674880
Client connection closed

Accepted client connection, free_mem=17175674880, total_mem=17175674880
Client connection closed
Accepted client connection, free_mem=17175674880, total_mem=17175674880
Client connection closed
Accepted client connection, free_mem=17175674880, total_mem=17175674880
Client connection closed
Accepted client connection, free_mem=17175674880, total_mem=17175674880
Client connection closed

main logs

5:25AM INF Success ip=my.ip.address latency="960.876µs" method=POST status=200 url=/v1/chat/completions
5:25AM INF Trying to load the model 'qwen2.5-14b-instruct' with the backend '[llama-cpp llama-ggml llama-cpp-fallback rwkv stablediffusion whisper piper huggingface bert-embeddings /build/backend/python/rerankers/run.sh /build/backend/python/diffusers/run.sh /build/backend/python/vall-e-x/run.sh /build/backend/python/parler-tts/run.sh /build/backend/python/sentencetransformers/run.sh /build/backend/python/mamba/run.sh /build/backend/python/openvoice/run.sh /build/backend/python/coqui/run.sh /build/backend/python/bark/run.sh /build/backend/python/transformers-musicgen/run.sh /build/backend/python/transformers/run.sh /build/backend/python/exllama2/run.sh /build/backend/python/sentencetransformers/run.sh /build/backend/python/autogptq/run.sh /build/backend/python/vllm/run.sh]'
5:25AM INF [llama-cpp] Attempting to load
5:25AM INF Loading model 'qwen2.5-14b-instruct' with backend llama-cpp
5:25AM INF [llama-cpp-grpc] attempting to load with GRPC variant
5:25AM INF Redirecting 127.0.0.1:35625 to /ip4/worker-ip/udp/44701/quic-v1
5:25AM INF Redirecting 127.0.0.1:35625 to /ip4/worker-ip/udp/44701/quic-v1
5:25AM INF Redirecting 127.0.0.1:35625 to /ip4/worker-ip/udp/44701/quic-v1
5:25AM INF Redirecting 127.0.0.1:35625 to /ip4/worker-ip/udp/44701/quic-v1
5:25AM INF Success ip=127.0.0.1 latency="35.55µs" method=GET status=200 url=/readyz
5:26AM INF Node localai-oYURMqpWCR is offline, deleting
Error accepting:  accept tcp 127.0.0.1:35625: use of closed network connection

Additional context

this worked in the last version, though i'm not sure what that was at this point (~2 weeks ago)
model loads and works fine without the worker.

The text was updated successfully, but these errors were encountered:

j4ys0n · 2024-10-26T05:47:57Z

i'm using docker compose, here's the config. https://github.com/j4ys0n/local-ai-stack

JackBekket · 2024-11-16T00:45:27Z

it is related to edgevpn, it somehow see peer's and addresses but cannot connect with them.

I has tried NAT traversal using libp2p+pubsub and I have managed to make peer discovery and establish p2p connection by randez-vous point.

In your case, if you know your worker address you can just put worker address into ENV of local-ai as gRPC external backend addresses.

mintyleaf · 2024-11-28T16:07:48Z

@mudler

fixed by #4220
more info in duplicate (sorry for that) #4214

vpereira · 2024-12-04T14:19:02Z

fixed by #4220
more info in duplicate (sorry for that) #4214

looks like it still didn't make to the images quay.io/go-skynet/local-ai:latest-cpu or the (dockerhub) localai/localai:latest-cpu?

pratikbin · 2025-02-11T07:12:09Z

Still not able to do p2p inferencing even if workers are online v2.25.0 (07655c0c2e0e5fe2bca86339a12237b69d258636)

server and workers envs

CONTEXT_SIZE: "512"
THREADS: "4"
MODELS_PATH: /models
LLAMACPP_PARALLEL: "999"
TOKEN: &p2ptoken "xxx"
P2P_TOKEN: *p2ptoken
LOCALAI_P2P_TOKEN: *p2ptoken
LOCALAI_P2P_LISTEN_MADDRS: /ip4/0.0.0.0/tcp/8888

Server args: run --p2p

Worker args: worker p2p-llama-cpp-rpc '--llama-cpp-args=-H 0.0.0.0 -p 8082 -m 4096'

Container ports 8082 and 8888 are open using ports.containerPort

k8s based setup

attaching logs below for server and one of the worker out of 2

local-ai-worker.log
local-ai-server.log

mudler · 2025-02-13T18:22:28Z

Still not able to do p2p inferencing even if workers are online v2.25.0 (07655c0c2e0e5fe2bca86339a12237b69d258636)

server and workers envs

CONTEXT_SIZE: "512"
THREADS: "4"
MODELS_PATH: /models
LLAMACPP_PARALLEL: "999"
TOKEN: &p2ptoken "xxx"
P2P_TOKEN: *p2ptoken
LOCALAI_P2P_TOKEN: *p2ptoken
LOCALAI_P2P_LISTEN_MADDRS: /ip4/0.0.0.0/tcp/8888

Server args: run --p2p

Worker args: worker p2p-llama-cpp-rpc '--llama-cpp-args=-H 0.0.0.0 -p 8082 -m 4096'

Container ports 8082 and 8888 are open using ports.containerPort

k8s based setup

attaching logs below for server and one of the worker out of 2

local-ai-worker.log local-ai-server.log

@pratikbin can you share the server logs with debug enabled? DEBUG=true. It looks like some versions incompatibilities.

Also, could you try with master images to double check? that'd be helpful. Thank you!

pratikbin · 2025-02-14T08:51:42Z

There you go. let me know if you need anything else

localai-server.log
localai-worker-1.log

pcfreak30 · 2025-02-17T00:43:13Z

I cannot say if I have the EXACT same issue. I haven't debugged the libp2p/edgevpn code yet.

But P2P doesn't work.

HOWEVER, if I run local-ai worker llama-cpp-rpc --llama-cpp-args=" -p 8080" on my worker node and do LLAMACPP_GRPC_SERVERS="192.168.1.236:8080" DEBUG=true local-ai run falcon3-1b-instruct

It DOES work. I run a small 1B model to just verify that something runs as a baseline for this functionality.

I can manage with this, though the P2P stuff is really helpful in auto discovery.

the host local ai instance also detects peers fine, the GRPC just never seems to get the LLAMACPP_GRPC_SERVERS updated per the logs.

Oh and an obvious thing I notice is the defaulting to binding to 127.0.0.1 but IDK if the VPN somehow bypasses the loopback limitation somehow.

mudler · 2025-02-17T14:44:31Z

There you go. let me know if you need anything else

localai-server.log localai-worker-1.log

mmh ok that looks weird: what's the environment? it looks like they can auto-discover correctly, but it exhausts somehow resource limits. Typically that is set by looking at the system env.

Did you also try to bump the UDP buffer sizes? https://github.com/quic-go/quic-go/wiki/UDP-Buffer-Sizes#non-bsd

pratikbin · 2025-02-25T08:17:58Z

I cannot say if I have the EXACT same issue. I haven't debugged the libp2p/edgevpn code yet.

But P2P doesn't work.

HOWEVER, if I run local-ai worker llama-cpp-rpc --llama-cpp-args=" -p 8080" on my worker node and do LLAMACPP_GRPC_SERVERS="192.168.1.236:8080" DEBUG=true local-ai run falcon3-1b-instruct

It DOES work. I run a small 1B model to just verify that something runs as a baseline for this functionality.

I can manage with this, though the P2P stuff is really helpful in auto discovery.

the host local ai instance also detects peers fine, the GRPC just never seems to get the LLAMACPP_GRPC_SERVERS updated per the logs.

Oh and an obvious thing I notice is the defaulting to binding to 127.0.0.1 but IDK if the VPN somehow bypasses the loopback limitation somehow.

shall try this one

j4ys0n added bug Something isn't working unconfirmed labels Oct 26, 2024

This was referenced Feb 13, 2025

p2p inferencing not work #4214

Closed

fix(p2p): parse correctly ExtraLLamaCPPArgs #4220

Merged

mudler mentioned this issue Feb 17, 2025

chore(deps): Bump edgevpn to v0.30.1 #4840

Merged

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inferencing not working with P2P in latest version. #3968

Inferencing not working with P2P in latest version. #3968

j4ys0n commented Oct 26, 2024

j4ys0n commented Oct 26, 2024

JackBekket commented Nov 16, 2024

mintyleaf commented Nov 28, 2024

vpereira commented Dec 4, 2024 •

edited

Loading

pratikbin commented Feb 11, 2025

mudler commented Feb 13, 2025

pratikbin commented Feb 14, 2025

pcfreak30 commented Feb 17, 2025 •

edited

Loading

mudler commented Feb 17, 2025

pratikbin commented Feb 25, 2025

Inferencing not working with P2P in latest version. #3968

Inferencing not working with P2P in latest version. #3968

Comments

j4ys0n commented Oct 26, 2024

j4ys0n commented Oct 26, 2024

JackBekket commented Nov 16, 2024

mintyleaf commented Nov 28, 2024

vpereira commented Dec 4, 2024 • edited Loading

pratikbin commented Feb 11, 2025

mudler commented Feb 13, 2025

pratikbin commented Feb 14, 2025

pcfreak30 commented Feb 17, 2025 • edited Loading

mudler commented Feb 17, 2025

pratikbin commented Feb 25, 2025

vpereira commented Dec 4, 2024 •

edited

Loading

pcfreak30 commented Feb 17, 2025 •

edited

Loading