Bug: Load time on rpc server with multiple machines #9820
Labels
bug-unconfirmed
medium severity
stale
What happened?
I have managed to run the RPC server on 2 different machines running Ubuntu (with different IPs) with the following commands:

1st machine:
```shell
bin/rpc-server -H MY_PUBLIC_IP -p 50052
```
2nd machine:
```shell
bin/llama-cli -m ../tinydolphin-2.8.2-1.1b-laser.Q4_K_M.gguf -p "Hello, my name is" --repeat-penalty 1.0 -n 6 --rpc MY_PUBLIC_IP:50052 -ngl 99
```
I have noticed that the load time is huge compared to running the model locally with the RPC server, where it is only 600 ms:
```
llama_perf_sampler_print: sampling time  =     0,14 ms / 12 runs  (  0,01 ms per token, 82758,62 tokens per second)
llama_perf_context_print: load time      = 55658,27 ms
llama_perf_context_print: prompt eval time =  426,00 ms /  6 tokens ( 71,00 ms per token, 14,08 tokens per second)
llama_perf_context_print: eval time      =   997,43 ms /  5 runs   (199,49 ms per token,  5,01 tokens per second)
llama_perf_context_print: total time     =  1424,04 ms / 11 tokens
```
My question is: what exactly happens during the load time?
If the model already exists on all machines, is there a way to load it locally on each machine instead of sending it over the network?
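For what it's worth, the numbers are consistent with the load time being dominated by transferring the model weights over the network to the remote backend. A back-of-envelope check (the model size of ~0.67 GB for a 1.1B Q4_K_M file and a ~100 Mbit/s link are both assumptions, not measured values):

```python
# Rough estimate: if load time ≈ time to ship the weights to the rpc-server,
# then transfer time ≈ model size / link bandwidth.
# Assumed (hypothetical) numbers:
model_bytes = 0.67e9        # tinydolphin 1.1B Q4_K_M, roughly 0.67 GB
link_bits_per_s = 100e6     # a 100 Mbit/s network link

seconds = model_bytes * 8 / link_bits_per_s
print(f"estimated transfer time: {seconds:.1f} s")
```

Under these assumptions the estimate comes out around 54 s, the same order of magnitude as the ~55,6 s load time reported above, whereas a local load avoids the network entirely.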
Name and Version
version: 3789 (d39e267)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
What operating system are you seeing the problem on?
No response
Relevant log output
No response