Releases: ggml-org/llama.cpp

b4958

25 Mar 18:35
ef19c71
run: de-duplicate fmt and format functions and optimize (#11596)

b4957

25 Mar 11:57
053b3f9
ggml-cpu : update KleidiAI to v1.5.0 (#12568)

ggml-cpu : bug fix related to KleidiAI LHS packing

Signed-off-by: Dan Johansson <[email protected]>

b4956

25 Mar 11:34
e2f5601
SYCL: disable Q4_0 reorder optimization (#12560)

b4953

25 Mar 08:01
2d77d88
context : fix worst-case reserve outputs (#12545)

b4951

24 Mar 17:03
2b65ae3
opencl: simplify kernel embedding logic in cmakefile (#12503)

Co-authored-by: Max Krasnyansky <[email protected]>

b4948

24 Mar 12:15
00d5380
llama-vocab : add SuperBPE pre-tokenizer (#12532)

b4947

24 Mar 11:43
7ea7503
CUDA: Fix clang warnings (#12540)

Signed-off-by: Xiaodong Ye <[email protected]>

b4946

24 Mar 11:37
c54f6b7
mmap : skip resource limit checks on AIX (#12541)

b4945

24 Mar 07:38
9b169a4
vulkan: fix mul_mat_vec failure in backend tests (#12529)

The OOB calculation could be wrong if the last iteration fell during one of
the unrolled loops. Adjust the unroll counts to avoid this, and add a couple
of new backend tests that hit this failure on NVIDIA GPUs.
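For illustration, here is a minimal C++ sketch of the failure mode and the fix described above: a 4-way unrolled loop whose full-width step could read past the end of the data on the last iteration, and the adjusted bounds that keep every read in range. This is an assumption-level analogy to the shader change, not the actual Vulkan kernel code.

#include <cstddef>
#include <vector>

// Sketch only (not the mul_mat_vec shader): sum `x` with a 4-way
// unrolled main loop. If the unrolled body ran whenever i < n, its
// reads of x[i+1..i+3] could go out of bounds on the last iteration.
// The fix mirrored here: only take a full unrolled step when all four
// reads are in bounds, and finish the remainder one element at a time.
float sum_unrolled(const std::vector<float> & x) {
    const size_t n = x.size();
    float acc = 0.0f;
    size_t i = 0;
    // main loop: guaranteed in-bounds 4-wide steps
    for (; i + 4 <= n; i += 4) {
        acc += x[i] + x[i + 1] + x[i + 2] + x[i + 3];
    }
    // tail loop: the remaining 0..3 elements, never reading past n
    for (; i < n; ++i) {
        acc += x[i];
    }
    return acc;
}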

b4944

23 Mar 19:17
77f9c6b
server : add verbose output to OAI-compatible chat endpoint (#12246)

Add verbose output to server_task_result_cmpl_final::to_json_oaicompat_chat_stream, making it conform with server_task_result_cmpl_final::to_json_oaicompat_chat as well as the other to_json methods.
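As a rough sketch of what conforming the streaming path to the other to_json methods could look like (using nlohmann::json, which the server already depends on; the "__verbose" field name and the function shape here are illustrative assumptions, not the server's actual API):

#include <nlohmann/json.hpp>

using json = nlohmann::json;

// Hypothetical sketch: attach verbose debug output to a streaming
// OAI-compatible chunk the same way a non-streaming response would.
// "__verbose" and this signature are assumptions for illustration.
static json chat_stream_chunk_to_json(const json & delta, bool verbose, const json & internal_result) {
    json res = {
        {"object",  "chat.completion.chunk"},
        {"choices", json::array({ json{{"delta", delta}} })},
    };
    if (verbose) {
        // extra, non-OAI field carrying the full internal task result
        res["__verbose"] = internal_result;
    }
    return res;
}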