Releases · ggml-org/llama.cpp
b4958
b4957
ggml-cpu : update KleidiAI to v1.5.0 (#12568)
ggml-cpu : bug fix related to KleidiAI LHS packing
Signed-off-by: Dan Johansson <[email protected]>
b4956
SYCL: disable Q4_0 reorder optimization (#12560) ggml-ci
b4953
context : fix worst-case reserve outputs (#12545) ggml-ci
b4951
opencl: simplify kernel embedding logic in cmakefile (#12503) Co-authored-by: Max Krasnyansky <[email protected]>
b4948
llama-vocab : add SuperBPE pre-tokenizer (#12532)
b4947
CUDA: Fix clang warnings (#12540) Signed-off-by: Xiaodong Ye <[email protected]>
b4946
mmap : skip resource limit checks on AIX (#12541)
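A hedged sketch of the general pattern this change suggests, not the actual llama.cpp code: the use of RLIMIT_MEMLOCK and the helper name below are assumptions for illustration only. The idea is to consult the POSIX resource limit before locking or mapping memory, except on AIX, where the check is skipped entirely.

```cpp
#include <cstddef>
#include <cstdio>
#if !defined(_AIX)
#include <sys/resource.h>
#endif

// Hypothetical illustration: query a resource limit before locking memory,
// but skip the query on AIX, where this check is not applied.
static bool memlock_limit_ok(size_t bytes) {
#if defined(_AIX)
    (void) bytes;
    return true; // resource limit check skipped on AIX
#else
    struct rlimit lim;
    if (getrlimit(RLIMIT_MEMLOCK, &lim) != 0) {
        return false;
    }
    return lim.rlim_cur == RLIM_INFINITY || bytes <= lim.rlim_cur;
#endif
}

int main() {
    printf("limit ok: %d\n", memlock_limit_ok(1 << 20));
}
```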
b4945
vulkan: fix mul_mat_vec failure in backend tests (#12529)
The OOB calculation could be wrong if the last iteration fell inside one of the unrolled loops. Adjust the unrolling counts to avoid this, and add a couple of new backend tests that hit this failure on NVIDIA GPUs. A plain C++ illustration of the failure mode follows below.
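The sketch below is a plain C++ illustration of this failure mode, not the actual Vulkan mul_mat_vec shader: when a loop is unrolled by a fixed factor, every access inside an unrolled group must stay in bounds, so the unrolled part should cover only full groups and a guarded scalar tail should handle the remainder. If the bounds check only guards the start of a group, a final partial iteration that lands inside the unrolled body reads past the end of the data.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Illustrative dot product unrolled by 4. The unrolled loop runs only over
// full groups of 4, so no access in the group can be out of bounds; the
// scalar tail handles the remaining 0-3 elements with a per-element check.
float dot_unrolled(const float * a, const float * b, size_t n) {
    float sum = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        sum += a[i + 0]*b[i + 0];
        sum += a[i + 1]*b[i + 1];
        sum += a[i + 2]*b[i + 2];
        sum += a[i + 3]*b[i + 3];
    }
    for (; i < n; ++i) {
        sum += a[i]*b[i];
    }
    return sum;
}

int main() {
    std::vector<float> a = {1, 2, 3, 4, 5, 6, 7};
    std::vector<float> b = {1, 1, 1, 1, 1, 1, 1};
    printf("%.1f\n", dot_unrolled(a.data(), b.data(), a.size())); // prints 28.0
}
```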
b4944
server : Add verbose output to OAI compatible chat endpoint (#12246)
Add verbose output to server_task_result_cmpl_final::to_json_oaicompat_chat_stream, making it conform with server_task_result_cmpl_final::to_json_oaicompat_chat, as well as the other to_json methods.
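As a rough illustration of what conforming means here, the hypothetical C++ sketch below (the function names, signatures, and the "__verbose" field are assumptions, not the server's actual code) shows a streaming and a non-streaming serializer routing verbose diagnostics through one shared helper, so both response shapes carry the same extra fields.

```cpp
#include <nlohmann/json.hpp>
#include <iostream>

using json = nlohmann::json;

// Shared helper: both serializers attach the same verbose diagnostics,
// so streaming and non-streaming responses stay consistent.
static void append_verbose(json & res, bool verbose, const json & diagnostics) {
    if (verbose) {
        res["__verbose"] = diagnostics; // field name is illustrative only
    }
}

static json to_json_chat(bool verbose, const json & diagnostics) {
    json res = {{"object", "chat.completion"}, {"choices", json::array()}};
    append_verbose(res, verbose, diagnostics);
    return res;
}

static json to_json_chat_stream(bool verbose, const json & diagnostics) {
    json res = {{"object", "chat.completion.chunk"}, {"choices", json::array()}};
    append_verbose(res, verbose, diagnostics); // now matches the non-streaming path
    return res;
}

int main() {
    const json diag = {{"prompt_tokens", 12}};
    std::cout << to_json_chat_stream(true, diag).dump(2) << std::endl;
}
```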