cpu: de-duplicate some of the operators and refactor #1144

cmdr2 · 2025-03-13T04:54:51Z

Note: This PR only deletes lines from ggml-cpu.c; it does not modify any functions in it. I'm not sure why git diff shows the deletions as changes. A standard diff shows the correct diff of ggml-cpu.c: https://gist.github.com/cmdr2/a76df5af311417619788e8330b1908b3

This PR de-duplicates some of the easily-templatized functions in ggml-cpu.c. It takes inspiration from binbcast.cu and common.cuh.

Binary: add, sub, mul, div
Unary: abs, sgn, neg, step, tanh, elu, relu, sigmoid, hardsigmoid, exp, hardswish, sqr, sqrt, sin, cos, log

This removes the op implementation functions from ggml-cpu.c (around 2000 lines). As a side effect, all of these functions now support bf16 as well as non-contiguous src1.
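To illustrate the kind of de-duplication described above, here is a minimal sketch, not the PR's actual code. It assumes a single templated loop that takes the element-wise operator as a compile-time parameter and converts each element through float. The names `op_add`, `op_sub`, `op_mul`, `op_div`, and `apply_binary_op` are hypothetical, and the broadcast of src1 is simplified to one dimension.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical element-wise kernels: one tiny function per operator.
inline float op_add(float a, float b) { return a + b; }
inline float op_sub(float a, float b) { return a - b; }
inline float op_mul(float a, float b) { return a * b; }
inline float op_div(float a, float b) { return a / b; }

// One shared loop, instantiated per operator and per type combination.
// Each element is converted to float, combined, and converted back, so a new
// element type only needs a conversion to and from float.
// src1 is broadcast by repeating it every ne1 elements (simplified to 1-D here).
template <float (*op)(float, float), typename src0_t, typename src1_t, typename dst_t>
void apply_binary_op(const src0_t * src0, const src1_t * src1, dst_t * dst,
                     size_t ne0, size_t ne1) {
    assert(ne1 > 0 && ne0 % ne1 == 0); // src1 must tile src0 evenly
    for (size_t i = 0; i < ne0; ++i) {
        const float a = static_cast<float>(src0[i]);
        const float b = static_cast<float>(src1[i % ne1]);
        dst[i] = static_cast<dst_t>(op(a, b));
    }
}
```

With this shape, a call such as `apply_binary_op<op_add>(a, b, out, 4, 2)` picks the operator at compile time and deduces the element types from the pointers, so no per-operator copy of the loop is needed.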

The performance is the same as the current implementation. It also passes all the runners on ggml-ci, which tested non-contiguous inputs (in SAM) and vDSP on Mac.

src/ggml-cpu/binary_ops.cpp

src/ggml-cpu/binary_ops.h

ggerganov · 2025-03-13T13:45:19Z

src/ggml-cpu/unary_ops.cpp

+    GGML_ASSERT( nb0 == sizeof(dst_t));
+    GGML_ASSERT(nb00 == sizeof(src0_t));
+
+    const auto [ir0, ir1] = get_thread_range(params, src0);


Hopefully all C++ compilers support structured bindings today. If we encounter issues, we might have to return a simple struct of integers instead.
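For context, a minimal sketch of the two options mentioned here. It assumes the range helper splits rows evenly across threads. `thread_params`, `thread_row_range`, `row_range`, and `thread_row_range_struct` are hypothetical names for illustration, and the signature differs from the `get_thread_range` in the diff, which takes the source tensor.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical stand-in for the per-thread compute parameters.
struct thread_params {
    int ith; // index of this thread
    int nth; // total number of threads
};

// Split nr rows into contiguous chunks, one per thread.
// Returns the half-open range [first, last) owned by thread params.ith.
inline std::pair<int64_t, int64_t> thread_row_range(const thread_params & params, int64_t nr) {
    assert(params.nth > 0 && params.ith >= 0 && params.ith < params.nth);
    const int64_t dr  = (nr + params.nth - 1) / params.nth; // rows per thread, rounded up
    const int64_t ir0 = std::min(dr * params.ith, nr);
    const int64_t ir1 = std::min(ir0 + dr, nr);
    return {ir0, ir1};
}

// Fallback that does not rely on structured bindings: a plain struct of integers.
struct row_range {
    int64_t first;
    int64_t last;
};

inline row_range thread_row_range_struct(const thread_params & params, int64_t nr) {
    const std::pair<int64_t, int64_t> r = thread_row_range(params, nr);
    return {r.first, r.second};
}
```

With a C++17 compiler the caller can write `const auto [ir0, ir1] = thread_row_range(params, nr);`. With the struct variant the caller reads `r.first` and `r.last`, which also compiles under older standards.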

ggerganov · 2025-03-13T14:16:36Z

Wait for @slaren's review before merging. Thanks!

cmdr2 · 2025-03-13T14:21:07Z

> Wait for slaren's review before merging. Thanks!

@ggerganov Yes, will do :)

Thanks for the comments, fixed them in the latest commit. The CI runner also passes.

slaren · 2025-03-13T19:14:32Z

I will review when I have a chance, but I may not be able to do it for a few days.

cmdr2 · 2025-03-14T06:29:55Z

Thanks @slaren. No hurry :)

@slaren @ggerganov I made some progress on the next PR (a bigger refactor). It's working after the refactor, and passes the runners on ggml-ci. ggml-cpu.c is at about 3500 lines after this refactoring (down from ~15,000 lines).

I thought I'd share the approach early, in case you see any obvious red flags.

I'll split it into two PRs:

1. Move the SIMD mapping lines into a separate header file, and the ggml_vec_ function lines into a separate header file. The vec file imports the SIMD mappings file. No other changes.
2. Move all the operator function lines (except mul_mat) into a separate C++ file. This file imports the vec functions header.

The operator functions file needed a very small number of cosmetic changes, like replacing direct references to type_traits_cpu with ggml_get_type_traits_cpu(), using static_cast<ggml_op_pool>(opts[0]) etc.
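As a rough illustration of why such changes are needed when C code moves into a C++ file, here is a minimal sketch. The enum, the table, and the accessor (`pool_kind`, `get_pool_kind`, `type_traits`, `get_type_traits`) are hypothetical stand-ins, not the real ggml definitions. C accepts an implicit int-to-enum conversion, while C++ requires an explicit cast. A table private to one translation unit has to be reached through an accessor function from another file.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical enum standing in for an operator option stored as an integer.
enum pool_kind { POOL_KIND_MAX = 0, POOL_KIND_AVG = 1 };

// Assumption for this sketch: op parameters are stored as raw 32-bit integers.
inline pool_kind get_pool_kind(const int32_t * op_params) {
    // C accepts `enum pool_kind k = op_params[0];` without a cast.
    // C++ rejects the implicit int-to-enum conversion, so the cast is explicit.
    return static_cast<pool_kind>(op_params[0]);
}

// Hypothetical per-type traits record.
struct type_traits {
    int64_t block_size;
};

// Accessor in place of a direct reference to the table: other source files
// call this function and never need to see the array itself.
inline const type_traits * get_type_traits(int type) {
    static const type_traits table[] = { {1}, {32} };
    assert(type >= 0 && type < 2);
    return &table[type];
}
```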

At a broad level, does this approach raise any obvious red flags for you? It seems to work on all the ggml-ci runners, and doesn't seem to skip any CPU extension. But I'll test and read the diff a few more times before submitting the PRs.

Thanks

cpu: de-duplicate some of the operators and refactor

52ccfb3

cmdr2 mentioned this pull request Mar 13, 2025

ggml : refactor ggml-cpu.c into multiple C++ source files ggml-org/llama.cpp#10180

Open

cmdr2 requested review from slaren and ggerganov March 13, 2025 04:58

ggerganov reviewed Mar 13, 2025

src/ggml-cpu/binary_ops.cpp (outdated)

src/ggml-cpu/binary_ops.cpp (outdated)

src/ggml-cpu/binary_ops.h (outdated)

ggerganov reviewed Mar 13, 2025

ggerganov approved these changes Mar 13, 2025

Fix PR comments

2a32266
