Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cpu: de-duplicate some of the operators and refactor #1144

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

cmdr2
Copy link
Collaborator

@cmdr2 cmdr2 commented Mar 13, 2025

Note: This PR only deletes lines from ggml-cpu.c, it does not modify any functions in it. I'm not sure why git diff tries to combine them as changes. Standard diff shows the correct diff of ggml-cpu.c: https://gist.github.com/cmdr2/a76df5af311417619788e8330b1908b3

This PR de-duplicates some of the easily-templatized functions in ggml-cpu.c. It takes inspiration from binbcast.cu and common.cuh.

  • Binary: add, sub, mul, div
  • Unary: abs, sgn, neg, step, tanh, elu, relu, sigmoid, hardsigmoid, exp, hardswish, sqr, sqrt, sin, cos, log

This removes the op implementation functions from ggml-cpu.c (around 2000 lines). As a side-effect, all the functions now support bf16 as well as non-contiguous src1.

The performance is the same as the current implementation. It also passes all the runners on ggml-ci, which tested non-contiguous inputs (in SAM) and vDSP on Mac.

GGML_ASSERT( nb0 == sizeof(dst_t));
GGML_ASSERT(nb00 == sizeof(src0_t));

const auto [ir0, ir1] = get_thread_range(params, src0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully all C++ compilers support structured bindings today. If we encounter issues, we might have to return simple struct of integers.

@ggerganov
Copy link
Member

Wait for @slaren's review before merging. Thanks!

@cmdr2
Copy link
Collaborator Author

cmdr2 commented Mar 13, 2025

Wait for slaren's review before merging. Thanks!

@ggerganov Yes, will do :)

Thanks for the comments, fixed them in the latest commit. The CI runner also passes.

@slaren
Copy link
Member

slaren commented Mar 13, 2025

I will review when I have a chance, but I may not be able to do it for a few days.

@cmdr2
Copy link
Collaborator Author

cmdr2 commented Mar 14, 2025

Thanks @slaren No hurry :)

@slaren @ggerganov I made some progress on the next PR (a bigger refactor). It's working after the refactor, and passes the runners on ggml-ci. ggml-cpu.c is at about 3500 lines after this refactoring (down from ~15,000 lines).

Just thought of sharing the approach early, to see if you see any obvious red-flags.

I'll split it into two PRs:

  1. Move the SIMD Mapping lines into a separate header file, and the ggml_vec_ function lines into a separate header file. The vec file imports the SIMD mappings file. No other changes.
  2. Move all the operator function lines (except mul_mat) into a separate C++ file. This file imports the vec functions header.

The operator functions file needed a very small number of cosmetic changes, like replacing direct references to type_traits_cpu with ggml_get_type_traits_cpu(), using static_cast<ggml_op_pool>(opts[0]) etc.

At a broad level, does this approach raise any obvious red-flags for you? It seems to work on all the ggml-ci runners, and doesn't seem to skip any CPU extension. But I'll test and read the diff a few more times before submitting the PRs.

Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants