Depthwise 2D convolution #1152

Open: Acly wants to merge 1 commit into master

Acly commented on Mar 20, 2025

This PR adds kernels for depthwise 2D convolution (CPU only for now).

There is an existing ggml_conv_2d_dw based on im2col + mul_mat, but it has high overhead. That approach makes sense for regular conv2d, which can benefit from a fast GEMM, but depthwise convolution is much simpler, and I think im2col will always slow it down.
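To illustrate why a direct kernel can skip im2col, here is a minimal reference sketch of a depthwise convolution in planar (WHCN) layout. It is not the kernel from this PR: the function name `depthwise_conv2d_ref` and its parameters are hypothetical, and it assumes stride 1, zero padding, no dilation, and a single image. Each channel is filtered with its own small kernel, so there is no reduction across channels for a matrix multiply to exploit.

```c
#include <assert.h>
#include <stddef.h>

/* Reference depthwise 2D convolution (cross-correlation, as in most
 * deep-learning frameworks). Stride 1, zero padding `pad`, no dilation.
 * Planar layout: src[c][y][x], knl[c][ky][kx], dst[c][oy][ox].
 * Hypothetical helper for illustration, not part of ggml. */
static void depthwise_conv2d_ref(
        const float *src, const float *knl, float *dst,
        int w, int h, int channels, int kw, int kh, int pad) {
    const int out_w = w + 2 * pad - kw + 1;
    const int out_h = h + 2 * pad - kh + 1;
    assert(out_w > 0 && out_h > 0);

    for (int c = 0; c < channels; c++) {
        /* Every channel has its own input plane, kernel and output plane. */
        const float *s = src + (size_t)c * w * h;
        const float *k = knl + (size_t)c * kw * kh;
        float *d = dst + (size_t)c * out_w * out_h;

        for (int oy = 0; oy < out_h; oy++) {
            for (int ox = 0; ox < out_w; ox++) {
                float sum = 0.0f;
                for (int ky = 0; ky < kh; ky++) {
                    const int iy = oy + ky - pad;
                    if (iy < 0 || iy >= h) continue; /* zero padding */
                    for (int kx = 0; kx < kw; kx++) {
                        const int ix = ox + kx - pad;
                        if (ix < 0 || ix >= w) continue; /* zero padding */
                        sum += s[iy * w + ix] * k[ky * kw + kx];
                    }
                }
                d[oy * out_w + ox] = sum;
            }
        }
    }
}
```

The work per output element is only `kw * kh` multiply-adds, so materializing an im2col buffer first mostly adds memory traffic.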

Timings (W=256, H=256, C=256)

| Method | Layout | Time |
| --- | --- | --- |
| ggml_conv_2d_dw | WHCN | 320 ± 25 ms |
| ggml_depthwise_conv_2d | WHCN | 25 ± 5 ms |
| ggml_depthwise_conv_2d | CWHN | 8 ± 0.5 ms |

Timings (W=1024, H=1024, C=3)

| Method | Layout | Time |
| --- | --- | --- |
| ggml_conv_2d_dw | WHCN | 54.6 ± 5 ms |
| ggml_depthwise_conv_2d | WHCN | 8.4 ± 2 ms |
| ggml_depthwise_conv_2d | CWHN | 5.2 ± 1 ms |

I didn't replace ggml_conv_2d_dw because it supports more backends (and dilation).

Memory layout

Making channels/depth the most contiguous dimension in memory (CWHN) allows for better vectorization. It also improves memory access for im2col in regular 2D convolutions, and can avoid many costly ggml_cont(ggml_permute(...)) calls. Since the API default for 2D ops seems to be spatial dimensions first (WHCN), that is kept in place, and the opportunity to use the channel-contiguous kernel is detected from the tensor strides. This could also be made more explicit.
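The stride-based detection and the vectorization benefit can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: `view4d`, `channels_are_contiguous` and `depthwise_conv2d_cwhn` are hypothetical names, strides are counted in elements (ggml itself stores byte strides), and the kernel assumes stride 1, no padding, and a single image.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tensor view. Logical dims are always (W, H, C, N);
 * only the strides reveal the actual memory order. */
typedef struct {
    const float *data;
    int    ne[4];     /* extent per logical dim: W, H, C, N */
    size_t stride[4]; /* element stride per logical dim */
} view4d;

/* True if channels are the innermost (unit-stride) dimension, i.e. the
 * memory order is C, W, H, N while the logical order stays W, H, C, N. */
static bool channels_are_contiguous(const view4d *t) {
    return t->stride[2] == 1
        && t->stride[0] == (size_t)t->ne[2]
        && t->stride[1] == (size_t)t->ne[2] * (size_t)t->ne[0];
}

/* Depthwise convolution for the channel-contiguous case.
 * knl is laid out as knl[ky][kx][c], dst as dst[oy][ox][c]. */
static void depthwise_conv2d_cwhn(const view4d *src, const float *knl,
                                  float *dst, int kw, int kh) {
    assert(channels_are_contiguous(src));
    const int c = src->ne[2];
    const int out_w = src->ne[0] - kw + 1;
    const int out_h = src->ne[1] - kh + 1;

    for (int oy = 0; oy < out_h; oy++) {
        for (int ox = 0; ox < out_w; ox++) {
            float *d = dst + ((size_t)oy * out_w + ox) * c;
            for (int ic = 0; ic < c; ic++) d[ic] = 0.0f;

            for (int ky = 0; ky < kh; ky++) {
                for (int kx = 0; kx < kw; kx++) {
                    const float *s = src->data
                        + (size_t)(oy + ky) * src->stride[1]
                        + (size_t)(ox + kx) * src->stride[0];
                    const float *k = knl + ((size_t)ky * kw + kx) * c;
                    /* Unit-stride loop over channels: src, knl and dst are
                     * all read or written as adjacent floats, which is the
                     * pattern SIMD units handle best. */
                    for (int ic = 0; ic < c; ic++) d[ic] += s[ic] * k[ic];
                }
            }
        }
    }
}
```

With this layout the innermost loop needs no gather or horizontal reduction, whereas in the planar layout the innermost loop runs over the small kernel window instead.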

Background

I've implemented MobileSAM (a fast SAM variant with TinyViT as the image encoder) here. Runtime was initially ~2.1 s, with depthwise convolution taking a sizeable share. After changing the memory layout and optimizing conv2d, it now runs in 570 ms (PyTorch: 608 ms, ONNX: 549 ms).

Ryzen 5 5600X (6 cores, AVX2), Windows, OpenBLAS
