This PR adds kernels for depthwise 2D convolution (CPU only for now).

There is an existing `ggml_conv_2d_dw` based on im2col + mul_mat, but it has high overhead. That approach makes sense for regular conv2d, since it can profit from fast GEMM, but depthwise convolution is much simpler, and I think im2col will always slow it down.
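To illustrate why a direct kernel can win here: depthwise convolution applies one small filter per channel, so it maps to a few tight loops with no intermediate im2col buffer at all. A minimal sketch in plain C (illustrative only, not the PR's kernel; the [C][H][W] layout and parameter names are assumptions):

```c
// Naive direct depthwise conv2d: one filter per channel, no im2col buffer.
// Assumed layout: src is [C][H][W] with W contiguous, kernel is [C][KH][KW].
void depthwise_conv2d(
    float *dst, const float *src, const float *kernel,
    int w, int h, int c,   // input width, height, channels
    int kw, int kh,        // filter width, height
    int stride, int pad)
{
    int dw = (w + 2 * pad - kw) / stride + 1;  // output width
    int dh = (h + 2 * pad - kh) / stride + 1;  // output height

    for (int ic = 0; ic < c; ic++) {           // each channel has its own filter
        for (int oy = 0; oy < dh; oy++) {
            for (int ox = 0; ox < dw; ox++) {
                float sum = 0.0f;
                for (int ky = 0; ky < kh; ky++) {
                    int iy = oy * stride + ky - pad;
                    if (iy < 0 || iy >= h) continue;  // zero padding
                    for (int kx = 0; kx < kw; kx++) {
                        int ix = ox * stride + kx - pad;
                        if (ix < 0 || ix >= w) continue;
                        sum += src[(ic * h + iy) * w + ix]
                             * kernel[(ic * kh + ky) * kw + kx];
                    }
                }
                dst[(ic * dh + oy) * dw + ox] = sum;
            }
        }
    }
}
```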
**Timings (W=256, H=256, C=256)**

- `ggml_conv_2d_dw`
- `ggml_depthwise_conv_2d` (WHCN)
- `ggml_depthwise_conv_2d` (CWHN)

**Timings (W=1024, H=1024, C=3)**

- `ggml_conv_2d_dw`
- `ggml_depthwise_conv_2d` (WHCN)
- `ggml_depthwise_conv_2d` (CWHN)
I didn't replace `ggml_conv_2d_dw` because it supports more backends (and dilation).

**Memory layout**
Having channels/depth most contiguous in memory allows for better vectorization. It also improves memory access for im2col in regular 2D convolutions, and can avoid many costly `ggml_cont(ggml_permute(...))` calls. Since spatial-dimensions-first seems to be the default for 2D ops in the API, that default is kept in place, and the opportunity to use the channels-first kernel is detected from the tensor strides (a sketch of that check follows). This could also be made more explicit.
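For illustration, here is roughly what such a stride check could look like, assuming ggml's `ne`/`nb` (sizes / byte strides) convention where a conv2d input is logically [W, H, C, N]; the exact condition in this PR may differ:

```c
#include <stdbool.h>
#include "ggml.h"

// Sketch: detect a channels-first (CWHN) view from strides alone.
// If the input is a permuted view of CWHN data, the channel dimension
// is the contiguous one: its byte stride nb[2] equals the element size,
// while the width stride nb[0] does not. Illustrative condition only.
static bool is_channels_first(const struct ggml_tensor *src) {
    return src->nb[2] == ggml_type_size(src->type)   // channels contiguous
        && src->nb[0] >  src->nb[2];                 // width is not
}
```

With the channel dimension contiguous, the innermost loop runs over consecutive floats, which is what lets the channels-first kernel vectorize well.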
**Background**

I've implemented MobileSAM (a fast SAM variant with TinyViT as the image encoder) here. Runtime was initially ~2.1 s, with depthwise convolution eating a sizeable chunk. After changing the memory layout and optimizing conv2d, it now runs in 570 ms (PyTorch: 608 ms, ONNX: 549 ms).
Ryzen 5 5600X (6 cores, AVX2), Windows, OpenBLAS.