Maximum possible parallelization design (in circuit logic) #1287

jurevreca12 · 2025-02-26T11:42:19Z

jurevreca12
Feb 26, 2025

Hey,
I am comparing chisel4ml-, hls4ml-, and FINN-based designs at the maximum possible parallelization level, i.e., with the neural network implemented "in circuit". To do this, I am trying to set the parameters of the various FINN IPs to their maximum values using a custom transformation. For now I am using only HLS IPs to implement fully-connected, convolutional, and max-pooling layers. The transformation is shown below:

For the MVAU_hls IP, I believe I have set the required attributes correctly; however, I am not sure about ConvolutionInputGenerator_hls and VVAU_hls. ConvolutionInputGenerator has a SIMD setting, and I am not sure what it refers to. Also, what is the parallel_window parameter? For VVAU_hls, I am guessing that SIMD should be set to the kernel size times the number of input channels, and that PE should be set to the number of outputs (i.e. out_size[0]*out_size[1]*out_ch)?

Additionally, I am wondering about the differences between Pool, streamingmaxpool, and globalaccpool_hls, and how to configure each of them properly. Which one should I use for maximum performance on max-pooling layers?

The code of the experiments is available at: https://github.com/jurevreca12/c4ml_test_runs/blob/phd-experiments/test_finn.py

More generally, is there some paper/documentation where the convolutional and maximum pooling IPs are explained?

Thank you for your help and best regards.


fpjentzsch · 2025-02-26T13:42:56Z

fpjentzsch
Feb 26, 2025
Collaborator

Hi,

Folding of MVAU, VVAU, and ConvInpGen is described in the documentation: https://finn-dev.readthedocs.io/en/latest/internals.html#constraints-to-folding-factors-per-layer

For the MaxPool you can use Pool (set PE to #Channels) or StreamingMaxPool (set PE to #Channels, or don't set it at all, because the 2D case always uses the maximum PE anyway (see here)).

8 replies

jurevreca12 Feb 26, 2025
Author

So the table of constraints can be interpreted as max settings?
In other words:
For MVAU: MH % PE == 0 and MW % SIMD == 0, so the maximum is PE = MH and SIMD = MW?
For VVAU: (k_h * k_w) % SIMD == 0 and channels % PE == 0, so the maximum is SIMD = k_h * k_w and PE = channels?
For Pool: inp_channels % PE == 0, so the maximum is PE = inp_channels?
For ConvolutionInputGenerator: inp_channels % SIMD == 0, so the maximum is SIMD = inp_channels?

That is, are those the maximum possible parallelization values? Or are there any other parameters that could be set to parallelize the computation further?
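
The interpretation being asked about can be written down as a small sketch. It is plain Python and does not call FINN; the function name `max_folding` and the dictionary keys (`MH`, `MW`, `Kernel`, `Channels`, `IFMChannels`) are assumptions chosen for illustration, and whether these values really are the maxima is the question posed in this post.

```python
def max_folding(op_type, attrs):
    """Largest folding factors permitted by the divisibility constraints.

    `attrs` is a plain dict with hypothetical keys; it is not a FINN node.
    """
    if op_type == "MVAU":
        # MH % PE == 0 and MW % SIMD == 0  ->  PE = MH, SIMD = MW
        return {"PE": attrs["MH"], "SIMD": attrs["MW"]}
    if op_type == "VVAU":
        # (k_h * k_w) % SIMD == 0 and channels % PE == 0
        k_h, k_w = attrs["Kernel"]
        return {"PE": attrs["Channels"], "SIMD": k_h * k_w}
    if op_type == "Pool":
        # channels % PE == 0  ->  PE = channels
        return {"PE": attrs["Channels"]}
    if op_type == "ConvolutionInputGenerator":
        # channels % SIMD == 0  ->  SIMD = channels
        return {"SIMD": attrs["IFMChannels"]}
    raise ValueError(f"unsupported op_type: {op_type}")


# Example: a 3x3 depthwise layer with 32 channels
print(max_folding("VVAU", {"Kernel": (3, 3), "Channels": 32}))
```

Each returned value trivially satisfies its own constraint, since any positive integer divides itself.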

jurevreca12 Feb 26, 2025
Author

Also, I watched a video titled "Tutorial: FINN HLS Library - Hardware Scheduling and Folding", which suggests that for convolutional layers (I am guessing VVAU) SIMD should be set to k_h * k_w * IFMChannels.

fpjentzsch Feb 26, 2025
Collaborator

Yes, at the bottom is also a more detailed description: https://finn-dev.readthedocs.io/en/latest/internals.html#folding
So, for the ConvInpGen you want parallel_window=1 and SIMD=C.

Parallelizing further would require parallelization across the spatial dimensions (H, W), which FINN does not support yet.
Some time ago I worked on a PR (#789) to add this degree of parallelization (controlled via a new folding parameter "M" or "MMV"), but it is currently outdated and I don't know when or if I'll find the time to rework it.
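
To illustrate why the spatial dimensions become the remaining bottleneck, here is a rough sketch under a simplified model: it assumes an MVAU needs about (MH / PE) * (MW / SIMD) cycles per output pixel and handles output pixels one after another. The function name `mvau_cycles_estimate` is hypothetical, and the model is an assumption for illustration, not FINN's own cycle estimate.

```python
def mvau_cycles_estimate(mh, mw, pe, simd, out_h, out_w):
    """Rough cycle count for one MVAU under a simplified folding model.

    Assumes (mh / pe) * (mw / simd) cycles per output pixel and
    sequential processing of the out_h * out_w output pixels.
    """
    if mh % pe != 0 or mw % simd != 0:
        raise ValueError("PE must divide MH and SIMD must divide MW")
    fold = (mh // pe) * (mw // simd)
    return fold * out_h * out_w


# Fully unfolded (PE = MH, SIMD = MW): one cycle per output pixel remains.
print(mvau_cycles_estimate(mh=64, mw=576, pe=64, simd=576, out_h=30, out_w=30))
# Partially folded, for comparison.
print(mvau_cycles_estimate(mh=64, mw=576, pe=16, simd=32, out_h=30, out_w=30))
```

Under this model, even with PE and SIMD at their maxima the count cannot drop below out_h * out_w, which is the part a spatial folding parameter would address.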

Answer selected by jurevreca12

fpjentzsch Feb 26, 2025
Collaborator

That video applies to MVAUs. There, MW = k_h * k_w * IFMChannels and MH = OFMChannels.
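
As a minimal sketch of that mapping, the helper below (the name `conv_to_mvau_dims` is hypothetical and not part of FINN) computes the matrix dimensions of the MVAU that implements a convolution:

```python
def conv_to_mvau_dims(k_h, k_w, ifm_channels, ofm_channels):
    """Matrix dimensions of the MVAU implementing a convolution.

    MW (matrix width)  = k_h * k_w * IFMChannels
    MH (matrix height) = OFMChannels
    """
    mw = k_h * k_w * ifm_channels
    mh = ofm_channels
    return mw, mh


# Example: 3x3 convolution, 64 input channels, 128 output channels
mw, mh = conv_to_mvau_dims(3, 3, 64, 128)
print(f"MW={mw}, MH={mh}")  # maximum folding would then be SIMD=MW, PE=MH
```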

jurevreca12 Feb 26, 2025
Author

Thanks for the help :-)

jurevreca12 Feb 27, 2025
Author

It seems that VVAU_hls currently limits SIMD parallelism to 1:
https://github.com/Xilinx/finn-hlslib/blob/master/vvau.hpp#L92C1-L92C66
Just adding this here in case someone else runs into this.

fpjentzsch Feb 27, 2025
Collaborator

No, the finn-hlslib dev branch and the version pinned in FINN (you should use the FINN dev branch, by the way) have SIMD support: https://github.com/Xilinx/finn-hlslib/blob/dev/vvau.hpp
