No OpenCL platforms reported #6951

perrymacmurray · 2021-05-17T15:01:09Z

Windows Build Number

21382.1

WSL Version

WSL 2
WSL 1

Kernel Version

5.10.16.3

Distro Version

Ubuntu 20.04

Other Software

Inside WSL:
clinfo (for checking OpenCL platforms)
CUDA 11.3 (the Docker container runs with NVIDIA_DISABLE_REQUIRE=1, as it otherwise thinks it is running CUDA 11.0)
Docker 20.10.6, build 370c289 (with custom container)
nvidia-docker2 2.5.0-1

On Windows:
NVIDIA Graphics Driver for CUDA on WSL 470.14

Repro Steps

I installed the NVIDIA drivers and Docker following NVIDIA's user guide.
I am, however, running an older version of nvidia-docker2 (and its dependencies), as suggested in a forum post here.

I have also installed the CUDA on WSL driver here.

Steps:
Run clinfo (both inside and outside the Docker container)

Expected Behavior

clinfo should report an OpenCL platform that exposes the graphics card (in my case, a GTX 970)

Actual Behavior

clinfo reports 0 platforms, both inside the container and directly in WSL
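The repro boils down to comparing clinfo's platform count inside and outside the container. A minimal Python sketch for scripting that comparison, assuming clinfo's default (non `-l`) output format as seen in the clinfo output later in this thread: a `Number of platforms  N` line followed by one indented `Platform Name` line per platform. The helpers `parse_clinfo_platforms` and `run_clinfo` are hypothetical names, not part of clinfo.

```python
import re
import subprocess
from typing import List, Optional, Tuple

# Summary line clinfo prints first, e.g. "Number of platforms    2"
_COUNT_RE = re.compile(r"^Number of platforms[ \t]+(\d+)[ \t]*$", re.MULTILINE)
# Indented "Platform Name" lines, e.g. "  Platform Name    Intel(R) OpenCL Graphics"
_NAME_RE = re.compile(r"^[ \t]+Platform Name[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def parse_clinfo_platforms(text: str) -> Tuple[int, List[str]]:
    """Return (platform count, platform names) from default clinfo output."""
    match = _COUNT_RE.search(text)
    if match is None:
        raise ValueError("no 'Number of platforms' line found in clinfo output")
    count = int(match.group(1))
    # clinfo repeats "Platform Name" at the start of each per-platform device
    # section, so only the first `count` matches belong to the platform summary.
    names = _NAME_RE.findall(text)[:count]
    return count, names


def run_clinfo() -> Optional[str]:
    """Run clinfo and return its stdout, or None if clinfo is not installed."""
    try:
        result = subprocess.run(["clinfo"], capture_output=True, text=True, check=False)
    except FileNotFoundError:
        return None
    return result.stdout
```

Calling `parse_clinfo_platforms` on the output of `run_clinfo()`, once inside the container and once directly in WSL, gives counts that a script can compare or assert on; in this report both would be 0.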

Diagnostic Logs

cuda
nvidia-container-cli
glxinfo (from inside of container)
glxinfo (from WSL, outside of container)
wsl.etl

chrisfranko · 2021-07-17T03:29:10Z

One day <3

TGM · 2021-12-06T16:00:49Z

Any update here?

wuweijia1994 · 2021-12-14T01:41:53Z

Checking for any update

richgel999 · 2021-12-19T02:26:36Z

Yes, OpenCL is a crucial feature. We're putting together a native Linux box for testing next week due to this.

bridgerrholt · 2021-12-19T04:23:29Z

This would be wonderful for my team. We have considered rewriting everything in CUDA, but that has major downsides. Until OpenCL support is released, we are stuck dual-booting.

lmeyerov · 2021-12-24T01:57:17Z

YEP

jincheng-ai · 2021-12-27T02:36:24Z

Hoping for any update

lmeyerov · 2021-12-27T05:31:21Z

In theory, OpenCL on WSL2 may now work with Intel integrated GPUs: https://devblogs.microsoft.com/commandline/oneapi-l0-openvino-and-opencl-coming-to-the-windows-subsystem-for-linux-for-intel-gpus/

When I tried a few days ago, I didn't see any CPU platform get registered (I have an AMD CPU), nor any GPU platform (I have an NVIDIA GPU).

clausagerskov · 2022-03-15T18:19:41Z

Any info on whether OpenCL support for NVIDIA GPU computing is planned?

Foosec · 2022-05-21T17:23:23Z

Any new info a year later?

richgel999 · 2022-05-21T17:36:03Z

Quoting @Foosec: "Any new info a year later?"

Better late than never, right?

HO-COOH · 2022-06-15T15:02:39Z

Still waiting

liubola · 2022-09-23T01:33:28Z

Still waiting

Eboubaker · 2022-11-28T13:13:41Z

Same issue when trying to run a Boost.Compute example program:

terminate called after throwing an instance of 'boost::wrapexcept<boost::compute::no_device_found>'
  what():  No OpenCL device found

husmen · 2023-02-05T16:32:32Z

I should have checked this before wasting a whole day trying to get it to work ... still waiting

73ad · 2023-02-13T16:13:38Z

Quoting @husmen: "I should have checked this before wasting a whole day trying to get it to work ... still waiting"

Same.

jorgevazquezperez · 2023-02-14T11:56:13Z

I found the solution by asking ChatGPT, as will probably be usual from now on. To set up OpenCL on WSL, you can follow these general steps:

1. Install a Linux distribution in WSL, such as Ubuntu, and make sure it is up to date.
2. Install the OpenCL driver for your GPU. You can download the appropriate driver from the GPU vendor's website and follow its installation instructions.
3. Install the OpenCL development package, which contains the libraries, headers, and tools needed to develop OpenCL applications. You can do this by running the following command in a terminal:

sudo apt-get install ocl-icd-opencl-dev

4. Install an OpenCL implementation, such as POCL (Portable Computing Language), an open-source OpenCL runtime that provides a CPU device. You can install POCL by running the following command:

sudo apt-get install pocl-opencl-icd

5. If needed, set the LD_LIBRARY_PATH environment variable to point to the OpenCL libraries by adding the following line to your ~/.bashrc file (on Ubuntu x86_64, /usr/lib/x86_64-linux-gnu is already on the default library search path):

export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu/:$LD_LIBRARY_PATH

After completing these steps, you should be able to use OpenCL on WSL. Note that the specific steps and packages required may vary depending on the Linux distribution, GPU hardware, and OpenCL implementation you are using.

lmeyerov · 2023-02-14T14:07:02Z

@jorgevazquezperez is that proven working or a hallucination?

jorgevazquezperez · 2023-02-14T15:46:25Z

Proven working. Note that I have only achieved it with the CPU so far, but I am working on getting it to work with the GPU. I have attached a picture with the results, and I will keep you updated on the GPU version (as I imagine that is the one you are all looking forward to). If you need more info, just tell me!

PS: I am using Python with the pyopencl library.
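Since this check was done from Python, here is a minimal pyopencl sketch of the same check clinfo performs: it lists every (platform, device) pair the OpenCL ICD loader exposes. It assumes pyopencl's `get_platforms()` and `Platform.get_devices()` API, and that a missing platform surfaces as a `pyopencl.Error`; `list_opencl_devices` is a hypothetical helper name.

```python
from typing import List, Optional, Tuple


def list_opencl_devices() -> Optional[List[Tuple[str, str]]]:
    """List (platform name, device name) pairs visible through pyopencl.

    Returns None if pyopencl is not installed, and an empty list if the
    ICD loader reports no platforms (the symptom described in this issue).
    """
    try:
        import pyopencl as cl
    except ImportError:
        return None
    try:
        platforms = cl.get_platforms()
    except cl.Error:
        # With zero platforms, clGetPlatformIDs fails and pyopencl typically
        # raises a pyopencl.Error subclass rather than returning an empty list.
        return []
    pairs = []
    for platform in platforms:
        try:
            devices = platform.get_devices()  # all device types by default
        except cl.Error:
            devices = []  # treat a failing query as "no devices" for this platform
        for device in devices:
            pairs.append((platform.name, device.name))
    return pairs


if __name__ == "__main__":
    found = list_opencl_devices()
    if found is None:
        print("pyopencl is not installed (pip install pyopencl)")
    elif not found:
        print("No OpenCL platforms or devices found")
    else:
        for platform_name, device_name in found:
            print(f"{platform_name}: {device_name}")
```

With only POCL installed, any pairs returned should belong to its CPU device; a GPU shows up only if a GPU vendor's ICD is registered.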

lmeyerov · 2023-02-14T16:09:24Z

Yes, AFAICT the CPU and integrated Intel GPUs should work, but it's unclear if/how NVIDIA GPUs will.

gyferlim · 2023-08-12T17:58:27Z

I managed to get Intel OpenCL working in WSL2, I think.

1. Follow the instructions here: https://www.intel.com/content/www/us/en/docs/oneapi/installation-guide-linux/2023-0/configure-wsl-2-for-gpu-workflows.html#UBUNTU-22-04-JAMMY
2. Create a file named "intel.icd" in /etc/OpenCL/vendors containing:

/usr/lib/x86_64-linux-gnu/intel-opencl/libigdrcl.so
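The `intel.icd` step matters because the OpenCL ICD loader discovers vendor implementations by reading the `*.icd` files in `/etc/OpenCL/vendors`, each holding the path or file name of a vendor library; a missing or stale entry is one way to end up with "Number of platforms 0". A minimal Python sketch for auditing that directory, assuming one library entry per file. `check_icd_vendors` is a hypothetical helper, and bare library names (no leading `/`) are reported as unchecked, since the loader resolves them through the dynamic linker's search path.

```python
import os
from pathlib import Path
from typing import List, Optional, Tuple


def check_icd_vendors(vendors_dir: str = "/etc/OpenCL/vendors") -> List[Tuple[str, str, Optional[bool]]]:
    """Audit an OpenCL ICD vendor directory.

    Returns one (icd file name, library entry, status) tuple per *.icd file,
    sorted by file name. status is True/False for absolute paths (whether the
    library file exists) and None for bare library names, which this sketch
    does not resolve.
    """
    directory = Path(vendors_dir)
    if not directory.is_dir():
        return []  # no vendor directory: the loader has nothing to load
    results = []
    for icd_file in sorted(directory.glob("*.icd")):
        # Each ICD file is expected to hold a single library path or name.
        library = icd_file.read_text().strip()
        status: Optional[bool] = os.path.isfile(library) if os.path.isabs(library) else None
        results.append((icd_file.name, library, status))
    return results


if __name__ == "__main__":
    labels = {True: "found", False: "MISSING", None: "not checked (bare name)"}
    entries = check_icd_vendors()
    if not entries:
        print("No *.icd files found in /etc/OpenCL/vendors")
    for name, library, status in entries:
        print(f"{name}: {library} [{labels[status]}]")
```

An entry marked MISSING points the loader at a library that is not there, so that vendor's platform will not appear. The clinfo output below is what this setup reported.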

Number of platforms                               2
  Platform Name                                   Intel(R) OpenCL Graphics
  Platform Vendor                                 Intel(R) Corporation
  Platform Version                                OpenCL 3.0
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_byte_addressable_store cl_khr_device_uuid cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_command_queue_families cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_intel_device_attribute_query cl_khr_suggested_local_work_size cl_intel_split_work_group_barrier cl_khr_fp64 cl_khr_subgroups cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_ext_float_atomics cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_device_side_avc_motion_estimation cl_intel_spirv_device_side_avc_motion_estimation cl_intel_advanced_motion_estimation cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_3d_image_writes cl_intel_media_block_io cl_khr_gl_sharing cl_khr_gl_depth_images cl_khr_gl_event cl_khr_gl_msaa_sharing cl_intel_va_api_media_sharing cl_intel_sharing_format_query cl_khr_pci_bus_info
  Platform Extensions with Version                cl_khr_byte_addressable_store                                    0x400000 (1.0.0)
                                                  cl_khr_device_uuid                                               0x400000 (1.0.0)
                                                  cl_khr_fp16                                                      0x400000 (1.0.0)
                                                  cl_khr_global_int32_base_atomics                                 0x400000 (1.0.0)
                                                  cl_khr_global_int32_extended_atomics                             0x400000 (1.0.0)
                                                  cl_khr_icd                                                       0x400000 (1.0.0)
                                                  cl_khr_local_int32_base_atomics                                  0x400000 (1.0.0)
                                                  cl_khr_local_int32_extended_atomics                              0x400000 (1.0.0)
                                                  cl_intel_command_queue_families                                  0x400000 (1.0.0)
                                                  cl_intel_subgroups                                               0x400000 (1.0.0)
                                                  cl_intel_required_subgroup_size                                  0x400000 (1.0.0)
                                                  cl_intel_subgroups_short                                         0x400000 (1.0.0)
                                                  cl_khr_spir                                                      0x400000 (1.0.0)
                                                  cl_intel_accelerator                                             0x400000 (1.0.0)
                                                  cl_intel_driver_diagnostics                                      0x400000 (1.0.0)
                                                  cl_khr_priority_hints                                            0x400000 (1.0.0)
                                                  cl_khr_throttle_hints                                            0x400000 (1.0.0)
                                                  cl_khr_create_command_queue                                      0x400000 (1.0.0)
                                                  cl_intel_subgroups_char                                          0x400000 (1.0.0)
                                                  cl_intel_subgroups_long                                          0x400000 (1.0.0)
                                                  cl_khr_il_program                                                0x400000 (1.0.0)
                                                  cl_intel_mem_force_host_memory                                   0x400000 (1.0.0)
                                                  cl_khr_subgroup_extended_types                                   0x400000 (1.0.0)
                                                  cl_khr_subgroup_non_uniform_vote                                 0x400000 (1.0.0)
                                                  cl_khr_subgroup_ballot                                           0x400000 (1.0.0)
                                                  cl_khr_subgroup_non_uniform_arithmetic                           0x400000 (1.0.0)
                                                  cl_khr_subgroup_shuffle                                          0x400000 (1.0.0)
                                                  cl_khr_subgroup_shuffle_relative                                 0x400000 (1.0.0)
                                                  cl_khr_subgroup_clustered_reduce                                 0x400000 (1.0.0)
                                                  cl_intel_device_attribute_query                                  0x400000 (1.0.0)
                                                  cl_khr_suggested_local_work_size                                 0x400000 (1.0.0)
                                                  cl_intel_split_work_group_barrier                                0x400000 (1.0.0)
                                                  cl_khr_fp64                                                      0x400000 (1.0.0)
                                                  cl_khr_subgroups                                                 0x400000 (1.0.0)
                                                  cl_intel_spirv_media_block_io                                    0x400000 (1.0.0)
                                                  cl_intel_spirv_subgroups                                         0x400000 (1.0.0)
                                                  cl_khr_spirv_no_integer_wrap_decoration                          0x400000 (1.0.0)
                                                  cl_intel_unified_shared_memory                                   0x400000 (1.0.0)
                                                  cl_khr_mipmap_image                                              0x400000 (1.0.0)
                                                  cl_khr_mipmap_image_writes                                       0x400000 (1.0.0)
                                                  cl_ext_float_atomics                                             0x400000 (1.0.0)
                                                  cl_intel_planar_yuv                                              0x400000 (1.0.0)
                                                  cl_intel_packed_yuv                                              0x400000 (1.0.0)
                                                  cl_intel_motion_estimation                                       0x400000 (1.0.0)
                                                  cl_intel_device_side_avc_motion_estimation                       0x400000 (1.0.0)
                                                  cl_intel_spirv_device_side_avc_motion_estimation                 0x400000 (1.0.0)
                                                  cl_intel_advanced_motion_estimation                              0x400000 (1.0.0)
                                                  cl_khr_int64_base_atomics                                        0x400000 (1.0.0)
                                                  cl_khr_int64_extended_atomics                                    0x400000 (1.0.0)
                                                  cl_khr_image2d_from_buffer                                       0x400000 (1.0.0)
                                                  cl_khr_depth_images                                              0x400000 (1.0.0)
                                                  cl_khr_3d_image_writes                                           0x400000 (1.0.0)
                                                  cl_intel_media_block_io                                          0x400000 (1.0.0)
                                                  cl_khr_gl_sharing                                                0x400000 (1.0.0)
                                                  cl_khr_gl_depth_images                                           0x400000 (1.0.0)
                                                  cl_khr_gl_event                                                  0x400000 (1.0.0)
                                                  cl_khr_gl_msaa_sharing                                           0x400000 (1.0.0)
                                                  cl_intel_va_api_media_sharing                                    0x400000 (1.0.0)
                                                  cl_intel_sharing_format_query                                    0x400000 (1.0.0)
                                                  cl_khr_pci_bus_info                                              0x400000 (1.0.0)
  Platform Numeric Version                        0xc00000 (3.0.0)
  Platform Extensions function suffix             INTEL
  Platform Host timer resolution                  1ns

  Platform Name                                   Portable Computing Language
  Platform Vendor                                 The pocl project
  Platform Version                                OpenCL 3.0 PoCL 4.1-pre main-0-ga3e43d58  Linux, Debug+Asserts, RELOC, SPIR, LLVM 14.0.0, SLEEF, POCL_DEBUG
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_pocl_content_size
  Platform Extensions with Version                cl_khr_icd                                                       0x400000 (1.0.0)
                                                  cl_pocl_content_size                                             0x400000 (1.0.0)
  Platform Numeric Version                        0xc00000 (3.0.0)
  Platform Extensions function suffix             POCL
  Platform Host timer resolution                  0ns

  Platform Name                                   Intel(R) OpenCL Graphics
Number of devices                                 1
  Device Name                                     Intel(R) Graphics [0x5917]
  Device Vendor                                   Intel(R) Corporation
  Device Vendor ID                                0x8086
  Device Version                                  OpenCL 3.0 NEO
  Device UUID                                     86801759-0700-0000-0002-000000000000
  Driver UUID                                     32332e32-322e-3236-3531-362e31380000
  Valid Device LUID                               No
  Device LUID                                     5017-c9c1fd7f0000
  Device Node Mask                                0
  Device Numeric Version                          0xc00000 (3.0.0)
  Driver Version                                  23.22.26516.18
  Device OpenCL C Version                         OpenCL C 1.2
  Device OpenCL C all versions                    OpenCL C                                                         0x400000 (1.0.0)
                                                  OpenCL C                                                         0x401000 (1.1.0)
                                                  OpenCL C                                                         0x402000 (1.2.0)
                                                  OpenCL C                                                         0xc00000 (3.0.0)
  Device OpenCL C features                        __opencl_c_int64                                                 0xc00000 (3.0.0)
                                                  __opencl_c_3d_image_writes                                       0xc00000 (3.0.0)
                                                  __opencl_c_images                                                0xc00000 (3.0.0)
                                                  __opencl_c_read_write_images                                     0xc00000 (3.0.0)
                                                  __opencl_c_atomic_order_acq_rel                                  0xc00000 (3.0.0)
                                                  __opencl_c_atomic_order_seq_cst                                  0xc00000 (3.0.0)
                                                  __opencl_c_atomic_scope_all_devices                              0xc00000 (3.0.0)
                                                  __opencl_c_atomic_scope_device                                   0xc00000 (3.0.0)
                                                  __opencl_c_generic_address_space                                 0xc00000 (3.0.0)
                                                  __opencl_c_program_scope_global_variables                        0xc00000 (3.0.0)
                                                  __opencl_c_work_group_collective_functions                       0xc00000 (3.0.0)
                                                  __opencl_c_subgroups                                             0xc00000 (3.0.0)
                                                  __opencl_c_pipes                                                 0xc00000 (3.0.0)
                                                  __opencl_c_fp64                                                  0xc00000 (3.0.0)
  Latest comfornace test passed                   v2022-04-22-00
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               24
  Max clock frequency                             1150MHz
  Device Partition                                (core)
    Max number of sub-devices                     0
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256
  Preferred work group size multiple (device)     32
  Preferred work group size multiple (kernel)     32
  Max sub-groups per work group                   32
  Sub-group sizes (Intel)                         8, 16, 32
  Preferred / native vector sizes
    char                                                16 / 16
    short                                                8 / 8
    int                                                  4 / 4
    long                                                 1 / 1
    half                                                 8 / 8        (cl_khr_fp16)
    float                                                1 / 1
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              5101215744 (4.751GiB)
  Error Correction support                        No
  Max memory allocation                           1073741824 (1024MiB)
  Unified memory for Host and Device              Yes
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   Yes
    Fine-grained system sharing                   No
    Atomics                                       Yes
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Preferred alignment for atomics
    SVM                                           64 bytes
    Global                                        64 bytes
    Local                                         64 bytes
  Atomic memory capabilities                      relaxed, acquire/release, sequentially-consistent, work-group scope, device scope, all-devices scope
  Atomic fence capabilities                       relaxed, acquire/release, sequentially-consistent, work-item scope, work-group scope, device scope, all-devices scope
  Max size for global variable                    65536 (64KiB)
  Preferred total size of global vars             1073741824 (1024MiB)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        786432 (768KiB)
  Global Memory cache line size                   64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            67108864 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   4 bytes
    Pitch alignment for 2D image buffers          4 pixels
    Max 2D image size                             16384x16384 pixels
    Max planar YUV image size                     16384x16352 pixels
    Max 3D image size                             16384x16384x2048 pixels
    Max number of read image args                 128
    Max number of write image args                128
    Max number of read/write image args           128
  Pipe support                                    Yes
  Max number of pipe args                         16
  Max active pipe reservations                    1
  Max pipe packet size                            1024
  Local memory type                               Local
  Local memory size                               65536 (64KiB)
  Max number of constant args                     8
  Max constant buffer size                        1073741824 (1024MiB)
  Generic address space support                   Yes
  Max size of kernel argument                     2048 (2KiB)
  Queue properties (on host)
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Device enqueue capabilities                     (n/a)
  Queue properties (on device)
    Out-of-order execution                        No
    Profiling                                     No
    Preferred size                                0
    Max size                                      0
  Max queues on device                            0
  Max events on device                            0
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      83ns
  Execution capabilities
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Non-uniform work-groups                       Yes
    Work-group collective functions               Yes
    Sub-group independent forward progress        Yes
    IL version                                    SPIR-V_1.2
    ILs with version                              SPIR-V                                                           0x402000 (1.2.0)
    SPIR versions                                 1.2
  printf() buffer size                            4194304 (4MiB)
  Built-in kernels                                block_motion_estimate_intel;block_advanced_motion_estimate_check_intel;block_advanced_motion_estimate_bidirectional_check_intel;
  Built-in kernels with version                   block_motion_estimate_intel                                      0x400000 (1.0.0)
                                                  block_advanced_motion_estimate_check_intel                       0x400000 (1.0.0)
                                                  block_advanced_motion_estimate_bidirectional_check_intel         0x400000 (1.0.0)
  Motion Estimation accelerator version (Intel)   2
    Device-side AVC Motion Estimation version     1
      Supports texture sampler use                Yes
      Supports preemption                         No
  Device Extensions                               cl_khr_byte_addressable_store cl_khr_device_uuid cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_command_queue_families cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_intel_device_attribute_query cl_khr_suggested_local_work_size cl_intel_split_work_group_barrier cl_khr_fp64 cl_khr_subgroups cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_ext_float_atomics cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_device_side_avc_motion_estimation cl_intel_spirv_device_side_avc_motion_estimation cl_intel_advanced_motion_estimation cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_3d_image_writes cl_intel_media_block_io cl_khr_gl_sharing cl_khr_gl_depth_images cl_khr_gl_event cl_khr_gl_msaa_sharing cl_intel_va_api_media_sharing cl_intel_sharing_format_query cl_khr_pci_bus_info
  Device Extensions with Version                  cl_khr_byte_addressable_store                                    0x400000 (1.0.0)
                                                  cl_khr_device_uuid                                               0x400000 (1.0.0)
                                                  cl_khr_fp16                                                      0x400000 (1.0.0)
                                                  cl_khr_global_int32_base_atomics                                 0x400000 (1.0.0)
                                                  cl_khr_global_int32_extended_atomics                             0x400000 (1.0.0)
                                                  cl_khr_icd                                                       0x400000 (1.0.0)
                                                  cl_khr_local_int32_base_atomics                                  0x400000 (1.0.0)
                                                  cl_khr_local_int32_extended_atomics                              0x400000 (1.0.0)
                                                  cl_intel_command_queue_families                                  0x400000 (1.0.0)
                                                  cl_intel_subgroups                                               0x400000 (1.0.0)
                                                  cl_intel_required_subgroup_size                                  0x400000 (1.0.0)
                                                  cl_intel_subgroups_short                                         0x400000 (1.0.0)
                                                  cl_khr_spir                                                      0x400000 (1.0.0)
                                                  cl_intel_accelerator                                             0x400000 (1.0.0)
                                                  cl_intel_driver_diagnostics                                      0x400000 (1.0.0)
                                                  cl_khr_priority_hints                                            0x400000 (1.0.0)
                                                  cl_khr_throttle_hints                                            0x400000 (1.0.0)
                                                  cl_khr_create_command_queue                                      0x400000 (1.0.0)
                                                  cl_intel_subgroups_char                                          0x400000 (1.0.0)
                                                  cl_intel_subgroups_long                                          0x400000 (1.0.0)
                                                  cl_khr_il_program                                                0x400000 (1.0.0)
                                                  cl_intel_mem_force_host_memory                                   0x400000 (1.0.0)
                                                  cl_khr_subgroup_extended_types                                   0x400000 (1.0.0)
                                                  cl_khr_subgroup_non_uniform_vote                                 0x400000 (1.0.0)
                                                  cl_khr_subgroup_ballot                                           0x400000 (1.0.0)
                                                  cl_khr_subgroup_non_uniform_arithmetic                           0x400000 (1.0.0)
                                                  cl_khr_subgroup_shuffle                                          0x400000 (1.0.0)
                                                  cl_khr_subgroup_shuffle_relative                                 0x400000 (1.0.0)
                                                  cl_khr_subgroup_clustered_reduce                                 0x400000 (1.0.0)
                                                  cl_intel_device_attribute_query                                  0x400000 (1.0.0)
                                                  cl_khr_suggested_local_work_size                                 0x400000 (1.0.0)
                                                  cl_intel_split_work_group_barrier                                0x400000 (1.0.0)
                                                  cl_khr_fp64                                                      0x400000 (1.0.0)
                                                  cl_khr_subgroups                                                 0x400000 (1.0.0)
                                                  cl_intel_spirv_media_block_io                                    0x400000 (1.0.0)
                                                  cl_intel_spirv_subgroups                                         0x400000 (1.0.0)
                                                  cl_khr_spirv_no_integer_wrap_decoration                          0x400000 (1.0.0)
                                                  cl_intel_unified_shared_memory                                   0x400000 (1.0.0)
                                                  cl_khr_mipmap_image                                              0x400000 (1.0.0)
                                                  cl_khr_mipmap_image_writes                                       0x400000 (1.0.0)
                                                  cl_ext_float_atomics                                             0x400000 (1.0.0)
                                                  cl_intel_planar_yuv                                              0x400000 (1.0.0)
                                                  cl_intel_packed_yuv                                              0x400000 (1.0.0)
                                                  cl_intel_motion_estimation                                       0x400000 (1.0.0)
                                                  cl_intel_device_side_avc_motion_estimation                       0x400000 (1.0.0)
                                                  cl_intel_spirv_device_side_avc_motion_estimation                 0x400000 (1.0.0)
                                                  cl_intel_advanced_motion_estimation                              0x400000 (1.0.0)
                                                  cl_khr_int64_base_atomics                                        0x400000 (1.0.0)
                                                  cl_khr_int64_extended_atomics                                    0x400000 (1.0.0)
                                                  cl_khr_image2d_from_buffer                                       0x400000 (1.0.0)
                                                  cl_khr_depth_images                                              0x400000 (1.0.0)
                                                  cl_khr_3d_image_writes                                           0x400000 (1.0.0)
                                                  cl_intel_media_block_io                                          0x400000 (1.0.0)
                                                  cl_khr_gl_sharing                                                0x400000 (1.0.0)
                                                  cl_khr_gl_depth_images                                           0x400000 (1.0.0)
                                                  cl_khr_gl_event                                                  0x400000 (1.0.0)
                                                  cl_khr_gl_msaa_sharing                                           0x400000 (1.0.0)
                                                  cl_intel_va_api_media_sharing                                    0x400000 (1.0.0)
                                                  cl_intel_sharing_format_query                                    0x400000 (1.0.0)
                                                  cl_khr_pci_bus_info                                              0x400000 (1.0.0)

  Platform Name                                   Portable Computing Language
Number of devices                                 1
  Device Name                                     cpu-Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
  Device Vendor                                   GenuineIntel
  Device Vendor ID                                0x6c636f70
  Device Version                                  OpenCL 3.0 PoCL HSTR: cpu-x86_64-pc-linux-gnu-skylake
  Device Numeric Version                          0xc00000 (3.0.0)
  Driver Version                                  4.1-pre main-0-ga3e43d58
  Device OpenCL C Version                         OpenCL C 1.2 PoCL
  Device OpenCL C all versions                    OpenCL C                                                         0x400000 (1.0.0)
                                                  OpenCL C                                                         0x401000 (1.1.0)
                                                  OpenCL C                                                         0x402000 (1.2.0)
                                                  OpenCL C                                                         0xc00000 (3.0.0)
  Device OpenCL C features                        __opencl_c_3d_image_writes                                       0xc00000 (3.0.0)
                                                  __opencl_c_images                                                0xc00000 (3.0.0)
                                                  __opencl_c_atomic_order_acq_rel                                  0xc00000 (3.0.0)
                                                  __opencl_c_atomic_order_seq_cst                                  0xc00000 (3.0.0)
                                                  __opencl_c_atomic_scope_device                                   0xc00000 (3.0.0)
                                                  __opencl_c_program_scope_global_variables                        0xc00000 (3.0.0)
                                                  __opencl_c_generic_address_space                                 0xc00000 (3.0.0)
                                                  __opencl_c_subgroups                                             0xc00000 (3.0.0)
                                                  __opencl_c_atomic_scope_all_devices                              0xc00000 (3.0.0)
                                                  __opencl_c_read_write_images                                     0xc00000 (3.0.0)
                                                  __opencl_c_fp64                                                  0xc00000 (3.0.0)
                                                  __opencl_c_int64                                                 0xc00000 (3.0.0)
  Latest comfornace test passed                   v2022-04-19-01
  Device Type                                     CPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               8
  Max clock frequency                             2111MHz
  Device Partition                                (core)
    Max number of sub-devices                     8
    Supported partition types                     equally, by counts
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             4096x4096x4096
  Max work group size                             4096
  Preferred work group size multiple (device)     8
  Preferred work group size multiple (kernel)     8
  Max sub-groups per work group                   128
  Sub-group sizes (Intel)                         1, 2, 4, 8, 16, 32, 64, 128, 256, 512
  Preferred / native vector sizes
    char                                                16 / 16
    short                                               16 / 16
    int                                                  8 / 8
    long                                                 4 / 4
    half                                                 0 / 0        (n/a)
    float                                                8 / 8
    double                                               4 / 4        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              4613160960 (4.296GiB)
  Error Correction support                        No
  Max memory allocation                           2147483648 (2GiB)
  Unified memory for Host and Device              Yes
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   Yes
    Fine-grained system sharing                   No
    Atomics                                       Yes
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Preferred alignment for atomics
    SVM                                           64 bytes
    Global                                        64 bytes
    Local                                         64 bytes
  Atomic memory capabilities                      relaxed, acquire/release, sequentially-consistent, work-group scope, device scope, all-devices scope
  Atomic fence capabilities                       relaxed, acquire/release, sequentially-consistent, work-item scope, work-group scope, device scope
  Max size for global variable                    64000 (62.5KiB)
  Preferred total size of global vars             262144 (256KiB)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        8388608 (8MiB)
  Global Memory cache line size                   64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            134217728 pixels
    Max 1D or 2D image array size                 2048 images
    Max 2D image size                             8192x8192 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 128
    Max number of write image args                128
    Max number of read/write image args           128
  Pipe support                                    No
  Max number of pipe args                         0
  Max active pipe reservations                    0
  Max pipe packet size                            0
  Local memory type                               Global
  Local memory size                               262144 (256KiB)
  Max number of constant args                     8
  Max constant buffer size                        262144 (256KiB)
  Generic address space support                   Yes
  Max size of kernel argument                     1024
  Queue properties (on host)
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Device enqueue capabilities                     (n/a)
  Queue properties (on device)
    Out-of-order execution                        No
    Profiling                                     No
    Preferred size                                0
    Max size                                      0
  Max queues on device                            0
  Max events on device                            0
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      1ns
  Execution capabilities
    Run OpenCL kernels                            Yes
    Run native kernels                            Yes
    Non-uniform work-groups                       No
    Work-group collective functions               No
    Sub-group independent forward progress        Yes
    IL version                                    (n/a)
    ILs with version                              (n/a)
    SPIR versions                                 (n/a)
  printf() buffer size                            16777216 (16MiB)
  Built-in kernels                                pocl.add.i8;org.khronos.openvx.scale_image.nn.u8;org.khronos.openvx.scale_image.bl.u8;org.khronos.openvx.tensor_convert_depth.wrap.u8.f32
  Built-in kernels with version                   pocl.add.i8                                                      0x402000 (1.2.0)
                                                  org.khronos.openvx.scale_image.nn.u8                             0x402000 (1.2.0)
                                                  org.khronos.openvx.scale_image.bl.u8                             0x402000 (1.2.0)
                                                  org.khronos.openvx.tensor_convert_depth.wrap.u8.f32              0x402000 (1.2.0)
  Device Extensions                               cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_command_buffer cl_khr_subgroups cl_intel_unified_shared_memory cl_khr_subgroup_ballot cl_khr_subgroup_shuffle cl_intel_subgroups cl_intel_required_subgroup_size cl_khr_spir cl_khr_fp64 cl_khr_int64_base_atomics cl_khr_int64_extended_atomics
  Device Extensions with Version                  cl_khr_byte_addressable_store                                    0x400000 (1.0.0)
                                                  cl_khr_global_int32_base_atomics                                 0x400000 (1.0.0)
                                                  cl_khr_global_int32_extended_atomics                             0x400000 (1.0.0)
                                                  cl_khr_local_int32_base_atomics                                  0x400000 (1.0.0)
                                                  cl_khr_local_int32_extended_atomics                              0x400000 (1.0.0)
                                                  cl_khr_3d_image_writes                                           0x400000 (1.0.0)
                                                  cl_khr_command_buffer                                              0x9000 (0.9.0)
                                                  cl_khr_subgroups                                                 0x400000 (1.0.0)
                                                  cl_intel_unified_shared_memory                                   0x400000 (1.0.0)
                                                  cl_khr_subgroup_ballot                                           0x400000 (1.0.0)
                                                  cl_khr_subgroup_shuffle                                          0x400000 (1.0.0)
                                                  cl_intel_subgroups                                               0x400000 (1.0.0)
                                                  cl_intel_required_subgroup_size                                  0x400000 (1.0.0)
                                                  cl_khr_spir                                                      0x801000 (2.1.0)
                                                  cl_khr_fp64                                                      0x400000 (1.0.0)
                                                  cl_khr_int64_base_atomics                                        0x400000 (1.0.0)
                                                  cl_khr_int64_extended_atomics                                    0x400000 (1.0.0)


NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  Intel(R) OpenCL Graphics
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [INTEL]
  clCreateContext(NULL, ...) [default]            Success [INTEL]
  clCreateContext(NULL, ...) [other]              Success [POCL]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 Intel(R) OpenCL Graphics
    Device Name                                   Intel(R) Graphics [0x5917]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 Intel(R) OpenCL Graphics
    Device Name                                   Intel(R) Graphics [0x5917]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 Intel(R) OpenCL Graphics
    Device Name                                   Intel(R) Graphics [0x5917]

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.3.1
  ICD loader Profile                              OpenCL 3.0
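The ICD loader listed above (OCL Icd 2.3.1) finds platforms by reading `*.icd` files. Each of those files names one vendor driver library. If no file points at an NVIDIA library, clinfo has no NVIDIA platform to report. Below is a minimal Python sketch for listing those registrations. It assumes the default Linux registry directory `/etc/OpenCL/vendors`, and the helper name `list_icds` is hypothetical, not part of clinfo or the loader:

```python
from pathlib import Path
from typing import Dict

# Assumption: the default registry directory used by Linux OpenCL ICD loaders
# such as ocl-icd (the loader shown in the clinfo output above).
DEFAULT_VENDOR_DIR = "/etc/OpenCL/vendors"


def list_icds(vendor_dir: str = DEFAULT_VENDOR_DIR) -> Dict[str, str]:
    """Hypothetical helper: map each *.icd file name to the library it registers."""
    directory = Path(vendor_dir)
    if not directory.is_dir():
        return {}  # no registry directory, so no vendors can be found here
    registrations = {}
    for icd in sorted(directory.glob("*.icd")):
        # Each .icd file contains the name or absolute path of one driver library.
        registrations[icd.name] = icd.read_text().strip()
    return registrations


if __name__ == "__main__":
    found = list_icds()
    if not found:
        print(f"No .icd files in {DEFAULT_VENDOR_DIR}; clinfo will likely report 0 platforms")
    for name, library in found.items():
        print(f"{name}: {library}")
```

Running it both in WSL and inside the container, then comparing the result with clinfo's platform count, shows whether a missing platform comes from a missing registration or from a driver that is registered but fails to load.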

lmeyerov · 2023-08-12T18:01:53Z

This issue is about Nvidia cards not being shown, not intel/amd

Entropy512 · 2023-08-16T19:00:50Z

> This issue is about Nvidia cards not being shown, not intel/amd

The title of the issue is simply "No OpenCL platforms reported" - not "No NVidia OpenCL platforms reported"

perrymacmurray · 2023-08-16T19:06:50Z

> > This issue is about Nvidia cards not being shown, not intel/amd
>
> The title of the issue is simply "No OpenCL platforms reported" - not "No NVidia OpenCL platforms reported"

The issue is about Nvidia cards not being shown.
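To check the specific symptom discussed here, that no NVIDIA platform is listed, a minimal Python sketch can scan clinfo's default output for each platform's `Platform Vendor` line. It assumes the label/value layout seen in the dumps in this thread, and it assumes that NVIDIA's driver reports a vendor string containing "NVIDIA". The function names `parse_clinfo` and `has_nvidia_platform` are hypothetical:

```python
import re
import shutil
import subprocess
from typing import List, Tuple

# Assumes clinfo's default (non --list) output: a label column followed by
# a run of spaces and the value, as in the dumps quoted in this thread.
_COUNT_RE = re.compile(r"^Number of platforms[ \t]+(\d+)[ \t]*$", re.MULTILINE)
# "Platform Vendor" appears once per platform, in the two-space-indented
# platform summary; device sections use "Device Vendor" instead.
_VENDOR_RE = re.compile(r"^  Platform Vendor[ \t]{2,}(\S.*?)[ \t]*$", re.MULTILINE)


def parse_clinfo(text: str) -> Tuple[int, List[str]]:
    """Return (reported platform count, platform vendor strings in order)."""
    match = _COUNT_RE.search(text)
    count = int(match.group(1)) if match else 0
    return count, _VENDOR_RE.findall(text)


def has_nvidia_platform(text: str) -> bool:
    """True if any reported platform's vendor string mentions NVIDIA."""
    _, vendors = parse_clinfo(text)
    return any("NVIDIA" in vendor.upper() for vendor in vendors)


if __name__ == "__main__":
    if shutil.which("clinfo") is None:
        print("clinfo is not on PATH")
    else:
        output = subprocess.run(["clinfo"], capture_output=True, text=True).stdout
        count, vendors = parse_clinfo(output)
        print(f"{count} platform(s), vendors: {vendors}")
        print("NVIDIA platform reported:", has_nvidia_platform(output))
```

Matching on the vendor line rather than on `Platform Name` avoids double counting. In the default output, clinfo repeats each platform name at the top of that platform's device section, and it repeats names again under "NULL platform behavior".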

edmondium · 2023-11-05T05:14:39Z

Date: Nov 4, 2021 https://devblogs.microsoft.com/commandline/oneapi-l0-openvino-and-opencl-coming-to-the-windows-subsystem-for-linux-for-intel-gpus/

extra helpful info:

user@WSL2:~$ sudo clinfo
Number of platforms                               3
  Platform Name                                   Intel(R) OpenCL HD Graphics
  Platform Vendor                                 Intel(R) Corporation
  Platform Version                                OpenCL 3.0
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_command_queue_families cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_intel_device_attribute_query cl_khr_fp64 cl_khr_subgroups cl_intel_spirv_device_side_avc_motion_estimation cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory_preview cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_device_side_avc_motion_estimation cl_intel_advanced_motion_estimation cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_3d_image_writes cl_intel_media_block_io cl_intel_va_api_media_sharing cl_intel_sharing_format_query cl_khr_pci_bus_info
  Platform Extensions with Version                cl_khr_byte_addressable_store                                    0x400000 (1.0.0)
                                                  cl_khr_fp16                                                      0x400000 (1.0.0)
                                                  cl_khr_global_int32_base_atomics                                 0x400000 (1.0.0)
                                                  cl_khr_global_int32_extended_atomics                             0x400000 (1.0.0)
                                                  cl_khr_icd                                                       0x400000 (1.0.0)
                                                  cl_khr_local_int32_base_atomics                                  0x400000 (1.0.0)
                                                  cl_khr_local_int32_extended_atomics                              0x400000 (1.0.0)
                                                  cl_intel_command_queue_families                                  0x400000 (1.0.0)
                                                  cl_intel_subgroups                                               0x400000 (1.0.0)
                                                  cl_intel_required_subgroup_size                                  0x400000 (1.0.0)
                                                  cl_intel_subgroups_short                                         0x400000 (1.0.0)
                                                  cl_khr_spir                                                      0x400000 (1.0.0)
                                                  cl_intel_accelerator                                             0x400000 (1.0.0)
                                                  cl_intel_driver_diagnostics                                      0x400000 (1.0.0)
                                                  cl_khr_priority_hints                                            0x400000 (1.0.0)
                                                  cl_khr_throttle_hints                                            0x400000 (1.0.0)
                                                  cl_khr_create_command_queue                                      0x400000 (1.0.0)
                                                  cl_intel_subgroups_char                                          0x400000 (1.0.0)
                                                  cl_intel_subgroups_long                                          0x400000 (1.0.0)
                                                  cl_khr_il_program                                                0x400000 (1.0.0)
                                                  cl_intel_mem_force_host_memory                                   0x400000 (1.0.0)
                                                  cl_khr_subgroup_extended_types                                   0x400000 (1.0.0)
                                                  cl_khr_subgroup_non_uniform_vote                                 0x400000 (1.0.0)
                                                  cl_khr_subgroup_ballot                                           0x400000 (1.0.0)
                                                  cl_khr_subgroup_non_uniform_arithmetic                           0x400000 (1.0.0)
                                                  cl_khr_subgroup_shuffle                                          0x400000 (1.0.0)
                                                  cl_khr_subgroup_shuffle_relative                                 0x400000 (1.0.0)
                                                  cl_khr_subgroup_clustered_reduce                                 0x400000 (1.0.0)
                                                  cl_intel_device_attribute_query                                  0x400000 (1.0.0)
                                                  cl_khr_fp64                                                      0x400000 (1.0.0)
                                                  cl_khr_subgroups                                                 0x400000 (1.0.0)
                                                  cl_intel_spirv_device_side_avc_motion_estimation                 0x400000 (1.0.0)
                                                  cl_intel_spirv_media_block_io                                    0x400000 (1.0.0)
                                                  cl_intel_spirv_subgroups                                         0x400000 (1.0.0)
                                                  cl_khr_spirv_no_integer_wrap_decoration                          0x400000 (1.0.0)
                                                  cl_intel_unified_shared_memory_preview                           0x400000 (1.0.0)
                                                  cl_khr_mipmap_image                                              0x400000 (1.0.0)
                                                  cl_khr_mipmap_image_writes                                       0x400000 (1.0.0)
                                                  cl_intel_planar_yuv                                              0x400000 (1.0.0)
                                                  cl_intel_packed_yuv                                              0x400000 (1.0.0)
                                                  cl_intel_motion_estimation                                       0x400000 (1.0.0)
                                                  cl_intel_device_side_avc_motion_estimation                       0x400000 (1.0.0)
                                                  cl_intel_advanced_motion_estimation                              0x400000 (1.0.0)
                                                  cl_khr_int64_base_atomics                                        0x400000 (1.0.0)
                                                  cl_khr_int64_extended_atomics                                    0x400000 (1.0.0)
                                                  cl_khr_image2d_from_buffer                                       0x400000 (1.0.0)
                                                  cl_khr_depth_images                                              0x400000 (1.0.0)
                                                  cl_khr_3d_image_writes                                           0x400000 (1.0.0)
                                                  cl_intel_media_block_io                                          0x400000 (1.0.0)
                                                  cl_intel_va_api_media_sharing                                    0x400000 (1.0.0)
                                                  cl_intel_sharing_format_query                                    0x400000 (1.0.0)
                                                  cl_khr_pci_bus_info                                              0x400000 (1.0.0)
  Platform Numeric Version                        0xc00000 (3.0.0)
  Platform Extensions function suffix             INTEL
  Platform Host timer resolution                  1ns

  Platform Name                                   Intel(R) OpenCL HD Graphics
Number of devices                                 1
  Device Name                                     Intel(R) Graphics [0x5917]
  Device Vendor                                   Intel(R) Corporation
  Device Vendor ID                                0x8086
  Device Version                                  OpenCL 3.0 NEO
  Device Numeric Version                          0xc00000 (3.0.0)
  Driver Version                                  21.35.20826
  Device OpenCL C Version                         OpenCL C 3.0
  Device OpenCL C all versions                    OpenCL C                                                         0x400000 (1.0.0)
                                                  OpenCL C                                                         0x401000 (1.1.0)
                                                  OpenCL C                                                         0x402000 (1.2.0)
                                                  OpenCL C                                                         0x800000 (2.0.0)
                                                  OpenCL C                                                         0xc00000 (3.0.0)
  Device OpenCL C features                        __opencl_c_int64                                                 0xc00000 (3.0.0)
                                                  __opencl_c_3d_image_writes                                       0xc00000 (3.0.0)
                                                  __opencl_c_images                                                0xc00000 (3.0.0)
                                                  __opencl_c_read_write_images                                     0xc00000 (3.0.0)
                                                  __opencl_c_atomic_order_acq_rel                                  0xc00000 (3.0.0)
                                                  __opencl_c_atomic_order_seq_cst                                  0xc00000 (3.0.0)
                                                  __opencl_c_atomic_scope_all_devices                              0xc00000 (3.0.0)
                                                  __opencl_c_atomic_scope_device                                   0xc00000 (3.0.0)
                                                  __opencl_c_generic_address_space                                 0xc00000 (3.0.0)
                                                  __opencl_c_program_scope_global_variables                        0xc00000 (3.0.0)
                                                  __opencl_c_work_group_collective_functions                       0xc00000 (3.0.0)
                                                  __opencl_c_subgroups                                             0xc00000 (3.0.0)
                                                  __opencl_c_device_enqueue                                        0xc00000 (3.0.0)
                                                  __opencl_c_pipes                                                 0xc00000 (3.0.0)
                                                  __opencl_c_fp64                                                  0xc00000 (3.0.0)
  Latest conformance test passed                  v2021-06-16-00
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               24
  Max clock frequency                             1150MHz
  Device Partition                                (core)
    Max number of sub-devices                     0
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256
  Preferred work group size multiple (device)     32
  Preferred work group size multiple (kernel)     32
  Max sub-groups per work group                   32
  Sub-group sizes (Intel)                         8, 16, 32
  Preferred / native vector sizes
    char                                                16 / 16
    short                                                8 / 8
    int                                                  4 / 4
    long                                                 1 / 1
    half                                                 8 / 8        (cl_khr_fp16)
    float                                                1 / 1
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              5101215744 (4.751GiB)
  Error Correction support                        No
  Max memory allocation                           1073741824 (1024MiB)
  Unified memory for Host and Device              Yes
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   Yes
    Fine-grained system sharing                   No
    Atomics                                       Yes
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Preferred alignment for atomics
    SVM                                           64 bytes
    Global                                        64 bytes
    Local                                         64 bytes
  Atomic memory capabilities                      relaxed, acquire/release, sequentially-consistent, work-group scope, device scope, all-devices scope
  Atomic fence capabilities                       relaxed, acquire/release, sequentially-consistent, work-item scope, work-group scope, device scope, all-devices scope
  Max size for global variable                    65536 (64KiB)
  Preferred total size of global vars             1073741824 (1024MiB)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        524288 (512KiB)
  Global Memory cache line size                   64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            67108864 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   4 bytes
    Pitch alignment for 2D image buffers          4 pixels
    Max 2D image size                             16384x16384 pixels
    Max planar YUV image size                     16384x16352 pixels
    Max 3D image size                             16384x16384x2048 pixels
    Max number of read image args                 128
    Max number of write image args                128
    Max number of read/write image args           128
  Pipe support                                    Yes
  Max number of pipe args                         16
  Max active pipe reservations                    1
  Max pipe packet size                            1024
  Local memory type                               Local
  Local memory size                               65536 (64KiB)
  Max number of constant args                     8
  Max constant buffer size                        1073741824 (1024MiB)
  Generic address space support                   Yes
  Max size of kernel argument                     2048 (2KiB)
  Queue properties (on host)
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Device enqueue capabilities                     supported, replaceable default queue
  Queue properties (on device)
    Out-of-order execution                        Yes
    Profiling                                     Yes
    Preferred size                                131072 (128KiB)
    Max size                                      67108864 (64MiB)
  Max queues on device                            1
  Max events on device                            1024
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      83ns
  Execution capabilities
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Non-uniform work-groups                       Yes
    Work-group collective functions               Yes
    Sub-group independent forward progress        Yes
    IL version                                    SPIR-V_1.2
    ILs with version                              SPIR-V                                                           0x402000 (1.2.0)
    SPIR versions                                 1.2
  printf() buffer size                            4194304 (4MiB)
  Built-in kernels                                block_motion_estimate_intel;block_advanced_motion_estimate_check_intel;block_advanced_motion_estimate_bidirectional_check_intel;
  Built-in kernels with version                   block_motion_estimate_intel                                      0x400000 (1.0.0)
                                                  block_advanced_motion_estimate_check_intel                       0x400000 (1.0.0)
                                                  block_advanced_motion_estimate_bidirectional_check_intel         0x400000 (1.0.0)
  Motion Estimation accelerator version (Intel)   2
    Device-side AVC Motion Estimation version     1
      Supports texture sampler use                Yes
      Supports preemption                         No
  Device Extensions                               cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_command_queue_families cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_intel_device_attribute_query cl_khr_fp64 cl_khr_subgroups cl_intel_spirv_device_side_avc_motion_estimation cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory_preview cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_device_side_avc_motion_estimation cl_intel_advanced_motion_estimation cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_3d_image_writes cl_intel_media_block_io cl_intel_va_api_media_sharing cl_intel_sharing_format_query cl_khr_pci_bus_info
  Device Extensions with Version                  cl_khr_byte_addressable_store                                    0x400000 (1.0.0)
                                                  cl_khr_fp16                                                      0x400000 (1.0.0)
                                                  cl_khr_global_int32_base_atomics                                 0x400000 (1.0.0)
                                                  cl_khr_global_int32_extended_atomics                             0x400000 (1.0.0)
                                                  cl_khr_icd                                                       0x400000 (1.0.0)
                                                  cl_khr_local_int32_base_atomics                                  0x400000 (1.0.0)
                                                  cl_khr_local_int32_extended_atomics                              0x400000 (1.0.0)
                                                  cl_intel_command_queue_families                                  0x400000 (1.0.0)
                                                  cl_intel_subgroups                                               0x400000 (1.0.0)
                                                  cl_intel_required_subgroup_size                                  0x400000 (1.0.0)
                                                  cl_intel_subgroups_short                                         0x400000 (1.0.0)
                                                  cl_khr_spir                                                      0x400000 (1.0.0)
                                                  cl_intel_accelerator                                             0x400000 (1.0.0)
                                                  cl_intel_driver_diagnostics                                      0x400000 (1.0.0)
                                                  cl_khr_priority_hints                                            0x400000 (1.0.0)
                                                  cl_khr_throttle_hints                                            0x400000 (1.0.0)
                                                  cl_khr_create_command_queue                                      0x400000 (1.0.0)
                                                  cl_intel_subgroups_char                                          0x400000 (1.0.0)
                                                  cl_intel_subgroups_long                                          0x400000 (1.0.0)
                                                  cl_khr_il_program                                                0x400000 (1.0.0)
                                                  cl_intel_mem_force_host_memory                                   0x400000 (1.0.0)
                                                  cl_khr_subgroup_extended_types                                   0x400000 (1.0.0)
                                                  cl_khr_subgroup_non_uniform_vote                                 0x400000 (1.0.0)
                                                  cl_khr_subgroup_ballot                                           0x400000 (1.0.0)
                                                  cl_khr_subgroup_non_uniform_arithmetic                           0x400000 (1.0.0)
                                                  cl_khr_subgroup_shuffle                                          0x400000 (1.0.0)
                                                  cl_khr_subgroup_shuffle_relative                                 0x400000 (1.0.0)
                                                  cl_khr_subgroup_clustered_reduce                                 0x400000 (1.0.0)
                                                  cl_intel_device_attribute_query                                  0x400000 (1.0.0)
                                                  cl_khr_fp64                                                      0x400000 (1.0.0)
                                                  cl_khr_subgroups                                                 0x400000 (1.0.0)
                                                  cl_intel_spirv_device_side_avc_motion_estimation                 0x400000 (1.0.0)
                                                  cl_intel_spirv_media_block_io                                    0x400000 (1.0.0)
                                                  cl_intel_spirv_subgroups                                         0x400000 (1.0.0)
                                                  cl_khr_spirv_no_integer_wrap_decoration                          0x400000 (1.0.0)
                                                  cl_intel_unified_shared_memory_preview                           0x400000 (1.0.0)
                                                  cl_khr_mipmap_image                                              0x400000 (1.0.0)
                                                  cl_khr_mipmap_image_writes                                       0x400000 (1.0.0)
                                                  cl_intel_planar_yuv                                              0x400000 (1.0.0)
                                                  cl_intel_packed_yuv                                              0x400000 (1.0.0)
                                                  cl_intel_motion_estimation                                       0x400000 (1.0.0)
                                                  cl_intel_device_side_avc_motion_estimation                       0x400000 (1.0.0)
                                                  cl_intel_advanced_motion_estimation                              0x400000 (1.0.0)
                                                  cl_khr_int64_base_atomics                                        0x400000 (1.0.0)
                                                  cl_khr_int64_extended_atomics                                    0x400000 (1.0.0)
                                                  cl_khr_image2d_from_buffer                                       0x400000 (1.0.0)
                                                  cl_khr_depth_images                                              0x400000 (1.0.0)
                                                  cl_khr_3d_image_writes                                           0x400000 (1.0.0)
                                                  cl_intel_media_block_io                                          0x400000 (1.0.0)
                                                  cl_intel_va_api_media_sharing                                    0x400000 (1.0.0)
                                                  cl_intel_sharing_format_query                                    0x400000 (1.0.0)
                                                  cl_khr_pci_bus_info                                              0x400000 (1.0.0)


  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  Intel(R) OpenCL HD Graphics
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [INTEL]
  clCreateContext(NULL, ...) [default]            Success [INTEL]
  clCreateContext(NULL, ...) [other]              Success [POCL]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 Intel(R) OpenCL HD Graphics
    Device Name                                   Intel(R) Graphics [0x5917]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 Intel(R) OpenCL HD Graphics
    Device Name                                   Intel(R) Graphics [0x5917]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 Intel(R) OpenCL HD Graphics
    Device Name                                   Intel(R) Graphics [0x5917]

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.3.1
  ICD loader Profile                              OpenCL 3.0

The clinfo output above is truncated to show only the Intel HD Graphics platform.

user@WSL2:~$ sudo hashcat -I
hashcat (v6.2.5) starting in backend information mode

clGetDeviceIDs(): CL_DEVICE_NOT_FOUND

clGetDeviceIDs(): CL_DEVICE_NOT_FOUND

OpenCL Info:
============

OpenCL Platform ID #1
  Vendor..: Intel(R) Corporation
  Name....: Intel(R) OpenCL HD Graphics
  Version.: OpenCL 3.0

  Backend Device ID #1
    Type...........: GPU
    Vendor.ID......: 8
    Vendor.........: Intel(R) Corporation
    Name...........: Intel(R) Graphics [0x5917]
    Version........: OpenCL 3.0 NEO
    Processor(s)...: 24
    Clock..........: 1150
    Memory.Total...: 4864 MB (limited to 512 MB allocatable in one block)
    Memory.Free....: 2400 MB
    OpenCL.Version.: OpenCL C 3.0
    Driver.Version.: 21.35.20826

OpenCL Platform ID #2
  Vendor..: The pocl project
  Name....: Portable Computing Language
  Version.: OpenCL 2.0 pocl 1.8  Linux, None+Asserts, RELOC, LLVM 11.1.0, SLEEF, DISTRO, POCL_DEBUG

  Backend Device ID #2
    Type...........: CPU
    Vendor.ID......: 128
    Vendor.........: GenuineIntel
    Name...........: pthread-Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
    Version........: OpenCL 1.2 pocl HSTR: pthread-x86_64-pc-linux-gnu-skylake
    Processor(s)...: 8
    Clock..........: 2111
    Memory.Total...: 4399 MB (limited to 1024 MB allocatable in one block)
    Memory.Free....: 2167 MB
    OpenCL.Version.: OpenCL C 1.2 pocl
    Driver.Version.: 1.8

OpenCL Platform ID #3
  Vendor..: Mesa
  Name....: Clover
  Version.: OpenCL 1.1 Mesa 22.2.5

It is not a myth. Hope this clears things up for everyone.
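A quick way to tell whether the ICD loader sees anything is to read the "Number of platforms" line that clinfo prints, as in the listings in this thread. The sketch below assumes clinfo's current text layout; the helper names clinfo_platform_count and clinfo_platform_names are hypothetical and not part of clinfo.

```shell
# Hypothetical helpers: both read `clinfo` output on stdin.

# Print the number from the "Number of platforms" line, or 0 if that line is
# missing (for example because clinfo aborted before printing it).
clinfo_platform_count() {
  awk '/^Number of platforms/ { n = $NF; found = 1; exit }
       END { if (found) print n; else print 0 }'
}

# Print each distinct platform name. Columns in clinfo output are separated
# by runs of two or more spaces, so split on that; $1 is the empty indent.
clinfo_platform_names() {
  awk -F '  +' '/^  Platform Name/ { print $3 }' | sort -u
}
```

Usage: clinfo | clinfo_platform_count prints 0 in the situation this issue describes.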

edmondium@LAPTOP-1Q9H40K6:~$ clinfo
Abort was called at 54 line in file:
./shared/source/os_interface/windows/wddm/create_um_km_data_translator.cpp
Aborted

lmeyerov · 2023-11-05T09:23:07Z

Looks like an Intel CPU/GPU again (see above), so same status.

alex-ong · 2024-01-07T05:56:19Z

Has anyone gotten OpenCL working with AMD CPUs (e.g. 2700X, 5600X, 5800X)? At a bare minimum I could do some dev work if that works. Production is a Linux machine running Linux Docker instances, which forwards NVIDIA GPUs perfectly fine. I read a few things saying you could "just install the Intel CPU OpenCL driver", and I installed it, but I still get 0 platforms in clinfo.
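When clinfo reports 0 platforms after installing a runtime, one thing worth checking is whether that runtime registered itself with the ICD loader. The ocl-icd loader reads *.icd files from /etc/OpenCL/vendors by default, or from the directory named in OCL_ICD_VENDORS; each file holds the name or path of a vendor library. The hypothetical list_opencl_icds helper below is a sketch that only handles the directory form of OCL_ICD_VENDORS.

```shell
# Hypothetical helper: list the ICD registrations the loader would scan,
# one "file: library" line each. Returns non-zero if none are found.
list_opencl_icds() {
  # Explicit argument wins, then OCL_ICD_VENDORS, then the default directory
  icd_dir="${1:-${OCL_ICD_VENDORS:-/etc/OpenCL/vendors}}"
  if [ ! -d "$icd_dir" ]; then
    echo "no ICD directory at $icd_dir" >&2
    return 1
  fi
  icd_found=0
  for icd_file in "$icd_dir"/*.icd; do
    # An unmatched glob stays literal, so skip it
    [ -e "$icd_file" ] || continue
    printf '%s: %s\n' "$(basename "$icd_file")" "$(head -n 1 "$icd_file")"
    icd_found=1
  done
  [ "$icd_found" -eq 1 ]
}
```

An empty listing means the loader has nothing to report, which matches the "0 platforms" result regardless of what else is installed.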

Edit: If you use a recent enough version of Ubuntu (I used 24.04, which is bleeding edge), you can just apt install pocl-opencl-icd. I was using miniconda3, so I manually built my own image based on ubuntu:24.04, copying the miniconda3 Docker commands exactly and then adding apt install pocl-opencl-icd at the end. This successfully showed my 5600X as an OpenCL device in clinfo. This does not work for me on 22.04 because its pocl is too old, and recompiling it requires far too many dependencies. So you could probably get it working on 22.04 too, with enough effort.

(base) root@cfbb31c89f97:/# clinfo
Number of platforms                               1
  Platform Name                                   Portable Computing Language
  Platform Vendor                                 The pocl project
  Platform Version                                OpenCL 3.0 PoCL 4.0+debian  Linux, None+Asserts, RELOC, SPIR, LLVM 15.0.7, SLEEF, DISTRO, POCL_DEBUG
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_pocl_content_size
  Platform Extensions with Version                cl_khr_icd                                                       0x400000 (1.0.0)
                                                  cl_pocl_content_size                                             0x400000 (1.0.0)
  Platform Numeric Version                        0xc00000 (3.0.0)
  Platform Extensions function suffix             POCL
  Platform Host timer resolution                  0ns

  Platform Name                                   Portable Computing Language
Number of devices                                 1
  Device Name                                     cpu-haswell-AMD Ryzen 5 5600X 6-Core Processor

Note that this is still not a solution for NVIDIA/AMD GPU OpenCL passthrough, but it's good enough for my development needs.
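The "Ubuntu 24.04 or newer" condition from this comment can be checked in a script before running apt install pocl-opencl-icd. This is a minimal sketch: the 24.04 cutoff comes from this one report rather than from any package metadata, and ubuntu_has_new_pocl is a hypothetical name.

```shell
# Hypothetical helper: succeed if an os-release file (default
# /etc/os-release) describes Ubuntu 24.04 or newer.
ubuntu_has_new_pocl() {
  osrel="${1:-/etc/os-release}"
  [ -r "$osrel" ] || return 1
  # Source the file in subshells so its variables don't leak into this shell
  os_id=$( . "$osrel"; echo "$ID" )
  os_ver=$( . "$osrel"; echo "$VERSION_ID" )
  [ "$os_id" = "ubuntu" ] || return 1
  # "24.04" -> "24"; reject anything that is not a plain number
  os_major=${os_ver%%.*}
  case "$os_major" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$os_major" -ge 24 ]
}
```

For example: ubuntu_has_new_pocl && sudo apt install pocl-opencl-icd clinfo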

Seralpa · 2024-01-18T10:52:40Z

@alex-ong I have it working in WSL2 Ubuntu 22.04 without PoCL on a laptop with a 5800HS, but for the Intel platforms to be detected I have to source the setvars.sh script from the oneAPI installation: source /opt/intel/oneapi/setvars.sh

I installed it a long time ago so I don't recall the details of how I installed it. But I don't remember having much trouble with it.
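Because setvars.sh only changes the environment of the shell that sources it, it has to be sourced in every new shell (or from a shell startup file) before clinfo can see the Intel platforms. A guarded sketch, assuming the default /opt/intel/oneapi install prefix mentioned in this comment; load_oneapi_env is a hypothetical helper name.

```shell
# Hypothetical helper: source oneAPI's environment script if it exists.
load_oneapi_env() {
  setvars="${1:-/opt/intel/oneapi/setvars.sh}"
  if [ ! -f "$setvars" ]; then
    echo "oneAPI setvars.sh not found at $setvars" >&2
    return 1
  fi
  # Clear this function's arguments so the sourced script does not see
  # the path as one of its own options
  set --
  # Must be sourced (not executed) so its exports reach the calling shell
  . "$setvars" >/dev/null
}
```

The function itself must also be defined in the current shell (not run as a separate script), otherwise the exported variables are lost when the script exits.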

Bossach · 2024-01-29T02:55:12Z

I was able to run OpenCL on an NVIDIA GPU in WSL2 via PoCL.
The "NVIDIA GeForce RTX 3060 Ti" device appears in the clinfo output (listing below), and OpenCL apps work.
Windows Task Manager also shows GPU CUDA utilization while OpenCL programs run.
(I can't say much about performance, but there is a benchmark below.)

I took the following steps:

Install the latest Windows NVIDIA drivers (I don't know since which version, but recent ones expose the GPU inside WSL).

1.1. Now you can run nvidia-smi in WSL to verify that it works.
$ nvidia-smi listing:

Mon Jan 29 04:05:38 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.40.06              Driver Version: 551.23         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060 Ti     On  |   00000000:01:00.0  On |                  N/A |
|  0%   37C    P8             11W /  225W |     607MiB /   8192MiB |      7%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        20      G   /Xwayland                                   N/A      |
|    0   N/A  N/A        42      G   /code                                       N/A      |
|    0   N/A  N/A        79      G   /code                                       N/A      |
+-----------------------------------------------------------------------------------------+
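For scripts, nvidia-smi -L gives a shorter listing than the full table above, one line per GPU in the form GPU 0: <name> (UUID: ...). The sketch below assumes that format; nvidia_gpu_names and wsl_gpu_visible are hypothetical helper names.

```shell
# Hypothetical helper: print GPU names from `nvidia-smi -L` output on stdin.
nvidia_gpu_names() {
  # Keep only "GPU <n>: <name> (UUID: ...)" lines and extract <name>
  sed -n 's/^GPU [0-9][0-9]*: \(.*\) (UUID: .*)$/\1/p'
}

# Succeed if nvidia-smi runs and lists at least one GPU.
wsl_gpu_visible() {
  [ -n "$(nvidia-smi -L 2>/dev/null | nvidia_gpu_names)" ]
}
```

If wsl_gpu_visible fails, there is no point building PoCL's CUDA backend yet; the Windows driver side needs fixing first.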

DO NOT install any GPU/CUDA drivers inside WSL.
Install the CUDA Toolkit, "WSL-Ubuntu" version, from here (NVIDIA page).
(I run Debian in WSL, but it installed without problems, apart from the shock of downloading ~9 GB of packages.)
Again, make sure you don't install Linux drivers (the "WSL-Ubuntu" version is supposed not to contain them).
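One way to check that no Linux driver has ended up inside WSL is to see where the dynamic linker resolves libcuda.so.1: with only the Windows driver installed, it should come from /usr/lib/wsl/lib. A minimal sketch, assuming the usual ldconfig -p line format (libcuda.so.1 (libc6,x86-64) => /path); libcuda_only_from_wsl is a hypothetical name.

```shell
# Hypothetical helper: read `ldconfig -p` output on stdin; succeed only if
# libcuda.so.1 is present and every entry lives under /usr/lib/wsl/lib.
libcuda_only_from_wsl() {
  awk '/libcuda\.so\.1 / {
         seen = 1
         # The resolved path is the last field after "=>"
         if ($NF !~ /^\/usr\/lib\/wsl\/lib\//) { print "non-WSL libcuda: " $NF; bad = 1 }
       }
       END { if (seen && !bad) exit 0; exit 1 }'
}
```

Usage: ldconfig -p | libcuda_only_from_wsl prints any offending path and fails if a non-WSL libcuda is registered.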
AFAIK you need LLVM/Clang installed to compile kernels via PoCL,
so $ sudo apt install llvm clang, I think.
(There is also an "LLVM-less build" option, but I didn't bother with it.)
Install the packages required to build PoCL
(I'm almost sure I forgot something):
$ sudo apt install ...
libclang-dev (maybe also libclang-{version}-dev)
libclang-common-{version}-dev
libclang-cpp (maybe also libclang-cpp{version})
libclang-cpp-dev (libclang-cpp{ver}-dev)
ocl-icd-libopencl1 (maybe also ocl-icd-opencl-dev) - ICD loader
opencl-headers (opencl-c-headers opencl-clhpp-headers)
valgrind (because some CUDA-related PoCL sources require it)
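The package list above can be expanded for a specific LLVM version. This sketch assumes Debian/Ubuntu package naming and, unlike the plain llvm clang suggested in this comment, pins llvm and clang to the same major version as the libclang packages; pocl_build_deps is a hypothetical helper and, as the author warns, the list may be incomplete.

```shell
# Hypothetical helper: print build dependencies for PoCL, one per line,
# for the LLVM major version given as $1 (e.g. 15).
pocl_build_deps() {
  llvm_ver="$1"
  if [ -z "$llvm_ver" ]; then
    echo "usage: pocl_build_deps LLVM_MAJOR" >&2
    return 1
  fi
  printf '%s\n' \
    "llvm-$llvm_ver" "clang-$llvm_ver" \
    "libclang-$llvm_ver-dev" "libclang-common-$llvm_ver-dev" \
    "libclang-cpp$llvm_ver" "libclang-cpp$llvm_ver-dev" \
    ocl-icd-libopencl1 ocl-icd-opencl-dev \
    opencl-headers opencl-c-headers opencl-clhpp-headers \
    valgrind
}
```

For example, matching the LLVM 15 shown in the clinfo listing: sudo apt install $(pocl_build_deps 15)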
Download and build PoCL (GitHub).
I built it with these options (from the pocl source directory):

# -L/usr/lib/wsl/lib lets ld find libcuda.so (I don't know which of the two flags is necessary, but this works)
# Set ENABLE_HOST_CPU_DEVICES=ON if you also want your CPU as an OpenCL device
$ cmake -B {your-build-dir} \
    -DCMAKE_C_FLAGS=-L/usr/lib/wsl/lib \
    -DCMAKE_CXX_FLAGS=-L/usr/lib/wsl/lib \
    -DENABLE_HOST_CPU_DEVICES=OFF \
    -DENABLE_CUDA=ON

Then run $ cmake --build {your-build-dir} -j{num of threads}, pray, and fix any problems that arise.

After a successful build, you can test it without installing:
$ export POCL_BUILDING=1 (tells PoCL that it can run from the build directory)
$ export OCL_ICD_VENDORS={full-path-to-your-build-dir}/ocl-vendors/ (tells the ocl-icd loader where to find PoCL)
Voilà!
Now you can run clinfo and other OpenCL apps.
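The two exports can be wrapped in a small function that also checks that the build produced the ocl-vendors directory and converts the build path into the full path the loader expects. A sketch; use_pocl_build_tree is a hypothetical name, and it must be defined and called in the current shell for the exports to take effect.

```shell
# Hypothetical helper: point the ICD loader at an uninstalled PoCL build.
use_pocl_build_tree() {
  pocl_build="$1"
  if [ ! -d "$pocl_build/ocl-vendors" ]; then
    echo "no ocl-vendors/ under $pocl_build; did the build finish?" >&2
    return 1
  fi
  # Resolve to an absolute path (cd runs in a subshell, so $PWD is unchanged)
  pocl_build=$(cd "$pocl_build" && pwd)
  export POCL_BUILDING=1
  export OCL_ICD_VENDORS="$pocl_build/ocl-vendors/"
}
```

Usage: use_pocl_build_tree ./build && clinfo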

You can also run $ cmake --install {your-build-dir} to install it system-wide if needed (I haven't, so this is untested).

My clinfo listing:

Number of platforms                               1
  Platform Name                                   Portable Computing Language
  Platform Vendor                                 The pocl project
  Platform Version                                OpenCL 3.0 PoCL 5.1-pre main-0-g8053faf0  Linux, Debug+Asserts, RELOC, SPIR, LLVM 15.0.6, SLEEF, CUDA, POCL_DEBUG
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_pocl_content_size
  Platform Extensions with Version                cl_khr_icd                                                       0x400000 (1.0.0)
                                                  cl_pocl_content_size                                             0x400000 (1.0.0)
  Platform Numeric Version                        0xc00000 (3.0.0)
  Platform Extensions function suffix             POCL
  Platform Host timer resolution                  0ns

  Platform Name                                   Portable Computing Language
Number of devices                                 1
  Device Name                                     NVIDIA GeForce RTX 3060 Ti
  Device Vendor                                   NVIDIA Corporation
  Device Vendor ID                                0x10de
  Device Version                                  OpenCL 3.0 PoCL HSTR: CUDA-sm_86
  Device Numeric Version                          0xc00000 (3.0.0)
  Driver Version                                  5.1-pre main-0-g8053faf0
  Device OpenCL C Version                         OpenCL C 1.2 PoCL
  Device OpenCL C all versions                    OpenCL C                                                         0x400000 (1.0.0)
                                                  OpenCL C                                                         0x401000 (1.1.0)
                                                  OpenCL C                                                         0x402000 (1.2.0)
                                                  OpenCL C                                                         0xc00000 (3.0.0)
  Device OpenCL C features                        __opencl_c_images                                                0xc00000 (3.0.0)
                                                  __opencl_c_atomic_order_acq_rel                                  0xc00000 (3.0.0)
                                                  __opencl_c_atomic_order_seq_cst                                  0xc00000 (3.0.0)
                                                  __opencl_c_atomic_scope_device                                   0xc00000 (3.0.0)
                                                  __opencl_c_program_scope_global_variables                        0xc00000 (3.0.0)
                                                  __opencl_c_generic_address_space                                 0xc00000 (3.0.0)
                                                  __opencl_c_fp16                                                  0xc00000 (3.0.0)
                                                  __opencl_c_fp64                                                  0xc00000 (3.0.0)
  Latest conformance test passed                  (n/a)
  Device Type                                     GPU
  Device Topology (NV)                            PCI-E, 0000:01:00.0
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               38
  Max clock frequency                             1695MHz
  Compute Capability (NV)                         8.6
  Device Partition                                (core)
    Max number of sub-devices                     1
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x64
  Max work group size                             1024
  Preferred work group size multiple (device)     32
  Preferred work group size multiple (kernel)     32
  Warp size (NV)                                  32
  Max sub-groups per work group                   32
  Preferred / native vector sizes                 
    char                                                 1 / 1       
    short                                                1 / 1       
    int                                                  1 / 1       
    long                                                 1 / 1       
    half                                                 0 / 0        (cl_khr_fp16)
    float                                                1 / 1       
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     No
    Infinity and NANs                             No
    Round to nearest                              No
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              8589279232 (7.999GiB)
  Error Correction support                        No
  Max memory allocation                           2147319808 (2GiB)
  Unified memory for Host and Device              No
  Integrated memory (NV)                          No
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   Yes
    Fine-grained system sharing                   No
    Atomics                                       No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       4096 bits (512 bytes)
  Preferred alignment for atomics                 
    SVM                                           64 bytes
    Global                                        64 bytes
    Local                                         64 bytes
  Atomic memory capabilities                      relaxed, work-group scope
  Atomic fence capabilities                       relaxed, acquire/release, work-group scope
  Max size for global variable                    0
  Preferred total size of global vars             0
  Global Memory cache type                        None
  Image support                                   No
  Pipe support                                    No
  Max number of pipe args                         0
  Max active pipe reservations                    0
  Max pipe packet size                            0
  Local memory type                               Local
  Local memory size                               49152 (48KiB)
  Registers per block (NV)                        65536
  Max number of constant args                     8
  Max constant buffer size                        65536 (64KiB)
  Generic address space support                   Yes
  Max size of kernel argument                     4352 (4.25KiB)
  Queue properties (on host)                      
    Out-of-order execution                        No
    Profiling                                     Yes
  Device enqueue capabilities                     (n/a)
  Queue properties (on device)                    
    Out-of-order execution                        No
    Profiling                                     No
    Preferred size                                0
    Max size                                      0
  Max queues on device                            0
  Max events on device                            0
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      1ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Non-uniform work-groups                       No
    Work-group collective functions               No
    Sub-group independent forward progress        Yes
    Kernel execution timeout (NV)                 Yes
    Concurrent copy and kernel execution (NV)     Yes
      Number of async copy engines                5
    IL version                                    (n/a)
    ILs with version                              (n/a)
    SPIR versions                                 (n/a)
  printf() buffer size                            16777216 (16MiB)
  Built-in kernels                                pocl.mul.i32;pocl.add.i32;pocl.dnn.conv2d_int8_relu;pocl.sgemm.local.f32;pocl.sgemm.tensor.f16f16f32;pocl.sgemm_ab.tensor.f16f16f32;pocl.abs.f32;pocl.add.i8;org.khronos.openvx.scale_image.nn.u8;org.khronos.openvx.scale_image.bl.u8;org.khronos.openvx.tensor_convert_depth.wrap.u8.f32
  Built-in kernels with version                   pocl.mul.i32                                                     0x402000 (1.2.0)
                                                  pocl.add.i32                                                     0x402000 (1.2.0)
                                                  pocl.dnn.conv2d_int8_relu                                        0x402000 (1.2.0)
                                                  pocl.sgemm.local.f32                                             0x402000 (1.2.0)
                                                  pocl.sgemm.tensor.f16f16f32                                      0x402000 (1.2.0)
                                                  pocl.sgemm_ab.tensor.f16f16f32                                   0x402000 (1.2.0)
                                                  pocl.abs.f32                                                     0x402000 (1.2.0)
                                                  pocl.add.i8                                                      0x402000 (1.2.0)
                                                  org.khronos.openvx.scale_image.nn.u8                             0x402000 (1.2.0)
                                                  org.khronos.openvx.scale_image.bl.u8                             0x402000 (1.2.0)
                                                  org.khronos.openvx.tensor_convert_depth.wrap.u8.f32              0x402000 (1.2.0)
  Device Extensions                               cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics     cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics     cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics     cl_khr_int64_extended_atomics cl_nv_device_attribute_query cl_khr_spir cl_khr_fp16 cl_khr_fp64
  Device Extensions with Version                  cl_khr_byte_addressable_store                                    0x400000 (1.0.0)
                                                  cl_khr_global_int32_base_atomics                                 0x400000 (1.0.0)
                                                  cl_khr_global_int32_extended_atomics                             0x400000 (1.0.0)
                                                  cl_khr_local_int32_base_atomics                                  0x400000 (1.0.0)
                                                  cl_khr_local_int32_extended_atomics                              0x400000 (1.0.0)
                                                  cl_khr_int64_base_atomics                                        0x400000 (1.0.0)
                                                  cl_khr_int64_extended_atomics                                    0x400000 (1.0.0)
                                                  cl_nv_device_attribute_query                                     0x400000 (1.0.0)
                                                  cl_khr_spir                                                      0x801000 (2.1.0)
                                                  cl_khr_fp16                                                      0x400000 (1.0.0)
                                                  cl_khr_fp64                                                      0x400000 (1.0.0)

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   No platform
  clCreateContext(NULL, ...) [default]            No platform
  clCreateContext(NULL, ...) [other]              Success [POCL]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 Portable Computing Language
    Device Name                                   NVIDIA GeForce RTX 3060 Ti
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 Portable Computing Language
    Device Name                                   NVIDIA GeForce RTX 3060 Ti
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 Portable Computing Language
    Device Name                                   NVIDIA GeForce RTX 3060 Ti

And here is a benchmark (I don't know what these numbers mean, or whether they are good or bad):

.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /               FluidX3D Version 2.12 |
|                                      '     Copyright (c) Dr. Moritz Lehmann |
|-----------------------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA GeForce RTX 3060 Ti                                 |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA GeForce RTX 3060 Ti                                 |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 5.1-pre main-0-g8053faf0 (Linux)                           |
| OpenCL Version | OpenCL C 1.2 PoCL                                          |
| Compute Units  | 38 at 1695 MHz (4864 cores, 16.489 TFLOPs/s)               |
| Memory, Cache  | 8191 MB, 0 KB global / 48 KB local                         |
| Buffer Limits  | 2047 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
| Info: Allocating memory. This may take a few seconds.                       |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                     D3Q19 SRT (FP32/FP32) |
| Memory Usage    |                                CPU 272 MB, GPU 1x 1488 MB |
| Max Alloc Size  |                                                   1216 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    3307 |    506 GB/s |       197 |         9990   0% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 3466                                                   |
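For anyone wondering the same thing: the numbers in this box are consistent with FluidX3D's own memory accounting for D3Q19 SRT with FP32/FP32 storage. As I understand its README, that is 93 bytes stored per cell and 153 bytes of memory traffic per cell update. Treat these two constants as an assumption; they are not printed in the output. Because LBM is memory-bound, the useful comparison is the reported bandwidth against the memory bandwidth in the card's spec sheet, not the TFLOPs figure:

```latex
\begin{aligned}
N &= 256^3 = 16\,777\,216\ \text{cells} \\
\text{GPU memory} &= N \times 93\ \text{B} = 1\,560\,281\,088\ \text{B} = 1488\ \text{MiB} \\
\text{largest buffer (DDFs)} &= N \times 19 \times 4\ \text{B} = 1\,275\,068\,416\ \text{B} = 1216\ \text{MiB} \\
\text{traffic per update} &= \underbrace{2 \times 19 \times 4\ \text{B}}_{\text{DDFs read + written}} + \underbrace{1\ \text{B}}_{\text{flags}} = 153\ \text{B} \\
\text{bandwidth} &= 3307 \times 10^{6}\ \tfrac{\text{updates}}{\text{s}} \times 153\ \text{B} \approx 506\ \text{GB/s} \\
\text{peak FP32} &= \underbrace{38 \times 128}_{4864\ \text{cores}} \times 2\ \tfrac{\text{FLOP}}{\text{cycle}} \times 1.695\ \text{GHz} \approx 16.489\ \text{TFLOP/s}
\end{aligned}
```

These reproduce the Memory Usage, Max Alloc Size, Bandwidth and TFLOPs figures reported above, so the run itself looks sane. Whether 506 GB/s is "good" depends on the memory bandwidth listed for your exact card model.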

husmen · 2024-02-03T21:31:59Z

@Bossach Now this looks promising! PoCL too seems to have come a long way since I last checked. I might give it a try at some point.

joaomamede · 2024-02-21T04:13:34Z

I was able to run OpenCL on NVIDIA under WSL2 via PoCL. The "NVIDIA GeForce RTX 3060 Ti" device shows up in the clinfo output (listing below) and OpenCL apps work. Windows Task Manager also shows CUDA utilization on the GPU while CL programs run (I can't say anything about performance, but there is a benchmark below).

I took the following steps:

1. Install the latest Windows NVIDIA drivers (I don't know which version introduced this, but recent ones can expose the GPU inside WSL).

1.1. You can now run nvidia-smi in WSL to make sure it works. Output of $ nvidia-smi:

Mon Jan 29 04:05:38 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.40.06              Driver Version: 551.23         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060 Ti     On  |   00000000:01:00.0  On |                  N/A |
|  0%   37C    P8             11W /  225W |     607MiB /   8192MiB |      7%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        20      G   /Xwayland                                   N/A      |
|    0   N/A  N/A        42      G   /code                                       N/A      |
|    0   N/A  N/A        79      G   /code                                       N/A      |
+-----------------------------------------------------------------------------------------+

DO NOT install any GPU/CUDA drivers inside WSL.
Install the "WSL-Ubuntu" version of the CUDA Toolkit from here (NVIDIA page).
(I have Debian on my WSL, but it installed without any problems, apart from the shock of having to download ~9 GB of packages...)
Again, make sure you don't install the Linux drivers (the "WSL-Ubuntu" version is supposed not to contain them).
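A quick way to check that no Linux driver has sneaked in: in WSL the Windows driver's libcuda is exposed under /usr/lib/wsl/lib (the same directory used for -L in the build flags), so any other libcuda.so in the linker cache is suspect. This is a minimal sketch under that assumption. The function name is my own, and the idea that a stray Linux driver shows up elsewhere (typically /usr/lib/x86_64-linux-gnu) is an assumption, not something tested here.

```shell
#!/bin/sh
# Sketch (assumption): libcuda provided by the Windows driver lives in
# /usr/lib/wsl/lib; any libcuda.so found elsewhere in the linker cache most
# likely comes from a Linux driver installed inside WSL.
#
# Reads `ldconfig -p` style lines on stdin, prints every libcuda path that is
# not under /usr/lib/wsl/lib, and returns 1 if at least one was found.
find_foreign_libcuda() {
    awk '
        /libcuda\.so/ {
            path = $NF                       # resolved path is the last field
            if (path !~ /^\/usr\/lib\/wsl\/lib\//) {
                print path
                bad = 1
            }
        }
        END { exit bad }                     # unset bad counts as 0
    '
}

# Usage inside WSL:
#   ldconfig -p | find_foreign_libcuda && echo "OK: libcuda only from /usr/lib/wsl/lib"
```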
AFAIK you need LLVM/Clang installed in order to compile kernels with PoCL,
so, I think: $ sudo apt install llvm clang
(There is also the possibility of an "LLVM-less build", but never mind that.)
Install the packages required to build PoCL
(I'm almost sure I forgot something):
$ sudo apt install ...
libclang-dev (maybe also libclang-{version}-dev)
libclang-common-{version}-dev
libclang-cpp (maybe also libclang-cpp{version})
libclang-cpp-dev (libclang-cpp{version}-dev)
ocl-icd-libopencl1 (maybe also ocl-icd-opencl-dev) - ICD loader
opencl-headers (opencl-c-headers opencl-clhpp-headers)
valgrind (because some CUDA-related PoCL sources require it)
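The package list above can be collapsed into one apt command once you pick an LLVM version. A minimal sketch, assuming Debian/Ubuntu package naming and LLVM 15 (the version the clinfo output in this thread reports). llvm-$v-dev and cmake are my additions, since PoCL builds against the LLVM development files and the build uses cmake. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: print the build dependencies listed above for one LLVM major
# version (Debian/Ubuntu naming). llvm-$v-dev and cmake are additions that
# were not in the original list.
pocl_build_deps() {
    v="$1"   # LLVM major version, e.g. 15
    echo "llvm-$v llvm-$v-dev clang-$v" \
         "libclang-$v-dev libclang-common-$v-dev" \
         "libclang-cpp$v libclang-cpp$v-dev" \
         "ocl-icd-libopencl1 ocl-icd-opencl-dev opencl-headers" \
         "valgrind cmake"
}

# Usage (the unquoted $(...) is intentional, to split into package names):
#   sudo apt install $(pocl_build_deps 15)
```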
Download and build PoCL (GitHub).
I built it with these variables (from the pocl directory):

$ cmake -B {your-build-dir} \
    -DCMAKE_C_FLAGS=-L/usr/lib/wsl/lib \
    -DCMAKE_CXX_FLAGS=-L/usr/lib/wsl/lib \
    -DENABLE_HOST_CPU_DEVICES=OFF \
    -DENABLE_CUDA=ON

-L/usr/lib/wsl/lib lets ld find libcuda.so (I don't know which of the two FLAGS variables is necessary, but this works). You can leave ENABLE_HOST_CPU_DEVICES 'ON' if you also want your CPU as an OpenCL device.

Then run $ cmake --build {your-build-dir} -j{num of threads}, pray, and fix any problems that come up.
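If you want to keep a note next to each flag, the configure and build steps can be written so that every flag is appended by its own command; a comment placed after a trailing backslash would otherwise end the command early. A minimal POSIX sh sketch; the function names and the use of nproc for the job count are my own choices, not part of the original steps.

```shell
#!/bin/sh
# Sketch: configure and build PoCL with the same flags as above, one flag per
# `set --` command so each can carry its own comment.
pocl_configure() {
    build_dir="$1"
    set -- -B "$build_dir"
    set -- "$@" -DCMAKE_C_FLAGS=-L/usr/lib/wsl/lib     # let ld find libcuda.so
    set -- "$@" -DCMAKE_CXX_FLAGS=-L/usr/lib/wsl/lib   # unclear which of the two is needed
    set -- "$@" -DENABLE_HOST_CPU_DEVICES=OFF          # ON also exposes the CPU as a device
    set -- "$@" -DENABLE_CUDA=ON                       # build PoCL's CUDA backend
    cmake "$@"                                         # run from the pocl source directory
}

pocl_build() {
    cmake --build "$1" -j"$(nproc)"                    # one job per available CPU
}

# Usage, from the pocl source directory:
#   pocl_configure build && pocl_build build
```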

On a successful build you can test it without installing:
$ export POCL_BUILDING=1 - tells PoCL that it may run from the build directory
$ export OCL_ICD_VENDORS={full-path-to-your-build-dir}/ocl-vendors/ - tells the ocl-icd loader where to find PoCL
Voilà! Now you can run 'clinfo' and other OpenCL apps.
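To see what the OCL_ICD_VENDORS export actually does: the ocl-icd loader scans a vendors directory (by default /etc/OpenCL/vendors) for *.icd files, each containing the name or path of an OpenCL library to load. A small sketch that prints those entries when the variable points to a directory, as in the steps above, so you can confirm the loader will pick up PoCL from your build before running clinfo. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: list the libraries an ocl-icd style loader would try to load from a
# vendors directory. Each *.icd file holds the name or path of one OpenCL
# implementation library.
list_icd_libraries() {
    dir="${1:-${OCL_ICD_VENDORS:-/etc/OpenCL/vendors}}"
    for f in "$dir"/*.icd; do
        [ -e "$f" ] || continue              # no *.icd files: glob stays literal
        printf '%s: %s\n' "$(basename "$f")" "$(head -n 1 "$f")"
    done
}

# Usage after the two exports above:
#   list_icd_libraries   # I'd expect this to name libpocl.so from your build dir
#   clinfo -l            # compact platform/device list
```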

You can also run $ cmake --install {your-build-dir} to install it system-wide if you need to (I don't, so I haven't tested it).

My clinfo listing:

Number of platforms                               1
  Platform Name                                   Portable Computing Language
  Platform Vendor                                 The pocl project
  Platform Version                                OpenCL 3.0 PoCL 5.1-pre main-0-g8053faf0  Linux, Debug+Asserts, RELOC, SPIR, LLVM 15.0.6, SLEEF, CUDA, POCL_DEBUG
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_pocl_content_size
  Platform Extensions with Version                cl_khr_icd                                                       0x400000 (1.0.0)
                                                  cl_pocl_content_size                                             0x400000 (1.0.0)
  Platform Numeric Version                        0xc00000 (3.0.0)
  Platform Extensions function suffix             POCL
  Platform Host timer resolution                  0ns

  Platform Name                                   Portable Computing Language
Number of devices                                 1
  Device Name                                     NVIDIA GeForce RTX 3060 Ti
  Device Vendor                                   NVIDIA Corporation
  Device Vendor ID                                0x10de
  Device Version                                  OpenCL 3.0 PoCL HSTR: CUDA-sm_86
  Device Numeric Version                          0xc00000 (3.0.0)
  Driver Version                                  5.1-pre main-0-g8053faf0
  Device OpenCL C Version                         OpenCL C 1.2 PoCL
  Device OpenCL C all versions                    OpenCL C                                                         0x400000 (1.0.0)
                                                  OpenCL C                                                         0x401000 (1.1.0)
                                                  OpenCL C                                                         0x402000 (1.2.0)
                                                  OpenCL C                                                         0xc00000 (3.0.0)
  Device OpenCL C features                        __opencl_c_images                                                0xc00000 (3.0.0)
                                                  __opencl_c_atomic_order_acq_rel                                  0xc00000 (3.0.0)
                                                  __opencl_c_atomic_order_seq_cst                                  0xc00000 (3.0.0)
                                                  __opencl_c_atomic_scope_device                                   0xc00000 (3.0.0)
                                                  __opencl_c_program_scope_global_variables                        0xc00000 (3.0.0)
                                                  __opencl_c_generic_address_space                                 0xc00000 (3.0.0)
                                                  __opencl_c_fp16                                                  0xc00000 (3.0.0)
                                                  __opencl_c_fp64                                                  0xc00000 (3.0.0)
  Latest conformance test passed                  (n/a)
  Device Type                                     GPU
  Device Topology (NV)                            PCI-E, 0000:01:00.0
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               38
  Max clock frequency                             1695MHz
  Compute Capability (NV)                         8.6
  Device Partition                                (core)
    Max number of sub-devices                     1
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x64
  Max work group size                             1024
  Preferred work group size multiple (device)     32
  Preferred work group size multiple (kernel)     32
  Warp size (NV)                                  32
  Max sub-groups per work group                   32
  Preferred / native vector sizes                 
    char                                                 1 / 1       
    short                                                1 / 1       
    int                                                  1 / 1       
    long                                                 1 / 1       
    half                                                 0 / 0        (cl_khr_fp16)
    float                                                1 / 1       
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     No
    Infinity and NANs                             No
    Round to nearest                              No
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              8589279232 (7.999GiB)
  Error Correction support                        No
  Max memory allocation                           2147319808 (2GiB)
  Unified memory for Host and Device              No
  Integrated memory (NV)                          No
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   Yes
    Fine-grained system sharing                   No
    Atomics                                       No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       4096 bits (512 bytes)
  Preferred alignment for atomics                 
    SVM                                           64 bytes
    Global                                        64 bytes
    Local                                         64 bytes
  Atomic memory capabilities                      relaxed, work-group scope
  Atomic fence capabilities                       relaxed, acquire/release, work-group scope
  Max size for global variable                    0
  Preferred total size of global vars             0
  Global Memory cache type                        None
  Image support                                   No
  Pipe support                                    No
  Max number of pipe args                         0
  Max active pipe reservations                    0
  Max pipe packet size                            0
  Local memory type                               Local
  Local memory size                               49152 (48KiB)
  Registers per block (NV)                        65536
  Max number of constant args                     8
  Max constant buffer size                        65536 (64KiB)
  Generic address space support                   Yes
  Max size of kernel argument                     4352 (4.25KiB)
  Queue properties (on host)                      
    Out-of-order execution                        No
    Profiling                                     Yes
  Device enqueue capabilities                     (n/a)
  Queue properties (on device)                    
    Out-of-order execution                        No
    Profiling                                     No
    Preferred size                                0
    Max size                                      0
  Max queues on device                            0
  Max events on device                            0
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      1ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Non-uniform work-groups                       No
    Work-group collective functions               No
    Sub-group independent forward progress        Yes
    Kernel execution timeout (NV)                 Yes
    Concurrent copy and kernel execution (NV)     Yes
      Number of async copy engines                5
    IL version                                    (n/a)
    ILs with version                              (n/a)
    SPIR versions                                 (n/a)
  printf() buffer size                            16777216 (16MiB)
  Built-in kernels                                pocl.mul.i32;pocl.add.i32;pocl.dnn.conv2d_int8_relu;pocl.sgemm.local.f32;pocl.sgemm.tensor.f16f16f32;pocl.sgemm_ab.tensor.f16f16f32;pocl.abs.f32;pocl.add.i8;org.khronos.openvx.scale_image.nn.u8;org.khronos.openvx.scale_image.bl.u8;org.khronos.openvx.tensor_convert_depth.wrap.u8.f32
  Built-in kernels with version                   pocl.mul.i32                                                     0x402000 (1.2.0)
                                                  pocl.add.i32                                                     0x402000 (1.2.0)
                                                  pocl.dnn.conv2d_int8_relu                                        0x402000 (1.2.0)
                                                  pocl.sgemm.local.f32                                             0x402000 (1.2.0)
                                                  pocl.sgemm.tensor.f16f16f32                                      0x402000 (1.2.0)
                                                  pocl.sgemm_ab.tensor.f16f16f32                                   0x402000 (1.2.0)
                                                  pocl.abs.f32                                                     0x402000 (1.2.0)
                                                  pocl.add.i8                                                      0x402000 (1.2.0)
                                                  org.khronos.openvx.scale_image.nn.u8                             0x402000 (1.2.0)
                                                  org.khronos.openvx.scale_image.bl.u8                             0x402000 (1.2.0)
                                                  org.khronos.openvx.tensor_convert_depth.wrap.u8.f32              0x402000 (1.2.0)
  Device Extensions                               cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics     cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics     cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics     cl_khr_int64_extended_atomics cl_nv_device_attribute_query cl_khr_spir cl_khr_fp16 cl_khr_fp64
  Device Extensions with Version                  cl_khr_byte_addressable_store                                    0x400000 (1.0.0)
                                                  cl_khr_global_int32_base_atomics                                 0x400000 (1.0.0)
                                                  cl_khr_global_int32_extended_atomics                             0x400000 (1.0.0)
                                                  cl_khr_local_int32_base_atomics                                  0x400000 (1.0.0)
                                                  cl_khr_local_int32_extended_atomics                              0x400000 (1.0.0)
                                                  cl_khr_int64_base_atomics                                        0x400000 (1.0.0)
                                                  cl_khr_int64_extended_atomics                                    0x400000 (1.0.0)
                                                  cl_nv_device_attribute_query                                     0x400000 (1.0.0)
                                                  cl_khr_spir                                                      0x801000 (2.1.0)
                                                  cl_khr_fp16                                                      0x400000 (1.0.0)
                                                  cl_khr_fp64                                                      0x400000 (1.0.0)

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   No platform
  clCreateContext(NULL, ...) [default]            No platform
  clCreateContext(NULL, ...) [other]              Success [POCL]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 Portable Computing Language
    Device Name                                   NVIDIA GeForce RTX 3060 Ti
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 Portable Computing Language
    Device Name                                   NVIDIA GeForce RTX 3060 Ti
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 Portable Computing Language
    Device Name                                   NVIDIA GeForce RTX 3060 Ti

And here is a benchmark (I don't know what these numbers mean, or whether they are good or bad):

.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /               FluidX3D Version 2.12 |
|                                      '     Copyright (c) Dr. Moritz Lehmann |
|-----------------------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA GeForce RTX 3060 Ti                                 |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA GeForce RTX 3060 Ti                                 |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 5.1-pre main-0-g8053faf0 (Linux)                           |
| OpenCL Version | OpenCL C 1.2 PoCL                                          |
| Compute Units  | 38 at 1695 MHz (4864 cores, 16.489 TFLOPs/s)               |
| Memory, Cache  | 8191 MB, 0 KB global / 48 KB local                         |
| Buffer Limits  | 2047 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
| Info: Allocating memory. This may take a few seconds.                       |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                     D3Q19 SRT (FP32/FP32) |
| Memory Usage    |                                CPU 272 MB, GPU 1x 1488 MB |
| Max Alloc Size  |                                                   1216 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    3307 |    506 GB/s |       197 |         9990   0% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 3466                                                   |
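For anyone else wondering what these numbers mean: the TFLOPs/s figure in the device table and the GB/s column in the results can both be reproduced with simple arithmetic. The sketch below makes three assumptions. First, 128 FP32 cores per compute unit, which matches 38 × 128 = 4864 above. Second, 2 FLOPs per core per clock, counting one fused multiply-add. Third, 153 bytes of memory traffic per lattice update for D3Q19 FP32/FP32: 19 distribution functions × 4 bytes, read and written, plus 1 flag byte. These constants were chosen because they reproduce the printed values. The function names are hypothetical.

```python
# Reproduce the FluidX3D summary numbers above with back-of-the-envelope arithmetic.
# All constants are assumptions that happen to match the printed values.

CORES_PER_CU = 128             # assumed FP32 lanes per compute unit (SM) on this GPU
FLOPS_PER_CORE_PER_CLOCK = 2   # one fused multiply-add counted as 2 FLOPs
BYTES_PER_CELL = 153           # assumed D3Q19 FP32/FP32 traffic: 19 * 4 B * 2 (read + write) + 1 B flags


def peak_tflops(compute_units: int, clock_mhz: float) -> float:
    """Theoretical FP32 peak in TFLOPs/s."""
    cores = compute_units * CORES_PER_CU
    return cores * clock_mhz * 1e6 * FLOPS_PER_CORE_PER_CLOCK / 1e12


def effective_bandwidth_gbs(mlups: float) -> float:
    """Memory bandwidth implied by a rate in million lattice updates per second (MLUPs)."""
    return mlups * 1e6 * BYTES_PER_CELL / 1e9


if __name__ == "__main__":
    # RTX 3060 Ti from the table above: 38 compute units at 1695 MHz, 3307 MLUPs measured
    print(f"cores: {38 * CORES_PER_CU}")
    print(f"peak: {peak_tflops(38, 1695):.3f} TFLOPs/s")
    print(f"bandwidth: {effective_bandwidth_gbs(3307):.0f} GB/s")
```

Running it prints 4864 cores, 16.489 TFLOPs/s and 506 GB/s, matching the table. The GB/s column is derived from MLUPs, so one way to judge the result is to compare it with the card's rated memory bandwidth.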

This worked for me. I installed it, but the environment variables aren't set by default: clinfo only works after I run
$ export POCL_BUILDING=1
$ export OCL_ICD_VENDORS={full-path-to-your-build-dir}/ocl-vendors/
But it doesn't "stick". Should I put this into my .bashrc or rc.local or something like that, or is there a cleaner way?

Bossach · 2024-02-21T11:40:48Z

@joaomamede
The cleaner way is
$ sudo cmake --install {your-build-dir}
It should install PoCL and its ICD file system-wide, and then it should just work.
If not, I would first check that $ ls /etc/OpenCL/vendors lists pocl.icd, that $ cat pocl.icd contains a valid path to /.../libpocl.so..., and that libpocl indeed exists there. If not, something is wrong with the installation.

Alternatively, you can put the exports in your .bashrc, and they should work for all apps you launch from bash under your user (until you accidentally remove the PoCL build directory, since it works from there).
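The check described above can also be scripted. This is a minimal sketch. It assumes the ocl-icd loader convention that each *.icd file in the vendors directory contains one line naming the vendor library. It also treats OCL_ICD_VENDORS as a directory, as in the exports earlier in this thread. check_icd_dir is a hypothetical helper, not part of PoCL.

```python
import os
from pathlib import Path


def check_icd_dir(vendors_dir: str = "/etc/OpenCL/vendors"):
    """Return (icd file name, library path, status) for every *.icd file in vendors_dir.

    status is True if an absolute library path exists, False if it is missing,
    and None for bare library names, which the dynamic linker resolves at load time.
    """
    results = []
    directory = Path(vendors_dir)
    if not directory.is_dir():
        return results
    for icd in sorted(directory.glob("*.icd")):
        lib = icd.read_text().strip()
        status = Path(lib).is_file() if os.path.isabs(lib) else None
        results.append((icd.name, lib, status))
    return results


if __name__ == "__main__":
    # Assumption: OCL_ICD_VENDORS, if set, points to a directory (as in the export above)
    vendors = os.environ.get("OCL_ICD_VENDORS", "/etc/OpenCL/vendors")
    entries = check_icd_dir(vendors)
    if not entries:
        print(f"no .icd files found in {vendors}")
    labels = {True: "found", False: "MISSING", None: "bare name, resolved by the loader"}
    for name, lib, status in entries:
        print(f"{name}: {lib} -> {labels[status]}")
```

An entry marked MISSING corresponds to the "something is wrong with the installation" case.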

Tongzhao9417 · 2024-03-02T08:12:49Z

@Bossach

Thanks for sharing! I followed your steps and it almost worked. However, clinfo reports "unknown target CPU 'sm_89'". Here is my full output and the full benchmark.

clinfo:

Number of platforms                               1
  Platform Name                                   Portable Computing Language
  Platform Vendor                                 The pocl project
  Platform Version                                OpenCL 3.0 PoCL 5.0  Linux, RelWithDebInfo, RELOC, SPIR, LLVM 14.0.0, SLEEF, CUDA, POCL_DEBUG
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_pocl_content_size
  Platform Extensions with Version                cl_khr_icd                                                       0x400000 (1.0.0)
                                                  cl_pocl_content_size                                             0x400000 (1.0.0)
  Platform Numeric Version                        0xc00000 (3.0.0)
  Platform Extensions function suffix             POCL
  Platform Host timer resolution                  0ns

  Platform Name                                   Portable Computing Language
Number of devices                                 1
  Device Name                                     NVIDIA GeForce RTX 4090
  Device Vendor                                   NVIDIA Corporation
  Device Vendor ID                                0x10de
  Device Version                                  OpenCL 3.0 PoCL HSTR: CUDA-sm_89
  Device Numeric Version                          0xc00000 (3.0.0)
  Driver Version                                  5.0
  Device OpenCL C Version                         OpenCL C 1.2 PoCL
  Device OpenCL C all versions                    OpenCL C                                                         0x400000 (1.0.0)
                                                  OpenCL C                                                         0x401000 (1.1.0)
                                                  OpenCL C                                                         0x402000 (1.2.0)
                                                  OpenCL C                                                         0xc00000 (3.0.0)
  Device OpenCL C features                        __opencl_c_images                                                0xc00000 (3.0.0)
                                                  __opencl_c_atomic_order_acq_rel                                  0xc00000 (3.0.0)
                                                  __opencl_c_atomic_order_seq_cst                                  0xc00000 (3.0.0)
                                                  __opencl_c_atomic_scope_device                                   0xc00000 (3.0.0)
                                                  __opencl_c_program_scope_global_variables                        0xc00000 (3.0.0)
                                                  __opencl_c_generic_address_space                                 0xc00000 (3.0.0)
                                                  __opencl_c_fp16                                                  0xc00000 (3.0.0)
                                                  __opencl_c_fp64                                                  0xc00000 (3.0.0)
  Latest comfornace test passed                   (n/a)
  Device Type                                     GPU
  Device Topology (NV)                            PCI-E, 0000:01:00.0
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               128
  Max clock frequency                             2595MHz
  Compute Capability (NV)                         8.9
  Device Partition                                (core)
    Max number of sub-devices                     1
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x64
  Max work group size                             1024
  Preferred work group size multiple (device)     32
=== CL_PROGRAM_BUILD_LOG ===
error: unknown target CPU 'sm_89'
Device NVIDIA GeForce RTX 4090 failed to build the program
  Preferred work group size multiple (kernel)     <getWGsizes:1504: create kernel : error -45>
  Warp size (NV)                                  32
  Max sub-groups per work group                   32
  Preferred / native vector sizes
    char                                                 1 / 1
    short                                                1 / 1
    int                                                  1 / 1
    long                                                 1 / 1
    half                                                 0 / 0        (cl_khr_fp16)
    float                                                1 / 1
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     No
    Infinity and NANs                             No
    Round to nearest                              No
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              25756696576 (23.99GiB)
  Error Correction support                        No
  Max memory allocation                           6439174144 (5.997GiB)
  Unified memory for Host and Device              No
  Integrated memory (NV)                          No
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   Yes
    Fine-grained system sharing                   No
    Atomics                                       No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       4096 bits (512 bytes)
  Preferred alignment for atomics
    SVM                                           64 bytes
    Global                                        64 bytes
    Local                                         64 bytes
  Atomic memory capabilities                      relaxed, work-group scope
  Atomic fence capabilities                       relaxed, acquire/release, work-group scope
  Max size for global variable                    0
  Preferred total size of global vars             0
  Global Memory cache type                        None
  Image support                                   No
  Pipe support                                    No
  Max number of pipe args                         0
  Max active pipe reservations                    0
  Max pipe packet size                            0
  Local memory type                               Local
  Local memory size                               49152 (48KiB)
  Registers per block (NV)                        65536
  Max number of constant args                     8
  Max constant buffer size                        65536 (64KiB)
  Generic address space support                   Yes
  Max size of kernel argument                     4352 (4.25KiB)
  Queue properties (on host)
    Out-of-order execution                        No
    Profiling                                     Yes
  Device enqueue capabilities                     (n/a)
  Queue properties (on device)
    Out-of-order execution                        No
    Profiling                                     No
    Preferred size                                0
    Max size                                      0
  Max queues on device                            0
  Max events on device                            0
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      1ns
  Execution capabilities
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Non-uniform work-groups                       No
    Work-group collective functions               No
    Sub-group independent forward progress        Yes
    Kernel execution timeout (NV)                 Yes
  Concurrent copy and kernel execution (NV)       Yes
    Number of async copy engines                  1
    IL version                                    (n/a)
    ILs with version                              (n/a)
    SPIR versions                                 (n/a)
  printf() buffer size                            16777216 (16MiB)
  Built-in kernels                                pocl.mul.i32;pocl.add.i32;pocl.dnn.conv2d_int8_relu;pocl.sgemm.local.f32;pocl.sgemm.tensor.f16f16f32;pocl.sgemm_ab.tensor.f16f16f32;pocl.abs.f32;pocl.add.i8;org.khronos.openvx.scale_image.nn.u8;org.khronos.openvx.scale_image.bl.u8;org.khronos.openvx.tensor_convert_depth.wrap.u8.f32
  Built-in kernels with version                   pocl.mul.i32                                                     0x402000 (1.2.0)
                                                  pocl.add.i32                                                     0x402000 (1.2.0)
                                                  pocl.dnn.conv2d_int8_relu                                        0x402000 (1.2.0)
                                                  pocl.sgemm.local.f32                                             0x402000 (1.2.0)
                                                  pocl.sgemm.tensor.f16f16f32                                      0x402000 (1.2.0)
                                                  pocl.sgemm_ab.tensor.f16f16f32                                   0x402000 (1.2.0)
                                                  pocl.abs.f32                                                     0x402000 (1.2.0)
                                                  pocl.add.i8                                                      0x402000 (1.2.0)
                                                  org.khronos.openvx.scale_image.nn.u8                             0x402000 (1.2.0)
                                                  org.khronos.openvx.scale_image.bl.u8                             0x402000 (1.2.0)
                                                  org.khronos.openvx.tensor_convert_depth.wrap.u8.f32              0x402000 (1.2.0)
  Device Extensions                               cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics     cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics     cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics     cl_khr_int64_extended_atomics cl_nv_device_attribute_query cl_khr_spir cl_khr_fp16 cl_khr_fp64
  Device Extensions with Version                  cl_khr_byte_addressable_store                                    0x400000 (1.0.0)
                                                  cl_khr_global_int32_base_atomics                                 0x400000 (1.0.0)
                                                  cl_khr_global_int32_extended_atomics                             0x400000 (1.0.0)
                                                  cl_khr_local_int32_base_atomics                                  0x400000 (1.0.0)
                                                  cl_khr_local_int32_extended_atomics                              0x400000 (1.0.0)
                                                  cl_khr_int64_base_atomics                                        0x400000 (1.0.0)
                                                  cl_khr_int64_extended_atomics                                    0x400000 (1.0.0)
                                                  cl_nv_device_attribute_query                                     0x400000 (1.0.0)
                                                  cl_khr_spir                                                      0x801000 (2.1.0)
                                                  cl_khr_fp16                                                      0x400000 (1.0.0)
                                                  cl_khr_fp64                                                      0x400000 (1.0.0)

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   No platform
  clCreateContext(NULL, ...) [default]            No platform
  clCreateContext(NULL, ...) [other]              Success [POCL]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 Portable Computing Language
    Device Name                                   NVIDIA GeForce RTX 4090
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 Portable Computing Language
    Device Name                                   NVIDIA GeForce RTX 4090
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 Portable Computing Language
    Device Name                                   NVIDIA GeForce RTX 4090

benchmark:

.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /               FluidX3D Version 2.13 |
|                                      '     Copyright (c) Dr. Moritz Lehmann |
|-----------------------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA GeForce RTX 4090                                    |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA GeForce RTX 4090                                    |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 5.0 (Linux)                                                |
| OpenCL Version | OpenCL C 1.2 PoCL                                          |
| Compute Units  | 128 at 2595 MHz (16384 cores, 85.033 TFLOPs/s)             |
| Memory, Cache  | 24563 MB, 0 KB global / 48 KB local                        |
| Buffer Limits  | 6140 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Warning: error: unknown target CPU 'sm_89' Device NVIDIA GeForce RTX 4090   |
|          failed to build the program                                        |
| Error: OpenCL C code compilation failed with error code -11. Make sure      |
|        there are no errors in kernel.cpp.                                   |
'-----------------------------------------------------------------------------'

Bossach · 2024-03-02T12:10:46Z

@Tongzhao9417
Your LLVM doesn't know how to compile for your GPU.
You can check which GPUs it supports with
$ clang --target=nvptx -print-supported-cpus
where --target=nvptx (or nvptx64) selects the NVIDIA PTX target, and the "supported CPUs" listed are specific GPU architectures.
Output:

Debian clang version 14.0.6
Target: nvptx
Thread model: posix
InstalledDir: /usr/bin
Available CPUs for this target:

        sm_20
        sm_21
        sm_30
        sm_32
        sm_35
        sm_37
        sm_50
        sm_52
        sm_53
        sm_60
        sm_61
        sm_62
        sm_70
        sm_72
        sm_75
        sm_80
        sm_86

Use -mcpu or -mtune to specify the target's processor.
For example, clang --target=aarch64-unknown-linux-gui -mcpu=cortex-a35

You need a newer version of LLVM/clang. (I just checked: llvm-16 from the Debian repo has "sm_89".)
So $ sudo apt install llvm-16 clang-16 should fix your problem, or you can use the most recent versions available from the llvm.org repo.
You also have to do a clean rebuild of PoCL with the option -DWITH_LLVM_CONFIG=/usr/bin/llvm-config-16 (or your actual llvm-config path) so that PoCL is bound to the correct LLVM version.
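To check a specific architecture without scanning the list by eye, the output can be parsed. This is a minimal sketch, and the helper names are hypothetical. It merges stdout and stderr because clang may print this list on stderr.

```python
import re
import subprocess
import sys

# One architecture per line, e.g. "        sm_86"
SM_LINE = re.compile(r"^[ \t]*(sm_\d+)[ \t]*$", re.MULTILINE)


def supported_sms(clang_output: str) -> list:
    """Extract the sm_XX names from `clang -print-supported-cpus` output."""
    return SM_LINE.findall(clang_output)


def query_clang(clang: str = "clang") -> str:
    """Run clang for the NVPTX target and return its combined output."""
    proc = subprocess.run(
        [clang, "--target=nvptx64", "-print-supported-cpus"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # the list may be printed on stderr
        text=True,
        check=False,
    )
    return proc.stdout


if __name__ == "__main__":
    clang = sys.argv[1] if len(sys.argv) > 1 else "clang"  # e.g. clang-16
    arch = sys.argv[2] if len(sys.argv) > 2 else "sm_89"   # compute capability 8.9, as reported for the RTX 4090
    try:
        sms = supported_sms(query_clang(clang))
    except FileNotFoundError:
        print(f"{clang} not found")
    else:
        print(f"{clang} {'supports' if arch in sms else 'does NOT support'} {arch}")
```

Pass the versioned binary name (for example clang-16) to check the exact LLVM you plan to point -DWITH_LLVM_CONFIG at.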

Tongzhao9417 · 2024-03-11T06:02:18Z

Sorry for the late reply. I followed your steps and it worked for me.

Cheers!

olympichek · 2024-05-09T13:15:48Z

I compiled PoCL as described above and now clinfo works. But when I try to run an OpenCL application, I get an error:

 Build option -cl-std specified OpenCL C version 2.0,but device NVIDIA GeForce GTX 1080 Ti doesn't support that OpenCL C version

Does PoCL not support OpenCL 2.0?
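One possible explanation: the clinfo dumps earlier in this thread list OpenCL C 1.0, 1.1, 1.2 and 3.0 under "Device OpenCL C all versions" for the PoCL CUDA device, but not 2.0. If the GTX 1080 Ti reports the same list, -cl-std=CL2.0 would be rejected. The minimal sketch below picks the highest supported -cl-std that is not above the one an application asks for. It assumes the clinfo line format shown above. It also assumes the application does not actually depend on OpenCL C 2.0 features. The helper names are hypothetical.

```python
import re

# Matches the "OpenCL C   0x402000 (1.2.0)" entries under "Device OpenCL C all versions"
VERSION_ENTRY = re.compile(r"OpenCL C\s+0x[0-9a-fA-F]+\s+\((\d+)\.(\d+)\.\d+\)")


def opencl_c_versions(clinfo_text: str) -> list:
    """Return sorted, de-duplicated (major, minor) OpenCL C versions.

    Feed it a single device's section: entries from several devices would be merged.
    """
    found = {(int(major), int(minor)) for major, minor in VERSION_ENTRY.findall(clinfo_text)}
    return sorted(found)


def cl_std_flag(versions: list, wanted: tuple):
    """Pick -cl-std for `wanted`, else the highest supported version below it (None if none)."""
    candidates = [v for v in versions if v <= wanted]
    if not candidates:
        return None
    major, minor = max(candidates)
    return f"-cl-std=CL{major}.{minor}"


if __name__ == "__main__":
    sample = """
  Device OpenCL C all versions                    OpenCL C          0x400000 (1.0.0)
                                                  OpenCL C          0x401000 (1.1.0)
                                                  OpenCL C          0x402000 (1.2.0)
                                                  OpenCL C          0xc00000 (3.0.0)
"""
    versions = opencl_c_versions(sample)
    print(versions)
    print(cl_std_flag(versions, (2, 0)))
```

With the sample above it prints -cl-std=CL1.2 for a requested 2.0. Whether the application still works with that option depends on which language features its kernels use.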

monkeyden · 2024-06-28T15:02:48Z

Absolute king. pocl-opencl-icd was the missing link for me. Ty, sir.

CLRafaelR · 2024-08-13T15:49:16Z

@Bossach

I really appreciate your brilliant solution!

I want to ask you, and everyone who reacted to Bossach's comment and/or tried the solution (@husmen @joaomamede @Tongzhao9417 @olympichek @htao7 @kirse @kon332k), one question: have you run the PoCL verification tests for NVIDIA GPUs (../tools/scripts/run_cuda_tests), as documented on the "NVIDIA GPU support" page of the PoCL 6.0 documentation, and did all of the tests pass?

I basically followed Bossach's steps to install PoCL, and now clinfo and clinfo -l work like a charm. However, four tests failed when I ran the PoCL verification tests as shown below:

cd ~/pocl-6.0/build # move to my `build` directory
../tools/scripts/run_cuda_tests

# For rerunning the failed tests:
../tools/scripts/run_cuda_tests --rerun-failed --output-on-failure

Failed tests were:

The following tests FAILED:
          4 - kernel/test_as_type_loopvec (Failed)
        166 - regression/clSetKernelArg_overwriting_the_previous_kernel's_args_loopvec (Failed)
        208 - runtime/test_device_address (SEGFAULT)
        209 - runtime/test_svm (SEGFAULT)
Errors while running CTest

If anybody has run the verification tests, could you please tell us whether all tests passed, or which ones failed? It would also be very helpful if you could tell us about your runtime environment, settings, and PoCL build configuration.

I opened an issue on PoCL's repo: "../tools/scripts/run_cuda_tests Fails on WSL2" (pocl/pocl issue #1533). Comments there are also appreciated. They would help the PoCL developers confirm whether the test results on WSL2 are reproducible, and help them improve PoCL.

Shazway · 2024-10-22T22:16:45Z

Hi,
I saw the PoCL solution and jumped at the chance to fix this issue, but it didn't work for me.
The step cmake --build <build_dir> -j16 worked fine, but when I got to exporting the OCL_ICD_VENDORS variable, there was no ocl-vendors folder.
Result of ls in build:
CMakeCache.txt CTestCustom.cmake cl_offline_compiler.sh config.h kernellib_hash.h pocl_opencl.h CMakeFiles CTestTestfile.cmake cmake_install.cmake config2.h lib pocl_version.h CPackConfig.cmake Makefile compile_commands.json examples pocl.pc poclu CPackSourceConfig.cmake bin compile_test_. include pocl_build_timestamp.h tests

Result of clinfo:
Number of platforms 2
The full output is too long to paste here, but it shows two Intel graphics platforms instead of one Intel and one NVIDIA platform.

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
Any clues why?
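When the full clinfo output is too long to paste, a short summary of the platform count and device names may be enough to compare setups. This is a minimal sketch. It assumes clinfo's default text layout as seen in the dumps in this thread: top-level "Device Name" lines are indented by two spaces, while the "NULL platform behavior" section indents them further and is therefore skipped. The helper names are hypothetical.

```python
import re
import subprocess

PLATFORMS = re.compile(r"^Number of platforms\s+(\d+)", re.MULTILINE)
# Top-level device entries are indented by two spaces; the NULL-platform section uses four
DEVICE_NAME = re.compile(r"^  Device Name\s+(.+?)\s*$", re.MULTILINE)


def summarize_clinfo(text: str):
    """Return (number of platforms, list of device names) parsed from clinfo output."""
    match = PLATFORMS.search(text)
    count = int(match.group(1)) if match else 0
    return count, DEVICE_NAME.findall(text)


if __name__ == "__main__":
    try:
        out = subprocess.run(["clinfo"], capture_output=True, text=True, check=False).stdout
    except FileNotFoundError:
        out = ""
        print("clinfo is not installed")
    count, devices = summarize_clinfo(out)
    print(f"{count} platform(s); devices: {', '.join(devices) or 'none'}")
```

If the NVIDIA GPU does not show up in the device list, that suggests the PoCL ICD is not being picked up by the loader.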

haipnh · 2025-03-22T13:40:53Z

Many thanks to the community.
[25/3/22] Update: @kirse, thank you for your feedback.
Here is the full installation flow:

sudo apt update
sudo apt install -y python3-dev libpython3-dev build-essential ocl-icd-libopencl1 cmake git pkg-config make ninja-build ocl-icd-dev ocl-icd-opencl-dev libhwloc-dev zlib1g zlib1g-dev clinfo dialog apt-utils libxml2-dev opencl-headers

mkdir -p ~/Downloads
cd ~/Downloads
wget https://developer.download.nvidia.com/compute/cuda/12.8.1/local_installers/cuda_12.8.1_570.124.06_linux.run
sudo bash ./cuda_12.8.1_570.124.06_linux.run --silent --toolkit --no-opengl-libs

export LLVM_VERSION=14
sudo apt install -y libclang-${LLVM_VERSION}-dev clang-${LLVM_VERSION} llvm-${LLVM_VERSION}  libclang-cpp${LLVM_VERSION}-dev libclang-cpp${LLVM_VERSION} llvm-${LLVM_VERSION}-dev 

git clone https://github.com/pocl/pocl -b v6.0
mkdir pocl/build
cd pocl/build
cmake -DCMAKE_C_FLAGS=-L/usr/lib/wsl/lib \
  -DCMAKE_CXX_FLAGS=-L/usr/lib/wsl/lib \
  -DWITH_LLVM_CONFIG=/usr/bin/llvm-config-${LLVM_VERSION} \
  -DENABLE_HOST_CPU_DEVICES=OFF \
  -DENABLE_CUDA=ON ..

make -j`nproc`
sudo make install
sudo mkdir -p /etc/OpenCL/vendors # added per @kirse's suggestion
sudo cp /usr/local/etc/OpenCL/vendors/pocl.icd /etc/OpenCL/vendors/pocl.icd # needed, otherwise `clinfo` reports 0 platforms

Finally, we'll get:

user@DESKTOP:~/pocl/build$ uname -r
5.15.167.4-microsoft-standard-WSL2
user@DESKTOP:~/pocl/build$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.5 LTS
Release:        22.04
Codename:       jammy
user@DESKTOP:~/pocl/build$ clinfo
Number of platforms                               1
  Platform Name                                   Portable Computing Language
  Platform Vendor                                 The pocl project
  Platform Version                                OpenCL 3.0 PoCL 6.0  Linux, Debug+Asserts, RELOC, LLVM 14.0.0, SLEEF, CUDA, POCL_DEBUG
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_khr_priority_hints cl_khr_throttle_hints cl_pocl_content_size cl_ext_buffer_device_address
  Platform Extensions with Version                cl_khr_icd                                                       0x400000 (1.0.0)
                                                  cl_khr_priority_hints                                            0x400000 (1.0.0)
                                                  cl_khr_throttle_hints                                            0x400000 (1.0.0)
                                                  cl_pocl_content_size                                             0x400000 (1.0.0)
                                                  cl_ext_buffer_device_address                                       0x1000 (0.1.0)
  Platform Numeric Version                        0xc00000 (3.0.0)
  Platform Extensions function suffix             POCL
  Platform Host timer resolution                  1ns

  Platform Name                                   Portable Computing Language
Number of devices                                 1
  Device Name                                     NVIDIA GeForce RTX 4070 SUPER
  Device Vendor                                   NVIDIA Corporation
  Device Vendor ID                                0x10de
  Device Version                                  OpenCL 3.0 PoCL HSTR: CUDA-sm_75
  Device Numeric Version                          0xc00000 (3.0.0)
  Driver Version                                  6.0
  Device OpenCL C Version                         OpenCL C 1.2 PoCL
  Device OpenCL C all versions                    OpenCL C                                                         0x400000 (1.0.0)
                                                  OpenCL C                                                         0x401000 (1.1.0)
                                                  OpenCL C                                                         0x402000 (1.2.0)
                                                  OpenCL C                                                         0xc00000 (3.0.0)
  Device OpenCL C features                        __opencl_c_images                                                0xc00000 (3.0.0)
                                                  __opencl_c_atomic_order_acq_rel                                  0xc00000 (3.0.0)
                                                  __opencl_c_atomic_order_seq_cst                                  0xc00000 (3.0.0)
                                                  __opencl_c_atomic_scope_device                                   0xc00000 (3.0.0)
                                                  __opencl_c_program_scope_global_variables                        0xc00000 (3.0.0)
                                                  __opencl_c_generic_address_space                                 0xc00000 (3.0.0)
                                                  __opencl_c_fp16                                                  0xc00000 (3.0.0)
                                                  __opencl_c_fp64                                                  0xc00000 (3.0.0)
  Latest comfornace test passed                   (n/a)
  Device Type                                     GPU
  Device Topology (NV)                            PCI-E, 0000:01:00.0
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               56
  Max clock frequency                             2520MHz
  Compute Capability (NV)                         8.9
  Device Partition                                (core)
    Max number of sub-devices                     1
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x64
  Max work group size                             1024
  Preferred work group size multiple (device)     32
  Preferred work group size multiple (kernel)     32
  Warp size (NV)                                  32
  Max sub-groups per work group                   32
  Preferred / native vector sizes
    char                                                 1 / 1
    short                                                1 / 1
    int                                                  1 / 1
    long                                                 1 / 1
    half                                                 0 / 0        (cl_khr_fp16)
    float                                                1 / 1
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              12878086144 (11.99GiB)
  Error Correction support                        No
  Max memory allocation                           11589910528 (10.79GiB)
  Unified memory for Host and Device              No
  Integrated memory (NV)                          No
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   Yes
    Fine-grained system sharing                   No
    Atomics                                       No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       4096 bits (512 bytes)
  Preferred alignment for atomics
    SVM                                           64 bytes
    Global                                        64 bytes
    Local                                         64 bytes
  Atomic memory capabilities                      relaxed, work-group scope
  Atomic fence capabilities                       relaxed, acquire/release, work-group scope
  Max size for global variable                    0
  Preferred total size of global vars             0
  Global Memory cache type                        None
  Image support                                   No
  Pipe support                                    No
  Max number of pipe args                         0
  Max active pipe reservations                    0
  Max pipe packet size                            0
  Local memory type                               Local
  Local memory size                               49152 (48KiB)
  Registers per block (NV)                        65536
  Max number of constant args                     8
  Max constant buffer size                        65536 (64KiB)
  Generic address space support                   Yes
  Max size of kernel argument                     4352 (4.25KiB)
  Queue properties (on host)
    Out-of-order execution                        No
    Profiling                                     Yes
  Device enqueue capabilities                     (n/a)
  Queue properties (on device)
    Out-of-order execution                        No
    Profiling                                     No
    Preferred size                                0
    Max size                                      0
  Max queues on device                            0
  Max events on device                            0
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      1ns
  Execution capabilities
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Non-uniform work-groups                       No
    Work-group collective functions               No
    Sub-group independent forward progress        Yes
    Kernel execution timeout (NV)                 Yes
  Concurrent copy and kernel execution (NV)       Yes
    Number of async copy engines                  1
    IL version                                    (n/a)
    ILs with version                              (n/a)
  printf() buffer size                            16777216 (16MiB)
  Built-in kernels                                pocl.mul.i32;pocl.add.i32;pocl.dnn.conv2d_int8_relu;pocl.sgemm.local.f32;pocl.sgemm.tensor.f16f16f32;pocl.sgemm_ab.tensor.f16f16f32;pocl.abs.f32;pocl.add.i8;org.khronos.openvx.scale_image.nn.u8;org.khronos.openvx.scale_image.bl.u8;org.khronos.openvx.tensor_convert_depth.wrap.u8.f32
  Built-in kernels with version                   pocl.mul.i32                                                     0x402000 (1.2.0)
                                                  pocl.add.i32                                                     0x402000 (1.2.0)
                                                  pocl.dnn.conv2d_int8_relu                                        0x402000 (1.2.0)
                                                  pocl.sgemm.local.f32                                             0x402000 (1.2.0)
                                                  pocl.sgemm.tensor.f16f16f32                                      0x402000 (1.2.0)
                                                  pocl.sgemm_ab.tensor.f16f16f32                                   0x402000 (1.2.0)
                                                  pocl.abs.f32                                                     0x402000 (1.2.0)
                                                  pocl.add.i8                                                      0x402000 (1.2.0)
                                                  org.khronos.openvx.scale_image.nn.u8                             0x402000 (1.2.0)
                                                  org.khronos.openvx.scale_image.bl.u8                             0x402000 (1.2.0)
                                                  org.khronos.openvx.tensor_convert_depth.wrap.u8.f32              0x402000 (1.2.0)
  Device Extensions                               cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics     cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics     cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics     cl_khr_int64_extended_atomics cl_nv_device_attribute_query cl_khr_fp16 cl_khr_fp64 cl_ext_buffer_device_address cl_khr_subgroup_ballot cl_khr_subgroup_shuffle
  Device Extensions with Version                  cl_khr_byte_addressable_store                                    0x400000 (1.0.0)
                                                  cl_khr_global_int32_base_atomics                                 0x400000 (1.0.0)
                                                  cl_khr_global_int32_extended_atomics                             0x400000 (1.0.0)
                                                  cl_khr_local_int32_base_atomics                                  0x400000 (1.0.0)
                                                  cl_khr_local_int32_extended_atomics                              0x400000 (1.0.0)
                                                  cl_khr_int64_base_atomics                                        0x400000 (1.0.0)
                                                  cl_khr_int64_extended_atomics                                    0x400000 (1.0.0)
                                                  cl_nv_device_attribute_query                                     0x400000 (1.0.0)
                                                  cl_khr_fp16                                                      0x400000 (1.0.0)
                                                  cl_khr_fp64                                                      0x400000 (1.0.0)
                                                  cl_ext_buffer_device_address                                       0x1000 (0.1.0)
                                                  cl_khr_subgroup_ballot                                           0x400000 (1.0.0)
                                                  cl_khr_subgroup_shuffle                                          0x400000 (1.0.0)

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   No platform
  clCreateContext(NULL, ...) [default]            No platform
  clCreateContext(NULL, ...) [other]              Success [POCL]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 Portable Computing Language
    Device Name                                   NVIDIA GeForce RTX 4070 SUPER
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 Portable Computing Language
    Device Name                                   NVIDIA GeForce RTX 4070 SUPER
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 Portable Computing Language
    Device Name                                   NVIDIA GeForce RTX 4070 SUPER
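A note on the hex numbers in this output: the values printed next to each version (0x400000, 0x401000, 0x402000, 0xc00000, 0x1000) are packed OpenCL 3.0 `cl_version` integers. Assuming the bit layout of the `CL_MAKE_VERSION` macro in the 3.0 headers (10 bits major, 10 bits minor, 12 bits patch), a small Python sketch can decode them. The function names are my own and are not part of any OpenCL binding:

```python
# Bit layout assumed from the OpenCL 3.0 CL_MAKE_VERSION macro:
# [ 10 bits major | 10 bits minor | 12 bits patch ]
MINOR_BITS = 10
PATCH_BITS = 12


def decode_cl_version(value: int) -> tuple[int, int, int]:
    """Split a packed cl_version integer into (major, minor, patch)."""
    patch = value & ((1 << PATCH_BITS) - 1)
    minor = (value >> PATCH_BITS) & ((1 << MINOR_BITS) - 1)
    major = value >> (PATCH_BITS + MINOR_BITS)
    return major, minor, patch


def encode_cl_version(major: int, minor: int, patch: int) -> int:
    """Inverse of decode_cl_version (mirrors CL_MAKE_VERSION)."""
    return (major << (PATCH_BITS + MINOR_BITS)) | (minor << PATCH_BITS) | patch


if __name__ == "__main__":
    # Raw values copied from the clinfo output above.
    for raw in (0x400000, 0x401000, 0x402000, 0xC00000, 0x1000):
        print(f"{raw:#x} -> {'.'.join(map(str, decode_cl_version(raw)))}")
```

Run as a script, it prints the same versions clinfo shows in parentheses: 1.0.0, 1.1.0, 1.2.0, 3.0.0 and 0.1.0.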

100% CUDA tests passed

user@DESKTOP:~/pocl/build$ ../tools/scripts/run_cuda_tests
<...>
100% tests passed, 0 tests failed out of 60

Label Time Summary:
cuda          =  73.62 sec*proc (60 tests)
hsa           =   3.29 sec*proc (3 tests)
hsa-native    =  58.46 sec*proc (46 tests)
internal      =  73.16 sec*proc (59 tests)
kernel        =  33.68 sec*proc (18 tests)
level0        =  72.39 sec*proc (59 tests)
proxy         =  26.80 sec*proc (30 tests)
regression    =  16.54 sec*proc (15 tests)
runtime       =  16.82 sec*proc (20 tests)
tce           =   5.42 sec*proc (6 tests)
vulkan        =  11.62 sec*proc (14 tests)

Total Test time (real) =  73.65 sec

The following tests did not run:
        189 - runtime/test_buffer-image-copy (Skipped)
        193 - runtime/clGetSupportedImageFormats (Skipped)
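If you script this install (for example, to rebuild PoCL after a driver update), the CTest summary line above is a convenient pass/fail signal. This is a minimal sketch that assumes the summary keeps the `N% tests passed, M tests failed out of T` wording shown here; the helper names are hypothetical:

```python
import re

# Matches CTest's summary line, e.g. "100% tests passed, 0 tests failed out of 60".
SUMMARY_RE = re.compile(
    r"(?P<pct>\d+)% tests passed, (?P<failed>\d+) tests? failed out of (?P<total>\d+)"
)


def parse_ctest_summary(log: str) -> dict[str, int]:
    """Return percent passed, failed count and total count from a CTest log."""
    match = SUMMARY_RE.search(log)
    if match is None:
        raise ValueError("no CTest summary line found")
    return {key: int(val) for key, val in match.groupdict().items()}


def all_tests_passed(log: str) -> bool:
    """True only if the log reports at least one test and zero failures."""
    summary = parse_ctest_summary(log)
    return summary["failed"] == 0 and summary["total"] > 0
```

For example, `all_tests_passed(open("ctest.log").read())` could gate a follow-up step, where `ctest.log` is a hypothetical file holding the output of `run_cuda_tests`.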

kirse · 2025-03-22T21:27:20Z

@haipnh Tried this on a completely fresh Ubuntu 22.04.5 WSL install; small tweaks below, but otherwise 👍

# Ubuntu cleanup needed to fix 404s on the security repo
sudo apt clean
sudo dpkg --configure -a
sudo apt update
sudo apt --fix-broken install

mkdir ~/Downloads
cd ~/Downloads
wget https://developer.download.nvidia.com/compute/cuda/12.8.1/local_installers/cuda_12.8.1_570.124.06_linux.run
sudo apt install -y gcc # required by next cmd
sudo bash ./cuda_12.8.1_570.124.06_linux.run --silent --toolkit --no-opengl-libs

sudo apt install -y python3-dev libpython3-dev build-essential ocl-icd-libopencl1 cmake git pkg-config make ninja-build ocl-icd-dev ocl-icd-opencl-dev libhwloc-dev zlib1g zlib1g-dev clinfo dialog apt-utils libxml2-dev opencl-headers

export LLVM_VERSION=14
sudo apt install -y libclang-${LLVM_VERSION}-dev clang-${LLVM_VERSION} llvm-${LLVM_VERSION}  libclang-cpp${LLVM_VERSION}-dev libclang-cpp${LLVM_VERSION} llvm-${LLVM_VERSION}-dev 

git clone https://github.com/pocl/pocl -b v6.0
mkdir pocl/build
cd pocl/build
cmake -DCMAKE_C_FLAGS=-L/usr/lib/wsl/lib \
  -DCMAKE_CXX_FLAGS=-L/usr/lib/wsl/lib \
  -DWITH_LLVM_CONFIG=/usr/bin/llvm-config-14 \
  -DENABLE_HOST_CPU_DEVICES=OFF \
  -DENABLE_CUDA=ON ..

make -j`nproc`
sudo make install
sudo mkdir -p /etc/OpenCL/vendors # path missing on default install
sudo cp /usr/local/etc/OpenCL/vendors/pocl.icd /etc/OpenCL/vendors/pocl.icd # We need this, otherwise `clinfo` reports 0 platforms
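The final `cp` matters because the ICD loader from `ocl-icd-libopencl1` enumerates platforms from the `*.icd` files in `/etc/OpenCL/vendors`, each of which names a vendor library to load. `sudo make install` with CMake's default `/usr/local` prefix puts `pocl.icd` under `/usr/local/etc/OpenCL/vendors` instead, so the loader never sees it. Here is a small Python sketch to check what the loader would pick up; the function name is hypothetical, and the directories are the ones used in the commands above:

```python
from pathlib import Path


def list_icd_entries(vendors_dir: str = "/etc/OpenCL/vendors") -> dict[str, str]:
    """Map each *.icd file name to the library path/name written inside it.

    An empty result for /etc/OpenCL/vendors is consistent with clinfo
    printing "Number of platforms 0".
    """
    directory = Path(vendors_dir)
    if not directory.is_dir():
        return {}
    entries = {}
    for icd in sorted(directory.glob("*.icd")):
        # Each ICD file holds one line naming the vendor library to load.
        entries[icd.name] = icd.read_text().strip()
    return entries


if __name__ == "__main__":
    for vendors in ("/etc/OpenCL/vendors", "/usr/local/etc/OpenCL/vendors"):
        print(vendors, list_icd_entries(vendors) or "(no .icd files)")
```

If `pocl.icd` shows up only under `/usr/local/etc/OpenCL/vendors`, the copy step has not been done yet.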

user@desktop:~$ uname -r
5.15.167.4-microsoft-standard-WSL2
user@desktop:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.5 LTS
Release:        22.04
Codename:       jammy
user@desktop:~$ clinfo
Number of platforms                               1
  Platform Name                                   Portable Computing Language
  Platform Vendor                                 The pocl project
  Platform Version                                OpenCL 3.0 PoCL 6.0  Linux, Debug+Asserts, RELOC, LLVM 14.0.0, SLEEF, CUDA, POCL_DEBUG
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_khr_priority_hints cl_khr_throttle_hints cl_pocl_content_size cl_ext_buffer_device_address
  Platform Extensions with Version                cl_khr_icd                                                       0x400000 (1.0.0)
                                                  cl_khr_priority_hints                                            0x400000 (1.0.0)
                                                  cl_khr_throttle_hints                                            0x400000 (1.0.0)
                                                  cl_pocl_content_size                                             0x400000 (1.0.0)
                                                  cl_ext_buffer_device_address                                       0x1000 (0.1.0)
  Platform Numeric Version                        0xc00000 (3.0.0)
  Platform Extensions function suffix             POCL
  Platform Host timer resolution                  1ns

  Platform Name                                   Portable Computing Language
Number of devices                                 1
  Device Name                                     NVIDIA GeForce RTX 4070 Laptop GPU
  Device Vendor                                   NVIDIA Corporation
# etc .................
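To check the result from a script rather than by eye, the two fields that matter in the default `clinfo` output are `Number of platforms` and the `Device Name` lines. The following is a rough Python sketch that assumes the plain-text layout shown above (label, run of spaces, value). It is not an official clinfo interface, so the labels could change between clinfo versions. It stops at the `NULL platform behavior` section, because that section repeats the device names:

```python
import re
import subprocess


def parse_clinfo(text: str) -> tuple[int, list[str]]:
    """Return (platform count, device names) from default clinfo output."""
    platforms = 0
    devices: list[str] = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("NULL platform behavior"):
            break  # the rest of the output repeats the same devices
        match = re.match(r"Number of platforms\s+(\d+)$", stripped)
        if match:
            platforms = int(match.group(1))
            continue
        match = re.match(r"Device Name\s+(.+)$", stripped)
        if match:
            devices.append(match.group(1).strip())
    return platforms, devices


def run_clinfo() -> str:
    """Capture clinfo's stdout; requires the clinfo package to be installed."""
    return subprocess.run(["clinfo"], capture_output=True, text=True, check=True).stdout
```

Usage would be `count, names = parse_clinfo(run_clinfo())`; a count of 0 points back at the ICD registration step.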

TheSecurityDev · 2025-03-26T22:37:06Z

Many thanks to the community. [25/3/22] @kirse thank you for your feedback. Here is the full-flow installation:

sudo apt update
sudo apt install -y python3-dev libpython3-dev build-essential ocl-icd-libopencl1 cmake git pkg-config make ninja-build ocl-icd-dev ocl-icd-opencl-dev libhwloc-dev zlib1g zlib1g-dev clinfo dialog apt-utils libxml2-dev opencl-headers

mkdir ~/Downloads
cd ~/Downloads
wget https://developer.download.nvidia.com/compute/cuda/12.8.1/local_installers/cuda_12.8.1_570.124.06_linux.run
sudo bash ./cuda_12.8.1_570.124.06_linux.run --silent --toolkit --no-opengl-libs

export LLVM_VERSION=14
sudo apt install -y libclang-${LLVM_VERSION}-dev clang-${LLVM_VERSION} llvm-${LLVM_VERSION} libclang-cpp${LLVM_VERSION}-dev libclang-cpp${LLVM_VERSION} llvm-${LLVM_VERSION}-dev

git clone https://github.com/pocl/pocl -b v6.0
mkdir pocl/build
cd pocl/build
cmake -DCMAKE_C_FLAGS=-L/usr/lib/wsl/lib \
  -DCMAKE_CXX_FLAGS=-L/usr/lib/wsl/lib \
  -DWITH_LLVM_CONFIG=/usr/bin/llvm-config-${LLVM_VERSION} \
  -DENABLE_HOST_CPU_DEVICES=OFF \
  -DENABLE_CUDA=ON ..

make -j`nproc`
sudo make install
sudo mkdir -p /etc/OpenCL/vendors # amended per @kirse's suggestion
sudo cp /usr/local/etc/OpenCL/vendors/pocl.icd /etc/OpenCL/vendors/pocl.icd # We need this, otherwise clinfo reports 0 platforms
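Both command sequences pass `-L/usr/lib/wsl/lib` because, on WSL2, the Windows NVIDIA driver exposes its Linux user-mode CUDA driver library (`libcuda.so`) in that directory rather than in a standard linker path. Before building, you can confirm the library is actually there with a quick Python sketch like the one below; the helper name is hypothetical:

```python
from pathlib import Path


def find_shared_library(prefix: str, search_dirs: list[str]) -> list[Path]:
    """Return files named `prefix` or `prefix.<version>` in the given directories."""
    hits: list[Path] = []
    for directory in search_dirs:
        base = Path(directory)
        if not base.is_dir():
            continue  # skip missing directories instead of failing
        for candidate in sorted(base.glob(prefix + "*")):
            name = candidate.name
            # Accept "libcuda.so" and versioned names like "libcuda.so.1".
            if name == prefix or name.startswith(prefix + "."):
                hits.append(candidate)
    return hits


if __name__ == "__main__":
    found = find_shared_library("libcuda.so", ["/usr/lib/wsl/lib"])
    print("\n".join(map(str, found)) or "libcuda.so not found; check the Windows NVIDIA driver")
```

An empty result inside WSL suggests the Windows-side driver is missing or too old, which no amount of rebuilding PoCL will fix.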
Finally, we'll get:

user@DESKTOP:~/pocl/build$ uname -r
5.15.167.4-microsoft-standard-WSL2
user@DESKTOP:~/pocl/build$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.5 LTS
Release:        22.04
Codename:       jammy
user@DESKTOP:~/pocl/build$ clinfo
Number of platforms                               1
  Platform Name                                   Portable Computing Language
  Platform Vendor                                 The pocl project
  Platform Version                                OpenCL 3.0 PoCL 6.0  Linux, Debug+Asserts, RELOC, LLVM 14.0.0, SLEEF, CUDA, POCL_DEBUG
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_khr_priority_hints cl_khr_throttle_hints cl_pocl_content_size cl_ext_buffer_device_address
  Platform Extensions with Version                cl_khr_icd                                                       0x400000 (1.0.0)
                                                  cl_khr_priority_hints                                            0x400000 (1.0.0)
                                                  cl_khr_throttle_hints                                            0x400000 (1.0.0)
                                                  cl_pocl_content_size                                             0x400000 (1.0.0)
                                                  cl_ext_buffer_device_address                                       0x1000 (0.1.0)
  Platform Numeric Version                        0xc00000 (3.0.0)
  Platform Extensions function suffix             POCL
  Platform Host timer resolution                  1ns

  Platform Name                                   Portable Computing Language
Number of devices                                 1
  Device Name                                     NVIDIA GeForce RTX 4070 SUPER
  Device Vendor                                   NVIDIA Corporation
  Device Vendor ID                                0x10de
  Device Version                                  OpenCL 3.0 PoCL HSTR: CUDA-sm_75
  Device Numeric Version                          0xc00000 (3.0.0)
  Driver Version                                  6.0
  Device OpenCL C Version                         OpenCL C 1.2 PoCL
  Device OpenCL C all versions                    OpenCL C                                                         0x400000 (1.0.0)
                                                  OpenCL C                                                         0x401000 (1.1.0)
                                                  OpenCL C                                                         0x402000 (1.2.0)
                                                  OpenCL C                                                         0xc00000 (3.0.0)
  Device OpenCL C features                        __opencl_c_images                                                0xc00000 (3.0.0)
                                                  __opencl_c_atomic_order_acq_rel                                  0xc00000 (3.0.0)
                                                  __opencl_c_atomic_order_seq_cst                                  0xc00000 (3.0.0)
                                                  __opencl_c_atomic_scope_device                                   0xc00000 (3.0.0)
                                                  __opencl_c_program_scope_global_variables                        0xc00000 (3.0.0)
                                                  __opencl_c_generic_address_space                                 0xc00000 (3.0.0)
                                                  __opencl_c_fp16                                                  0xc00000 (3.0.0)
                                                  __opencl_c_fp64                                                  0xc00000 (3.0.0)
  Latest comfornace test passed                   (n/a)
  Device Type                                     GPU
  Device Topology (NV)                            PCI-E, 0000:01:00.0
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               56
  Max clock frequency                             2520MHz
  Compute Capability (NV)                         8.9
  Device Partition                                (core)
    Max number of sub-devices                     1
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x64
  Max work group size                             1024
  Preferred work group size multiple (device)     32
  Preferred work group size multiple (kernel)     32
  Warp size (NV)                                  32
  Max sub-groups per work group                   32
  Preferred / native vector sizes
    char                                                 1 / 1
    short                                                1 / 1
    int                                                  1 / 1
    long                                                 1 / 1
    half                                                 0 / 0        (cl_khr_fp16)
    float                                                1 / 1
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              12878086144 (11.99GiB)
  Error Correction support                        No
  Max memory allocation                           11589910528 (10.79GiB)
  Unified memory for Host and Device              No
  Integrated memory (NV)                          No
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   Yes
    Fine-grained system sharing                   No
    Atomics                                       No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       4096 bits (512 bytes)
  Preferred alignment for atomics
    SVM                                           64 bytes
    Global                                        64 bytes
    Local                                         64 bytes
  Atomic memory capabilities                      relaxed, work-group scope
  Atomic fence capabilities                       relaxed, acquire/release, work-group scope
  Max size for global variable                    0
  Preferred total size of global vars             0
  Global Memory cache type                        None
  Image support                                   No
  Pipe support                                    No
  Max number of pipe args                         0
  Max active pipe reservations                    0
  Max pipe packet size                            0
  Local memory type                               Local
  Local memory size                               49152 (48KiB)
  Registers per block (NV)                        65536
  Max number of constant args                     8
  Max constant buffer size                        65536 (64KiB)
  Generic address space support                   Yes
  Max size of kernel argument                     4352 (4.25KiB)
  Queue properties (on host)
    Out-of-order execution                        No
    Profiling                                     Yes
  Device enqueue capabilities                     (n/a)
  Queue properties (on device)
    Out-of-order execution                        No
    Profiling                                     No
    Preferred size                                0
    Max size                                      0
  Max queues on device                            0
  Max events on device                            0
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      1ns
  Execution capabilities
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Non-uniform work-groups                       No
    Work-group collective functions               No
    Sub-group independent forward progress        Yes
    Kernel execution timeout (NV)                 Yes
  Concurrent copy and kernel execution (NV)       Yes
    Number of async copy engines                  1
    IL version                                    (n/a)
    ILs with version                              (n/a)
  printf() buffer size                            16777216 (16MiB)
  Built-in kernels                                pocl.mul.i32;pocl.add.i32;pocl.dnn.conv2d_int8_relu;pocl.sgemm.local.f32;pocl.sgemm.tensor.f16f16f32;pocl.sgemm_ab.tensor.f16f16f32;pocl.abs.f32;pocl.add.i8;org.khronos.openvx.scale_image.nn.u8;org.khronos.openvx.scale_image.bl.u8;org.khronos.openvx.tensor_convert_depth.wrap.u8.f32
  Built-in kernels with version                   pocl.mul.i32                                                     0x402000 (1.2.0)
                                                  pocl.add.i32                                                     0x402000 (1.2.0)
                                                  pocl.dnn.conv2d_int8_relu                                        0x402000 (1.2.0)
                                                  pocl.sgemm.local.f32                                             0x402000 (1.2.0)
                                                  pocl.sgemm.tensor.f16f16f32                                      0x402000 (1.2.0)
                                                  pocl.sgemm_ab.tensor.f16f16f32                                   0x402000 (1.2.0)
                                                  pocl.abs.f32                                                     0x402000 (1.2.0)
                                                  pocl.add.i8                                                      0x402000 (1.2.0)
                                                  org.khronos.openvx.scale_image.nn.u8                             0x402000 (1.2.0)
                                                  org.khronos.openvx.scale_image.bl.u8                             0x402000 (1.2.0)
                                                  org.khronos.openvx.tensor_convert_depth.wrap.u8.f32              0x402000 (1.2.0)
  Device Extensions                               cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics     cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics     cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics     cl_khr_int64_extended_atomics cl_nv_device_attribute_query cl_khr_fp16 cl_khr_fp64 cl_ext_buffer_device_address cl_khr_subgroup_ballot cl_khr_subgroup_shuffle
  Device Extensions with Version                  cl_khr_byte_addressable_store                                    0x400000 (1.0.0)
                                                  cl_khr_global_int32_base_atomics                                 0x400000 (1.0.0)
                                                  cl_khr_global_int32_extended_atomics                             0x400000 (1.0.0)
                                                  cl_khr_local_int32_base_atomics                                  0x400000 (1.0.0)
                                                  cl_khr_local_int32_extended_atomics                              0x400000 (1.0.0)
                                                  cl_khr_int64_base_atomics                                        0x400000 (1.0.0)
                                                  cl_khr_int64_extended_atomics                                    0x400000 (1.0.0)
                                                  cl_nv_device_attribute_query                                     0x400000 (1.0.0)
                                                  cl_khr_fp16                                                      0x400000 (1.0.0)
                                                  cl_khr_fp64                                                      0x400000 (1.0.0)
                                                  cl_ext_buffer_device_address                                       0x1000 (0.1.0)
                                                  cl_khr_subgroup_ballot                                           0x400000 (1.0.0)
                                                  cl_khr_subgroup_shuffle                                          0x400000 (1.0.0)

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   No platform
  clCreateContext(NULL, ...) [default]            No platform
  clCreateContext(NULL, ...) [other]              Success [POCL]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 Portable Computing Language
    Device Name                                   NVIDIA GeForce RTX 4070 SUPER
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 Portable Computing Language
    Device Name                                   NVIDIA GeForce RTX 4070 SUPER
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 Portable Computing Language
    Device Name                                   NVIDIA GeForce RTX 4070 SUPER
100% CUDA tests passed
user@DESKTOP:~/pocl/build$ ../tools/scripts/run_cuda_tests
<...>
100% tests passed, 0 tests failed out of 60

Label Time Summary:
cuda          =  73.62 sec*proc (60 tests)
hsa           =   3.29 sec*proc (3 tests)
hsa-native    =  58.46 sec*proc (46 tests)
internal      =  73.16 sec*proc (59 tests)
kernel        =  33.68 sec*proc (18 tests)
level0        =  72.39 sec*proc (59 tests)
proxy         =  26.80 sec*proc (30 tests)
regression    =  16.54 sec*proc (15 tests)
runtime       =  16.82 sec*proc (20 tests)
tce           =   5.42 sec*proc (6 tests)
vulkan        =  11.62 sec*proc (14 tests)

Total Test time (real) =  73.65 sec

The following tests did not run:
        189 - runtime/test_buffer-image-copy (Skipped)
        193 - runtime/clGetSupportedImageFormats (Skipped)

I tried this, but now my application fails with a CUDA_ERROR_ILLEGAL_ADDRESS error ("an illegal memory access was encountered"). How do I revert?

PrabodhGyawali · 2025-04-01T23:14:16Z

Thanks for the workflow; it worked.

perrymacmurray changed the title ~~No OpenCL platforms reported in Docker container~~ No OpenCL platforms reported May 17, 2021

adrastogi added the GPU label May 17, 2021

therealkenc mentioned this issue May 19, 2021

OpenCL support on WSL #6372

Closed

nelsonjchen mentioned this issue May 22, 2021

[Feature Request] Updates on getting OpenCL and CUDA GPU support? #3789

Closed

mliu2020 mentioned this issue Jul 27, 2021

Build Dl-Streamer in WSL-Ubuntu dlstreamer/dlstreamer#223

Closed

atillack mentioned this issue Oct 21, 2022

Cannot Compile from source nor Run binary on WSL2 ccsb-scripps/AutoDock-GPU#211

Closed

kh-abd-kh mentioned this issue Nov 27, 2022

is it possible c-ocl_*_win64 microsoft/antares#363

Open

BorisVSchmid mentioned this issue Dec 19, 2022

Getting neanderthal to run in the Windows Subsystem for Linux 2 uncomplicate/neanderthal#130

Closed

cylzxje mentioned this issue Jan 2, 2023

Devices not detected on WSL2 1inch/profanity2#17

Closed

ByPumbaa mentioned this issue Aug 18, 2023

ninja: error: rebuilding 'build.ninja': subcommand failed warthog-network/Warthog#5

Closed

hungpham3112 mentioned this issue Dec 5, 2023

CL_PLATFORM_NOT_FOUND_KHR (-1001) while using docker ubuntu22.04 image with GPU enabled. KhronosGroup/OpenCL-Guide#33

Open

romainGuiet mentioned this issue Feb 15, 2024

OpenCL not working on WSL BIOP/BIOP-desktop#5

Open

zku mentioned this issue Feb 15, 2024

If feasible, do not truncate float64 down to float32 in cstyle renderer tinygrad/tinygrad#3420

Merged

tangmc0210 mentioned this issue May 30, 2024

'cl_complex_mul.h' file not found when using GPU elgw/deconwolf#56

Closed

CLRafaelR mentioned this issue Aug 13, 2024

../tools/scripts/run_cuda_tests Fails on WSL2 pocl/pocl#1533

Open

barrypitman mentioned this issue Aug 19, 2024

Building tensorflow lite GPU bytedeco/javacpp-presets#1529

Open


