cuPyNumeric is a library that aims to provide a distributed and accelerated drop-in replacement for NumPy built on top of the Legate framework.
With cuPyNumeric you can write code productively in Python, using the familiar NumPy API, and have your program scale with no code changes from single-CPU computers to multi-node, multi-GPU clusters.
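In practice, "drop-in replacement" means the main change is the import line, and the remaining NumPy-style code stays as written. The sketch below shows this on a hypothetical Jacobi relaxation of the 2-D Laplace equation. The solver, its function names, and the `try`/`except` fallback to plain NumPy (added so the sketch also runs on a machine without cuPyNumeric installed) are illustrative assumptions. They are not taken from the cuPyNumeric project or from the CFD course.

```python
# Drop-in swap: import cuPyNumeric under the usual NumPy alias.
# Assumption for this sketch: fall back to plain NumPy when cuPyNumeric
# is not installed, so the same file runs anywhere.
try:
    import cupynumeric as np
except ImportError:
    import numpy as np


def jacobi_step(u):
    """One Jacobi sweep for the 2-D Laplace equation (hypothetical helper).

    Each interior point becomes the average of its four neighbours;
    boundary values are copied unchanged.
    """
    new = u.copy()
    new[1:-1, 1:-1] = 0.25 * (
        u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:]
    )
    return new


def solve_laplace(n, iterations):
    """Hypothetical setup: n x n grid, top edge held at 1.0, other edges at 0.0."""
    u = np.zeros((n, n))
    u[0, :] = 1.0
    for _ in range(iterations):
        u = jacobi_step(u)
    return u


if __name__ == "__main__":
    grid = solve_laplace(64, 100)
    print(float(grid.mean()))
```

Because the solver only uses array creation, slicing, and elementwise arithmetic, none of its lines refer to the library behind `np`. That independence is what makes the import swap sufficient.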
For example, you can run the final example of the Python CFD course completely unmodified on 2048 A100 GPUs in a DGX SuperPOD and achieve good weak scaling.
cuPyNumeric works best for programs that have very large arrays of data that cannot fit in the memory of a single GPU or a single node and need to span multiple nodes and GPUs. While our implementation of the current NumPy API is still incomplete, programs that use unimplemented features will still work (assuming enough memory) by falling back to the canonical NumPy implementation.
cuPyNumeric is available from conda on the legate channel. See https://docs.nvidia.com/cupynumeric/latest/installation.html for details about different install configurations, or building cuPyNumeric from source.
The cuPyNumeric documentation can be found at https://docs.nvidia.com/cupynumeric/latest/.
See the discussion on contributing in CONTRIBUTING.md.
For technical questions about cuPyNumeric and Legate-based tools, please visit the community discussion forum.
If you have other questions, please contact us at legate(at)nvidia.com.
This project, i.e., cuPyNumeric, is separate from and independent of the CuPy project. CuPy is a registered trademark of Preferred Networks.