05 February 2019

Development environment for CUDA-enabled GPUs.


The NVIDIA CUDA Toolkit provides a development environment for creating high-performance GPU-accelerated applications. With the CUDA Toolkit, you can develop, optimize and deploy your applications on GPU-accelerated embedded systems, desktop workstations, enterprise data centers, cloud-based platforms and HPC supercomputers. The toolkit includes GPU-accelerated libraries, debugging and optimization tools, a C/C++ compiler and a runtime library to deploy your application.

GPU-accelerated CUDA libraries enable drop-in acceleration across multiple domains such as linear algebra, image and video processing, deep learning and graph analytics. For developing custom algorithms, you can use available integrations with commonly used languages and numerical packages, as well as well-documented development APIs. Your CUDA applications can be deployed across all NVIDIA GPU families available on premises and on GPU instances in the cloud. Using built-in capabilities for distributing computations across multi-GPU configurations, scientists and researchers can develop applications that scale from single-GPU workstations to cloud installations with thousands of GPUs.

What's new in NVIDIA CUDA Toolkit

Version 10.3:

  • CUDA 10.0 adds support for the Turing architecture (compute_75 and sm_75)
  • CUDA 10.0 adds support for CUDA Graphs, a new asynchronous task-graph programming model that enables more efficient launch and execution. See the API documentation for more information
  • Warp matrix functions now support additional matrix shapes 32x8x16 and 8x32x16. Warp matrix functions also include the ability (experimental in CUDA 10.0) to perform sub-byte operations (4-bit unsigned, 4-bit signed and 1-bit) using the Tensor Cores
  • Added support for CUDA-Vulkan and CUDA-DX12 interoperability APIs
  • Added support for a new instruction nanosleep that suspends a thread for a specified duration
  • Added version 6.3 of the Parallel Thread Execution (PTX) instruction set architecture (ISA). For more details on new (sm_75 target, wmma, nanosleep, FP16 atomics) and deprecated instructions, see this section in the PTX documentation
  • Starting with CUDA 10.0, the CUDA runtime is compatible with specific older NVIDIA drivers. A new package called "cuda-compat-" is included in the toolkit installer packages. For more information on compatibility, see the section in the Best Practices Guide
  • The following new operating systems are supported by CUDA. See the System Requirements section in the NVIDIA CUDA Installation Guide for Linux for a full list of supported operating systems
    • Ubuntu 18.04 LTS*
    • Ubuntu 14.04 LTS
    • SUSE SLES 15
    • OpenSUSE Leap 15
  • Added support for peer-to-peer (P2P) with CUDA on Windows (WDDM 2.0+ only)
  • Added a new CUDA sample to demonstrate multi-device cooperative group APIs
  • CUDA samples are now also available on GitHub: https://github.com/NVIDIA/cuda-samples
  • Added APIs to retrieve the LUID of CUDA devices (cuDeviceGetLuid)
  • Added cudaLimitMaxL2FetchGranularity to the device management APIs (cudaDeviceSetLimit, cudaDeviceGetLimit) to set or query the maximum fetch granularity of L2 (in bytes)
  • The cudaDeviceProp struct now includes the device UUID
  • Added support for synchronization across multiple devices with Cooperative Groups (cuLaunchCooperativeKernelMultiDevice) on Windows in TCC mode
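
As a rough illustration of two of the additions listed above (the device UUID in cudaDeviceProp and the cudaLimitMaxL2FetchGranularity limit), the host-only sketch below prints both for each visible device. It assumes CUDA 10.0 or later and a build along the lines of `nvcc devinfo.cu`. The file name and the formatUuid helper are made up for this example, and the 8-4-4-4-12 hex grouping is only a display convention chosen here.

```cuda
#include <cstdio>
#include <string>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): render 16 raw UUID bytes
// as lowercase hex in 8-4-4-4-12 groups.
std::string formatUuid(const char bytes[16]) {
    char buf[37];  // 32 hex digits + 4 dashes + terminating NUL
    int pos = 0;
    for (int i = 0; i < 16; ++i) {
        if (i == 4 || i == 6 || i == 8 || i == 10) buf[pos++] = '-';
        std::snprintf(buf + pos, 3, "%02x", static_cast<unsigned char>(bytes[i]));
        pos += 2;
    }
    buf[pos] = '\0';
    return std::string(buf);
}

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) continue;

        std::printf("Device %d: %s (compute capability %d.%d)\n",
                    dev, prop.name, prop.major, prop.minor);
        // The uuid field is available in cudaDeviceProp starting with CUDA 10.0.
        std::printf("  UUID: %s\n", formatUuid(prop.uuid.bytes).c_str());

        // Limits are per device, so select the device before querying.
        size_t granularity = 0;
        if (cudaSetDevice(dev) == cudaSuccess &&
            cudaDeviceGetLimit(&granularity, cudaLimitMaxL2FetchGranularity) == cudaSuccess) {
            std::printf("  Max L2 fetch granularity: %zu bytes\n", granularity);
        }
    }
    return 0;
}
```

On a machine without a CUDA-capable device or driver, the program is expected to report the error from cudaGetDeviceCount and exit rather than print device details.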

10 June 2009
Version: 2.2

So... are NVIDIA and Apple going to collaborate on some merge of CUDA and OpenCL? Or does it matter, since Microsoft will just implement their own incompatible variant in DirectX 13?
22 May 2015
Version: 6.0.37
Latest version is 7.0.36: http://www.nvidia.com/object/macosx-cuda-7.0.36-driver.html
16 April 2014
Version: 5.5.20
Latest now is 6.0.37: http://www.nvidia.com/object/macosx-cuda-6.0.37-driver.html
27 September 2013
Version: 5.0.36
Latest is 5.5.25.
18 January 2011
Version: 3.2
While NVIDIA is constantly moving CUDA forward (kudos to them), and it is found in many pieces of software making real differences to end users (After Effects, MATLAB, etc.), Apple and Khronos are sat twiddling their thumbs doing nothing with OpenCL. I'm all for open standards, but OpenCL is such a dead duck. Sadly, Apple Mac Pros currently come with ATI cards and thus have no software support from major vendors for computing on the GPU...
18 January 2011
Version: 3.2
Working great with GenArts and other apps.
24 July 2009
Version: 2.3.1
If they do then they risk further alienating the scientific community, which already flirts more than average with GNU/Linux and Mac OS. Having said that, of course they will.