Thrown when PyCUDA is confronted with a situation where it is likely that the programmer has made a mistake. LogicErrors do not depend on outer circumstances defined by the run-time environment.
Example: CUDA was used before it was initialized.
Thrown when an unforeseen run-time failure is encountered that is not likely due to programmer error.
Example: A file was not found.
Flags for Device.make_context(). CUDA 2.0 and above only.
Flags for Function.get_attribute(). CUDA 2.2 and newer.
CUDA 2.1 and newer.
Flags to be used to allocate pagelocked host memory.
Initialize CUDA.
Warning
This must be called before any other function in this module.
See also pycuda.autoinit.
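A minimal sketch of manual initialization (pycuda.autoinit performs the equivalent, plus device and context setup, on import):

    import pycuda.driver as drv

    drv.init()  # must precede any other call into this module
    print("CUDA devices found: %d" % drv.Device.count())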
A handle to the number’th CUDA device. See also pycuda.autoinit.
Return the (numeric) value of the attribute attr, which may be one of the device_attribute values.
All device_attribute values may also be directly read as (lower-case) attributes on the Device object itself, e.g. dev.clock_rate.
Create a Context on this device, with flags taken from the ctx_flags values.
Also make the newly-created context the current context.
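For example, querying a device and creating a context on it might look like this sketch (the context is popped at the end so it can be cleanly destroyed):

    import pycuda.driver as drv

    drv.init()
    dev = drv.Device(0)
    print(dev.name())

    # two equivalent ways of reading a device attribute
    print(dev.get_attribute(drv.device_attribute.MAX_THREADS_PER_BLOCK))
    print(dev.max_threads_per_block)

    ctx = dev.make_context()  # created and made current in one step
    try:
        pass  # ... allocate memory, launch kernels, etc. ...
    finally:
        ctx.pop()  # deactivate the context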
An equivalent of a UNIX process on the compute device. Create instances of this class using Device.make_context(). See also pycuda.autoinit.
A handle for a queue of operations that will be carried out in order.
An event is a temporal ‘marker’ in a Stream that allows taking the time between two events, such as the time required to execute a kernel. An event’s time is recorded when the Stream has finished all tasks enqueued before the record() call.
See event_flags for values for the flags parameter.
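A typical timing sketch (the trivial kernel here is only a stand-in; SourceModule is documented below):

    import pycuda.autoinit  # noqa: F401 -- sets up a device and context
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule

    func = SourceModule("""
    __global__ void busy(float *a) { a[threadIdx.x] += 1.0f; }
    """).get_function("busy")
    a_gpu = drv.mem_alloc(256 * 4)

    start, end = drv.Event(), drv.Event()
    start.record()     # marker enqueued before the kernel
    func(a_gpu, block=(256, 1, 1), grid=(64, 1))
    end.record()       # marker enqueued after the kernel
    end.synchronize()  # wait until the end marker is reached
    print("kernel time: %g ms" % start.time_till(end))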
Allocates a linear piece of device memory at least width bytes wide and height rows high that can be accessed using a data type of size access_size in a coalesced fashion.
Returns a tuple (dev_alloc, actual_pitch) giving a DeviceAllocation and the actual width of each row in bytes.
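For instance, allocating pitched storage for a 512x512 float32 image (a sketch):

    import numpy as np
    import pycuda.autoinit  # noqa: F401
    import pycuda.driver as drv

    width, height = 512, 512
    itemsize = np.dtype(np.float32).itemsize

    dev_alloc, pitch = drv.mem_alloc_pitch(width * itemsize, height, itemsize)

    # pitch may exceed the requested row width because of alignment padding
    print("requested %d bytes per row, pitch is %d" % (width * itemsize, pitch))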
An object representing an allocation of linear device memory. Once this object is deleted, its associated device memory is freed.
Objects of this type can be cast to int to obtain a linear index into this Context’s memory.
Allocate a pagelocked numpy.ndarray of shape, dtype and order.
mem_flags may be one of the values in host_alloc_flags. It may only be non-zero on CUDA 2.2 and newer.
For the meaning of the other parameters, please refer to the numpy documentation.
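For instance (a sketch):

    import numpy as np
    import pycuda.autoinit  # noqa: F401 -- creates a default context
    import pycuda.driver as drv

    # behaves like a normal numpy array, but lives in page-locked host memory
    a = drv.pagelocked_empty((1024,), np.float32)
    a[:] = np.arange(1024, dtype=np.float32)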
Allocate a pagelocked numpy.ndarray of shape, dtype and order that is zero-initialized.
mem_flags may be one of the values in host_alloc_flags. It may only be non-zero on CUDA 2.2 and newer.
For the meaning of the other parameters, please refer to the numpy documentation.
Allocate a pagelocked numpy.ndarray with the same shape, dtype and order as array.
mem_flags may be one of the values in host_alloc_flags. It may only be non-zero on CUDA 2.2 and newer.
Allocate a pagelocked numpy.ndarray with the same shape, dtype and order as array. Initialize it to 0.
mem_flags may be one of the values in host_alloc_flags. It may only be non-zero on CUDA 2.2 and newer.
The numpy.ndarray instances returned by these functions have an attribute base that references the underlying pagelocked allocation object, described next.
An object representing an allocation of pagelocked host memory. Once this object is deleted, its associated host memory is freed.
Return a device pointer that indicates the address at which this memory is mapped into the device’s address space.
Only available on CUDA 2.2 and newer.
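A sketch of mapped (‘zero-copy’) memory; it assumes the device supports host-mapped memory, and combines the MAP_HOST context flag with the DEVICEMAP allocation flag:

    import numpy as np
    import pycuda.driver as drv

    drv.init()
    ctx = drv.Device(0).make_context(drv.ctx_flags.MAP_HOST)
    try:
        a = drv.pagelocked_empty((256,), np.float32,
                                 mem_flags=drv.host_alloc_flags.DEVICEMAP)
        dev_ptr = a.base.get_device_pointer()  # device-side address of the same memory
    finally:
        ctx.pop()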
A 2D or 3D memory block that can only be accessed via texture references.
descriptor can be of type ArrayDescriptor or ArrayDescriptor3D.
A handle to a binding of either linear memory or an Array to a texture unit.
Bind self to the Array array.
As long as array remains bound to this texture reference, it will not be freed; the texture reference keeps a reference to the array.
Bind self to a chunk of linear memory starting at the integer address devptr, encompassing a number of bytes. Due to alignment requirements, the effective texture bind address may be different from the requested one by an offset. This method returns this offset in bytes. If allow_offset is False, a nonzero value of this offset will cause an exception to be raised.
Unlike for Array objects, no life support is provided for linear memory bound to texture references.
Return a tuple (fmt, num_components), where fmt is of type array_format, and num_components is the number of channels in this texture.
(Version 2.0 and above only.)
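Putting the texture pieces together: a sketch that builds an Array from a 2D ndarray and samples it from a kernel (set_array() and get_texref() are the PyCUDA binding/lookup methods assumed here):

    import numpy as np
    import pycuda.autoinit  # noqa: F401
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule

    mod = SourceModule("""
    texture<float, 2> tex;

    __global__ void copy_tex(float *dest)
    {
        int i = threadIdx.x, j = threadIdx.y;
        dest[j * blockDim.x + i] = tex2D(tex, i, j);
    }
    """)

    texref = mod.get_texref("tex")
    matrix = np.random.randn(8, 8).astype(np.float32)
    texref.set_array(drv.matrix_to_array(matrix, order="C"))

    dest = np.zeros_like(matrix)
    mod.get_function("copy_tex")(drv.Out(dest), block=(8, 8, 1),
                                 texrefs=[texref])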
Turn the three-dimensional numpy.ndarray object matrix into a 2D Array with multiple channels.
Depending on order, the matrix’s shape is interpreted as
Note
This function assumes that matrix has been created with the memory order order. If that is not the case, the copied data will likely not be what you expect.
Note
count is the number of elements, not bytes.
Copy from the Python buffer src to the device pointer dest (an int or a DeviceAllocation) asynchronously, optionally serialized via stream. The size of the copy is determined by the size of the buffer. For the copy to be truly asynchronous, src must reside in page-locked memory (see, e.g., pagelocked_empty()).
New in 0.93.
Copy from the device pointer src (an int or a DeviceAllocation) to the Python buffer dest. The size of the copy is determined by the size of the buffer.
Optionally execute asynchronously, serialized via stream. In this case, dest must be page-locked.
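A round-trip sketch with a stream and page-locked buffers on both ends (kernel launches could be enqueued on the same stream between the two copies):

    import numpy as np
    import pycuda.autoinit  # noqa: F401
    import pycuda.driver as drv

    stream = drv.Stream()

    a = drv.pagelocked_empty((1024,), np.float32)  # page-locked source
    b = drv.pagelocked_empty((1024,), np.float32)  # page-locked destination
    a[:] = 1.0

    dev_buf = drv.mem_alloc(a.nbytes)
    drv.memcpy_htod_async(dev_buf, a, stream)  # enqueue upload
    drv.memcpy_dtoh_async(b, dev_buf, stream)  # enqueue download
    stream.synchronize()  # wait for everything queued on the stream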
Handle to a CUBIN module loaded onto the device. Can be created with module_from_file() and module_from_buffer().
Return the Function name in this module.
Warning
While you can obtain different handles to the same function using this method, these handles all share the same state that is set through the set_XXX methods of Function. This means that you can’t obtain two different handles to the same function and Function.prepare() them in two different ways.
Return the device address of the global name as an int.
The main use of this method is to find the address of pre-declared __constant__ arrays so they can be filled from the host before kernel invocation.
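For example, filling a __constant__ array before a launch (a sketch; the symbol name coeffs is illustrative):

    import numpy as np
    import pycuda.autoinit  # noqa: F401
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule

    mod = SourceModule("""
    __constant__ float coeffs[4];

    __global__ void scale(float *x) { x[threadIdx.x] *= coeffs[0]; }
    """)

    ptr, size_in_bytes = mod.get_global("coeffs")
    drv.memcpy_htod(ptr, np.array([2, 0, 0, 0], np.float32))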
Create a Module by loading a PTX or CUBIN module from buffer, which must support the Python buffer interface. (For example, str and numpy.ndarray do.)
Loading PTX modules as well as non-default values of options and message_handler are only allowed on CUDA 2.1 and newer.
Handle to a __global__ function in a Module. Create using Module.get_function().
Launch self, with a thread block size of block. block must be a 3-tuple of integers.
arg1 through argn are the positional C arguments to the kernel. See param_set() for details. See especially the warnings there.
grid specifies, as a 2-tuple, the number of thread blocks to launch, as a two-dimensional grid. stream, if specified, is a Stream instance serializing the copying of input arguments (if any), execution, and the copying of output arguments (again, if any). shared gives the number of bytes available to the kernel in extern __shared__ arrays. texrefs is a list of TextureReference instances that the function will have access to.
The function returns either None or the number of seconds spent executing the kernel, depending on whether time_kernel is True.
This is a convenience interface that can be used instead of the param_*() and launch_*() methods below. For a faster (but mildly less convenient) way of invoking kernels, see prepare() and prepared_call().
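A complete invocation through this interface might look like the following sketch; drv.InOut is PyCUDA’s argument-marshalling helper that copies an ndarray to the device before the launch and back afterwards:

    import numpy as np
    import pycuda.autoinit  # noqa: F401
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule

    mod = SourceModule("""
    __global__ void double_them(float *a) { a[threadIdx.x] *= 2.0f; }
    """)
    double_them = mod.get_function("double_them")

    a = np.random.randn(128).astype(np.float32)
    double_them(drv.InOut(a), block=(128, 1, 1), grid=(1, 1))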
Set up arg1 through argn as positional C arguments to self. They are allowed to be of the following types:
Warning
You cannot pass values of Python’s native int or float types to param_set. Since there is no unambiguous way to guess the size of these integers or floats, it complains with a TypeError.
Note
This method has to guess the types of the arguments passed to it, which can make it somewhat slow. For a kernel that is invoked often, this can be inconvenient. For a faster (but mildly less convenient) way of invoking kernels, see prepare() and prepared_call().
Prepare the invocation of this function by fixing its argument types (and, in this interface version, the block size and any texture references) ahead of time, so that no argument-type guessing is needed at call time.
Return self.
Invoke self using launch_grid(), with args and a grid size of grid. Assumes that prepare() was called on self. The texture references given to prepare() are set up as parameters, as well.
Return a 0-ary callable that can be used to query the GPU time consumed by the call, in seconds. Once called, this callable will block until completion of the invocation.
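A prepared-call sketch following this version of the interface, where the block size is given to prepare(); note that newer PyCUDA releases move it to prepared_call(grid, block, ...):

    import numpy as np
    import pycuda.autoinit  # noqa: F401
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule

    func = SourceModule("""
    __global__ void double_them(float *a) { a[threadIdx.x] *= 2.0f; }
    """).get_function("double_them")

    a = np.random.randn(128).astype(np.float32)
    a_gpu = drv.mem_alloc(a.nbytes)
    drv.memcpy_htod(a_gpu, a)

    func.prepare("P", block=(128, 1, 1))  # "P": one pointer-sized argument
    func.prepared_call((1, 1), a_gpu)     # no argument-type guessing here

    get_time = func.prepared_timed_call((1, 1), a_gpu)
    print("kernel ran for %g s" % get_time())  # blocks until completion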
Return one of the attributes given by the function_attribute value attr.
All function_attribute values may also be directly read as (lower-case) attributes on the Function object itself, e.g. func.num_regs.
The number of bytes of local memory used by this function.
On CUDA 2.1 and below, this is only available if this function is part of a SourceModule. It replaces the now-deprecated attribute lmem.
The number of bytes of shared memory used by this function.
On CUDA 2.1 and below, this is only available if this function is part of a SourceModule. It replaces the now-deprecated attribute smem.
The number of 32-bit registers used by this function.
On CUDA 2.1 and below, this is only available if this function is part of a SourceModule. It replaces the now-deprecated attribute registers.
Create a Module from the CUDA source code source. The Nvidia compiler nvcc is assumed to be on the PATH if no path to it is specified, and is invoked with options to compile the code. If keep is True, the compiler output directory is kept, and a line indicating its location in the file system is printed for debugging purposes.
Unless no_extern_c is True, the given source code is wrapped in extern "C" { ... } to prevent C++ name mangling.
arch and code specify the values to be passed for the -arch and -code options on the nvcc command line. If arch is None, it defaults to the current context’s device’s compute capability. If code is None, it will not be specified.
cache_dir gives the directory used for compiler caching. It has a sensible per-user default. If it is set to False, caching is disabled.
This class exhibits the same public interface as Module, but does not inherit from it.
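For example (a sketch; the nvcc flag shown is illustrative, none are required):

    import pycuda.autoinit  # noqa: F401 -- a context must exist to load the module
    from pycuda.compiler import SourceModule

    mod = SourceModule("""
    __global__ void add_one(float *a) { a[threadIdx.x] += 1.0f; }
    """,
        options=["--use_fast_math"],  # extra flags handed to nvcc
        keep=False)                   # set True to inspect compiler output

    add_one = mod.get_function("add_one")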