Python module
driver
Exposes APIs for interacting with hardware, such as allocating tensors on a GPU and moving tensors between the CPU and GPU. It provides interfaces for memory management, device properties, and hardware monitoring. Through these APIs, you can control data placement, track resource utilization, and configure device settings for optimal performance.
For example, the following code selects an accelerator if one is available and falls back to the CPU otherwise:
from max import driver
device = driver.CPU() if driver.accelerator_count() == 0 else driver.Accelerator()
print(f"Using {device} device")
Accelerator
class max.driver.Accelerator(self, id: int = -1)
Creates an accelerator device with the specified ID and memory limit.
Provides access to GPU or other hardware accelerators in the system.
Repeated instantiations with a previously-used device-id will still refer to the first such instance that was created. This is especially important when providing a different memory limit: only the value (implicitly or explicitly) provided in the first such instantiation is effective.
from max import driver
device = driver.Accelerator()
# Or specify GPU id
device = driver.Accelerator(id=0) # First GPU
device = driver.Accelerator(id=1) # Second GPU
# Get device id
device_id = device.id
Parameters:
id (int, optional) – The device ID to use. Defaults to -1, which selects the first available accelerator.
Returns:
A new Accelerator device object.
Return type:
Accelerator
Buffer
class max.driver.Buffer(self, dtype: max.dtype.DType, shape: collections.abc.Sequence[int], device: max.driver.Device | None = None, pinned: bool = False)
class max.driver.Buffer(self, dtype: max.dtype.DType, shape: collections.abc.Sequence[int], stream: max.driver.DeviceStream, pinned: bool = False)
class max.driver.Buffer(self, shape: ndarray[writable=False], device: max.driver.Device)
Device-resident buffer representation.
Allocates memory onto a given device with the provided shape and dtype. Buffers can be sliced to provide strided views of the underlying memory, but any buffers input into model execution must be contiguous.
Supports numpy-style slicing but does not currently support setting items across multiple indices.
from max import driver
from max.dtype import DType
# Create a buffer on CPU
cpu_buffer = driver.Buffer(shape=[2, 3], dtype=DType.float32)
# Create a buffer on GPU
gpu = driver.Accelerator()
gpu_buffer = driver.Buffer(shape=[2, 3], dtype=DType.float32, device=gpu)
Parameters:
- dtype (DType) – Data type of buffer elements.
- shape (Sequence[int]) – Tuple of positive integers denoting the buffer shape.
- device (Device, optional) – Device to allocate buffer onto. Defaults to the CPU.
- pinned (bool, optional) – If True, memory is page-locked (pinned). Defaults to False.
- stream (DeviceStream, optional) – Stream to associate the buffer with.
contiguous()
contiguous()
Creates a contiguous copy of the parent buffer.
copy()
copy(self, stream: max.driver.DeviceStream) → max.driver.Buffer
copy(self, device: max.driver.Device | None = None) → max.driver.Buffer
Overloaded function.
- copy(self, stream: max.driver.DeviceStream) -> max.driver.Buffer
  Creates a deep copy on the device associated with the stream.
  Args:
  - stream (DeviceStream): The stream to associate the new buffer with.
  Returns:
  - Buffer: A new buffer that is a copy of this buffer.
- copy(self, device: max.driver.Device | None = None) -> max.driver.Buffer
  Creates a deep copy on an optionally given device.
  If device is None (default), a copy is created on the same device.
  from max import driver
  from max.dtype import DType
  cpu_buffer = driver.Buffer(shape=[2, 3], dtype=DType.bfloat16, device=driver.CPU())
  cpu_copy = cpu_buffer.copy()
  # Copy to GPU
  gpu = driver.Accelerator()
  gpu_copy = cpu_buffer.copy(device=gpu)
  Args:
  - device (Device, optional): The device to create the copy on. Defaults to None (same device).
  Returns:
  - Buffer: A new buffer that is a copy of this buffer.
device
property device
Device on which the buffer is resident.
disable_auto_sync()
disable_auto_sync(self) → None
Disables automatic synchronization for asynchronous operations on this buffer.
By default, certain operations on buffers cause synchronization, such as accessing a buffer on the host through to_numpy(). However, this default synchronization is conservative and often waits on more than is strictly needed.
This function disables the default synchronization and enables mark_as_ready(), which gives finer control over what is waited on when a buffer needs to be synchronized.
# Assuming we have 3 buffers of the same sizes, a, b and c
# Default case with auto-synchronization
a.to(b) # 1
a.to(c) # 2
# Will wait on 1 and 2
b.to_numpy()
# Disabled synchronization
a.disable_auto_sync()
a.to(b) # 1
a.to(c) # 2
# Doesn't wait on 1 or 2, data in b could be invalid
b.to_numpy()
# Disabled synchronization with mark_as_ready
a.disable_auto_sync()
a.to(b) # 1
b.mark_as_ready()
a.to(c) # 2
# Wait on 1 but not on 2
b.to_numpy()
dtype
property dtype
DType of constituent elements in the buffer.
element_size
property element_size
Return the size of the element type in bytes.
from_dlpack()
from_dlpack(*, copy=None)
Create a buffer from an object implementing the dlpack protocol.
This usually does not result in a copy, and the producer of the object retains ownership of the underlying memory.
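The zero-copy behavior described above can be illustrated with NumPy's own DLPack support. This is a sketch that uses np.from_dlpack rather than the MAX API, on the assumption that the ownership semantics are analogous; it is not a demonstration of Buffer.from_dlpack itself.

```python
import numpy as np

# Producer: a NumPy array, which implements the DLPack protocol
# (__dlpack__ and __dlpack_device__).
producer = np.arange(6, dtype=np.float32)

# Consumer: import the array through DLPack. No data is copied here;
# the consumer refers to the producer's memory.
consumer = np.from_dlpack(producer)

shares = np.shares_memory(producer, consumer)
print(shares)

# Because the memory is shared, a write through the producer is
# visible through the consumer.
producer[0] = 42.0
print(consumer[0])
```

Since the producer keeps ownership of the memory, it has to stay alive for as long as the imported object is in use.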
from_numpy()
from_numpy()
Creates a buffer from a provided numpy array on the host device.
The underlying data is not copied unless the array is non-contiguous, in which case a contiguous copy is returned.
inplace_copy_from()
inplace_copy_from(src)
Copies the contents of another buffer into this one.
The buffers may be on different devices. Both buffers must be contiguous and have the same size.
is_contiguous
property is_contiguous
Whether or not buffer is contiguously allocated in memory. Returns false if the buffer is a non-contiguous slice.
Currently, we consider certain situations that are contiguous as non-contiguous for the purposes of our engine, such as when a buffer has negative steps.
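To make the distinction concrete, here is a minimal sketch that uses NumPy arrays as a stand-in for buffers. It assumes that NumPy's C-contiguity flag behaves similarly to is_contiguous for these cases; it does not call the MAX API.

```python
import numpy as np

base = np.arange(12, dtype=np.float32).reshape(3, 4)

# Taking every other column yields a strided view, not a packed block.
strided = base[:, ::2]

# A negative step walks memory backwards, so this is also a
# non-contiguous view.
reversed_rows = base[::-1]

# ascontiguousarray copies only when the input is not already contiguous.
packed = np.ascontiguousarray(strided)

print(base.flags["C_CONTIGUOUS"])           # freshly allocated array
print(strided.flags["C_CONTIGUOUS"])        # strided slice
print(reversed_rows.flags["C_CONTIGUOUS"])  # negative step
print(packed.flags["C_CONTIGUOUS"])         # contiguous copy
```

The packed array no longer shares memory with base, which mirrors the trade-off of making a contiguous copy before passing data to model execution.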
is_host
property is_host
Whether or not buffer is host-resident. Returns false for GPU buffers, true for CPU buffers.
from max import driver
from max.dtype import DType
cpu_buffer = driver.Buffer(shape=[2, 3], dtype=DType.bfloat16, device=driver.CPU())
print(cpu_buffer.is_host)
item()
item(self) → Any
Returns the scalar value at a given location. Currently implemented only for zero-rank buffers. The return type is converted to a Python built-in type.
mark_as_ready()
mark_as_ready(self) → None
Establishes a synchronization point for buffers with disabled auto-sync.
This method can only be called on buffers with disabled synchronization through disable_auto_sync().
It instructs MAX that whenever it needs to wait on this buffer, it should wait only up to the point where this method was called.
It can be called multiple times, but it will override a previous synchronization point with the new one.
Refer to the disable_auto_sync() documentation for more details and examples.
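The bookkeeping described above can be pictured with a small toy model. This is purely illustrative: ToyBuffer, enqueue, and ops_to_wait_on are hypothetical names, the RuntimeError is an arbitrary choice, and none of it reflects how MAX implements synchronization internally.

```python
class ToyBuffer:
    """Toy stand-in for a buffer; not the MAX implementation."""

    def __init__(self):
        self.pending = []      # labels of operations enqueued so far
        self.auto_sync = True  # default: conservative synchronization
        self.sync_point = 0    # how many pending ops the mark covers

    def enqueue(self, label):
        self.pending.append(label)

    def disable_auto_sync(self):
        self.auto_sync = False
        self.sync_point = 0

    def mark_as_ready(self):
        if self.auto_sync:
            # Assumed error type, chosen only for this sketch.
            raise RuntimeError("call disable_auto_sync() first")
        # A later call overrides the previous synchronization point.
        self.sync_point = len(self.pending)

    def ops_to_wait_on(self):
        if self.auto_sync:
            return list(self.pending)  # wait on everything enqueued
        return self.pending[: self.sync_point]


# Default behavior: both copies are waited on.
auto = ToyBuffer()
auto.enqueue("copy 1")
auto.enqueue("copy 2")
default_wait = auto.ops_to_wait_on()

# With a mark placed between the copies, only the first is waited on.
manual = ToyBuffer()
manual.disable_auto_sync()
manual.enqueue("copy 1")
manual.mark_as_ready()
manual.enqueue("copy 2")
marked_wait = manual.ops_to_wait_on()

print(default_wait, marked_wait)
```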
mmap()
mmap(dtype, shape, mode='copyonwrite', offset=0)
num_elements
property num_elements
Returns the number of elements in this buffer.
Rank-0 buffers have 1 element by convention.
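The rank-0 convention follows from treating the element count as the product of the shape, since the empty product is 1. A short sketch in plain Python, where count_elements and size_in_bytes are hypothetical helpers and not part of max.driver:

```python
import math


def count_elements(shape):
    """Element count implied by a shape; the empty product is 1."""
    return math.prod(shape)


def size_in_bytes(shape, element_size):
    """Total payload size for a packed buffer of the given shape."""
    return count_elements(shape) * element_size


print(count_elements(()))         # rank-0 shape
print(count_elements((2, 3)))     # rank-2 shape
print(size_in_bytes((2, 3), 4))   # 4-byte elements, such as float32
```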
pinned
property pinned
Whether or not the underlying memory is pinned (page-locked).
rank
property rank
Buffer rank.
scalar
scalar = <nanobind.nb_func object>
shape
property shape
Shape of buffer.
stream
property stream
Stream to which the buffer is bound.
to()
to(self, device: max.driver.Device) → max.driver.Buffer
to(self, stream: max.driver.DeviceStream) → max.driver.Buffer
to(self, devices: collections.abc.Sequence[max.driver.Device]) → list[max.driver.Buffer]
to(self, streams: collections.abc.Sequence[max.driver.DeviceStream]) → list[max.driver.Buffer]
Overloaded function.
- to(self, device: max.driver.Device) -> max.driver.Buffer
  Returns a buffer that's guaranteed to be on the given device.
  The buffer is only copied if the requested device is different from the device upon which the buffer is already resident.
- to(self, stream: max.driver.DeviceStream) -> max.driver.Buffer
  Returns a buffer that's guaranteed to be on the given device and associated with the given stream.
  The buffer is only copied if the requested device is different from the device upon which the buffer is already resident. If the destination stream is on the same device, then a new reference to the same buffer is returned.
- to(self, devices: collections.abc.Sequence[max.driver.Device]) -> list[max.driver.Buffer]
  Returns a list of buffers that are guaranteed to be on the given devices.
  The buffers are only copied if the requested devices are different from the device upon which the buffer is already resident.
- to(self, streams: collections.abc.Sequence[max.driver.DeviceStream]) -> list[max.driver.Buffer]
  Returns a list of buffers that are guaranteed to be on the given streams.
  The buffers are only copied if the requested streams are different from the stream upon which the buffer is already resident.
to_numpy()
to_numpy()
Converts the buffer to a numpy array.
If the buffer is not on the host, a copy will be issued.
view()
view(dtype, shape=None)
Return a new buffer with the given type and shape that shares the underlying memory.
If the shape is not given, it will be deduced if possible, or a ValueError is raised.
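The idea of reinterpreting the same memory under a different type can be sketched with NumPy's ndarray.view, used here as an analogue. Whether Buffer.view deduces shapes under exactly the same rules is an assumption; only the NumPy behavior is shown.

```python
import numpy as np

raw = np.zeros(4, dtype=np.float32)  # 16 bytes of storage

# Reinterpret the same 16 bytes as bytes; the shape is deduced as (16,).
as_bytes = raw.view(np.uint8)

# Reinterpret as 32-bit integers and give the result an explicit shape.
as_ints = raw.view(np.int32).reshape(2, 2)

# A write through the original is visible through the views, because
# all three objects share one allocation.
raw[0] = 1.0
print(as_bytes.shape)
print(hex(as_ints[0, 0]))  # bit pattern of float32 1.0

# Three bytes cannot be reinterpreted as 4-byte floats, so no shape
# can be deduced and NumPy raises ValueError.
try:
    np.zeros(3, dtype=np.uint8).view(np.float32)
    deduction_failed = False
except ValueError:
    deduction_failed = True
print(deduction_failed)
```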
zeros
zeros = <nanobind.nb_func object>
CPU
class max.driver.CPU(self, id: int = -1)
Creates a CPU device.
from max import driver
# Create default CPU device
device = driver.CPU()
# Device id is always 0 for CPU devices
device_id = device.id
DLPackArray
class max.driver.DLPackArray(*args, **kwargs)
Device
class max.driver.Device
api
property api
Returns the API used to program the device.
Possible values are:
- cpu for host devices.
- cuda for NVIDIA GPUs.
- hip for AMD GPUs.
from max import driver
device = driver.CPU()
device.api
architecture_name
property architecture_name
Returns the architecture name of the device.
Examples of possible values:
- gfx90a, gfx942 for AMD GPUs.
- sm_80, sm_86 for NVIDIA GPUs.
- CPU devices raise an exception.
from max import driver
device = driver.Accelerator()
device.architecture_name
can_access()
can_access(self, other: max.driver.Device) → bool
Checks if this device can directly access memory of another device.
from max import driver
gpu0 = driver.Accelerator(id=0)
gpu1 = driver.Accelerator(id=1)
if gpu0.can_access(gpu1):
    print("GPU0 can directly access GPU1 memory.")
cpu
cpu = <nanobind.nb_func object>
default_stream
property default_stream
Returns the default stream for this device.
The default stream is initialized when the device object is created.
Returns:
The default execution stream for this device.
Return type:
DeviceStream
id
property id
Returns a zero-based device id. For a CPU device this is always 0.
For GPU accelerators this is the id of the device relative to this host.
Along with the label, an id can uniquely identify a device,
e.g. gpu:0, gpu:1.
from max import driver
device = driver.Accelerator()
device_id = device.id
Returns:
The device ID.
Return type:
int
is_compatible
property is_compatible
Returns whether this device is compatible with MAX.
Returns:
True if the device is compatible with MAX, False otherwise.
Return type:
bool
is_host
property is_host
Whether this device is the CPU (host) device.
from max import driver
device = driver.CPU()
device.is_host
label
property label
Returns device label.
Possible values are:
- cpu for host devices.
- gpu for accelerators.
from max import driver
device = driver.CPU()
device.label
stats
property stats
Returns utilization data for the device.
from max import driver
device = driver.CPU()
stats = device.stats
Returns:
A dictionary containing device utilization statistics.
Return type:
dict
synchronize()
synchronize(self) → None
Ensures all operations on this device complete before returning.
Raises:
ValueError – If any enqueued operations had an internal error.
DeviceEvent
class max.driver.DeviceEvent(self, device: max.driver.Device, enable_timing: bool = False)
Provides access to an event object.
An event can be used to wait for the GPU execution to reach a certain point on the given stream.
from max import driver
# Create a default accelerator device
device = driver.Accelerator()
# Create an event on the device
event = driver.DeviceEvent(device)
# Record an event on the device (default stream)
device.default_stream.record_event(event)
# Wait for execution on the default stream to reach the event
event.synchronize()
Creates an event for synchronization on the specified device.
Parameters:
- device (Device) – The device on which to create the event.
- enable_timing (bool) – If True, enable GPU timing on this event. Events created with enable_timing=True can be used with elapsed_time() to measure GPU execution time. Defaults to False.
Raises:
ValueError – If event creation failed.
from max import driver
device = driver.Accelerator()
event = driver.DeviceEvent(device)
timed_event = driver.DeviceEvent(device, enable_timing=True)
elapsed_time()
elapsed_time(self, end_event: max.driver.DeviceEvent) → float
Returns the elapsed GPU time in milliseconds between this event and end_event.
Both events must have been created with enable_timing=True and recorded on a stream before calling this method. The end event must be synchronized before calling this method.
Parameters:
end_event (DeviceEvent) – The ending event.
Returns:
Elapsed time in milliseconds.
Return type:
float
Raises:
RuntimeError – If either event was not created with timing enabled, or if the events have not been recorded.
from max import driver
device = driver.Accelerator()
start = driver.DeviceEvent(device, enable_timing=True)
end = driver.DeviceEvent(device, enable_timing=True)
stream = device.default_stream
stream.record_event(start)
# ... GPU work ...
stream.record_event(end)
end.synchronize()
elapsed_ms = start.elapsed_time(end)
is_ready()
is_ready(self) → bool
Returns whether this event is ready.
Returns:
True if the event is complete, otherwise False.
Return type:
bool
Raises:
ValueError – If querying the event status returned an error.
synchronize()
synchronize(self) → None
Ensures all operations on this stream complete before returning.
Raises:
ValueError – If any enqueued operations had an internal error.
DeviceSpec
class max.driver.DeviceSpec(id, device_type='cpu')
Specification for a device, containing its ID and type.
This class provides a way to specify device parameters like ID and type (CPU/GPU) for creating Device instances.
accelerator()
static accelerator(id=0)
Creates an accelerator (GPU) device specification.
Parameters:
id (int)
cpu()
static cpu(id=-1)
Creates a CPU device specification.
Parameters:
id (int)
device_type
device_type: Literal['cpu', 'gpu'] = 'cpu'
Type of specified device.
id
id: int
Provided id for this device.
DeviceStream
class max.driver.DeviceStream(self, device: max.driver.Device)
Provides access to a stream of execution on a device.
A stream represents a sequence of operations that will be executed in order. Multiple streams on the same device can execute concurrently.
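The ordering guarantee can be pictured with an analogy in plain Python, where each "stream" is modeled as a single-worker thread pool. This is only a conceptual sketch running on CPU threads: run_on_streams is a hypothetical helper, and nothing here touches a device or the MAX API.

```python
from concurrent.futures import ThreadPoolExecutor


def run_on_streams(ops_by_stream):
    """Run each stream's operations in order; streams run side by side."""
    completed = {name: [] for name in ops_by_stream}
    # One worker per stream: work submitted to the same pool runs in
    # submission order, while separate pools can make progress concurrently.
    pools = {name: ThreadPoolExecutor(max_workers=1) for name in ops_by_stream}
    try:
        futures = [
            pools[name].submit(completed[name].append, op)
            for name, ops in ops_by_stream.items()
            for op in ops
        ]
        for future in futures:
            future.result()  # block until done, similar in spirit to synchronize()
    finally:
        for pool in pools.values():
            pool.shutdown()
    return completed


order = run_on_streams({
    "stream_a": ["a1", "a2", "a3"],
    "stream_b": ["b1", "b2"],
})
print(order)
```

Within each stream the completion order matches the submission order, while no ordering is implied between operations on different streams.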
from max import driver
# Create a default accelerator device
device = driver.Accelerator()
# Get the default stream for the device
stream = device.default_stream
# Create a new stream of execution on the device
new_stream = driver.DeviceStream(device)
Creates a new stream of execution associated with the device.
Parameters:
device (Device) – The device to create the stream on.
Returns:
A new stream of execution.
Return type:
DeviceStream
device
property device
The device this stream is executing on.
record_event()
record_event(self) → max.driver.DeviceEvent
record_event(self, event: max.driver.DeviceEvent) → None
Overloaded function.
- record_event(self) -> max.driver.DeviceEvent
  Records an event on this stream.
  Returns:
  - DeviceEvent: A new event that will be signaled when all operations submitted to this stream before this call have completed.
  Raises:
  - ValueError: If recording the event failed.
- record_event(self, event: max.driver.DeviceEvent) -> None
  Records an existing event on this stream.
  Args:
  - event (DeviceEvent): The event to record on this stream.
  Raises:
  - ValueError: If recording the event failed.
synchronize()
synchronize(self) → None
Ensures all operations on this stream complete before returning.
Raises:
ValueError – If any enqueued operations had an internal error.
wait_for()
wait_for(self, stream: max.driver.DeviceStream) → None
wait_for(self, device: max.driver.Device) → None
Overloaded function.
- wait_for(self, stream: max.driver.DeviceStream) -> None
  Ensures all operations on the other stream complete before future work submitted to this stream is scheduled.
  Args:
  - stream (DeviceStream): The stream to wait for.
- wait_for(self, device: max.driver.Device) -> None
  Ensures all operations on device's default stream complete before future work submitted to this stream is scheduled.
  Args:
  - device (Device): The device whose default stream to wait for.
accelerator_api()
max.driver.accelerator_api()
Returns the API used to program the accelerator.
Return type:
accelerator_architecture_name()
max.driver.accelerator_architecture_name()
Returns the architecture name of the accelerator device.
Return type:
calculate_virtual_device_count()
max.driver.calculate_virtual_device_count(*device_spec_lists)
Calculate the minimum virtual device count needed for the given device specs.
Parameters:
*device_spec_lists (list[DeviceSpec]) – One or more lists of DeviceSpec objects (for example, main devices and draft devices).
Returns:
The minimum number of virtual devices needed (max GPU ID + 1), or 1 if there are no GPUs.
Return type:
int
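The rule stated above (max GPU ID + 1, or 1 if no GPUs) can be sketched in plain Python. Spec is a hypothetical stand-in that mirrors only the id and device_type fields documented for DeviceSpec, and virtual_device_count is an illustrative reimplementation of the stated rule, not the library's code.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class Spec:
    """Hypothetical stand-in mirroring DeviceSpec's documented fields."""
    id: int
    device_type: Literal["cpu", "gpu"] = "cpu"


def virtual_device_count(*spec_lists):
    """Return max GPU id + 1 across all lists, or 1 when no GPU is listed."""
    gpu_ids = [
        spec.id
        for specs in spec_lists
        for spec in specs
        if spec.device_type == "gpu"
    ]
    return max(gpu_ids) + 1 if gpu_ids else 1


main_devices = [Spec(0, "gpu"), Spec(3, "gpu")]
draft_devices = [Spec(1, "gpu")]

print(virtual_device_count(main_devices, draft_devices))  # highest GPU id is 3
print(virtual_device_count([Spec(-1)]))                   # CPU only
```

Taking the maximum ID rather than the number of specs means that sparse IDs, such as 0 and 3, still reserve every slot up to the highest one.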
calculate_virtual_device_count_from_cli()
max.driver.calculate_virtual_device_count_from_cli(*device_inputs)
Calculate virtual device count from raw CLI inputs (before parsing).
This helper works with the raw device input strings or lists before they're parsed into DeviceSpec objects. Used when virtual device mode needs to be enabled before device validation occurs.
devices_exist()
max.driver.devices_exist(devices)
Identify if devices exist.
Parameters:
devices (list[DeviceSpec])
Return type:
load_devices()
max.driver.load_devices(device_specs)
Initialize and return a list of devices, given a list of device specs.
Parameters:
device_specs (Sequence[DeviceSpec])
Return type:
load_max_buffer()
max.driver.load_max_buffer(path)
Experimental method for loading serialized MAX buffers.
MAX buffers can be exported by creating a graph and calling Value.print() with the BINARY_MAX_CHECKPOINT option.
scan_available_devices()
max.driver.scan_available_devices()
Returns all accelerators if available, else return cpu.
Return type:
accelerator_count()
max.driver.accelerator_count() → int
Returns number of accelerator devices available.