Python module

driver

Exposes APIs for interacting with hardware, such as allocating tensors on a GPU and moving tensors between the CPU and GPU. It provides interfaces for memory management, device properties, and hardware monitoring. Through these APIs, you can control data placement, track resource utilization, and configure device settings for optimal performance.

For example, the following code selects an accelerator if one is available and falls back to the CPU otherwise:

from max import driver

device = driver.CPU() if driver.accelerator_count() == 0 else driver.Accelerator()
print(f"Using {device} device")

`Accelerator`

class max.driver.Accelerator(self, id: int = -1)

Creates an accelerator device with the specified ID and memory limit.

Provides access to GPU or other hardware accelerators in the system.

Repeated instantiations with a previously-used device-id will still refer to the first such instance that was created. This is especially important when providing a different memory limit: only the value (implicitly or explicitly) provided in the first such instantiation is effective.

from max import driver
device = driver.Accelerator()
# Or specify GPU id
device = driver.Accelerator(id=0)  # First GPU
device = driver.Accelerator(id=1)  # Second GPU
# Get device id
device_id = device.id

Parameters: id (int, optional) - The device ID to use. Defaults to -1, which selects the first available accelerator.
Returns: A new Accelerator device object.
Return type: Accelerator

`Buffer`

class max.driver.Buffer(self, dtype: max.dtype.DType, shape: collections.abc.Sequence[int], device: max.driver.Device | None = None, pinned: bool = False)

class max.driver.Buffer(self, dtype: max.dtype.DType, shape: collections.abc.Sequence[int], stream: max.driver.DeviceStream, pinned: bool = False)

class max.driver.Buffer(self, shape: ndarray[writable=False], device: max.driver.Device)

Device-resident buffer representation.

Allocates memory onto a given device with the provided shape and dtype. Buffers can be sliced to provide strided views of the underlying memory, but any buffers input into model execution must be contiguous.

Supports numpy-style slicing but does not currently support setting items across multiple indices.

from max import driver
from max.dtype import DType

# Create a buffer on CPU
cpu_buffer = driver.Buffer(shape=[2, 3], dtype=DType.float32)

# Create a buffer on GPU
gpu = driver.Accelerator()
gpu_buffer = driver.Buffer(shape=[2, 3], dtype=DType.float32, device=gpu)

Parameters:

dtype (DType) - Data type of buffer elements.
shape (Sequence[int]) - Tuple of positive, non-zero integers denoting the buffer shape.
device (Device, optional) - Device to allocate buffer onto. Defaults to the CPU.
pinned (bool, optional) - If True, memory is page-locked (pinned). Defaults to False.
stream (DeviceStream, optional) - Stream to associate the buffer with.

`contiguous()`

contiguous()

Creates a contiguous copy of the parent buffer.

Parameters: self (Buffer)
Return type: Buffer

`copy()`

copy(self, stream: max.driver.DeviceStream) -> max.driver.Buffer

copy(self, device: max.driver.Device | None = None) -> max.driver.Buffer

Overloaded function.

copy(self, stream: max.driver.DeviceStream) -> max.driver.Buffer

Creates a deep copy on the device associated with the stream.

Args:
stream (DeviceStream): The stream to associate the new buffer with.

Returns:
Buffer: A new buffer that is a copy of this buffer.

copy(self, device: max.driver.Device | None = None) -> max.driver.Buffer

Creates a deep copy on an optionally given device. If device is None (default), the copy is created on the same device.

```
from max import driver
from max.dtype import DType

cpu_buffer = driver.Buffer(shape=[2, 3], dtype=DType.bfloat16, device=driver.CPU())
cpu_copy = cpu_buffer.copy()

# Copy to GPU
gpu = driver.Accelerator()
gpu_copy = cpu_buffer.copy(device=gpu)
```

Args:
device (Device, optional): The device to create the copy on. Defaults to None (same device).

Returns:
Buffer: A new buffer that is a copy of this buffer.

`device`

property device

Device on which the buffer is resident.

`disable_auto_sync()`

disable_auto_sync(self) -> None

Disables automatic synchronization for asynchronous operations on this buffer.

Caution

This is an experimental feature that may be unstable. It also requires special care from the user to ensure proper synchronization.

By default, certain operations on buffers cause synchronization, such as accessing a buffer on the host through to_numpy(). However, the default synchronization is quite conservative and often waits on more than is strictly needed.

This function disables the default synchronization method and enables mark_as_ready(), which allows finer control over what is waited on when a buffer needs to be synchronized.

# Assuming we have 3 buffers of the same sizes, a, b and c

# Default case with auto-synchronization
a.to(b) # 1
a.to(c) # 2

# Will wait on 1 and 2
b.to_numpy()

# Disabled synchronization
a.disable_auto_sync()
a.to(b) # 1
a.to(c) # 2

# Doesn't wait on 1 or 2, data in b could be invalid
b.to_numpy()

# Disabled synchronization with mark_as_ready
a.disable_auto_sync()
a.to(b) # 1
b.mark_as_ready()
a.to(c) # 2

# Wait on 1 but not on 2
b.to_numpy()

`dtype`

property dtype

DType of the constituent elements in the buffer.

`element_size`

property element_size

Return the size of the element type in bytes.

`from_dlpack()`

from_dlpack(array, *, copy=None)

Create a buffer from an object implementing the dlpack protocol.

This usually does not result in a copy, and the producer of the object retains ownership of the underlying memory.

Parameters:

array (Any)
copy (bool | None)

Return type:

Buffer

`from_numpy()`

from_numpy(arr)

Creates a buffer from a provided numpy array on the host device.

The underlying data is not copied unless the array is noncontiguous, in which case a contiguous copy is returned.

Parameters: arr (ndarray[tuple[Any, ...], dtype[Any]])
Return type: Buffer

`inplace_copy_from()`

inplace_copy_from(src)

Copy the contents of another buffer into this one.

These buffers may be on different devices. Requires that both buffers are contiguous and have the same size.

Parameters:

self (Buffer)
src (Buffer)

Return type:

None
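
The two preconditions above can be checked before calling inplace_copy_from(). The following is a minimal sketch, not part of max.driver: the helper name `can_inplace_copy` is hypothetical, it relies only on the documented `is_contiguous`, `num_elements`, and `element_size` properties, and it assumes that "same size" means the same total number of bytes.

```python
def can_inplace_copy(dst, src) -> bool:
    """Return True if dst.inplace_copy_from(src) meets the documented preconditions.

    Hypothetical helper. Assumption: "same size" means equal total byte count.
    """
    # Both buffers must be contiguous.
    if not (dst.is_contiguous and src.is_contiguous):
        return False
    # Compare total payload sizes in bytes.
    dst_bytes = dst.num_elements * dst.element_size
    src_bytes = src.num_elements * src.element_size
    return dst_bytes == src_bytes


# Intended usage with real buffers (requires MAX to be installed):
#   if can_inplace_copy(dst_buffer, src_buffer):
#       dst_buffer.inplace_copy_from(src_buffer)
```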

`is_contiguous`

property is_contiguous

Whether or not buffer is contiguously allocated in memory. Returns false if the buffer is a non-contiguous slice.

Currently, we consider certain situations that are contiguous as non-contiguous for the purposes of our engine, such as when a buffer has negative steps.
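
To make the notion of contiguity concrete, the sketch below checks whether a shape and stride pair describes a dense row-major layout. It is an illustrative model only, not the engine's implementation: the function name and the use of strides counted in elements are assumptions. In line with the note above, any negative step is reported as non-contiguous.

```python
from collections.abc import Sequence


def is_row_major_contiguous(shape: Sequence[int], strides: Sequence[int]) -> bool:
    """Check whether strides (counted in elements) describe a dense row-major layout."""
    if len(shape) != len(strides):
        raise ValueError("shape and strides must have the same rank")
    expected = 1
    # Walk from the innermost dimension outward, accumulating the expected stride.
    for dim, stride in zip(reversed(shape), reversed(strides)):
        if stride < 0:
            return False  # negative steps are treated as non-contiguous
        if dim != 1 and stride != expected:
            return False
        expected *= dim
    return True


print(is_row_major_contiguous([2, 3], [3, 1]))  # True: dense 2x3 layout
print(is_row_major_contiguous([2, 2], [3, 1]))  # False: slice [:, :2] of a 2x3 layout
print(is_row_major_contiguous([3], [-1]))       # False: reversed view
```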

`is_host`

property is_host

Whether or not buffer is host-resident. Returns false for GPU buffers, true for CPU buffers.

from max import driver
from max.dtype import DType

cpu_buffer = driver.Buffer(shape=[2, 3], dtype=DType.bfloat16, device=driver.CPU())

print(cpu_buffer.is_host)

`item()`

item(self) -> Any

Returns the scalar value at a given location. Currently implemented only for zero-rank buffers. The return type is converted to a Python built-in type.

`mark_as_ready()`

mark_as_ready(self) -> None

Establishes a synchronization point for buffers with disabled auto-sync.

Caution

This is an experimental feature that may be unstable. It also requires special care from the user to ensure proper synchronization.

This method can only be called on buffers with disabled synchronization through disable_auto_sync().

It instructs MAX that, whenever it needs to wait on this buffer, it should wait only up to the point where this method was called.

It can be called multiple times, but it will override a previous synchronization point with the new one.

Refer to the disable_auto_sync() documentation for more details and examples.

`mmap()`

mmap(filename, dtype, shape, mode='copyonwrite', offset=0)

Parameters:

filename (PathLike[str] | str)
dtype (DType)
shape (ShapeType | int)
mode (np._MemMapModeKind)
offset (int)

Return type:

Buffer

`num_elements`

property num_elements

Returns the number of elements in this buffer.

Rank-0 buffers have 1 element by convention.
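
The rank-0 convention falls out of treating the element count as the product of the dimensions, since an empty product is 1. A minimal sketch in plain Python (the function names are illustrative and not part of max.driver):

```python
import math
from collections.abc import Sequence


def num_elements(shape: Sequence[int]) -> int:
    """Product of all dimensions; an empty (rank-0) shape yields 1."""
    return math.prod(shape)


def size_in_bytes(shape: Sequence[int], element_size: int) -> int:
    """Total payload size, given the per-element size in bytes."""
    return num_elements(shape) * element_size


print(num_elements([2, 3]))      # 6
print(num_elements([]))          # 1 (rank-0)
print(size_in_bytes([2, 3], 4))  # 24, e.g. a 2x3 buffer of 4-byte elements
```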

`pinned`

property pinned

Whether or not the underlying memory is pinned (page-locked).

`rank`

property rank

Buffer rank.

`scalar`

scalar = <nanobind.nb_func object>

`shape`

property shape

Shape of buffer.

`stream`

property stream

Stream to which the buffer is bound.

`to()`

to(self, device: max.driver.Device) -> max.driver.Buffer

to(self, stream: max.driver.DeviceStream) -> max.driver.Buffer

to(self, devices: collections.abc.Sequence[max.driver.Device]) -> list[max.driver.Buffer]

to(self, streams: collections.abc.Sequence[max.driver.DeviceStream]) -> list[max.driver.Buffer]

Overloaded function.

to(self, device: max.driver.Device) -> max.driver.Buffer

Return a buffer that's guaranteed to be on the given device.

The buffer is only copied if the requested device is different from the device upon which the buffer is already resident.

to(self, stream: max.driver.DeviceStream) -> max.driver.Buffer

Return a buffer that's guaranteed to be on the given device and associated with the given stream.

The buffer is only copied if the requested device is different from the device upon which the buffer is already resident. If the destination stream is on the same device, then a new reference to the same buffer is returned.

to(self, devices: collections.abc.Sequence[max.driver.Device]) -> list[max.driver.Buffer]

Return a list of buffers that are guaranteed to be on the given devices.

The buffers are only copied if the requested devices are different from the device upon which the buffer is already resident.

to(self, streams: collections.abc.Sequence[max.driver.DeviceStream]) -> list[max.driver.Buffer]

Return a list of buffers that are guaranteed to be on the given streams.

The buffers are only copied if the requested streams are different from the stream upon which the buffer is already resident.

`to_numpy()`

to_numpy()

Converts the buffer to a numpy array.

If the buffer is not on the host, a copy will be issued.

Parameters: self (Buffer)
Return type: ndarray[tuple[Any, ...], dtype[Any]]

`view()`

view(dtype, shape=None)

Return a new buffer with the given type and shape that shares the underlying memory.

If the shape is not given, it will be deduced if possible, or a ValueError is raised.

Parameters:

self (Buffer)
dtype (DType)
shape (Sequence[int] | None)

Return type:

Buffer
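
The source does not spell out how the shape is deduced. One plausible reading, assuming the same rule as NumPy's ndarray.view (only the innermost dimension is rescaled by the ratio of element sizes), is sketched below. The helper name `deduce_view_shape` and the rule itself are assumptions for illustration, not the library's documented algorithm.

```python
from collections.abc import Sequence


def deduce_view_shape(shape: Sequence[int], old_size: int, new_size: int) -> list[int]:
    """Deduce the shape after reinterpreting old_size-byte elements as new_size-byte ones.

    Assumption: only the innermost dimension is rescaled, as in NumPy's ndarray.view.
    """
    if old_size == new_size:
        return list(shape)
    if not shape:
        raise ValueError("cannot deduce a new shape for a rank-0 buffer")
    row_bytes = shape[-1] * old_size
    if row_bytes % new_size != 0:
        raise ValueError(
            f"innermost dimension of {row_bytes} bytes is not divisible by {new_size}"
        )
    return [*shape[:-1], row_bytes // new_size]


print(deduce_view_shape([2, 4], 4, 1))  # [2, 16]: 4-byte elements viewed as bytes
print(deduce_view_shape([2, 4], 4, 8))  # [2, 2]: 4-byte elements viewed as 8-byte ones
```

Under this rule, a shape of [2, 3] with 4-byte elements cannot be viewed as 8-byte elements, because each row holds 12 bytes; the sketch raises a ValueError in that case.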

`zeros`

zeros = <nanobind.nb_func object>

`CPU`

class max.driver.CPU(self, id: int = -1)

Creates a CPU device.

from max import driver
# Create default CPU device
device = driver.CPU()
# Device id is always 0 for CPU devices
device_id = device.id

Parameters: id (int, optional) - The device ID to use. Defaults to -1.
Returns: A new CPU device object.
Return type: CPU

`DLPackArray`

class max.driver.DLPackArray(*args, **kwargs)

`Device`

class max.driver.Device

`api`

property api

Returns the API used to program the device.

Possible values are:

cpu for host devices.
cuda for NVIDIA GPUs.
hip for AMD GPUs.

from max import driver

device = driver.CPU()
device.api

`architecture_name`

property architecture_name

Returns the architecture name of the device.

Examples of possible values:

gfx90a, gfx942 for AMD GPUs.
sm_80, sm_86 for NVIDIA GPUs.
CPU devices raise an exception.

from max import driver

device = driver.Accelerator()
device.architecture_name

`can_access()`

can_access(self, other: max.driver.Device) -> bool

Checks if this device can directly access memory of another device.

from max import driver

gpu0 = driver.Accelerator(id=0)
gpu1 = driver.Accelerator(id=1)

if gpu0.can_access(gpu1):
    print("GPU0 can directly access GPU1 memory.")

Parameters: other (Device) - The other device to check peer access against.
Returns: True if peer access is possible, False otherwise.
Return type: bool
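
With more than two devices, it can help to collect every ordered pair for which peer access is possible. The sketch below does this using only the documented can_access() method; the helper name `peer_access_pairs` is hypothetical.

```python
from itertools import permutations


def peer_access_pairs(devices):
    """Return (i, j) index pairs where devices[i] can directly access devices[j]."""
    return [
        (i, j)
        for (i, a), (j, b) in permutations(enumerate(devices), 2)
        if a.can_access(b)
    ]


# Intended usage (requires MAX and at least one accelerator):
#   from max import driver
#   gpus = [driver.Accelerator(id=i) for i in range(driver.accelerator_count())]
#   print(peer_access_pairs(gpus))
```

Peer access is checked in both directions because the source does not state that it is symmetric.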

`cpu`

cpu = <nanobind.nb_func object>

`default_stream`

property default_stream

Returns the default stream for this device.

The default stream is initialized when the device object is created.

Returns: The default execution stream for this device.
Return type: DeviceStream

`id`

property id

Returns a zero-based device id. For a CPU device this is always 0. For GPU accelerators this is the id of the device relative to this host. Along with the label, an id can uniquely identify a device, e.g. gpu:0, gpu:1.

from max import driver

device = driver.Accelerator()
device_id = device.id

Returns: The device ID.
Return type: int

`is_compatible`

property is_compatible

Returns whether this device is compatible with MAX.

Returns: True if the device is compatible with MAX, False otherwise.
Return type: bool

`is_host`

property is_host

Whether this device is the CPU (host) device.

from max import driver

device = driver.CPU()
device.is_host

`label`

property label

Returns device label.

Possible values are:

cpu for host devices.
gpu for accelerators.

from max import driver

device = driver.CPU()
device.label

`stats`

property stats

Returns utilization data for the device.

from max import driver

device = driver.CPU()
stats = device.stats

Returns: A dictionary containing device utilization statistics.
Return type: dict

`synchronize()`

synchronize(self) -> None

Ensures all operations on this device complete before returning.

Raises: ValueError - If any enqueued operations had an internal error.

`DeviceEvent`

class max.driver.DeviceEvent(self, device: max.driver.Device, enable_timing: bool = False)

Provides access to an event object.

An event can be used to wait for the GPU execution to reach a certain point on the given stream.

from max import driver
# Create a default accelerator device
device = driver.Accelerator()
# Create an event on the device
event = driver.DeviceEvent(device)
# Record an event on the device (default stream)
device.default_stream.record_event(event)
# Wait for execution on the default stream to reach the event
event.synchronize()

Creates an event for synchronization on the specified device.

Parameters:

device (Device) - The device on which to create the event.
enable_timing (bool) - If True, enable GPU timing on this event. Events created with enable_timing=True can be used with elapsed_time() to measure GPU execution time. Defaults to False.

Raises:

ValueError - If event creation failed.

from max import driver

device = driver.Accelerator()
event = driver.DeviceEvent(device)
timed_event = driver.DeviceEvent(device, enable_timing=True)

`elapsed_time()`

elapsed_time(self, end_event: max.driver.DeviceEvent) -> float

Returns the elapsed GPU time in milliseconds between this event and end_event.

Both events must have been created with enable_timing=True and recorded on a stream before calling this method. The end event must be synchronized before calling this method.

Parameters: end_event (DeviceEvent) - The ending event.
Returns: Elapsed time in milliseconds.
Return type: float
Raises: RuntimeError - If either event was not created with timing enabled, or if the events have not been recorded.

from max import driver

device = driver.Accelerator()
start = driver.DeviceEvent(device, enable_timing=True)
end = driver.DeviceEvent(device, enable_timing=True)

stream = device.default_stream
stream.record_event(start)
# ... GPU work ...
stream.record_event(end)
end.synchronize()

elapsed_ms = start.elapsed_time(end)

`is_ready()`

is_ready(self) -> bool

Returns whether this event is ready.

Returns: True if the event is complete, otherwise False.
Return type: bool
Raises: ValueError - If querying the event status returned an error.
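
Because is_ready() returns immediately, it can be used to poll for completion with a timeout instead of blocking in synchronize(). The following is a rough sketch of that pattern; the helper name `wait_until_ready` and its timeout parameters are hypothetical, and only the documented is_ready() call is assumed on the event.

```python
import time


def wait_until_ready(event, timeout_s: float = 1.0, poll_interval_s: float = 0.001) -> bool:
    """Poll event.is_ready() until it returns True or the timeout expires.

    Returns True if the event completed in time, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while not event.is_ready():
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)  # yield the CPU between polls
    return True


# Intended usage (requires MAX and an accelerator):
#   event = device.default_stream.record_event()
#   if not wait_until_ready(event, timeout_s=5.0):
#       print("GPU work did not finish within 5 seconds")
```

Polling trades some latency for the ability to do other host-side work or give up after a deadline; when neither is needed, synchronize() is the simpler choice.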

`synchronize()`

synchronize(self) -> None

Ensures all operations on this stream complete before returning.

Raises: ValueError - If any enqueued operations had an internal error.

`DeviceSpec`

class max.driver.DeviceSpec(id, device_type='cpu')

Specification for a device, containing its ID and type.

This class provides a way to specify device parameters like ID and type (CPU/GPU) for creating Device instances.

Parameters:

id (int)
device_type (Literal['cpu', 'gpu'])

`accelerator()`

static accelerator(id=0)

Creates an accelerator (GPU) device specification.

Parameters: id (int)

`cpu()`

static cpu(id=-1)

Creates a CPU device specification.

Parameters: id (int)

`device_type`

device_type: Literal['cpu', 'gpu'] = 'cpu'

Type of specified device.

`id`

id: int

Provided id for this device.

`DeviceStream`

class max.driver.DeviceStream(self, device: max.driver.Device)

Provides access to a stream of execution on a device.

A stream represents a sequence of operations that will be executed in order. Multiple streams on the same device can execute concurrently.

from max import driver
# Create a default accelerator device
device = driver.Accelerator()
# Get the default stream for the device
stream = device.default_stream
# Create a new stream of execution on the device
new_stream = driver.DeviceStream(device)

Creates a new stream of execution associated with the device.

Parameters: device (Device) - The device to create the stream on.
Returns: A new stream of execution.
Return type: DeviceStream

`device`

property device

The device this stream is executing on.

`record_event()`

record_event(self) -> max.driver.DeviceEvent

record_event(self, event: max.driver.DeviceEvent) -> None

Overloaded function.

record_event(self) -> max.driver.DeviceEvent

Records an event on this stream.

Returns:
DeviceEvent: A new event that will be signaled when all operations submitted to this stream before this call have completed.

Raises:
ValueError: If recording the event failed.

record_event(self, event: max.driver.DeviceEvent) -> None

Records an existing event on this stream.

Args:
event (DeviceEvent): The event to record on this stream.

Raises:
ValueError: If recording the event failed.

`synchronize()`

synchronize(self) -> None

Ensures all operations on this stream complete before returning.

Raises: ValueError - If any enqueued operations had an internal error.

`wait_for()`

wait_for(self, stream: max.driver.DeviceStream) -> None

wait_for(self, device: max.driver.Device) -> None

Overloaded function.

wait_for(self, stream: max.driver.DeviceStream) -> None

Ensures all operations on the other stream complete before future work submitted to this stream is scheduled.

Args:
stream (DeviceStream): The stream to wait for.

wait_for(self, device: max.driver.Device) -> None

Ensures all operations on the device's default stream complete before future work submitted to this stream is scheduled.

Args:
device (Device): The device whose default stream to wait for.

`accelerator_api()`

max.driver.accelerator_api()

Returns the API used to program the accelerator.

Return type: str

`accelerator_architecture_name()`

max.driver.accelerator_architecture_name()

Returns the architecture name of the accelerator device.

Return type: str

`calculate_virtual_device_count()`

max.driver.calculate_virtual_device_count(*device_spec_lists)

Calculate the minimum virtual device count needed for the given device specs.

Parameters: *device_spec_lists (list[DeviceSpec]) - One or more lists of DeviceSpec objects (e.g., main devices and draft devices).
Returns: The minimum number of virtual devices needed (max GPU ID + 1), or 1 if no GPUs.
Return type: int
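
The documented rule (max GPU ID + 1, or 1 if there are no GPUs) can be illustrated without MAX installed. In the sketch below, `Spec` is a hypothetical stand-in that mirrors only the `id` and `device_type` fields of DeviceSpec, and `virtual_device_count` is an illustrative reimplementation of the stated rule rather than the library's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    """Hypothetical stand-in mirroring DeviceSpec's documented fields."""
    id: int
    device_type: str = "cpu"


def virtual_device_count(*spec_lists: list[Spec]) -> int:
    """Return max GPU id + 1 across all lists, or 1 if no GPU specs are present."""
    gpu_ids = [
        spec.id
        for specs in spec_lists
        for spec in specs
        if spec.device_type == "gpu"
    ]
    return max(gpu_ids) + 1 if gpu_ids else 1


main = [Spec(0, "gpu"), Spec(2, "gpu")]
draft = [Spec(1, "gpu")]
print(virtual_device_count(main, draft))  # 3: highest GPU id is 2
print(virtual_device_count([Spec(-1)]))   # 1: CPU only
```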

`calculate_virtual_device_count_from_cli()`

max.driver.calculate_virtual_device_count_from_cli(*device_inputs)

Calculate virtual device count from raw CLI inputs (before parsing).

This helper works with the raw device input strings or lists before they're parsed into DeviceSpec objects. Used when virtual device mode needs to be enabled before device validation occurs.

Parameters: *device_inputs (str | list[int]) - One or more raw device inputs, either strings like "gpu:0,1,2" or lists of integers like [0, 1, 2].
Returns: The minimum number of virtual devices needed (max GPU ID + 1), or 1 if no GPUs.
Return type: int
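
As a rough illustration of working with the raw inputs, the sketch below applies the same "max GPU ID + 1" rule to the two input forms described above. The function name and the parsing details are assumptions: in particular, any string that is not of the form "gpu:<ids>" is treated as naming no GPU, which may differ from the library's actual handling.

```python
def virtual_device_count_from_raw(*device_inputs) -> int:
    """Return max GPU id + 1 over raw inputs, or 1 if none name a GPU.

    Assumed input forms: "gpu:0,1,2" strings or lists of integer GPU ids.
    Any other string (for example "cpu") is treated as naming no GPU.
    """
    gpu_ids = []
    for item in device_inputs:
        if isinstance(item, str):
            prefix, _, ids = item.partition(":")
            if prefix.strip().lower() == "gpu" and ids:
                gpu_ids.extend(int(part) for part in ids.split(",") if part.strip())
        else:
            # Lists are taken to contain GPU ids directly.
            gpu_ids.extend(int(i) for i in item)
    return max(gpu_ids) + 1 if gpu_ids else 1


print(virtual_device_count_from_raw("gpu:0,1,2"))  # 3
print(virtual_device_count_from_raw([0, 3]))       # 4
print(virtual_device_count_from_raw("cpu"))        # 1
```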

`devices_exist()`

max.driver.devices_exist(devices)

Checks whether the given devices exist.

Parameters: devices (list[DeviceSpec])
Return type: bool

`load_devices()`

max.driver.load_devices(device_specs)

Initialize and return a list of devices, given a list of device specs.

Parameters: device_specs (Sequence[DeviceSpec])
Return type: list[Device]
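
A common pattern is to check that the requested devices exist before loading them, and to fall back to the CPU otherwise. The sketch below combines the documented devices_exist(), load_devices(), and CPU() calls; the helper name `load_or_fallback` is hypothetical, and it takes the driver module as an argument purely so the logic can be exercised with a stub when MAX is not installed.

```python
def load_or_fallback(driver, specs):
    """Load the requested devices if they all exist, else fall back to the CPU."""
    if driver.devices_exist(specs):
        return driver.load_devices(specs)
    return [driver.CPU()]


# Intended usage (requires MAX to be installed):
#   from max import driver
#   devices = load_or_fallback(driver, [driver.DeviceSpec.accelerator(id=0)])
```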

`load_max_buffer()`

max.driver.load_max_buffer(path)

Experimental method for loading serialized MAX buffers.

MAX buffers can be exported by creating a graph and calling Value.print() with the BINARY_MAX_CHECKPOINT option.

Parameters: path (PathLike[str]) - Path to the buffer file (should end with .max).
Returns: A Buffer created from the path. The shape and dtype are read from the file.
Raises: ValueError - If the file format is not the MAX checkpoint format.
Return type: Buffer

`scan_available_devices()`

max.driver.scan_available_devices()

Returns all accelerators if any are available; otherwise returns the CPU.

Return type: list[DeviceSpec]

`accelerator_count()`

max.driver.accelerator_count() -> int

Returns the number of accelerator devices available.
