Usage Guide#

Abracudabra is a Python library designed to simplify conversions between arrays, dataframes, series, and tensors, seamlessly handling CPU (NumPy/Pandas/Torch) and CUDA (CuPy/cuDF/Torch) environments.

Supported Data Types and Libraries#

Data Object	CPU	CUDA
Array	`numpy.ndarray`	`cupy.ndarray`
Series	`pandas.Series`	`cudf.Series`
DataFrame	`pandas.DataFrame`	`cudf.DataFrame`
Index	`pandas.Index`	`cudf.Index`
Tensor	`torch.Tensor`	`torch.Tensor`

Device Management#

Abracudabra represents a device with the Device object, which takes two parameters:

type: the device type, either "cpu" or "cuda"
idx: an optional integer index, e.g., 0 for cuda:0

[1]:

from abracudabra import Device

cpu_device = Device(type="cpu")
cuda_device = Device(type="cuda", idx=0)

print(f"CPU device: {cpu_device}")
print(f"CUDA device: {cuda_device}")

CPU device: cpu
CUDA device: cuda:0
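
Later examples in this guide pass devices as strings such as "cuda:0". As a rough illustration of how such a string relates to the type and idx values above, here is a minimal standalone sketch. SimpleDevice and parse_device are hypothetical names used only for this example; they are not part of Abracudabra, and the library's own parsing may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SimpleDevice:
    """Hypothetical stand-in for a device with a type and an optional index."""

    type: str
    idx: Optional[int] = None

    def __str__(self) -> str:
        # Mirror the "cpu" / "cuda:0" string form printed above.
        return self.type if self.idx is None else f"{self.type}:{self.idx}"


def parse_device(text: str) -> SimpleDevice:
    """Split a string such as "cuda:0" into a type and an optional index."""
    device_type, _, index = text.partition(":")
    if device_type not in ("cpu", "cuda"):
        raise ValueError(f"unknown device type: {device_type!r}")
    return SimpleDevice(device_type, int(index) if index else None)


print(parse_device("cpu"))     # cpu
print(parse_device("cuda:0"))  # cuda:0
```

Making the dataclass frozen keeps the sketch's devices hashable and comparable, which is convenient when a device is used as a dictionary key.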

Conversion Functions#

Abracudabra provides high-level functions for data conversion:

Function	Converts From	Converts To
to_array	array, series, dataframe, tensor	array
to_tensor	array, series, dataframe, tensor	tensor
to_series	array, tensor	series
to_dataframe	array, tensor, mapping of arrays/tensors	dataframe
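
The table above can be read as a lookup from function name to accepted input kinds. The snippet below encodes it as a plain dictionary so that you can check which function fits a given input. ACCEPTED_INPUTS and functions_accepting are illustrative names, not part of Abracudabra, and "mapping" is shorthand here for a mapping of arrays/tensors.

```python
from typing import List

# Accepted input kinds per conversion function, copied from the table above.
ACCEPTED_INPUTS = {
    "to_array": {"array", "series", "dataframe", "tensor"},
    "to_tensor": {"array", "series", "dataframe", "tensor"},
    "to_series": {"array", "tensor"},
    "to_dataframe": {"array", "tensor", "mapping"},
}


def functions_accepting(kind: str) -> List[str]:
    """Return the conversion functions whose listed inputs include ``kind``."""
    return sorted(name for name, kinds in ACCEPTED_INPUTS.items() if kind in kinds)


print(functions_accepting("series"))   # ['to_array', 'to_tensor']
print(functions_accepting("mapping"))  # ['to_dataframe']
```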

All functions accept an optional device parameter:

If specified, output data is moved to that device.
If not specified, data stays on its original device.

Example Usage#

Convert a torch tensor to an array:

[2]:

import torch

from abracudabra import to_array

tensor = torch.rand(2, 3, device="cuda:0")
array = to_array(tensor)

print("type:", type(array))

type: <class 'cupy.ndarray'>

Convert an array to a series, moving it to a CUDA device:

[3]:

import numpy as np

from abracudabra import to_series

array = np.ones((4,), dtype=np.float32)

series = to_series(array, device="cuda:0")
print(series)
print("type:", type(series))

0    1.0
1    1.0
2    1.0
3    1.0
dtype: float32
type: <class 'cudf.core.series.Series'>

Build a dataframe from mixed data types and devices:

[4]:

import cupy as cp
import numpy as np
import torch

from abracudabra import to_dataframe

numpy_array = np.full((5,), 1, dtype=np.float32)
cupy_array = cp.full((5,), 2, dtype=cp.int8)
torch_tensor = torch.full((5,), 3, dtype=torch.float32, device="cuda:0")

dataframe = to_dataframe(
    {"numpy": numpy_array, "cupy": cupy_array, "torch": torch_tensor}, device="cuda:0"
)

print(dataframe)
print("type:", type(dataframe))

   numpy  cupy  torch
0    1.0     2    3.0
1    1.0     2    3.0
2    1.0     2    3.0
3    1.0     2    3.0
4    1.0     2    3.0
type: <class 'cudf.core.dataframe.DataFrame'>

Device Management#

Check the Device of an Object#

Use get_device to determine the current device of an object:

[5]:

import numpy as np

from abracudabra import get_device

numpy_array = np.ones((4,), dtype=np.float32)

get_device(numpy_array)  # CPU device

[5]:

Device(type='cpu', idx=None)

Move Data Between Devices#

Use to_device to move data between devices:

[6]:

import numpy as np

from abracudabra import to_device

numpy_array = np.ones((4,), dtype=np.float32)  # Starts on the CPU

cupy_array = to_device(numpy_array, device="cuda:0")  # Move to CUDA device

print(cupy_array)
print("type:", type(cupy_array))

[1. 1. 1. 1.]
type: <class 'cupy.ndarray'>

Library Selection Helpers#

Use helper functions to obtain the correct library based on the target device:

get_np_or_cp returns numpy (for "cpu") or cupy (for "cuda").
get_pd_or_cudf returns pandas (for "cpu") or cudf (for "cuda").

These helpers are particularly useful because each pair of libraries intentionally shares a largely common API, so the same code can run on either device.

Example:

[7]:

from abracudabra import get_np_or_cp

device_type = "cuda"

# Get numpy or cupy (here: cupy)
np_or_cp = get_np_or_cp(device_type)

# Create a numpy or cupy array (here: cupy)
array = np_or_cp.ones((4,), dtype=np_or_cp.float32)
print(type(array))

<class 'cupy.ndarray'>
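
To illustrate the idea behind such a helper, here is a standalone sketch of device-based module selection. module_name_for and load_array_module are hypothetical names, and this is not necessarily how Abracudabra implements get_np_or_cp. The import is deferred so that CuPy is only required when a CUDA device is actually requested.

```python
import importlib

# Assumed mapping for this sketch: device type -> array library name.
ARRAY_MODULES = {"cpu": "numpy", "cuda": "cupy"}


def module_name_for(device_type):
    """Return the name of the array library for a device type."""
    try:
        return ARRAY_MODULES[device_type]
    except KeyError:
        raise ValueError(f"unsupported device type: {device_type!r}") from None


def load_array_module(device_type):
    """Import the array library lazily, only when it is needed."""
    return importlib.import_module(module_name_for(device_type))


print(module_name_for("cpu"))   # numpy
print(module_name_for("cuda"))  # cupy
```

With this sketch, load_array_module("cpu").ones((4,)) would produce a NumPy array and the same call with "cuda" a CuPy array, provided the corresponding library is installed.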
