Usage Guide#
Abracudabra is a Python library designed to simplify conversions between arrays, dataframes, series, and tensors, seamlessly handling CPU (NumPy/Pandas/Torch) and CUDA (CuPy/cuDF/Torch) environments.
Supported Data Types and Libraries#
Data Object | CPU | CUDA
---|---|---
Array | NumPy | CuPy
Series | Pandas | cuDF
DataFrame | Pandas | cuDF
Index | Pandas | cuDF
Tensor | Torch | Torch
Device Management#
Abracudabra manages devices through the Device object:
- type: "cpu" or "cuda"
- idx: optional integer, e.g., 0 for cuda:0
[1]:
from abracudabra import Device
cpu_device = Device(type="cpu")
cuda_device = Device(type="cuda", idx=0)
print(f"CPU device: {cpu_device}")
print(f"CUDA device: {cuda_device}")
CPU device: cpu
CUDA device: cuda:0
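The printed forms above (cpu, cuda:0) suggest how a device string relates to the type and idx fields. The following is a minimal sketch of that relationship using only the standard library. DeviceSpec and parse_device are hypothetical names introduced for illustration; they are not part of Abracudabra and do not describe how its Device class is implemented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DeviceSpec:
    """Hypothetical stand-in for illustration; not Abracudabra's Device class."""

    type: str
    idx: Optional[int] = None

    def __str__(self) -> str:
        # "cpu" has no index; "cuda:0" carries one after the colon.
        return self.type if self.idx is None else f"{self.type}:{self.idx}"


def parse_device(text: str) -> DeviceSpec:
    """Split a string such as "cuda:0" into a device type and an optional index."""
    device_type, _, index = text.partition(":")
    if device_type not in ("cpu", "cuda"):
        raise ValueError(f"unknown device type: {device_type!r}")
    return DeviceSpec(device_type, int(index) if index else None)


print(parse_device("cpu"))
print(parse_device("cuda:0"))
```

The sketch round-trips the two string forms shown in the output above; any other device type raises a ValueError.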
Conversion Functions#
Abracudabra provides high-level functions for data conversion:
Function | Converts From | Converts To
---|---|---
to_array | array, series, dataframe, tensor | array
to_tensor | array, series, dataframe, tensor | tensor
to_series | array, tensor | series
to_dataframe | array, tensor, mapping of arrays/tensors | dataframe
All functions accept an optional device parameter:
- If specified, the output data is moved to that device.
- If not specified, the data stays on its original device.
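The rule for the device parameter can be written down as a small decision function. This is only a sketch of the documented behavior, using plain strings for devices; resolve_output_device is a hypothetical helper, not an Abracudabra function.

```python
from typing import Optional


def resolve_output_device(current: str, requested: Optional[str] = None) -> str:
    """Return the device the output should live on (hypothetical helper)."""
    # No device requested: the data stays where it already is.
    return current if requested is None else requested


cases = [("cpu", None), ("cpu", "cuda:0"), ("cuda:0", "cpu")]
for current, requested in cases:
    target = resolve_output_device(current, requested)
    print(f"{current} -> {target} (transfer needed: {target != current})")
```

Only the second and third cases imply a transfer between devices; the first leaves the data in place.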
Example Usage#
Convert a torch tensor to an array:
[2]:
import torch
from abracudabra import to_array
tensor = torch.rand(2, 3, device="cuda:0")
array = to_array(tensor)
print("type:", type(array))
type: <class 'cupy.ndarray'>
Convert an array to a series:
[3]:
import numpy as np
from abracudabra import to_series
array = np.ones((4,), dtype=np.float32)
series = to_series(array, device="cuda:0")
print(series)
print("type:", type(series))
0 1.0
1 1.0
2 1.0
3 1.0
dtype: float32
type: <class 'cudf.core.series.Series'>
Build a dataframe from mixed data types and devices:
[4]:
import cupy as cp
import numpy as np
import torch
from abracudabra import to_dataframe
numpy_array = np.full((5,), 1, dtype=np.float32)
cupy_array = cp.full((5,), 2, dtype=cp.int8)
torch_tensor = torch.full((5,), 3, dtype=torch.float32, device="cuda:0")
dataframe = to_dataframe(
    {"numpy": numpy_array, "cupy": cupy_array, "torch": torch_tensor}, device="cuda:0"
)
print(dataframe)
print("type:", type(dataframe))
numpy cupy torch
0 1.0 2 3.0
1 1.0 2 3.0
2 1.0 2 3.0
3 1.0 2 3.0
4 1.0 2 3.0
type: <class 'cudf.core.dataframe.DataFrame'>
Device Management#
Check the Device of an Object#
Use get_device to determine the current device of an object:
[5]:
import numpy as np
from abracudabra import get_device
numpy_array = np.ones((4,), dtype=np.float32)
get_device(numpy_array) # CPU device
[5]:
Device(type='cpu', idx=None)
Move Data Between Devices#
Use to_device to move data between devices:
[6]:
import numpy as np
from abracudabra import to_device
numpy_array = np.ones((4,), dtype=np.float32)
cupy_array = to_device(numpy_array, device="cuda:0") # Move to CUDA device
print(cupy_array)
print("type:", type(cupy_array))
[1. 1. 1. 1.]
type: <class 'cupy.ndarray'>
Library Selection Helpers#
Use helper functions to obtain the correct library based on the target device:
- get_np_or_cp returns numpy (for "cpu") or cupy (for "cuda").
- get_pd_or_cudf returns pandas (for "cpu") or cudf (for "cuda").
These helpers are particularly useful because the libraries in each pair intentionally share a largely common API, so the same code can often run on either device.
Example:
[7]:
from abracudabra import get_np_or_cp
device_type = "cuda"
# Get numpy or cupy (here: cupy)
np_or_cp = get_np_or_cp(device_type)
# Create a numpy or cupy array (here: cupy)
array = np_or_cp.ones((4,), dtype=np_or_cp.float32)
print(type(array))
<class 'cupy.ndarray'>
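Because the paired libraries share an API, a function can take the array module as an argument and stay device-agnostic. The sketch below shows the pattern with NumPy only, so it runs without a GPU. The standardize function and its xp parameter are illustrative names, not part of Abracudabra; on a CUDA machine one would presumably pass the module returned by get_np_or_cp("cuda") instead.

```python
import numpy as np


def standardize(values, xp=np):
    """Shift values to zero mean and scale them to unit standard deviation.

    `xp` is the array module to use: numpy here, or cupy for CUDA data
    (an assumption based on the two libraries sharing these calls).
    """
    values = xp.asarray(values, dtype=xp.float32)
    return (values - values.mean()) / values.std()


result = standardize([1.0, 2.0, 3.0, 4.0])
print(type(result))
```

Passing the module explicitly keeps the function free of device checks; the caller decides once which library to use.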