Title: | Interface to the 'nanoarrow' 'C' Library |
---|---|
Description: | Provides an 'R' interface to the 'nanoarrow' 'C' library and the 'Apache Arrow' application binary interface. Functions to import and export 'ArrowArray', 'ArrowSchema', and 'ArrowArrayStream' 'C' structures to and from 'R' objects are provided alongside helpers to facilitate zero-copy data transfer among 'R' bindings to libraries implementing the 'Arrow' 'C' data interface. |
Authors: | Dewey Dunnington [aut, cre], Apache Arrow [aut, cph], Apache Software Foundation [cph] |
Maintainer: | Dewey Dunnington <[email protected]> |
License: | Apache License (>= 2) |
Version: | 0.6.0 |
Built: | 2024-11-12 03:09:49 UTC |
Source: | https://github.com/apache/arrow-nanoarrow |
In some cases, R functions that return a nanoarrow_array_stream
may require that the scope of some other object outlive that of the array
stream. If there is a need for that object to be released deterministically
(e.g., to close open files), you can register a function to run after the
stream's release callback is invoked from the R thread. Note that this
finalizer will not be run if the stream's release callback is invoked
from a non-R thread. In this case, the finalizer and its chain of
environments will be garbage-collected when nanoarrow:::preserved_empty()
is run.
array_stream_set_finalizer(array_stream, finalizer)
array_stream |
|
finalizer |
A function that will be called with zero arguments. |
A newly allocated array_stream
whose release callback will call
the supplied finalizer.
stream <- array_stream_set_finalizer(
  basic_array_stream(list(1:5)),
  function() message("All done!")
)
stream$release()
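The finalizer is typically used to release a resource that the stream depends on. The sketch below uses a hypothetical `state` environment as a stand-in for such a resource (for example, an open file) and assumes the release callback is invoked from the R thread, as described above.

```r
library(nanoarrow)

# Hypothetical stand-in for a resource that must be closed deterministically
state <- new.env()
state$closed <- FALSE

stream <- array_stream_set_finalizer(
  basic_array_stream(list(1:5)),
  function() {
    # Runs after the stream's release callback is invoked from R
    state$closed <- TRUE
  }
)

batch <- stream$get_next()
stream$release()
state$closed
```

Recording the state in an environment (rather than printing a message) makes it straightforward to check that the cleanup actually ran.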
In nanoarrow an 'array' refers to the struct ArrowArray definition in the Arrow C data interface. At the R level, we attach a schema such that functionally the nanoarrow_array class can be used in a similar way as an arrow::Array. Note that in nanoarrow an arrow::RecordBatch and a non-nullable arrow::StructArray are represented identically.
as_nanoarrow_array(x, ..., schema = NULL)
x |
An object to convert to an array |
... |
Passed to S3 methods |
schema |
An optional schema used to enforce conversion to a particular
type. Defaults to |
An object of class 'nanoarrow_array'
(array <- as_nanoarrow_array(1:5))
as.vector(array)

(array <- as_nanoarrow_array(data.frame(x = 1:5)))
as.data.frame(array)
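Because a record batch and a non-nullable struct array are represented identically, a data frame converts to a struct array whose children are the columns. A minimal sketch of inspecting such an array (the variable name `batch` is only illustrative):

```r
library(nanoarrow)

batch <- as_nanoarrow_array(data.frame(x = 1:5, y = letters[1:5]))

# The number of rows is the length of the struct array
batch$length

# The attached schema uses the struct format string from the C data interface
infer_nanoarrow_schema(batch)$format

# Each column is a child array that can be converted on its own
names(batch$children)
as.vector(batch$children$x)
```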
In nanoarrow, an 'array stream' corresponds to the struct ArrowArrayStream
as defined in the Arrow C Stream interface. This object is used to represent
a stream of arrays with a common
schema. This is similar to an
arrow::RecordBatchReader except it can be used to represent a stream of
any type (not just record batches). Note that a stream of record batches
and a stream of non-nullable struct arrays are represented identically.
Also note that array streams are mutable objects and are passed by
reference and not by value.
as_nanoarrow_array_stream(x, ..., schema = NULL)
x |
An object to convert to an array_stream |
... |
Passed to S3 methods |
schema |
An optional schema used to enforce conversion to a particular
type. Defaults to |
An object of class 'nanoarrow_array_stream'
(stream <- as_nanoarrow_array_stream(data.frame(x = 1:5)))
stream$get_schema()
stream$get_next()

# The last batch is returned as NULL
stream$get_next()

# Release the stream
stream$release()
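Since array streams are mutable and passed by reference, two R variables that point at the same stream share its position. A small sketch of what that means in practice (`alias` is just a second name for the same object):

```r
library(nanoarrow)

stream <- basic_array_stream(list(1:2, 3:4))
alias <- stream  # no copy is made: both names refer to the same stream

first <- as.vector(alias$get_next())
second <- as.vector(stream$get_next())  # continues where alias left off

first
second

# Both batches have been consumed, so the stream is now exhausted
stream$get_next()
```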
Convert an object to a nanoarrow buffer
as_nanoarrow_buffer(x, ...)
x |
An object to convert to a buffer |
... |
Passed to S3 methods |
An object of class 'nanoarrow_buffer'
array <- as_nanoarrow_array(c(NA, 1:4))
array$buffers
as.raw(array$buffers[[1]])
as.raw(array$buffers[[2]])
convert_buffer(array$buffers[[1]])
convert_buffer(array$buffers[[2]])
In nanoarrow a 'schema' refers to a struct ArrowSchema as defined in the Arrow C Data interface. This data structure can be used to represent an arrow::schema(), an arrow::field(), or an arrow::DataType. Note that in nanoarrow, an arrow::schema() and a non-nullable arrow::struct() are represented identically.
as_nanoarrow_schema(x, ...)
infer_nanoarrow_schema(x, ...)
nanoarrow_schema_parse(x, recursive = FALSE)
nanoarrow_schema_modify(x, new_values, validate = TRUE)
x |
An object to convert to a schema |
... |
Passed to S3 methods |
recursive |
Use |
new_values |
New schema component to assign |
validate |
Use |
An object of class 'nanoarrow_schema'
infer_nanoarrow_schema(integer())
infer_nanoarrow_schema(data.frame(x = integer()))
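The functions above can be combined to inspect and adjust a schema. The following is a sketch rather than an exhaustive tour; it assumes the parsed result exposes a `type` element and that `name` is one of the components accepted by nanoarrow_schema_modify().

```r
library(nanoarrow)

schema <- infer_nanoarrow_schema(data.frame(x = integer()))

# A data frame is represented as a struct with one child per column
schema$format
schema$children$x$format

# Parse a schema into a list of human-readable components
parsed <- nanoarrow_schema_parse(schema$children$x)
parsed$type

# Modify a copy of the child schema; the original child is left untouched
renamed <- nanoarrow_schema_modify(schema$children$x, list(name = "y"))
renamed$name
schema$children$x$name
```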
This experimental vctr class allows zero or more Arrow arrays to present as an R vector without converting them. This is useful for arrays with types that do not have a non-lossy R equivalent, and helps provide an intermediary object type where the default conversion is prohibitively expensive (e.g., a nested list of data frames). These objects will not survive many vctr transformations; however, they can be sliced without copying the underlying arrays.
as_nanoarrow_vctr(x, ..., schema = NULL, subclass = character())
nanoarrow_vctr(schema = NULL, subclass = character())
x |
An object that works with |
... |
Passed to |
schema |
An optional |
subclass |
An optional subclass of nanoarrow_vctr to prepend to the final class name. |
The nanoarrow_vctr is currently implemented similarly to factor(): its storage type is an integer() that is a sequence along the total length of the vctr and there are attributes that are required to resolve these indices to an array + offset. Sequences typically have a very compact representation in recent versions of R such that this has a cheap storage footprint even for large arrays. The attributes are currently:
schema: The nanoarrow_schema shared by each chunk.
chunks: A list() of nanoarrow_array.
offsets: An integer() vector beginning with 0 and followed by the cumulative length of each chunk. This allows the chunk index + offset to be resolved from a logical index with log(n) complexity.
This implementation is preliminary and may change; however, the result of as_nanoarrow_array_stream(some_vctr[begin:end]) should remain stable.
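To illustrate how a cumulative offsets vector lets a logical index be resolved to a chunk index + offset, here is a base R sketch. It is not the package's implementation: `chunk_lengths` and `resolve_index()` are hypothetical names, and findInterval() is used here only because it performs a binary search over a sorted vector.

```r
# Hypothetical chunk lengths for a vctr made of three arrays
chunk_lengths <- c(3L, 2L, 4L)

# Begins with 0, followed by the cumulative length of each chunk
offsets <- c(0L, cumsum(chunk_lengths))

# Resolve a 1-based logical index to a chunk index and a 0-based
# offset within that chunk
resolve_index <- function(i, offsets) {
  chunk <- findInterval(i - 1L, offsets)
  list(chunk = chunk, offset = i - 1L - offsets[chunk])
}

resolve_index(1L, offsets)
resolve_index(4L, offsets)
resolve_index(9L, offsets)
```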
A vctr of class 'nanoarrow_vctr'
array <- as_nanoarrow_array(1:5)
as_nanoarrow_vctr(array)
Create ArrayStreams from batches
basic_array_stream(batches, schema = NULL, validate = TRUE)
batches |
A |
schema |
A nanoarrow_schema or |
validate |
Use |
(stream <- basic_array_stream(list(data.frame(a = 1, b = 2))))
as.data.frame(stream$get_next())
stream$get_next()
Converts array to the type specified by to. This is a low-level interface; most users should use as.data.frame() or as.vector() unless finer-grained control is needed over the conversion. This function is an S3 generic dispatching on to: developers may implement their own S3 methods for custom vector types.
convert_array(array, to = NULL, ...)
array |
|
to |
A target prototype object describing the type to which |
... |
Passed to S3 methods |
Note that unregistered extension types will by default issue a warning. Use options(nanoarrow.warn_unregistered_extension = FALSE) to disable this behaviour.
Conversions are implemented for the following R vector types:
logical(): Any numeric type can be converted to logical() in addition to the bool type. For numeric types, any non-zero value is considered TRUE.
integer(): Any numeric type can be converted to integer(); however, a warning will be signaled if any value is outside the range of the 32-bit integer.
double(): Any numeric type can be converted to double(). This conversion currently does not warn for values that may not roundtrip through a floating-point double (e.g., very large uint64 and int64 values).
character(): String and large string types can be converted to character(). The conversion does not check for valid UTF-8: if you need finer-grained control over encodings, use to = blob::blob().
factor(): Dictionary-encoded arrays of strings can be converted to factor(); however, this must be specified explicitly (i.e., convert_array(array, factor())) because arrays arriving in chunks can have dictionaries that contain different levels. Use convert_array(array, factor(levels = c(...))) to materialize an array into a vector with known levels.
Date: Only the date32 type can be converted to an R Date vector.
hms::hms(): Time32 and time64 types can be converted to hms::hms().
difftime(): Time32, time64, and duration types can be converted to R difftime() vectors. The value is converted to match the units() attribute of to.
blob::blob(): String, large string, binary, and large binary types can be converted to blob::blob().
vctrs::list_of(): List, large list, and fixed-size list types can be converted to vctrs::list_of().
data.frame(): Struct types can be converted to data.frame().
vctrs::unspecified(): Any type can be converted to vctrs::unspecified(); however, a warning will be raised if any non-null values are encountered.
In addition to the above conversions, a null array may be converted to any target prototype except data.frame(). Extension arrays are currently converted as their storage type.
An R vector of type to.
array <- as_nanoarrow_array(data.frame(x = 1:5))
str(convert_array(array))
str(convert_array(array, to = data.frame(x = double())))
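As a further sketch of the conversion rules listed above, an int32 array can be materialized as several different R vector types by passing an explicit prototype as `to` (the variable names here are only illustrative):

```r
library(nanoarrow)

ints <- as_nanoarrow_array(c(1L, 0L, NA))

# Numeric types convert to logical(): non-zero values are TRUE
as_lgl <- convert_array(ints, to = logical())

# Numeric types convert to double()
as_dbl <- convert_array(ints, to = double())

# String arrays convert to character(); nulls become NA
strs <- as_nanoarrow_array(c("a", NA))
as_chr <- convert_array(strs, to = character())

as_lgl
as_dbl
as_chr
```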
Converts array_stream to the type specified by to. This is a low-level interface; most users should use as.data.frame() or as.vector() unless finer-grained control is needed over the conversion. See convert_array() for details of the conversion process; see infer_nanoarrow_ptype() for default inferences of to.
convert_array_stream(array_stream, to = NULL, size = NULL, n = Inf)
collect_array_stream(array_stream, n = Inf, schema = NULL, validate = TRUE)
array_stream |
|
to |
A target prototype object describing the type to which |
size |
The exact size of the output, if known. If specified, a slightly more efficient implementation may be used to collect the output. |
n |
The maximum number of batches to pull from the array stream. |
schema |
A nanoarrow_schema or |
validate |
Use |
convert_array_stream(): An R vector of type to.
collect_array_stream(): A list() of nanoarrow_array
stream <- as_nanoarrow_array_stream(data.frame(x = 1:5))
str(convert_array_stream(stream))
str(convert_array_stream(stream, to = data.frame(x = double())))

stream <- as_nanoarrow_array_stream(data.frame(x = 1:5))
collect_array_stream(stream)
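The `n` argument limits how many batches are pulled, which makes it possible to consume a stream in pieces. A minimal sketch, relying on the fact that array streams are mutable so the second call continues where the first stopped:

```r
library(nanoarrow)

stream <- basic_array_stream(
  list(
    data.frame(x = 1:2),
    data.frame(x = 3:4),
    data.frame(x = 5:6)
  )
)

# Pull at most two batches as a list() of nanoarrow_array
first_two <- collect_array_stream(stream, n = 2)
length(first_two)

# Convert whatever is left in the stream to a data.frame
rest <- convert_array_stream(stream)
rest
```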
Resolves the default to value to use in convert_array() and convert_array_stream(). The default conversions are:
infer_nanoarrow_ptype(x)
x |
A nanoarrow_schema, nanoarrow_array, or nanoarrow_array_stream. |
null to vctrs::unspecified()
boolean to logical()
int8, uint8, int16, uint16, and int32 to integer()
uint32, int64, uint64, float, and double to double()
string and large string to character()
struct to data.frame()
binary and large binary to blob::blob()
list, large_list, and fixed_size_list to vctrs::list_of()
time32 and time64 to hms::hms()
duration to difftime()
date32 to as.Date()
timestamp to as.POSIXct()
Additional conversions are possible by specifying an explicit value for to. For details of each conversion, see convert_array().
An R vector of zero size describing the target into which the array should be materialized.
infer_nanoarrow_ptype(as_nanoarrow_array(1:10))
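Since `x` may also be a nanoarrow_schema, the default conversions listed above can be checked without creating any data. A short sketch using the type constructors:

```r
library(nanoarrow)

# 64-bit integers do not fit in an R integer(), so double() is inferred
ptype_int64 <- infer_nanoarrow_ptype(na_int64())

ptype_string <- infer_nanoarrow_ptype(na_string())
ptype_bool <- infer_nanoarrow_ptype(na_bool())

# Struct types are inferred as a zero-row data.frame()
ptype_struct <- infer_nanoarrow_ptype(na_struct(list(a = na_int32())))

str(ptype_int64)
str(ptype_string)
str(ptype_bool)
str(ptype_struct)
```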
Implement Arrow extension types
infer_nanoarrow_ptype_extension(
  extension_spec,
  x,
  ...,
  warn_unregistered = TRUE
)
convert_array_extension(
  extension_spec,
  array,
  to,
  ...,
  warn_unregistered = TRUE
)
as_nanoarrow_array_extension(extension_spec, x, ..., schema = NULL)
extension_spec |
An extension specification inheriting from 'nanoarrow_extension_spec'. |
x , array , to , schema , ...
|
Passed from |
warn_unregistered |
Use |
infer_nanoarrow_ptype_extension(): The R vector prototype to be used as the default conversion target.
convert_array_extension(): An R vector of type to.
as_nanoarrow_array_extension(): A nanoarrow_array of type schema.
In nanoarrow, types, fields, and schemas are all represented by a nanoarrow_schema. These functions are convenience constructors to create these objects in a readable way. Use na_type() to construct types based on the constructor name, which is also the name that prints/is returned by nanoarrow_schema_parse().
na_type(
  type_name,
  byte_width = NULL,
  unit = NULL,
  timezone = NULL,
  column_types = NULL,
  item_type = NULL,
  key_type = NULL,
  value_type = NULL,
  index_type = NULL,
  ordered = NULL,
  list_size = NULL,
  keys_sorted = NULL,
  storage_type = NULL,
  extension_name = NULL,
  extension_metadata = NULL,
  nullable = NULL
)
na_na(nullable = TRUE)
na_bool(nullable = TRUE)
na_int8(nullable = TRUE)
na_uint8(nullable = TRUE)
na_int16(nullable = TRUE)
na_uint16(nullable = TRUE)
na_int32(nullable = TRUE)
na_uint32(nullable = TRUE)
na_int64(nullable = TRUE)
na_uint64(nullable = TRUE)
na_half_float(nullable = TRUE)
na_float(nullable = TRUE)
na_double(nullable = TRUE)
na_string(nullable = TRUE)
na_large_string(nullable = TRUE)
na_string_view(nullable = TRUE)
na_binary(nullable = TRUE)
na_large_binary(nullable = TRUE)
na_fixed_size_binary(byte_width, nullable = TRUE)
na_binary_view(nullable = TRUE)
na_date32(nullable = TRUE)
na_date64(nullable = TRUE)
na_time32(unit = c("ms", "s"), nullable = TRUE)
na_time64(unit = c("us", "ns"), nullable = TRUE)
na_duration(unit = c("ms", "s", "us", "ns"), nullable = TRUE)
na_interval_months(nullable = TRUE)
na_interval_day_time(nullable = TRUE)
na_interval_month_day_nano(nullable = TRUE)
na_timestamp(unit = c("us", "ns", "s", "ms"), timezone = "", nullable = TRUE)
na_decimal128(precision, scale, nullable = TRUE)
na_decimal256(precision, scale, nullable = TRUE)
na_struct(column_types = list(), nullable = FALSE)
na_sparse_union(column_types = list())
na_dense_union(column_types = list())
na_list(item_type, nullable = TRUE)
na_large_list(item_type, nullable = TRUE)
na_fixed_size_list(item_type, list_size, nullable = TRUE)
na_map(key_type, item_type, keys_sorted = FALSE, nullable = TRUE)
na_dictionary(value_type, index_type = na_int32(), ordered = FALSE)
na_extension(storage_type, extension_name, extension_metadata = "")
type_name |
The name of the type (e.g., "int32"). This form of the constructor is useful for writing tests that loop over many types. |
byte_width |
For |
unit |
One of 's' (seconds), 'ms' (milliseconds), 'us' (microseconds), or 'ns' (nanoseconds). |
timezone |
A string representing a timezone name. The empty string "" represents a naive point in time (i.e., one that has no associated timezone). |
column_types |
A |
item_type |
For |
key_type |
The nanoarrow_schema representing the
|
value_type |
The nanoarrow_schema representing the
|
index_type |
The nanoarrow_schema representing the
|
ordered |
Use |
list_size |
The number of elements in each item in a
|
keys_sorted |
Use |
storage_type |
For |
extension_name |
For |
extension_metadata |
A string or raw vector defining extension metadata. Most Arrow extension types define extension metadata as a JSON object. |
nullable |
Use |
precision |
The total number of digits representable by the decimal type |
scale |
The number of digits after the decimal point in a decimal type |
na_int32()
na_struct(list(col1 = na_int32()))
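As noted for `type_name`, na_type() is convenient when looping over many types. The sketch below collects the format string of several types by name and shows a few parameterized constructors; the format strings are those defined by the Arrow C data interface.

```r
library(nanoarrow)

# Construct types by name in a loop and collect their format strings
type_names <- c("int8", "int16", "int32", "int64")
formats <- vapply(
  type_names,
  function(nm) na_type(nm)$format,
  character(1)
)
formats

# Parameterized types encode their parameters in the format string
ts <- na_timestamp(unit = "ms", timezone = "UTC")
ts$format

fsb <- na_fixed_size_binary(4)
fsb$format

# Nested types carry their item or column types as children
lst <- na_list(na_int32())
lst$format
```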
The Arrow format provides a rich type system that can handle most R vector types; however, many R vector types do not roundtrip perfectly through Arrow memory. The vctrs extension type uses vctrs::vec_data(), vctrs::vec_restore(), and vctrs::vec_ptype() in calls to as_nanoarrow_array() and convert_array() to ensure roundtrip fidelity.
na_vctrs(ptype, storage_type = NULL)
ptype |
A vctrs prototype as returned by |
storage_type |
For |
vctr <- as.POSIXlt("2000-01-02 03:45", tz = "UTC")
array <- as_nanoarrow_array(vctr, schema = na_vctrs(vctr))
infer_nanoarrow_ptype(array)
convert_array(array)
Create a new array, or modify one or more parameters of an existing array. When importing an array from elsewhere, nanoarrow_array_set_schema() is useful to attach the data type information to the array (without this information there is little that nanoarrow can do with the array since its content cannot be otherwise interpreted). nanoarrow_array_modify() can create a shallow copy and modify various parameters to create a new array, including setting children and buffers recursively. These functions power the $<- operator, which can modify one parameter at a time.
nanoarrow_array_init(schema)
nanoarrow_array_set_schema(array, schema, validate = TRUE)
nanoarrow_array_modify(array, new_values, validate = TRUE)
schema |
A nanoarrow_schema to attach to this
|
array |
|
validate |
Use |
new_values |
A named |
nanoarrow_array_init() returns a possibly invalid but initialized array with a given schema.
nanoarrow_array_set_schema() returns array, invisibly. Note that array is modified in place by reference.
nanoarrow_array_modify() returns a shallow copy of array with the modified parameters such that the original array remains valid.
nanoarrow_array_init(na_string())

# Modify an array using $ and <-
array <- as_nanoarrow_array(1:5)
array$length <- 4
as.vector(array)

# Modify potentially more than one component at a time
array <- as_nanoarrow_array(1:5)
as.vector(nanoarrow_array_modify(array, list(length = 4)))

# Attach a schema to an array
array <- as_nanoarrow_array(-1L)
nanoarrow_array_set_schema(array, na_uint32())
as.vector(array)
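The difference between the two return values described above is worth checking explicitly: nanoarrow_array_modify() leaves its input intact. A small sketch (the names `original` and `shorter` are only illustrative):

```r
library(nanoarrow)

original <- as_nanoarrow_array(1:5)

# A shallow copy with a shorter length; buffers are shared, not copied
shorter <- nanoarrow_array_modify(original, list(length = 3))

as.vector(shorter)

# The original array is still valid and unchanged
as.vector(original)
```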
Create and modify nanoarrow buffers
nanoarrow_buffer_init()
nanoarrow_buffer_append(buffer, new_buffer)
convert_buffer(buffer, to = NULL)
buffer , new_buffer
|
|
to |
A target prototype object describing the type to which |
nanoarrow_buffer_init(): An object of class 'nanoarrow_buffer'
nanoarrow_buffer_append(): Returns buffer, invisibly. Note that buffer is modified in place by reference.
buffer <- nanoarrow_buffer_init()
nanoarrow_buffer_append(buffer, 1:5)

array <- nanoarrow_array_modify(
  nanoarrow_array_init(na_int32()),
  list(length = 5, buffers = list(NULL, buffer))
)
as.vector(array)
Create Arrow extension arrays
nanoarrow_extension_array(
  storage_array,
  extension_name,
  extension_metadata = NULL
)
storage_array |
|
extension_name |
For |
extension_metadata |
A string or raw vector defining extension metadata. Most Arrow extension types define extension metadata as a JSON object. |
A nanoarrow_array with attached extension schema.
nanoarrow_extension_array(1:10, "some_ext", '{"key": "value"}')
Register Arrow extension types
nanoarrow_extension_spec(data = list(), subclass = character())
register_nanoarrow_extension(extension_name, extension_spec)
unregister_nanoarrow_extension(extension_name)
resolve_nanoarrow_extension(extension_name)
data |
Optional data to include in the extension type specification |
subclass |
A subclass for the extension type specification. Extension methods will dispatch on this object. |
extension_name |
An Arrow extension type name (e.g., arrow.r.vctrs) |
extension_spec |
An extension specification inheriting from 'nanoarrow_extension_spec'. |
nanoarrow_extension_spec() returns an object of class 'nanoarrow_extension_spec'.
register_nanoarrow_extension() returns extension_spec, invisibly.
unregister_nanoarrow_extension() returns extension_name, invisibly.
resolve_nanoarrow_extension() returns an object of class 'nanoarrow_extension_spec' or NULL if the extension type was not registered.
nanoarrow_extension_spec("mynamespace.mytype", subclass = "mypackage_mytype_spec")
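A sketch of the full registration lifecycle follows. The extension name "mynamespace.mytype" and the subclass are placeholders carried over from the example above, not a real extension type.

```r
library(nanoarrow)

spec <- nanoarrow_extension_spec(subclass = "mypackage_mytype_spec")

# Register the (placeholder) extension type under its name
register_nanoarrow_extension("mynamespace.mytype", spec)

# A registered name resolves to the extension specification
resolved <- resolve_nanoarrow_extension("mynamespace.mytype")
class(resolved)

# After unregistering, the name no longer resolves
unregister_nanoarrow_extension("mynamespace.mytype")
after <- resolve_nanoarrow_extension("mynamespace.mytype")
is.null(after)
```

Unregistering at the end keeps the session's registry clean, which matters mostly in tests.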
The nanoarrow_schema, nanoarrow_array, and nanoarrow_array_stream classes are represented in R as external pointers (EXTPTRSXP). When these objects go out of scope (i.e., when they are garbage collected or shortly thereafter), the underlying object's release() callback is called if the underlying pointer is non-null and if the release() callback is non-null.
nanoarrow_pointer_is_valid(ptr)
nanoarrow_pointer_addr_dbl(ptr)
nanoarrow_pointer_addr_chr(ptr)
nanoarrow_pointer_addr_pretty(ptr)
nanoarrow_pointer_release(ptr)
nanoarrow_pointer_move(ptr_src, ptr_dst)
nanoarrow_pointer_export(ptr_src, ptr_dst)
nanoarrow_allocate_schema()
nanoarrow_allocate_array()
nanoarrow_allocate_array_stream()
nanoarrow_pointer_set_protected(ptr_src, protected)
ptr , ptr_src , ptr_dst
|
An external pointer to a |
protected |
An object whose scope must outlive that of |
When interacting with other C Data Interface implementations, it is important to keep in mind that the R object wrapping these pointers is always passed by reference (because it is an external pointer) and may be referred to by another R object (e.g., an element in a list() or as a variable assigned in a user's environment). When importing a schema, array, or array stream into nanoarrow this is not a problem: the R object takes ownership of the lifecycle and memory is released when the R object is garbage collected. In this case, one can use nanoarrow_pointer_move() where ptr_dst was created using nanoarrow_allocate_*().
The case of exporting is more complicated and as such has a dedicated function, nanoarrow_pointer_export(), that implements different logic for schemas, arrays, and array streams:
Schema objects are (deep) copied such that a fresh copy of the schema is exported and made the responsibility of some other C data interface implementation.
Array objects are exported as a shell around the original array that preserves a reference to the R object. This ensures that the buffers and children pointed to by the array are not copied and that any references to the original array are not invalidated.
Array stream objects are moved: the responsibility for the object is transferred to the other C data interface implementation and any references to the original R object are invalidated. Because these objects are mutable, this is typically what you want (i.e., you should not be pulling arrays from a stream accidentally from two places).
If you know the lifecycle of your object (i.e., you created the R object yourself and never passed references to it elsewhere), you can slightly more efficiently call nanoarrow_pointer_move() for all three pointer types.
nanoarrow_pointer_is_valid() returns TRUE if the pointer is non-null and has a non-null release callback.
nanoarrow_pointer_addr_dbl() and nanoarrow_pointer_addr_chr() return pointer representations that may be helpful to facilitate moving or exporting nanoarrow objects to other libraries.
nanoarrow_pointer_addr_pretty() gives a pointer representation suitable for printing or error messages.
nanoarrow_pointer_release() returns ptr, invisibly.
nanoarrow_pointer_move() and nanoarrow_pointer_export() return ptr_dst, invisibly.
nanoarrow_allocate_array(), nanoarrow_allocate_schema(), and nanoarrow_allocate_array_stream() return an array, a schema, and an array stream, respectively.
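The difference between moving and exporting can be sketched with schemas, which are the simplest case. The variable names below are illustrative; the behaviour shown follows the description above (a move invalidates the source, whereas exporting a schema copies it).

```r
library(nanoarrow)

# Move: ownership is transferred and the source is invalidated
src <- na_int32()
dst <- nanoarrow_allocate_schema()
nanoarrow_pointer_move(src, dst)

nanoarrow_pointer_is_valid(src)
nanoarrow_pointer_is_valid(dst)
dst$format

# Export: schemas are copied, so the source remains usable
src2 <- na_string()
dst2 <- nanoarrow_allocate_schema()
nanoarrow_pointer_export(src2, dst2)

nanoarrow_pointer_is_valid(src2)
nanoarrow_pointer_is_valid(dst2)
```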
Underlying 'nanoarrow' C library build
nanoarrow_version(runtime = TRUE)
runtime |
Compare TRUE and FALSE values to detect a possible ABI mismatch. |
A string identifying the version of nanoarrow this package was compiled against.
nanoarrow_version()
Reads/writes connections, file paths, URLs, or raw vectors from/to serialized Arrow data. Arrow documentation typically refers to this format as "Arrow IPC", since its origin was as a means to transmit tables between processes (e.g., multiple R sessions). This format can also be written to and read from files or URLs and is essentially a high performance equivalent of a CSV file that does a better job maintaining types.
read_nanoarrow(x, ..., lazy = FALSE)
write_nanoarrow(data, x, ...)
example_ipc_stream()
x |
A |
... |
Currently unused. |
lazy |
By default, |
data |
An object to write as an Arrow IPC stream, converted using
|
The nanoarrow package implements an IPC writer; however, you can also use arrow::write_ipc_stream() to write data from R, or use the equivalent writer from another Arrow implementation in Python, C++, Rust, JavaScript, Julia, C#, and beyond.
The media type of an Arrow stream is application/vnd.apache.arrow.stream and the recommended file extension is .arrows.
as.data.frame(read_nanoarrow(example_ipc_stream()))
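A round trip through a file can be sketched as follows. This assumes that write_nanoarrow() accepts a file path for `x` in the same way read_nanoarrow() does; the temporary file uses the recommended .arrows extension.

```r
library(nanoarrow)

path <- tempfile(fileext = ".arrows")

# Write a data frame as an Arrow IPC stream
write_nanoarrow(data.frame(x = 1:3), path)

# Read it back; the column type is preserved
roundtrip <- as.data.frame(read_nanoarrow(path))
roundtrip

unlink(path)
```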