Stores array data on the hard drive and reads slices of GB-level data in seconds
Value
self
the sliced data
a data frame with the dimension names as index columns and value_name as value column
original array
the collapsed data
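To illustrate the long format described in the data frame entry above, the following base-R sketch flattens a small array into one index column per dimension name plus a value column. It does not call the package: the column name "value" only stands in for value_name, and as.data.frame(as.table(...)) is an assumed way to produce this shape, not necessarily how the class does it.

```r
# Small 2 x 3 array with named dimensions (illustrative data only)
arr <- array(1:6, dim = c(2, 3),
             dimnames = list(A = c("a1", "a2"), B = c("b1", "b2", "b3")))

# One row per cell: index columns A and B, plus a value column.
# `responseName` plays the role of `value_name` in this sketch.
flat <- as.data.frame(as.table(arr), responseName = "value",
                      stringsAsFactors = FALSE)
```

Rows follow R's column-major order, so the first dimension (A) varies fastest.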
Public fields
dim
dimension of the array
dimnames
dimension names of the array
use_index
whether to use one dimension as index when storing data as multiple files
hybrid
whether to allow data to be written to disk
last_used
timestamp of when the object was last read
temporary
whether to remove the files once garbage collected
Active bindings
varnames
dimension names (read-only)
read_only
whether to protect the swap files from being changed
swap_file
file or files to save data to
Methods
Method print()
print out the data dimensions and snapshot
Method .use_multi_files()
Internally used, whether to use multiple files to cache data instead of one
Method new()
constructor
Usage
Tensor$new(
data,
dim,
dimnames,
varnames,
hybrid = FALSE,
use_index = FALSE,
swap_file = temp_tensor_file(),
temporary = TRUE,
multi_files = FALSE
)
Arguments
data
numeric array
dim
dimension of the array
dimnames
dimension names of the array
varnames
characters, names of dimnames
hybrid
whether to enable hybrid mode
use_index
whether to use the last dimension for indexing
swap_file
where to store the data in hybrid mode, that is, the file or files to save data to by index; by default, files are stored in raveio_getopt('tensor_temp_path')
temporary
whether to remove temporary files when exiting
multi_files
if use_index is true, whether to use multiple files
Method subset()
subset tensor
Usage
Tensor$subset(..., drop = FALSE, data_only = FALSE, .env = parent.frame())
Arguments
...
dimension slices
drop
whether to apply drop on the subset data
data_only
whether to return just the data values, or wrap them in a Tensor instance
.env
environment where ... is evaluated
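The formula-style slices used in the example on this page (such as `C ~ C < 10 & C > 5`) can be mimicked on a plain array. The sketch below is only an illustration of the idea: `slice_by_formula` is a hypothetical helper, not part of the package, and it assumes every dimension is named and its labels are numeric.

```r
# Hypothetical helper (not part of the package): turn formulas such as
# `C ~ C < 5 & C > 2` into a logical index per dimension, then subset.
slice_by_formula <- function(arr, ..., drop = FALSE, .env = parent.frame()) {
  dn <- dimnames(arr)
  # Start by keeping everything along every dimension
  idx <- lapply(dim(arr), function(n) rep(TRUE, n))
  names(idx) <- names(dn)
  for (f in list(...)) {
    var <- as.character(f[[2]])  # left-hand side: the dimension name
    # Bind the dimension's labels to that name, then evaluate the
    # right-hand side; other symbols are looked up in `.env`
    vals <- stats::setNames(list(as.numeric(dn[[var]])), var)
    idx[[var]] <- eval(f[[3]], vals, .env)
  }
  do.call(`[`, c(list(arr), unname(idx), list(drop = drop)))
}

arr <- array(seq_len(4 * 3 * 5), dim = c(4, 3, 5),
             dimnames = list(A = 1:4, B = 1:3, C = 1:5))
res <- slice_by_formula(arr, C ~ C < 5 & C > 2, A ~ A < 3)
dim(res)
#> [1] 2 3 2
```

With drop = FALSE the result keeps all three dimensions even when a slice selects a single level.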
Method to_swap()
Serialize tensor to a file and store it via write_fst
Method to_swap_now()
Serialize tensor to a file and store it via write_fst immediately
Method get_data()
restore data from hard drive to memory
Arguments
drop
whether to apply drop to the data
gc_delay
seconds to delay the garbage collection
Method operate()
apply a function (division by default) between the tensor and by along the given dimension
Usage
Tensor$operate(
by,
fun = .Primitive("/"),
match_dim,
mem_optimize = FALSE,
same_dimension = FALSE
)
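The arguments of operate() are not described on this page. Assuming that `by` supplies one value per level of the dimension given in `match_dim`, the closest base-R analogue is `sweep()`, shown here on a plain matrix with the default division. This is a sketch of the idea, not the method's implementation.

```r
# 2 x 3 matrix; divide every column by its own mean
x <- matrix(c(2, 4, 6, 8, 10, 12), nrow = 2)
by <- colMeans(x)  # one value per column: 3, 7, 11

# MARGIN = 2 matches `by` against the second dimension (assumed role of match_dim)
scaled <- sweep(x, MARGIN = 2, STATS = by, FUN = "/")
```

After the operation each column of `scaled` has mean 1, which is the typical use of division along a dimension (for example, baseline normalization).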
Examples
if(!is_on_cran()){
  # Create a tensor
  ts <- Tensor$new(
    data = 1:18000000, c(3000,300,20),
    dimnames = list(A = 1:3000, B = 1:300, C = 1:20),
    varnames = c('A', 'B', 'C'))
  # Size of tensor when in memory is usually large
  # `lobstr::obj_size(ts)` -> 8.02 MB
  # Enable hybrid mode
  ts$to_swap_now()
  # Hybrid mode, usually less than 1 MB
  # `lobstr::obj_size(ts)` -> 814 kB
  # Subset data
  start1 <- Sys.time()
  subset(ts, C ~ C < 10 & C > 5, A ~ A < 10)
  #> Dimension: 9 x 300 x 4
  #> - A: 1, 2, 3, 4, 5, 6,...
  #> - B: 1, 2, 3, 4, 5, 6,...
  #> - C: 6, 7, 8, 9
  end1 <- Sys.time(); end1 - start1
  #> Time difference of 0.188035 secs
  # Join tensors
  ts <- lapply(1:20, function(ii){
    Tensor$new(
      data = 1:9000, c(30,300,1),
      dimnames = list(A = 1:30, B = 1:300, C = ii),
      varnames = c('A', 'B', 'C'), use_index = 2)
  })
  ts <- join_tensors(ts, temporary = TRUE)
}
#> NOT_CRAN is TRUE/true (not on CRAN)