Can store data on the hard drive and read slices of GB-level data in seconds
Value
self
the sliced data
a data frame with the dimension names as index columns and value_name as value column
original array
the collapsed data
Public fields
dim: dimension of the array
dimnames: dimension names of the array
use_index: whether to use one dimension as index when storing data as multiple files
hybrid: whether to allow data to be written to disk
last_used: timestamp of when the object was last read
temporary: whether to remove the files once garbage collected
Active bindings
varnames: dimension names (read-only)
read_only: whether to protect the swap files from being changed
swap_file: file or files to save data to
Methods
Method print()
print out the data dimensions and snapshot
Method .use_multi_files()
Internally used, whether to use multiple files to cache data instead of one
Method new()
constructor
Usage
Tensor$new(
data,
dim,
dimnames,
varnames,
hybrid = FALSE,
use_index = FALSE,
swap_file = temp_tensor_file(),
temporary = TRUE,
multi_files = FALSE
)
Arguments
data: numeric array
dim: dimension of the array
dimnames: dimension names of the array
varnames: characters, names of dimnames
hybrid: whether to enable hybrid mode
use_index: whether to use the last dimension for indexing
swap_file: where to store the data in hybrid mode (file or files to save data by index); by default, stored in raveio_getopt('tensor_temp_path')
temporary: whether to remove temporary files when exiting
multi_files: if use_index is true, whether to use multiple files
Method subset()
subset tensor
Usage
Tensor$subset(..., drop = FALSE, data_only = FALSE, .env = parent.frame())
Arguments
...: dimension slices
drop: whether to apply drop on subset data
data_only: whether to just return the data values, or wrap them as a Tensor instance
.env: environment where ... is evaluated
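As a rough illustration of what formula-style dimension slices select, the following base R sketch applies the conditions A < 10 and C < 10 & C > 5 to a plain array with named dimensions. It does not use Tensor at all: the names arr and sliced are hypothetical, and the correspondence to Tensor$subset() is an assumption based on the example output shown in the Examples of this page.

```r
# Plain array with the same dimension names as the Examples, but smaller.
arr <- array(
  seq_len(30 * 5 * 20),
  dim = c(30, 5, 20),
  dimnames = list(A = 1:30, B = 1:5, C = 1:20)
)

# Dimension labels as numbers, mirroring `A ~ A < 10` and `C ~ C < 10 & C > 5`.
A <- as.numeric(dimnames(arr)$A)
C <- as.numeric(dimnames(arr)$C)

# drop = FALSE keeps all three dimensions (assumed to match the drop = FALSE default).
sliced <- arr[A < 10, , C < 10 & C > 5, drop = FALSE]

dim(sliced)          # 9 5 4
dimnames(sliced)$C   # "6" "7" "8" "9"
```

Leaving the B position empty keeps every entry along that dimension, which is assumed to match what happens when a dimension is not mentioned in the slices.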
Method to_swap()
Serialize tensor to a file and store it via write_fst
Method to_swap_now()
Serialize tensor to a file and store it via write_fst immediately
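The idea of writing array data with write_fst and reading back only part of it can be sketched with the fst package directly. This is a minimal sketch, not the Tensor implementation: it assumes the array is flattened so that each index of the last dimension becomes one column, which is only a guess at how a swap file might be laid out. The names arr, mat, path, slice and slice_arr are hypothetical.

```r
library(fst)

# Small 3-D array standing in for tensor data.
arr <- array(as.numeric(1:24), dim = c(2, 3, 4))

# Flatten to a 6 x 4 table: one column per index of the last dimension
# (an assumed layout, chosen for illustration only).
mat <- as.data.frame(matrix(arr, ncol = dim(arr)[3]))
names(mat) <- paste0("V", seq_len(ncol(mat)))

path <- tempfile(fileext = ".fst")
write_fst(mat, path)

# Read only the columns for indices 2 and 3 of the last dimension;
# the other columns are never loaded into memory.
slice <- read_fst(path, columns = c("V2", "V3"))
slice_arr <- array(unlist(slice, use.names = FALSE), dim = c(2, 3, 2))

# Compare with slicing the in-memory array.
identical(slice_arr, arr[, , 2:3, drop = FALSE])

unlink(path)
```

Reading selected columns is what makes partial reads cheap in this sketch; whether Tensor uses the same column layout is not stated on this page.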
Method get_data()
restore data from hard drive to memory
Arguments
drop: whether to apply drop to the data
gc_delay: seconds to delay the garbage collection
Method operate()
apply fun (division by default) to the tensor and by along the given dimension
Usage
Tensor$operate(
by,
fun = .Primitive("/"),
match_dim,
mem_optimize = FALSE,
same_dimension = FALSE
)
Examples
if(!is_on_cran()){
  # Create a tensor
  ts <- Tensor$new(
    data = 1:18000000, c(3000,300,20),
    dimnames = list(A = 1:3000, B = 1:300, C = 1:20),
    varnames = c('A', 'B', 'C'))

  # Size of tensor when in memory is usually large
  # `lobstr::obj_size(ts)` -> 8.02 MB

  # Enable hybrid mode
  ts$to_swap_now()

  # Hybrid mode, usually less than 1 MB
  # `lobstr::obj_size(ts)` -> 814 kB

  # Subset data
  start1 <- Sys.time()
  subset(ts, C ~ C < 10 & C > 5, A ~ A < 10)
  #> Dimension: 9 x 300 x 4
  #> - A: 1, 2, 3, 4, 5, 6,...
  #> - B: 1, 2, 3, 4, 5, 6,...
  #> - C: 6, 7, 8, 9
  end1 <- Sys.time(); end1 - start1
  #> Time difference of 0.188035 secs

  # Join tensors
  ts <- lapply(1:20, function(ii){
    Tensor$new(
      data = 1:9000, c(30,300,1),
      dimnames = list(A = 1:30, B = 1:300, C = ii),
      varnames = c('A', 'B', 'C'), use_index = 2)
  })
  ts <- join_tensors(ts, temporary = TRUE)
}
#> NOT_CRAN is TRUE/true (not on CRAN)