
Class definition for pipeline tools

Value

- The value of the inputs, or a list if key is missing
- The values of the targets
- A PipelineResult instance if as_promise or async is true; otherwise a list of values for the input names
- An environment of shared variables
- See the type argument
- A table of the progress
- Nothing
- A new pipeline object based on the path given
- The saved file path
- The data if the file is found, or a default value
- A persistent map; see rds_map

Active bindings

settings_path

absolute path to the settings file

extdata_path

absolute path to the user-defined pipeline data folder

preference_path

directory to the pipeline preference folder

target_table

table of target names and their descriptions

result_table

summary of the results, including signatures of data and commands

pipeline_path

the absolute path of the pipeline

pipeline_name

the code name of the pipeline

Methods


Method new()

constructor function

Usage

PipelineTools$new(
  pipeline_name,
  settings_file = "settings.yaml",
  paths = pipeline_root(),
  temporary = FALSE
)

Arguments

pipeline_name

name of the pipeline, usually in the pipeline 'DESCRIPTION' file, or pipeline folder name

settings_file

the file name of the settings file, where the user inputs are stored

paths

the paths to find the pipeline, usually the parent folder of the pipeline; default is pipeline_root()

temporary

whether to avoid saving the paths to the current pipeline root registry; set this to TRUE when importing pipelines from subject pipeline folders
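As a minimal sketch of construction (the pipeline name "my_pipeline" is a placeholder; use a pipeline that actually exists under pipeline_root()):

```r
# "my_pipeline" is a hypothetical name; substitute a pipeline folder
# that exists under one of the paths in pipeline_root()
pipeline <- PipelineTools$new(
  pipeline_name = "my_pipeline",
  settings_file = "settings.yaml",
  temporary = TRUE  # do not register the path in the pipeline root registry
)
```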


Method set_settings()

set inputs

Usage

PipelineTools$set_settings(..., .list = NULL)

Arguments

..., .list

named list of inputs; all inputs should be named, otherwise errors will be raised
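A hedged sketch; the input names below ("epoch", "baseline") are placeholders for whatever inputs the pipeline's settings file actually defines:

```r
# Set inputs by name; unnamed arguments raise an error
pipeline$set_settings(epoch = "auditory_onset", baseline = c(-1, 0))

# Equivalent form using .list
pipeline$set_settings(.list = list(epoch = "auditory_onset"))
```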


Method get_settings()

get current inputs

Usage

PipelineTools$get_settings(key, default = NULL, constraint)

Arguments

key

the input name; default is missing, i.e., to get all the settings

default

default value if not found

constraint

the constraint on the result; if the stored value is not an element of constraint, only the first element of constraint will be returned.
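A hedged sketch of the three lookup modes; the setting names are placeholders:

```r
# All settings as a list (key is missing)
settings <- pipeline$get_settings()

# A single input, with a fallback when absent
epoch <- pipeline$get_settings("epoch", default = "auditory_onset")

# Constrained lookup: if the stored value is not in the constraint,
# the first element of the constraint is returned instead
method <- pipeline$get_settings("method", constraint = c("mean", "median"))
```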


Method read()

read intermediate variables

Usage

PipelineTools$read(var_names, ifnotfound = NULL, ...)

Arguments

var_names

the target names, can be obtained via x$target_table member; default is missing, i.e., to read all the intermediate variables

ifnotfound

variable default value if not found

...

other parameters passed to pipeline_read
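A sketch of reading intermediate variables; "target_a" and "target_b" are hypothetical target names:

```r
# Discover the available target names first
pipeline$target_table

# Read selected intermediate variables by name
values <- pipeline$read(c("target_a", "target_b"))

# Read one target with a fallback value when it is not found
one <- pipeline$read("target_a", ifnotfound = NULL)
```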


Method run()

run the pipeline

Usage

PipelineTools$run(
  names = NULL,
  async = FALSE,
  as_promise = async,
  scheduler = c("none", "future", "clustermq"),
  type = c("smart", "callr", "vanilla"),
  envir = new.env(parent = globalenv()),
  callr_function = NULL,
  return_values = TRUE,
  ...
)

Arguments

names

pipeline variable names to calculate; default is to calculate all the targets

async

whether to run asynchronously in another process

as_promise

whether to return a PipelineResult instance

scheduler, type, envir, callr_function, return_values, ...

passed to pipeline_run if as_promise is true, otherwise these arguments will be passed to pipeline_run_bare
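A hedged sketch of the two run modes; "target_a" is a placeholder target name:

```r
# Blocking run; returns a list of values for the requested names
results <- pipeline$run(names = "target_a")

# Asynchronous run in another process; returns a PipelineResult instance
promise <- pipeline$run(async = TRUE)
```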


Method eval()

run the pipeline in order; unlike $run(), this method does not use the targets infrastructure, hence the pipeline results will not be stored, and the order of names will be respected.

Usage

PipelineTools$eval(names, env = parent.frame(), clean = TRUE, ...)

Arguments

names

pipeline variable names to calculate; must be specified

env

environment to evaluate and store the results

clean

whether to evaluate without polluting env

...

passed to pipeline_eval
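A sketch of evaluating targets outside the targets store, assuming hypothetical target names; results are stored in the environment supplied as env:

```r
# Evaluate the named targets in order, bypassing the targets infrastructure
env <- new.env()
pipeline$eval(names = c("target_a", "target_b"), env = env)
env$target_a  # results are stored in env
```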


Method shared_env()

run the pipeline's shared library scripts (scripts whose paths start with R/shared)

Usage

PipelineTools$shared_env()


Method python_module()

get 'Python' module embedded in the pipeline

Usage

PipelineTools$python_module(
  type = c("info", "module", "shared", "exist"),
  must_work = TRUE
)

Arguments

type

return type; choices are:

- 'info' (default): basic information such as the module path
- 'module': load the module and return it
- 'shared': load a shared sub-module from the module, which is also shared in the report script
- 'exist': return TRUE or FALSE depending on whether the module exists

must_work

whether the module must exist; if TRUE, an error is raised when the module does not exist. Default is TRUE; ignored when type is 'exist'.


Method progress()

get progress of the pipeline

Usage

PipelineTools$progress(method = c("summary", "details"))

Arguments

method

either 'summary' or 'details'
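A minimal sketch of querying progress in both modes:

```r
# Condensed overview of target progress
pipeline$progress(method = "summary")

# Full progress table
pipeline$progress(method = "details")
```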


Method attach()

attach pipeline tool to environment (internally used)

Usage

PipelineTools$attach(env)

Arguments

env

an environment


Method visualize()

visualize pipeline target dependency graph

Usage

PipelineTools$visualize(
  glimpse = FALSE,
  aspect_ratio = 2,
  node_size = 30,
  label_size = 40,
  ...
)

Arguments

glimpse

whether to glimpse the graph network (structure only) or render the current state

aspect_ratio

controls node spacing

node_size, label_size

size of nodes and node labels

...

passed to pipeline_visualize


Method fork()

fork (copy) the current pipeline to a new directory

Usage

PipelineTools$fork(path, filter_pattern = PIPELINE_FORK_PATTERN)

Arguments

path

path to the new pipeline, a folder will be created there

filter_pattern

file pattern to copy


Method with_activated()

run code with the pipeline activated; some environment variables and function behaviors might change under this condition (for example, targets package functions)

Usage

PipelineTools$with_activated(expr, quoted = FALSE, env = parent.frame())

Arguments

expr

expression to evaluate

quoted

whether expr is quoted; default is FALSE

env

environment to run expr
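A hedged sketch, assuming the targets package is installed, using targets::tar_manifest() as an example of a function whose behavior depends on the activated pipeline:

```r
# Query the targets manifest with the pipeline activated
manifest <- pipeline$with_activated({
  targets::tar_manifest()
})
```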


Method clean()

clean all or part of the data store

Usage

PipelineTools$clean(
  destroy = c("all", "cloud", "local", "meta", "process", "preferences", "progress",
    "objects", "scratch", "workspaces"),
  ask = FALSE
)

Arguments

destroy, ask

see tar_destroy


Method save_data()

save data to pipeline data folder

Usage

PipelineTools$save_data(
  data,
  name,
  format = c("json", "yaml", "csv", "fst", "rds"),
  overwrite = FALSE,
  ...
)

Arguments

data

R object

name

the name of the data to save, must start with a letter

format

serialize format, choices are 'json', 'yaml', 'csv', 'fst', 'rds'; default is 'json'. To save arbitrary objects such as functions or environments, use 'rds'

overwrite

whether to overwrite existing files; default is FALSE

...

passed to saver functions
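A sketch of saving a small object; "analysis_params" is a placeholder name:

```r
# Save a list as JSON in the pipeline data folder; the name must
# start with a letter. Returns the saved file path.
path <- pipeline$save_data(
  data = list(threshold = 0.05, n_iter = 1000),
  name = "analysis_params",
  format = "json",
  overwrite = TRUE
)
```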


Method load_data()

load data from pipeline data folder

Usage

PipelineTools$load_data(
  name,
  error_if_missing = TRUE,
  default_if_missing = NULL,
  format = c("auto", "json", "yaml", "csv", "fst", "rds"),
  ...
)

Arguments

name

the name of the data

error_if_missing

whether to raise errors if the name is missing

default_if_missing

default values to return if the name is missing

format

the format of the data; default is 'auto', i.e., inferred from the file extension

...

passed to loader functions
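A sketch of loading data back, using the same placeholder name as in save_data():

```r
# Format is inferred from the file extension by default
params <- pipeline$load_data("analysis_params")

# Return a default instead of raising an error when the name is missing
params <- pipeline$load_data(
  "analysis_params",
  error_if_missing = FALSE,
  default_if_missing = list(threshold = 0.05)
)
```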


Method load_preferences()

load persistent preference settings from the pipeline. Preferences should not affect how the pipeline works, hence they usually store minor variables such as graphic options. Changing preferences will not invalidate the pipeline cache.

Usage

PipelineTools$load_preferences(
  name,
  ...,
  .initial_prefs = list(),
  .overwrite = FALSE,
  .verbose = FALSE
)

Arguments

name

preference name; must contain only letters, digits, underscores, and hyphens, and will be coerced to lower case (case-insensitive)

..., .initial_prefs

key-value pairs of initial preference values

.overwrite

whether to overwrite the initial preference values if they exist.

.verbose

whether to print the preferences to be saved; default is FALSE; turn on for debugging
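A hedged sketch; "graphics-options" and the initial values are placeholders:

```r
# Load (or initialize) a persistent preference map; names are
# case-insensitive and may contain letters, digits, underscores, hyphens
prefs <- pipeline$load_preferences(
  "graphics-options",
  .initial_prefs = list(theme = "light", font_size = 12)
)
```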


Method clone()

The objects of this class are cloneable with this method.

Usage

PipelineTools$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.