2020-03-19
future is introduced with the following summary:
The purpose of this package is to provide a lightweight and unified
future API for sequential and parallel processing of R expression via
futures. The simplest way to evaluate an expression in parallel is to
use x %<-% { expression } with plan(multiprocess).
This package implements sequential, multicore, multisession, and cluster
futures. With these, R expressions can be evaluated on the local
machine, in parallel a set of local machines, or distributed on a mix of
local and remote machines. Extensions to this package implement
additional backends for processing futures via compute cluster
schedulers etc. Because of its unified API, there is no need to modify
any code in order switch from sequential on the local machine to, say,
distributed processing on a remote compute cluster. Another strength of
this package is that global variables and functions are automatically
identified and exported as needed, making it straightforward to tweak
existing code to make use of futures.[1]
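The claim about the unified API can be illustrated with a minimal sketch, assuming the future package is installed. The helper names slow_square and run are hypothetical, and multisession is used because plan(multiprocess) has since been deprecated in newer releases of future. Only the plan() call changes between the two runs.

```r
library(future)

# Hypothetical workload; it is picked up and exported as a global automatically
slow_square <- function(x) {
  Sys.sleep(0.1)
  x^2
}

# The same code runs under any plan: create the futures, then collect values
run <- function() {
  fs <- lapply(1:4, function(i) future(slow_square(i)))
  unlist(lapply(fs, value))
}

plan(sequential)                 # evaluate in the current R session
seq_result <- run()

plan(multisession, workers = 2)  # evaluate in background R sessions
par_result <- run()

plan(sequential)                 # shut the background workers down again
```

Both runs should return the same vector, since the code that creates and collects the futures is untouched.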
Futures are abstractions for values that may become available at some point in the future. They take the form of objects possessing state: a future is either resolved, in which case its value is available immediately, or unresolved, in which case a request for its value blocks the process until resolution.
Futures find their greatest use when run asynchronously. The future package has the inbuilt capacity to resolve futures asynchronously, including in parallel and through a cluster, making use of the parallel package. This typically runs a separate process for each future, which resolves separately from the current R session and updates the future object's state and value according to its resolution status.
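The two states described above can be observed directly. The following is a minimal sketch, assuming the future package is installed and that background R sessions can be started on the machine; the variable names are illustrative only.

```r
library(future)
plan(multisession, workers = 2)

# Create a future; the expression starts evaluating in a background session
f <- future({
  Sys.sleep(1)
  sum(1:10)
})

is_done <- resolved(f)  # non-blocking check of the future's state
total <- value(f)       # blocks until the future is resolved, then returns 55

# The same idea with the future-assignment operator
x %<-% { 2 + 3 }
x_value <- x            # reading x blocks until its future is resolved

plan(sequential)        # shut the background workers down again
```

Calling resolved() never blocks, while value() waits if the future is still unresolved.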
R lays open a powerful set of metaprogramming functions, which bear
similarity to future. R expressions can be captured with quote(), then
evaluated in an environment with eval() at some point in the future.
Additionally, substitute() substitutes any variables in the expression
passed to it with the values bound in an environment argument, thus
allowing “non-standard evaluation” in functions.
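As a brief illustration of these base R functions, the sketch below captures an expression, evaluates it later in a chosen environment, and uses substitute() both with explicit bindings and inside a function. The names expr, env, filled, and show_expr are hypothetical.

```r
# Capture an expression without evaluating it
expr <- quote(a + b)

# Evaluate it later, in an environment chosen at that point
env <- new.env()
assign("a", 1, envir = env)
assign("b", 2, envir = env)
result <- eval(expr, env)        # 3

# substitute() replaces variables with the values bound in its second argument
filled <- substitute(a + b, list(a = 10, b = 20))
filled_value <- eval(filled)     # evaluates the call 10 + 20, giving 30

# Inside a function, substitute() recovers the unevaluated argument,
# which is the basis of non-standard evaluation
show_expr <- function(x) deparse(substitute(x))
label <- show_expr(mean(1:10))   # "mean(1:10)"
```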
future also offers a delay of evaluation. Here, however, the delay does
not come from the programmer's manual control through eval() and the
like, but from the expression being computed in the background.
Through substitution and quoting, R can, for example, run a console within the language. Futures allow this to be extended to a parallel evaluation scheme. Listing 1 gives a simple implementation of this idea: a console that accepts basic expressions, evaluates them in the background, and presents their values upon request once complete. Error handling and shared variables are not implemented.
Listing 1: Usage of future to implement a basic multicore console
library(future)

multicore.console <- function() {
  # Ask which action to take
  get.input <- function() {
    cat("Type \"e\" to enter an expression for ",
        "evaluation\nand \"r\" to see ",
        "resolved expressions\n", sep = "")
    readline()
  }

  # Read an expression and evaluate it in the background as future number i
  send.expr <- function() {
    cat("Multicore Console> ")
    input <- readline()
    futs[[i]] <<- future(eval(str2expression(input)))
    cat("\nResolving as: ", as.character(i), "\n")
  }

  # Print the value of every future that has resolved so far
  see.resolved <- function() {
    # seq_along() avoids iterating over 1:0 when no futures exist yet
    for (j in seq_along(futs)) {
      if (inherits(futs[[j]], "Future") &&
          resolved(futs[[j]])) {
        cat("Resolved: ", as.character(j), " ")
        print(value(futs[[j]]))
      }
    }
  }

  # multicore relies on forking, which is not available on Windows
  plan(multicore)
  futs <- list()
  i <- 1
  while (TRUE) {
    input <- get.input()
    if (input == "e") {
      send.expr()
      i <- i + 1
    } else if (input == "r") {
      see.resolved()
    } else {
      cat("Try again\n")
    }
  }
}

multicore.console()
apply procedures, with a future backend enabling parallel, cluster, and
other functionality as enabled by backends such as batchtools through
future.batchtools.
One initial drawback of future is its lack of callback functionality, which would open up enormous potential. This feature is, however, available in the promises package, developed by Joe Cheng at RStudio, which allows user-defined handlers to be applied to futures upon resolution[10].
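A minimal sketch of such a handler follows, assuming the future, promises, and later packages are installed. The variable result and the 30-second deadline are illustrative choices. Outside an environment that already runs an event loop (such as Shiny), the loop has to be driven by hand, here with later::run_now().

```r
library(future)
library(promises)
plan(multisession, workers = 2)

result <- NULL

# Attach handlers to a future; then() converts the future into a promise
p <- then(
  future(sum(1:10)),
  onFulfilled = function(v) {
    result <<- v   # runs once the future has resolved
  },
  onRejected = function(e) {
    message("future failed: ", conditionMessage(e))
  }
)

# Handlers are scheduled on the 'later' event loop, so run it until the
# handler has fired (or an arbitrary deadline passes)
deadline <- Sys.time() + 30
while (is.null(result) && Sys.time() < deadline) {
  later::run_now(timeoutSecs = 0.1)
}

plan(sequential)   # shut the background workers down again
```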
Some issues are not resolved by other packages. Objects referenced by a future are copied, so mutable objects cannot be updated directly by the future (though this may be ameliorated with well-defined callbacks). Copying also makes data movement mandatory and costly; future raises an error if the globals to be exported exceed 500 MiB in total, though this limit can be overridden.
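The copying behaviour can be seen with a mutable object such as an environment. The sketch below assumes the future package is installed and uses a multisession plan, since only a separate process shows the copy. The names counter, worker_n, and local_n are hypothetical, and the 1 GiB limit is an arbitrary example value.

```r
library(future)
plan(multisession, workers = 2)

# An environment is mutable, but the worker only receives a copy of it
counter <- new.env()
counter$n <- 0

f <- future({
  counter$n <- counter$n + 1   # modifies the worker's copy only
  counter$n
})

worker_n <- value(f)   # 1, as seen inside the worker
local_n <- counter$n   # still 0 in the current session

# Raise the size limit on exported globals (in bytes); here 1 GiB
options(future.globals.maxSize = 1024^3)

plan(sequential)       # shut the background workers down again
```

Any update made inside the future has to be returned as its value and applied in the main session explicitly.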
Automatic identification of referenced variables is a major unsung
feature of future, though it doesn’t always work reliably; future relies
on code inspection, and provides a globals argument through which
variables can be specified manually.
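One case that code inspection is unlikely to catch is a variable referred to only by name in a string. The sketch below, assuming the future package is installed, names the variable manually through the globals argument; the variable names a and answer are illustrative.

```r
library(future)
plan(multisession, workers = 2)

a <- 42

# "a" appears only inside a string, so static code inspection cannot be
# relied on to find it; it is listed explicitly instead
f <- future(get("a"), globals = "a")
answer <- value(f)

plan(sequential)   # shut the background workers down again
```

The globals argument also accepts TRUE or FALSE to switch automatic identification on or off as a whole.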
It seems likely that the future package will prove valuable, especially if asynchronous processing is required on the R end; it is the simplest means of enabling asynchrony in R without having to manipulate networks or threads.