2020-04-02
foreach introduces itself on CRAN with the following description:
Support for the foreach looping construct. Foreach is an idiom that allows for iterating over elements in a collection, without the use of an explicit loop counter. This package in particular is intended to be used for its return value, rather than for its side effects. In that sense, it is similar to the standard lapply function, but doesn’t require the evaluation of a function. Using foreach without side effects also facilitates executing the loop in parallel.[1]
From the user end, the package is conceptually simple, revolving entirely around a looping construct and the one-off backend registration.
The principal goal of the package, from which it hasn’t strayed, is to enable parallelisation through backend transparency within the foreach construct. Notably, more complex functionality, such as side effects and parallel recurrence, is not part of the package’s intention.
Thus, beyond the support it offers for parallel backends, the package’s practicality is driven primarily by the backends themselves, which currently cover a broad variety of parallel systems.
foreach is developed by Steve Weston and Hong Ooi.
foreach doesn’t require setup for simple serial execution, but parallel backends require registration by the user, typically with a single function, as in the registration for doParallel, registerDoParallel().
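As a minimal sketch of this one-off registration, assuming the doParallel package is installed and that two local workers are wanted (the worker count, the fallback to the sequential backend, and the variable names are illustrative choices, not taken from the package description):

```r
library(foreach)

# Assumption: doParallel may or may not be installed; fall back to the
# sequential backend shipped with foreach if it is not.
have_cluster <- requireNamespace("doParallel", quietly = TRUE)
if (have_cluster) {
  cl <- parallel::makeCluster(2)        # two local worker processes
  doParallel::registerDoParallel(cl)    # the single registration call
} else {
  registerDoSEQ()
}

# The loop itself is unchanged regardless of which backend is registered.
squares <- foreach(i = 1:4, .combine = c) %dopar% i^2

if (have_cluster) {
  parallel::stopCluster(cl)
  registerDoSEQ()                       # avoid leaving a dead cluster registered
}
```

The loop body never mentions the backend, which is the transparency the package aims for.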
The syntax of foreach consists of a foreach() function call next to a %do% operator, and some expression to the right[2]. Without loss of generality, the syntactic form is given in lst. 1.
Listing 1: Standard foreach syntax
foreach(i=1:n) %do% {expr}
The foreach() function can take other arguments, controlling the means of combination along iterations, whether iterations should be performed in order, and the export of environment variables and packages to each iteration instance.
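A short sketch of these arguments follows. The argument names `.combine`, `.inorder`, `.export` and `.packages` are those of foreach(); the variable `offset` and the choice of the sequential backend are assumptions made only to keep the example self-contained.

```r
library(foreach)

# Default: results come back as a list, one element per iteration.
as_list <- foreach(i = 1:3) %do% sqrt(i)

# .combine changes how the per-iteration results are merged.
as_vector <- foreach(i = 1:3, .combine = c) %do% sqrt(i)
as_matrix <- foreach(i = 1:3, .combine = rbind) %do% c(i, i^2)

# .inorder, .export and .packages are aimed at parallel backends; the
# sequential backend is registered here so the sketch runs anywhere.
registerDoSEQ()
offset <- 100   # illustrative variable to be made available to each iteration
shifted <- foreach(i = 1:3, .combine = c, .inorder = TRUE,
                   .export = "offset", .packages = "stats") %dopar% (i + offset)
```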
In addition to %do%, other binary operators can be appended or substituted. Parallel iteration is performed by simply replacing %do% with %dopar%. Nested loops can be created by inserting %:% between main and nested foreach functions, prior to the %do% call[3].
The last step in composing foreach into a construct capable of list comprehension is the filtering function when(), which is chained to foreach() with %:% and filters iterables based on some predicate to control evaluation.
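Nesting and filtering can be sketched as follows. Note that in the package the filter is written as a call to when() joined to the loop with %:%; the ranges and expressions are arbitrary illustrations.

```r
library(foreach)

# Nested loops: %:% merges the outer and inner foreach() calls,
# and the body after %do% sees both i and j.
products <- foreach(i = 1:3, .combine = c) %:%
  foreach(j = 1:2, .combine = c) %do% (i * j)
# products holds 1, 2, 2, 4, 3, 6

# Filtering: when() lets through only the iterations satisfying the
# predicate, giving the equivalent of a list comprehension.
even_squares <- foreach(i = 1:10, .combine = c) %:%
  when(i %% 2 == 0) %do% i^2
# even_squares holds 4, 16, 36, 64, 100
```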
The mechanism of action in foreach is often forgotten in the face of the atypical form of the standard syntax. Going one-by-one: the foreach() function returns an iterable object; %do% and its derivatives are binary functions operating on the iterable object returned by foreach() on the left and the expression on the right; the rightmost expression is simply captured as such in %do%. Thus, the main beast of burden is the %do% function, where the evaluation of the iteration takes place.
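The split between the two pieces can be made visible by storing the result of foreach() and calling %do% in prefix form. This is only a sketch for inspection; the name `loop_spec` is hypothetical.

```r
library(foreach)

# foreach() evaluates nothing by itself; it returns an object describing
# the loop variables and how results should be combined.
loop_spec <- foreach(i = 1:3, .combine = c)
is_foreach <- inherits(loop_spec, "foreach")

# %do% is an ordinary binary function, so it can be called in prefix form:
# the object goes on the left, the unevaluated expression on the right.
doubled <- `%do%`(loop_spec, i * 2)
# doubled holds 2, 4, 6
```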
In greater detail, %do% captures and creates environments, enabling sequential evaluation. %dopar% captures the environment of an expression, as well as taking as a formal parameter a vector of names of libraries used in the expression, then passes these to the backend, which in turn does additional work in capturing references to variables in the expression and adding them to the evaluation environment, as well as ensuring that packages are loaded on worker nodes.
%do% and %dopar%, after error checking, send calls to getDoSeq() and getDoPar() respectively. These return lists determined by the registered backend, containing a function supplied by the backend that is used to operate on the main expression, along with other environmental data.
foreach depends strongly upon the iterators package, which gives the ability to construct custom iterators. These custom iterators can be used in turn with the foreach() function, as the interface to them is transparent.
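As a brief sketch of that transparency, iterators built with iter() and icount() from the iterators package can be handed straight to foreach(). The matrix and vector used here are arbitrary illustrations.

```r
library(foreach)
library(iterators)

# Iterate over the rows of a matrix rather than over its cells.
m <- matrix(1:6, nrow = 2)
row_sums <- foreach(r = iter(m, by = "row"), .combine = c) %do% sum(r)
# row_sums holds 9, 12

# Several iterators advance in lockstep; icount() is unbounded, so the
# loop ends when the shorter collection is exhausted.
weighted <- foreach(x = c(3, 1, 2), n = icount(), .combine = c) %do% (x * n)
# weighted holds 3, 2, 6
```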
The name of the package and function interface refer to the foreach programming language construct, present in many other languages. By definition, the foreach construct performs traversal over some collection, not necessarily requiring any traversal order. In this case, the collection is an iterator object or an object coercible to one, but in other languages with foreach as part of the core language, such as Python (whose for loop is actually only a foreach loop), collections can include sets, lists, and a variety of other classes which have __iter__ and __next__ methods defined[4].
Due to the constraints imposed by a foreach construct, loop optimisation is simplified relative to a for loop, and the lack of explicit traversal ordering permits parallelisation, which is the primary reason for usage of the foreach package. The constraints are not insignificant, however, and they do impose a limit on what can be expressed through their usage. Most notably, iterated functions, wherein the function depends on its prior output, are not necessarily supported, and certainly not supported in parallel. This is a result of the order of traversal being undefined: when order is essential to maintain coherent state, as in iterated functions, the two concepts are mutually exclusive.
In spite of the constraints, iterated functions can actually be emulated in foreach through the use of destructive reassignment within the passed expression, or through the use of stateful iterators. Examples of both are given in listings lst. 2 and lst. 3.
Listing 2: Serial iterated function through destructive reassignment
library(foreach)
x <- 10
foreach(i=1:5) %do% {x <- x+1}
Listing 3: Serial iterated function through creation of a stateful iterator
library(iterators)  # provides the nextElem() generic
addsone <- function(start, to) {
    # each call advances the captured state by one
    nextEl <- function(){
        start <<- start + 1
        if (start >= to) {
            stop('StopIteration')
        }
        start}
    obj <- list(nextElem=nextEl)
    class(obj) <- c('addsone', 'abstractiter', 'iter')
    obj
}
it <- addsone(10, 15)
nextElem(it)
foreach(i = addsone(10, 15), .combine = c) %do% i
As alluded to earlier, the functionality breaks down when attempting to run them in parallel. Listings lst. 4 and lst. 5 demonstrate attempts to evaluate these iterated functions in parallel. They only return a list of 5 repetitions of the same “next” number, not iterating beyond it.
Listing 4: Parallel Iteration attempt through destructive reassignment
cl <- parallel::makeCluster(2)
doParallel::registerDoParallel(cl)
x <- 10
foreach(i=1:5) %dopar% {x <- x+1}
Listing 5: Parallel Iteration attempt through a stateful iterator
doParallel::registerDoParallel()
foreach(i = addsone(10, 15), .combine = c) %dopar% i
The key point of success in foreach is its backend extensibility, without which foreach would lack any major advantages over a standard for loop.
Other parallel backends are enabled through specific functions made available by the foreach package. The packages define their parallel evaluation procedures with reference to the iterator and accumulator methods from foreach.
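To make the extension points concrete, the following is a rough sketch of a toy backend. It is modelled loosely on how existing backends are structured rather than copied from any of them: `doToy` and its info function are hypothetical names, and the "parallel" step is a plain lapply() that a real backend would replace with its own dispatch. The calls setDoPar(), makeAccum() and getResult() are exported by foreach for backend authors.

```r
library(foreach)
library(iterators)

# Hypothetical backend function: receives the foreach object, the captured
# expression, the calling environment, and backend-specific data.
doToy <- function(obj, expr, envir, data) {
  it <- iter(obj)                 # iterator over the loop variables
  accumulator <- makeAccum(it)    # applies the .combine logic
  args_list <- as.list(it)        # one named list of variables per iteration

  # A real backend would ship these tasks to its workers instead.
  results <- lapply(args_list, function(args) {
    eval(expr, envir = args, enclos = envir)
  })

  accumulator(results, seq_along(results))
  getResult(it)
}

# Hypothetical info function answering the queries foreach makes of a backend.
toy_info <- function(data, item) {
  switch(item, name = "doToy", version = "0.0", workers = 1L, NULL)
}

setDoPar(doToy, data = NULL, info = toy_info)
out <- foreach(i = 1:3, .combine = c) %dopar% (i * 10)
# out holds 10, 20, 30
```

Error handling (for instance, inspecting the accumulated error value before returning) is omitted here for brevity; production backends handle it explicitly.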
%dorng%[10].
foreach serves as an example of a well-constructed package supported by its transparency and extensibility.
For packages looking to provide any parallel capabilities, a foreach extension would certainly aid their potential usefulness and visibility.