2020-05-19

A *functional* iterative interface may be a suitable interface to the platform. An explicit interface has benefits over recursion in that R has little optimisation for recursion either through tail call elimination or call stack manipulation [1]. Being *functional* entails referential transparency, purity, and operational access through functions [2]. The benefits brought by referential transparency include most notably for this platform, sufficient encapsulation such that backends can be switched out without requiring alteration of the code [3]. Additionally, the design and terminology of the function offers a close mapping to the mathematical notation of a recurrence relation.

This can be contrasted with a non-functional approach, which in it’s most efficient form in R would entail defining an iterator object, which pulls the act of implementation closer to the details of object definition rather than, as on paper, the function definition [4].

This particular interface attempts to emulate notation of iterated functions of the following form:

*x*_{n + 1} = *x*_{n}

With the function body aiming to map as closely as possible a standard mathematical notion of iterated functions:

*f*^{0} = *i**d*_{x} *f*^{n + 1} = *f* ∘ *f*^{n}

A draft interface sees a function taking the following form:

`iter(x0, ..., P, fn)`

Where the formal parameters include,

`x0`

as the initial object to iterate upon;`...`

as additional objects to iterate upon;`P`

as the predicate to cease iteration upon evaluating false, and;`fn`

as the iterated function

A means of referring to prior returned objects of `fn`

within `P`

and `fn`

would have to be defined. This could take the form of reserved names such as `xn`

, `xn\_1`

, … `xn\_n`

to refer to the output of `fn`

, the prior output of `fn`

, and so on, respectively, as well as `x1`

, `x2`

, … `xn`

to refer to the output of the *n*th iteration. Memory could be spared through parsing `P`

and `fn`

and only keeping the value of those names which are referenced in `P`

and `fn`

.

Through this interface, iterated functions are clearly defined without the need for explicit loop control structures, and through the transparancy of the function, may swap in backend-specific mechanisms of action. This function may be defined as a generic, with appropriate backend-specific methods dispatched based on the class of `x0`

.

As an example of an iterated function, a GLM can be defined using this interface, as outlined in lst. 1:

Listing 1: GLM Implementation

```
# include definitions for X and y
<- function(X, B){
pr 1 / (1 + exp(-X %*% B))
}
<- function(B){
fn <- diag(as.vector(pr(X, B)))
W <- - t(X) %*% W %*% X
hessian <- X %*% B + solve(W) %*% (y - pr(X, B))
z <- solve(-hessian) %*% crossprod(X, W %*% z)
B
B
}
<- matrix(x(0,0))
x0
<- quote(colSums(xn_1 - xn)^2) > tolerance)
P
iter(x0, P, fn)
```

Internally, a while loop with predicate `P`

runs function `fn`

on it’s returned output, starting with `x0`

. A basic, non-generic implementation could take the form of lst. 2:

Listing 2: Implementation structure

```
<- function(x0, P, fn){
iter ModifyBaseEnv(P, fn)
while (P) {
<- fn(xn)
xn ModifyLoopEnv(P)
}
xn }
```

Note: - `ModifyBaseEnv(P, fn)`

is required in the case that `P`

or `fn`

references variables such as `xn\_1`

; In this case, the reserved names are linked to the initial input variables up to the n’th input variable. - `ModifyLoopEnv(P)`

is required in the same case as the previous, wherein the environment is modified to hold the additional variable references.

Optimisations can include inlining the function through substitution in order to minimise object movement.

Various backends could be used. As an example, sparklyr could be run through being passed an `x0`

of class `spark\_tbl`

, wherein the appropriate method would be dispatched. This method would enclose the `fn`

function calls within `mutate`

, and perform other enhancements such as caching directives. sparklyr could just as easily be bypassed, with some serialisation of the closure `fn`

performed and sent directly to Spark, if Spark allows for such an operation.

Explicit embarrassing parallel sections may be specified as additional arguments; With an appropriate backend, these parallel functions are changed to be processed in parallel, with the converted functions fed into the environment of the iterated function. As a corollary, the partitioning of data may be manually specified to ensure good performance under parallelisation.

Communicative parallelism is significantly more difficult to capture in the design of such an interface, with most parallel R packages ignoring such a construct [5]}, [6]}, [7]}, [8]. However, it is conceivable that such functions can also be delivered as formal parameters to the `iter`

function, though it is as yet unclear how to integrate this cleanly.

While this functional API does provide a simple abstract interface, it comes at the cost of lacking clear communication directives. Multivariable iterated functions are also difficult to manage; they can be contained in a list as a single formal parameter, but this limits the generic capability of the function to dispatch methods based on the class of the initial object `x0`

. Such a cost is bearable by some applications, but limits the platform excessively at this early stage; In addition, the necessity of reference to prior and named output has led to the usage of a set of reserved names, beginning to create something of a domain-specific language. This can be powerful if executed perfectly, but will be difficult to implement and may prove difficult to debug and make use of if not expertly implemented. it may be better to pursue this idea at a later point in time, pending development of the more primitive platform details.

[1]

R. C. Team, *R language definition*. 2020.

[2]

G. Cousineau and M. Mauny, *The functional approach to programming*. Cambridge University Press, 1998.

[3]

C. Strachey, “Fundamental concepts in programming languages,” *Higher-order and symbolic computation*, vol. 13, no. 1–2, pp. 11–49, 2000.

[4]

R. Analytics and S. Weston, *Iterators: Provides iterator construct*. 2019.

[5]

H. Bengtsson, *Future: Unified parallel and distributed processing in r for everyone*. 2020.

[6]

R. Core, *Package ’parallel’*. 2018.

[7]

N. Matloff, “Software alchemy: Turning complex statistical computations into embarrassingly-parallel ones,” *Journal of Statistical Software*, vol. 71, no. 4, pp. 1–15, 2016.

[8]

D. Vaughan and M. Dancho, *Furrr: Apply mapping functions in parallel using futures*. 2018.