calculate machinery discussion/proposal #150
Description
This is an attempt to flesh out what the calculation instructions and calculate itself would look like.
Instruction set data structures
Each `calcInstr` would have:
- `indexInstrVecs`: a list of vectors taken from `indexRanges` in the `calcRange` (see below)
- `numSets`: the number of independent index sets (i.e., the length of the list of `indexInstrVec`s)
- `dim`: the total dimension of the indexing (often the same as the focal variable, but not in some cases such as `y[i,2] ~ dnorm` or `y[i,1:3] ~ dmnorm`)
- `types`: the type of each `indexRange` (1 = seq, 2 = matrix, 3 = scalar)
- `ndims`: the number of dimensions represented by each `indexInstrVec` (should sum to `dim`)
- `slots`: a flattened vector of the indexing slots represented by each `indexInstrVec`, e.g., (2,3,1) if the first represents indices 2 and 3 and the second represents index 1; the length of this should equal `dim`
- `lens`: the lengths of the indexing represented by each `indexInstrVec` (e.g., 10 for the `2:11` seq case, 3 for the `c(2,3,7)` matrix case)
- `offsets`: `<start_index> - 1` for seq cases and a dummy value for matrix cases
If we decided, for efficiency, to flatten the above into a single integer vector, that should be straightforward, albeit somewhat opaque and perhaps bug-prone.
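To make the fields concrete, here is a sketch (in Python, as a stand-in for the eventual C++/R structures) of a hypothetical `calcInstr` for indexing like `y[2:11, c(2,3,7)]`: two independent 1-d index sets, one seq and one matrix. The field names follow the list above; the specific values are illustrative assumptions.

```python
# Hypothetical calcInstr instance for indexing like y[2:11, c(2,3,7)]:
# two independent index sets (one seq, one matrix), each covering one slot.
calc_instr = {
    "numSets": 2,          # two independent index sets
    "dim": 2,              # total dimension of the indexing
    "types": [1, 2],       # 1 = seq, 2 = matrix
    "ndims": [1, 1],       # dimensions per indexInstrVec; sums to dim
    "slots": [1, 2],       # flattened indexing slots, one per ndims entry
    "lens": [10, 3],       # 10 values in 2:11; 3 values in c(2,3,7)
    "offsets": [1, 0],     # <start_index> - 1 for seq; dummy 0 for matrix
    # seq case stores the starting value (redundant with offsets);
    # matrix case stores the index values, flattened by row
    "indexInstrVecs": [[2], [2, 3, 7]],
}

# Invariants from the field descriptions above
assert sum(calc_instr["ndims"]) == calc_instr["dim"]
assert len(calc_instr["slots"]) == calc_instr["dim"]
```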
Each `indexInstrVec` would be:
- for a sequence case, the starting value (which will also be placed in the `calcInstr` and is therefore probably redundant); note that the length would be in the `calcInstr`
- for a matrix case, the index values, flattened by row if representing more than one slot/dimension
- note that we could embed the relevant len, ndim, and slot metadata here as well, but I don't think it is necessary, and having `calculate` get the info from the `calcInstr` saves a little bit of arithmetic in looking up index values
- the values would be a simple copy from the `calcRange$indexingRange$indexRange[[i]]$values` matrix, flattened by row
It should be trivial to make a function that takes a `calcRange` and converts it to a `calcInstr`.
The uncompiled version (or `calculate` called from R) will be able to handle input either as a `calcRange` or a `calcInstr` (presumably by internally doing the conversion to a `calcInstr`).
Note that the scalar case (a non-indexed variable) is not considered below. Perhaps it takes the form of a seq type but with the value 0 as a placeholder. Presumably trivial, but perhaps fiddly.
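A sketch of that conversion function (in Python for illustration): the `calcRange` layout assumed here is hypothetical, keeping only what the text specifies, namely a type code per index range, a `slots` list, seq bounds, and a `values` matrix for matrix cases that is flattened by row.

```python
# Hypothetical calcRange -> calcInstr conversion. Each entry of
# index_ranges is an assumed dict: {"type", "slots", "start"/"end"} for
# seq cases, {"type", "slots", "values"} (list of rows) for matrix cases.
SEQ, MATRIX = 1, 2

def calc_range_to_instr(index_ranges):
    instr = {"numSets": len(index_ranges), "types": [], "ndims": [],
             "slots": [], "lens": [], "offsets": [], "indexInstrVecs": []}
    for rng in index_ranges:
        instr["types"].append(rng["type"])
        instr["slots"].extend(rng["slots"])
        instr["ndims"].append(len(rng["slots"]))
        if rng["type"] == SEQ:
            start, end = rng["start"], rng["end"]
            instr["lens"].append(end - start + 1)
            instr["offsets"].append(start - 1)   # <start_index> - 1
            instr["indexInstrVecs"].append([start])
        else:  # MATRIX: copy the values matrix, flattened by row
            values = rng["values"]
            instr["lens"].append(len(values))
            instr["offsets"].append(0)           # dummy value
            instr["indexInstrVecs"].append([v for row in values for v in row])
    instr["dim"] = sum(instr["ndims"])
    return instr
```

For example, converting a seq range `2:11` in slot 1 plus a matrix range `c(2,3,7)` in slot 2 yields `dim = 2`, `lens = [10, 3]`, and `offsets = [1, 0]`.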
Basic form of calculate
For a 1-d sequence case, `calculate` looks like this:
```r
for(i in 1:lens[1])
  calc_one(offsets[1]+i)
```
For a 1-d matrix case, it looks like this:
```r
for(i in 1:lens[1])
  calc_one(indexInstrVec[[1]][i])
```
For a k-dim matrix case, it looks like this:
```r
idx <- rep(0, dim)
for(i in 1:lens[1]) {
  for(p in 1:ndims[1])
    idx[slots[p]] <- indexInstrVec[[1]][p+(i-1)*ndims[1]]
  calc_one(idx)
}
```
Generic form of calculate
To build up to the generic case, consider a 2-d case with two 1-d `indexInstrVec`s, each of which could be either a seq or a matrix case:
```r
for(i in 1:lens[1]) {
  if(type[1] == 1) idx[slots[1]] <- offsets[1]+i
  else idx[slots[1]] <- indexInstrVec[[1]][i]
  for(j in 1:lens[2]) {
    if(type[2] == 1) idx[slots[2]] <- offsets[2]+j
    else idx[slots[2]] <- indexInstrVec[[2]][j]
    calc_one(idx)
  }
}
```
Note that the `if` would appear in uncompiled run code and in run code called from R, since `type` (and other information) will not be known until run time. In the fully compiled case, the `if` would be evaluated at compile time during code generation. Alternatively, we could pre-define a set of variants, e.g., `calc_seq_mat`, `calc_mat_seq`, `calc_seq_seq`, `calc_mat_mat` (which could itself be done via code generation).
Finally, we could define fully generic code for an arbitrary number of index sets with building blocks that look like this:
```r
<insert outer loops>
if(ID == 1) slotOffset <- 0 else slotOffset <- sum(ndim[1:(ID-1)]) ## compile-time calculation and hard-coded insertion of slotOffset value
for(i<ID> in 1:lens[ID]) {
  ## compile-time `if` determination:
  if(type[ID] == 1) idx[slots[slotOffset+1]] <- offset[ID]+i<ID> ## compile-time insert value of `slots[slotOffset+1]`
  if(type[ID] == 2 && ndim[ID] == 1) idx[slots[slotOffset+1]] <- indexInstrVec[[ID]][i<ID>]
  if(type[ID] == 2 && ndim[ID] > 1) for(p in 1:ndim[ID]) idx[slots[slotOffset+p]] <- indexInstrVec[[ID]][ndim[ID]*(i<ID>-1)+p]
  if(ID == numSets) calc_one(idx) ## compile-time `if`
  <insert inner loops>
}
```
For fully-compiled situations, we should be able to code generate all the loops, evaluating all the if statements and hard-coding in slot information.
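The generic loop nest can be prototyped as a recursive run-time interpreter over the `calcInstr` fields; a compiled version would instead unroll the recursion into literal nested loops and resolve the type/ndim branches and `slotOffset` values during code generation. This Python sketch assumes the dict layout from the earlier field list.

```python
# Run-time interpretation of the generic loop nest: one recursion level
# per index set. In the fully compiled case, slot_offset, the type
# branches, and the loops themselves would all be hard-coded.
def calculate(instr, calc_one):
    idx = [0] * instr["dim"]

    def loop(ID):  # ID is 1-based, matching the pseudocode above
        slot_offset = sum(instr["ndims"][:ID - 1])
        vec = instr["indexInstrVecs"][ID - 1]
        for i in range(1, instr["lens"][ID - 1] + 1):
            if instr["types"][ID - 1] == 1:          # seq case
                idx[instr["slots"][slot_offset] - 1] = instr["offsets"][ID - 1] + i
            elif instr["ndims"][ID - 1] == 1:        # 1-d matrix case
                idx[instr["slots"][slot_offset] - 1] = vec[i - 1]
            else:                                    # k-dim matrix case
                nd = instr["ndims"][ID - 1]
                for p in range(1, nd + 1):
                    idx[instr["slots"][slot_offset + p - 1] - 1] = vec[nd * (i - 1) + p - 1]
            if ID == instr["numSets"]:
                calc_one(idx)
            else:
                loop(ID + 1)

    loop(1)
```

Running this on a small seq-by-matrix example (`2:3` crossed with `c(2,3,7)`) visits the six index pairs (2,2), (2,3), (2,7), (3,2), (3,3), (3,7), in row-major order.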
One side note: it might be slightly more efficient to have all seq cases as the innermost loops, so that lookup into the `indexInstrVec` for the matrix case, which goes to memory (albeit mostly into the cache), is done as little as possible relative to the indexing arithmetic used in the seq case. At compile time, we could rearrange the order of the index sets to achieve this. That said, I am somewhat doubtful it would make any real difference: even with a long matrix case, the lookup is probably much faster than the actual density calculations.
calc_vec and calc_parallel
I am not clear on what the eigenized version of `calc_one` looks like or takes as input, but perhaps `calculate` looks like this:
```r
block <- (offset[1]+1):(offset[1]+len[1])
calc_vec(block)
```
For `parallel_for`, we would probably want the largest `len[k]` case as the innermost loop and then to run code like:
```r
parallel_for(i, 1:len[k], {calc_one(i+offset[k])})            # seq case
parallel_for(i, 1:len[k], {calc_one(indexInstrVec[[k]][i])})  # 1-d matrix case
```
That said, in a little bit of testing, when I nested `parallel_for` loops, TBB seemed to do a good job of not incurring any additional cost beyond that of simply having one 'optimal' `parallel_for` loop. So we could test having `parallel_for` for every loop.
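The two parallel forms can be sketched with a thread pool standing in for TBB's `parallel_for` (TBB would chunk the range rather than dispatch per-index tasks, so this is only a rough analogue); `calc_one` and the helper names are hypothetical.

```python
# Rough Python analogue of the two parallel_for cases above.
from concurrent.futures import ThreadPoolExecutor

def parallel_for_seq(length, offset, calc_one, max_workers=4):
    # seq case: task i computes the node at offset + i, i = 1..length
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(lambda i: calc_one(offset + i), range(1, length + 1)))

def parallel_for_matrix(index_instr_vec, calc_one, max_workers=4):
    # 1-d matrix case: indices come from the indexInstrVec lookup instead
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(calc_one, index_instr_vec))
```

Note that because the tasks run concurrently, completion order is not guaranteed; the real `calc_one` calls would need to be independent across indices for this to be valid.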
multivariate nodes
For simplicity, let's consider `y[i, 1:3] ~ dmnorm(...)`. In this case the `calcRange` has a single index range for `i`; `1:3` is baked into `calc_one`. So the `idx` passed to `calc_one` need only consider the first index. All the info needed should already be handled by the `calcRange` machinery and will be populated into the `calcInstr`.
So everything should go through fine, except that `dim` and `slots` really refer to the dimension and slots of the indexing, not of a variable. (And a reminder that the indexing is not just about the node on which calculation is being done but also about any RHS nodes.)