
calculate machinery discussion/proposal #150

@paciorek

Description

This is an attempt to flesh out what the calculation instructions and calculate itself would look like.

Instruction set data structures

Each calcInstr would have:

  • a list of indexInstrVec vectors taken from indexRanges in the calcRange (see below)
  • numSets # number of independent index sets (i.e., the length of the list of indexInstrVec's)
  • dim # total dimension of the indexing (often the same as that of the focal variable, but not in some cases such as y[i,2] ~ dnorm(...) or y[i,1:3] ~ dmnorm(...))
  • types # type of each indexRange (1=seq, 2=matrix, 3=scalar)
  • ndims # number of dimensions represented by each indexInstrVec (should sum to dim)
  • slots # flattened vector of indexing slots represented by each indexInstrVec, e.g., (2,3,1) if first represents indices 2 and 3 and second represents 1; length of this should equal dim
  • lens # lengths of indexing represented by each indexInstrVec (e.g., 10 for the 2:11 seq case, 3 for the c(2,3,7) matrix case)
  • offsets # <start_index>-1 for seq cases and dummy value for matrix

If we decide, for efficiency, to flatten the above into a single integer vector, that should be straightforward, albeit somewhat opaque and perhaps bug-prone.
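The fields above can be sketched as a data structure, here in Python purely for concreteness (the real layout would be in the DSL/C++); the example values, for something like y[2:11, c(2,3,7)], are hypothetical:

```python
from dataclasses import dataclass

# Hypothetical sketch of the proposed calcInstr layout, using the field
# names from the proposal.
@dataclass
class CalcInstr:
    indexInstrVecs: list   # one flat integer vector per index set
    numSets: int           # number of independent index sets
    dim: int               # total dimension of the indexing
    types: list            # per set: 1 = seq, 2 = matrix, 3 = scalar
    ndims: list            # dimensions represented by each indexInstrVec
    slots: list            # flattened indexing slots; length == dim
    lens: list             # iteration length of each index set
    offsets: list          # start - 1 for seq sets; dummy for matrix sets

# Example: first set a seq over 2:11, second a 1-d matrix case c(2,3,7)
instr = CalcInstr(
    indexInstrVecs=[[2], [2, 3, 7]],  # seq stores its start; matrix its values
    numSets=2, dim=2,
    types=[1, 2], ndims=[1, 1],
    slots=[1, 2], lens=[10, 3],
    offsets=[1, 0],                   # 2:11 -> offset 1; dummy 0 for matrix
)
assert sum(instr.ndims) == instr.dim
assert len(instr.slots) == instr.dim
```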

Each indexInstrVec would be:

  • for a sequence case, the starting value (which will also be placed in the calcInstr and is therefore probably redundant). Note that the length would be in the calcInstr.
  • for a matrix case, the index values, flattened by row if representing more than one slot/dimension
  • note that we could embed the relevant len, ndim, slot metadata info here as well, but I don't think it is necessary and having calculate get the info from the calcInstr saves a little bit of arithmetic in looking up index values
  • the values would be a simple copy from the calcRange$indexingRange$indexRange[[i]]$values matrix, flattened by row

It should be trivial to make a function that takes a calcRange and converts to a calcInstr.

Uncompiled calculate (or calculate called from R) will be able to handle input as either a calcRange or a calcInstr (presumably by internally converting to a calcInstr).

Note that the scalar case (a non-indexed variable) is not considered below. Perhaps it takes the form of a seq type but with the value 0 as a placeholder. Presumably trivial, but perhaps fiddly.

Basic form of calculate

For a 1-d sequence case, calculate looks like this:

for(i in 1:lens[1])
  calc_one(offsets[1]+i)

For a 1-d matrix case, it looks like this:

for(i in 1:lens[1])
  calc_one(indexInstrVec[[1]][i])

For a k-dim matrix case, it looks like this:

idx <- rep(0, dim)
for(i in 1:lens[1]) {
  for(p in 1:ndims[1]) 
    idx[slots[p]] <- indexInstrVec[p+(i-1)*ndims[1]]
  calc_one(idx)
}
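For concreteness, the three basic forms can be sketched in Python (a stand-in for the eventual DSL/C++ code), with calc_one replaced by collecting the indices it would receive; names mirror the proposal, with the 1-based indexing translated to Python's 0-based lists:

```python
# 1-d seq case: indices are offset+1 ... offset+len
def calc_seq_1d(lens, offsets):
    return [offsets[0] + i for i in range(1, lens[0] + 1)]

# 1-d matrix case: indices are looked up in the indexInstrVec
def calc_mat_1d(lens, indexInstrVec):
    return [indexInstrVec[i - 1] for i in range(1, lens[0] + 1)]

# k-dim matrix case: each "row" of the flattened-by-row indexInstrVec
# fills ndims[0] slots of idx
def calc_mat_kd(lens, ndims, slots, dim, indexInstrVec):
    out = []
    for i in range(1, lens[0] + 1):
        idx = [0] * dim
        for p in range(1, ndims[0] + 1):
            idx[slots[p - 1] - 1] = indexInstrVec[(p - 1) + (i - 1) * ndims[0]]
        out.append(tuple(idx))
    return out

# seq case 2:11
assert calc_seq_1d([10], [1]) == [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
# matrix case c(2,3,7)
assert calc_mat_1d([3], [2, 3, 7]) == [2, 3, 7]
# 2-d matrix case with rows (2,5) and (3,6), filling slots 1 and 2
assert calc_mat_kd([2], [2], [1, 2], 2, [2, 5, 3, 6]) == [(2, 5), (3, 6)]
```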

Generic form of calculate

To build up to the generic case, consider a 2-d case with two 1-d indexInstrVec's, each of which could be either a seq or a matrix case:

for(i in 1:lens[1]) {
  if(type[1] == 1) idx[slots[1]] <- offsets[1]+i
  else idx[slots[1]] <- indexInstrVec[[1]][i]
  for(j in 1:lens[2]) {
    if(type[2] == 1) idx[slots[2]] <- offsets[2]+j
    else idx[slots[2]] <- indexInstrVec[[2]][j]
    calc_one(idx)
  }
}

Note that the if would appear in uncompiled run code and in the run code called from R since type (and other information) will not be known until run-time. In the fully compiled case, the if would be evaluated at compile-time during code generation. Alternatively, we could pre-define a bunch of variants, e.g., calc_seq_mat, calc_mat_seq, calc_seq_seq, cal_mat_mat (which could itself be done via code generation).
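As a run-time (uncompiled) illustration of this 2-d case, here is a Python sketch where each of the two index sets is dispatched on its type, with calc_one again replaced by collecting the idx tuples:

```python
# Two 1-d index sets, each either seq (type 1) or matrix (type 2).
# Returns the idx tuples calc_one would be called with.
def calc_2d(types, lens, offsets, slots, indexInstrVecs):
    out = []
    idx = [0, 0]
    for i in range(1, lens[0] + 1):
        if types[0] == 1:
            idx[slots[0] - 1] = offsets[0] + i          # seq: arithmetic
        else:
            idx[slots[0] - 1] = indexInstrVecs[0][i - 1]  # matrix: lookup
        for j in range(1, lens[1] + 1):
            if types[1] == 1:
                idx[slots[1] - 1] = offsets[1] + j
            else:
                idx[slots[1] - 1] = indexInstrVecs[1][j - 1]
            out.append(tuple(idx))
    return out

# seq 2:3 crossed with matrix c(5, 9)
assert calc_2d([1, 2], [2, 2], [1, 0], [1, 2], [None, [5, 9]]) == \
    [(2, 5), (2, 9), (3, 5), (3, 9)]
```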

Finally we could define fully generic code for an arbitrary number of index sets with building blocks that look like this:

<insert outer loops>
    if(ID == 1) slotOffset <- 0 else slotOffset <- sum(ndim[1:(ID-1)]) ## Compile-time calculation and hard-coded insertion of slotOffset value
    for(i<ID> in 1:lens[ID]) {
        # compile-time `if` determination  
        if(type[ID] == 1) idx[slots[slotOffset+1]] <- offset[ID]+i<ID>  ## Compile-time insert value of `slots[slotOffset+1]`
        if(type[ID] == 2 && ndim[ID] == 1) idx[slots[slotOffset+1]] <- indexInstrVec[[ID]][i<ID>]
        if(type[ID] == 2 && ndim[ID] > 1) for(p in 1:ndim[ID]) idx[slots[slotOffset+p]] <- indexInstrVec[[ID]][ndim[ID]*(i<ID>-1)+p]
        if(ID == numSets) calc_one(idx)  # compile-time `if`
        <insert inner loops>
    }

For fully-compiled situations, we should be able to code generate all the loops, evaluating all the if statements and hard-coding in slot information.
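An interpreter version of this generic scheme (i.e., with the compile-time `if`s left as run-time checks, and recursion standing in for the code-generated nesting) can be sketched in Python; the dict layout and the `vecs` field name are hypothetical:

```python
# Recursive run-time interpreter for an arbitrary number of index sets.
# loop(ID) plays the role of the ID-th generated loop in the proposal.
def calc_generic(instr, calc_one):
    idx = [0] * instr["dim"]
    # slotOffset per index set, as in the compile-time calculation
    slotOffsets = [sum(instr["ndims"][:k]) for k in range(instr["numSets"])]

    def loop(ID):  # ID is 1-based, as in the proposal
        so = slotOffsets[ID - 1]
        for i in range(1, instr["lens"][ID - 1] + 1):
            if instr["types"][ID - 1] == 1:          # seq case
                idx[instr["slots"][so] - 1] = instr["offsets"][ID - 1] + i
            elif instr["ndims"][ID - 1] == 1:        # 1-d matrix case
                idx[instr["slots"][so] - 1] = instr["vecs"][ID - 1][i - 1]
            else:                                    # multi-dim matrix case
                for p in range(1, instr["ndims"][ID - 1] + 1):
                    idx[instr["slots"][so + p - 1] - 1] = \
                        instr["vecs"][ID - 1][instr["ndims"][ID - 1] * (i - 1) + p - 1]
            if ID == instr["numSets"]:
                calc_one(tuple(idx))
            else:
                loop(ID + 1)

    loop(1)

# 2-d matrix set filling slots 1 and 3, crossed with a seq set filling slot 2
out = []
instr = dict(dim=3, numSets=2, ndims=[2, 1], types=[2, 1],
             lens=[2, 2], offsets=[0, 4], slots=[1, 3, 2],
             vecs=[[2, 5, 3, 6], None])
calc_generic(instr, out.append)
assert out == [(2, 5, 5), (2, 6, 5), (3, 5, 6), (3, 6, 6)]
```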

One side note: it might be slightly more efficient to make the seq cases the inner-most loops, so that lookup into the indexInstrVec for the matrix case, which goes to memory (albeit mostly the cache), is done as infrequently as possible relative to the indexing arithmetic used in the seq case. At compile time, we could re-arrange the order of the index sets to achieve this. That said, I am somewhat doubtful it would make any real difference: even for a long matrix case, the lookup is probably much faster than the actual density calculations.

calc_vec and calc_parallel

I am not clear on what the eigenized version of calc_one looks like or takes as input. But perhaps calculate looks like this:

block <- (offset[1]+1):(offset[1]+len[1])
calc_vec(block)

For parallel_for, we would probably want the largest len[k] case as the innermost loop and then to run code like:

parallel_for(i, 1:len[k], {calc_one(i+offset[k])})  # seq case
parallel_for(i, 1:len[k], {calc_one(indexInstrVec[[k]][i])})  # 1-d matrix case

That said, in a little bit of testing, when I nested parallel_for loops, TBB seemed to do a good job of not incurring any additional cost above the cost of simply having one 'optimal' parallel_for loop. So we could test having parallel_for for every loop.
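As an illustration only, the seq-case parallel_for can be mimicked in Python, with ThreadPoolExecutor standing in for TBB (the function name here is hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in sketch for parallel_for over the seq case: map calc_one
# across the index block offset+1 ... offset+len, preserving order.
def parallel_calc_seq(length, offset, calc_one):
    with ThreadPoolExecutor() as pool:
        return list(pool.map(calc_one, range(offset + 1, offset + length + 1)))

# e.g., the seq 2:5 -> calc_one sees 2, 3, 4, 5
assert parallel_calc_seq(4, 1, lambda i: i) == [2, 3, 4, 5]
```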

multivariate nodes

For simplicity, let's consider y[i, 1:3] ~ dmnorm(...). In this case the calcRange has a single index range for i. 1:3 is baked into calc_one. So the idx passed to calc_one need only consider the first index. All the info needed should already be handled by the calcRange machinery and will be populated into the calcInstr.

So everything should go through fine, except that dim and slots really refer to the dimension and slots of the indexing, not of a variable. (And a reminder that the indexing covers not just the node on which calculation is being done but any RHS nodes too.)
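To make the distinction concrete: for y[i, 1:3] ~ dmnorm(...) with i running over 1:10, the indexing is 1-d even though y is 2-d, since the 1:3 is baked into calc_one. A hypothetical calcInstr for this case (dict layout mine) would look like:

```python
# Single seq index set over 1:10; dim is the dimension of the *indexing*,
# not of the 2-d variable y.
instr = dict(numSets=1, dim=1,
             types=[1], ndims=[1],
             slots=[1], lens=[10], offsets=[0])
assert sum(instr["ndims"]) == instr["dim"]
```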
