calculate machinery discussion/proposal #150
Description
This is an attempt to flesh out what the calculation instructions and calculate itself would look like.
Instruction set data structures
Each `calcInstr` would have:
- `indexInstrVecs`: a list of vectors taken from `indexRanges` in the `calcRange` (see below)
- `numSets`: the number of independent index sets (i.e., the length of the list of `indexInstrVec`s)
- `dim`: the total dimension of the indexing (often the same as the focal variable, but not in some cases such as `y[i,2] ~ dnorm` or `y[i,1:3] ~ dmnorm`)
- `types`: the type of each `indexRange` (1 = seq, 2 = matrix, 3 = scalar)
- `ndims`: the number of dimensions represented by each `indexInstrVec` (should sum to `dim`)
- `slots`: a flattened vector of the indexing slots represented by each `indexInstrVec`, e.g., (2,3,1) if the first represents indices 2 and 3 and the second represents index 1; the length of this should equal `dim`
- `lens`: the lengths of the indexing represented by each `indexInstrVec` (e.g., 10 for the `2:11` seq case, 3 for the `c(2,3,7)` matrix case)
- `offsets`: `<start_index> - 1` for seq cases and a dummy value for matrix cases
If we decided, for efficiency, to flatten the above into a single integer vector, that should be straightforward, albeit somewhat opaque and perhaps bug-prone.
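To make the fields concrete, here is a sketch (in Python, as a stand-in for the eventual C++/R structures) of a hypothetical `calcInstr` for indexing like `y[2:11, c(2,3,7)]`: two independent 1-d index sets, one seq and one matrix. The field names follow the list above; the specific values are illustrative assumptions.

```python
# Hypothetical calcInstr instance for indexing like y[2:11, c(2,3,7)]:
# two independent index sets (one seq, one matrix), each covering one slot.
calc_instr = {
    "numSets": 2,          # two independent index sets
    "dim": 2,              # total dimension of the indexing
    "types": [1, 2],       # 1 = seq, 2 = matrix
    "ndims": [1, 1],       # dimensions per indexInstrVec; sums to dim
    "slots": [1, 2],       # flattened indexing slots, one per ndims entry
    "lens": [10, 3],       # 10 values in 2:11; 3 values in c(2,3,7)
    "offsets": [1, 0],     # <start_index> - 1 for seq; dummy 0 for matrix
    # seq case stores the starting value (redundant with offsets);
    # matrix case stores the index values, flattened by row
    "indexInstrVecs": [[2], [2, 3, 7]],
}

# Invariants from the field descriptions above
assert sum(calc_instr["ndims"]) == calc_instr["dim"]
assert len(calc_instr["slots"]) == calc_instr["dim"]
```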
Each `indexInstrVec` would be:
- for a sequence case, the starting value (which will also be placed in the `calcInstr` and is therefore probably redundant); note that the length would be in the `calcInstr`
- for a matrix case, the index values, flattened by row if representing more than one slot/dimension
- note that we could embed the relevant len, ndim, and slot metadata here as well, but I don't think it is necessary, and having `calculate` get the info from the `calcInstr` saves a little bit of arithmetic in looking up index values
- the values would be a simple copy from the `calcRange$indexingRange$indexRange[[i]]$values` matrix, flattened by row
It should be trivial to make a function that takes a `calcRange` and converts it to a `calcInstr`.
The uncompiled version (or `calculate` called from R) will be able to handle input either as a `calcRange` or a `calcInstr` (presumably by internally doing the conversion to a `calcInstr`).
Note that the scalar case (a non-indexed variable) is not considered below. Perhaps it takes the form of a seq type but with the value 0 as a placeholder. Presumably trivial, but perhaps fiddly.
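A sketch of that conversion function (in Python for illustration): the `calcRange` layout assumed here is hypothetical, keeping only what the text specifies, namely a type code per index range, a `slots` list, seq bounds, and a `values` matrix for matrix cases that is flattened by row.

```python
# Hypothetical calcRange -> calcInstr conversion. Each entry of
# index_ranges is an assumed dict: {"type", "slots", "start"/"end"} for
# seq cases, {"type", "slots", "values"} (list of rows) for matrix cases.
SEQ, MATRIX = 1, 2

def calc_range_to_instr(index_ranges):
    instr = {"numSets": len(index_ranges), "types": [], "ndims": [],
             "slots": [], "lens": [], "offsets": [], "indexInstrVecs": []}
    for rng in index_ranges:
        instr["types"].append(rng["type"])
        instr["slots"].extend(rng["slots"])
        instr["ndims"].append(len(rng["slots"]))
        if rng["type"] == SEQ:
            start, end = rng["start"], rng["end"]
            instr["lens"].append(end - start + 1)
            instr["offsets"].append(start - 1)   # <start_index> - 1
            instr["indexInstrVecs"].append([start])
        else:  # MATRIX: copy the values matrix, flattened by row
            values = rng["values"]
            instr["lens"].append(len(values))
            instr["offsets"].append(0)           # dummy value
            instr["indexInstrVecs"].append([v for row in values for v in row])
    instr["dim"] = sum(instr["ndims"])
    return instr
```

For example, converting a seq range `2:11` in slot 1 plus a matrix range `c(2,3,7)` in slot 2 yields `dim = 2`, `lens = [10, 3]`, and `offsets = [1, 0]`.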
Basic form of calculate
For a 1-d sequence case, `calculate` looks like this:
```r
for(i in 1:lens[1])
  calc_one(offsets[1]+i)
```
For a 1-d matrix case, it looks like this:
```r
for(i in 1:lens[1])
  calc_one(indexInstrVec[[1]][i])
```
For a k-dim matrix case, it looks like this:
```r
idx <- rep(0, dim)
for(i in 1:lens[1]) {
  for(p in 1:ndims[1])
    idx[slots[p]] <- indexInstrVec[[1]][p+(i-1)*ndims[1]]
  calc_one(idx)
}
```
Generic form of calculate
To build up to the generic case, consider a 2-d case with two 1-d `indexInstrVec`s, each of which could be either a seq or a matrix case:
```r
for(i in 1:lens[1]) {
  if(type[1] == 1) idx[slots[1]] <- offsets[1]+i
  else idx[slots[1]] <- indexInstrVec[[1]][i]
  for(j in 1:lens[2]) {
    if(type[2] == 1) idx[slots[2]] <- offsets[2]+j
    else idx[slots[2]] <- indexInstrVec[[2]][j]
    calc_one(idx)
  }
}
```
Note that the `if` would appear in uncompiled run code and in run code called from R, since `type` (and other information) will not be known until run time. In the fully compiled case, the `if` would be evaluated at compile time during code generation. Alternatively, we could pre-define a set of variants, e.g., `calc_seq_mat`, `calc_mat_seq`, `calc_seq_seq`, `calc_mat_mat` (which could itself be done via code generation).
Finally, we could define fully generic code for an arbitrary number of index sets with building blocks that look like this:
```r
<insert outer loops>
if(ID == 1) slotOffset <- 0 else slotOffset <- sum(ndim[1:(ID-1)]) ## compile-time calculation and hard-coded insertion of slotOffset value
for(i<ID> in 1:lens[ID]) {
  ## compile-time `if` determination:
  if(type[ID] == 1) idx[slots[slotOffset+1]] <- offset[ID]+i<ID> ## compile-time insert value of `slots[slotOffset+1]`
  if(type[ID] == 2 && ndim[ID] == 1) idx[slots[slotOffset+1]] <- indexInstrVec[[ID]][i<ID>]
  if(type[ID] == 2 && ndim[ID] > 1) for(p in 1:ndim[ID]) idx[slots[slotOffset+p]] <- indexInstrVec[[ID]][ndim[ID]*(i<ID>-1)+p]
  if(ID == numSets) calc_one(idx) ## compile-time `if`
  <insert inner loops>
}
```
For fully-compiled situations, we should be able to code generate all the loops, evaluating all the if statements and hard-coding in slot information.
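The generic loop nest can be prototyped as a recursive run-time interpreter over the `calcInstr` fields; a compiled version would instead unroll the recursion into literal nested loops and resolve the type/ndim branches and `slotOffset` values during code generation. This Python sketch assumes the dict layout from the earlier field list.

```python
# Run-time interpretation of the generic loop nest: one recursion level
# per index set. In the fully compiled case, slot_offset, the type
# branches, and the loops themselves would all be hard-coded.
def calculate(instr, calc_one):
    idx = [0] * instr["dim"]

    def loop(ID):  # ID is 1-based, matching the pseudocode above
        slot_offset = sum(instr["ndims"][:ID - 1])
        vec = instr["indexInstrVecs"][ID - 1]
        for i in range(1, instr["lens"][ID - 1] + 1):
            if instr["types"][ID - 1] == 1:          # seq case
                idx[instr["slots"][slot_offset] - 1] = instr["offsets"][ID - 1] + i
            elif instr["ndims"][ID - 1] == 1:        # 1-d matrix case
                idx[instr["slots"][slot_offset] - 1] = vec[i - 1]
            else:                                    # k-dim matrix case
                nd = instr["ndims"][ID - 1]
                for p in range(1, nd + 1):
                    idx[instr["slots"][slot_offset + p - 1] - 1] = vec[nd * (i - 1) + p - 1]
            if ID == instr["numSets"]:
                calc_one(idx)
            else:
                loop(ID + 1)

    loop(1)
```

Running this on a small seq-by-matrix example (`2:3` crossed with `c(2,3,7)`) visits the six index pairs (2,2), (2,3), (2,7), (3,2), (3,3), (3,7), in row-major order.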
One side note: it might be slightly more efficient to have all seq cases as the innermost loops, so that lookup into the `indexInstrVec` for the matrix case, which goes to memory (albeit mostly into the cache), is done as little as possible relative to the indexing arithmetic used in the seq case. At compile time, we could rearrange the order of the index sets to achieve this. That said, I am somewhat doubtful it would make any real difference: even with a long matrix case, the lookup is probably much faster than the actual density calculations.
calc_vec and calc_parallel
I am not clear on what the eigenized version of `calc_one` looks like or takes as input, but perhaps `calculate` looks like this:
```r
block <- (offset[1]+1):(offset[1]+len[1])
calc_vec(block)
```
For `parallel_for`, we would probably want the largest `len[k]` case as the innermost loop and then to run code like:
```r
parallel_for(i, 1:len[k], {calc_one(i+offset[k])})            # seq case
parallel_for(i, 1:len[k], {calc_one(indexInstrVec[[k]][i])})  # 1-d matrix case
```
That said, in a little bit of testing, when I nested `parallel_for` loops, TBB seemed to do a good job of not incurring any additional cost beyond that of simply having one 'optimal' `parallel_for` loop. So we could test having `parallel_for` for every loop.
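The two parallel forms can be sketched with a thread pool standing in for TBB's `parallel_for` (TBB would chunk the range rather than dispatch per-index tasks, so this is only a rough analogue); `calc_one` and the helper names are hypothetical.

```python
# Rough Python analogue of the two parallel_for cases above.
from concurrent.futures import ThreadPoolExecutor

def parallel_for_seq(length, offset, calc_one, max_workers=4):
    # seq case: task i computes the node at offset + i, i = 1..length
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(lambda i: calc_one(offset + i), range(1, length + 1)))

def parallel_for_matrix(index_instr_vec, calc_one, max_workers=4):
    # 1-d matrix case: indices come from the indexInstrVec lookup instead
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(calc_one, index_instr_vec))
```

Note that because the tasks run concurrently, completion order is not guaranteed; the real `calc_one` calls would need to be independent across indices for this to be valid.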
multivariate nodes
For simplicity, let's consider `y[i, 1:3] ~ dmnorm(...)`. In this case the `calcRange` has a single index range for `i`; `1:3` is baked into `calc_one`. So the `idx` passed to `calc_one` need only consider the first index. All the info needed should already be handled by the `calcRange` machinery and will be populated into the `calcInstr`.
So everything should go through fine, except that `dim` and `slots` really refer to the dimension and slots of the indexing, not of a variable. (And a reminder that the indexing is not just about the node on which calculation is being done but also about any RHS nodes.)