Implementing the core algorithm in C++ and interfacing with it through [Rcpp](https://www.rcpp.org/) could result in significant performance gains.
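As a rough illustration of what such a split might look like, here is a minimal sketch. The algorithm (an exponential moving average), the function names `ema_core` and `ema_rcpp`, and the file name `ema.cpp` are hypothetical stand-ins, not the actual core algorithm. An EMA is used only because its loop-carried dependency (each output depends on the previous one) is awkward to vectorize in plain R, which is the typical case where moving a loop to C++ pays off.

```cpp
// ema.cpp: hypothetical example, not the package's real algorithm.
#include <cstddef>
#include <stdexcept>
#include <vector>

// Plain C++ core, kept free of R types so it can be unit-tested without R.
// out[0] = x[0]; out[i] = alpha * x[i] + (1 - alpha) * out[i - 1].
std::vector<double> ema_core(const std::vector<double>& x, double alpha) {
    if (alpha <= 0.0 || alpha > 1.0) {
        throw std::invalid_argument("alpha must be in (0, 1]");
    }
    std::vector<double> out(x.size());
    if (x.empty()) {
        return out;
    }
    out[0] = x[0];
    for (std::size_t i = 1; i < x.size(); ++i) {
        // Each step reads the previous output: the loop-carried dependency.
        out[i] = alpha * x[i] + (1.0 - alpha) * out[i - 1];
    }
    return out;
}

// Thin Rcpp wrapper, compiled only when Rcpp's headers are available
// (e.g. when built with Rcpp::sourceCpp() or inside an R package).
#if defined(__has_include)
#  if __has_include(<Rcpp.h>)
#    include <Rcpp.h>

// [[Rcpp::export]]
Rcpp::NumericVector ema_rcpp(Rcpp::NumericVector x, double alpha) {
    // Copy the R vector into a std::vector, run the core, copy back.
    std::vector<double> out = ema_core(Rcpp::as<std::vector<double>>(x), alpha);
    return Rcpp::NumericVector(out.begin(), out.end());
}

#  endif
#endif
```

From R, `Rcpp::sourceCpp("ema.cpp")` followed by `ema_rcpp(c(1, 2, 3), 0.5)` should return `c(1, 1.5, 2.25)`. Keeping the numerical core separate from the Rcpp types lets it be tested and reused independently, while the wrapper stays a few lines long; the `std::invalid_argument` thrown for a bad `alpha` is converted into an ordinary R error by the wrapper code that Rcpp generates.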