Praise for statistical developers

Those who write statistical software sit at the intersection of two difficult disciplines. They are heroes.

Photo by Benjamin Davies on Unsplash

Statistics is hard. Computer programming is hard. To write accurate and reusable statistical software, you have to be good at both. And if you can do both, people should be high-fiving you in the street and gifting you Iberian hams.

All hail the statistical developers

How many statisticians in their careers will need to fit a mixed-effects model? Almost all. How many will try different covariance structures or generalisations for binary and/or count data? Most. How many could write the code to fit such a model? Barely any.

I do not mean how many statisticians could use existing tools to fit such a model. I mean how many statisticians, starting from scratch, facing an empty text file, writing R or C++ or Python or Julia or whatever, would have a cat in hell’s chance of producing code to fit a mixed-effects model with all the bells and whistles? I wouldn’t. The mere idea makes me want to lie down.

Chatting recently with a colleague, Nico Kist, I heard him float the idea that researchers need to be careful when they develop tools, because programmers are somehow perceived as less important than other researchers, as if coding were the pursuit of those who cannot do proper research. Desk flesh. Code monkeys. Give them a giant bag of Cheesy Wotsits and stick on a Spider-Man DVD and they will leave you alone.

And I agreed with him. There is career risk, I think, in being seen as a programmer / researcher rather than an out-and-out researcher. Perhaps there is some faulty application of zero-sum logic that diminishes your perceived talents if you also understand object-oriented programming.

But this should not be the case! Nobody has done more to improve R in the last 10 years than Hadley Wickham. R was, to be blunt, an awkward language to use. The tidyverse has made modern R an actual joy to behold and something I recommend without qualification to every analyst.

Likewise, Stan has completely changed the game in statistical modelling. The broad suite of packages in the Stan ecosystem has shoved us all miles forwards in model fitting, model checking, visualisation, model combination, probabilistic decision making, etc. I was so blown away when I realised the utterly bewildering array of statistical models that could be fit using brms that I wanted to carry the author, Paul-Christian Bürkner, aloft on a makeshift throne, like C-3PO and the Ewoks in Return of the Jedi.

Looking at the R packages I have used in the last month, I have to say to the authors of dplyr, ggplot2, rlang, R6, httr, magrittr, glue, tibble, purrr, furrr, tidyselect, withr, Rcpp, RcppParallel, rstan, rstanarm, loo, brms, testthat, curl, jsonlite, V8, tidybayes, lattice, here, DoseFinding, patchwork, xtable, gtsummary, lme4, binom, broom, broom.mixed, RBesT, tidyr, forcats, lubridate, readxl, pwr, and devtools that you truly are the best amongst us. I owe you a ham.

Kristian Brock
Statistical Consultant

I am a clinical trial methodology statistician who likes to use Bayesian statistics.