# Fitting the Emax Model in R

Emax is an awesome, flexible non-linear model for estimating dose-response curves. Come learn how to fit it in R. Photo by adrian on Unsplash

# Introduction

Let’s say we have some outcomes observed at different doses or exposures of some intervention. Those data might look like this:

library(ggplot2)
library(dplyr)

df <- tibble(
Exposure = c(10, 25, 50, 75, 100, 150, 300, 400),
Response = c(0.03, 0.04, 0.15, 0.12, 0.25, 0.43, 0.54, 0.47)
)

df %>%
ggplot(aes(x = Exposure, y = Response)) +
geom_point() Exposure-response relationships are incredibly common in medical research. They do not just arise in phase I trials.

The response variable could be interpreted as:

• the average level of target inhibition;
• the average concentration in serum;
• the percentage of subjects in a sample experiencing an event.

Likewise, the exposure variable could be interpreted as:

• the prevalence of some characteristic in a population;
• time spent in a certain state;
• the quantity of molecule popped into a patient’s mouth.

So how would we analyse data like this? An ordinary linear model looks a bit of a stretch because the response variable appears to stop increasing at high exposures. A generalised linear model (GLM) might be a fruitful approach.

Here is a logit model fit to the data:

library(broom)

df %>%
glm(data = ., formula = Response ~ Exposure, family = binomial('logit')) %>%
augment() %>%
mutate(Response = gtools::inv.logit(.fitted)) %>%
ggplot(aes(x = Exposure, y = Response)) +
geom_line() +
geom_point(data = df) and here is a probit model:

df %>%
glm(data = ., formula = Response ~ Exposure, family = binomial('probit')) %>%
augment() %>%
mutate(Response = gtools::inv.logit(.fitted)) %>%
ggplot(aes(x = Exposure, y = Response)) +
geom_line() +
geom_point(data = df) These fits are awful. Both of these GLM approaches suffer from the same simple problem; they assume that the event probability tends to 1 as the linear predictor tends to infinity. Put another way, there is no way for the response curve to asymptote to some value other than 1.0. This bakes into the analysis that an event probability of 1.0 is not only possible, but guaranteed, given high enough exposure. In many situations, this assumption is inappropriate.

So what might we do instead?

## The Emax Model

Emax is a non-linear model for estimating dose-response curves.

The general model form is:

$R_i = E_o + \frac{D_i^N \times E_{max}}{D_i^N + {ED}_{50}^N}$

where:

• $$R_i$$ is the response for experimental unit $$i$$;
• $$D_i$$ is the exposure (or dose) of experimental unit $$i$$;
• $$E_0$$ is the expected response when exposure is zero, or the zero-dose effect, or the basal effect;
• $$E_{max}$$ is the maximum effect attributable to exposure;
• $$ED_{50}$$ is the exposure that produces half of $$E_{max}$$;
• $$N > 0$$ is the slope factor, determining the steepness of the dose-response curve.

Macdougall (2006) gives a fantastic introduction to the method, providing excellent interpretation of the parameters and summaries of the main model extensions. We have borrowed here their notation and elementary explanation of model terms.

The model variant above is called the sigmoidal Emax model. The variant with $$N$$ fixed to take the value 1 is called the hyperbolic model.

Fitting this model to some data means estimating values for the free parameters, $$E_0, E_{max}, ED_{50}$$, and possibly $$N$$, conditional on the observed $$R_i$$ and $$D_i$$.

Fortunately, there are packages in R that will fit this model. We introduce maximum likelihood and Bayesian approaches below.

## Maximum likelihood methods

Bornkamp (2019) provides a function in the DoseFinding package for fitting both the hyperbolic and sigmoidal model variants.

We demonstrate first the hyperbolic approach, fitting it to our manufactured dataset:

library(DoseFinding)

emax0 <- fitMod(Exposure, Response, data = df,  model = "emax")
plot(emax0) By default, the package uses lattice-type graphics. We see that the model fit is much better than the GLM approaches above. To see the estimated parameters, we run:

summary(emax0)
## Dose Response Model
##
## Model: emax
## Fit-type: normal
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.08579 -0.03974  0.00223  0.02980  0.08854
##
## Coefficients with approx. stand. error:
##      Estimate Std. Error
## e0     -0.052      0.079
## eMax    0.827      0.172
## ed50  162.962    117.031
##
## Residual standard error: 0.0698
## Degrees of freedom: 5

We now fit the sigmoidal model. This simply involves calling the same function with a different model parameter:

emax1 <- fitMod(Exposure, Response, data = df,  model = "sigEmax")
plot(emax1) It is clear that the extra parameter dramatically improves the model fit. Glimpsing the parameter values:

summary(emax1)
## Dose Response Model
##
## Model: sigEmax
## Fit-type: normal
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -0.0721 -0.0154  0.0120  0.0251  0.0389
##
## Coefficients with approx. stand. error:
##      Estimate Std. Error
## e0     0.0594     0.0324
## eMax   0.4515     0.0535
## ed50 107.0215    11.5670
## h      4.1370     1.6398
##
## Residual standard error: 0.0497
## Degrees of freedom: 4

we learn that the $$N$$ parameter (which they call h to keep us on our toes) is not particularly consistent with the value 1. The extra degree of freedom has roughly halved the estimated value of $$E_{max}$$ and more than halved the associated standard error.

## Bayesian methods

The rstanemax package by Yoshida (2019) implements a Bayesian version of Emax, offloading MCMC sampling to the miracle software Stan (Carpenter et al. 2016).

Fitting the sigmoidal model is as simple as running code like:

library(rstanemax)

stan1 <- stan_emax(Response ~ Exposure, data = df, gamma.fix = NULL,
seed = 12345, cores = 4, control = list(adapt_delta = 0.95))
stan1
## Inference for Stan model: emax.
## 4 chains, each with iter=2000; warmup=1000; thin=1;
## post-warmup draws per chain=1000, total post-warmup draws=4000.
##
##         mean se_mean    sd  2.5%    25%    50%    75%  97.5% n_eff Rhat
## emax    0.51    0.00  0.12  0.35   0.44   0.48   0.55   0.86   681    1
## e0      0.04    0.00  0.04 -0.05   0.02   0.05   0.07   0.12  1409    1
## ec50  122.33    1.76 45.01 81.81 100.91 110.81 124.89 257.70   651    1
## gamma   3.66    0.06  2.03  0.96   2.18   3.32   4.74   8.67  1220    1
## sigma   0.07    0.00  0.03  0.03   0.05   0.06   0.08   0.14   923    1
##
## Samples were drawn using NUTS(diag_e) at Wed May 13 19:15:55 2020.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).

The $$N$$ parameter in this package is referred to as gamma (still with me?) and the gamma.fix = NULL argument causes the variable to be estimated from the data. The argument name suggests you can provide your own fixed value - I have not investigated. To fit the hyperbolic version, you just omit the gamma.fix argument.

We see that there is a little bit of difference in each of the estimated variables but nothing to get alarmed about.

The package also provides a nice way of fetching predicted values:

samp <- rstanemax::posterior_predict(
stan1, returnType = "tibble",
newdata = tibble(exposure = seq(0, 400, length.out = 100)))

samp %>% head(10) %>% knitr::kable()
mcmcid exposure emax e0 ec50 gamma sigma respHat response
100 0.000000 0.1712489 0.0718365 89.70576 2.741876 0.1903292 0.07183650 -0.2860366
100 4.040404 0.1712489 0.0718365 89.70576 2.741876 0.1903292 0.07187133 0.3240279
100 8.080808 0.1712489 0.0718365 89.70576 2.741876 0.1903292 0.07206919 0.2041787
100 12.121212 0.1712489 0.0718365 89.70576 2.741876 0.1903292 0.07254184 0.0818142
100 16.161616 0.1712489 0.0718365 89.70576 2.741876 0.1903292 0.07338111 -0.0172850
100 20.202020 0.1712489 0.0718365 89.70576 2.741876 0.1903292 0.07466294 0.1978203
100 24.242424 0.1712489 0.0718365 89.70576 2.741876 0.1903292 0.07644671 0.2342920
100 28.282828 0.1712489 0.0718365 89.70576 2.741876 0.1903292 0.07877353 0.0095302
100 32.323232 0.1712489 0.0718365 89.70576 2.741876 0.1903292 0.08166463 -0.0747148
100 36.363636 0.1712489 0.0718365 89.70576 2.741876 0.1903292 0.08512041 -0.0137214

This facilitates my favourite method of visualising Bayesian model inferences: overplotting dots and lines.

E.g. how does the expected curve look?

samp %>% head(100 * 150) %>%
rename(Exposure = exposure, Response = respHat) %>%
ggplot(aes(x = Exposure, y = Response)) +
geom_point(data = df) +
geom_line(aes(group = mcmcid), alpha = 0.1, col = 'purple') +
ylim(0, NA) Each of the 150 purple lines above is a candidate for the mean exposure-response curve (or dose-response curve, if you prefer), hence the column name respHat.

The subject-level predictions (i.e. the predictions with added noise) are also available in the samp object via the response column. For instance, if you wanted to infer on the distribution of responses for a single unit (patient, dog, country, whatever) with given exposure, this is the distribution you would be using:

samp %>% head(100 * 100) %>%
rename(Exposure = exposure, Response = response) %>%
ggplot(aes(x = Exposure, y = Response)) +
geom_point(alpha = 0.1) +
ylim(0, NA) In the plot above, responses are predicted at each of one hundred evenly-spaced exposures. At each exposure, there are 100 points shown - i.e. each vertical tower contains exactly 100 points. Thus, each dot represents a percentile.

That is a bloody nice plot, isn’t it? Time to sign off.

# Next steps

In a forthcoming post I will fit the Emax model to outcomes in the dataset of phase I outcomes of Brock et al. (2019) introduced in this post. Til then.