A Bayesian approach to modelling censored data
fitting models to censored data with STAN
For thise case, we can write Bayes formula as:
The two components in the numerator are:
- The probability of the data given a \( \mu \) and \( \sigma \), also called the likelihood function: \( p(y|\mu,\sigma) \)
- The probability of a given \( \mu \) and \( \sigma \), before seeing any data; also called the prior likelihood: \( p(\mu, \sigma) \)
The denominator, \( p(y^{\ast}) \), is the total likelihood of the data integrated over all hypotheses (that is, over all values of \( \mu \) and \( \sigma \)). This is the part of the formula that typically doesn’t have an analytical solution; in practice, numerical approximations need to be used.
Our job is to define the likelihood function and provide a prior distribution for \( \mu \) and \( \sigma \).
In the previous post, we derived the likelihood function for left-censored data:
where \( P \) is the log-normal probability density function and \( D \) is the log-normal cumulative density function.
The specification of a prior distribution depends on the application, and what is known about what reasonable values may be for \( \mu \) and \( \sigma \). In this case, let’s set a very weakly informative prior on both variables: \( \mu \sim Normal(0, 100) \) and \( \sigma \sim Normal(0, 100) \)
A STAN program that encodes our likelihood function and prior specifications:
This program is written to look like the math, so it doesn’t use some special syntax that makes the code shorter, and it isn’t as computationally efficient as it code be.
data {
int<lower=0> N;
real L; // censoring limit
real<lower=L> y[N];
int cens[N]; // -1 if left-censored, 0 if not censored
}
parameters {
real mu;
real<lower=0> sigma;
}
model {
// priors
mu ~ normal(0, 100);
sigma ~ normal(0, 100);
// likelihood
for(i in 1:N) {
if(cens[i] == -1) {
// left-censored
target += normal_cdf(L | mu, sigma);
}
else {
// not censored
target += normal_pdf(y[i] | mu, sigma);
}
}
}