Silent Spring Institute

subscribe via RSS, or follow us on Github.

Developer Blog

A Bayesian approach to modelling censored data

fitting models to censored data with STAN

For thise case, we can write Bayes formula as:

The two components in the numerator are:

The denominator, \( p(y^{\ast}) \), is the total likelihood of the data integrated over all hypotheses (that is, over all values of \( \mu \) and \( \sigma \)). This is the part of the formula that typically doesn’t have an analytical solution; in practice, numerical approximations need to be used.

Our job is to define the likelihood function and provide a prior distribution for \( \mu \) and \( \sigma \).

In the previous post, we derived the likelihood function for left-censored data:

where \( P \) is the log-normal probability density function and \( D \) is the log-normal cumulative density function.

The specification of a prior distribution depends on the application, and what is known about what reasonable values may be for \( \mu \) and \( \sigma \). In this case, let’s set a very weakly informative prior on both variables: \( \mu \sim Normal(0, 100) \) and \( \sigma \sim Normal(0, 100) \)

A STAN program that encodes our likelihood function and prior specifications:

This program is written to look like the math, so it doesn’t use some special syntax that makes the code shorter, and it isn’t as computationally efficient as it code be.

data {
  int<lower=0> N;
  real L; // censoring limit
  real<lower=L> y[N];
  int cens[N]; // -1 if left-censored, 0 if not censored
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  // priors
  mu ~ normal(0, 100);
  sigma ~ normal(0, 100);

  // likelihood
  for(i in 1:N) {
    if(cens[i] == -1) {
      // left-censored
      target += normal_cdf(L | mu, sigma);
    }
    else {
      // not censored
      target += normal_pdf(y[i] | mu, sigma);
    }
  }
}