Norsk Regnesentral

Appendix: Gibbs-sampling of Linear Mixed-Effects Models

This appendix describes how to perform single-site Gibbs-sampling in the Linear Mixed-Effects Model used in [3]. The reason of not applying standard software and techniques in this case, is the huge dimension of the data. Single site Gibbs-sampling is a particular algorithm in a wider class of algorithms which is suitable for such models, named Markov chain Monte Carlo (MCMC) methods. For an introduction to MCMC methodology, see for example [1,2].

The model discussed in the paper, is of the following form

$\displaystyle %%
\log_2 y_{ijklmn}$ $\textstyle =$ $\displaystyle \mu + A_i + C_j + B_k + D_l + P_m + G_n +$  
  $\textstyle +$ $\displaystyle AD_{il} + CP_{jm} + GP_{nm} + CG_{jn} + AG_{in} +
DG_{ln}+CGP_{jmn}$  
  $\textstyle +$ $\displaystyle \epsilon_{ijklmn}$ (1)

for $i=1, \ldots, 12$, $j=1,2$, $k=1,\ldots, 12$, $l=1,2$, $m=1,2$, $n=1, \ldots, 10759$. The measurement errors $\{\epsilon_{ijklmno}\}$ are modeled as independent zero mean Gaussian variables with variance $\sigma^{2}_{\epsilon}$.

Only the cell line effect ($C_j$ ), the dye effect ($D_l$ ), the protocol effect ($P_n$) and the cell line protocol interaction ($CP_{jn}$) are modeled to be fixed effects (since the first three have only two levels and the last is a combination of two fixed effects). $CP_{jm}$ is confounded with $C_j$ and $P_m$ , so we only estimate $CP_{22}$. All other effects are modeled as random effects. This means that $\{A_i\}$ are modeled as independent Gaussian random variables with zero mean and (unknown) variance $\sigma^{2}_A$, and similarly for all the other random effects.

The model defined in Eq. (1) is quite involved, so we will discuss how to perform Gibbs-sampling using this very simple model instead,

\begin{displaymath}
%%
y_{ij} = \mu + r_j + \epsilon_{ij}.
\end{displaymath} (2)

Here, $i=1,\ldots, N$, $j=1, \ldots, M$, $\mu$ is the overall mean, $r_1, \ldots, r_M$ are Gaussian random effects with variance $\sigma^{2}_r$, and $\epsilon_{ij}$ are independent Gaussian noise with variance $\sigma^{2}_{\epsilon}$. The Gibbs-sampling algorithm which analyze Eq. (2) is then straight forward in theory to extend to the model defined in Eq. (1).

The unknown parameters in Eq. (2) are

  • $\mu$, the overall mean
  • $r_1, \ldots, r_M$, the random effect
  • $\sigma^{2}_r$, the variance for the random effect
  • $\sigma^{2}_{\epsilon}$, the variance for the measurement error
We assign a uniform prior to $\mu$, and vague inverse Gamma-priors to $\sigma^{2}_r$ and $\sigma^{2}_{\epsilon}$. Due to conjugacy, then also the full conditional1 for $\mu$ and each of $r_j$ are Gaussian and for $\sigma^{2}_r$ and $\sigma^{2}_{\epsilon}$ are inverse Gamma. Our single-site Gibbs-sampler algorithm is then as follows:
\begin{algorithmic}[0]
\STATE Set $\mu = 0$ \FOR{$j=1, \ldots, M$} \STATE Set $...
...R
\STATE Monitor all variables and update statistics
\ENDFOR
\end{algorithmic}

In the entry ``Monitor all variables and update statistics'', we compute the mean of $\mu$ for example, to obtain after a huge number of iterations (N_Iterations=$100 000$, for example), our (posterior mean) estimate of $\mu$. We usually start collecting statistics after, say $10\%$ of the iterations are done, to avoid any effect of the initial values.

Bibliography

1
W. R. Gilks, S. Richardson, and D. J. Spiegelhalter.
Markov Chain Monte Carlo in Practice.
London: Chapman & Hall, 1996.

2
C. P. Robert and G. Casella.
Monte Carlo statistical methods.
Springer-Verlag New York, 1999.

3
V.  Nygaard et. al.
Effects of mRNA amplification on gene expression ratios in cDNA experiments estimated by analysis of variance.


Footnotes

... conditional1
The full conditional for $\mu$, say, is the conditional density for $\mu$ given all other variables and data.
Anders Løland 2003-01-07