---
title: "nlraa: Models and Functions for Mixed Models"
author: "Fernando Miguez"
date: "`r Sys.Date()`"
fig_width: 6
fig_height: 4
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{nlraa: Models and Functions for Mixed Models}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(nlme)
library(nlraa)
library(ggplot2)
library(knitr)
```

# Intro

The purpose of this note is to make a connection between R functions and the mathematical notation of linear and (non)linear mixed models. One goal is to produce a table that maps R functions to the notation for these models. This document is a work in progress.

## Linear Model

A linear model can be expressed as

$$ y = X \beta + \epsilon $$

Typically, we assume that the errors are normally distributed, independent, and have constant variance.

$$ \epsilon \sim N(0, \sigma^2) $$

Note: the covariance of the errors $Cov(\mathbf{\epsilon})$, where $\mathbf{\epsilon}$ is an $n \times 1$ vector, is $\sigma^2 \mathbf{I}_n$.

In R we can fit the following model

```{r lm-1}
x <- rnorm(20)
y <- 1 + 2 * x + rnorm(20, sd = 0.5)
plot(x, y)
fit <- lm(y ~ x)
```

In order to extract the estimate of $\beta$, that is $\hat{\beta}$, we can use the function *coef*. In order to extract the estimate of $\sigma$ (again, strictly $\hat{\sigma}$), we can use the function *sigma*. In order to extract the covariance of $\hat{\beta}$, that is $Cov(\hat{\beta})$, we can use the function *vcov*. The covariance of $\hat{\beta}$ is defined as

$$ Cov(\hat{\beta}) = \hat{\sigma}^2 (\mathbf{X'X})^{-1} $$

Obtaining this matrix is important for simulation, as it provides information on the variances of the parameters in the model as well as their covariances.

```{r lm-coef}
## Print beta hat
coef(fit)
## Print sigma
sigma(fit)
## Print the covariance matrix of beta hat
vcov(fit)
```

There are other functions in R which are also useful. The function *fitted* extracts the fitted values, $\hat{y} = \mathbf{X}\hat{\beta}$; *model.matrix* extracts $\mathbf{X}$; and *residuals* (or *resid*) extracts $\hat{\epsilon}$. In addition, the nlraa package provides the function *var_cov*, which extracts the complete variance-covariance matrix of the residuals. In this case it is simply a diagonal matrix multiplied by $\hat{\sigma}^2$.

```{r lm-var_cov}
## Extract the variance-covariance of the residuals (which is also of the response)
lm.vc <- var_cov(fit)
## Print just the first 5 observations
round(lm.vc[1:5, 1:5], 2)
## Visualize the matrix. This is a 20 x 20 matrix.
## It is easier to visualize on the log scale
## Note: log(0) becomes -Inf, but this is not shown here
image(log(lm.vc[, ncol(lm.vc):1]))
```

In the previous image, each orange square represents the estimate of the residual variance ($\hat{\sigma}^2$), and there are 20 squares because there are 20 observations. It might seem silly for a model such as this one to extract a matrix which is simply a diagonal identity matrix multiplied by the scalar $\hat{\sigma}^2$, but the value of this will become clearer as we move on to more complex models.

So far,

```{r lm-table, echo = FALSE}
dat <- data.frame(r.function = "lm", beta = "coef", cov.beta = "vcov",
                  sigma = "sigma", X = "model.matrix", y.hat = "fitted",
                  e = "residuals", cov.e = "var_cov")
kable(dat)
```
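As a quick numerical check of the mapping in this table, the sketch below recomputes $\hat{\beta}$ and $Cov(\hat{\beta})$ directly from $\mathbf{X}$ and $y$, reusing the objects `fit` and `y` created above (the chunk label and code are illustrative additions, not part of the original workflow). The results should match the output of *coef* and *vcov*.

```{r lm-by-hand}
## A minimal check of the closed-form expressions above (illustrative)
X <- model.matrix(fit)
## beta hat = (X'X)^{-1} X'y -- should match coef(fit)
solve(crossprod(X), crossprod(X, y))
## Cov(beta hat) = sigma^2 (X'X)^{-1} -- should match vcov(fit)
sigma(fit)^2 * solve(crossprod(X))
```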
## Generalized least squares (gls)

Note: the models that can be fitted using *gls* should not be confused with the ones fitted using *glm*, which are generalized linear models.

The function *gls* in the *nlme* package fits linear models in which the covariance matrix of the residuals (also called errors) is more flexible. The model is still

$$ y = X \beta + \epsilon $$

But now the errors have a more flexible covariance structure.

$$ \epsilon \sim N(0, \Sigma) $$

How do we extract $\Sigma$ from a fitted model? We will again use the function *var_cov* (there is a function in nlme called *getVarCov*, but there are many cases that it does not handle). For the next example I will use the ChickWeight dataset.

```{r gls-chickweight}
## ChickWeight example
data(ChickWeight)
ggplot(data = ChickWeight, aes(Time, weight)) +
  geom_point()
```

Clearly, the variance increases as the weight increases, and it would be good to account for this in the modeling process. The code below models the variance of the residuals (or the response) as a function of the fitted values. We could evaluate different variance functions and determine which one is better, but for the purpose of illustrating the connection between R functions and mathematical notation this is enough.

```{r gls-chickweight-weights}
## One possible model is
fit.gls <- gls(weight ~ Time, data = ChickWeight,
               weights = varPower())
fit.gls.vc <- var_cov(fit.gls)
## Note: the function getVarCov fails for this object
## Visualize the first few observations
fit.gls.vc[1:10, 1:10]
## Store for visualization
## only the first 12 observations
vc2 <- fit.gls.vc[1:12, 1:12]
round(vc2, 0)
## The variance increases as weight increases
## Visualize the variance-covariance of the errors
## This is only for the first 12 observations
## It is easier to visualize on the log scale
## Note: log(0) becomes -Inf, but this is not shown here
image(log(vc2[, ncol(vc2):1]),
      main = "First 12 observations, Cov(resid)")
## For all observations
image(log(fit.gls.vc[, ncol(fit.gls.vc):1]),
      main = "All observations, Cov(resid)")
```

Not only is the variance increasing, but the data also consist of repeated measurements on the same chicks and are thus correlated, so we could fit the following model.

```{r gls-chickweight-weights-corr}
## Adding the correlation
fit.gls2 <- gls(weight ~ Time, data = ChickWeight,
                weights = varPower(),
                correlation = corCAR1(form = ~ Time | Chick))
## Extract the variance-covariance of the residuals
## Note: getVarCov fails for this object
fit.gls2.vc <- var_cov(fit.gls2)
## Visualize the first few observations
round(fit.gls2.vc[1:13, 1:13], 0)
## Visualize the variance-covariance of the errors
## On the log scale
## Select only the first three Chicks (36 observations)
vc2.36 <- fit.gls2.vc[1:36, 1:36]
image(log(vc2.36[, ncol(vc2.36):1]),
      main = "Covariance matrix of residuals \n for the first three Chicks (log-scale)")
```

Let's plot the data by Chick. This shows that there is high temporal correlation, as we would expect: chicks which are heavier than others at a given time tend to still be heavier at a later time.

```{r ChickWeight-ggplot2-facet}
ggplot(data = ChickWeight, aes(x = Time, y = weight)) +
  facet_wrap(~ Chick) +
  geom_point()
```
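Consistent with this strong within-chick correlation, we can also reconstruct a single element of $\Sigma$ by hand. The sketch below (the chunk label and code are mine, added for illustration) assumes the standard *nlme* parameterizations: under *varPower* the residual standard deviation is $\hat{\sigma} |\hat{y}_i|^{\delta}$, and under *corCAR1* the within-chick correlation is $\phi^{|t_i - t_j|}$.

```{r gls2-sigma-by-hand}
## Reconstruct Cov(e_1, e_2) by hand (illustrative sketch)
sgma  <- fit.gls2$sigma                        ## residual scale
delta <- coef(fit.gls2$modelStruct$varStruct)  ## varPower exponent
phi   <- coef(fit.gls2$modelStruct$corStruct,
              unconstrained = FALSE)           ## CAR(1) parameter
## The first two rows of ChickWeight are the same chick at Time 0 and 2
mu  <- fitted(fit.gls2)[1:2]
tms <- ChickWeight$Time[1:2]
## Cov(e_1, e_2) = sigma^2 * |mu_1|^delta * |mu_2|^delta * phi^|t_1 - t_2|
sgma^2 * abs(mu[1])^delta * abs(mu[2])^delta * phi^abs(tms[1] - tms[2])
## Compare with the corresponding entry of var_cov
fit.gls2.vc[1, 2]
```

If the two numbers agree, the mapping between the fitted parameters and $\Sigma$ is as described above; the diagonal elements can be checked in the same way, with $\phi^0 = 1$.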