Dec. 6th, 2006
I don't like Tuesdays
Dec. 6th, 2006 10:47 am
Sometimes peoples ax us, "Say, tiki, what's it like being a big shot Scientist 'n all? Betcha it's pretty neat. Hangin' out and confronting Big Questions of the Universe. Say, are you gonna eat that?"
To try to convey the thrill that is doing Big Science, here is an excerpt from a 100% guaranteed authentic email which we received from a colleague yesterday, detailing an arcane statistical problem that has arisen in the course of analyzing our results:
This is to address your question regarding the use of an offset. In order to explain the differences between using an offset and having exposure time as covariate, I prefer to write some models. Firstly, let Z_i be the outcome variable, and Y_i be log(Z_i), let X_i be covariate, K_i be exposure time and M_i = log(K_i). To make things easier, we first consider linear regression
Y_i = alpha + beta X_i + residual_i, --------------- (1)
where alpha is the intercept and beta is the slope for X. In model (1), we have not taken exposure time M_i into consideration. In order to take M_i into consideration, we may apply the following model
Y_i = alpha + beta X_i + gamma M_i + residual_i ----- (2)
What the offset model does is assume gamma = 1, that is
Y_i = alpha + beta X_i + M_i + residual_i ------------- (3)
So, your question was about the use of model (2) and (3).
Note that (3) can be expressed as
E(Y_i - M_i) = alpha + beta X_i, ----------------- (4)
where E denotes expectation. We rewrite (4) as
E(log(Z_i / K_i)) = alpha + beta X_i ---------- (5)
So, we can now understand that using an offset amounts to dividing the untransformed outcome by the exposure time. Therefore, it seems to make sense to use it. This argument is generally true for linear regression, and perhaps for Poisson regression. However, I am not yet sure whether the explanation fully carries over to logistic regression.
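For the curious, the difference between models (2) and (3) can be checked numerically. What follows is a minimal sketch on simulated data, not anything from our actual analysis: the sample size, the "true" alpha and beta, the noise level, and the exposure range are all made-up assumptions, and plain least squares via NumPy stands in for whatever software our colleague had in mind.

```python
import numpy as np

# All numbers below are illustrative assumptions, not real study data.
rng = np.random.default_rng(0)
n = 500
alpha, beta = 0.5, 2.0                  # assumed "true" intercept and slope

X = rng.normal(size=n)                  # covariate X_i
K = rng.uniform(1.0, 10.0, size=n)      # exposure time K_i
M = np.log(K)                           # M_i = log(K_i)

# Simulate the log outcome with gamma = 1, i.e. the offset model (3) holds.
Y = alpha + beta * X + M + rng.normal(scale=0.1, size=n)
Z = np.exp(Y)                           # untransformed outcome Z_i

# Model (2): exposure enters as a covariate, so gamma is estimated.
design_covariate = np.column_stack([np.ones(n), X, M])
coef_covariate, *_ = np.linalg.lstsq(design_covariate, Y, rcond=None)

# Model (3): exposure enters as an offset, so gamma is fixed at 1.
# Equivalent to regressing Y - M on X, as in equation (4).
design_offset = np.column_stack([np.ones(n), X])
coef_offset, *_ = np.linalg.lstsq(design_offset, Y - M, rcond=None)

# Equation (5): Y_i - M_i is the same quantity as log(Z_i / K_i).
rate_matches = np.allclose(np.log(Z / K), Y - M)

print("model (2) alpha, beta, gamma:", coef_covariate)
print("model (3) alpha, beta:", coef_offset)
print("log(Z/K) equals Y - M:", rate_matches)
```

Because the data were simulated with gamma = 1, the gamma estimated under model (2) should land near 1 and both models should recover similar values of alpha and beta. With real data the two can disagree, which is the whole question.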
After dealing with the like for 8 hours (including a flurry of emails, assorted phone calls, and faxes), we went home and shot bunnies with plunger guns.

