

Robust Estimation of an Overdispersed Binomial Model

We use an overdispersed binomial model for the count $ y_i$ of votes for Buchanan out of $ m_i$ ballots cast in county $ i$, $ i=1,\dots,n$. For each county there are $ k$ observed regressors (including a constant) collected in a vector $ x_{i}$. Following McCullagh and Nelder (1989, 125, eqn. 4.20), the mean and variance of $ y_i$ are

$\displaystyle E(y_i \mid x_i, m_i) = m_i \pi_i ,$ (1)
$\displaystyle E[(y_i- m_i\pi_i)^2 \mid x_i, m_i] = \sigma^2 m_i \pi_i (1-\pi_i) ,$ (2)

with $ \sigma^2>0$ and, for an unknown constant vector of coefficient parameters $ \beta$,

$\displaystyle \pi_{i} = \dfrac{1}{1 + \exp(-x'_i \beta)} .$ (3)

If $ \sigma^2>1$ then there is overdispersion relative to a purely binomial model.
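
As a minimal sketch of equations (1) to (3), the following Python computes the expected proportion, mean, and variance for one county. The function names and the numeric inputs are hypothetical and chosen only for illustration; they do not come from the election data.

```python
import math

def logistic(eta):
    """Inverse logit, as in equation (3)."""
    return 1.0 / (1.0 + math.exp(-eta))

def mean_and_variance(x, m, beta, sigma2):
    """Return (pi_i, mean, variance) from equations (1) to (3).

    x      : regressors for one county, including the constant
    m      : ballots cast in the county
    beta   : coefficient vector
    sigma2 : dispersion parameter; sigma2 > 1 means overdispersion
    """
    eta = sum(xj * bj for xj, bj in zip(x, beta))  # x_i' beta
    pi = logistic(eta)
    mean = m * pi                                  # equation (1)
    var = sigma2 * m * pi * (1.0 - pi)             # equation (2)
    return pi, mean, var

# Illustrative values only (not estimates from the paper).
pi, mean, var = mean_and_variance(x=[1.0, 0.0], m=1000, beta=[0.0, 0.5], sigma2=4.0)
print(pi, mean, var)  # 0.5 500.0 1000.0
```

With $ \sigma^2=4$ the variance is four times the purely binomial value $ m_i \pi_i (1-\pi_i) = 250$.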

To estimate $ \sigma^2$ we use a least quartile difference (LQD) estimator (Croux et al., 1994; Rousseeuw and Croux, 1993) for the scale $ \sigma=\sqrt{\sigma^2}$. Let $ \hat{\sigma}$ denote the estimated scale value. Given $ \hat{\sigma}$, we use a hyperbolic tangent ($ \tanh$) estimator (Hampel et al., 1981) for $ \beta$. Let $ \hat{\beta}$ denote the estimated coefficient vector. The estimators are described in more detail in the Appendix. An important product of the estimation is a weight $ w_i\in[0,1]$ for each county. If $ w_i=0$ then data from county $ i$ had no effect on $ \hat{\beta}$, given $ \hat{\sigma}$: the $ \tanh$ estimator completely rejected the county as an outlier.

Given expected proportions $ \hat{\pi}_i = [1 + \exp(-x'_i \hat{\beta})]^{-1}$, we use studentized residuals (Carroll and Ruppert, 1988, 31-34) to measure the discrepancy between actual and expected votes for Buchanan. The studentized residuals may be compared across counties, both within and across states.$^{12}$ The standardized residual for county $ i$ is

$\displaystyle r_{i} = \dfrac{y_{i} - m_{i}\hat{\pi}_{i}} {\hat{\sigma} \sqrt{m_{i}\hat{\pi}_{i} (1- \hat{\pi}_{i})}} .$ (4)
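
Equation (4) can be sketched directly in Python. The function name and the inputs below are hypothetical; in practice $ \hat{\pi}_i$ and $ \hat{\sigma}$ would come from the robust estimators rather than being set by hand.

```python
import math

def standardized_residual(y, m, pi_hat, sigma_hat):
    """Equation (4): observed minus expected votes, scaled by the
    estimated overdispersed binomial standard deviation."""
    expected = m * pi_hat
    sd = sigma_hat * math.sqrt(m * pi_hat * (1.0 - pi_hat))
    return (y - expected) / sd

# Illustrative values only: 230 votes observed, 200 expected.
r_example = standardized_residual(y=230, m=400, pi_hat=0.5, sigma_hat=2.0)
print(r_example)  # 1.5
```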

To obtain studentized residuals we need to make a weighting adjustment for leverage (applying to the counties that have $ w_i>0$) or for forecasting error (applying to the counties that have $ w_i=0$). Let $ W$ denote the matrix that has diagonal entries $ W_{ii}=w_i$ and off-diagonal entries equal to zero ($ W_{ij}=0$ for $ i\neq j$). Let $ V$ denote the matrix that has $ V_{ii}=[m_{i}\hat{\pi}_{i}(1-\hat{\pi}_{i})]^{-1/2}$ and $ V_{ij}=0$ for $ i\neq j$. Let $ X$ be the $ n\times k$ matrix of the regressors (row $ i$ of $ X$ is $ x'_i$). The diagonal values of

$\displaystyle H = V X ( X' V W V X ) ^{-1} X' V$    

provide robust estimates of the additional weights.$^{13}$ Let $ h_{i} = H_{ii}$ if $ w_{i} >0$ and $ h_{i} = -H_{ii}$ if $ w_{i} = 0$. The studentized residual is

$\displaystyle \tilde{r}_{i} = r_{i} / \sqrt{1-h_{i}} .$ (5)
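
The adjustment in equation (5) can be sketched as follows, using only the Python standard library. This is an illustration under stated assumptions, not the authors' implementation: the helper names, the small Gaussian elimination routine, and all numeric inputs are hypothetical, and $ V$ and $ W$ are taken exactly as written in the text. The example uses weights of 0 or 1 only, so that $ 1-h_i$ stays positive.

```python
import math

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda q: abs(M[q][c]))
        M[c], M[p] = M[p], M[c]
        for q in range(c + 1, n):
            f = M[q][c] / M[c][c]
            for j in range(c, n + 1):
                M[q][j] -= f * M[c][j]
    z = [0.0] * n
    for q in reversed(range(n)):
        s = sum(M[q][j] * z[j] for j in range(q + 1, n))
        z[q] = (M[q][n] - s) / M[q][q]
    return z

def hat_diagonal(X, m, pi_hat, w):
    """Diagonal of H = V X (X' V W V X)^{-1} X' V."""
    n, k = len(X), len(X[0])
    # V_ii as defined in the text.
    v = [(m[i] * pi_hat[i] * (1.0 - pi_hat[i])) ** -0.5 for i in range(n)]
    Z = [[v[i] * X[i][j] for j in range(k)] for i in range(n)]  # rows of V X
    # A = X' V W V X, accumulated as a weighted sum of outer products.
    A = [[sum(w[i] * Z[i][a] * Z[i][b] for i in range(n)) for b in range(k)]
         for a in range(k)]
    # H_ii = z_i' A^{-1} z_i, where z_i is row i of V X.
    return [sum(za * sa for za, sa in zip(Z[i], solve(A, Z[i]))) for i in range(n)]

def studentized(r, H_diag, w):
    """Equation (5), with h_i = H_ii if w_i > 0 and h_i = -H_ii if w_i = 0."""
    out = []
    for ri, Hii, wi in zip(r, H_diag, w):
        h = Hii if wi > 0 else -Hii
        out.append(ri / math.sqrt(1.0 - h))
    return out

# Illustrative inputs only: five counties, a constant plus one regressor,
# with the last county fully rejected by the robust estimator (w = 0).
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
m = [100, 200, 300, 400, 500]
pi_hat = [0.1, 0.2, 0.3, 0.4, 0.5]
w = [1.0, 1.0, 1.0, 1.0, 0.0]
r = [0.5, -1.0, 0.2, 1.5, 8.0]

H_diag = hat_diagonal(X, m, pi_hat, w)
r_tilde = studentized(r, H_diag, w)
print(r_tilde)
```

For counties with $ w_i>0$ the leverage adjustment inflates the residual, because $ 1-h_i<1$. For a county with $ w_i=0$ the sign change gives $ 1-h_i = 1+H_{ii} > 1$, so the residual is deflated to reflect forecasting error.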


Jasjeet S. Sekhon 2001-03-04