Next: Bibliography Up: Voting Irregularities in Palm Previous: Conclusion


Appendix: National Analysis Methodology

Our model is a generalized linear model (GLM) (McCullagh and Nelder, 1989) of the binomial family with a logistic link, allowing for overdispersion. The dependent variable is the proportion of the presidential vote in reporting unit $ i$ that was cast for Buchanan, denoted $ P_{\text{Buchanan},i}$, out of the total number of votes cast for Browne, Buchanan, Bush, Gore, Hagelin, Nader, and Phillips (each counted only where that candidate appears on the ballot). Let $ N_{\text{Buchanan},i}$ denote the number of votes for Buchanan and let $ N_i$ denote the total number of votes cast for Buchanan, Bush, Gore, or Nader in reporting unit $ i$. The proportion we study is $ P_{\text{Buchanan},i} = N_{\text{Buchanan},i} / N_i$. We base the GLM's linear predictor, denoted $ \mu_i$, on the proportions of the vote in reporting unit $ i$ that were cast for Bush and for Nader, denoted $ P_{\text{Bush},i}$ and $ P_{\text{Nader},i}$, respectively. The linear predictor is defined as

$$\mu_i = \hat{\beta}_{0} + \hat{\beta}_{1} P_{\text{Bush},i} + \hat{\beta}_{2} P_{\text{Nader},i} \,, \tag{1}$$

where $ \hat{\beta}_0$, $ \hat{\beta}_1$ and $ \hat{\beta}_2$ are estimated coefficient values. The estimate for the proportion of the vote for Buchanan in reporting unit $ i$, based on the model, is

$$\hat{P}_{\text{Buchanan},i} = \frac{\exp(\mu_i)}{1 + \exp(\mu_i)} \,. \tag{2}$$
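As a minimal sketch of equations (1) and (2), the Python functions below compute the linear predictor and map it to a predicted Buchanan share through the inverse logit. The function names and the coefficient and vote-share values are hypothetical, chosen only for illustration; they are not estimates from this study.

```python
import math

def linear_predictor(b0, b1, b2, p_bush, p_nader):
    """Equation (1): mu_i = b0 + b1 * P_Bush,i + b2 * P_Nader,i."""
    return b0 + b1 * p_bush + b2 * p_nader

def inverse_logit(mu):
    """Equation (2): exp(mu) / (1 + exp(mu)), arranged to avoid overflow."""
    if mu >= 0:
        return 1.0 / (1.0 + math.exp(-mu))
    z = math.exp(mu)
    return z / (1.0 + z)

# Hypothetical coefficients and vote shares, for illustration only.
mu = linear_predictor(-5.0, 2.0, 10.0, p_bush=0.45, p_nader=0.03)
p_hat = inverse_logit(mu)  # predicted Buchanan share, about 0.022
```

The two branches of `inverse_logit` are algebraically identical to equation (2); splitting on the sign of `mu` only keeps `math.exp` from overflowing for large arguments.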

We are interested in the discrepancy between the actual number of votes for Buchanan in reporting unit $ i$ ($ N_{\text{Buchanan},i}$) and the predicted number of votes, denoted $ \hat{N}_{\text{Buchanan},i} = N_i \hat{P}_{\text{Buchanan},i}$. The simplest measure of that discrepancy is the simple residual, defined by

$$r_i = N_{\text{Buchanan},i} - \hat{N}_{\text{Buchanan},i} \,. \tag{3}$$
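Equation (3) can be written as a one-line helper, assuming the fitted share $ \hat{P}_{\text{Buchanan},i}$ is already available. The helper name and the numbers are hypothetical.

```python
def simple_residual(n_buchanan, n_total, p_hat):
    """Equation (3): r_i = N_Buchanan,i - N_i * P_hat_Buchanan,i."""
    n_hat = n_total * p_hat  # predicted Buchanan votes, N_hat_Buchanan,i
    return n_buchanan - n_hat

# Hypothetical unit: 1,000 votes counted, model predicts a 1% Buchanan share.
r = simple_residual(n_buchanan=25, n_total=1000, p_hat=0.01)  # 25 - 10 = 15 votes
```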

A value of $ r_i$ that is much larger for reporting unit $ i$ than for other reporting units would indicate that the excess of the actual vote for Buchanan over the expected vote is much larger in unit $ i$ than it is in other areas.

A problem with the simple residuals is that the size of residual we should expect depends on the level of support for Buchanan that the model predicts. As the expected proportion $ \hat{P}_{\text{Buchanan},i}$ increases from zero toward 0.5, the chance of observing a large residual increases. This may be a real problem when the main question is whether support for Buchanan in a particular reporting unit is excessively large. The residual for a reporting unit may be large relative to the residuals for other reporting units merely because the expected support for Buchanan is genuinely larger among the voters in that reporting unit. If one decides whether the Buchanan vote in an area is excessively large by using a test based on simple residuals, the test will be biased toward finding such excesses when they do not really exist.20

It is important to understand how this phenomenon occurs. One expects to see larger residuals when the baseline support for Buchanan is truly bigger because, as the baseline proportion of votes for Buchanan increases from zero up to 0.5, the variance of the actual proportion of votes around the baseline expected value increases. This means that for any particular "large" residual size one might specify (within the range zero to $ N_i/2$), the chance of seeing a residual at least that large increases as the baseline proportion increases. If $ \hat{P}_{\text{Buchanan},i}$ is the baseline expected value and one analyzes the vote for Buchanan while treating the total number of votes $ N_i$ as fixed (known as conditioning on the total), then the variance of $ N_{\text{Buchanan},i}$ is

$$\operatorname{var}(N_{\text{Buchanan},i}) = \hat{\sigma}^2 N_i \hat{P}_{\text{Buchanan},i} (1-\hat{P}_{\text{Buchanan},i}) \,. \tag{4}$$

So the variation of $ N_{\text{Buchanan},i}$ around the expected value $ \hat{N}_{\text{Buchanan},i}$ increases as $ \hat{P}_{\text{Buchanan},i}$ increases, as long as $ \hat{P}_{\text{Buchanan},i}$ is less than 0.5. This result follows from assuming that the number of votes for Buchanan in reporting unit $ i$, given $ N_i$, is a binomial random variable with probability $ \hat{P}_{\text{Buchanan},i}$, with overdispersion that is approximated in the GLM by the estimated value $ \hat{\sigma}^2$.
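A short sketch of equation (4), with hypothetical values for $ N_i$ and $ \hat{\sigma}^2$, makes the dependence on $ \hat{P}_{\text{Buchanan},i}$ concrete: holding everything else fixed, the variance rises as the predicted share approaches 0.5 and falls again beyond it.

```python
def buchanan_variance(n_total, p_hat, sigma2):
    """Equation (4): sigma2 * N_i * P_hat * (1 - P_hat)."""
    return sigma2 * n_total * p_hat * (1.0 - p_hat)

# Hypothetical N_i = 1000 and sigma2 = 2.0; only the predicted share changes.
variances = [buchanan_variance(1000, p, 2.0) for p in (0.01, 0.05, 0.2, 0.5)]
# -> roughly [19.8, 95.0, 320.0, 500.0]; P * (1 - P) peaks at P = 0.5.
```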

To make the discrepancies from different reporting units comparable to one another, it is necessary to remove the variation that stems from the heteroscedasticity (differing variances) among the observed votes for Buchanan. The way to do that is to divide each simple residual by the square root of its variance $ \operatorname{var}(N_{\text{Buchanan},i})$. This gives what is known as the studentized residual, $ s_i$:

$$s_i = \frac{r_i}{\sqrt{\operatorname{var}(N_{\text{Buchanan},i})}} = \frac{N_{\text{Buchanan},i} - \hat{N}_{\text{Buchanan},i}}{\left[\hat{\sigma}^2 N_i \hat{P}_{\text{Buchanan},i} (1-\hat{P}_{\text{Buchanan},i})\right]^{1/2}} \,. \tag{5}$$
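The sketch below computes equation (5) and contrasts two hypothetical reporting units that have the same simple residual, 20 votes, but different predicted Buchanan shares. The function name and all inputs are illustrative assumptions.

```python
import math

def studentized_residual(n_buchanan, n_total, p_hat, sigma2):
    """Equation (5): simple residual divided by its estimated standard deviation."""
    r = n_buchanan - n_total * p_hat                # equation (3)
    var = sigma2 * n_total * p_hat * (1.0 - p_hat)  # equation (4)
    return r / math.sqrt(var)

# Two hypothetical units, each 20 votes above its prediction, with sigma2 = 1.5.
s_low = studentized_residual(30, 1000, 0.01, 1.5)    # predicted 10 votes
s_high = studentized_residual(220, 1000, 0.20, 1.5)  # predicted 200 votes
```

Here `s_low` is roughly 5.2 and `s_high` roughly 1.3: the same 20-vote excess is far more surprising where the model expects few Buchanan votes, which is the distinction the simple residual cannot make.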

If the model we use to compute $ \hat{P}_{\text{Buchanan},i}$ correctly approximates the process that generates the vote for Buchanan in each and every reporting unit, then the chances of observing a studentized residual of any particular size are the same for all reporting units. There is no longer a built-in bias which makes the observed discrepancies tend to have larger magnitudes whenever the baseline expected support for Buchanan is larger. If the studentized residual is much larger for one reporting unit than it is for other reporting units, then we can have confidence that the votes for Buchanan in the unusual reporting unit were generated by a process substantially different from what went on in the other units.

For each state we estimate a separate set of parameter values $ \hat{\beta}_0$, $ \hat{\beta}_1$ and $ \hat{\beta}_2$ of equation (1) and a separate overdispersion value $ \hat{\sigma}^2$. Because each residual is scaled by its own estimated standard deviation, the studentized residuals are comparable across the reporting units within each state and also across states.
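This appendix does not say how $ \hat{\sigma}^2$ is computed. One common choice for a quasi-binomial GLM, assumed here rather than taken from the source, is the Pearson chi-square statistic divided by the residual degrees of freedom, evaluated separately within each state. The state labels, counts, and fitted shares below are hypothetical and do not come from an actual model fit.

```python
def pearson_dispersion(counts, totals, p_hats, n_params=3):
    """Assumed estimator of sigma^2: Pearson chi-square / (units - parameters)."""
    chi2 = sum(
        (y - n * p) ** 2 / (n * p * (1.0 - p))  # squared Pearson residual
        for y, n, p in zip(counts, totals, p_hats)
    )
    return chi2 / (len(counts) - n_params)

# Hypothetical fitted data grouped by state: (Buchanan votes, N_i, fitted P_hat).
by_state = {
    "State A": [(12, 1000, 0.010), (7, 900, 0.009), (30, 2500, 0.011),
                (4, 500, 0.008), (15, 1200, 0.012)],
    "State B": [(40, 2000, 0.020), (18, 1100, 0.018), (25, 1300, 0.021),
                (9, 600, 0.016)],
}

# One dispersion estimate per state, mirroring the separate per-state fits.
sigma2_by_state = {
    state: pearson_dispersion(*zip(*rows)) for state, rows in by_state.items()
}
```

The default `n_params=3` matches the three coefficients of equation (1).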

To implement a more powerful assessment of the discrepancy for each reporting unit, we use a jackknife method: the parameter values used to compute the residual for reporting unit $ i$ are estimated using the data from all the reporting units in the same state as $ i$ but omitting the data for $ i$. The histogram in Figure 1 shows the jackknife studentized residuals from counties in Florida. The histogram in Figure 2 pools such residuals from all 46 states for which the model of equation (1) could be estimated.
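The jackknife procedure can be sketched in plain Python. Each unit is dropped in turn, the model of equation (1) is refitted on the remaining units of the same state by iteratively reweighted least squares (IRLS), and the held-out unit's studentized residual is computed from equation (5). This is an illustrative implementation under stated assumptions, not the authors' code: the IRLS fitter, the Pearson estimator for $ \hat{\sigma}^2$, refitting $ \hat{\sigma}^2$ without unit $ i$, and all function names are assumptions.

```python
import math

def inv_logit(mu):
    # Equation (2), arranged to avoid overflow in math.exp.
    if mu >= 0:
        return 1.0 / (1.0 + math.exp(-mu))
    z = math.exp(mu)
    return z / (1.0 + z)

def solve3(a, b):
    """Solve the 3x3 system a x = b by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x

def fit_binomial_logit(units, iters=25):
    """IRLS fit of equation (1). Each unit is (n_buchanan, n_total, p_bush, p_nader).
    Returns (beta, sigma2), with sigma2 from the assumed Pearson estimator."""
    beta = [0.0, 0.0, 0.0]
    for _ in range(iters):
        xtwx = [[0.0] * 3 for _ in range(3)]
        xtwz = [0.0] * 3
        for y, n, pb, pn in units:
            x = (1.0, pb, pn)
            mu = sum(b * xi for b, xi in zip(beta, x))
            p = inv_logit(mu)
            w = n * p * (1.0 - p)     # binomial working weight
            z = mu + (y - n * p) / w  # working response
            for j in range(3):
                xtwz[j] += w * x[j] * z
                for k in range(3):
                    xtwx[j][k] += w * x[j] * x[k]
        beta = solve3(xtwx, xtwz)  # weighted least-squares update
    chi2 = 0.0
    for y, n, pb, pn in units:
        p = inv_logit(beta[0] + beta[1] * pb + beta[2] * pn)
        chi2 += (y - n * p) ** 2 / (n * p * (1.0 - p))
    return beta, chi2 / (len(units) - 3)

def jackknife_studentized(units):
    """Equation (5) for each unit i, using beta and sigma2 fitted without unit i."""
    out = []
    for i, (y, n, pb, pn) in enumerate(units):
        beta, sigma2 = fit_binomial_logit(units[:i] + units[i + 1:])
        p = inv_logit(beta[0] + beta[1] * pb + beta[2] * pn)
        out.append((y - n * p) / math.sqrt(sigma2 * n * p * (1.0 - p)))
    return out
```

Each call refits the model once per reporting unit, so a state with $ n$ units needs $ n$ fits. Leaving unit $ i$ out keeps an aberrant unit from pulling the fit toward itself and inflating $ \hat{\sigma}^2$, which is what makes the jackknife version more sensitive to a single outlying unit.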


Jasjeet Singh Sekhon 2000-11-28