Does this claim pass the smell test?

It is hard to quickly evaluate data in an everyday situation, but a nifty shortcut I saw on the R-bloggers aggregator can help.

In the simplest example, suppose you toss a coin 50 times and get 32 heads and 18 tails. Is the coin fair? The handy shortcut helps to quickly evaluate simple problems like this in the field and, more importantly, offers an easy way to remember how to calculate the \(\chi^{2}\) statistic:

  1. Square the difference between the observed value of one group and the expected value of that group: \((o-e)^2\).
  2. Divide by the expected value of that group:
    \[\frac{(o-e)^2}{e}\]
  3. Repeat for the other group: square the difference between its observed and expected values, then divide by its expected value.
  4. Add both values.

If the result is greater than 3.84, the probability of a result at least this extreme arising just by chance is below 5%, because 3.84 is the 95th percentile (the 0.95 quantile) of the \(\chi^2\) distribution with one degree of freedom. The statistic has to exceed this threshold because a larger statistic means a more extreme deviation from the expected values, and only 5% of the distribution lies beyond 3.84.
Apologies are due for the frequentist paradigm; I will need to recast this discussion into a Bayesian framework later.
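The 3.84 threshold does not have to be memorized from a table; it can be recovered in R. A minimal sketch, in which the variable names `threshold` and `tail_prob` are mine and not part of the original shortcut:

```r
# Critical value: the 0.95 quantile of the chi-squared distribution with 1 d.f.
threshold <- qchisq(0.95, df = 1)
threshold

# Probability mass beyond the threshold (5% by construction)
tail_prob <- pchisq(threshold, df = 1, lower.tail = FALSE)
tail_prob
```

Changing the first argument of `qchisq` gives the threshold for a different significance level.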

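n <- 50     # total trials
ev <- n / 2 # expected count of each outcome for a fair coin
k <- 32     # observed heads
i <- n - k  # observed tails
(k - ev)^2 / ev + (i - ev)^2 / ev  # the chi-squared statistic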
## [1] 3.92

Because the value (i.e. the statistic) is greater than 3.84, we conclude that the coin is probably not fair. In other words, there is less than a 5% chance that a fair coin would produce this result (or a more extreme one) by chance alone.
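The four steps can also be wrapped in a small function, which makes it easy to reuse the shortcut and to turn the statistic into a p-value. This is only a sketch: `chisq_shortcut` is a hypothetical helper name of my own, and the p-value step assumes one degree of freedom, as in the coin example.

```r
# Hypothetical helper (not from the original shortcut):
# sums (o - e)^2 / e over all groups
chisq_shortcut <- function(observed, expected) {
  stopifnot(length(observed) == length(expected), all(expected > 0))
  sum((observed - expected)^2 / expected)
}

observed <- c(heads = 32, tails = 18)
expected <- rep(sum(observed) / 2, 2)  # a fair coin expects 25 of each

stat <- chisq_shortcut(observed, expected)
stat  # 3.92, as computed above

# Upper-tail probability of the statistic under 1 degree of freedom
p_value <- pchisq(stat, df = 1, lower.tail = FALSE)
p_value
```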

Let’s now consider another, still simple, question. Do men and women differ in their preference for cats or dogs? Let’s assume we poll 1,000 pet owners and get the following results:

| | Cat | Dog | Total |
|---|---|---|---|
| Men | 215 | 292 | 507 |
| Women | 241 | 252 | 493 |
| Total | 456 | 544 | 1000 |

Let’s calculate the expected values using a simple proportion $$\frac{\text{row total} \times \text{column total}}{\text{grand total}}$$ for each cell:

| | Cat | Dog | Total |
|---|---|---|---|
| Men | 231.2 | 275.8 | 507 |
| Women | 224.8 | 268.2 | 493 |
| Total | 456 | 544 | 1000 |
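The expected counts can be reproduced in R from the row and column totals alone. A minimal sketch using `outer`, which is my choice here and only one of several ways to do it; the rounded values should match the table above:

```r
# Observed counts from the poll
pets <- matrix(c(215, 241, 292, 252), nrow = 2,
               dimnames = list(c("Men", "Women"), c("Cat", "Dog")))

# Expected count in each cell: row total * column total / grand total
expected <- outer(rowSums(pets), colSums(pets)) / sum(pets)
round(expected, 1)
```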

Then square the deviations (observed minus expected) and divide them by the expected values:

| | Cat | Dog |
|---|---|---|
| Men | 1.134 | 0.951 |
| Women | 1.166 | 0.978 |

Finally, add up the four cells. Because the sum is 4.228 (greater than 3.84), we conclude that men and women in our sample differ in their preferences for a pet.
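The same hand calculation, from expected counts to the final sum, can be sketched in a few lines of R. The names `contrib` and `stat` are mine, chosen for illustration:

```r
pets <- matrix(c(215, 241, 292, 252), nrow = 2,
               dimnames = list(c("Men", "Women"), c("Cat", "Dog")))
expected <- outer(rowSums(pets), colSums(pets)) / sum(pets)

# Per-cell contribution: squared deviation divided by the expected count
contrib <- (pets - expected)^2 / expected
round(contrib, 3)

# The statistic is the sum over all four cells
stat <- sum(contrib)
stat

# Compare against the 5% critical value for one degree of freedom
stat > qchisq(0.95, df = 1)
```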

Let’s redo this example using R’s built-in chisq.test function:

pets <- matrix(data = c(215, 241, 292, 252), nrow = 2, ncol = 2,
               dimnames = list(c("Men", "Women"), c("Cat", "Dog")))
pets
##       Cat Dog
## Men   215 292
## Women 241 252
chisq.test(pets, correct = FALSE) # without the continuity correction, to match the hand calculation
## 
##  Pearson's Chi-squared test
## 
## data:  pets
## X-squared = 4.2285, df = 1, p-value = 0.03975
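The object returned by chisq.test also carries the intermediate quantities, so the hand calculation can be checked piece by piece. A short sketch; the variable name `res` is mine:

```r
pets <- matrix(c(215, 241, 292, 252), nrow = 2,
               dimnames = list(c("Men", "Women"), c("Cat", "Dog")))
res <- chisq.test(pets, correct = FALSE)

res$expected   # expected counts under independence
res$statistic  # the X-squared value
res$parameter  # degrees of freedom
res$p.value    # p-value of the test
```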

We can also evaluate the original (fair coin?) question using the chisq.test function:

chisq.test(c(k,i))
## 
##  Chi-squared test for given probabilities
## 
## data:  c(k, i)
## X-squared = 3.92, df = 1, p-value = 0.04771

We conclude that the coin is probably unfair, because there is only a 4.8% probability that a fair coin would produce this (or a more extreme) result.

This shortcut works for simple problems with one degree of freedom. By the way, the degrees of freedom of a contingency table are calculated as \[ d.f. = (n_{columns}-1) \times (n_{rows}-1)\] so a 2x2 table has one. For more than one degree of freedom, one could look up the critical value in a \(\chi^2\) table, or run the R function qchisq for the appropriate d.f.:

qchisq(.95, df=1:7)
## [1]  3.841459  5.991465  7.814728  9.487729 11.070498 12.591587 14.067140
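The degrees-of-freedom formula and the qchisq lookup can be combined, so the threshold follows directly from the shape of the table. A sketch only: `chisq_critical` is a hypothetical helper of my own, and the zero-filled matrices merely stand in for tables of the given dimensions.

```r
# Hypothetical helper: critical value for an r x c contingency table
chisq_critical <- function(tab, alpha = 0.05) {
  df <- (nrow(tab) - 1) * (ncol(tab) - 1)
  c(df = df, critical = qchisq(1 - alpha, df = df))
}

chisq_critical(matrix(0, nrow = 2, ncol = 2))  # df = 1
chisq_critical(matrix(0, nrow = 3, ncol = 4))  # df = 6
```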

The first example, the (un)fair coin, is a case of applying the \(\chi^2\) test for goodness-of-fit, while the cats vs. dogs example is a case of the \(\chi^2\) test for independence.

In summary, the \(\chi^2\) statistic measures how far the observed counts deviate from the expected counts, and the \(\chi^2\) test tells us how probable a deviation at least that large would be by chance alone.

Notes:
1. The \(\chi^2\) distribution with m degrees of freedom has mean m and variance 2m.
2. For small samples (the expected value in any cell is 5 or less; I think a much higher threshold is needed), an exact test should be used.
3. The test for the cats-vs.-dogs question has a few options we can set (and get very similar results):

chisq.test(pets) # with continuity correction for 2x2 tables
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  pets
## X-squared = 3.9713, df = 1, p-value = 0.04628
chisq.test(pets, simulate.p.value = TRUE, B = 10000) # Monte Carlo simulation of the p-value (varies slightly between runs)
## 
##  Pearson's Chi-squared test with simulated p-value (based on 10000
##  replicates)
## 
## data:  pets
## X-squared = 4.2285, df = NA, p-value = 0.0428
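For the exact tests mentioned in note 2, base R offers fisher.test for 2x2 tables and binom.test for a single proportion. The notes do not name a particular exact test, so the choice of these two functions is my assumption. A minimal sketch, with no claim that the exact p-values agree with the approximations above:

```r
pets <- matrix(c(215, 241, 292, 252), nrow = 2,
               dimnames = list(c("Men", "Women"), c("Cat", "Dog")))

# Fisher's exact test for the 2x2 cats-vs.-dogs table
fisher_res <- fisher.test(pets)
fisher_res$p.value

# Exact binomial test for the coin: 32 heads in 50 tosses, fair coin as the null
binom_res <- binom.test(32, 50, p = 0.5)
binom_res$p.value
```

Because the \(\chi^2\) test is an approximation, exact p-values can land on the other side of the 5% line when the statistic sits close to the threshold, so it is worth running both in borderline cases.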
