It is hard to quickly evaluate data in an everyday situation, but a nifty shortcut I saw on the R-bloggers aggregator can help. In the simplest example, suppose you toss a coin 50 times with 32 heads and 18 tails. What are the chances the coin is fair? The handy shortcut helps to quickly evaluate…

## Bootstraping

Bootstrapping (or ‘the bootstrap’) is a statistical technique of drawing repeated samples (resampling) with replacement from an available sample; this ultimately allows one to draw inferences about a population from the available sample. The number of resamples is usually large (say, 10,000), although with a representative sample, 50 resamples will get you there. This is…

## Strings and regex (regular expressions)

See Chapter 11 in The Art of R Programming, Chapter 7 in The R Cookbook, Section 2.12 in The R Book (in particular, sections 2.12.5 – 2.12.13). Good discussion of regular expression in sections 7.4-7.8 of Data Manipulation with R. Also see this example on how to melt the dataset. Set-up Split it! strsplit splits…

## apply – a most useful family of functions

This is a very important function (family) in a vector-type language like R. Its members are apply, lapply, sapply, mapply, and tapply. I use lapply and sapply most often, although for a matrix, apply is more suitable. These functions work well for complicated iterative calculations and are MUCH faster than loops, that appear in comparison,…

## The hypergeometric distribution

The hypergeometric distribution is used to solve the classic “balls in an urn” proble. Suppose one has 7 red balls and 3 white ball in an urn, and draws 2 balls. What is the probability that both balls are white? Let’s make a function to allow quick replication Other combinations

## MLE – The Maximum Likelihood Estimate

The maximum likelihood estimation (MLE) is a general method to find the function that most likely fits the available data; it therefore addresses a central problem in data sciences. Depending on the model, the math behind MLE can be very complicated, but an intuitive way to think about it is through the following thought experiment….

## The binomial distribution

The binomial distribution is used when there are n (a fixed number) independent trials with two possible outcomes (“success” and “failure”) with a probability that is constant. With 10 tosses of a fair coin, what is the probability of getting 7 heads? \[Prob = dbinom(7,10,0.5) = 0.1171875\] And what is the probability of getting exactly…

## Fisher’s exact test

Fisher’s exact test is used to compare counts and proportions between groups when small samples of nominal variables are available. It assumes that the individual observations are independent, and that the row and column totals are fixed, or “conditioned.” An example would be putting 12 female hermit crabs and 9 male hermit crabs in an…

## The Poisson distribution

A sports team scores 84 points in 21 games, so the average score is 4 points per game. What is the probability that it scores 1 point per game? Or 6 points, or more than than 4? The answer is in the Poisson distribution, where variance = mean, denoted as $\lambda$. The answer to the…

## The beta distribution

Definition Using the mean Using ranges for the prior Informative vs. uninformative priors Mean, median, variance Highest Density Interval References Addenda Definition The Beta distribution represents a probability distribution of probabilities. It is the conjugate prior of the binomial distribution (if the prior and the posterior distribution are in the same family, the prior and posterior…