I typically load the same set of libraries when I start a project, and this is my first chunk in an *.Rmd file: For a bare-bones, no-drama start, I may also use this one-liner:

# Author: owner

## Right-to-Left Shunt

The right-to-left shunt fraction is estimated using an arterial blood gas measurement after inspiring 100% oxygen for about 20 minutes.

    library(ggplot2)
    library(reshape2) # to melt
    library(plotly)

Main function

Calculates the \(\frac{Q_s}{Q_t}\) shunt fraction.

    rl.shunt <- function(po2, pco2, temp, altitude) {
      if (missing(temp)) {temp <- 37}
      if (missing(altitude)) {altitude <- 1300} # MCS altitude
      p.water <- 47 * exp((temp - 37) / 18.4)…

## Obtain the CI from a p-value

From Doug Altman’s paper “How to obtain the confidence interval from a P value”, BMJ 2011;343:d2090.

Steps to calculate the CI for a difference Est from its p-value:

1. Calculate the test statistic for a normal distribution test, z, from p: \(z = -0.862 + \sqrt{0.743 - 2.404 \times \ln(p)}\)
2. Calculate the standard error (ignoring minus signs): \(SE = Est / z\)
3. Calculate the 95% CI: …
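The steps above can be sketched in R. This is a minimal sketch rather than the post's own code: the function name `ci_from_p` and its arguments are hypothetical, and it takes z directly from the normal quantile function `qnorm()` instead of a closed-form approximation. It assumes a two-sided p-value and an estimate on a scale where a normal approximation is reasonable.

```r
# Hypothetical helper: recover an approximate CI from an estimate and its
# two-sided p-value, assuming a normally distributed test statistic.
ci_from_p <- function(est, p, conf = 0.95) {
  z    <- qnorm(1 - p / 2)            # test statistic implied by the p-value
  se   <- abs(est / z)                # standard error, ignoring the sign
  crit <- qnorm(1 - (1 - conf) / 2)   # about 1.96 for a 95% CI
  c(lower = est - crit * se, upper = est + crit * se)
}

ci_from_p(est = 10, p = 0.05)         # made-up inputs
```

A p-value of exactly 0.05 should give a 95% interval whose lower bound just touches zero, which makes a handy sanity check.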

## Toy datasets with random NAs

Simulated (toy) datasets are very helpful for testing data analysis tools and various other functions or transformations. For example, inserting random blanks (NAs) may allow testing imputation procedures. I have created a function to quickly insert NAs into a vector, which can be used across rows, columns, or on the whole data frame with one…
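A minimal sketch of the idea, not the function from the post: `insert_nas` and its `prop` argument are hypothetical names, and the toy data frame is made up. The helper blanks out a fixed proportion of a vector, and `lapply()` applies it column by column.

```r
# Hypothetical helper: set a given proportion of a vector's elements
# to NA at random positions.
insert_nas <- function(x, prop = 0.1) {
  n_na <- round(length(x) * prop)          # how many values to blank out
  x[sample.int(length(x), n_na)] <- NA     # positions drawn without replacement
  x
}

set.seed(42)                               # arbitrary seed, for reproducibility
toy <- data.frame(a = rnorm(20), b = runif(20))

toy_cols <- toy
toy_cols[] <- lapply(toy, insert_nas, prop = 0.2)   # column by column
colSums(is.na(toy_cols))                   # number of NAs per column
```

Because the number of NAs is fixed by `prop`, each column ends up with the same count of missing values; drawing each cell independently with `rbinom()` would be an alternative if a random count is preferred.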

## Power calculations

Underpowered studies are a big (but far from the only) source of the current replication crisis in the medical literature. Power calculations hinge on the expected effect size (often expressed as Cohen’s d), the populations’ spread around the mean (standard deviation) and arbitrary frequentist assumptions about alpha and beta. Cohen’s d is conceptually similar to…
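As an illustration of how these inputs interact, base R's `power.t.test()` (in the `stats` package) solves for whichever quantity is left out. The numbers below are made up: with `sd = 1`, `delta` is the effect size in standard deviation units, so `delta = 0.5` corresponds to a Cohen's d of 0.5.

```r
# Sample size per group for a two-sample t-test,
# given effect size, alpha, and the desired power.
pw <- power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80)
n_per_group <- ceiling(pw$n)    # round up to whole participants
n_per_group

# Conversely: the power achieved with only 20 per group for the same effect
small <- power.t.test(n = 20, delta = 0.5, sd = 1, sig.level = 0.05)
small$power
```

Leaving out `n` returns the sample size; leaving out `power` returns the power, which is a quick way to see how underpowered a small study would be.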

## The Reproducibility Crisis in Medicine

In 2005, Stanford epidemiologist John Ioannidis published the provocatively titled paper “Why most published research findings are false” (Ioannidis, PLoS Med 2005, 2:e124), which has since become a foundational piece of metascience. Among other things, he stated: The smaller the studies conducted in a scientific field, the less likely the research findings are to…
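The link between study size and credibility can be illustrated with the positive predictive value (PPV) from the same paper, \(PPV = \frac{(1-\beta)R}{R - \beta R + \alpha}\), where R is the pre-study odds that a probed relationship is true. A minimal sketch in R, ignoring bias and multiple competing teams; the function name `ppv` and the input values are illustrative assumptions:

```r
# PPV of a "significant" finding as a function of power (1 - beta),
# pre-study odds R, and the type I error rate alpha.
ppv <- function(power, R, alpha = 0.05) {
  (power * R) / (power * R + alpha)
}

# Same pre-study odds (1 true relationship per 4 false), different power
out <- ppv(power = c(0.2, 0.5, 0.8), R = 0.25)
round(out, 2)
```

Smaller studies have lower power, so for the same pre-study odds a significant result from a small study is less likely to be true.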

## Does this claim pass the smell test?

It is hard to quickly evaluate data in an everyday situation, but a nifty shortcut I saw on the R-bloggers aggregator can help. In the simplest example, suppose you toss a coin 50 times with 32 heads and 18 tails. What are the chances the coin is fair? The handy shortcut helps to quickly evaluate…
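For comparison, the coin example can be checked exactly. The sketch below does not reproduce the shortcut itself; it uses base R's `binom.test()` as a reference answer against which any rule of thumb can be judged.

```r
# Exact two-sided binomial test: 32 heads in 50 tosses of a supposedly fair coin
fit <- binom.test(x = 32, n = 50, p = 0.5)

fit$p.value     # probability of a split at least this lopsided if the coin is fair
fit$conf.int    # 95% CI for the true probability of heads
```

Note that the p-value is the chance of seeing data this extreme from a fair coin, not the probability that the coin is fair.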

## Bootstrapping

Bootstrapping (or ‘the bootstrap’) is a statistical technique of drawing repeated samples (resampling) with replacement from an available sample; this ultimately allows one to draw inferences about a population from the available sample. The number of resamples is usually large (say, 10,000), although with a representative sample, 50 resamples will get you there. This is…
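A minimal sketch of the resampling idea in base R, using a made-up skewed sample; the variable names and the choice of the mean as the statistic are assumptions for illustration only.

```r
set.seed(1)                                  # arbitrary seed, for reproducibility
x <- rexp(40, rate = 1 / 10)                 # made-up skewed sample, n = 40

n_boot <- 10000                              # number of resamples
# Each resample has the same size as x and is drawn with replacement
boot_means <- replicate(n_boot, mean(sample(x, replace = TRUE)))

sd(boot_means)                               # bootstrap standard error of the mean
ci <- quantile(boot_means, c(0.025, 0.975))  # percentile 95% CI
ci
```

The percentile interval is the simplest bootstrap CI; the `boot` package offers other interval types (for example BCa) if the simple version proves too crude.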

## Strings and regex (regular expressions)

See Chapter 11 in The Art of R Programming, Chapter 7 in The R Cookbook, and Section 2.12 in The R Book (in particular, sections 2.12.5 – 2.12.13). There is a good discussion of regular expressions in sections 7.4-7.8 of Data Manipulation with R. Also see this example on how to melt the dataset.

Set-up

Split it!

strsplit splits…
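A short sketch of `strsplit()` on made-up strings (the data here are illustrative, not the post's set-up). It always returns a list with one character vector per input string, and the split pattern is treated as a regular expression unless `fixed = TRUE`.

```r
x <- c("id_001_pre", "id_002_post")          # made-up strings

parts <- strsplit(x, "_", fixed = TRUE)      # literal split on the underscore
parts[[1]]                                   # pieces of the first string

third <- sapply(parts, `[`, 3)               # third piece of each string
third

# With a regular expression: split on any run of digits
pieces <- strsplit("abc123def45g", "[0-9]+")[[1]]
pieces
```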

## apply – a most useful family of functions

This is a very important family of functions in a vector-type language like R. Its members are apply, lapply, sapply, mapply, and tapply. I use lapply and sapply most often, although for a matrix, apply is more suitable. These functions work well for complicated iterative calculations and are MUCH faster than loops, which appear in comparison,…
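A quick tour of the family members named above, on toy inputs made up for illustration:

```r
m <- matrix(1:6, nrow = 2)                   # 2 x 3 toy matrix

row_sums <- apply(m, 1, sum)                 # MARGIN = 1: over rows
col_sums <- apply(m, 2, sum)                 # MARGIN = 2: over columns

lst <- list(a = 1:3, b = 4:6)
as_list <- lapply(lst, mean)                 # always returns a list
as_vec  <- sapply(lst, mean)                 # simplifies to a named numeric vector

# mapply walks several vectors in parallel
pair_sums <- mapply(function(x, y) x + y, 1:3, 4:6)

# tapply applies a function within groups
grp_means <- tapply(c(10, 20, 30, 40), c("g1", "g2", "g1", "g2"), mean)
```

The main practical difference between `lapply` and `sapply` is the return type: `lapply` is predictable (always a list), while `sapply` tries to simplify, which is convenient interactively but can surprise inside functions.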