apply

Toy datasets with random NAs

Posted on May 3, 2021 (updated June 17, 2021) by LV

Simulated (toy) datasets are very helpful for testing data analysis tools and other functions or transformations. For example, inserting random blanks (NAs) makes it possible to test imputation procedures. I have created a function that quickly inserts NAs into a vector and can be used across rows, columns, or on the whole data frame with one…
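To make the idea concrete, here is a minimal sketch of what such a helper could look like. The name `insert_nas` and its `prop` argument are hypothetical and are not taken from the post; the sketch simply blanks out a random fraction of a vector and then applies that column by column with `lapply`.

```r
# Hypothetical helper (not the author's function): replace a random
# fraction `prop` of the entries of `x` with NA.
insert_nas <- function(x, prop = 0.2) {
  n_na <- round(prop * length(x))        # how many entries to blank out
  idx  <- sample.int(length(x), n_na)    # random positions, no repeats
  x[idx] <- NA
  x
}

set.seed(1)
toy <- data.frame(a = rnorm(10), b = runif(10))

# Column-wise: each column receives its own random set of NAs
toy_na <- as.data.frame(lapply(toy, insert_nas, prop = 0.3))
na_per_column <- colSums(is.na(toy_na))
```

Because the positions are drawn independently for each column, the NAs do not line up across rows, which is usually what you want when testing imputation.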

Strings and regex (regular expressions)

Posted on April 14, 2020 (updated May 3, 2021) by LV

See Chapter 11 in The Art of R Programming, Chapter 7 in The R Cookbook, and Section 2.12 in The R Book (in particular, Sections 2.12.5-2.12.13). Sections 7.4-7.8 of Data Manipulation with R give a good discussion of regular expressions. Also see this example on how to melt the dataset. Set-up Split it! strsplit splits…
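As a quick illustration of the kind of thing these references cover, the sketch below uses only base R functions (`strsplit`, `grepl`, `sub`). The example strings are made up for illustration and do not come from the post.

```r
# Split on a comma followed by optional whitespace (the pattern is a regex)
s <- "alpha, beta,gamma"
parts <- strsplit(s, ",\\s*")[[1]]   # strsplit returns a list; take element 1

# Test which strings start with a capital letter
starts_upper <- grepl("^[A-Z]", c("Apple", "banana"))

# Extract the digits after the hyphen using a capture group
id <- sub("^(\\w+)-(\\d+)$", "\\2", "item-42")
```

Note the doubled backslashes: R string literals need `\\` to pass a single backslash to the regex engine.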

apply – a most useful family of functions

Posted on April 13, 2020 (updated May 3, 2021) by LV

This is a very important family of functions in a vector-oriented language like R. Its members are apply, lapply, sapply, mapply, and tapply. I use lapply and sapply most often, although for a matrix, apply is more suitable. These functions work well for complicated iterative calculations and are MUCH faster than loops, which appear in comparison,…
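A short sketch of how the five family members differ, using small made-up inputs (the object names are illustrative only):

```r
m <- matrix(1:6, nrow = 2)               # 2 x 3 matrix, filled column-wise

row_sums <- apply(m, 1, sum)             # MARGIN = 1: apply over rows
col_sums <- apply(m, 2, sum)             # MARGIN = 2: apply over columns

lst <- list(a = 1:3, b = 4:6)
means_list <- lapply(lst, mean)          # always returns a list
means_vec  <- sapply(lst, mean)          # simplifies to a named vector

# mapply walks several inputs in parallel
pair_sums <- mapply(function(x, y) x + y, 1:3, 4:6)

# tapply applies a function within groups defined by a factor-like vector
group_sums <- tapply(c(10, 20, 30, 40), c("g1", "g2", "g1", "g2"), sum)
```

The main practical difference between `lapply` and `sapply` is the return type: `lapply` is predictable (always a list), while `sapply` tries to simplify the result, which is convenient interactively.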

MLE – The Maximum Likelihood Estimate

Posted on April 11, 2020 (updated May 3, 2021) by LV

Maximum likelihood estimation (MLE) is a general method for finding the function (more precisely, the parameter values of a chosen model) under which the available data are most likely; it therefore addresses a central problem in data science. Depending on the model, the math behind MLE can be very complicated, but an intuitive way to think about it is through the following thought experiment…
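For a numerical feel of the method, here is a minimal sketch that fits a normal distribution to simulated data by minimizing the negative log-likelihood with base R's `optim`. The data, the starting values, and the log-scale parameterization of the standard deviation are all assumptions of this sketch, not something taken from the post.

```r
set.seed(42)
x <- rnorm(200, mean = 5, sd = 2)   # simulated data (assumed for illustration)

# Negative log-likelihood of a normal model.
# par[1] is the mean; par[2] is log(sd), so sd = exp(par[2]) stays positive.
nll <- function(par, data) {
  -sum(dnorm(data, mean = par[1], sd = exp(par[2]), log = TRUE))
}

# Minimise the negative log-likelihood (default Nelder-Mead method)
fit <- optim(par = c(median(x), 0), fn = nll, data = x)
mle <- c(mean = fit$par[1], sd = exp(fit$par[2]))

# For the normal model the MLE is known in closed form, which gives a check
closed_form <- c(mean = mean(x), sd = sqrt(mean((x - mean(x))^2)))
```

Comparing `mle` with `closed_form` shows that the numerical optimum agrees with the analytic answer; for models without a closed form, the same `optim` pattern still applies.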