Why we standardise using training statistics when doing Machine Learning

This post explains why new data passed to a Machine Learning model should be preprocessed using statistics calculated from the training data, not from the new data itself.

February 4, 2022 · 6 min · Ed

Introducing group sequential designs for early stopping of A/B tests

...

May 1, 2020 · 13 min · Ed

Simulating A/B tests with data.table

...

February 28, 2020 · 10 min · Ed

Adjusting for covariates and baseline differences in A/B testing

...

July 12, 2019 · 11 min · Ed

Approximate Nearest Neighbours in R and Spark

...

April 14, 2019 · 14 min · Ed

Feature selection by cross-validation with sparklyr

...

December 12, 2018 · 8 min · Ed

Cross-validation with sparklyr 2: Electric Boogaloo

...

November 23, 2018 · 15 min · Ed

SparkR vs sparklyr for interacting with Spark from R

...

December 5, 2017 · 14 min · Ed

Machine learning and k-fold cross validation with sparklyr

...

November 2, 2017 · 12 min · Ed

Writing your thesis with bookdown

...

September 25, 2017 · 15 min · Ed