Why we standardise using training statistics when doing Machine Learning

This post explains why new data passed to a Machine Learning model should be preprocessed using statistics calculated from the training data.

February 4, 2022 · 6 min · Ed

Approximate Nearest Neighbours in R and Spark

...

April 14, 2019 · 14 min · Ed

Feature selection by cross-validation with sparklyr

...

December 12, 2018 · 8 min · Ed

Cross-validation with sparklyr 2: Electric Boogaloo

...

November 23, 2018 · 15 min · Ed

Machine learning and k-fold cross validation with sparklyr

...

November 2, 2017 · 12 min · Ed