The Zen of Data

Evan Zamir
Apr 25, 2021

Inspired by The Zen of Python

  • More data is better than less data.
  • Correct data is better than incorrect data.
  • Ethically gathered data is better than unethically gathered data.
  • Explicit assumptions are better than implicit ones.
  • Explaining predictions is desirable whenever possible.
  • Correlation does not imply causation. Really.
  • A randomized controlled experiment (RCE) is the gold standard for establishing causality.
  • In some observational datasets, causal inference *can* still be achieved without an RCE.
  • All else being equal, the simpler model is better.
  • A good model using bad data is a bad model.
  • False positives and false negatives do not always have the same weight.
  • Visualizations should not sacrifice information content and readability for cleverness or novelty.
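
To make the points about correlation and randomized experiments concrete, here is a minimal sketch using simulated data. Everything in it is an assumption made for illustration: the hidden confounder `z`, the noise levels, the sample size, and the helper `pearson` are all hypothetical. In the observational setting, `z` drives both `x` and `y`, so they are correlated even though `x` has no effect on `y`. When `x` is assigned by coin flip instead, that correlation should largely disappear.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 5000

# Hypothetical hidden confounder that influences the outcome.
z = [rng.gauss(0, 1) for _ in range(n)]

# Observational setting: z drives both x and y; x has NO causal effect on y.
x_observed = [zi + rng.gauss(0, 1) for zi in z]
y_observed = [zi + rng.gauss(0, 1) for zi in z]

# Randomized setting: x is assigned by coin flip, independent of z.
x_randomized = [rng.choice([0, 1]) for _ in range(n)]
y_randomized = [zi + rng.gauss(0, 1) for zi in z]

corr_observed = pearson(x_observed, y_observed)
corr_randomized = pearson(x_randomized, y_randomized)

print(f"observational correlation: {corr_observed:.2f}")
print(f"randomized correlation:    {corr_randomized:.2f}")
```

The only difference between the two settings is how `x` is assigned, which is why randomization is what separates a causal effect from a shared cause.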

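The point about false positives and false negatives can be sketched the same way. The toy labels, scores, candidate thresholds, and cost values below are all made up for illustration, and the function names are hypothetical. The idea is simply that the decision threshold you would pick depends on how much each kind of error costs.

```python
def error_counts(y_true, scores, threshold):
    """Return (false_positives, false_negatives) at a decision threshold."""
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(y_true, scores) if s < threshold and y == 1)
    return fp, fn


def total_cost(y_true, scores, threshold, fp_cost, fn_cost):
    """Weighted cost of the errors made at a decision threshold."""
    fp, fn = error_counts(y_true, scores, threshold)
    return fp * fp_cost + fn * fn_cost


def best_threshold(y_true, scores, fp_cost, fn_cost, candidates):
    """Pick the candidate threshold with the lowest weighted cost."""
    return min(
        candidates,
        key=lambda t: total_cost(y_true, scores, t, fp_cost, fn_cost),
    )


# Toy labels (1 = positive) and model scores, invented for this sketch.
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.4, 0.45, 0.6, 0.7, 0.8, 0.9]
candidates = [0.3, 0.5, 0.65]

# Equal costs: one missed positive is cheaper than two false alarms.
equal_weight = best_threshold(y_true, scores, fp_cost=1, fn_cost=1, candidates=candidates)

# A missed positive costs 10x a false alarm: lower the threshold.
fn_heavy = best_threshold(y_true, scores, fp_cost=1, fn_cost=10, candidates=candidates)

print(equal_weight)  # 0.65
print(fn_heavy)      # 0.3
```

With equal costs the highest threshold wins, but once a false negative is ten times as expensive, the lowest threshold becomes the cheaper choice even though it produces more false positives.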