The Zen of Data
Apr 25, 2021
Inspired by The Zen of Python
- More data is better than less data.
- Correct data is better than incorrect data.
- Ethically gathered data is better than unethically gathered data.
- Explicit assumptions are better than implicit ones.
- Explaining predictions is desirable whenever possible.
- Correlation does not imply causation. Really.
- A randomized controlled experiment (RCE) is the gold standard for establishing causality.
- With some observational datasets, causal inference *can* still be achieved without an RCE.
- All else being equal, the simpler model is better.
- A good model using bad data is a bad model.
- False positives and false negatives do not always carry the same weight.
- Visualizations should not sacrifice information content and readability for cleverness or novelty.
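
To make the point about false positives and false negatives concrete, here is a minimal Python sketch of how unequal error costs can move a classifier's decision threshold. Everything in it is an assumption made for illustration: the toy labels and scores are invented, the helper names (`confusion_counts`, `total_cost`, `best_threshold`) are hypothetical, and the cost values of 1 and 10 are arbitrary.

```python
def confusion_counts(y_true, y_score, threshold):
    """Return (false_positives, false_negatives) when scores >= threshold are predicted positive."""
    fp = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s < threshold)
    return fp, fn


def total_cost(y_true, y_score, threshold, fp_cost, fn_cost):
    """Weight each kind of error by its (assumed) cost and add them up."""
    fp, fn = confusion_counts(y_true, y_score, threshold)
    return fp * fp_cost + fn * fn_cost


def best_threshold(y_true, y_score, candidates, fp_cost, fn_cost):
    """Pick the candidate threshold with the lowest total cost."""
    return min(
        candidates,
        key=lambda t: total_cost(y_true, y_score, t, fp_cost, fn_cost),
    )


# Toy data, invented for illustration only (1 = positive class).
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
y_score = [0.05, 0.10, 0.20, 0.30, 0.40, 0.45, 0.60, 0.70, 0.80, 0.90]
candidates = [0.25, 0.5, 0.75]

# Equal costs: one missed positive at 0.5 is the cheapest option.
equal = best_threshold(y_true, y_score, candidates, fp_cost=1, fn_cost=1)
print(equal)  # → 0.5

# A false negative is assumed to be ten times worse than a false positive,
# so it becomes worth accepting two false alarms to miss nothing.
fn_heavy = best_threshold(y_true, y_score, candidates, fp_cost=1, fn_cost=10)
print(fn_heavy)  # → 0.25
```

The model and its scores are identical in both runs; only the assumed costs change, and that alone is enough to change which threshold is best.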