Machine Learning Rules from 2012 (that STILL matter)
A Few Useful Things to Know About Machine Learning, by Pedro Domingos
This 2012 paper by Pedro Domingos captures essential "folk knowledge" about machine learning: the kind of practical wisdom you typically only learn through experience.
As a data engineer working on AI research, I found several insights particularly valuable.
The main point: Machine learning is about getting models to generalize well to unseen data.
This requires balancing model complexity, data dimensionality, and practical constraints like compute and human time.
Key concepts I learned:
Learning = Representation + Evaluation + Optimization
Choose how to represent your solution (neural net, decision tree, etc.)
Define how to evaluate its performance
Determine how to find the best model in your solution space
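To make that decomposition concrete, here is a toy sketch (my own example, not from the paper): the representation is a one-feature threshold rule, the evaluation is plain accuracy, and the optimization is a brute-force search over candidate thresholds.

```python
# Toy illustration of Learning = Representation + Evaluation + Optimization.

data = [(0.1, 0), (0.4, 0), (0.5, 1), (0.9, 1), (0.7, 1), (0.2, 0)]

def predict(t, x):
    # Representation: a single-threshold rule, predict 1 if x >= t.
    return 1 if x >= t else 0

def accuracy(t, data):
    # Evaluation: fraction of examples the rule gets right.
    return sum(predict(t, x) == y for x, y in data) / len(data)

def fit(data):
    # Optimization: brute-force search over observed values as thresholds.
    candidates = sorted(x for x, _ in data)
    return max(candidates, key=lambda t: accuracy(t, data))

best_t = fit(data)
print(best_t, accuracy(best_t, data))
```

Swapping any one piece (a tree for the threshold rule, log-loss for accuracy, gradient descent for brute force) gives a different learner, which is exactly the point of the decomposition.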
Feature Engineering vs Data Volume
Feature engineering takes time but can be impactful if done right
Sometimes more data beats clever algorithms
But watch out for the curse of dimensionality: covering the input space requires exponentially more data as you add features
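One way to see the curse of dimensionality is "distance concentration": in high dimensions, a random point's nearest and farthest neighbors end up almost equally far away, which undermines similarity-based reasoning. A quick sketch of this effect (my own illustration, using plain Python):

```python
import math
import random

random.seed(0)

def distance_spread(dim, n_points=200):
    """(max - min) / min over distances from one reference point to random
    points in the unit cube; this spread shrinks as dimensionality grows."""
    pts = [[random.random() for _ in range(dim)] for _ in range(n_points)]
    ref = [random.random() for _ in range(dim)]
    dists = [math.dist(ref, p) for p in pts]
    return (max(dists) - min(dists)) / min(dists)

low_d = distance_spread(2)      # 2 dimensions: distances vary a lot
high_d = distance_spread(1000)  # 1000 dimensions: distances concentrate
print(low_d, high_d)
```

The spread in 2 dimensions comes out far larger than in 1000, even with the same number of points.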
The Generalization Challenge
Models must balance between underfitting (too simple) and overfitting (too complex)
Techniques like cross-validation and regularization help
Ensemble models often work better than single models
A simpler model isn't guaranteed to be more accurate, but simplicity is worth preferring in its own right: simpler models are easier to understand, debug, and maintain
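Cross-validation is the workhorse here: hold out each slice of the data once, and average the held-out scores to estimate generalization. A minimal index-splitting sketch in plain Python (the helper name is mine, not from the paper):

```python
# Minimal k-fold cross-validation index generator.

def kfold_indices(n, k):
    """Yield (train, test) index lists; each fold is held out exactly once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

for train, test in kfold_indices(10, 5):
    # In practice: fit the model on `train`, score it on `test`,
    # then average the k held-out scores.
    print(len(train), len(test))
```

Libraries like scikit-learn ship production versions of this (with shuffling and stratification), but the core idea fits in a dozen lines.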
Data Engineering Reality
Getting clean, training-ready data is often the biggest time sink
The human effort in iterating and improving models is a real constraint
Feature engineering, when done right, can encode valuable domain knowledge
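As a small example of encoding domain knowledge into features (my own illustration, not from the paper): if you know demand behaves differently on weekends, you can make that knowledge explicit rather than hoping the model rediscovers it from a raw timestamp.

```python
from datetime import datetime

def engineer_features(ts: str) -> dict:
    """Turn a raw ISO timestamp into features that encode domain knowledge."""
    dt = datetime.fromisoformat(ts)
    return {
        "hour": dt.hour,                  # captures daily cycles
        "day_of_week": dt.weekday(),      # 0 = Monday ... 6 = Sunday
        "is_weekend": dt.weekday() >= 5,  # the domain knowledge, made explicit
    }

print(engineer_features("2012-10-06T14:30:00"))
```

The raw timestamp carries the same information, but a linear model or shallow tree can use the explicit `is_weekend` flag directly.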
I share my learning process in the video, including how I mapped out these concepts and connected them to my experience as a data engineer.
Hope you also find this interesting; let me know what you think!