Machine Learning Rules from 2012 (that STILL matter)
A Few Useful Things to Know About Machine Learning, by Pedro Domingos
This 2012 paper by Pedro Domingos captures essential "folk knowledge" about machine learning: the kind of practical wisdom you typically only learn through experience.
As a data engineer working on AI research, I found several insights particularly valuable.
The main point: Machine learning is about getting models to generalize well to unseen data.
This requires balancing model complexity, data dimensionality, and practical constraints like compute and human time.
Key concepts I learned:
Learning = Representation + Evaluation + Optimization
Choose how to represent your solution (neural net, decision tree, etc.)
Define how to evaluate its performance
Determine how to find the best model in your solution space
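To make that decomposition concrete, here is a toy sketch (my own example, not from the paper): the representation is a one-feature threshold rule, the evaluation is plain accuracy, and the optimization is a brute-force search over candidate thresholds.

```python
# Toy illustration of Learning = Representation + Evaluation + Optimization.

data = [(0.1, 0), (0.4, 0), (0.5, 1), (0.9, 1), (0.7, 1), (0.2, 0)]

def predict(t, x):
    # Representation: a single-threshold rule, predict 1 if x >= t.
    return 1 if x >= t else 0

def accuracy(t, data):
    # Evaluation: fraction of examples the rule gets right.
    return sum(predict(t, x) == y for x, y in data) / len(data)

def fit(data):
    # Optimization: brute-force search over observed values as thresholds.
    candidates = sorted(x for x, _ in data)
    return max(candidates, key=lambda t: accuracy(t, data))

best_t = fit(data)
print(best_t, accuracy(best_t, data))
```

Swapping any one piece (a tree for the threshold rule, log-loss for accuracy, gradient descent for brute force) gives a different learner, which is exactly the point of the decomposition.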
Feature Engineering vs Data Volume
Feature engineering takes time but can be impactful if done right
Sometimes more data beats clever algorithms
But watch out for the curse of dimensionality: covering the input space requires exponentially more data as you add features
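One way to see the curse of dimensionality is "distance concentration": in high dimensions, a random point's nearest and farthest neighbors end up almost equally far away, which undermines similarity-based reasoning. A quick sketch of this effect (my own illustration, using plain Python):

```python
import math
import random

random.seed(0)

def distance_spread(dim, n_points=200):
    """(max - min) / min over distances from one reference point to random
    points in the unit cube; this spread shrinks as dimensionality grows."""
    pts = [[random.random() for _ in range(dim)] for _ in range(n_points)]
    ref = [random.random() for _ in range(dim)]
    dists = [math.dist(ref, p) for p in pts]
    return (max(dists) - min(dists)) / min(dists)

low_d = distance_spread(2)      # 2 dimensions: distances vary a lot
high_d = distance_spread(1000)  # 1000 dimensions: distances concentrate
print(low_d, high_d)
```

The spread in 2 dimensions comes out far larger than in 1000, even with the same number of points.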
The Generalization Challenge
Models must balance between underfitting (too simple) and overfitting (too complex)
Techniques like cross-validation and regularization help
Ensemble models often work better than single models
A simpler model isn't guaranteed to be more accurate, but simplicity is worth preferring in its own right: simpler models are easier to understand, debug, and maintain
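Cross-validation is the workhorse here: hold out each slice of the data once, and average the held-out scores to estimate generalization. A minimal index-splitting sketch in plain Python (the helper name is mine, not from the paper):

```python
# Minimal k-fold cross-validation index generator.

def kfold_indices(n, k):
    """Yield (train, test) index lists; each fold is held out exactly once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

for train, test in kfold_indices(10, 5):
    # In practice: fit the model on `train`, score it on `test`,
    # then average the k held-out scores.
    print(len(train), len(test))
```

Libraries like scikit-learn ship production versions of this (with shuffling and stratification), but the core idea fits in a dozen lines.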
Data Engineering Reality
Getting clean, training-ready data is often the biggest time sink
The human effort in iterating and improving models is a real constraint
Feature engineering, when done right, can encode valuable domain knowledge
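As a small example of encoding domain knowledge into features (my own illustration, not from the paper): if you know demand behaves differently on weekends, you can make that knowledge explicit rather than hoping the model rediscovers it from a raw timestamp.

```python
from datetime import datetime

def engineer_features(ts: str) -> dict:
    """Turn a raw ISO timestamp into features that encode domain knowledge."""
    dt = datetime.fromisoformat(ts)
    return {
        "hour": dt.hour,                  # captures daily cycles
        "day_of_week": dt.weekday(),      # 0 = Monday ... 6 = Sunday
        "is_weekend": dt.weekday() >= 5,  # the domain knowledge, made explicit
    }

print(engineer_features("2012-10-06T14:30:00"))
```

The raw timestamp carries the same information, but a linear model or shallow tree can use the explicit `is_weekend` flag directly.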
I share my learning process in the video, including how I mapped out these concepts and connected them to my experience as a data engineer.
Hope you also find this interesting; let me know what you think!