Ben Lorica, O’Reilly’s chief data scientist, has posted slides and notes from his talk at last December’s Strata Data Conference in Singapore, “We need to build machine learning tools to augment machine learning engineers.”
Lorica describes a new job emerging in IT departments: “machine learning engineers,” whose job is to adapt machine learning models for production environments. These new engineers run the risk of embedding algorithmic bias into their systems, which unfairly discriminate, create liability, and reduces the quality of the recommendations the systems produce.
He presents a set of technical and procedural steps to take to minimize these risks, with links to the relevant papers and code. It’s really required reading for anyone implementing a machine learning system in a production environment.
Another example has to do with error: once we are satisfied with a certain error rate, aren’t we done and ready to deploy our model to production? Consider a scenario where you have a machine learning model used in health care: in the course of model building, your training data for millenials (in red) is quite large compared to the number of labeled examples from senior citizens (in blue). Since accuracy tends to be correlated with the size of your training set, chances are the error rate for senior citizens will be higher than for millenials.
For situations like this, a group of researchers introduced a concept, called “equal opportunity”, that can help alleviate disproportionate error rates and ensure the “true positive rate” for the two groups are similar. See their paper and accompanying interactive visualization.
We need to build machine learning tools to augment machine learning engineers [Ben Lorica/O’Reilly]
(via 4 Short Links)
Two days ago, an industry/academic team released a terrifying alert about a pair of CPU bugs called Spectre and Meltdown that allowed one program to steal data from another, even with the best memory-management and isolation techniques — news that meant that virtually all the mission-critical computers in the world could no longer be trusted […]
Princeton’s Ed Felten (previously) is one of America’s preeminent computer scientists, having done turns as CTO of the FTC and deputy CTO of the White House.
University of Washington data scientist Jake Vanderplas found himself trapped in an interminable series of Snakes and Ladders (AKA Chutes and Ladders) with his four-year-old and found himself thinking of how he could write a Python program to simulate and solve the game.
This Content is Generated from RSS Feeds, if your content is featured and you would like to be removed, please Contact Us With your website address and name of site you wish to be removed from.
You can control what content is distributed in your RSS Feed by using your Website Editor.