Excavating AI: The Politics of Images in Machine Learning Training Sets is a #longForm article by NYU professors on the limitations of image databases and, in turn, of the #facialRecognition systems that …

You open up a database of pictures used to train artificial intelligence systems. At first, things seem straightforward. You’re met with thousands of images: apples and oranges, birds, dogs, horses, mountains, clouds, houses, and street signs. But as you probe further into the dataset, people begin to appear: cheerleaders, scuba divers, welders, Boy Scouts, fire walkers, and flower girls. Things get strange: A photograph of a woman smiling in a bikini is labeled a “slattern, slut, slovenly woman, trollop.” A young man drinking beer is categorized as an “alcoholic, alky, dipsomaniac, boozer, lush, soaker, souse.” A child wearing sunglasses is classified as a “failure, loser, non-starter, unsuccessful person.” You’re looking at the “person” category in a dataset called ImageNet, one of the most widely used training sets for machine learning. 

Something is wrong with this picture. 

Where did these images come from? Why were the people in the photos labeled this way? What sorts of politics are at work when pictures are paired with labels, and what are the implications when they are used to train technical systems?

In short, how did we get here? 

Sourced through Scoop.it from: www.excavating.ai

WHY IT MATTERS: Facial recognition, machine learning, and deep learning AI systems are trained on millions of images. These images are often labelled manually by users with little or no supervision or oversight. Moreover, they contain biases that are transferred into the algorithms they train. This article, along with the accompanying application that I reviewed recently, provides insight into this topic and should give pause to those who want to implement AI systems without appropriate data, checks, and balances.


Farid Mheir