Michael Bernstein is determined not to let the errors of today’s platforms be the final word on what it means for technology to bring people together. Instead, by paying humbler attention to the lessons of the past, and to the structural inequalities that keep those lessons echoing today, he aims to design new platforms.
Michael’s focus is on designing technologies that support groups in achieving their collective goals. Social computing faces a reckoning today: we are implicated in creating the socio-technical systems that now amplify societal ills. Inspired by researchers in the behavioral sciences, as well as by advocates for better platform futures, Michael’s research builds alternative visions for the future of social and collaborative platforms, manifests those visions in software platforms, and launches those platforms to the public.
Michael and his collaborators are developing a machine learning approach oriented around the metaphor of juries: explicitly articulating which groups in society, including intersectional identities, and how many members of each, should compose the jury whose composite votes determine the classifier’s behavior. Unlike traditional machine learning classifiers, which model a single answer for each question, a jury-based classifier estimates the responses of diverse individuals across a population. Jury-based AI forces the developer to explicitly describe the population that should be deciding these issues, for example a population more fully representing women’s and Black perspectives for online harassment classification. Jury-based machine learning offers an architectural restart on classification, enabling developers to explicitly articulate whose values they are encoding into their algorithm.
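The jury metaphor above can be sketched in a few lines. This is a hypothetical illustration, not the team’s actual system: the group names, the fixed per-group response probabilities, and the `jury_classify` function are all assumptions invented here to show how an explicitly articulated jury composition, rather than an undifferentiated training set, determines the classifier’s output.

```python
# Hedged sketch of a jury-based classifier. The per-group "models" below are
# toy fixed probabilities for a single example item; the real approach would
# estimate individual annotators' responses from data.
from collections import Counter

# Assumed: each group model estimates P("harassment" | item) for an item.
GROUP_MODELS = {
    "women":       lambda item: 0.80,
    "black_users": lambda item: 0.75,
    "white_men":   lambda item: 0.30,
}

def jury_classify(item, jury_composition, threshold=0.5):
    """Aggregate votes from a jury whose composition the developer explicitly
    articulates, e.g. {"women": 6, "black_users": 4, "white_men": 2}."""
    votes = []
    for group, seats in jury_composition.items():
        p = GROUP_MODELS[group](item)
        # Each seat casts that group's expected vote (a deterministic
        # simplification; the real system models individuals, not averages).
        votes.extend([p >= threshold] * seats)
    tally = Counter(votes)
    return tally[True] > tally[False]

item = "example post"
# A jury weighted toward those most targeted by harassment flags the post:
print(jury_classify(item, {"women": 6, "black_users": 4, "white_men": 2}))   # True
# A jury mirroring today's majority-dominated training sets does not:
print(jury_classify(item, {"white_men": 10, "women": 1, "black_users": 1}))  # False
```

The point of the sketch is the interface, not the toy numbers: the developer must write down the jury composition, making explicit a judgment that conventional pipelines bury inside aggregated training labels.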
The team will develop visualizations that allow nontechnical users to see how a machine learning jury reacts to each input, to understand the diversity of opinions within groups and across groups, and to follow how the models representing individual jury members are aggregated into a composite response. A public web application will allow end users to explore the construction of their own jury AIs for common classifiers such as online harassment and disinformation detection. Users will compose juries to understand how online platforms might behave differently if their AI systems were constructed with juries that represented, for example, more Black members, or more transgender members, or more women, than the AIs based on today’s common training sets. The team aims to demonstrate that a shift to jury-based AI represents marginalized groups in a way that is easy to understand, performs well, and forces explicit judgment around whose voice should be represented.
Meeting the Challenge
Machine learning came of age in an era when we articulated single correct answers for each classification task: an image contains a picture of a cat, or it doesn’t. Today, this single-answer assumption is mistakenly applied to problems where groups have dramatically different opinions.
Current algorithmic approaches remain mired in an assumption that society agrees on the answers to the questions that AIs are being trained to answer. In social computing systems such as Facebook, Wikipedia, and Twitter, however, marginalized groups often disagree with the answers provided by AIs. For example, women and Black users of Twitter, who are more commonly targets of online harassment, observe harassment that White men might overlook. Unfortunately, the training data used to power the AIs in these systems is an aggregate of the population’s opinions, which typically aligns with the largest group’s point of view. The resulting AIs reflect the values of that largest group, appear to perform excellently on held-out test sets, and then launch to the public—where they fail marginalized groups. Advances in algorithmic fairness can help, but they focus on reducing disparate outcomes, leaving unaddressed the fact that marginalized groups’ opinions were never proportionally represented in the AI model in the first place. Today’s machine learning models, which output a single ground-truth label per item, are unable to capture this diversity of perspectives.
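The aggregation failure described above can be made concrete with a toy example. All numbers here are invented for illustration: majority-vote label aggregation, the standard way of producing a single “ground truth” label from many annotators, collapses disagreement into the largest group’s opinion.

```python
# Toy illustration (assumed annotation counts): majority-vote aggregation
# erases minority annotators' labels before the model ever sees them.
from collections import Counter

# Hypothetical annotations for one tweet: 1 = "harassment", 0 = "not harassment".
annotations = {
    "white_men":   [0] * 70,  # largest annotator group overlooks the harassment
    "women":       [1] * 20,
    "black_users": [1] * 10,
}

all_votes = [v for votes in annotations.values() for v in votes]
ground_truth = Counter(all_votes).most_common(1)[0][0]
print(ground_truth)  # 0: the aggregated label reflects only the largest group
```

A model trained on such labels can score well on a test set drawn the same way, because the test labels carry the identical majority bias, while still failing the 30% of annotators whose votes were discarded.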