Why a YouTube Chat About Chess Got Flagged for Hate Speech


Last June, Antonio Radić, the host of a YouTube chess channel with more than a million subscribers, was live-streaming an interview with the grandmaster Hikaru Nakamura when the broadcast suddenly cut out.

Instead of a lively discussion about chess openings, famous games, and iconic players, viewers were told Radić’s video had been removed for “harmful and dangerous” content. Radić saw a message stating that the video, which included nothing more scandalous than a discussion of the King’s Indian Defense, had violated YouTube’s community guidelines. It remained offline for 24 hours.

Exactly what happened still isn’t clear. YouTube declined to comment beyond saying that removing Radić’s video was a mistake. But a new study suggests it reflects shortcomings in artificial intelligence programs designed to automatically detect hate speech, abuse, and misinformation online.

Ashique KhudaBukhsh, a project scientist who specializes in AI at Carnegie Mellon University and a serious chess player himself, wondered if YouTube’s algorithm may have been confused by discussions involving black and white pieces, attacks, and defenses.

So he and Rupak Sarkar, an engineer at CMU, designed an experiment. They trained two versions of a language model called BERT, one using messages from the racist far-right website Stormfront and the other using data from Twitter. They then tested the algorithms on the text and comments from 8,818 chess videos and found them to be far from perfect. The algorithms flagged around 1 percent of transcripts or comments as hate speech. But more than 80 percent of those flagged were false positives—read in context, the language was not racist. “Without a human in the loop,” the pair say in their paper, “relying on off-the-shelf classifiers’ predictions on chess discussions can be misleading.”
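To make those percentages concrete, here is a minimal arithmetic sketch. The total of 100,000 items is a hypothetical round number chosen for illustration, not a figure from the study, and the function name is likewise invented. Only the roughly 1 percent flag rate and the 80 percent false-positive share come from the reported results, with 80 percent treated as a lower bound.

```python
def precision_from_flags(total_items, flag_rate, false_positive_share):
    """Return (flagged, false_positives, precision) for a classifier.

    total_items is a hypothetical corpus size; the two rates mirror the
    approximate figures reported in the article.
    """
    flagged = round(total_items * flag_rate)
    false_positives = round(flagged * false_positive_share)
    true_positives = flagged - false_positives
    # Precision: the share of flagged items that really were hate speech.
    precision = true_positives / flagged if flagged else 0.0
    return flagged, false_positives, precision


# Hypothetical corpus of 100,000 transcripts and comments.
flagged, false_positives, precision = precision_from_flags(100_000, 0.01, 0.80)
print(flagged, false_positives, precision)
```

Under these assumptions, at most about one in five flagged items would be a genuine hit, which is why the researchers argue for keeping a human in the loop.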

The experiment exposed a core problem for AI language programs. Detecting hate speech or abuse is about more than just catching foul words and phrases. The same words can have vastly different meanings in different contexts, so an algorithm must infer meaning from a string of words.
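A toy sketch shows why word matching alone falls short. This is not the researchers' method (they used BERT-based classifiers), and the trigger-word list, the threshold, and the sample sentences are all hypothetical, invented here for illustration.

```python
import re

# Hypothetical trigger words for illustration only; not taken from the study.
TRIGGER_WORDS = {"black", "white", "attack", "attacks", "threat", "threatens"}


def naive_flag(text, threshold=2):
    """Flag text when it contains at least `threshold` trigger words.

    Returns a (flagged, hits) pair. No context is considered, which is
    exactly the weakness this sketch is meant to show.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = [t for t in tokens if t in TRIGGER_WORDS]
    return len(hits) >= threshold, hits


chess_comment = "White attacks the black king and threatens mate in two."
flagged, hits = naive_flag(chess_comment)
print(flagged, hits)
```

An ordinary description of a chess position trips the filter because the matcher sees only individual words, not the game they describe.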

“Fundamentally, language is still a very subtle thing,” says Tom Mitchell, a CMU professor who has previously worked with KhudaBukhsh. “These kinds of trained classifiers are not soon going to be 100 percent accurate.”

Yejin Choi, an associate professor at the University of Washington who specializes in AI and language, says she is “not at all” surprised by the YouTube takedown, given the limits of language understanding today. Choi says additional progress in detecting hate speech will require big investments and new approaches. She says that algorithms work better when they analyze more than just a piece of text in isolation, incorporating, for example, a user’s history of comments or the nature of the channel in which the comments are being posted.

But Choi’s research also shows how hate-speech detection can perpetuate biases. In a 2019 study, she and others found that human annotators were more likely to label Twitter posts by users who self-identify as African American as abusive and that algorithms trained to identify abuse using those annotations will repeat those biases.

Companies have spent many millions collecting and annotating training data for self-driving cars, but Choi says the same effort has not been put into annotating language. So far, no one has collected and annotated a high-quality data set of hate speech or abuse that includes lots of “edge cases” with ambiguous language. “If we made that level of investment on data collection—or even a small fraction of it—I’m sure AI can do much better,” she says.

Mitchell, the CMU professor, says YouTube and other platforms likely have more sophisticated AI algorithms than the one KhudaBukhsh built, but even those are still limited.


