Toxic language detection is totally inadequate

Alphabet (née Google) is working on Perspective, a machine learning tool that tries to detect “toxic” language.

Perspective’s got a ways to go, and some of the errors the program has been shown to make are enough to cause one to question if machines will ever be able to parse meaning-laden human expression with any accuracy. Bottom line: humans, in their ability to evade detection for nasty invective, are way, way ahead of the machines.

This is why our own moderation team shines, and why leaving moderation to computers with no people involved looks set to keep failing.

… the algorithm is a bit too dependent on hot-button keywords, and not enough on the surrounding contextual clues in the statement, especially a word like “not,” which tends to reverse the polarity of what’s being said. “Rape,” “Jews,” “terrorist,” and “Hitler” are all likely to increase your toxicity score, even in comments that are mostly placating or unobjectionable.
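To make that failure mode concrete, here is a toy sketch of a scorer that looks only at hot-button keywords. It is not how Perspective actually works; the function name, word list and weights are all made up for illustration. It simply shows why a keyword-only approach cannot tell a statement from its negation.

```python
# Toy illustration only: this is NOT Perspective's real method.
# HOT_WORDS and its weights are invented for this example.
HOT_WORDS = {"hitler": 0.6, "terrorist": 0.5, "sucks": 0.7}


def keyword_score(comment: str) -> float:
    """Return the highest weight of any hot-button word in the comment."""
    # Lowercase each word and trim simple punctuation from its edges.
    words = [w.strip(".,!?\"'").lower() for w in comment.split()]
    # Context words such as "not" are never consulted, so negation is invisible.
    return max((HOT_WORDS.get(w, 0.0) for w in words), default=0.0)


if __name__ == "__main__":
    for text in [
        "Hitler was an anti-Semite",
        "Hitler was not an anti-Semite",
        "race war now",
    ]:
        print(f"{keyword_score(text):.2f}  {text}")
```

The two Hitler sentences get the same score because “not” carries no weight, while “race war now” scores zero because none of its words are on the list. A real system is far more sophisticated than this, but the scores quoted in this post suggest it can stumble in a similar direction.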

Auerbach supplies a hilarious account of the ways Perspective gets it wrong:

“Trump sucks” scored a colossal 96 percent, yet neo-Nazi codeword “14/88” only scored 5 percent. “Few Muslims are a terrorist threat” was 79 percent toxic, while “race war now” scored 24 percent. “Hitler was an anti-Semite” scored 70 percent, but “Hitler was not an anti-Semite” scored only 53%, and “The Holocaust never happened” scored only 21%. And while “gas the joos” scored 29 percent, rephrasing it to “Please gas the joos. Thank you.” lowered the score to a mere 7 percent. (“Jews are human,” however, scores 72 percent. “Jews are not human”? 64 percent.)

Humans are highly subtle when it comes to language, and machines find it hard to keep up. A particularly chilling example from the MIT Technology Review article is the sentence “You should be made into a lamp,” which is a direct allusion to Nazi atrocities and has been directed at several journalists in recent months. Perspective gives that a toxicity rating of 4.

 

It’s hard enough to parse language for hateful intent; imagine how much harder when you toss in a factor like juxtaposition with an image. A sentence like “You can trust me to do the right thing” has a completely different meaning when placed next to a picture of Pepe the Frog, wouldn’t you think?

A reader also put a few Bible passages through, starting with the Ten Commandments:

And somehow having God’s blessing isn’t seen as a good thing either.

When it comes to humans communicating with each other, only other humans can be relied on to truly know the context, especially when the words are surrounded by images or placed in a location whose meaning is totally unknowable to the computer.

Want an example? How about an ad for a self-cleaning oven photoshopped onto the fence at Dachau?

 

