Researchers turn to crowdsourcing for better YouTube recommendations

In 2019, an analysis by former Google computer scientist Guillaume Chaslot found that YouTube’s recommendation algorithm overwhelmingly recommended Russia Today’s video about the Mueller report, the U.S. government report documenting Russian efforts to interfere in the 2016 presidential election. The video, which contained false claims about the report’s findings, had only 50,000 views, yet YouTube’s algorithm surfaced it over hundreds of other, more popular videos uploaded by independent media outlets.

Google, which owns YouTube, responded to this and other alleged algorithmic flaws with policy tweaks and a purge of terms of service-violating accounts. But more recent research from Mozilla offers evidence that YouTube continues to put objectionable, questionably related content — including misinformation, violent and graphic content, and hate speech — in front of its users. In one instance documented by Mozilla, a person who watched a video about software rights was then recommended a video about gun rights.

Exasperated by the lack of progress and inspired to shine a light on the issue of algorithmic transparency, a team at the Swiss Federal Institute of Technology Lausanne (EPFL) launched the Tournesol Association, a nonprofit created to develop a voting-based system that recruits viewers to highlight the best content on YouTube. With Tournesol, any YouTube user can create an account and recommend content, which Tournesol then aggregates and converts into per-video scores representing the “collaborative judgment” of the community.

According to Le Nguyên Hoang, a scientist at EPFL and one of the cofounders of Tournesol, the goal is to provide a safer, more “benevolent” alternative to YouTube’s recommendations that’s powered by the sway of the crowd.

“Tournesol is the result of five years of discussions with my colleagues at EPFL on the safety and ethics of large-scale algorithms,” Hoang told VentureBeat via email. “As a science YouTuber myself, I have quickly been extremely concerned about recommendation algorithms and troll campaigns … With a few friends, we spent a year developing the platform in our spare time. In April 2021, we created the nonprofit Tournesol Association to support the platform.”

Crowdsourcing recommendations

The science is mixed on the helpfulness of crowdsourcing when applied to recommending content. Reddit, where the visibility of posts and comments is decided by the number of accumulated “upvotes,” is an infamous example. Research has shown that even a single extra upvote on a comment can lead to a snowball effect, where more people in the community upvote that comment.

The reasons for this “snowball effect” vary. Sean Taylor, a social scientist at Facebook who coauthored a seminal study on the subject, speculates that people rely on upvotes to indicate that something is worthwhile, especially when they aren’t sure what to think. Alternatively, he says, highly rated posts could be more likely to draw attention and, therefore, more votes from the community.

Crowdsourcing also runs the risk of introducing other biases. Users passionate about a particular cause might be more inclined to sign up so that they can ensure their votes are represented, for example. Voices and communities who aren’t made aware of ways to participate or who don’t have the means (e.g., access to a computer running Chrome) could be unintentionally excluded. Regardless of who is participating, users are often driven by emotion, the extent to which an opinion matches their own, and whether they’re familiar with a source of information — irrespective of the source’s veracity.

Bias can arise along several dimensions, as studies of recommendation algorithms over the years have demonstrated. In a 2020 research paper, researchers showed that a popular dataset of movie ratings could cause an algorithm to provide less-relevant suggestions for women than for men because the dataset contained more ratings from men. Other work has found that Facebook’s ad-recommending algorithm discriminates by gender and race.

Hoang acknowledges the issues, but argues crowdsourcing is a sensible replacement for YouTube’s financially motivated systems, which prioritize “engagement” — i.e., ad views — at the expense of most other metrics. A 2019 report from Bloomberg alleges that YouTube executives ignored warnings from staff, letting toxic, algorithm-boosted videos run rampant to increase viewership.

“The governance and oversight over today’s large-scale algorithms is frustratingly poor,” he said. “[A]lgorithms are clearly being weaponized by organized disinformation campaigns, [but] even when they are not weaponized, by providing addictive content to maximize retention, recommendation algorithms are biased towards sensationalist, divisive, and angering content … [These] algorithms are neither audited nor auditable by external entities.”

Several legislative remedies to the problem of harmful recommendation algorithms have been proposed, including a provision in the European Union’s draft AI Act that would place constraints on AI systems that “manipulate” human behavior, opinions, or decisions “to their detriment.” In the U.S., a recently floated bill — the Social Media NUDGE Act — would direct agencies to identify “content neutral” methods to slow down the algorithm-driven spread of harmful content and misinformation on social media.

But Hoang says that these efforts don’t go far enough — and aren’t guaranteed to succeed. The AI Act’s language around recommender systems has been softened in subsequent drafts, and the Social Media NUDGE Act — along with other U.S. bills to regulate algorithms — remains stalled in Congress.

“What we want above all is for such a global algorithmic governance to be designed with the best-possible solutions to make it effective, scalable, secure, robust, transparent, interpretable, fair, and trustworthy,” he added. “It is crucial to note that, whether we use social media or not, we all have a stake in the information that large-scale recommender systems distribute billions of times per day to other users.”

Challenges and vision

Tournesol, like YouTube, uses algorithms to power its recommendations, which are informed by votes from users on the platform and the scores associated with each video. (Recall that votes on Tournesol are aggregated and converted into per-video scores.) To cast a vote, users compare any two videos that they’ve watched and indicate, on a sliding scale, which one they’d recommend over the other. A “vouching” system requires that users be certified based on the “trustworthiness” of their email domains to protect against fake accounts, and users can’t see how others have voted when comparing videos.

When comparing videos, Tournesol users can also denote whether one video meets criteria like “reliable and not misleading,” “clear and pedagogical,” and “important and actionable” versus the other video. The results funnel into a public dataset that’s used to train the algorithms to provide recommendations in English, French, and German.
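To make that pipeline concrete, here is a minimal sketch of how pairwise comparisons might be aggregated into per-video scores, using a simple Elo-style logistic update. The data format, step size, and update rule are assumptions for illustration, not Tournesol’s actual scoring model.

```python
# Illustrative sketch: turn pairwise video comparisons into per-video
# scores with an Elo-style logistic update. The data format, step size,
# and update rule are assumptions, not Tournesol's actual model.
import math
from collections import defaultdict

# Each comparison: (video_a, video_b, result), where result in [0, 1]
# encodes the slider position (1.0 means A is fully preferred over B).
comparisons = [
    ("video_a", "video_b", 0.8),
    ("video_b", "video_c", 0.3),
    ("video_a", "video_c", 0.6),
]

K = 1.0  # update step size (assumed)
scores = defaultdict(float)

def p_prefer_a(score_a: float, score_b: float) -> float:
    """Logistic estimate that A is preferred to B under current scores."""
    return 1.0 / (1.0 + math.exp(score_b - score_a))

for a, b, result in comparisons:
    delta = K * (result - p_prefer_a(scores[a], scores[b]))
    scores[a] += delta
    scores[b] -= delta

# Highest-scoring videos would be recommended first.
for video, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{video}: {score:+.3f}")
```

Because Tournesol’s comparison data is public, aggregation schemes like this one can be reproduced and audited against the same dataset.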

EPFL has contributed to Tournesol’s open source code, as have the tech startups PolyConseil and Kleis, which participated in the project through their Tech for Good programs. Features in the works include voter profile pages, the ability to compare videos directly on YouTube through the Chrome extension, and visualizations that show votes from “subpopulations of interest” like journalists and subject-matter experts.

Hoang is forthright about the hurdles that must be overcome before Tournesol graduates from a research effort. For example, contributors to the project are exploring how to assign weights to votes so that experts have — depending on the video — more sway over recommendations than non-experts. They’re also investigating ways that under-represented communities on Tournesol can have their influence boosted to that of dominant groups, like white men, to combat algorithmic bias.
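As a rough illustration of how such weighting could work, the snippet below computes a weighted average of ratings, with hypothetical per-voter weights standing in for expertise or for boosts to under-represented contributors. Neither the weights nor the schema reflect Tournesol’s actual design, which is still being explored.

```python
# Hypothetical sketch of weighting votes per voter, e.g. to give
# certified experts more sway on a given video. The weights and the
# schema are assumptions, not Tournesol's design.
from collections import defaultdict

# Each vote: (voter, video, rating in [-1, 1], voter_weight)
votes = [
    ("expert_1", "video_a", 0.9, 2.0),   # subject-matter expert (assumed weight)
    ("viewer_1", "video_a", -0.2, 1.0),  # regular contributor
    ("viewer_2", "video_a", 0.5, 1.0),
]

weighted_sum = defaultdict(float)
total_weight = defaultdict(float)
for voter, video, rating, weight in votes:
    weighted_sum[video] += weight * rating
    total_weight[video] += weight

scores = {v: weighted_sum[v] / total_weight[v] for v in weighted_sum}
print(scores)  # {'video_a': 0.525}
```

Under this framing, boosting an under-represented group’s influence amounts to raising the weights attached to its members’ votes until their aggregate weight matches that of dominant groups.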

“Today’s main limitation, by far, is the lack of data. We desperately need more people engaging in content reviewing, to help us more robustly assign scores to as many important YouTube videos as possible. Once Tournesol is shown to be performant and robust at identifying top content, convincing large information companies to leverage our scores … will be a huge challenge,” Hoang said. “[But] imagine if, instead of amplifying hate speech and calls to violence as they currently do, recommendation algorithms massively and repeatedly shared the numerous calls for peace that courageous peace activists are making … [S]uch algorithms could become a fantastic ally for world peace.”
