The Impact of COVID-19 on Content Moderation
One year of measuring content preserved, made unavailable and restored
The problem with empowering Artificial Intelligence (A.I.) to moderate our social media feeds? We might never know what we have lost. That is unless we collect the data, audit the algorithms and make the changes we need for the democratic world we want.
If we, a small civil society organisation, can safely archive content before it is removed and identify errors, why aren’t social media companies doing the same?
May fifteenth was the one-year anniversary of the Christchurch Call. Made in the wake of the 2019 terrorist attack on two mosques in Christchurch, New Zealand, the “call to action” brought together governments and tech companies to “eliminate terrorist and violent extremist content online”. The initiative was led by New Zealand and France. This May France passed its own law, pre-empting similar EU regulation, requiring social media companies to remove content identified as illicit within 24 hours, and content identified as “terrorist” within one hour. The new law is enforceable by fines, potentially in the billions.
The removal of content within these timeframes will only be achievable with increased use of A.I. However, A.I. is notoriously context blind. It is often unable to gauge the historical, political or linguistic setting of posts. In the machine vision of an algorithm, satire and racist abuse, or human rights documentation and violent extremist propaganda, are too often indistinguishable. Infamously, in 2016 Facebook removed the iconic Vietnam war photograph of Kim Phuc, naked, fleeing American napalm bombs. The reason was “child nudity”. After public outcry, the photograph was reinstated.
For this reason, in recent years social media companies have complemented the work of their machine learning algorithms with increasingly large numbers of human content moderators.
In line with the “just-in-time” logistics of social media moderation, human content moderators are mostly contingent workers, employed by contractors rather than the social media companies themselves. A recent documentary film dubbed them “the cleaners”. Like many of the everyday services that regulate our lives, such as food and medical supply chains or waste collection, human content moderation and its unseen workforce were, before the COVID-19 pandemic, neither a concern nor something most people knew much about, if at all.
Until now it was impossible to know the true impact of human content moderators. Did their work make a measurable difference or did A.I. dominate? Without granular public data, the answers to this question were speculative or, at best, anecdotal. The current pandemic has changed this.
For the last two months, human content moderators contracted by social media companies such as Facebook, YouTube and Twitter, like many of us, have observed physical distancing and stayed home. As tech activist Dia Kayyali notes, most are unable to work remotely due to constraints like privacy agreements and regional data protection policies, and because it is unsafe to view potentially distressing content outside controlled work environments.
Overnight these measures put in place to stop the transmission of COVID-19 have brought forward a future in which A.I., in the absence of its human co-workers, is empowered to aggressively moderate our social media feeds. This is a future that lawmakers had de facto called for by ordering ever more rapid removals. It is also a future that civil society organisations on the frontline of documenting human rights abuses, such as Syrian Archive, had cautioned against.
Before the human content moderators went home, the rate of removals Syrian Archive were recording was already climbing. Since the beginning of this year, the rate of content takedowns of Syrian human rights documentation has roughly doubled on YouTube and roughly tripled on Twitter, our research suggests. We do not have comparable data on Facebook because, unlike YouTube and Twitter, it does not provide users with the reason why content has been removed. We have collected data on content takedowns since 2018, and the increase is unprecedented.
Syrian Archive was set up in 2014 to preserve documentation of the Syrian War on social media: images, videos, and posts that are both invaluable historical artifacts and potential evidence of human rights abuses. Indeed, the conflict has been called the ‘YouTube War’ in reference to the fact that there are more “hours of footage of the Syrian civil war on YouTube than there actually are hours of the war in real life.”
When in 2017, under pressure from Western governments to appear pro-active in fighting the dissemination of extremist materials, social media companies introduced machine learning algorithms to flag content, Syrian Archive responded by monitoring which images and videos on our servers were being taken offline. We have used this data to work with other organisations, such as Witness, and social media companies, such as YouTube, to reinstate content wrongfully removed. So far we have helped to put back online hundreds of thousands of images and videos.
For example, in 2019 YouTube removed 31.9 million videos, roughly 87,000 per day. Of those flagged for potential violation of terms of service, 87 percent were removed through automated flagging, and over one third were removed before any views. Facebook removed roughly 25 million pieces of content deemed “terrorist propaganda” in 2019. In the first quarter of 2020, 99.3% of content was removed before users had reported it. Twitter removed 115,861 accounts for terrorist content in the first half of 2019.
To be clear, greater amounts of content being taken down does not necessarily equal bad content moderation. It could be indicative of more spam or illicit content being uploaded.
So what is bad content moderation? Bad content moderation means a high error rate. Two basic errors are possible: leaving content up that should be taken down and taking content down that should be left up. Getting the balance right is an ongoing challenge for social media companies who, under pressure to both remove “violent extremist content” and protect freedom of speech, must choose “which errors to err on the side of”.
Critically, in our opinion, bad content moderation is also not providing users with the reason why their upload was removed and not empowering them to appeal when they believe the decision has been made in error.
As Ellery Roberts Biddle wrote, the famed photograph of Kim Phuc fleeing napalm bombs triggered a shift in public opinion that, arguably, contributed to the Vietnam war ending. How would the world be different if the photograph had been censored nearly half a century ago? Today, a similar image taken in Syria would likely be erased without human moderation and before receiving a single view—we would never know what we had lost.
We understand that not using A.I. is out of the question. As assistant professor of information studies Sarah Roberts says, turning it off would inundate our social media feeds with “unbearable garbage, spam or irrelevant and disturbing content.”
Governments, lawmakers and civil society organisations must see the current pandemic as an opportunity to understand what will happen if and when we ramp up the use of A.I. permanently. If error rates must rise, we must ask: what kinds of errors do we want future regulation and laws to incentivise? This debate, which is in essence a debate about how we understand ‘good content moderation’, must be held in public.
Syrian Archive’s data is unique and, as such, we are committed to sharing it as widely and publicly as possible, but it is not enough.
Without compromising the privacy of their users, social media companies must make as much information on content moderation available as possible to enable independent oversight. To know that more content is going down is not enough—we need to know why. Providing the reason for content takedowns is a start. Facebook must start doing so for each item of content it removes.
Social media companies have a responsibility to properly invest in both human and automated content moderation. Left unchallenged and unchecked, errors can snowball—becoming entrenched in the flawed systems that produced them. Currently, opportunities for users to get content reinstated are frustratingly few. It is not democratic that the power to challenge wrongful takedowns is concentrated in the hands of a few largely Western organisations, us included.
An idea gaining traction is that social media companies should make content removed under the category of “terrorist and violent extremist” available to those for whom access is in the public interest, such as human rights advocates, lawyers and counter-terrorism experts. We already know the impact of sharing these materials. In 2017, the International Criminal Court issued its first arrest warrant based solely on social media evidence. The warrant was for the Libyan military commander, Al-Werfalli. In 2018, in the context of the Myanmar genocide and at the request of the UN, Facebook stored and made accessible to investigators the content it removed.
In the same year, images from social media preserved by Syrian Archive of sarin gas attacks in Syria were used in assisting the Organisation for the Prohibition of Chemical Weapons to determine the Syrian government was responsible for a number of chemical weapons attacks. Before the investigation began many of the images were erased from social media due to their graphic content.
On and offline, the stakes have never been higher. Social media companies have experienced continual surges of user activity that surpass known busy periods like the stroke of midnight on New Year’s Eve.
If we can archive 5 million images and videos, social media companies can do the same in a privacy-preserving way for content that cannot remain public but is too valuable to be erased. We calculated that it costs $10,000 to securely archive a million images and videos on a server for a year. For 2019 it would have cost Facebook, Twitter and YouTube half a million dollars to store the content they removed. As many civil society organisations have already called for, these companies, each of which has billions in annual revenue, should be able to come up with ways of doing so that protect privacy.
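The arithmetic behind these figures can be checked with a short sketch. The only input taken from the text is the rate of $10,000 per million items per year. The function names are hypothetical, and the assumption that cost scales linearly with the number of items is ours, made for illustration only.

```python
# Rate stated in the text: $10,000 to archive one million items for one year.
COST_PER_MILLION_ITEMS_USD = 10_000


def annual_archive_cost(items: int) -> float:
    """Estimated yearly cost in USD, assuming cost scales linearly with item count."""
    return items / 1_000_000 * COST_PER_MILLION_ITEMS_USD


def items_covered(budget_usd: float) -> float:
    """Number of items a yearly budget would cover at the same assumed linear rate."""
    return budget_usd / COST_PER_MILLION_ITEMS_USD * 1_000_000


if __name__ == "__main__":
    # An archive of 5 million images and videos.
    print(annual_archive_cost(5_000_000))
    # How many items half a million dollars would cover for a year.
    print(items_covered(500_000))
```

At this rate, an archive of 5 million items would cost about $50,000 a year, and half a million dollars would cover about 50 million items for a year.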
Social media companies are already paying for the cost of under-investing in human content moderators. Last month Facebook paid $52 million in compensation to current and former moderators who were left traumatised by their work.