Content Moderation Case Study: Using Hashes And Scanning To Stop Cloud Storage From Being Used For Infringement (2014)
from the cloud-storage-scanning dept
Summary: Since the rise of the internet, the recording industry has been particularly concerned about how the internet can and will be used to share infringing content. Over time, the focus of that concern has shifted as the technology (as well as copyright law) has shifted. In the early 2000s, most of the concern was around file sharing applications, services and sites, such as Napster, Limewire, and The Pirate Bay. However, after 2010, much of the emphasis switched to so-called “cyberlockers.”
Unlike file sharing apps, which involved person-to-person sharing directly from users’ own computers via intermediary technologies, a cyberlocker was more like a hard drive on the internet. The issue was that some users would store large quantities of music files and then make them available for unlicensed downloading.
While some cyberlockers were built directly around this use case, cloud storage companies were at the same time trying to build legitimate businesses, allowing consumers and businesses to store their own files in the cloud rather than on their own hard drives. However, technologically, there is little to distinguish a cloud storage service from a cyberlocker, and as the entertainment industry became more vocal about the issue, some services started to change their policies.
Dropbox is one of the most well-known cloud storage companies. Wishing to avoid facing comparisons to cyberlockers built off of the sharing of infringing works, the company put in place a system to make it more difficult to use the service for sharing works in an infringing manner, while still allowing the service to be useful for storing personal files.
Specifically, when Dropbox received a DMCA takedown notice for a specific file, the company would create a hash of that file (a computer-generated identifier that is the same for all identical files) and store it. Then, whenever a user shared a file from their Dropbox with someone else (such as by creating a shareable link), Dropbox would hash that file and check the result against its database of hashes of files that had previously received DMCA takedown notices.
This got some attention in 2014 when a user on Twitter highlighted that he had been blocked from sharing a file because of this, raising concerns that Dropbox was looking at everyone’s files.
Dropbox quickly clarified that it was not scanning every file, nor was it looking at everyone’s files. Rather, it was using an automated process to check files that were being shared and see if they matched files that had previously been subject to a DMCA takedown notice:
“There have been some questions around how we handle copyright notices. We sometimes receive DMCA notices to remove links on copyright grounds. When we receive these, we process them according to the law and disable the identified link. We have an automated system that then prevents other users from sharing the identical material using another Dropbox link. This is done by comparing file hashes. We don’t look at the files in your private folders and are committed to keeping your stuff safe.”
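The hash check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Dropbox's actual system: the class name `TakedownBlocklist`, its methods, and the choice of SHA-256 are all hypothetical, since the company's statement says only that it compares "file hashes."

```python
import hashlib


class TakedownBlocklist:
    """Hypothetical store of hashes for files named in DMCA notices."""

    def __init__(self) -> None:
        self._blocked_hashes: set[str] = set()

    @staticmethod
    def file_hash(data: bytes) -> str:
        # SHA-256 is an assumption; the source does not name the algorithm.
        return hashlib.sha256(data).hexdigest()

    def record_takedown(self, data: bytes) -> None:
        # Called when a notice arrives for a specific shared file.
        self._blocked_hashes.add(self.file_hash(data))

    def may_share(self, data: bytes) -> bool:
        # Checked only when a user tries to create a share link.
        return self.file_hash(data) not in self._blocked_hashes


blocklist = TakedownBlocklist()
noticed_file = b"bytes of a file named in a takedown notice"
blocklist.record_takedown(noticed_file)

identical_copy = bytes(noticed_file)     # same bytes, another user's folder
altered_copy = noticed_file + b"\x00"    # one extra byte appended
unrelated_file = b"holiday photos"

print(blocklist.may_share(identical_copy))
print(blocklist.may_share(altered_copy))
print(blocklist.may_share(unrelated_file))
```

In this sketch only the byte-for-byte identical copy is refused. The altered copy hashes to a different value and passes, which is why exact-match hashing catches re-shares of the same file but not modified or re-encoded versions. Nothing is hashed until a share is attempted, mirroring the statement that private folders are not examined.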
Decisions to be made by Dropbox:
- How proactive does the company need to be to remain on the compliant side of copyright law?
- Will blocking the sharing of files that might be shared for non-infringing purposes make the service less useful to users?
- What steps are necessary to avoid being accused of supporting infringement by traditional copyright industries?
- There may be legitimate, non-infringing reasons to share a file that in other contexts may be infringing.
- Is it appropriate for a company to block that possibility?
- What measures could be put in place to allow for those possibilities?
- The recording and movie industries have a history of being aggressive litigants against technologies used for infringement. What level of response is appropriate for new startups and technology companies?
- Will there be limitations on innovation to services like cloud storage imposed by the need to avoid angering certain industries?
Originally published on the Trust & Safety Foundation website.
Filed Under: cloud storage, copyright, cyberlockers, dmca, hashes, private storage, takedowns
Companies: dropbox
Reader Comments
Related:
https://blog.cloudflare.com/the-csam-scanning-tool/
Re:
1) If you're running a site which regularly gets large amounts of child porn, you're either big enough to do something in-house (which doesn't have Cloudflare invade your privacy), or you're running your site horribly, horribly wrong.
2) I really don't like these sorts of tools / scanners. Wikipedia got blocked in the United Kingdom in the past, because they had an image of a naked child on a band cover from decades ago. There's also the issue of cartoon images. Even if their filters aren't that bad now, filters usually converge on censoring everything.
3) As detailed in some articles, criminals can trivially bypass these filters with simple modifications to content. They also don't work on video. They provide a false sense of security, and are often overstated as a panacea.
4) These filters are curated by a non-transparent and unaccountable non-profit organization (NCMEC), which has supposedly gotten a teenager in Costa Rica arrested for posting a cartoon image on her blog.
https://www.article19.org/resources/inhope-members-reporting-artwork-as-child-sexual-abuse/
A European free speech organization, Article19, and two other organizations wrote a letter to them and to a collaborative body they're part of (INHOPE) to try to dissuade them from taking such actions in the future.
Similar situation
I had a similar situation with MediaFire. After FileDen shut down, I switched to MediaFire as my primary cyberlocker for sharing (only files I can share... nothing illegal). MediaFire took down one of my files when they got a DMCA notice that was clearly just doing partial matching on file names. I was sharing a PSP port of the open source emulator Basilisk, while the file was flagged as being the movie Basilisk. I changed the name of the file and reupped it. Lesson - give your files names that can't possibly match anything commercial, lest the stupid bots used for sending DMCA notices take notice.
Re: Similar situation
Lesson: Bust up the common language into an ever-shrinking public domain for everyone — and an ever-growing list of reserved words owned by unthinking media corps and their lawyer-bot armies under a bastard quasi-trademark regime!
Intellectual property. Rule of law.
Re: Re: Similar situation
More like rule of thought. Basic language is a requirement to communicate.
Though I'm sure IP maximalists would love being able to claim that they own every possible sentence that could ever be produced. The sad reality for them is that our society has a need to communicate to function properly.
IP needs to go the way of the dodo. It's nothing more than a tool to oppress and stymie at this point.
Re: Re: Re: Similar situation
But the common tongue just doesn't really need
™ba$ili$k™
Re: Re: Re: Similar situation
So do MD4, MD5, and SHA-1.
[ link to this | view in chronology ]
Rar the file with encryption. Different password equals different hash.