Content Moderation Case Studies: Copyright Claims On White Noise (2018)
from the white-noise-is-public-domain dept
Summary: Every platform hosting user-generated content these days is pretty much required (usually by law) to have policies in place to deal with copyright-infringing material. However, not all content on these platforms is covered by copyright, and that can lead to complications, since policies are often built on the assumption that everything must be covered by some form of copyright.
Australia-based music technologist Sebastian Tomczak, who has a PhD in computer generated music, created a 10 hour “low level white noise” recording from scratch. He generated the audio file himself, made a video version of it, and posted it to YouTube. In early 2018, he discovered that there had been five separate copyright claims on the video from four separate copyright holders.
Each claim asserted that the claimant’s own white noise video was covered by copyright, and that Tomczak’s video infringed on it. Amusingly, each claim designates which short segment of the 10 hour video infringes on the claimant’s work -- even though the entire 10 hours is literally the same white noise.
None of the claims demanded that Tomczak’s video be taken down, but rather sought to “monetize” it under YouTube’s ContentID offering, which allows copyright holders to leave up videos they claim are infringing but divert any advertising revenue to the copyright holder.
Somewhat incredibly, one copyright holder claims that Tomczak’s video infringes on two separate videos of their own, both of which also offer white noise.
One company involved – Catapult Distribution – says that Tomczak’s composition infringes on the copyrights of “White Noise Sleep Therapy”, a client selling the title “Majestic Ocean Waves”. It also manages to do the same for the company’s “Soothing Baby Sleep” title. The other complaints come from Merlin Symphonic Distribution and Dig Dis for similar works.
It appears that all of the claims were automated claims, using various services that scan videos for similarities. However, it does not appear that any of those services first check if the originating videos actually involve a valid copyright in the first place. Instead, they often are based on an entire account, and just search for any similar videos, whether or not there is a valid copyright.
Decisions to be made by YouTube:
- Is white noise even covered by copyright?
- Should the platform allow users to claim the monetization rights on other similar videos in which there is no valid copyright?
- If there are multiple copyright claims (and monetization claims) on the same video, how is it determined who has the rights and who gets to monetize?
- Should automated systems be allowed to make copyright claims without any regard to actual copyright status?
- If copyright laws and policies are built on the assumption that every piece of content is covered by copyright, how should internet websites deal with situations in which there does not appear to be a valid copyright?
- What are the long term implications of automated systems that do not involve any actual lawyers or experts reviewing either copyright takedown or monetization requests?
“In any of the cases where I think a given claim would be an issue, I would dispute it by saying I could either prove that I have made the work, have the original materials that generated the work, or could show enough of the components included in the work to prove originality. This has always been successful for me and I hope it will be in this case as well.”
Indeed, a few days after he contested the claims (and those claims received widespread press attention), YouTube did release all of the claims on the white noise video. Tomczak has separately argued that this case -- even with the final outcome -- suggests that parts of the system need to change.
"Hopefully cases like these with the white noise, which shows how sort of broken their copyright system is, can shed some light on it or get YouTube to think about changing their system," he said.
Originally posted on the Trust & Safety Foundation website.
Filed Under: content moderation, contentid, copyright, dmca, white noise
Companies: youtube
Reader Comments
Is white noise even covered by copyright?
Compendium of U.S. Copyright Office Practices (3rd ed.)
Chapter 300 - Copyrightable Authorship: What Can Be Registered
[ link to this | view in chronology ]
Re: Is white noise even covered by copyright?
The problem with that is that Dr. Sebastian "Little Scale" Tomczak is Australian, and would have to be governed by Australian copyright laws instead of American ones.
[ link to this | view in chronology ]
Re: Re: Is white noise even covered by copyright?
Not in the United States. A foreign copyright in the United States is still governed by U.S. copyright law. Material that is not copyrightable within the U.S. is still not copyrightable in the U.S.
For instance, it doesn't matter where Naruto is domiciled. The macaque does not have an enforceable copyright in the United States. Period.
[ link to this | view in chronology ]
Re: Re: Re: Is white noise even covered by copyright?
Fair enough.
[ link to this | view in chronology ]
Re: Re: Re: Is white noise even covered by copyright?
The problem here comes with the RIAA and MPAA spreading around the world.
It wasn't bad in the past, when someone could jump to Japan and get a bunch of anime cheap, or to India for Bollywood, and bring them to the USA, and we didn't worry about others' copyright. And they didn't worry about ours.
And that's how tech got spread around the world.
For some reason, this is what's happening now with international copyright, with the USA corps bitching that China is stealing from them.
China has been buying tons of them from around the world, and most USA corps with copyrights have to use or supply them to have things made. Which means China gets a copy. What happens after that point is interesting.
[ link to this | view in chronology ]
Re: Re: Is white noise even covered by copyright?
See 17 U.S. Code § 104 - Subject matter of copyright: National origin
Also see Naruto stories at Techdirt.
[ link to this | view in chronology ]
Re: Is white noise even covered by copyright?
Irrelevant question.
Copyright trolls only care about money and control. In this case money. They want profits, and because they can leverage ContentID to send profits their way for nothing, they do so. It doesn't matter that the content isn't copyrightable. What matters is that the ContentID system will gladly give them money if they demand it. So long as the benefit of them doing this outweighs the penalties, they will write off the penalties as the cost of doing business.
[ link to this | view in chronology ]
Re: Re: Is white noise even covered by copyright?
and they told me that crime does not pay
[ link to this | view in chronology ]
Signal processing police calls.
If there are substantial "literally the same" parts, we are not talking about white noise. Its features may be governed by the same statistics, but that's not the same as being the same.
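A minimal sketch of that distinction, written in Python rather than with any tool used by Tomczak or the claimants. It assumes two noise buffers drawn from independently seeded generators; the seeds, buffer length and helper names are arbitrary choices for illustration. The two buffers share nearly identical statistics, yet sample for sample they are essentially uncorrelated.

```python
import math
import random

def make_noise(seed, n):
    """Return n uniform samples in [-1, 1] from a generator with the given seed."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]

def std_dev(xs):
    """Population standard deviation of a list of samples."""
    mean = sum(xs) / len(xs)
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))

def correlation(xs, ys):
    """Pearson correlation between two equal-length sample lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (std_dev(xs) * std_dev(ys))

a = make_noise(seed=1, n=20000)
b = make_noise(seed=2, n=20000)

print(std_dev(a), std_dev(b))  # both close to 1/sqrt(3): same statistics
print(correlation(a, b))       # close to 0: not the same signal
print(correlation(a, a))       # approximately 1: a signal compared with itself
```

Whether a fingerprinting service compares statistics or actual samples is not something the article establishes, so this only illustrates the commenter's point, not how any matching system works.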
[ link to this | view in chronology ]
Re: Signal processing police calls.
From the (linked above) TorrentFreak article:
[ link to this | view in chronology ]
Re: Re: Signal processing police calls.
And your point was? That the whole sequence is governed by the same statistics? That's not the same as being the same.
[ link to this | view in chronology ]
Re: Re: Re: Signal processing police calls.
You may read Sebastian Tomczak's explanation for yourself. You may also consider whether the facts that he has related there are consistent with your previous understanding.
Or you might persist in being obtuse.
[ link to this | view in chronology ]
Re: Signal processing police calls.
You are correct in that two samples of random noise will not be literally the same. Use of the word literally here is the classic definition that does not include figuratively.
If one were to look at the characteristics of each sample, one could make the claim that they were characteristically the same. They used the same pseudo random number generator and the same application with the same inputs, etc.
It's not the same as being the same - lol
[ link to this | view in chronology ]
Re: Re: Signal processing police calls.
If the run (recording) time is long enough, those generators repeat their output; as they do if started with the same initial values. I suspect 10 hours is long enough for several repeats of the generated sequence.
[ link to this | view in chronology ]
Re: Re: Re: Signal processing police calls.
Actually, repeating over a 10 hour period is unlikely. Assuming a 32 bit PRNG, we have a period of 4,294,967,296 cycles. Assuming CDROM quality of 44,100 samples per second, that would take 97,391.55 seconds before repeating. Or about 27 hours. If a larger PRNG were used, then the period would be larger. For instance, the period of a 64 bit generator would be over 13 million years.
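The arithmetic in this comment can be checked with a short Python sketch. It assumes, as the comment does, that the generator's period is exactly 2 to the power of its state size and that each audio sample consumes one generator output; the function and constant names are made up for the example.

```python
SAMPLE_RATE = 44_100                    # samples per second, as in the comment
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def seconds_to_repeat(state_bits, sample_rate=SAMPLE_RATE):
    """Seconds of audio before a generator with period 2**state_bits repeats."""
    return 2 ** state_bits / sample_rate

s32 = seconds_to_repeat(32)
print(f"32-bit: {s32:,.2f} s = {s32 / 3600:.2f} hours")

s64 = seconds_to_repeat(64)
print(f"64-bit: {s64 / SECONDS_PER_YEAR:,.0f} years")
```

Under those assumptions the 32-bit case comes out at a little over 27 hours and the 64-bit case at a little over 13 million years, matching the figures quoted.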
[ link to this | view in chronology ]
Re: Re: Re: Re: Signal processing police calls.
The glibc sources are, of course, widely available. Here are current sources for:
From the comments on lines 67-69 of random.c (which are repeated on lines 68-70 of random_r.c):
Do browse the source to see the rest of that comment, including the discussion of the period of the generator.
[ link to this | view in chronology ]
Re: Re: Re: Re: Re: Signal processing police calls.
Although, looking more carefully at Sebastian Tomczak's explanation, and particularly with regards to his use of ScreenFlow, it appears likely that he was running that software on a Mac platform. Obviously, that does not necessarily mean that he ran Audacity on Mac. But it tends towards that guess.
[ link to this | view in chronology ]
Re: Re: Re: Re: Re: Signal processing police calls.
And the point is?
Given the sources you've indicated, the PRNG used has far more than 32 bits of state and therefore a period far exceeding the rather minimal period stated a few posts back. Using the default of 128 bytes and assuming one long word used for management data, the period would be on the order of 2^960, or about 10^289. Call it about 10^277 years. And that's the low end. If those 64 bits of management data were used to increase the period, the upper limit would be 2^1024, or 10^308, or about 10^296 years. In any case, 32 bits of state gives a period well beyond 10 hours and increasing the state size just increases the period to ludicrous durations.
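Numbers this large are easiest to sanity-check in logarithms. The sketch below is only a check of the orders of magnitude, assuming a period of exactly 2^960 or 2^1024 outputs and one output per sample at 44,100 samples per second; it says nothing about what period glibc actually achieves.

```python
import math

# Samples played per year at 44,100 samples per second.
SAMPLES_PER_YEAR = 44_100 * 365.25 * 24 * 3600

def log10_years(period_bits):
    """log10 of the years needed to play 2**period_bits samples."""
    return period_bits * math.log10(2) - math.log10(SAMPLES_PER_YEAR)

for bits in (960, 1024):
    print(f"2^{bits} ~ 10^{bits * math.log10(2):.1f} samples, "
          f"~ 10^{log10_years(bits):.1f} years")
```

Working with exponents avoids overflowing floating point, since 2^1024 is just past the largest double.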
[ link to this | view in chronology ]
Re: Re: Re: Re: Re: Re: Signal processing police calls.
Continuing to quote from the comment in random_r.c, now from lines 85-86:
Then looking down at the actual code on lines 121-2, it looks to me like a default of 128 bytes of state corresponds to a degree 31 polynomial(*). Plugging that back into the formula given in the comment would be 31*(2**31 - 1) or roughly 2**5 * 2**31 = 2**36.
―
(*) Clearly, in a LP64 model, 128 bytes isn't going to store 31 longs successfully, although it will store 31 int32_t's. The comment about “longs” just seems to differ from the actual code in that model. Presumably the comment dates from an era when (I)LP32 was the prevalent model.
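For scale, the estimate in this comment can be turned into playing time. This sketch takes the commenter's reading at face value (a degree 31 polynomial and the period formula as quoted from the source comment) and assumes one generator output per sample at 44,100 samples per second; none of that is verified here.

```python
DEGREE = 31           # polynomial degree, per the commenter's reading of the code
SAMPLE_RATE = 44_100  # samples per second

period = DEGREE * (2 ** DEGREE - 1)  # formula as given in the comment
approx = 2 ** 5 * 2 ** 31            # the commenter's rounding, i.e. 2**36

seconds = period / SAMPLE_RATE
print(period, approx)
print(f"{seconds / 86_400:.1f} days of audio before repeating")
```

Even on this smaller estimate the sequence would run for more than two weeks before repeating, far longer than a 10 hour recording.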
[ link to this | view in chronology ]
Re: Re: Re: Signal processing police calls.
A very quick browse through the current Audacity source turns up line 127 in noise.cpp:
buffer[i] = mAmp * ((rand() / div) - 1.0f);
At first glance, this looks to me like the white noise generator uses the system rand() function.
But note that this is the first time I've ever looked at the Audacity source, and I may be mis-reading it horribly. And, even if I'm reading the current source correctly, this is certainly a later version than was used a couple years ago.
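A rough Python analogue of what that line appears to do, offered as a sketch and not as Audacity's implementation. It assumes `div` is half of `RAND_MAX`, so that `rand() / div - 1` lands in [-1, 1], and it uses Python's `random.Random` as a stand-in for C's `rand()`; the function name and defaults are invented for the example.

```python
import random

RAND_MAX = 2**31 - 1   # glibc's RAND_MAX; other platforms may differ
DIV = RAND_MAX / 2.0   # assumed meaning of `div` in the quoted line

def white_noise(n, amp=1.0, seed=0):
    """Generate n uniform noise samples in [-amp, amp]."""
    rng = random.Random(seed)         # stand-in for C's rand()
    buffer = []
    for _ in range(n):
        r = rng.randint(0, RAND_MAX)  # rand() returns an int in [0, RAND_MAX]
        buffer.append(amp * ((r / DIV) - 1.0))
    return buffer

samples = white_noise(44_100, amp=0.5)  # one second at 44.1 kHz
print(min(samples), max(samples))
```

Each sample is drawn independently of its neighbours, which is what gives the result its flat, "white" character.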
[ link to this | view in chronology ]
Re: Re: Re: Signal processing police calls.
Modern pseudo random number generation includes the possible use of noise encountered in the hardware to augment the sequence in addition to other sources. The specific item used here may allow such options, I do not know, but the claim was not limited to this instance.
[ link to this | view in chronology ]
Re: Re: Re: Re: Signal processing police calls.
When using the rand() function from the standard library, the C standard requires the sequence to be repeatable.
Since the C standard documents are not freely available, here's a link for the final C11 committee draft. See “7.22.2 Pseudo-random sequence generation functions”. Or, more conveniently in this case, see the POSIX standard, since in this respect, POSIX is “aligned with the ISO C standard.”
(Emphasis.)
I do recognize that your “claim” is much more hand-wavy and mushy about prng's in general, than intended to address this specific instance.
But in this specific instance, where Audacity is using rand(), while non-repeatable hardware-generated randomness might be used to seed the prng, the standards practically prohibit hardware noise augmentation of the resulting sequence.
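The repeatability requirement is easy to demonstrate. The sketch below uses Python's seeded `random.Random` as an analogue of calling `srand()` and then `rand()` in C; it is not the C library itself, and the helper name is made up.

```python
import random

def sequence(seed, n=5):
    """First n outputs of a generator started from the given seed."""
    rng = random.Random(seed)
    return [rng.randint(0, 2**31 - 1) for _ in range(n)]

first = sequence(seed=1)
again = sequence(seed=1)
other = sequence(seed=2)

print(first == again)  # True: same seed, same sequence
print(first == other)  # False: a different seed gives a different sequence
```

Mixing hardware noise into every output would break the first comparison, which is the commenter's point about what the standard permits.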
[ link to this | view in chronology ]
Re: Re: Re: Re: Re: Signal processing police calls.
For what it's worth ....
OP comment to which I replied:
"If there are substantial "literally the same" parts, we are not talking about white noise."
Sorry for the hand wavy and mushy text as you put it, but I was not addressing the specific usage in this case because the OP did not address the specific usage in this case.
Subsequent comments pointed out that deprecated versions of pseudo random number generation repeat over time. I pointed out this particular problem has been addressed.
[ link to this | view in chronology ]
Re: Re: Re: Re: Re: Re: Signal processing police calls.
It's probably worth clearly distinguishing between repeatable sequences and repeating cycles within sequences.
A repeatable sequence may conceivably be infinite in length without any internal cycles. For example, the sequence of digits or bits of π can be generated and regenerated repeatably, out to any length, even though that potentially-infinite sequence contains no cycles within it.
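A toy generator makes the two properties concrete. The sketch below uses a deliberately tiny linear congruential generator with parameters chosen only for illustration (not taken from any real library): restarting from the same seed reproduces the stream, which is repeatability, and the stream also loops after 16 values, which is a repeating cycle.

```python
def lcg(seed, a=5, c=3, m=16):
    """Yield an endless stream from x -> (a*x + c) mod m."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

def take(gen, n):
    """Collect the next n values from a generator."""
    return [next(gen) for _ in range(n)]

def cycle_length(seed, **params):
    """Count steps until the state returns to the seed.

    Valid here because a is odd and m is a power of two, so the update
    is a bijection and every state lies on a cycle.
    """
    steps = 0
    for x in lcg(seed, **params):
        steps += 1
        if x == seed:
            return steps

print(take(lcg(7), 8) == take(lcg(7), 8))  # True: repeatable from the seed
print(cycle_length(7))                     # 16: the stream cycles
run = take(lcg(7), 32)
print(run[:16] == run[16:])                # True: second lap equals the first
```

The digits of π, by contrast, would pass the first check but have no counterpart to the second.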
In many applications, neither of those attributes is necessarily a “problem”.
Especially for simulation applications, the capability to repeat a sequence in a later computation run has been considered important enough to require it in standards documents.
In other applications, though, particularly cryptography, totally different qualities may indeed be more important. In cryptographic applications, usually it's most important that the sequence is, in some sense, “unpredictable”. Often, the desired cryptographic qualities of random numbers may have formal definitions that are ill-fit to any pseudo-random number generation process. There are quite a few cryptographic “problems”.
For a white noise application, I'd think that the most important quality would be that the sequence is fairly gaussian-distributed in both time and frequency domains (within bandwidth limitations). Although, I do suspect that some rather non-gaussian distributions may sound “whitish” enough for casual listening.
[ link to this | view in chronology ]
Re: Re: Re: Re: Re: Re: Re: Signal processing police calls.
On re-reading, that is an extremely clumsy way to say flat frequency distribution over some band-limited region.
Sorry. That's what I meant.
[ link to this | view in chronology ]
Re: Re: Re: Re: Re: Re: Re: Signal processing police calls.
It is also worth pointing out that random number generation is not limited to software based systems.
"In computing, a hardware random number generator (HRNG) or true random number generator (TRNG) is a device that generates random numbers from a physical process, rather than by means of an algorithm. Such devices are often based on microscopic phenomena that generate low-level, statistically random "noise" signals, such as thermal noise, the photoelectric effect, involving a beam splitter, and other quantum phenomena. These stochastic processes are, in theory, completely unpredictable, and the theory's assertions of unpredictability are subject to experimental test. This is in contrast to the paradigm of pseudo-random number generation commonly implemented in computer programs. "
Hardware random number generator
[ link to this | view in chronology ]
'You know what, looks like we were mistaken...'
You could solve a massive amount of copyright related issues and abuses if you simply made the law equal, such that the penalties for issuing bogus claims were treated and punished just as harshly as claims of infringement.
This would not only make people much more careful regarding what they claimed was infringement but by making the penalties equal it would also provide an incentive to bring penalties for infringement down to sane levels, because someone sending out claims would always have to face the possibility that they might be on the receiving end of the penalty.
[ link to this | view in chronology ]
I remember when Techdirt reported on this. Needless to say, antidirt and the usual copyright advocates (read: fanatics) got their panties so twisted that someone might question the validity of copyright, they ended up rediscovering the Gordian Knot. The reason why you haven't heard about it is because they hold the copyright to it.
[ link to this | view in chronology ]
A couple years ago, I filmed my back window, at night in the summer just to record how loud the insects were. I uploaded it to YouTube and was immediately informed that the audio had been muted due to a copyright claim.
There was literally no other sound other than the insects and a slight hum from my computer's fans.
[ link to this | view in chronology ]
Re:
Well, what do you expect when uploading a soundtrack from the beetles?
[ link to this | view in chronology ]
Re: Re:
Badum tsh?
[ link to this | view in chronology ]
Hmm
If you click on the link to the video today it looks like it's been claimed by somebody else again... claimed by the creator of a white noise video that was made 2 years later.
It's insane.
[ link to this | view in chronology ]
Re: Hmm
The date of upload doesn't have to be the same as the date of creation/copyright.
[ link to this | view in chronology ]
Re: Re: Hmm
I'm not talking about the date of upload. The other video says it was created in 2017, which is 2 years later. Apart from that it shouldn't have been claimed at all.
[ link to this | view in chronology ]
Re: Re: Re: Hmm
Copyright wasn't claimed in the time domain. It was claimed in the frequency domain.
(Don't worry if you don't really get the joke—it's a pretty weak one, even for a technical joke. Just rate it at about 0.7 chuckle, if you need to ask.)
[ link to this | view in chronology ]
Apparently everything is now copyrighted (copywrote?) at this point in time and no one can create anything in the future due to copyright .... isn't this the opposite of what copyright is supposed to be doing?
[ link to this | view in chronology ]
Re:
Surely copyrightten (copywritten).
Surely copyright on white noise is needed to incentivize the creation of more white noise.
[ link to this | view in chronology ]
Re: Re:
The past tense of "copyright" is not "copywritten". Those aren't even the same verbs.
[ link to this | view in chronology ]
Re: Re: Re:
Yes, but the suggested incorrection of "everything is copyrighted" to "everything is copywrote" would also replace the correct past participle form with a simple past tense form.
For "right", the past tense form and the past participle form are both "righted", but for "write", one is "wrote" and the other is "written". Right?
[ link to this | view in chronology ]
Re: Re: Re: Re:
It was a bad joke
[ link to this | view in chronology ]
Write.
[ link to this | view in chronology ]
Re: Right?
This is otherwise known as the Wright write "right" write rite.
[ link to this | view in chronology ]
Re: Re: Right?
Drive on the Parkway and park in the Driveway.
[ link to this | view in chronology ]