Did Watson Succeed On Jeopardy By Infringing Copyrights?
from the good-questions dept
An anonymous tipster points us to a really interesting comment by Peter Hirtle on a Laboratorium.net post discussing Watson, the Jeopardy-playing computer, where he asks whether or not Watson infringes on copyrights:
From IBM's Watson Supercomputer Wins Practice Jeopardy Round in Wired Magazine: "Researchers scanned some 200 million pages of content -- or the equivalent of about one million books -- into the system, including books, movie scripts and entire encyclopedias."
It seems unlikely that IBM got permission to scan one million books. Can we expect soon a lawsuit from the Authors Guild against IBM and the producers of Jeopardy! (which, after all, is profiting from this scanning)?
This is a really good point and (once again) highlights the ridiculousness of copyright in certain circumstances. Of course, your viewpoint on this may depend heavily on whether or not you believe Google's book scanning infringed on copyright (I don't). But, for those who do, do you believe that IBM's scanning of books does infringe? Technically, it's the same basic process. In fact, you could argue that with Watson it's much more involved, because Watson then made use of the data to a much greater extent than Google did with Google Books.
But, really, a bigger point is how this highlights one of the oddities of copyright. If you read something and retain it in your brain, is that infringement? Most people say no, of course. Now, if a computer "reads" something and retains it in memory is that infringement? Well, that's a bit more borderline according to many. So take it a step further and as we reach the point that people can augment their wetware brains with computer brains... when do we hit a copyright infringement issue?
Filed Under: copyright, jeopardy, scanning, watson
Companies: ibm
Reader Comments
Re:
Is the only distinction that Watson is a computer? Or is it the scale of the scanning that makes this copyright infringement?
Re: Re:
Scale is often trotted out as the difference between home taping and file sharing.
Re: Re: Re:
1 x 0 = 0
1,000,000 x 0 = 0
Re: Re: Re: Re:
1 x 40,036,695,896* = 40,036,695,896
1,000,000 x 40,036,695,896* = total cost of copyright infringement in the US EVERY DAY!
*this number must be correct as I plucked it out of thin air.
Supercomputing pocket copiers
Well, think about it. Once you've copied all your stuff in, you no longer need the supercomputing pocket copier. So after a while, you get tired of carrying it in your pocket and you give it to someone who hasn't scanned all their material yet. And so on and so forth.
Supercomputing pocket copiers are something of a self-limiting market.
Re: crade
Does it become infringement when I type the book out? Or when I let people know I've created this physical typed copy? Or only when I finally distribute this new physical copy?
???
The key is: "scanning a book in" is different from "printing a book out."
Google book scanning facilitates the process of printing out my own hard copies. But only I should be at fault if I complete this infringement process by printing out a physical copy.
Technically, computers aren't storing physical copies. They store bits (quantities of electricity, if you prefer), which can be interpreted visually as similar to paper. But this "quantity of electricity" is notably different from a physical copy. In my opinion, quantities of electricity should not be governed by archaic laws that were created to protect the owners of paper printing presses.
I know I'm being weird here, but I'm curious how I'm missing the point, if I am.
Re: Re: crade
That is why many countries explicitly write into their law books that any copy necessary for electronics to work is not copyright infringement, because otherwise everyone would be liable.
Frankly I think copyright is just idiotic.
Re:
How can anyone possibly believe that memorizing a book doesn't infringe on copyright?
Re:
What copies? They aren't distributing copies of the scanned material (other than works in the public domain). They are indexing and cataloging them so they can be searched by keywords and phrases, just like a librarian does in a card catalog (only a lot more detailed).
Did they distribute
Does anyone even remotely conceive that the material was disclosed (during Jeopardy play) in any form other than fair use? Does anyone think IBM is going to mass-manufacture this system with those works included?
That's like saying that Google, Microsoft, or any other internet search index is copyright infringement.
Re: Did they distribute
You ask "did they distribute?", but do alleged file sharers get the same benefit of the doubt? If they don't distribute, is the copying (for themselves) okay?
Re: Re: Re: Re: Did they distribute
Do you know what you are downloading before receiving and viewing all of it?
If yes, then:
Why do so many knowingly download viruses?
Re: Re: Re: Re: Re: Did they distribute
If your computer has a virus and you send an email containing the virus, are you infringing on the programmer's copyrights?
Re: Re: Re: Re: Re: Re: Did they distribute
Although, sadly, I could certainly see someone trying that these days.
Re: Re: Re: Re: Re: Re: Re: Did they distribute
Yes, against anti-virus software for tampering with their product.
Re: Did they distribute
So, is Google infringing and thus Watson is as well, or is Watson in the clear and Google is as well?
Re: Did they distribute
IBM is full of freetards that need to pay up.
So can a computer be guilty of copyright infringement, and how can it be made to pay if it is?
Re:
Hard to say, but I guess it would depend on what was thought about copyright infringement by the programmers that built Watson.
After all, it was Dr. Richard Daystrom's human engrams that caused the M-5 computer to try to self-terminate for the crime of murder after attacking the U.S.S. Lexington, killing 53, and then destroying all Starfleet personnel aboard the U.S.S. Excalibur.
If Watson had the engrams of someone from the MPAA or RIAA, then the supercomputer is turning over all relevant logs and data right now (to help in the lawsuits against IBM that are sure to come from everywhere) before self-terminating.
R.I.P
WATSON
2006-2011
Jeopardy Champion - Copyright Infringer
"As smart as I was, I didn't think of the children"
when do we hit a copyright infringement issue?
That's the way you see it, right, patent trolls?
Re:
Sponsored by your friends at the new joint DHS-RIAA.
Typo
The gentleman's name is actually "Peter Hirtle". I got to meet him a couple years ago at Cornell... very bright guy and well versed in copyright issues (member of the Section 108 Study Group, among other things).
...brig
I had it wrong.
How in the hell does that actually mesh with the real world?
Re: I had it wrong.
It doesn't. That's how you know it's copyright legislation.
Format Shifting?
Re: Format Shifting?
That's the question, isn't it?
Re: Format Shifting?
Consider: you read a work in French (which you read fluently). For giggles, you take the book and pen and paper and write an English translation. There is little question that your English manuscript is a copyright infringement. Isn't a translation from English to 1s and 0s (by way of scanning and OCR) the same sort of thing?
Also, this was not for personal use. An argument could be made that it was for scientific research or educational purposes, but it would be a bit thin: the fact is it was done as a publicity stunt, to show off how great IBM's AI program has become. Not sure what the fair use basis would be.
As a believer
I also think, in this case, you could make an argument that inputting all of the books was essentially inputting facts about human history necessary to understand language, and therefore restricting that use would hinder the arts (or even science).
Either way, pretty good thought experiment.
Re: As a believer
But it did compete with people, depriving them of money that should be rightfully theirs.
First they took our jobs, now they're taking our money and next they will be using us as batteries in the Matrix.
At this rate, it won't be long before we replace the phrase "Think of the Children" with "Think of the Humans", but it will be way too late by then.
/sarc
I'll wait for the day when an upgraded version of Watson answers an "Audio Daily Double" or "Video Daily Double" question. Then the copyright police lawyers will come down like a ton of heat sinks, for unauthorized use of copyrighted material copied into Watson's data banks.
http://www.infoworld.com/t/business-intelligenceanalytics/how-ibms-watson-hammered-its-jeopardy-foes-798
"Unlike Google and its Books project, IBM chose to obey licensing rules. "If we don't have a license, we don't have it," notes Chu-Carroll."
Re:
"... primarily free text that is available to us: dictionaries, encyclopedias, newspaper articles, things that...cover 'Jeopardy' topics well."
I'm not entirely sure what they meant there, but those things are not "free" of copyright restrictions... and yet they seem to think that they are.
I would definitely like more details on this, because despite their claims, it seems unlikely (and indeed almost impossible) that they secured proper licenses for ALL of the information they used...
What if...
Then it would be similar to a child listening to a book over and over again until they memorized it. Does memorizing a book infringe?
So by nature, Watson is only reading and memorizing the book the only way it knows how. So unless Watson decided to read the book out loud verbatim, this should be allowed under fair use.
Some Reasons
- There is a real sense in which the work has been COPIED onto a hard-drive when a computer 'reads' it. That's not so clear with a human - if you wanted to remember a quote on page 53 of the last novel you read, where would you turn to for reference? But if you had a hard-drive for a memory, where would you turn then?
- Now true, some people have impressive recall - stories of medieval scholars who could recite the Koran or Bible abound - but a computer is guaranteed to be able to perfectly reproduce what it 'reads' - is even programmed to be able to do so.
- Most significantly, computer hard drives can in turn be 'read' by third parties, even against the computer's 'will'. You can't access the contents of a human mind from the outside; you can access a computer's.
(None of this is to say copyright makes sense or etc. But you CAN find a certain logic to this issue if you accept the pro-copyright stance.)
Re: Some Reasons
However it isn't hard to tweak the thought experiment a bit more to produce a contradiction.
A competitive learning network, like a human brain, doesn't exactly remember everything that is put in front of it so you might argue that it is sufficiently like our human brains to be in the clear - however this fact is dependent on the exact values of certain parameters so it would be incredibly difficult to define exactly where the threshold ought to be.
Copyright was invented in a world where the difference between reading a book and printing copies of it was clear and obvious. Modern technology has created a continuum where ultimately no-one can really say with any confidence whether any particular activity is OK or not.
Re: Some Reasons
You would turn to the long-term storage cells of your brain, which are clearly a storage medium.
...a computer is guaranteed to be able to perfectly reproduce what it 'reads' - is even programmed to be able to do so.
Really? I recently lost a whole hard drive full of information.
Most significantly, computer hard drives can in turn be 'read' by third-parties, even against the computer's 'will'. You can't access the contents of a human mind from the outside; you can a computer's.
No, you can only change the hard drive's "will" and make it "want" to give you the information. Same thing with humans.
So you see, there's really no difference after all.
But you CAN find a certain logic to this issue if you accept the pro-copyright stance.
A fundamental difference between human and non-human memorization? Not really. And that's one of the fundamental problems with the notion of copyright.
Re:
If you can't extract the data itself from such a system is this infringement or not?
And what if they distribute the database on an academic forum so that other researchers can compare different algorithms using the same data (a vital part of the research; if they don't, their research is of little value to the advancement of knowledge)?
Re: Degrading profit
So, I go to my Wordpress blog, and write a blog entry about Egregious Error Book, including some of the most error-filled code examples, and calling attention to how nit-witted the errors are.
The nitwit who wrote Egregious Error Book has his or her profit degraded by my commentary on the error-filled examples.
By your definition, this is copyright infringement. But I believe that legally, this is "fair use" as I'm commenting specifically on the error-filled examples.
Stop me if I'm wrong, anybody. Copyright doesn't exist merely to guarantee profit.
Watson versus Google Books
Google is in contravention of that said LAW, but only on a slight technicality, if it exists. It seeks to justify its position of redistributing these works by citing the democratization of information, bringing equality and social educational benefit to all, which indeed it may. I'm sure many in third-world or developing countries would agree, as do the freeloaders.
The acquisition of information [not recorded] and its re-utilization should not be called to answer the question of copyrights. Otherwise every human being on the planet is doomed to ignorance and lack of education.
Its intent was to prevent its retransmission word for word or duplication without consent or permission from its authors. How the original copy was obtained and whether it contained the clause NOT TO BE TRANSMITTED ELECTRONICALLY is where it can get ugly.
Was Watson in contravention of the LAW, NO! However, how they obtained this information is another question. For all we know it was Google Books! Should its users find ways of re-utilizing its information in new ways, GREAT. Should they profit from it, good for them. Watson did not RETRANSMIT the information word for word and scanning can be said to be a type of reading. Watson was not designed to be GOOGLE BOOKS.
The moral and ethical question as to whether Google Books should exist is a slippery one. Though they don't profit from it directly, they do via advertisements. But if they make considerable efforts to protect the works via a read-only mechanism, thereby not allowing replication in recorded forms, then they are no more negligent of the LAW than is any civic municipal library. But the library bought its original copy. That's where the grey area is. ;)
Are they going to sue the Library?
No shit Sherlock.
Greed robs the corporate leadership of clarity, because they see Google make lots of money. But I don't see any subject to clauses or exceptions to the rule of LAW.
If anything, the LAW would need to be revised to suit the greedy as to redefine what was "recorded" and distributed.
Because some are implying the information displayed on screens is enough to warrant a protracted legal battle, which it clearly is not.
Today technology is leaping well ahead of the terms referenced in LAW. With new types of AI and OCR technology manipulating information at the minutest level, how one ever hopes to control anything is beyond me and the LAW.
Re: Watson versus Google Books
(is this a known quantity, legally?)
If I read a book, have I "duplicated" it in my brain (chemically)? What if I have a great memory, and can type it out anytime I want to?
I am now like Google Books, right? A tool that might assist in the infringing act?
... I don't see it. I don't agree.
Technically, isn't scanning a book the process of interpreting light into quantity of electricity? Is it "duplicating" to turn physical ink-on-paper into a virtual string-of-bits?
Feel like I'm missing the point (and maybe some precedents), but can't think on it further than this.
Copyright on 'thoughts'?
Or would a writer be wanting money from me every time I think of a passage of a book and explain it to a co-worker?
Would a songwriter want money from me, when I'm singing a song I remember? Oh wait, the RIAA probably does want that. Bad example.
Re: Copyright on 'thoughts'?
Biologists still aren't sure how human memory works, or the human mind in general for that matter. But programmers are fully aware of how a computer's memory works, because it's an invention of humans. We know that witness recall of events is terribly fuzzy in many court cases, so it's definitely not true that humans copy every little thing we've ever seen or learned.
The most current studies seem to suggest that our memory is nothing like a filing cabinet that stores info. We recreate memories on the fly through associating fragments and our mind "fills in the blanks" so to speak. That would make all thoughts new creations. Computers don't do that. All they can really do (as of now) is access data that's been copied to their hard drive and make calculations (that we also feed into them).
Re: Re: Copyright on 'thoughts'?
You apparently don't know much about computers and have never heard of distributed storage if you think that makes humans different.
Biologists still aren't sure how human memory works, or the human mind in general for that matter.
But neuroscientists know quite a bit about it. Still, there is scarcely anything about which absolutely everything is known.
But programmers are fully aware of how a computer's memory works, because it's an invention of humans.
No, they don't. They just know how to use it. Computer memory is based on physics, and there are some aspects of that physics that even the memory designers do not completely understand. But they know how to use it.
We know that witness recall of events is terribly fuzzy in many court cases, so it's definitely not true that humans copy every little thing we've ever seen or learned.
Computers erase information all the time. Same again.
The most current studies seem to suggest that our memory is nothing like a filing cabinet that stores info. We recreate memories on the fly through associating fragments and our mind "fills in the blanks" so to speak. That would make all thoughts new creations. Computers don't do that.
They can if they're programmed to. Again, it is clear that you don't know much about computers.
All they can really do (as of now) is access data that's been copied to their hard drive and make calculations (that we also feed into them).
As true of humans as it is of computers.
In Regard to Electricity vs Print
Remember E=mc²?
Matter IS energy. Energy is matter. TECHNICALLY, there's no difference between electricity and paper and flesh. It's just a different expression of the same underlying energy (or an expression of a probability or wave-function, if you're so inclined). Which makes moot a lot of arguments about how data on the net is somehow different from data on paper. Paper, too, is full of electrons and held together by the same forces that drive an electric impulse.
I'm not saying I AGREE with that notion, in a practical sense. Obviously we don't live in the extremity of a black hole, or inside the Big Singularity, where such distinctions between matter and energy become frivolous. I just wanted to point out that it doesn't help an anti-copyright argument to get technical and only go halfway with it.
For the record: it's patently ludicrous to suggest that what Watson does is wrong or should be stopped, even though in a strictly legal sense it may be infringing (but again, this is because we KNOW how a computer's memory works and that it *does* create and store and will later "distribute" a virtually identical copy of whatever info is fed into it).
I don't like "gotcha!" arguments. They don't do much to sway anyone's opinion, they just make you seem like a teenager who just bought his first philosophy book.
Re: In Regard to Electricity vs Print
To some degree. We also know how human memory works to some degree. But what difference does that make anyway?
it *does* create and store and will later "distribute" a virtually identical copy of whatever info is fed into it
Humans can do the same thing. So what's your point?
Duplication
A book being scanned, then turned into information on a hard drive does not constitute copyright infringement.
It's how this new form is used that raises the question of whether any copyright infringement has occurred, for example if the new form competes with the original author's work.
If Google Books made all of an author's works readily available and discoverable, so as to render the original form redundant, this would affect the author and could be considered good grounds for a copyright infringement claim.
Since they only make part of the works available, and they make efforts to prevent easy copying, they in fact offer a service to the author, by making available works not easily acquired and, in a sense, promoting the author.
When reading, one applies a fuzzy logic and interpretation, so the question of duplication cannot be considered. If a person possessed such a memory that they could furnish a copy word for word, then only when the physical form was produced could there be a possible case for infringement.
I've never met such an individual, only some rare gifted musicians with extraordinary memory. But they preferred to reinvent or interpret the works. As for some Muslim cleric or mullah who could recite the WHOLE Qur'an, well my friend, that's Muslim propaganda. Its device is simply to make you believe they're special and GOD's chosen. Chosen yes, but not for what they think he has intended ;)
Profiting Supercomputers
Seriously though, Copyrights are not there to protect profits, but the integrity of the author's intent for what the works should achieve.
It is a symptom of its cause, that today these LAWS are abused to derive such an effect via this instrument.
To the point where the LAW is now an object of ridicule.
If Copyrights and the LAW cannot keep pace with life and its technology, it becomes redundant and ineffective. Obviously they need to be overhauled, certainly with respect to Patents where the greatest greed has produced abuses of LAW at the highest level. This has created hindrance in the advancement of humanity as a whole.
Patent LAW's intent was to protect the initial investment & research.
But today you have companies seeking to lock out competition completely. Hardly sounds like a free market economy where anti competitive LAWS are made a mockery of and monopolies granted by LAW.
By reviewing a book or authors work, you then affect his profitability. I hardly think it a Copyright infringement.
However, if you slander him with no substance to your argument or position, then I dare say you'll have a protracted legal battle on your hands. But that's slander & libel LAW.
You have a right to express your thoughts. But if you're another author, a competitor going under a pseudonym, who posts a review not worthy of a true peer review, look out if they find out who you are! :) $$$
"Patent LAWS intent was to protect the initial investment & research.
But today you have companies seeking to lock out competition completely. Hardly sounds like a free market economy where anti competitive LAWS are made a mockery of and monopolies granted by LAW."
I find it constitutionally unacceptable, that in a so called Democratic nation that espouses or professes equality, you have a condition where monopolies exist.
Sounds more at home in an Autocratic Dictatorship or Regime.