Kindle Spam Is A Filter Issue, Not A Spam Issue

from the filter-away dept

Mon, Jun 20th 2011 10:02pm — Mike Masnick

Via Slashdot, we learn that spammers have discovered the ability to publish cheap "ebooks":

Thousands of digital books, called ebooks, are being published through Amazon's self-publishing system each month. Many are not written in the traditional sense.

Instead, they are built using something known as Private Label Rights, or PLR content, which is information that can be bought very cheaply online then reformatted into a digital book.

These ebooks are listed for sale, often at 99 cents, alongside more traditional books on Amazon's website, forcing readers to plow through many more titles to find what they want.

The article makes it sound like this is a big problem, calling it "the dark side" of self-publishing, but I don't get it. Assuming no one wants this crap, then it seems likely that Amazon will start to filter it out of any search results or top lists.

There is some slightly more legitimate concern about outright plagiarism, where some of these "spammers" are merely copying other books and then re-branding them and selling them as ebooks. But, once again, this seems like a filter problem more than anything else. In fact, I'm a bit surprised that Amazon doesn't do a basic check to make sure the content of an ebook hasn't already been offered by someone else, and do a further investigation if that's the case. Others have suggested that Amazon charge a small fee to upload a book, as that might prevent spammers from going crazy with such copies, and that could make sense as well. I just have trouble believing that this is such a serious "problem" that it can't easily be stopped.
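One way to picture the "basic check" suggested above is a near-duplicate comparison against titles already in the store, with any match sent to a person for review rather than blocked automatically. The sketch below is only an illustration under stated assumptions: the function names (`shingles`, `jaccard`, `titles_needing_review`), the five-word shingle size, and the 0.8 threshold are all hypothetical choices, and nothing here describes how Amazon actually screens submissions.

```python
import re


def shingles(text: str, k: int = 5) -> set:
    """Return the set of k-word sequences in text (lowercased, punctuation ignored)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Very short texts yield a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def titles_needing_review(submission: str, catalog: dict, threshold: float = 0.8) -> list:
    """Return catalog titles whose text overlaps heavily with the submission.

    The result is a list to hand to a human reviewer, not a block list.
    `catalog` maps title -> full text (a hypothetical in-memory stand-in
    for the store's existing database).
    """
    sub = shingles(submission)
    return [
        title
        for title, text in catalog.items()
        if jaccard(sub, shingles(text)) >= threshold
    ]
```

Comparing every submission against every stored text like this would not scale to a real catalog; the point is only that flagging overlap for a second look is a comparison problem, not a judgment about who infringes.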

Filed Under: ebooks, kindle, spam
Companies: amazon

26 Comments

Reader Comments

Anonymous Coward, 20 Jun 2011 @ 10:07pm

Calling Mike Masnick.... Clean up needed in the "small-businesses-are-the-backbonehead dept".
- Anonymous Coward, 20 Jun 2011 @ 10:16pm
  
  Re:
  Yes, because only big business is a good thing.
Anonymous Coward, 20 Jun 2011 @ 10:47pm

It seems that this is the start of the problem of signal to noise. More and more noise, drowning out whatever signal is left.
- Chargone (profile), 20 Jun 2011 @ 11:03pm
  
  Re:
  you say that as if the absolute quantity of signal is reducing. it's not.
  
  also: that's what filters are for. give each entry a bunch of filter tags for various things and let the person looking for stuff browse by filter category as well as searching. or even search within a category. in a lot of contexts good categorisation is more useful than a search engine for finding what you want. especially when you've got a less than complete understanding of what that Is.
  - Michael Long (profile), 21 Jun 2011 @ 12:38am
    
    Re: Re:
    That's why I think Amazon should require a $99 "publishing fee" per book.
    
    In fact, requiring a fee would be the "first filter". If you don't think a book is good enough to earn back $99, it's probably not good enough to be on the store in the first place.
    
    Some guy has spammed the store with over 8,000 titles ripped off from hither and yon and selling for a buck a head. Would he have still done that if doing so would have cost him nearly $800,000 up front?
    - Jake, 21 Jun 2011 @ 8:33am
      
      Re: Re: Re:
      $99 is a bit on the high side, unless it were to take the form of a security deposit to be forfeit if your ebook had to be removed from their listings for spam or some other offence, but the basic idea seems sensible.
    - Atkray (profile), 21 Jun 2011 @ 9:09am
      
      Re: Re: Re:
      While I don't agree with someone spamming 8000 titles, I find myself back at square one with spam: if it didn't work, people wouldn't do it.
      
      Just looking at the raw numbers, 8000 titles at a buck apiece seems like it could provide a pretty decent revenue stream, even at a low percentage.
  - Anonymous Coward, 21 Jun 2011 @ 8:55am
    
    Re: Re:
    You said: "you say that as if the absolute quantity of signal is reducing. it's not."
    
    Me: I don't think so. It is a question of ratio. If you have 100 good works, and 10 bad ones, your signal to noise ratio is high and you have no problems. But if that shifts to 100 good works and 1000 bad ones, the chance that you find the signal (same level as before) is very low.
    
    In a world where anyone can publish anything at any time with little or no real cost, you will get more noise. Spammers, jammers, and scammers will figure out how to make money in the noise, and the noise increases.
    
    One only has to look at the number of twitter bots, automated posters, automated follow bots, and auto-retweeters to see where the noise comes from.
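To put numbers on the ratio argument above, here is a small sketch. It assumes a reader picks a title uniformly at random from everything listed, which is a simplification of real browsing and not something the comment claims; `signal_share` is a made-up helper name.

```python
from fractions import Fraction


def signal_share(good: int, bad: int) -> Fraction:
    """Fraction of all listed titles that are 'signal' (good works)."""
    return Fraction(good, good + bad)


before = signal_share(100, 10)    # 100 good works among 110 titles
after = signal_share(100, 1000)   # the same 100 good works among 1100 titles

print(f"before: {float(before):.1%} of titles are signal")
print(f"after:  {float(after):.1%} of titles are signal")
print(f"random picks hit signal {before / after}x less often")
```

Under that assumption the amount of signal is unchanged at 100 works, but the share drops from 10/11 to 1/11, so an unfiltered pick is ten times less likely to land on it. That is the gap a filter has to close.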
- Anonymous Coward, 20 Jun 2011 @ 11:12pm
  
  Re:
  One man's noise is another man's signal.
- Anonymous Coward, 21 Jun 2011 @ 4:44am
  
  Re:
  Oooh... engineer talk. Cool.
Bruce Partington, 20 Jun 2011 @ 11:06pm

"I'm a bit surprised that Amazon doesn't do a basic check to make sure the content of an ebook hasn't already been offered by someone else, and do a further investigation if that's the case."

Hmmm, so you expect Amazon to have a searchable list of all ebook content that submissions are compared to. But you regularly ridicule people who expect YouTube or torrent sites to have the same thing for copyrighted materials.
- Anonymous Coward, 20 Jun 2011 @ 11:11pm
  
  Re:
  Boom
  
  Head shot
  - The eejit (profile), 20 Jun 2011 @ 11:36pm
    
    Re: Re:
    And as for you, you spineless Coward, got anything productive to add to this discussion, or would you rather play the Taliban in CoD: MW2?
    
    Oh, wait, YOU CAN'T.
- Anonymous Coward, 20 Jun 2011 @ 11:28pm
  
  Re:
  "Hmmm, so you expect Amazon to have a searchable list of all ebook content that submissions are compared to."
  
  He doesn't expect it to be a government imposed requirement, just a good business model suggestion. Also, Youtube already does such basic checks that attempt to identify infringing materials, but there are almost always ways around these checks. MM suggests that Amazon should do more to reduce plagiarism, not that it will ever be stopped completely.
  
  "But you regularly ridicule people who expect YouTube or torrent sites to have the same thing for copyrighted materials."
  
  Maybe copyright shouldn't exist to begin with.
- The eejit (profile), 20 Jun 2011 @ 11:35pm
  
  Re:
  You missed a spot:
  
  "do a further investigation".
  
  In Viacom v. YouTube, Viacom was expecting all their content to be removed without investigation.
  
  Can you spot the difference between the two?
  
  I do agree, however, that it would be silly to police on the behalf of content 'creators', as that would be a dangerous precedent (if not legally, then rationally).
- Anonymous Coward, 21 Jun 2011 @ 12:11am
  
  Re:
  Amazon is perfectly in the right to allow spam books to be released and indexed en masse. It's a business choice if they pre-police their content.
  
  YouTube is also perfectly in the right if they choose not to police their uploads either, because it is ALSO a business choice to pre-police their content.
  
  The problem here, that you don't seem to understand, is that no one has the right to demand that Amazon or YouTube make the choice one way or another. However, if the problem really is so bad for Amazon that it drives customers away, then it is a problem that a smart business would address.
- Mike Masnick (profile), 21 Jun 2011 @ 1:23am
  
  Re:
  "Hmmm, so you expect Amazon to have a searchable list of all ebook content that submissions are compared to. But you regularly ridicule people who expect YouTube or torrent sites to have the same thing for copyrighted materials."
  
  No. I expect Amazon to have the database of existing ebooks in its store, because it already has that. That's all.
  
  And then I'm not expecting them to compare for copyright issues and automatically block, but as I said (I thought clearly, but perhaps not), all it should trigger is further investigation.
  
  Not sure where the confusion comes in. Apologies if I wasn't clear.
  - Anonymous Coward, 21 Jun 2011 @ 5:30am
    
    Re: Re:
    "I expect Amazon to have the database of existing ebooks in its store"
    
    Exactly, there is a difference between knowing whether you have duplicate information and knowing what information does and doesn't infringe. It's much easier to know that you have duplicate information than it is to magically know that some segment of information infringes. The latter requires a psychic, the former simply requires some comparison tools.
- Anonymous Coward, 21 Jun 2011 @ 6:04am
  
  Re:
  Unsurprisingly, you seem to miss the point entirely.
  
  Implementing a filter to improve search results for customers is a business decision. It's not about copyright - just profits. I find it hard to believe Mike seriously thinks Amazon should do any sort of policing over the content itself, only that they should verify whether the content in question is indeed what it's labeled as. If for no other reason than to avoid false advertising repercussions.
  
  If you're trying to find an e-book and you get 40 false results for every positive it's going to be pretty annoying pretty quick to look for what you want. Find a different vendor who has only positive results and you'll probably just shop from there instead and give them any future business.
  
  The biggest question is: how much money is Amazon making off the sale of all the spam? I would have to guess not very much, but if it's somehow generating revenue for them then no point shutting it down.
  
  Asking third-party aggregate sites such as youtube or torrents to police and enforce copyright law is first and foremost granting them too much authority to declare what is or is not infringing. Second, the cost of implementing any form of workforce to go over the amount of data being uploaded to these sites would make it impossible for any sustainable service of the sort.
  
  You cannot reasonably scrutinize thousands of terabytes of data without creating digitized signatures of the files that are being uploaded. If the file has a copyright on it then you are thereby violating that copyright by using an unauthorized copy of the work in your filtering software without express written permission by the content holder.
  
  Either way - it's all a matter of profits. Do whatever makes you the most.
- Griff (profile), 21 Jun 2011 @ 4:30pm
  
  Re:
  I thought he was saying just check the content hasn't already been uploaded once before.
  No need for an "official list". Just hash the content of every book at upload and block any whose hash is not unique.
  
  You can be pretty sure that among these 1000's of PLR books there will be a lot of duplication.
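The hash-at-upload idea above can be sketched in a few lines. This is a minimal illustration, assuming an in-memory store and SHA-256 as the hash; the `UploadRegistry` class and its `submit` method are made-up names, not any real Amazon interface.

```python
import hashlib


class UploadRegistry:
    """Remembers a digest of every accepted upload and rejects exact repeats."""

    def __init__(self):
        # Maps hex digest -> title of the first upload with that content.
        self._seen = {}

    @staticmethod
    def fingerprint(content: bytes) -> str:
        """SHA-256 digest of the raw book bytes, as a hex string."""
        return hashlib.sha256(content).hexdigest()

    def submit(self, title: str, content: bytes) -> bool:
        """Return True if the upload is accepted, False if its hash was seen before."""
        digest = self.fingerprint(content)
        if digest in self._seen:
            return False
        self._seen[digest] = title
        return True


registry = UploadRegistry()
print(registry.submit("Original", b"It was a dark and stormy night."))   # accepted
print(registry.submit("Rebranded", b"It was a dark and stormy night."))  # rejected
```

An exact hash only catches byte-for-byte copies. Changing a single character produces a different digest, so a lightly reformatted PLR book would pass this check.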
Anonymous Coward, 21 Jun 2011 @ 1:13am

"In fact, I'm a bit surprised that Amazon doesn't do a basic check to make sure the content of an ebook hasn't already been offered by someone else, and do a further investigation if that's the case."

Why should someone have some exclusive right to publish a book just because they wrote it? If you wrote a book and someone does a better job of publishing it than you, why do you deserve to make any money from it at all? Execution matters, not the idea. Besides which, if someone sees your book published by someone else and likes it, and you can't figure out how to make money that way, whose fault is that? They are giving you free promotion by selling your book for you. Get with the times, copyright maximalists!
- Anonymous Coward, 21 Jun 2011 @ 7:03am
  
  Re:
  Says the man who has obviously never written anything, but has stolen much.
Anonymous Coward, 21 Jun 2011 @ 3:22am

This is not a new problem...
...all that's new is that the press is now reporting it. Those of us who work in the anti-spam arena have known about it for quite some time, have alerted Amazon, and quietly provided them with some free consulting advice on how to put a stop to it. It's unfortunate that they haven't used that advice, doubly so given that it comes from people vastly more experienced than anyone on their staff, but that's their choice.

Unsurprisingly, the same scumbags who are engaged in this are also engaged in other abuse: link farming, content farming, SEO, Usenet spam, email spam, IM spam, text spam, etc. And one of their current strategies seems to be to combine these modalities into integrated "campaigns" designed to annoy as many people as possible.
Gene Cavanaugh (profile), 21 Jun 2011 @ 9:58am

Amazon filtering based on content
WAIT! Please explain. Having search engines do this sort of filtering is wrong, they aren't responsible for content; but requiring it (or encouraging, semantics again) from Amazon is reasonable?
I get the impression that we are back to the famous "I don't know how to distinguish it, but I know it when I see it" standard (not an exact quote, I am sure).
Wouldn't it be nice if we had consistency? Too much to ask, I assume.
- Chris Rhodes (profile), 21 Jun 2011 @ 11:01am
  
  Re: Amazon filtering based on content
  Did he say require? I took his statement to mean it would make good business sense for Amazon to root out poor quality items.
Joel, 20 Apr 2013 @ 4:56pm

As a legitimate author, I can attest to the millions of crap books that proliferate in the marketplace. Spammers tend to ruin everything that is internet related. Look at Pinterest, Facebook and any number of social sites.

The problem is that it is like sticking your finger in a dam. Another hole quickly opens up. I applaud Amazon for tightening up their publishing standards and at least removing the PLR books people were submitting. But they are running uphill against this epidemic....