The Warehousing And Delivery Of Digital Goods? Nearly Free, Pretty Easy, Mostly Trivial
from the relax,-be-happy dept
One of the most important moments in the rise of a radical idea is when the fightback begins, because it signals an acceptance by the establishment that the challenger is a real threat. That moment has certainly arrived for open access, most obviously through moves like the Research Works Act, which would have cut off open access to research funded by the US government. That attack soon stalled, but the sniping at open access and its underlying model of free distribution has continued.
Here, for example, is an interesting post by Kent Anderson, who is CEO/Publisher of the Journal of Bone & Joint Surgery, with the title "Not Free, Not Easy, Not Trivial -- The Warehousing and Delivery of Digital Goods." The starting point is as follows:
There is a persistent conceit stemming from the IT arrogance we continue to see around us, but it's one that most IT professionals are finding real problems with -- the notion that storing and distributing digital goods is a trivial, simple matter, adds nothing to their cost, and can be effectively done by amateurs.
As a result, he thinks, there is "a consistent theme among dew-eyed idealists about publishing -- that digital goods are infinitely reproducible at no marginal cost, and therefore can be priced at the rock-bottom price of 'free'."
Well, they're certainly "infinitely reproducible", but nobody seriously claims that can be done at zero marginal cost; the claim is simply that the marginal cost is extremely small. Indeed, in another post, Anderson himself provides a rough estimate for one part of that cost -- the online delivery of a 1 Mbyte file: $0.001. It's true that delivering millions of copies would add up to a more significant sum, but that ignores things like BitTorrent, which effectively shares the cost of distributing digital goods among the many downloaders. Using such P2P delivery systems, the cost to the publisher really is vanishingly small.
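To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The per-megabyte price is Anderson's own estimate quoted above; the download counts and the fraction of traffic carried by peers are invented purely for illustration.

```python
# Back-of-the-envelope delivery costs, using the ~$0.001 per 1 Mbyte figure
# cited above. Download counts and the peer-sharing fraction are illustrative
# assumptions, not figures from either post.

COST_PER_MBYTE = 0.001  # dollars; Anderson's rough estimate for online delivery


def delivery_cost(file_size_mbytes, downloads, fraction_served_by_publisher=1.0):
    """Cost to the publisher of serving `downloads` copies of a file.

    `fraction_served_by_publisher` models P2P distribution such as BitTorrent:
    if peers re-share most of the traffic, the publisher pays only for the
    fraction it serves directly (the initial seeding, say).
    """
    return file_size_mbytes * downloads * fraction_served_by_publisher * COST_PER_MBYTE


# A 1 Mbyte article served directly to a million readers:
print(delivery_cost(1, 1_000_000))        # ~$1,000
# The same million downloads with peers carrying, say, 99% of the traffic:
print(delivery_cost(1, 1_000_000, 0.01))  # ~$10
```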
But Anderson thinks there are other issues:
Even beyond just their power requirements, digital goods have particular traits that make them difficult to store effectively, challenging to distribute well, and much more effective when handled by paid professionals.
Why might that be?
First, digital goods are not intangible. They occupy physical space, be that on a hard drive, on flash memory, or during transmission. A full Kindle weighs an attogram more when fully loaded with digital goods, and there are hundreds of thousands of Kindles in the field.
According to the source referenced, "the difference between an empty e-reader and a full one is just one attogram" -- a million-trillionth of a gram. Even with "hundreds of thousands of Kindles in the field," that extra fraction of a gram spread around the world is hardly going to be a major problem. But leaving aside the issue of weight, it's certainly true that this data takes up space on storage media:
The proliferation of digital goods -- photos, music, Web pages, blog posts, social media shares, tweets, ratings, movies and videos, and so much more -- puts incredible and growing pressure on metadata management techniques and layers. This means building more and larger warehouses, which adds to both ongoing costs for current users and migration costs as older warehouses are outstripped by new demands. Megabytes become gigabytes become terabytes become zettabytes and beyond. Where will they all fit?
One answer is "in your pocket": according to Amazon, a 1 terabyte portable hard disc currently costs around $100. Yes, a zettabyte might be a little more pricey, but judging by this recent large-scale, real-life project, we're still in the sub-petabyte era, so storing all this data isn't really going to require a warehouse -- a few rack systems should suffice.
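Again, some rough Python arithmetic makes the point. The ~$100-per-terabyte figure is the consumer price mentioned above, and the corpus sizes are illustrative assumptions rather than measurements.

```python
# Rough storage-cost arithmetic using the ~$100 per terabyte consumer price
# quoted above. The corpus sizes are illustrative assumptions.

DOLLARS_PER_TB = 100  # roughly the price of a 1 TB portable drive


def storage_cost(size_tb):
    """Naive hardware cost of holding `size_tb` terabytes on commodity discs."""
    return size_tb * DOLLARS_PER_TB


print(storage_cost(1))              # one portable drive: ~$100
print(storage_cost(500))            # a sub-petabyte archive: ~$50,000 in raw discs
print(storage_cost(1_000_000_000))  # a full zettabyte: ~$100 billion; nobody is anywhere near that yet
```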
But regardless of where you are going to put it, another question arises: where is all that important metadata going to come from? As Anderson rightly says:
Creating, updating, and tracking the metadata is a chore for owners of digital goods. Poor metadata -- like a photo name off your digital camera of DX0023 -- can make the photo hard to find or use. Better metadata -- usually applied by humans, like "Rose in bloom, August 2006" for that elusive photo -- makes more sense.
That's mostly true, most of the time. But in another paragraph, quoting from a description of the Library of Congress's effort to archive all Twitter messages since 2006, Anderson also shows us why metadata is not always an issue:
Each tweet is a JSON file, containing an immense amount of metadata in addition to the contents of the tweet itself: date and time, number of followers, account creation date, geodata, and so on.
That is, the data comes with "an immense amount of metadata" automatically, because of the way Twitter (wisely) designed its system. And even for datasets that require metadata to be applied by hand, crowdsourcing is proving an efficient and low-cost way of providing it.
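To see what "comes with the data" looks like in practice, here is a small Python sketch that reads a tweet-like JSON record and pulls out some of that metadata. The field names below are simplified stand-ins, not the exact Twitter API schema.

```python
import json

# A simplified stand-in for the kind of JSON record described above; the field
# names are illustrative rather than the exact Twitter API schema.
raw = """{
  "text": "Reading about open access.",
  "created_at": "2012-05-07T09:30:00Z",
  "user": {"screen_name": "example", "followers_count": 42,
           "created_at": "2006-07-15T00:00:00Z"},
  "coordinates": [-0.1275, 51.5072]
}"""

tweet = json.loads(raw)

# The metadata arrives alongside the content, with no human cataloguer required:
print(tweet["created_at"])               # date and time
print(tweet["user"]["followers_count"])  # number of followers
print(tweet["user"]["created_at"])       # account creation date
print(tweet["coordinates"])              # geodata
```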
Other issues raised by Anderson are that digital goods need to be backed up and kept secure, but that's hardly rocket science: open source solutions that cost nothing to acquire (though not, of course, to run) have been around for years. His main concern, however, seems to be the physical infrastructure required:
Digital warehouses are more expensive to build. Site planning is a major undertaking. A physical warehouse is something a small business owner can buy and construct with relative ease. They aren’t expensive (a concrete pad, a sheet metal structure, some crude HVAC, and a security system is usually all it takes). A digital warehouse is expensive to construct -- servers, site planning, redundant power requirements, high-grade HVAC, earthquake-proofing, and so forth. This means that digital goods have to work off a much higher fixed warehouse cost.
It seems unlikely that building a typical physical warehouse is cheaper than installing a standard LAMP stack on rented commodity servers in a few different geographical locations (or in the cloud) to provide resiliency and backups. This exposes the central problem with Anderson's argument about the amount of data that must be handled, and the huge and expensive infrastructure supposedly needed to handle it: he seems to be lumping together very different kinds of digital data, as the passage quoted below illustrates.
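Before getting to that passage, it's worth making the first point concrete. What follows is a minimal sketch of the server-side machinery a small publisher actually needs in order to start distributing files from a rented box: nothing beyond Python's standard library. The port number is an arbitrary illustrative choice, and in practice you would run identical copies in a couple of locations, or put a proper web server or CDN in front, to get the resiliency mentioned above.

```python
# A minimal "digital warehouse": Python's built-in static file server, run
# from the directory holding the files to be distributed. The port is an
# illustrative choice; real deployments would be mirrored in more than one
# location or sit behind a proper web server or CDN.
from http.server import ThreadingHTTPServer, SimpleHTTPRequestHandler

PORT = 8000

if __name__ == "__main__":
    server = ThreadingHTTPServer(("0.0.0.0", PORT), SimpleHTTPRequestHandler)
    print(f"Serving the current directory on port {PORT}")
    server.serve_forever()
```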
In the realm of digital goods, we’re reaching a point at which we’re facing trade-offs. Already, some data sets are propagating at a rate that exceeds Moore’s Law, which may still accurately predict our ability to expand capacity. And these are purposeful data sets. As data becomes an effect of just living -- traffic monitoring software, GPS outputs, tweets, reviews, star ratings, emails, blog posts, song recommendations, text messages -- we as a collective will easily outstrip Moore’s Law with our data. If there’s no place to put it, and nobody to manage it, does it exist?
Yes, genomic data is spewing out of DNA sequencers at an incredible rate; yes, the Large Hadron Collider produces almost unimaginable quantities of data. But these are exceptions: nobody is talking about letting the general public access this stuff in the same way that they can download media files, say. As I've pointed out in a previous post, we are fast approaching the point where we could store every Spotify track on a single hard disc, and the same will soon be true for every film, book -- and academic article.
For the latter, despite Anderson's title, it really is the case that storing and sharing them is nearly free, pretty easy and mostly trivial, which is why open access makes sense and is constantly gaining ground. The sooner traditional publishers stop fearing and fighting this trend, the sooner they can embrace and enjoy the possibilities this new abundance opens up for them.
Follow me @glynmoody on Twitter or identi.ca, and on Google+
Filed Under: data, kent anderson, open access, publishing
Companies: amazon