Verizon Is Undermining Efforts To Archive Yahoo Groups...For No Coherent Reason
from the ill-communication dept
Verizon's often sad efforts to pivot from curmudgeonly old telco to sexy new Millennial advertising giant have not gone as the company had hoped. From the failure of its Go90 streaming service to its clumsy effort to turn AOL and Yahoo into a Facebook-killing ad empire, Verizon often can't get out of Verizon's way. The "consumer comes last" executive mindset of the government-pampered telecom monopoly is frequently reflected by its policies, like Verizon's decision to acquire Tumblr, ban one of the most compelling aspects of the service (adult content and art), then turn around and sell it at a massive loss.
When archivists tried to preserve a lot of the adult-themed art that Verizon was jettisoning, Verizon responded by banning archivist IP ranges for no coherent reason. Much like Facebook, Verizon positively adores looking at a controversial situation, then coming up with the worst possible policy and PR response. You know, like that time they hired a fake journalist to pretend the company wasn't trying to kill net neutrality.
Another case in point. Back in October, Verizon and Yahoo informed users of Yahoo Groups that the 20-year-old community would be shut down on December 14. Archivists set about trying to catalog and store the decades of conversations, images, and content on the platform. But Verizon being Verizon, those archivists now say the company is actively undermining their efforts, including banning the email addresses Archive Team volunteers were using to archive content, and actively blocking tools used for the same purpose:
"Yahoo banned all the email addresses that the Archive Team volunteers had been using to join Yahoo Groups in order to download data. Verizon has also made it impossible for the Archive Team to continue using semi-automated scripts to join Yahoo Groups – which means each group must be re-joined one by one, an impossible task (redo the work of the past 4 weeks over the next 10 days).
On top of that, something Yahoo did has killed the last third party tool that users and owners have been using to access their messages, photos and files (PGOffline). Note: not everyone who paid for the PGOffline license is being impacted by the problem, but the developer does not have a workaround."
Under Section 230, Verizon faces no liability for the content shared on the platform, and there's no valid reason for them to be fighting back against archival efforts. Yet here we are. Verizon didn't respond to several requests for comment, so it's hard to understand what the telco is thinking, if it's thinking at all. I spoke briefly to Archive Team co-founder Jason Scott and Cory Doctorow, both of whom were less than impressed by the company's tone deafness:
"What they are doing is burning 20 years of history and archives maintained by communities with a non-functioning system for backing them up," he said. “They made no real preparations for users to pull the information out because companies like Yahoo! were never designed to allow information to leave their walled gardens."
“This is 20 years of communities, discussion and artifacts from millions of groups, all representing learned information, legal and historical references, and naturally, the conversations of tens of millions of users,” Scott said. “Some of it is likely worthless and some of it is likely precious. It is all being treated like trash."
It's one thing for Verizon to shutter the platform. It's another for Verizon to actively block harmless efforts to preserve 20 years of internet history ahead of the shutdown. But being a government-pampered monopoly in a largely non-competitive market has left Verizon ill-prepared to actually listen to the communities it impacts (especially when there's no money to be made by doing so), a major reason Verizon's pivot from telco to new media ad darling hasn't quite gone according to plan.
Updated: After Verizon's behavior resulted in some unwanted media attention, the company has finally changed its stance. It now tells me it has extended the deadline for the Yahoo Groups shutdown to Friday, January 31, 2020 at 11:59pm PST.
Filed Under: archives, blocks, digital history, groups, history, yahoo groups
Companies: archive team, internet archive, verizon, yahoo
Reader Comments
The reason
Do you know how expensive it is to transfer that much data out?
What are they going to do at Verizon if the C*O types can't have new shoes made for themselves this month?
[ link to this | view in chronology ]
How is already-collected data affected?
The linked story says "The Archive Team says they’re facing a loss of nearly 80 percent of the data they’ve collected so far". What? How can Verizon's actions cause them to lose data they've already collected? I can't imagine they'd be dumb enough to store it on a Yahoo service, which suggests the statement is simply wrong.
[ link to this | view in chronology ]
Re: How is already-collected data affected?
It's not that 80% of the data they've collected is lost. Here's the actual quote:
So, the data they've managed to download, they obviously still have. What they've lost is access to 80% of the Groups that they had joined as part of their effort to download that data. Which, I suppose, one could interpret as losing access to 80% of the data that they had gained access to.
The idea that what had been lost was 80% of the data already collected seems to stem from a misinterpretation somewhere along the line.
[ link to this | view in chronology ]
Re: Re: How is already-collected data affected?
Which still wouldn't count as "losing data". Thanks for the explanation. That means these 80% of groups aren't going to be completely archived, unless they find a copy elsewhere or work around the bans quickly enough.
It's an open question whether they'll be able to grab the remaining 20%—despite the extended deadline, the update doesn't say they'll stop fighting archivists.
[ link to this | view in chronology ]
That's nice, I guess. But as I read the rest of the article, an extension may not have been needed if they hadn't gone out and actively broken archiving. Even with the extension, if they don't back off on the other anti-archiving actions, it may still be impossible to save everything in time.
[ link to this | view in chronology ]
Re:
These were mostly public mailing lists. Yahoo could copy them all onto a hard drive or two, mail it to the Internet Archive directly, and avoid all this trouble.
[ link to this | view in chronology ]
Re: Just mail a copy
Yes, sort of. Apparently it was 8 PB in 2011. https://news.ycombinator.com/item?id=21274484
More realistically, they could sell the content for $1 to some competitor. Maybe a Sympa or Discourse commercial hosting provider with enough capital to shoulder the losses for a while in return for publicity? Like SmugMug with Flickr, they could then purge the most expensive groups (porn can probably be kept elsewhere) and keep 99.9% of the real content.
Or yes, they could just make a giant set of mbox files for the text content of all groups and be done with it in a short while. Then people could import them into their mail client (given Yahoo has said "don't worry, the archives remain in your email!"), mailman or whatever. Did nobody exercise their GDPR art. 20 right to data portability? It says "structured, commonly used and machine-readable format", can't think of anything but mbox. Yahoo is legally obliged to produce mbox files, as I read it.
[ link to this | view in chronology ]
Re: Re: Just mail a copy
Wow. I underestimated. Is that mostly text? It would be more like 1000 hard drives, which is less practical.
They've already got a group of people clamoring for the data. archive.org will host it (in some form, anyway).
It's interesting, but they'd only have to give each person their own data, right? (Emails sent by them, and maybe emails sent to them.) Perhaps if a large enough number of Europeans requested it, Yahoo would decide it's easier to dump all of the data elsewhere and reply with a form-letter showing how to find it.
[ link to this | view in chronology ]
Re: Re:
Why copy?
The hardware is already an antique cybernetically speaking.
It would be cheaper and easier to just decommission the racks and "donate" it all for a nice tax write-off and not pay any disposal fees.
[ link to this | view in chronology ]
Reaction to Verizon's extension
Here is the reaction of one of the people trying to archive these groups (TLDR: they're still not happy about it):
[ link to this | view in chronology ]
I thought the 'reason' being used was that Verizon couldn't afford the cost of storing the backups! Perhaps it needs to rip customers off further with an increase in fees. Or maybe plead poverty again so as to get another load of public money and not do what they're supposed to!
[ link to this | view in chronology ]
Re:
This is just Verizon being Verizon...they couldn't figure out how to make money off it so they decided nobody should have access to it.
[ link to this | view in chronology ]
Yahoo has always been bad for this
Even before Verizon became involved, Yahpoo! were the ones who bought Geocities for some ridiculously-high sum, only to shut it down and destroy everything. This has been their modus operandi for years. An apt comparison is that Yahoo is "a leaky sewerage pipe" because everything it touches becomes watered-down excrement.
Verizon is only making things worse - with obstructing archivists and censoring content being prime examples - but Yahoo was always bad.
[ link to this | view in chronology ]
Re: Yahoo has always been bad for this
That matches Archive Team's opinion:
[ link to this | view in chronology ]
Another step from another corporation in blackholing what used to be cool or just an older way of doing things on the net. They don't want anyone to know what the 90s and 00s were like. Probably also a measure of "we own it now, so no one else can have it".
[ link to this | view in chronology ]
Maybe they're afraid they're going to hit their cap with all that data going out the pipe.
[ link to this | view in chronology ]
Big deadline is still December 14
The January 31 deadline is for people saving "their own" content - public access is still limited to December 14.
[ link to this | view in chronology ]
I can't get out of my head how much hypocrisy is involved here: these dicks love to get into the bedrooms of the masses, then sneak porn in on coffee breaks.
[ link to this | view in chronology ]
Didn't they undermine the effort to archive the Tumblr content that was being removed as well?
[ link to this | view in chronology ]
Average Customer Being Misled
Amongst the chatter about archivists, a key fact is being overlooked. Verizon is misleading the average user. Their "extension" is only for the customers who use Verizon's official download tool, which does not provide any photos and, in many cases, is missing files. The only way Verizon says you can download your photos is to go one by one. Except...after Dec 14, there will be no way to log in to download your photos and files one by one. That is what the third party tool PGOffline was doing (this is a tool used by customers, not the Archive Team). So anyone who listens to what Verizon says will find themselves locked out from their data on Dec 14.
[ link to this | view in chronology ]
Re: Average Customer Being Misled
Our volunteer fire department just got "hosed" by this gotcha. We're reeling from the fact that all our files and photos are gone when we thought we had a month to transfer them.
[ link to this | view in chronology ]