FCC Releases All Net Neutrality Comments As Giant XML Files For Data Analysis

from the open-records dept

While it had trouble keeping its site up during times of intense commenting, the FCC's IT team is now working to make all the submitted comments on its "open internet" net neutrality proposals available to download in a bunch of XML files:
Because of the sheer number of comments and the great public interest in what they say, Chairman Wheeler has asked the FCC IT team to make the comments available to the public today in a series of six XML files, totaling over 1.4 GB of data – approximately two and half times the amount of plain-text data embodied in the Encyclopedia Britannica. The release of the comments as Open Data in this machine-readable format will allow researchers, journalists and others to analyze and create visualizations of the data so that the public and the FCC can discuss and learn from the comments we’ve received. Our hope is that these analyses will contribute to an even more informed and useful reply comment period, which ends on September 10. We will make available additional XML files covering reply comments after that date.
While the more cynical among you may see this as more of a statement on the rather weak capabilities of the FCC's current system for searching the submitted comments, it's still nice to see at least a move towards openness and transparency in sharing this data for others to search through. As we've noted, we've been digging into some of the data on the comments, and hopefully this will make the process much easier.
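For anyone planning to dig through the files, here's a rough sketch of one way to start in Python. It is not the FCC's tooling or anything we've run against the real data. It streams through a file one record at a time rather than loading the whole thing into memory, and counts how many comments contain a given phrase. The schema isn't described here, so the element names `comment` and `text` are hypothetical placeholders (as is the function name `count_phrase`); check the actual files and adjust them.

```python
import io
import xml.etree.ElementTree as ET


def count_phrase(source, phrase, record_tag="comment", text_tag="text"):
    """Stream an XML file and count records whose text contains `phrase`.

    `record_tag` and `text_tag` are hypothetical element names, not the
    confirmed FCC schema; adjust them to match the real files.
    `source` may be a file path or a binary file object.
    Returns (matching_records, total_records).
    """
    needle = phrase.lower()
    matches = 0
    total = 0
    # iterparse yields each element once its closing tag has been read,
    # so records can be handled one at a time as the file is scanned.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != record_tag:
            continue
        total += 1
        body = elem.findtext(text_tag, default="")
        if needle in body.lower():
            matches += 1
        # Release this record's text and children once it has been counted.
        elem.clear()
    return matches, total


# Tiny made-up sample standing in for a real file.
sample = io.BytesIO(
    b"<comments>"
    b"<comment><text>Net neutrality is vitally important to me.</text></comment>"
    b"<comment><text>Please reclassify ISPs as common carriers.</text></comment>"
    b"</comments>"
)
print(count_phrase(sample, "net neutrality"))
```

On a real file you would pass the path instead of the in-memory sample. The match is a case-insensitive substring test, so it will catch copy-and-pasted form letters but says nothing about what a commenter actually argued.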

Filed Under: comments, fcc, net neutrality, nprm, open data, open internet, xml


Reader Comments



  • Anonymous Anonymous Coward, 5 Aug 2014 @ 8:39pm

    Where's the...

    ...magnet link? BitTorrent would be the way to distribute those......oh, wait!


  • Shmerl, 5 Aug 2014 @ 8:59pm

    Since it's government public data, it should be in the public domain. So in theory no one should stop anyone from taking and redistributing these any way they want.


    • Mike Masnick, 6 Aug 2014 @ 5:06am

      Re:

      "Since it's government public data, it should be in public domain. So in theory no one should stop anyone from taking and redistributing these anyway they want."

      That's not accurate. The rules only apply to works produced *by government employees*. That is not the case here. Any copyrights remain with the creators of the content.


      • Arthur Moore, 6 Aug 2014 @ 6:04am

        EULA?

        Do you know what the terms and conditions were for submitted comments?

        I'm guessing it was probably something like "Grant the FCC a perpetual license to copy and display this comment." However, I don't know for sure. I figure you all would have picked up on it if there was a copyright assignment clause, though.

        I hope none of the people analyzing this data work for places with legal departments who don't understand fair use, especially since that's the only way that anyone reporting on this can directly quote any of the comments.

        Another little aside is that while it might be legal to download these huge files to your personal PC, it's almost certainly illegal to give a copy of them to anyone else. They have to go use up FCC bandwidth by obtaining the files from an "authorized source".


      • Anonymous Coward, 6 Aug 2014 @ 12:45pm

        Re: Re:

        I'm pretty sure that correspondence with a government entity is in the public domain unless it hits one of the exclusion cases. This information does not.


      • Shmerl, 6 Aug 2014 @ 3:33pm

        Re: Re:

        I guess then it can depend on the conditions of the submission. If, for example, they include relinquishing rights into the public domain, then the whole archive is in the public domain. For example, contributing to Wikipedia puts the contribution under a Creative Commons license, and so on.


  • Anonymous Coward, 6 Aug 2014 @ 12:03am

    What's an "Encyclopedia Britannica"?


  • Anonymous Coward, 6 Aug 2014 @ 12:30am

    Might have to grab this one... I lost the manual for my washing machine.


  • MrTroy, 6 Aug 2014 @ 1:30am

    A million monkeys typing on a million keyboards...

    ... though not particularly randomly.

    I wonder why they compare the size of XML marked-up data to raw text? I suspect the submission data would drop by up to half of its size if displayed as raw text. Also, boo to offering XML files for download without compression! They serve their web pages gzip-encoded (compressed), why not their XML files? Or at least pre-compressed versions... the 5th file zips from 100M down to 5.6M, and 7zip takes it down to 3.5M! Anyway, I guess I wouldn't expect the FCC to know anything about the internet by now.

    Also, I like this from their release:
    Finally, we hope that whatever visualizations are developed using this open data will comply with the standards that allow use and access by differently-abled individuals. The Chairman and the FCC CIO are committed to ensuring accessible web content in multiple forms for all.

    While you're helping us to do our work, would you mind conforming to the standards that we have to adhere to?

    Joking aside, I'm glad they're making this available to everyone. Even if we were to trust that they know exactly what they're doing, if they were doing all the work themselves they'd only analyse the submissions in whatever ways they can both imagine and implement in time. Giving this to the internet lets them benefit from novel ways of parsing the data that they might not have thought about, and it puts the onus on everyone else to try to justify whatever they claim to find in the data.

    If the FCC were to make all of the findings itself without making the data public, anyone who disagreed with the result would simply say that the FCC was selectively parsing the results. Now, whenever anyone tries to make any claims from the data, everyone else will be able to verify those claims... and if someone tries to make a claim without saying how they came to that conclusion, then that will be worth about as much as 1.5 GB of uncompressed XML text.


    • Anonymous Coward, 6 Aug 2014 @ 6:35am

      Re: A million monkeys typing on a million keyboards...

      Unless they added a slew of comments from their corporate masters afterwards.


  • Violynne, 6 Aug 2014 @ 4:06am

    I was disappointed to see the XML files don't include IP addresses, though it's not surprising given the comment submission form didn't exactly demand users verify themselves.

    This means separation by city and state. It also means a possibility of gross inaccuracies regarding the data. Geez, this could get bad.

    Oops, my cynicism is showing itself again. I suppose I could take the data at face value. Though, it'll be difficult to determine a margin of error without the IP addresses.

    Don't take this as a fact, because I've not counted the responses in the actual files yet, but cursory scans seem to show a majority favoring classification as common carrier.

    In addition to this, the commentary also seems to be sparse, as though people simply voted without leaving comment.

    Once I do this, I'll sit back and wait for others to post the results so I can determine who's lying and who's honest.

    The FCC did good by releasing these files.


  • beech, 6 Aug 2014 @ 5:31am

    Obama is slipping

    The FEDERAL Communications Commission is releasing information unredacted without a FOIA request being filled? That's not the way this administration is supposed to work...


    • Killer_Tofu, 7 Aug 2014 @ 1:09pm

      Re: Obama is slipping

      That is the way all American (and hopefully other) administrations are supposed to work. They just happen to behave in as opposite a way as possible usually.


  • Anonymous Coward, 6 Aug 2014 @ 5:48am

    Well, at least they didn't release it in Lotus Notes only filetypes.


  • Anonymous Coward, 6 Aug 2014 @ 6:46am

    they used this huge file size to slow the dl process


  • JWW, 6 Aug 2014 @ 7:44am

    My guess

    My guess is that data analysis will reveal that commenters favored the FCC going common carrier or implementing strong net neutrality rules by something like 90% to 10%.

    Then the FCC will totally ignore the will of the people and implement "fast" (and by default, slow) lanes anyway.


  • David, 6 Aug 2014 @ 8:03am

    Downloaded it...

    And just from a quick look, there are lots of entries in which the respondent copied and pasted the "Net neutrality is the First Amendment of the Internet, the principle that Internet service providers (ISPs) treat all data equally. As an Internet user, net neutrality is vitally important to me. ..." text. So someone's campaign appeared to work!


    • David, 6 Aug 2014 @ 8:06am

      Re: Downloaded it...

      And a quick search shows that 111,651 of the ~447K comments included that first sentence as part of their text.


  • TRL, 6 Aug 2014 @ 10:20am

    I came across this article after searching for why my comments wouldn't load on the FCC site.

    I just came from trying to make a comment on the FCC comment site. Despite it being made for lawyers and law firms and not regular people, the site was still unable to take my comment, saying that it "could not add the text to the file" and, after I turned it into a PDF and submitted it through their Expert submission, "disk quota is full". No matter how big this XML file is, it doesn't represent half the comments people want to share about net neutrality.


  • Eli the Bearded, 6 Aug 2014 @ 2:12pm

    Emailed comments too?

    When the dishwasher manual was exposed here, the site did not appear to include comments that were sent to the email address, only comments posted to the web page. Does this dump have the email comments?


