Class Action Lawsuit Hopes To Hold GitHub Responsible For Hosting Data From Capital One Breach
from the into-the-breach dept
As soon as the Capital One breach was announced, you knew the lawsuits would follow. Handling the sensitive info of millions of people carelessly is guaranteed to net the handler a class-action lawsuit or two, but this one -- filed by law firm Tycko & Zavareei -- adds a new twist.
The 28-page lawsuit filed Thursday in the U.S. District Court for the Northern District of California asserted that GitHub "actively encourages (at least) friendly hacking."
It notes that the hacked Capital One information was posted online for months and alleges that the company violated state law by failing to remove the information. "GitHub had an obligation, under California law, to keep off (or to remove from) its site Social Security numbers and other Personal Information," the suit says.
Weird legal theory, but one that could possibly be stretched to target some of the $7.5 billion Microsoft paid to acquire GitHub. But it takes a lot of novel legal arguments to hold a third party responsible for content posted by a user, even if the content contained a ton of sensitive personal info.
The lawsuit [PDF] alleges GitHub knew about the contents of this posting since the middle of April, but did not remove it until the middle of July after being notified of its contents by another GitHub user. The theory the law firm is pushing is that GitHub was obligated to scan uploads for "sensitive info" and proactively remove third-party content. The lawsuit argues GitHub is more obligated than most because (gasp!) it encourages hacking and hackers.
GitHub knew or should have known that obviously hacked data had been posted to GitHub.com. Indeed, GitHub actively encourages (at least) friendly hacking as evidenced by, inter alia, GitHub.com’s “Awesome Hacking” page
GitHub had an obligation, under California law, to keep off (or to remove from) its site Social Security numbers and other Personal Information.
Further, pursuant to established industry standards, GitHub had an obligation to keep off (or to remove from) its site Social Security numbers and other Personal Information.
The "industry standards" the lawsuit references are voluntary moderation efforts engaged in by social media platforms. Certainly no platform would want to be known as the habitual host of exfiltrated credit card data. But removing offensive or plainly illegal content is one thing; removing strings of numbers from a site hosting an unusually large number of strings of numbers is quite another. The law firm feels this assertion helps its case. It probably doesn't.
Moreover, Social Security numbers are readily identifiable: they are nine digits in the XXX-XX-XXXX sequence. Individuals’ contact information such as addresses are similarly readily identifiable.
Thus, it is substantially easier to identify—and remove—such sensitive data. GitHub nonetheless chose not to.
Nine digits in a sequence. Oh, like phone numbers. And phone numbers tend to be found near addresses, especially when coders and developers are using GitHub as an offshoot of LinkedIn, posting their personal info for employers to find. Even long lists of personal info wouldn't necessarily be innately suspicious. Employers and recruiters looking for people with certain skills have probably compiled all of this freely-provided personal info for easy reference. It's not as easy to moderate content as the litigants believe.
But this belief, if backed by a judge, could add GitHub's money to the pool of damages. Things will get a lot more interesting once GitHub responds to unintentionally hilarious assertions like these:
GitHub knew or should have known that the Personal Information of Plaintiffs and the Class was sensitive information that is valuable to identity thieves and cyber criminals. GitHub also knew of the serious harms that could result through the wrongful disclosure of the Personal Information of Plaintiffs and the Class.
As an entity that not only allows for such sensitive information to be instantly, publicly displayed, but one that also arguably encourages it, GitHub is morally culpable, given the prominence of security breaches today, particularly in the financial industry.
Well, we'll see how "morally culpable" stands up in court, where "legally culpable" is the actual standard. GitHub will rely on Section 230 to be dismissed from this case and rightly so. The person responsible for posting sensitive data exfiltrated from Capital One is, unsurprisingly, the person who posted the sensitive data exfiltrated from Capital One. Capital One has a duty to protect the information it gathers from customers. A third-party site with hosting capabilities does not, and it's not nearly as easy to moderate and proactively remove content as this lawsuit says it is.
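The nine-digit objection above can be made concrete with a minimal sketch. It assumes a hypothetical filter of the kind the lawsuit seems to have in mind; the pattern, the function name and the sample lines are all invented for illustration and do not describe anything GitHub actually runs.

```python
import re

# Hypothetical naive filter: the dashed XXX-XX-XXXX shape, or any bare run
# of exactly nine digits. The lookarounds keep longer digit runs from matching.
SSN_LIKE = re.compile(r"(?<!\d)(?:\d{3}-\d{2}-\d{4}|\d{9})(?!\d)")


def flag_lines(lines):
    """Return the lines that the naive pattern would flag for removal."""
    return [line for line in lines if SSN_LIKE.search(line)]


# Invented sample lines of the sort found in ordinary code repositories.
samples = [
    "example_ssn = '123-45-6789'   # placeholder in a unit test",
    "TEST_ID = 123456789",
    "zip_plus_4 = '019081338'",
    "timeout_ms = 30000",
]

for line in flag_lines(samples):
    print("flagged:", line)
```

Three of the four sample lines get flagged, and none of them is a leaked Social Security number.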
Filed Under: class action, data breach
Companies: capital one, github
Reader Comments
How dare Github allow numeric digits to be used in code!
I think it would be safer if GitHub moderated every line of code and checked for copyright as well. How many infringing people are out there? We don't know until GitHub finds those nefarious codethiefs! And don't let them use numbers, those could be sensitive.
You know what... just to be safe... let's make Github just stop letting 3rd parties post to their site. Only employees of GitHub should be allowed to post code to that site, like the publishers that they are! That seems like a sensible solution to solve these rogue hackers from hacking things.
/s
[ link to this | view in chronology ]
Re: How dare Github allow numeric digits to be used in code!
You laugh but that is the end-game for those crying about 230 protection. End all user content because it might be libelous, or copyrighted, or offensive.
[ link to this | view in chronology ]
Or because it tells an uncomfortable truth.
[ link to this | view in chronology ]
Re: Re: How dare Github allow numeric digits to be used in code!
End all user content because it might be libelous, or copyrighted, or offensive, or interferes with their business model.
[ link to this | view in chronology ]
Re: Re: Re: How dare Github allow numeric digits to be used in c
End all user content because Citizens don't know what is best for themselves.
That is why I have to keep shooting these idiots!
Just imagine if they knew how to code, then I'd have to shoot their computers, phones, & cars just to keep 'em safe!
I can't afford that many bullets!
Help me out here! Get rid of 230!!!
[ link to this | view in chronology ]
*--****.
Take that techdirt! Now your millions will be mine!
[ link to this | view in chronology ]
Re:
Looks like TechDirt redacted sensitive info... see how easy it is? If TechDirt can do it, surely GitHub can. ;)
[ link to this | view in chronology ]
Re: Re:
I know right? I mean, I can type in my password and it'll turn into asterisks once the comment is posted. See?
Why can't GitHub do that?
[ link to this | view in chronology ]
Re: Re: Re:
[ link to this | view in chronology ]
Re: Re: Re: Re:
It works here with credit card numbers, but only if you include the full number, expiration date, and the three digit code on the back.
[ link to this | view in chronology ]
Re:
Nice Morse Code Postscript.
[ link to this | view in chronology ]
Why does this look like Capital One trying to get Microsoft to pay them more than they are about to lose in class action lawsuits?
[ link to this | view in chronology ]
If only lawyers feared repercussions for these sorts of money grabbing stunts...
[ link to this | view in chronology ]
What is the relevant law?
Is this true? If so, then GitHub might be in trouble here.
[ link to this | view in chronology ]
Re: What is the relevant law?
According to the article, GitHub obeyed that law and promptly removed it when informed.
They are suing for preemptive censoring of all GitHub data. Upload filters.
[ link to this | view in chronology ]
Re: Re: What is the relevant law?
Do you know exactly what law is being referenced? I'm interested in knowing how it's worded.
[ link to this | view in chronology ]
Re: Re: Re: What is the relevant law?
It doesn't matter because it's not a thing. First Amendment and Section 230 preclude Github from being liable in cases like this. As long as they removed the content once notified (which they did) they bear absolutely zero liability.
[ link to this | view in chronology ]
Re: Re: Re: Re: What is the relevant law?
If there's absolutely no basis then the lawsuit will be dismissed and there really isn't much point in Tim posting these types of stories.
[ link to this | view in chronology ]
Re: Re: Re: Re: Re: What is the relevant law?
This is the most likely outcome, yes.
I'm sorry, what? This is newsworthy for several reasons:
Given all that, why should he NOT report on these types of stories?
[ link to this | view in chronology ]
Re: Re: Re: Re: Re: What is the relevant law?
Reading the suit now. They say Github violated state law, CALIFORNIA CIVIL CODE § 1798.85
or otherwise make available to the general public.”
Some nice points:
They claim that GitHub failed to uphold their own terms of service. The TOS in question? "GitHub reserves the right to remove anyone at any time for any reason." So since GitHub has failed to remove Everyone, they haven't upheld their own TOS. Sounds legit!
Claims GitHub should have notified everyone involved about the breach. Clearly spurious - it wasn't a breach of GitHub data/users.
Claims GitHub should have just plain known that the data was there. Also, clearly silly.
Super silly - claims GitHub is in violation of Federal law regarding safe storage of personal data. Again, not their data!
Oh - violation of federal Wiretap laws! I guess "Throw all the charges at the wall, see what sticks" is what they are going for.
[ link to this | view in chronology ]
Re: Re: Re: Re: Re: Re: What is the relevant law?
The part I liked best from your quote:
Yeah, good luck with that.
[ link to this | view in chronology ]
Re: Re: Re: Re: Re: What is the relevant law?
I use github a lot. I find this story very relevant to why I continue reading Techdirt.
[ link to this | view in chronology ]
Re: Re: Re: What is the relevant law?
sounds like they are trying to apply UK law already (aka upload filters) which is NOT an actual standard in the USA
[ link to this | view in chronology ]
Re: Re: What is the relevant law?
Do you even want GitHub to have an upload filter? There could easily be legitimate code that uses Social Security numbers. For example, someone could post open source code showing how to filter Social Security numbers from a website. The sample number would be fake, but any filter that blocks SSN-shaped strings would likely block the upload anyway: if the sample number isn't in the code itself, it's probably in the comments.
[ link to this | view in chronology ]
Re: Re: Re: What is the relevant law?
Sure. I'd be okay with that. Everybody likes to imagine the extreme version of what that might look like and what the consequences would be, but that's probably not what would happen.
[ link to this | view in chronology ]
Re: Re: Re: Re: What is the relevant law?
Even when it is Microsoft controlling those filters, and with a chance to hamper software that competes with their products?
[ link to this | view in chronology ]
Re: Re: Re: Re: Re: What is the relevant law?
The fact that it's Microsoft is pretty much irrelevant. Microsoft also runs Azure, and LinkedIn, and Skype, and they code Office and OneDrive and all kinds of other things. The chances that they would go evil with GitHub and not any of the other things is pretty small.
[ link to this | view in chronology ]
Re: Re: Re: Re: Re: What is the relevant law?
Not to mention Microsoft has a shedload of stuff of their own on Github so I doubt they would want to mess with Github...
[ link to this | view in chronology ]
Re: Re: Re: Re: What is the relevant law?
Why? It's a code hosting website. How do you even BEGIN to make an upload filter for code?
As many others have pointed out, it's code, inevitably code is going to contain strings of numbers that match things like SSNs. For example, many programmers will define variables and test data that contain junk data such as "123456789" which would match those filters. As such it's impossible to filter it out without catching legitimate code. And it's not a "well the number of false positives will be small", no it's going to be ANY code with strings of digits, which is likely most code. That kind of a filter would likely render Github useless.
[ link to this | view in chronology ]
Re: Re: Re: Re: Re: What is the relevant law?
123456789 isn't a valid SSN.
[ link to this | view in chronology ]
Re: Re: Re: Re: Re: Re: What is the relevant law?
First off, how do you know?
Second off, the upload filter wouldn't care. It would see 9 digits and block it.
[ link to this | view in chronology ]
Re: Re: Re: Re: Re: Re: What is the relevant law?
It actually could be valid.
Wikipedia contains some good information about the structure of SSNs. The rules are fairly loose, and there's no check digit to ensure that it's not just a nonsense value.
The only public rules around SSNs are:
None of these rules would prevent 123-45-6789 from being issued.
Under the old issuing scheme (retired in June 2011), that number would be a completely valid SSN issued in New York. It would be area 123, group 45, serial 6789.
The newer scheme randomly generates numbers. It's unlikely that this number will be generated, but it's possible.
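As a rough illustration of how loose the structure is, here is a minimal sketch of a purely structural check. It assumes only the commonly documented SSA constraints (area is never 000, 666 or 900-999; group is never 00; serial is never 0000). The function name is made up, and passing the check says nothing about whether a number was ever issued.

```python
import re

# Accept both the dashed and the bare nine-digit spelling.
_SSN_SHAPE = re.compile(r"^(\d{3})-?(\d{2})-?(\d{4})$")


def is_structurally_plausible(ssn: str) -> bool:
    """Check only the structural rules; there is no check digit to verify."""
    match = _SSN_SHAPE.match(ssn)
    if not match:
        return False
    area, group, serial = (int(part) for part in match.groups())
    # Area numbers 000, 666 and 900-999 are not assigned.
    if area == 0 or area == 666 or area >= 900:
        return False
    # Group 00 and serial 0000 are not assigned.
    if group == 0 or serial == 0:
        return False
    return True


for candidate in ("123-45-6789", "000-12-3456", "666-12-3456", "912-34-5678"):
    print(candidate, is_structurally_plausible(candidate))
```

Under these assumptions 123-45-6789 passes, while the other three candidates are rejected by the area rule alone.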
[ link to this | view in chronology ]
Re: Re: Re: Re: Re: Re: Re: What is the relevant law?
The SSA website has a better description than Wikipedia.
123456789 isn't valid nor is 000000000, 111111111, 222222222, etc...
[ link to this | view in chronology ]
No, not like phone numbers. USA phone numbers have a 10 digits.
[ link to this | view in chronology ]
Re:
Correct me if I'm wrong, but I'm pretty sure github is not restricted to the US of A.
[ link to this | view in chronology ]
Re: Re:
What locale uses the "XXX-XX-XXXX" pattern for phone numbers? I can't find any offhand.
Don't get me wrong, I don't think Github should be held liable for third-party content. I just think Tim's justification that an SSN could be mistaken for a phone number is off-base.
[ link to this | view in chronology ]
Re: Re: Re:
Before filtering on that format, show that nowhere in the world uses it for some other purpose, like part numbers.
[ link to this | view in chronology ]
Re: Re: Re:
Many international telephone numbers will appear as 9 digits. The US doesn't have exclusive right to use a 9 digit sequence for something that can identify a person.
[ link to this | view in chronology ]
Re: Re: Re:
I think the problem is that you're assuming the format is relevant. It isn't.
SSN's are easily and often represented without those dashes. Phone numbers are also represented in many different formats. So the only way an SSN could be declared as obviously so is if you decided that any 9 digit number must be an SSN. This is clearly false.
[ link to this | view in chronology ]
Re: Re: Re: Re:
I'm not the one assuming that; that's the claim explicitly made in the lawsuit, as quoted in the article.
The filing says:
Tim responds:
Which, no, that's not what it says; it refers, explicitly, to "nine digits in the XXX-XX-XXXX sequence." That pattern does not, to the best of my knowledge, match any locale's phone number.
There are plenty of reasons to criticize the lawsuit's implication that Github should have used some kind of automated system to watch for SSNs, from false positives to the ease of circumventing such a filtering system. I don't disagree with Tim's overall point at all. I just disagree with his seeing the phrase "nine digits in the XXX-XX-XXXX sequence" and saying that could be a phone number. To the best of my knowledge, no, it couldn't.
[ link to this | view in chronology ]
Re: Re: Re: Re: Re:
There's a problem with filtering based on the dashes: it's incredibly easy to circumvent by simply removing the dashes, which is itself pretty easy to automate. Searching specifically for the xxx-xx-xxxx format is not a great plan, accordingly. A more general filter looking for just the dashes will run smack dab into the phone number issue.
If you're trying to specifically filter out SSNs, you'll be less easily circumvented by looking for strings of 9 numbers in a row - but that only makes sense in a context where you expect to be dealing with SSNs and have a specific need to filter it.
Even then, you're going to run into false positives - there are street addresses in the US of A that wind up with nine numbers in a row (44546 2500 Street, is the format I've seen) and it's all too easy for those to get snapped up in the problem.
Tim's calling out of phone numbers really isn't all that far off-base. What is incredibly off-base is the lawsuit's understanding of ... well, anything.
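The trade-off described above can be sketched in a few lines. Both filters are hypothetical, written only to show the two failure modes: a dash-specific pattern that is trivially bypassed, and a separator-stripping pattern that catches ordinary addresses.

```python
import re

# Filter A (hypothetical): only the exact dashed XXX-XX-XXXX shape.
DASHED = re.compile(r"(?<!\d)\d{3}-\d{2}-\d{4}(?!\d)")


def dashed_filter(text: str) -> bool:
    return bool(DASHED.search(text))


# Filter B (hypothetical): strip whitespace, dashes and dots first, then
# look for nine digits in a row anywhere in what is left.
def normalizing_filter(text: str) -> bool:
    squeezed = re.sub(r"[\s\-.]", "", text)
    return bool(re.search(r"\d{9}", squeezed))


# Removing or swapping the separators defeats filter A.
print(dashed_filter("123-45-6789"))   # True
print(dashed_filter("123 45 6789"))   # False
print(dashed_filter("123456789"))     # False

# Filter B closes that gap but also flags the street address format.
print(normalizing_filter("123 45 6789"))        # True
print(normalizing_filter("44546 2500 Street"))  # True
```

Tightening the pattern makes it easier to dodge; loosening it sweeps in more unrelated data.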
[ link to this | view in chronology ]
Re: Re: Re: Re: Re:
Store SSN's in any relevant database and you will find they are stored as 9 digit values, generally with no dashes or separations (the most efficient use of what used to be very limited space).
So while 9 digits in that specific format "may" resemble a SSN, that doesn't guarantee that it is a SSN and not some other type of number.
[ link to this | view in chronology ]
Re: Re: Re: Re: Re: Re:
When I took my introductory programming classes, we were taught that something like an SSN should always be stored as a string, since you wouldn't typically do any manipulation on that object. Storing as a number might save a bit of space, but ideally it should be a string and you can (probably should) include those dashes. Although plenty of devs will still use a number to save space or ensure consistent formatting.
The bigger issue IMO is that there's plenty of numbers which are designed to be compatible with SSNs. Penn State University student numbers are the biggest use case I've experienced, but that alone is probably tens or even hundreds of thousands of people/numbers. See, the software was originally designed to just identify students by SSN. But they'd use those numbers, for example, if a professor wanted to post test scores outside their office -- so students could check their score, could compare it to others, but couldn't easily see what another specific student scored. But they eventually realized that posting a big public list of SSNs wasn't a great idea, so they started generating new numbers. They're still formatted like SSNs though because the software and workflows were all designed to use SSNs. Technically the numbers they assign aren't valid (they start with 9) but I can't imagine they're the only ones with SSN clones as ID numbers.
[ link to this | view in chronology ]
No they're not, but I assume that's what Timmy was going for because the article was about the leakage of USA social security numbers. USA Social Security numbers are restricted to citizens of the USA.
That's why I said "USA" phone numbers have 10 digits (formerly wrote "a 10th digit", hence the "a 10 digits" typo in the original post of mine).
[ link to this | view in chronology ]
Re:
Which doesn't make a difference in terms of filtering data. That the stolen data is sourced from inside the US is irrelevant when the place it was posted to deals with data from multiple countries - upload filters will either be easily circumvented if they are geographically restricted (upload from overseas IP Address) or will have a massive false positive rate (9 digit phone numbers).
It's not just phone numbers. There are street addresses that wind up with nine digits in a row (44345 2500 street, par exemple). And if you decide to filter based on format alone (xxx-yy-zzzz) you set yourself to be easily bypassed by removing the dash format.
Additionally, a string of nine digits could easily crop up in code itself - and heaven forbid the code happens to deal with physical or mailing address formats. The lawsuit's assertions are made from the perspective of someone who doesn't know a goddamn thing about what they are stating should be easy.
[ link to this | view in chronology ]
Re: Re:
Way too much of this going around lately. And not just lawsuits but also politicians proposing stupid laws.
[ link to this | view in chronology ]
Re: Re:
Or hell, the GitHub project could be some code to specially handle social security numbers and have mock social security numbers in some unit test class!
[ link to this | view in chronology ]
Re: Re:
Relax. I was just trying to point out a mistake in the original article. We all know content filtering is hard. There's no reason why my original comment should have 8 replies.
If you disagree, mark it as insightful, as I incited lots of arguments (pun most definitely intended).
[ link to this | view in chronology ]
afaik - the social security number was not to be used for identification purposes other than that of the Social Security system.
Perhaps the method(s) for obtaining credit, loans, etc should be made more secure. If your business is the victim of fraud, I am in no way responsible for it just because some of my personal info was used.
[ link to this | view in chronology ]
123456789, now sue!
Twist, twist, twist, bother and grab money from everyone you can.
[ link to this | view in chronology ]
Re: 123456789, now sue!
I hold the copyright on that string of numbers.
[ link to this | view in chronology ]
Re: Re: 123456789, now sue!
Congratulations on using the US copyright system. Would you like one for the word California too?
[ link to this | view in chronology ]
ZIP Codes?
... and US ZIP codes (ZIP+4) seem to be 9 digits -- or could they be a way to sneak in SSNs along with people's addresses in data sets.
019-08-1338 vs 01908-1338 -- and the punctuation is not usually stored.
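A tiny sketch of that point, using the same nine digits from the example above. The helper names are made up; the only thing that differs between the two renderings is where the punctuation goes.

```python
def as_ssn(digits: str) -> str:
    """Render nine digits in the XXX-XX-XXXX shape."""
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"


def as_zip_plus_4(digits: str) -> str:
    """Render the same nine digits in the ZIP+4 shape."""
    return f"{digits[:5]}-{digits[5:]}"


stored = "019081338"  # stored without punctuation, as is common
print(as_ssn(stored))         # 019-08-1338
print(as_zip_plus_4(stored))  # 01908-1338
```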
[ link to this | view in chronology ]
Re: ZIP Codes?
And zip codes with the 4 digit extension are very revealing as they are very specific to your physical location. My 4 digit zip code extension identifies not only the building I live in, it also indicates on which floor in my building my unit is located.
[ link to this | view in chronology ]
Fun fact: since Social Security numbers are only nine digits long, there are at most 1,000,000,000 possible numbers. At one byte per digit, it'd be fairly trivial to produce a file that contains every valid and every unused Social Security number. Uncompressed, the file would only be ~9GB (about 10GB with a newline after each number), and it would probably compress very well.
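A quick check of the size arithmetic, assuming every nine-digit string from 000000000 to 999999999 is written as plain ASCII, optionally with one newline per entry:

```python
TOTAL_NUMBERS = 10 ** 9   # every nine-digit string, 000000000 through 999999999
BYTES_PER_NUMBER = 9      # one byte per ASCII digit
NEWLINE_BYTES = 1         # optional one-byte line terminator


def file_size_bytes(with_newlines: bool = True) -> int:
    """Uncompressed size of a file listing every nine-digit string."""
    per_entry = BYTES_PER_NUMBER + (NEWLINE_BYTES if with_newlines else 0)
    return TOTAL_NUMBERS * per_entry


print(file_size_bytes(with_newlines=False) / 10 ** 9, "GB")  # 9.0 GB
print(file_size_bytes(with_newlines=True) / 10 ** 9, "GB")   # 10.0 GB
```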
[ link to this | view in chronology ]
Re:
Wouldn't that expose private information? Of course knowing which ones were valid and/or associating any particular number to a particular individual would require some additional information, but, but, but...MY social security number was revealed and I might live in California.
/s
[ link to this | view in chronology ]
All the social security numbers!
I think I need to make a text file that just contains a count from 000-00-0000 to 999-99-9999 and upload it to wherever I can ... all the social security numbers everywhere!
[ link to this | view in chronology ]
THIS is going to go on forever..
I mentioned before about the out breaks.
Every state, every nation, every one will want to create LAWS/RULES/REGULATIONS to control the net..
there is no Standard for what is/will be passed.
No group/consolidation of Anything, as each will be interpreted ANY WAY THEY WANT IT..
They will backdoor, and go around anything created Just to cause problems..
By the time we/they/it has settled, WE might as well be china/asia/middle east and restrict access from other nations as well as Cut ourselves off from others. And the Corps will love it, because THEN, every game will need to have locations in Every nation just to be played/used/enjoyed.
Wow, what a way to control the game industry..
The internet. The Biggest experiment in total opinionation(new word??)..
Love the Thought(not fulfilled) that we are an open and non-opinionated nation. That we are the Dream of so many people...but we ACT as bad as the Worst nations.
[ link to this | view in chronology ]
Be wise... capitalize!
PACER needs an automated filter, too... to detect when Random Words in a Lawsuit have been Capitalized (see what I did there?), after which a red-inked rubber stamp marks the document "BULLSHIT". Surely, sharing Personal Information is a far greater crime than sharing personal information.
Of course, capitalization is the text equivalent OF SHOUTING, which brings to mind the immortal quotation from one Squidward Tentacles:
Squidward: People talk loud when they wanna act smart, right?
Plankton (shouts): CORRECT!
[ link to this | view in chronology ]
Anyone can start a website and host anyone. It's not like everyone has in his or her arsenal of business tools a Code of Ethics to which they must adhere. That is what sucks about the internet.
[ link to this | view in chronology ]
Lots of similar numbers...
When I went to college, the university (PSU) assigned us all a nine digit student number. The reason these were nine digit numbers is because they originally used SSNs, until a decade or two ago when they realized that was a bit of a security issue. All of their existing systems were designed around using SSNs though, so they created new numbers for everyone which used the same formatting so their existing workflows wouldn't need to be modified.
I can't imagine that they're the only place which did something like that. So if you're filtering nine digit numbers that look like SSNs, you're probably going to get a lot of nine digit numbers that aren't SSNs but are designed to look similar, which is going to cause a lot of additional problems...
[ link to this | view in chronology ]
AWS, on the other hand, could be facing some serious damages, given that their employee was the one who stole the data, which was hosted on AWS.
[ link to this | view in chronology ]
I was compromised
So I get a lot of people's honor and loyalty to GitHub, but I just found out about my info being compromised, and the news comes just over a week or so after I found odd transactions being made on my account. They followed the protocol they were supposed to and sent me a new card. In my inbox today is the official notice that my sensitive info had been compromised by the hack and that the unauthorized transactions were a result of said hack attack. That being said, I am quite pissed and feel violated by this incident, and quite frankly, if there indeed was knowledge of it being uploaded and accessible from this site or any prior to the greater public being made aware, then yes, I do think all involved in not making sure these things were prevented should be held accountable.
Sorry, but the fact that GitHub is a subsidiary of Microsoft doesn't make them untouchable or unaccountable; it in fact makes them more liable and responsible to ensure private information is protected. They are a company whose main focal point in any of their marketed products is safety and data protection. Even if GitHub is its own entity, it should be structured under the same protection umbrella as the rest of Microsoft's invested assets and interests.
It's pretty pathetic that the hack even happened, but if you help a wanted murderer in any way, shape or form with knowledge of their crimes, you are considered an accessory to the crime and are held liable. This is no different, except GitHub's excuse is they didn't know what types of information it was. Then they should have more stringent measures in place to sniff out such sensitive info. A Snowden-type algorithm of sorts. I don't know, but that's a failure against their T.O.S. and their obligation to make sure their publicly accessed site isn't a safe haven for such individuals and their criminal activity. If not, then we might as well call it the darkgit and send it to the darkest region of the Internet.
[ link to this | view in chronology ]
Re: I was compromised
That Github is now owned by Microsoft is irrelevant. Github removed the content as soon as they were made aware of it.
You are asking for magic. You are telling people to Nerd Harder. It is not possible to do what is being asked of them, for multiple reasons already discussed in other comments. If you were compromised, be angry at the idiots who let your data be taken away as opposed to the guys who immediately took down the bulletin board with your information on it as soon as they knew it existed.
[ link to this | view in chronology ]