Microsoft Highlights Why Google's 'Cheater' Accusations Ring Hollow
from the good-for-them dept
We had a long discussion recently about Google's response to discovering that Microsoft used clickstream data from users to help improve the relevance of its own search. Microsoft's Yusuf Mehdi has now written up a much more detailed response from Microsoft's point of view, in which he again clarifies that, contrary to Google's statements, Microsoft is not "copying" Google's search results, but merely using clickstream data as one of many (Microsoft says approximately 1,000) variables in improving search relevance. Microsoft does take one cheap shot: noting that, technically, the "honeypot" trick that Google used to uncover this certainly appears to be a form of "clickfraud." That is, it was a trick designed specifically to manipulate Bing's search results. But the key point is made towards the end:
We have brought a number of things to market that we are very proud of -- our daily home page photos, infinite scroll in image search, great travel and shopping experiences, a new and more useful visual approach to search, and partnerships with key leaders like Facebook and Twitter. If you are keeping tabs, you will notice Google has "copied" a few of these. Whether they have done it well we leave to customers. But more importantly, we take no issue and are glad we could help move the industry to adopt some good ideas.
That's the point that I tried to make in the original post. History has shown that innovation occurs via competition, and part of that competition often involves competitors building on each other's work. A few months back, I wrote a review of the excellent book Copycats by Oded Shenkar, which makes this point very, very clear. Innovation happens when companies build on each other's work. But, what you learn is that it's not just about "copying," it's about all of the players learning, innovating and expanding the overall market. Just straight up copying rarely does enough to make a difference (in fact, we've discussed this problem in the form of cargo cult copying, where companies just copy some superficial aspect, and discover that it's meaningless). That's clearly not what Microsoft was doing here.
In the comments to our original post, someone argued, in defense of Google, that if what Microsoft did was okay, then couldn't he just go out and say "I've got a billion dollar search engine idea!" and then just copy Google's results. But, of course, if anyone actually thinks this through, they'd realize that copying Google's search results is not a billion dollar search idea. Suppose that, tomorrow, we launched a "new search engine" that gave results identical to Google's: almost no one would use it. Why would you? There's no real advantage to doing so. And for people who already use Google, it's probably much more integrated into their lives, with Gmail, Google Docs and more. The search results themselves are not the "billion dollar idea." It's the overall execution.
Hopefully Google learns from this and realizes that it has learned plenty from watching Microsoft as well, and complaining about Microsoft using clickstream data is a waste of time. Focus on continuing to innovate, Google, which'll probably mean learning more things from Microsoft, in addition to what you're doing yourself.
To be fair, Matt Cutts has also put together a decent response, where he points out that the real issue here may be disclosure -- in that Microsoft did not clearly disclose that it was using clickstream data (and especially how it was using that data). That's a perfectly reasonable point, but it was not the original point that Google raised. I agree that Microsoft could and should be much clearer in its disclosure -- but that's a totally separate issue. Cutts also explains why he thinks that Microsoft really is "copying," but again, even if we grant that premise (which I don't think is accurate), I still don't see why that matters. Copying and improving is a part of the innovative process. Google should embrace it.
Filed Under: bing, copying, innovation, search
Companies: google, microsoft
Reader Comments
Google does not really do PR that well on a bunch of things like this.
[ link to this | view in chronology ]
This word you keep using, Google does not think it means what you think it means
Copying, as defined by Google folks over and over again these past few days, is taking results of an innovation and literally copying them.
While one is common in the tech world, and Bing staff makes a valid point of creating multiple innovations that Google has "adopted", the second one is problematic. When Bing says that clickstream (with apparent tailoring for Google links) is only one of the signals, it basically means that when the other signals aren't returning normal data, Bing relies *only* on Google. Moreover, it sometimes exploits Google's proprietary autocorrect mechanisms to increase its relevance in that way. Meanwhile, Microsoft hasn't indicated that Bing is learning something from it, just parroting, and if Google were to disappear one day, a Bing that relies on Google so much would be effectively crippled.
[ link to this | view in chronology ]
Re: This word you keep using, Google does not think it means what you think it means
Actually, they appear to record data from their Bing toolbar, which would still be in their system if Google were to disappear. Although their learning would stop if Google disappeared, what they have already learnt would still apply.
[ link to this | view in chronology ]
Re: Re: This word you keep using, Google does not think it means what you think it means
Google has shown that they have the right heuristics to keep their mappings updated, but I'm not sure if Bing can work well enough without "borrowing" Google's...
[ link to this | view in chronology ]
Re: This word you keep using, Google does not think it means what you think it means
The result of my discussion with Marcus Carab can be summarised thus: It depends on how much Microsoft is (indirectly) using Google's results, and we won't know that until more data is made available.
My hunch is, a lot. They must be getting massive amounts of Google data, seeing how many people use Google, and it's all in pure query->document format, no less. In my view, instead of coming up with a better way to analyse the data it already has, Bing is trying to replicate Google's existing semantic links* between terms, which is possibly the hardest thing to tweak when you're making advanced document retrieval systems.
That they say they use "over 1000 variables" is irrelevant, because as any statistician will tell you it's not the number of variables that counts but their weighting. If Bing is aiming to "become Google" because that's the search engine people want, they'll use the query->document data they get from Google to directly reinforce the query->document mappings in their system, which makes the other sources mostly irrelevant...
And that's why this is cheating, in my opinion. Perhaps that's not necessarily a "bad" thing and their technology will eventually and inevitably catch up, but it leaves a bad taste in my mouth all the same.
* For instance, Google may have decided to use a thesaurus (or even automatically learned a thesaurus!) to create a link between the terms "cat" and "feline", so when a user searches for cats, they also get documents about felines. This is not an obvious link for a computer, but it very likely improves retrieval performance. If Bing didn't think to do the same, and they only start showing documents about felines because they saw Google do the same, then their technology is still inferior, so in my book this cannot possibly count as innovation or as science. They are giving the illusion that they are competing with Google, but they are simply giving a "counterfeited" version of their competitor's results that they couldn't recreate by their own means.
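To make the "cat"/"feline" footnote concrete, here is a minimal sketch of thesaurus-based query expansion in Python. It is purely illustrative and does not describe how Google or Bing actually work; the `thesaurus` table, the sample documents, and the function names `expand_query` and `search` are all made up for this example.

```python
def expand_query(terms, thesaurus):
    """Return the query terms plus any related terms listed in the thesaurus."""
    expanded = []
    for term in terms:
        for candidate in [term] + thesaurus.get(term, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


def search(terms, documents, thesaurus):
    """Return ids of documents containing any of the expanded query terms."""
    wanted = set(expand_query(terms, thesaurus))
    return [doc_id for doc_id, text in documents.items()
            if wanted & set(text.lower().split())]


# Hypothetical data: one hand-written semantic link and three tiny documents.
thesaurus = {"cat": ["feline"]}
documents = {
    1: "the cat sat on the mat",
    2: "feline health guide",
    3: "dog training tips",
}

with_links = search(["cat"], documents, thesaurus)   # finds documents 1 and 2
without_links = search(["cat"], documents, {})       # finds only document 1
```

The only difference between the two searches is the thesaurus entry, which is the kind of link the footnote argues is hard to discover and easy to imitate once you can see another engine's query->document pairs.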
[ link to this | view in chronology ]
Re: Re: This word you keep using, Google does not think it means what you think it means
It's not copying in the traditional sense, it's not counterfeiting and it's definitely not stealing. "Cheating" and "plagiarism" are the only words that I can think of that sound harmless enough to describe this, but even they are overkill.
[ link to this | view in chronology ]
Re: Re: Re: This word you keep using, Google does not think it means what you think it means
In any case, I have to disagree. What you have described IS innovation. Take an idea that someone else had and improve on it. Based on your logic Google's image search is inferior to Bing's because MS had the idea for the infinitely scrolling search and then Google copied the idea.
That's also not what I believe happened here. Rather, MS is looking at user behavior. User searches for a word or phrase in Google or any other search engine and then clicks on links A, B, and F (having decided that C, D, and E are just blog spam). When the search is done on Bing it takes into account that people were clicking on A, B, and F but only a few were clicking on C, D, and E and they didn't stay if they did. When it ranks the results C, D, and E are ranked lower as a result.
Basically, it brings humans into the ranking process to provide more useful results. Digital computers are not nearly as good at recognizing patterns (and thus filtering out junk sites) as the human brain. In some ways, it is sort of like what Yahoo did in its early days. Also bear in mind that even after Google engineers fed Bing lots of fake data and fake clickthroughs on nonsense words, they still only managed to get Bing to show the site they wanted something like 6 times out of 100 attempts. In other words, using a bullshit scenario that would never happen in real life, they were only able to trick Bing a whopping 6% of the time.
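The A, B, F versus C, D, E scenario above can be sketched in a few lines of Python. This is a toy under stated assumptions, not Bing's actual method: the base scores, the click log, the `click_weight` knob, and the function name `rerank_with_clicks` are all hypothetical, and a real engine would blend many more signals.

```python
from collections import defaultdict


def rerank_with_clicks(base_scores, click_log, click_weight=0.5):
    """Blend each page's base score with its observed 'clicked and stayed' rate.

    base_scores: dict of url -> score from the engine's own signals
    click_log:   iterable of (url, clicked, stayed) observations
    click_weight: hypothetical weight given to the click signal (0 to 1)
    """
    shown = defaultdict(int)
    satisfied = defaultdict(int)
    for url, clicked, stayed in click_log:
        shown[url] += 1
        if clicked and stayed:
            satisfied[url] += 1

    def blended(url):
        rate = satisfied[url] / shown[url] if shown[url] else 0.0
        return (1 - click_weight) * base_scores[url] + click_weight * rate

    return sorted(base_scores, key=blended, reverse=True)


# Made-up numbers: C, D and E start out with good base scores,
# but users click A, B and F and stay there.
base_scores = {"A": 0.5, "B": 0.5, "C": 0.6, "D": 0.55, "E": 0.5, "F": 0.4}
click_log = [
    ("A", True, True), ("B", True, True), ("F", True, True),
    ("C", False, False), ("D", True, False), ("E", False, False),
]

ranking = rerank_with_clicks(base_scores, click_log)
```

With `click_weight=0.0` the click data is ignored and the order falls back to the base scores, which is the sense in which clicks are one signal among others rather than the whole ranking.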
[ link to this | view in chronology ]
Re: Re: Re: Re: This word you keep using, Google does not think it means what you think it means
"User searches for a word or phrase in Google or any other search engine and then clicks on links A, B, and F (having decided that C, D, and E are just blog spam)."
But what miraculous process put relevant results in positions A, B and F? Google's algorithm, we can presume. If another search engine copies the results of the algorithm, it means they can fake improved search performance but don't know how it was actually done. They improve their search, but contribute nothing to the users or to search engine technology -- not innovation, in my opinion.
"using a bullshit scenario that would never happen in real life they were only able to trick Bing a whopping 6% of the time."
Which means Bing couldn't absorb all the data the 20 engineers were feeding it and nothing else. As to why, it's anybody's guess. My guesses are, it's either to keep the sparse document vectors smaller (by ignoring rarer terms) and thus cut costs, or maybe they were clever enough to have a safeguard so spammers can't exploit their exploit and Google-bomb them (literally) with fake/dangerous websites for common terms.
[ link to this | view in chronology ]
Bing (Powered by Google)
Much like a songwriter who, after hearing his song played by a musician, comes out and says "I wrote that." Not trying to stop the musician, just interested in claiming the credit.
I haven't seen Google begin formal action, legal or otherwise. They only seem interested in pointing out the behavior to the press and embarrassing a competitor.
[ link to this | view in chronology ]
Say what you want
[ link to this | view in chronology ]
Re: Say what you want
Google's engineers shouldn't be submitting their clickstream data to Microsoft if they don't want them to use it to better their search results.
In this case Google engineers intentionally manipulated their search results (they said this is impossible during congressional hearings - obviously incorrect) and then intentionally and in an organized manner attempted to use clickstream data to influence Bing search results. That is one form of click fraud.
I get it that you like Google and don't like Microsoft. However, these types of arguments don't make any actual sense.
[ link to this | view in chronology ]
Two headed hydra that can't agree
One head posts about how businesses need to learn to innovate, to compete, and accept that marketplace instead of falling back to legal protections.
The other says that when there is something wrong with a copy we should leave it up to social shunning to make it right.
And yet, despite the fact that Google has not, so far at least, fallen back on legal protections, and is actually trying to leave it to social shunning, TechDirt posts are now trying to socially shun Google when they're the ones that were copied.
Make up your mind.
[ link to this | view in chronology ]
Re: Two headed hydra that can't agree
Calling something cheating isn't merely socially shunning, it's implying that they broke the rules. The fact that they haven't sued doesn't mean that what they've said isn't liable to backfire. It would appear that you would rather Techdirt supported Google making inaccurate statements than point out the truth.
That said, Microsoft may have fucked up by how they gathered the data. However, in that case they didn't wrong Google or Google's customers, only their own customers.
[ link to this | view in chronology ]
Re: Re: Two headed hydra that can't agree
It's their opinion and they're entitled to it, though. This is my comment and I'm voicing my opinion. *shrug*
[ link to this | view in chronology ]
Re: Re: Re: Two headed hydra that can't agree
[ link to this | view in chronology ]
Re: Re: Re: Two headed hydra that can't agree
[ link to this | view in chronology ]
Re: Re: Re: Two headed hydra that can't agree
It's not like Google really wasted a lot of time, money, and effort to catch Microsoft in the act, and once it did, it tossed up a blog post about it. Honestly to me, it seems like Google's doing exactly what Techdirt says it should, by socially shunning Microsoft for "cheating". Maybe it should've done it a little more snarky to come out quite a bit more ahead, but still, from what I've read, it's good enough.
I don't quite get why Mike says "Google complaining about Microsoft using clickstream data is a waste of time". It isn't. It puts Google in the better light socially, exactly what Mike has set forth in the past. It's been really hard to read these Google vs Bing articles in the past couple days, since it's a glaring hypocrisy in every one of the articles.
[ link to this | view in chronology ]
Re: Re: Re: Re: Two headed hydra that can't agree
The point is that Google are implying that Microsoft wronged them by cheating, which they technically don't appear to have and thus stand to generate more bad publicity for crying wolf than if they'd just left out the accusation of cheating. Flaming Mike for saying so is OK. Flaming Mike for saying so and suggesting that he is somehow going against his opinions on shaming actual wrongdoings when he doesn't think this is an actual wrongdoing is not OK.
[ link to this | view in chronology ]
It's just an excuse to kick Microsoft when it's down!
It makes sense Google would try to create bad news stories for Microsoft. This is a small piece of propaganda in a much bigger spat.
Microsoft is involved in several lawsuits against Android, which Google can't be happy about. And at the moment Microsoft looks particularly stale and weak.
Microsoft continues losing money in the search space, it continues to cut projects and product lines, it continues to lay off staffers and it's also still losing top managers. And for the first time in a long time Windows desktop market share is threatening to drop below 90%.
And the recent financial results from Microsoft didn't look good. Even after they tried to explain them away with their own special brand of accounting. The share price for Microsoft stock still fell.
Microsoft are trying to hurt Google at the moment. And Google smells blood. But not its own.
[ link to this | view in chronology ]
Google and Microsoft Will Settle
[ link to this | view in chronology ]
It has to - it's not like Google wrote the software their services *need* to operate. Microsoft didn't design the CPU that's needed for their OS to operate.
If all innovation in an area was left to a single lateral patent/copyright - we'd still be riding horses if we couldn't afford the buggy from the single producer.
Of course, Microsoft has a long history of just hi-jacking other people's innovations and then boxing them with other software in a vain attempt to make it look like it's 'original'.
[ link to this | view in chronology ]
I am also sure Microsoft would be perfectly happy if Google just went away, even if it hurt Bing's abilities.
I don't have a problem with Microsoft using Google info to improve their search, and I don't have a problem with companies looking at products on the market and improving them. What I do believe is wrong is flat out copying content. That is what most musicians and artists have a problem with. It's not taking something they have done and redoing it, it is taking a song and, just because it's digital, thinking there is a right to distribute it.
[ link to this | view in chronology ]
Missing Backstory
http://www.npr.org/2011/02/02/133443201/Google-Bing-Tussle-Over-Search
The "Search Rip-off" came about as Google's lead engineers started noticing Bing's searches on misspelled words were getting identical fixes and results.
"LAURA SYDELL: When you type a search request into Google, say, Hosni Mubarak, and you're a couple of letters off, Google can usually figure out what you mean.
Mr. AMIT SINGHAL (Software Engineer, Google): And getting these queries right is an incredibly hard task. It's a very challenging algorithm.
SYDELL: That's Amit Singhal. He's the lead of the search team at Google. A few months back, they noticed something strange. A user searched for tarsorrhaphy.
Mr. SINGHAL: It was this real medical procedure that some users generally needed to know about.
SYDELL: The user misspelled it. But Google's algorithms figured out what he needed. Singhal noticed that competitor Bing didn't bring up any results until a few weeks later.
Mr. SINGHAL: Bing started showing the topmost relevant result for that spelling correction to their users.
SYDELL: Hmm.
Mr. SINGHAL: Now, we got suspicious. However, we said, maybe they came up with some clever algorithm and they did it.
SYDELL: But Singhal and his team decided to do a little experiment. They began to do searches for silly made-up words, and they created fake results unrelated to those words. A few weeks later...
Mr. SINGHAL: Microsoft's Bing started showing the same artificial result for the same synthetic query. And this was just conclusive to us at that point."
While Bing has offered great things to search, there was clearly a copying of services that could not be explained by Bing just writing its own proper code. As noted, Bing was "learning" from people using Google through IE 7/8, which sent over data on what was being searched and what Google returned for those queries. That's a level of shady we've come to expect from MS and needs to be called out.
[ link to this | view in chronology ]
Re: Missing Backstory
http://www.npr.org/2011/02/02/133443201/Google-Bing-Tussle-Over-Search
All of that was explained in the original story. Not sure what's new in there? Still not sure why there's a problem there.
[ link to this | view in chronology ]
Re: Re: Missing Backstory
Say, if Bing returns the same Google page but with Bing logo, and Bing ads, etc. Still ok? Can we call this Bing-google-it-for-you innovation?
[ link to this | view in chronology ]
Re: Re: Re: Missing Backstory
I would consider something to be wrong if I said something that was factually incorrect. I haven't seen that in this story yet. There was nothing in that backstory that said anything I had said originally was wrong.
Say, if Bing returns the same Google page but with Bing logo, and Bing ads, etc. Still ok? Can we call this Bing-google-it-for-you innovation?
What do you mean by "ok"?
[ link to this | view in chronology ]
Beware.. who you fear
Microsoft is big and pushy... Google is worse: they are arrogant and snoopy, with no interest in privacy rights etc. Google complaining that someone else is looking at their public data is rich after they have been caught looking at data on wifi networks.
[ link to this | view in chronology ]
Re: Beware.. who you fear
[ link to this | view in chronology ]
Marketing ploy
[ link to this | view in chronology ]
Are other search engines affected?
[ link to this | view in chronology ]
Taking search result X from Google that happens in a Microsoft browser as a result of query Y, and then replicating it with their own search page certainly meets the definition of "copying", and I personally really struggle to see it as "innovation".
[ link to this | view in chronology ]
Multiple connections linking to ONE base line
Now once they are on each other, they will surely try to co-innovate by this kind of search.
In the end, WE are going to get the better product.
[ link to this | view in chronology ]