Re: Re: This word you keep using, Google does not think it means what you think it means
Problem is, the internet is not a fixed document set. Yesterday's most relevant result could be today's least relevant result.
Google has shown that they have the right heuristics to keep their mappings updated, but I'm not sure if Bing can work well enough without "borrowing" Google's...
Re: This word you keep using, Google does not think it means what you think it means
That's more or less what I was arguing in the comments of the previous article, even though my thoughts focused only on innovation in search quality and not on presentation.
The result of my discussion with Marcus Carab can be summarised thus: It depends on how much Microsoft is (indirectly) using Google's results, and we won't know that until more data is made available.
My hunch is, a lot. They must be getting massive amounts of Google data, seeing how many people use Google, and it's all in pure query->document format, no less. In my view, instead of coming up with a better way to analyse the data it already has, Bing is trying to replicate Google's existing semantic links* between terms, which is possibly the hardest thing to tweak when you're making advanced document retrieval systems.
That they say they use "over 1000 variables" is irrelevant, because as any statistician will tell you it's not the number of variables that counts but their weighting. If Bing is aiming to "become Google" because that's the search engine people want, they'll use the query->document data they get from Google to directly reinforce the query->document mappings in their system, which makes the other sources mostly irrelevant...
And that's why this is cheating, in my opinion. Perhaps that's not necessarily a "bad" thing and their technology will eventually and inevitably catch up, but it leaves a bad taste in my mouth all the same.
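To make the weighting point concrete, here is a minimal sketch. It is not anyone's real ranking function: the weights and the two example documents are made up for illustration, with one signal weighted at 100 and the other 999 at 0.001 each.

```python
N = 1000  # "over 1000 variables", as claimed

# Hypothetical weights: one dominant signal, 999 minor ones.
weights = [100.0] + [0.001] * (N - 1)

def score(features):
    """Linear relevance score: weighted sum of the feature values."""
    return sum(w * f for w, f in zip(weights, features))

# doc_a matches only the dominant signal; doc_b matches every other signal.
doc_a = [1.0] + [0.0] * (N - 1)
doc_b = [0.0] + [1.0] * (N - 1)

print(score(doc_a) > score(doc_b))
```

With weights like these, the document that matches the single dominant signal outranks the one that matches the other 999, which is why the variable count on its own says very little.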
* For instance, Google may have decided to use a thesaurus (or even automatically learned a thesaurus!) to create a link between the terms "cat" and "feline", so when a user searches for cats, they also get documents about felines. This is not an obvious link for a computer, but it very likely improves retrieval performance. If Bing didn't think to do the same, and they only start showing documents about felines because they saw Google do the same, then their technology is still inferior, so in my book this cannot possibly count as innovation or as science. They are giving the illusion that they are competing with Google, but they are simply giving a "counterfeited" version of their competitor's results that they couldn't recreate by their own means.
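A rough sketch of what such a thesaurus link does at query time. The `SYNONYMS` table, the documents, and the function names are all hypothetical; this is plain query expansion over a toy collection, not a description of what Google or Bing actually run.

```python
# Hypothetical toy thesaurus; not real data from any search engine.
SYNONYMS = {"cat": {"feline"}, "feline": {"cat"}}

DOCS = {
    "d1": "my cat sleeps all day",
    "d2": "feline behaviour explained",
    "d3": "dog training tips",
}

def expand(query):
    """Return the query terms plus any synonyms listed in the thesaurus."""
    terms = set(query.lower().split())
    for term in list(terms):
        terms |= SYNONYMS.get(term, set())
    return terms

def search(query, use_thesaurus=True):
    """Return ids of documents sharing at least one term with the query."""
    terms = expand(query) if use_thesaurus else set(query.lower().split())
    return sorted(doc_id for doc_id, text in DOCS.items()
                  if terms & set(text.split()))

print(search("cat"))
print(search("cat", use_thesaurus=False))
```

Without the thesaurus, only the document containing the literal word "cat" matches. With it, the "feline" document is retrieved as well.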
The order to approve more patents "to stay competitive" will come from the top. These examiners will have to comply or they'll get the boot, like all other bureaucrats at the bottom of the food chain...
"I'm sorry, you still haven't convinced me that any of this isn't smart business and smart science."
I never argued either way about it being or not being smart business (I think it is).
As for the science, there's no science. I'd be applauding them if they were trying new algorithms that give more relevant results than Google's, but they aren't. Instead, this multi-billion dollar behemoth is training its algorithm to return the results of its rival. In some circles, that's called cheating...
"Google could only demonstrate that this was even happening by inventing fake queries to isolate the test from the many other data sources Bing no doubt makes use of."
The other data sources, and other Google users. They only enlisted a handful of people, and only waited for about a month -- their test group would have been overpowered by normal users if they had used popular terms.
"Algorithm B could never measure up to Algorithm A under that system."
Nope. If both their algorithms are similar (and they almost certainly are; possibly variations on a Dirichlet-based approach but with different implementation details), then given enough time both will converge on almost identical results. So it's not a matter of surpassing, but of catching up.
"But remember, this is not a right/wrong situation"
I can almost imagine Ballmer telling his chief engineer "If Google is what people want, then let's give them Google."
"Because I would be pretty astonished if they didn't."
Be under no illusion. I'm not defending Google, I'm criticising Microsoft.
I can totally empathise with Google calling them "cheaters", but only in the academic sense of the word (e.g., cheating on a test rather than defrauding a bank).
I have to admit that I'm speaking more as a science guy than as an arm-chair economist (?), and I agree with Google that this is cheating and not innovation (or, at least not technological innovation!).
From a machine learning/document retrieval point of view, it's quite clear that if algorithm A is trained mostly* on the results of algorithm B, then algorithm A's results will be at most as good as algorithm B's but not better.
In essence, if the two algorithms are very similar, algorithm A is trying to approximate the "internal self-programming" of algorithm B (for document retrieval, typically a bunch of weight/probability vectors).
In a different scenario, a weak learning algorithm could perfectly memorise some of the results of a more powerful algorithm, and return relevant results on common queries. However, that algorithm is "cheating" because it can't generalise to queries it's never seen before.
Perhaps this leads Bing's search engine to give more relevant results, but it's doing it by riding on the coat-tails of Google's algorithm. That's not the way to win any data-mining competitions.
* I'm assuming that they're getting an enormous amount of data from Google searches, which dwarfs the amount of data they get from their own search engine.
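The memorisation scenario can be sketched in a few lines. Everything here is a made-up stand-in: `teacher` plays the more powerful algorithm (simple term overlap over a toy collection), and `MemorisingStudent` is the weak learner that only stores the query->document pairs it has observed.

```python
DOCS = {
    "d1": "cat care guide",
    "d2": "dog training tips",
    "d3": "cat and dog photos",
}

def teacher(query):
    """Stand-in for the stronger algorithm: pick the doc with most shared terms."""
    terms = set(query.split())
    return max(sorted(DOCS), key=lambda d: len(terms & set(DOCS[d].split())))

class MemorisingStudent:
    """Weak learner: a lookup table of the teacher's answers, nothing more."""

    def __init__(self):
        self.table = {}

    def observe(self, query):
        # Record what the teacher returned for a query the student has seen.
        self.table[query] = teacher(query)

    def answer(self, query):
        # Unseen queries yield None: there is no model to generalise with.
        return self.table.get(query)

student = MemorisingStudent()
for q in ["cat care", "dog training"]:
    student.observe(q)

print(student.answer("cat care"), student.answer("dog photos"))
```

On the queries it has seen, the student is exactly as good as the teacher. On "dog photos", which it never observed, it has nothing to return, while the teacher still finds a match.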
The key word there is "managed". The US government was never given any more authority over the internet than any other country. That this service was handed to them for safe-keeping does not implicitly authorize them to meddle with it!
And with that said, if Spain or any other country were to erase US domains from the entire internet, US politicians would be hollering from the rooftops.
Mike, I disagree with you (at least for the moment).
As long as Google is merely pointing out that others are taking their search results and pretending they are their own, I can't hold it against them. This is "social mores" at work, and in this situation I feel it's the right thing to do.
HOWEVER, if they start suing everyone I will then agree with you that this is only about "protecting their turf".
Whether or not the sites were randomly declared illegal under US law has no bearing on whether or not they are legal under Spanish law.
The US was tasked with managing some non-country-specific TLDs and they failed to remain impartial. Result? They will be stripped of any similar responsibilities in the future, along with any benefits such responsibility would have bestowed.
I haven't looked at the details, but I recall reading that all of Egypt's infrastructure is owned by four companies. If you look closely at the diagram, you can see usage fall in steps (four of them, in fact) starting at around 4PM.
On the post: Microsoft Highlights Why Google's 'Cheater' Accusations Ring Hollow
Re: Re: This word you keep using, Google does not think it means what you think it means
On the post: Microsoft Highlights Why Google's 'Cheater' Accusations Ring Hollow
Re: This word you keep using, Google does not think it means what you think it means
On the post: And Now Europe Feels The Need To Catch Up To China And The US In The Self-Destructive Patent Race
Re: Re: One thing assured
On the post: And Now Europe Feels The Need To Catch Up To China And The US In The Self-Destructive Patent Race
Re:
On the post: And Now Europe Feels The Need To Catch Up To China And The US In The Self-Destructive Patent Race
Re: One thing assured
On the post: Google's Childish Response To Microsoft Using Google To Increase Bing Relevance
Re: Re: Re: Re: Re: Re:
O rly? ;)
On the post: Google's Childish Response To Microsoft Using Google To Increase Bing Relevance
Re: Re: Re: Re:
On the post: TSA Starts Testing New Scanners That Don't Show Your Naked Body
Re: The system does not involve new machines. Instead, it relies on new software.
This is what I heard. The hardware is the same, but the scanned 3d model is distorted and applied onto a generic figure.
Safety concerns are still relevant.
On the post: Google's Childish Response To Microsoft Using Google To Increase Bing Relevance
Re: Re:
On the post: Google's Childish Response To Microsoft Using Google To Increase Bing Relevance
Re: Re:
On the post: Homeland Security Seizes Spanish Domain Name That Had Already Been Declared Legal
Re: Re: Re:
No, "because they can" does not make the point moot!
"A judge issued a warrant to seize property located in the U.S. What specific international duty does that violate? None that I know of."
.us is located in the US.
.com means "commercial" (duh!) and it's an international address.
On the post: Homeland Security Seizes Spanish Domain Name That Had Already Been Declared Legal
Re: Re: Re: Not surprising.
On the post: Homeland Security Seizes Spanish Domain Name That Had Already Been Declared Legal
Re: Re: Re: Re: Re:
On the post: Google's Childish Response To Microsoft Using Google To Increase Bing Relevance
On the post: Homeland Security Seizes Spanish Domain Name That Had Already Been Declared Legal
Re: Not surprising.
Ironically, their individual paychecks are heavier than the paychecks of anyone you've ever met. Combined.
On the post: Homeland Security Seizes Spanish Domain Name That Had Already Been Declared Legal
Re: Re: Re:
On the post: Homeland Security Seizes Spanish Domain Name That Had Already Been Declared Legal
Re:
On the post: The Impact Of Egypt Cutting Itself Off From The Internet
On the post: A Look At How Egypt Shut Down The Internet
On the post: Just Under 100,000 Sued In Mass Copyright Infringement Suits Since Start Of 2010
Re:
If "ripping things off" is better for the public than being sued en masse, perhaps it's time for the silly politicians to do their jobs.