Freebasing The Web... Or Just Making A Tired Idea Sound Fresh Again?

from the trying-to-figure-this-out dept

Fri, Mar 9th 2007 3:30pm — Mike Masnick

It seems like the buzz of the day is Danny Hillis' new company, Metaweb Technologies, which is releasing a product called Freebase. The description is basically a mix between Wikipedia and the Open Directory Project, or perhaps it's simply Wikipedia-with-metadata. It's yet another shot at Tim Berners-Lee's vision of the semantic web, where there's metadata about data, making it easier for computers to actually make use of the data. Of course, many people have pointed out that there are tremendous problems with the idea of a semantic web -- the biggest one being how do you actually get all that metadata connected to the data in the first place. From what's being discussed so far, it sounds like Metaweb is simply hoping that everyone will come in and do it for them, in the same spirit as Wikipedia (and, in fact, they've already sucked in much of Wikipedia's data to start with). In some ways, this project also reminds me of Cycorp, the big attempt to feed a computer all sorts of information while hoping that artificial intelligence would emerge. If you can make it into a fun game, people will do all sorts of things for you -- but there doesn't appear to be much of a game designed here.

Another concern is that this is simply creating yet another data silo. While they do appear to have made it open so that others can make use of the data, you still have to put the data into Freebase in the first place. However, perhaps the biggest problem with this concept is the very idea that you can accurately explain data with metadata. While the examples being given aren't too complicated (defining the name of a company as being a company, having an address be an address) that can get very complicated very fast -- and forcing metadata structures on existing data can often confuse things by forcing categories on things where they don't quite apply. Either that, or you get so much metadata that it's effectively useless. So, consider us skeptical, but intrigued. There are some very smart people working on this (and others whose opinions we trust seem impressed and awed by the project), but so far, it's just not clear how all the metadata will keep getting classified and how useful it will be once that's set.

Thank you for reading this Techdirt post. With so many things competing for everyone’s attention these days, we really appreciate you giving us your time. We work hard every day to put quality content out there for our community.

Techdirt is one of the few remaining truly independent media outlets. We do not have a giant corporation behind us, and we rely heavily on our community to support us, in an age when advertisers are increasingly uninterested in sponsoring small, independent sites — especially a site like ours that is unwilling to pull punches in its reporting and analysis.

While other websites have resorted to paywalls, registration requirements, and increasingly annoying/intrusive advertising, we have always kept Techdirt open and available to anyone. But in order to continue doing so, we need your support. We offer a variety of ways for our readers to support us, from direct donations to special subscriptions and cool merchandise — and every little bit helps. Thank you.

–The Techdirt Team

3 Comments

If you liked this post, you may also be interested in...

Reader Comments

Subscribe: RSS

View by: Time | Thread

Chris, 9 Mar 2007 @ 4:30pm

well..
In instances like this I like to say at least someones trying. But you don't hold the same regard to the kid trying to put the square block in the triangle hole. Will be interesting to see what develops.
[ link to this | view in chronology ]
|333173|3|_||3, 10 Mar 2007 @ 1:27am

What might be more usefull would be to produce a sort of programming language for text, so that each word has a specific meaning and part of speech, and a fixed, rigid structure enables the subject and object of verbs to be identified, and logical sntences to be built. A compiler would then be needed to produce text output in your language of choice, which could be produed as a FireFox extension and LaTeX package. Not only would this make life simpler for international web 2.0 prjects like Wikipedia (removing both the issues of "the German version says X, the English version says Y, which is correct?" and the argument over Commonwealth and US spellings in English pages), but it would also make it easier for AI projects to understand the page, and for search engines to get more relevant results.
I do appreciate that there would be considerable difficulties with this, such as the need for huge dictionaries for each language, and the fact that it would be hard to take into account changes in the way language is used. It would also be hard to produce good sentances, although it would probably be better than most machine translations now. Other problems would be finding a source of fixed definitions (perhaops the OED, wit numbers for each of the subsidiary definitions could be used for standard words, and a scientific dictionary for technical terms). The structure would also have to be less forgiving of mistakes than English, meaning that whiule at teh moment if someone makes a mess of a sentence on Wikipedia, you can try to figure out what is going on, whereas the translator would just have to escape the entire sentence and skip to the start of the next. The largest problem, and probably the killer, is that it would be hard for people to learn it (think of esperanto, which no-one uses) and even harder to keep within its structures (think of the illiterate crap that gets posted here every day - yes, I know that I and the other |333173|3|_||3 [there is at least one other person using my name] post).
FInally, there is the issue of evolution. As new words are needed, someone is going to have to create them, which means that you could end up with the situation in France where they have the language officially defined. Preferably the creators of the project should define the standards, until it becomes large enough for ISO to take over.
[ link to this | view in chronology ]
Brad, 12 Mar 2007 @ 5:42pm

Hmm....
Why the hell is this thing called "Freebase"? Yeah, there's a small "free" and "base" association with the product...but really?

That'd be like calling my new mass-consumer product "Crackpipe" because it contains a data security tool.

Drug references are all well and good...until you actually try to sell them.
[ link to this | view in chronology ]