So did I. If it doesn't scan until after it punches the top, that adds yet another layer of annoyance (and cost) for the consumer:
"Oops! You've inserted a non-approved pod. As an added bonus we've fucked up the seal on your non-approved pod to make sure your non-approved product goes stale as soon as possible. Have a nice day!"
Every time I've used a Keurig machine (a relative owns one) all I end up with is a cup filled with a liquid that is almost, but not quite, entirely unlike tea.
YaCy's DHT index only stores what they term a Reverse Word Index (RWI). The entries only associate a word with the URLs that contain that word.
When you search, the client receives a list of URLs that contain your search word from your own index and your peers. It then verifies that the word is on each of the resulting pages and creates the snippets at that point. The snippets aren't saved anywhere in the index. Yes, this approach adds some time when waiting for results, but it ensures the resulting pages exist and removes bad links from the index.
YaCy seems to be scaling up just fine with over 350 thousand words and almost 2 billion URLs currently.
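For anyone who wants to see the idea in code, here is a rough sketch of a reverse word index with verify-at-search-time snippets. It is only an illustration of the approach described above, not YaCy's actual code: the PAGES dictionary stands in for the live web, and build_rwi, fetch and search are hypothetical names I made up for the example.

```python
from collections import defaultdict

# Hypothetical stand-in for the live web: URL -> current page text.
# A real client would fetch each page over HTTP instead.
PAGES = {
    "http://example.org/a": "distributed search engines share a word index",
    "http://example.org/b": "coffee pods and other annoyances",
}


def build_rwi(pages):
    """Map each word to the set of URLs whose text contains it."""
    rwi = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            rwi[word].add(url)
    return rwi


def fetch(url):
    """Return the page text, or None if the page no longer exists."""
    return PAGES.get(url)


def search(word, local_rwi, peer_rwis, width=20):
    """Merge candidate URLs from local and peer indexes, then verify each one."""
    word = word.lower()
    candidates = set(local_rwi.get(word, set()))
    for peer in peer_rwis:
        candidates |= peer.get(word, set())

    results = {}
    for url in sorted(candidates):
        text = fetch(url)
        if text is None or word not in text.lower().split():
            # Stale entry: the page is gone or no longer has the word,
            # so drop it from the local index and skip it.
            local_rwi.get(word, set()).discard(url)
            continue
        # The snippet is built now, from the live page; it is never stored.
        pos = text.lower().find(word)
        results[url] = text[max(0, pos - width):pos + len(word) + width]
    return results


local = build_rwi(PAGES)
peer = {"search": {"http://example.org/gone"}}  # a peer holding a dead link
print(search("search", local, [peer]))
```

The dead link offered by the peer never reaches the results, which is the trade-off mentioned above: slower searches, but no broken links.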
oh, one of the white knights of copyright actually suggests pirating?
I really don't think KT is a white knight of anything.
In my opinion, he's just some idiot on the internet who tends to yap a lot without thinking much past his own nose, let alone thinking things out to their logical conclusions.
Some tips and tricks I've learned to make YaCy run better:
- Increase the RAM setting. Default is 600MB. I have 4GB, so I give YaCy about 1.2GB (1200MB). I would give more if this were a dedicated node, but since it's my laptop, that seems to play nice with the other stuff that I'm running.
- Limit crawl maximum. Default is 6000 PPM (pages per minute) and that is pretty large. I share my internet connection with other people and devices so I limit it to 300 PPM so I don't hog all the bandwidth and piss anyone off.
- Increase language ranking to 15 (max). I tend to like reading stuff in English, but that's just me.
- Turn on Heuristics settings so it automatically crawls one level deep on every page returned in the results. This way if you do a search and the results kind of suck - wait ten minutes, do the search again and the results are better because it was "learning" about what you just searched for.
I also turn on the "site operator shallow crawl". When you enter a search query in the format "site:somewebsite.com" it automatically crawls that site one level deep.
The biggest poison for any search engine is "SEO" people knowing exactly what to do to rank well. Once they know that, they will repeat it as many times as needed to totally dominate results and render the searches effectively worthless.
With YaCy the user controls the ranking, since it's done at the client. The user also controls their own blacklist of results. I've been running a YaCy node for over a year and really have had to blacklist only two entities - one was a porn link spammer and the other was an annoying link farm without any actual content.
A system where the ranking process is open source is pretty much doomed to an early death, as the results will be almost entirely spam within hours of it reaching a reasonable level of user searches.
Not at all. YaCy doesn't seem to be rendered useless by spam. I'm migrating to using YaCy almost exclusively now; since I have it set up to crawl based on what I search for, my results are very relevant to me.
Aereo could be back in business today, provided that they negotiate under the 1976 law a deal for the programming that they resell.
How exactly would Aereo do that when the ruling in WPIX, Inc. v. IVI, Inc. states explicitly "that Internet retransmissions services did not constitute cable systems under section 111" and are therefore not entitled to a compulsory license?
It seems to me that SCOTUS left them in limbo - they "look" too much like a cable company, but they are "not enough like a cable company" to receive compulsory licenses.
I think copyright reform should limit terms to, say, no more than 35 years with one renewal. Corporations would not have the right to renewal. It would no longer be life plus some period of time. It would be life plus a period of time not to exceed 70 years.
That's too long if you ask me. Something near the original 14 years plus one 14-year renewal, which is where US copyright law started, is a better position to start bargaining from.
Even better would be something like the sliding scale that Derek Khanna suggested, where copyright is free at first and consecutive renewals cost increasing amounts. This would allow us to stay within the Berne Convention by *allowing* copyright to have sufficient lengths, but not necessarily encouraging it. This would also help with our orphaned works problem.
Also, I don't think excluding corporations from owning copyrights is very feasible, especially in collaborative endeavors. Why shouldn't the entity footing the bill for a $100 million movie own the copyrights for it?
You might have a valid point if VW's customers were all engaged in criminal activity or covert acts, but they are not, plain and simple.
You might have a point if users of Tor were all engaged in criminal activity or covert acts, but they are not, plain and simple.
More and more people are using Tor/I2P/Freenet/YaCy/proxies/etc. for the simple reason that they don't like their private internet activities being monitored 24/7 by invasive governments and corporations.
Anonymity is traditionally considered a natural right in the US. Being anonymous isn't a criminal act in itself, nor is it even a true indicator of criminal activity.
Aereo set up the antennas the way they did for a specific purpose and it was not to conform to the law.
Ummm. That's 100% wrong. Aereo purposely set up their system to conform with existing statutes and case law (i.e., it was engineered by lawyers instead of technicians).
SCOTUS punted on actually looking at the technology to determine if it actually broke any laws and ruled on the premise that Aereo "looked" too much like a cable company instead.
They saw it for what it was, and told them that they can operate if they operate by the same laws that all other re-broadcasters are held to.
How exactly would Aereo do that when the ruling in WPIX, Inc. v. IVI, Inc. states explicitly "that Internet retransmissions services did not constitute cable systems under section 111" and are therefore not entitled to a compulsory license?
It seems that Aereo had no choice other than shutting down since SCOTUS left them in a no-man's-land - they're "too much" like a cable system to be fair use, yet not enough like a cable system to be afforded a section 111 compulsory license.
As for search, the amount of cross referencing etc required, given the size of the Internet really does require a very large data center, preferably duplicated for reliability, just to support the traffic needed to build and search indexes.
I disagree. YaCy has tackled the inherent problems and scalability of a distributed search engine fairly well.
Not only that, if you turn on some of YaCy's heuristics functions it "learns" all about that small corner of the internet you personally care about by crawling based off your search results.
Re: Re: Re: Re: Re: Re: Re: Opinions, world cup and pocket lint
To expand my analogy even further to compare it to the Aereo case:
I use the "loophole" in jaywalking laws and cross the street legally at the crosswalk, but SCOTUS rules that I really didn't cross the street legally because my walking at the crosswalk looks too much like what a rickshaw service does.
Re: Re: Re: Re: Re: Re: Opinions, world cup and pocket lint
Sorry, still missing the point.
No, I think you are missing the point. You went off on some weird tangent about being carried across the road or using a rocket as the "loophole". The "loophole" in my jaywalking analogy is simply crossing the street at the crosswalk. Or in everyone else's vernacular, except yours: "following the law".
On the post: FBI Directly Spying On Prominent Muslim-American Politicians, Lawyers And Civil Rights Activists
Re: Re:
Is it also completely justified if NSA and FBI are watching over-aggressive, bigoted, internet commentators with handgun avatars?
Just wondering.
On the post: Keurig Begins Demonstrating Its Coffee DRM System; As Expected, It Has Nothing To Do With 'Safety'
Re: Re: Re: Re: Re: Hacked in 3...2...1...
So did I. If it doesn't scan until after it punches the top, that adds yet another layer of annoyance (and cost) for the consumer:
"Oops! You've inserted a non-approved pod. As an added bonus we've fucked up the seal on your non-approved pod to make sure your non-approved product goes stale as soon as possible. Have a nice day!"
On the post: Keurig Begins Demonstrating Its Coffee DRM System; As Expected, It Has Nothing To Do With 'Safety'
On the post: Distributed Search Engines, And Why We Need Them In The Post-Snowden World
Re: Re: Re:
When you search, the client receives a list of URLs that contain your search word from your own index and your peers. It then verifies that the word is on each of the resulting pages and creates the snippets at that point. The snippets aren't saved anywhere in the index. Yes, this approach adds some time when waiting for results, but it ensures the resulting pages exist and removes bad links from the index.
YaCy seems to be scaling up just fine with over 350 thousand words and almost 2 billion URLs currently.
On the post: The Trials Of Being A Techdirt Writer Volume 1: Stupid Copyright Popups When Pressing CTRL-C
Re: Re:
I really don't think KT is a white knight of anything.
In my opinion, he's just some idiot on the internet who tends to yap a lot without thinking much past his own nose, let alone thinking things out to their logical conclusions.
On the post: Distributed Search Engines, And Why We Need Them In The Post-Snowden World
Re:
For YaCy it's a DHT (distributed hash table) and it's stored and shared in little bits and pieces from each user's hard drive.
Basically, it's "stored" the same way a torrent is "stored" in the swarm.
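To make the "bits and pieces" idea a little more concrete, here is a minimal sketch of how a hash table can be spread over peers. This is a generic ring-style DHT, not YaCy's actual distribution scheme; the peer names, the redundancy value and the function names are all made up for the example.

```python
import hashlib

RING_SIZE = 2 ** 160  # SHA-1 produces 160-bit values


def ring_position(value):
    """Hash a string to an integer position on the ring."""
    return int(hashlib.sha1(value.encode("utf-8")).hexdigest(), 16)


def responsible_peers(word, peers, redundancy=2):
    """Pick the peers whose ring positions follow the word's position most closely."""
    start = ring_position(word)
    ordered = sorted(peers, key=lambda p: (ring_position(p) - start) % RING_SIZE)
    return ordered[:redundancy]


peers = ["peer-a", "peer-b", "peer-c", "peer-d"]
for word in ["coffee", "copyright", "antenna"]:
    # Each word's index entries live on a small subset of peers,
    # so no single machine holds the whole index.
    print(word, "->", responsible_peers(word, peers))
```

Any peer can work out where a word's entries live just by hashing the word, which is why no central server is needed to find them.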
On the post: The Trials Of Being A Techdirt Writer Volume 1: Stupid Copyright Popups When Pressing CTRL-C
Re: Re:
[ *whooshing sound* ]
☺ ← kenichi tanaka
_____________________________
On the post: Distributed Search Engines, And Why We Need Them In The Post-Snowden World
YaCy Tips
- Increase the RAM setting. Default is 600MB. I have 4GB, so I give YaCy about 1.2GB (1200MB). I would give more if this were a dedicated node, but since it's my laptop, that seems to play nice with the other stuff that I'm running.
- Limit crawl maximum. Default is 6000 PPM (pages per minute) and that is pretty large. I share my internet connection with other people and devices so I limit it to 300 PPM so I don't hog all the bandwidth and piss anyone off.
- Increase language ranking to 15 (max). I tend to like reading stuff in English, but that's just me.
- Turn on Heuristics settings so it automatically crawls one level deep on every page returned in the results. This way if you do a search and the results kind of suck - wait ten minutes, do the search again and the results are better because it was "learning" about what you just searched for.
I also turn on the "site operator shallow crawl". When you enter a search query in the format "site:somewebsite.com" it automatically crawls that site one level deep.
On the post: Distributed Search Engines, And Why We Need Them In The Post-Snowden World
Re: Re:
With YaCy the user controls the ranking, since it's done at the client. The user also controls their own blacklist of results. I've been running a YaCy node for over a year and really have had to blacklist only two entities - one was a porn link spammer and the other was an annoying link farm without any actual content.
A system where the ranking process is open source is pretty much doomed to an early death, as the results will be almost entirely spam within hours of it reaching a reasonable level of user searches.
Not at all. YaCy doesn't seem to be rendered useless by spam. I'm migrating to using YaCy almost exclusively now; since I have it set up to crawl based on what I search for, my results are very relevant to me.
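Here is a toy illustration of why client-side ranking blunts SEO spam: every user filters and weights results on their own machine, so there is no single formula to game. This is not YaCy's ranking code; the signal names, weights and hosts below are assumptions invented for the sketch.

```python
from urllib.parse import urlparse


def rank_results(results, blacklist, weights):
    """Drop blacklisted hosts, then sort by a weighted sum of signals."""
    kept = [r for r in results if urlparse(r["url"]).hostname not in blacklist]

    def score(result):
        return sum(weights.get(name, 0) * value
                   for name, value in result["signals"].items())

    return sorted(kept, key=score, reverse=True)


# Hypothetical results as they might arrive from peers.
results = [
    {"url": "http://linkfarm.example/page", "signals": {"language": 1, "title_match": 1}},
    {"url": "http://blog.example/yacy", "signals": {"language": 1, "title_match": 0}},
    {"url": "http://docs.example/yacy", "signals": {"language": 0, "title_match": 1}},
]
blacklist = {"linkfarm.example"}              # this user's personal blacklist
weights = {"language": 15, "title_match": 5}  # this user's personal weights

for r in rank_results(results, blacklist, weights):
    print(r["url"])
```

A spammer who tunes a page for one user's weights gains nothing against a user with different weights or a different blacklist.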
On the post: Funniest/Most Insightful Comments Of The Week At Techdirt
Re: Re: Re: loophole
How exactly would Aereo do that when the ruling in WPIX, Inc. v. IVI, Inc. states explicitly "that Internet retransmissions services did not constitute cable systems under section 111" and are therefore not entitled to a compulsory license?
It seems to me that SCOTUS left them in limbo - they "look" too much like a cable company, but they are "not enough like a cable company" to receive compulsory licenses.
On the post: VP Of EU Commission On Copyright Reform: 'I'd Sing You Happy Birthday, But I Don't Want To Have To Pay The Royalties'
Re: OK, people, you've got Copyright wrong
That's too long if you ask me. Something near the original 14 years plus one 14-year renewal, which is where US copyright law started, is a better position to start bargaining from.
Even better would be something like the sliding scale that Derek Khanna suggested, where copyright is free at first and consecutive renewals cost increasing amounts. This would allow us to stay within the Berne Convention by *allowing* copyright to have sufficient lengths, but not necessarily encouraging it. This would also help with our orphaned works problem.
Also, I don't think excluding corporations from owning copyrights is very feasible, especially in collaborative endeavors. Why shouldn't the entity footing the bill for a $100 million movie own the copyrights for it?
On the post: VP Of EU Commission On Copyright Reform: 'I'd Sing You Happy Birthday, But I Don't Want To Have To Pay The Royalties'
Re: Sherlock Holmes
That claim got smacked down pretty hard by Judge Richard Posner and was covered and discussed here on Techdirt already:
https://www.techdirt.com/blog/?tag=arthur+conan+doyle
On the post: Austrian Tor Exit Node Operator Found Guilty As An Accomplice Because Someone Used His Node To Commit A crime
Re: arm waving frantic
You might have a point if users of Tor were all engaged in criminal activity or covert acts, but they are not, plain and simple.
More and more people are using Tor/I2P/Freenet/YaCy/proxies/etc. for the simple reason that they don't like their private internet activities being monitored 24/7 by invasive governments and corporations.
Anonymity is traditionally considered a natural right in the US. Being anonymous isn't a criminal act in itself, nor is it even a true indicator of criminal activity.
On the post: The Aereo Ruling Is A Disaster For Tech, Because The 'Looks Like Cable' Test Provides No Guidance
Re: Re: Re: Re: Re: Re: Re: Rental
Ummm. That's 100% wrong. Aereo purposely set up their system to conform with existing statutes and case law (i.e., it was engineered by lawyers instead of technicians).
SCOTUS punted on actually looking at the technology to determine if it actually broke any laws and ruled on the premise that Aereo "looked" too much like a cable company instead.
On the post: Did Aereo Kill The Cablevision Ruling That Enabled So Much Innovation? Who The Hell Knows?
Re: Re: Re: Re:
How exactly would Aereo do that when the ruling in WPIX, Inc. v. IVI, Inc. states explicitly "that Internet retransmissions services did not constitute cable systems under section 111" and are therefore not entitled to a compulsory license?
It seems that Aereo had no choice other than shutting down since SCOTUS left them in a no-man's-land - they're "too much" like a cable system to be fair use, yet not enough like a cable system to be afforded a section 111 compulsory license.
On the post: How The Copyright Wars Have Harmed Privacy And A Free Press
Re: Re: Re: Re:
I disagree. YaCy has tackled the inherent problems and scalability of a distributed search engine fairly well.
Not only that, if you turn on some of YaCy's heuristics functions it "learns" all about that small corner of the internet you personally care about by crawling based off your search results.
On the post: Facebook Fighting Against Massively Broad Warrant From NY District Attorney For All Information From 381 Accounts
Re: Probably cause?
Nothing worse than correcting someone else's typo with a comment that has a typo. Geesh.
Should read: ...I know I'm being pedantic here
On the post: Facebook Fighting Against Massively Broad Warrant From NY District Attorney For All Information From 381 Accounts
Probably cause?
Shouldn't that be "probable cause"?
And yes, I know I'm be pedantic here, but misuse of that term happens to be one of my pet peeves.
On the post: Even Hollywood Publications Are Concerned That Aereo Decision Kills Innovation And Harms Consumers
Re: Re: Re: Re: Re: Re: Re: Opinions, world cup and pocket lint
I use the "loophole" in jaywalking laws and cross the street legally at the crosswalk, but SCOTUS rules that I really didn't cross the street legally because my walking at the crosswalk looks too much like what a rickshaw service does.
On the post: Even Hollywood Publications Are Concerned That Aereo Decision Kills Innovation And Harms Consumers
Re: Re: Re: Re: Re: Re: Opinions, world cup and pocket lint
No, I think you are missing the point. You went off on some weird tangent about being carried across the road or using a rocket as the "loophole". The "loophole" in my jaywalking analogy is simply crossing the street at the crosswalk. Or in everyone else's vernacular, except yours: "following the law".