Can We Clean The Dead Sites Out Of Search Engines, Please?
from the thanks dept
Tech columnist James Derk has written up a fun column detailing a bunch of recent tech annoyances -- some of which are amusing and/or dead on. For example, he calls out computer companies that overload your computer with useless, intrusive and annoying software that is nearly impossible to remove (such as Quickbooks, which comes with some computers; as Derk points out: "Most large businesses don't use Quickbooks, most small businesses already have it, and most consumers don't want it."). Another amusing (and unfortunate) annoyance he lists is a computer he bought with a non-standard power supply -- which the company has run out of, and no longer has a supplier for. In other words, when his existing power supply broke, that was it for the computer. However, the most interesting may be his request to clean out the dead sites from the internet. He's sick of sites that no longer exist clogging up search engines: "I am getting very annoyed that no one has cleaned the Web lately. Lots of the sites I find in search engines no longer exist. I would like a house-cleaning day where we purge all of the search databases and start over. We all ought to have a 'do-over' day." Wouldn't that be nice?
Reader Comments
Re-indexing the web?
[ link to this | view in chronology ]
Re: Re-indexing the web?
[ link to this | view in chronology ]
Re: Re-indexing the web?
[ link to this | view in chronology ]
Why not just delete on visiting?
[ link to this | view in chronology ]
Re: Why not just delete on visiting?
Use AJAX so users don't have to navigate through multiple pages once they click the "Broken Link" link.
Then just have Google automatically re-index those pages, and if they don't exist, dump them.
I do agree with Urza9814's idea that you check every day for a week... this ensures that the site wasn't just down for an hour or so for maintenance.
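The "check every day for a week" idea can be sketched roughly as below. This is only an illustration in Python: `should_drop` and the `is_reachable` probe are hypothetical names, the probe is passed in rather than making real HTTP requests, and the seven-day window is taken from the comment above.

```python
from typing import Callable


def should_drop(url: str,
                is_reachable: Callable[[str, int], bool],
                days: int = 7) -> bool:
    """Return True only if the reported URL failed every daily check.

    is_reachable(url, day) is a hypothetical probe that reports whether
    the page answered on the given day (0 = the day it was reported).
    """
    for day in range(days):
        if is_reachable(url, day):
            # A single success means the site was only temporarily down.
            return False
    return True
```

In a real crawler the loop would be spread over a week of scheduled jobs rather than run in one call; the point is only that one successful visit cancels the removal.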
[ link to this | view in chronology ]
Reindexing and cleaning the Internet is a non-issu
[ link to this | view in chronology ]
Don't they already?
I think the indices are being cleaned up, though probably less frequently, so that servers with extensive outages aren't dumped just because they had a string of bad luck when the spiders came knocking.
I can imagine a counter for every entry in the search index that increments each time the spider visits unsuccessfully, and resets when it's successful.
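A minimal sketch of that counter, assuming a made-up `FailureCounter` class and an arbitrary removal threshold (the comment does not say how many failed visits should count as "dead", and nothing here reflects how any real search engine works):

```python
class FailureCounter:
    """Track consecutive failed spider visits per URL (illustrative only)."""

    def __init__(self, threshold: int = 5):
        # threshold is an assumed value, not something from the comment.
        self.threshold = threshold
        self.failures: dict[str, int] = {}

    def record(self, url: str, success: bool) -> bool:
        """Record one visit; return True if the entry should be removed."""
        if success:
            # A successful visit resets the counter.
            self.failures[url] = 0
            return False
        self.failures[url] = self.failures.get(url, 0) + 1
        return self.failures[url] >= self.threshold
```

Because the counter resets on success, a server with a run of bad luck only gets dropped if the failures are consecutive.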
*My* gripe is with those www.instructionsonhowtocleanyourdigitalcamera.ac links that have nothing to do with what I'm looking for - each one is just a spam site with a shitload of popups and a barrage of search words.
[ link to this | view in chronology ]
Re: Don't they already?
[ link to this | view in chronology ]
***PDF and CGI search indexing are mostly USELESS*
And what is the point of indexing "CGI" pages without their identifiers? What's the point of going to page.cgi if all it will do is tell you, "page not found, please use your back button"? Whose brilliant idea was it to index these, or any pages that need qualifiers? Message forums, I can understand. But if you are going to index them in your search engine, please index them with their full qualifiers (the full URL *PLUS* the stuff that comes after the "?" and "&" characters).
[ link to this | view in chronology ]
Re: ***PDF and CGI search indexing are mostly USEL
[ link to this | view in chronology ]
Re: ***PDF and CGI search indexing are mostly USEL
And this thing you mentioned about "logging in as you" is crazy. Because 1) if the message forum saves user information (login name/password) in the qualifier portion of .CGI? or .PHP? strings, then what good is even having a login? And 2) how would the search engine even know to log in as any particular user? Most search engines index (or try to) as an anonymous visitor... since most people who use the link the search engine gathers will also be anonymous on their first visit.
Indexing a website by its URL, then modifying that URL (by removing its qualifiers) and including the new URL in your search engine, makes your search engine's data useless. This is the JUNK that should be removed from all engines, as it does nothing more than clutter the results with "irrelevant" results.
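To make the difference concrete, here is a small Python sketch using the standard library's `urllib.parse`. The function names `index_key` and `stripped_key` and the example URL are made up for illustration; the first keeps the qualifiers as the comment asks, the second shows the stripped form being complained about.

```python
from urllib.parse import urlsplit, urlunsplit


def index_key(url: str) -> str:
    """Keep the path and the full query string; drop only the #fragment."""
    p = urlsplit(url)
    return urlunsplit((p.scheme, p.netloc, p.path, p.query, ""))


def stripped_key(url: str) -> str:
    """What the comment objects to: the query string is thrown away."""
    p = urlsplit(url)
    return urlunsplit((p.scheme, p.netloc, p.path, "", ""))


url = "http://example.com/page.cgi?board=3&thread=42#top"
print(index_key(url))     # http://example.com/page.cgi?board=3&thread=42
print(stripped_key(url))  # http://example.com/page.cgi
```

The second form points at the bare script, which is exactly the "page not found, please use your back button" case described earlier in the thread.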
[ link to this | view in chronology ]
Re: Don't they already?
[ link to this | view in chronology ]
No Subject Given
I see they have options on their advanced search page to only show web pages updated in the last 3 months, 6 months, or year, but that's too tedious for me when I'm feeling lazy... and isn't exactly going to do what I'd like to see.
[ link to this | view in chronology ]
OFFTOPIC
[ link to this | view in chronology ]
Re: OFFTOPIC
[ link to this | view in chronology ]
Easy solution + fix for a bigger problem
I can't imagine this being more than a one-week project for a small team of talented people. Once the right algorithms are in place, tested and functioning, only statistical quality control will need to be done (team of pigeons, anyone?). A false addition to the dead link list could be handled with a form that first compares the URL to the deleted list, to prevent its use as a general submission form. Submissions that pass are automatically reindexed within... 3 or so hours?
[ link to this | view in chronology ]
driver software annoyances
[ link to this | view in chronology ]
it would help
[ link to this | view in chronology ]
Early but seems fitting..
*** Attention ***
It's that time again!
As many of you know, each year the Internet must be shut down for 24 hours in order to allow us to clean it. The cleaning process, which eliminates dead email and inactive ftp, www and gopher sites, allows for a better-working and faster Internet.
This year, the cleaning process will take place from 23:59pm(GMT) on March 31st until 00:01am(GMT) on April 2nd. During that 24-hour period, five powerful Internet-crawling robots situated around the world will search the Internet and delete any data that they find.
In order to protect your valuable data from deletion we ask that you do the following:
1. Disconnect all terminals and local area networks from their Internet connections.
2. Shut down all Internet servers, or disconnect them from the Internet.
3. Disconnect all disks and hard drives from any connections to the Internet.
4. Refrain from connecting any computer to the Internet in any way.
We understand the inconvenience that this may cause some Internet users, and we apologize. However, we are certain that any inconveniences will be more than made up for by the increased speed and efficiency of the Internet, once it has been cleared of electronic flotsam and jetsam. We thank you for your cooperation.
Fu Ling Yu
Interconnected Network Maintenance staff
Main branch, Massachusetts Institute of Technology
Sysops and others: Since the last Internet cleaning, the number of Internet users has grown dramatically. Please assist us in alerting the public of the upcoming Internet cleaning by posting this message where your users will be able to read it. Please pass this message on to other sysops and Internet users as well.
Thank you.
[ link to this | view in chronology ]
Re: Early but seems fitting..
(Isn't April 1 a dead giveaway?)
[ link to this | view in chronology ]
Re: Early but seems fitting..
[ link to this | view in chronology ]
Re: Early but seems fitting..
[ link to this | view in chronology ]
Re: Early but seems fitting..
[ link to this | view in chronology ]
Dead Links
Come on techies!
NoMorePoints.com
[ link to this | view in chronology ]
No Subject Given
[ link to this | view in chronology ]