The Decentralized Web Could Help Preserve The Internet's Data For 1,000 Years. Here's Why We Need IPFS To Build It.

from the protocols-not-platforms dept

The internet economy runs on data. As of 2019, there were over 4.13 billion internet users generating more than 2.5 quintillion bytes of data per day. By the end of 2020, there will be 40 times more bytes of data than there are stars in the observable universe. And all of this data is powering a digital revolution, with the data-driven internet economy already accounting for 6.9% of U.S. GDP in 2017. The internet data ecosystem supports a bustling economy, full of opportunity for growth, innovation, and profit.

There’s just one problem: while user-generated data is the web’s most valuable asset, internet users themselves have almost no control over it. Data storage, data ownership, and data use are all highly centralized under the control of a few dominant corporate entities on the web, like Facebook, Google, and Amazon. And all that centralization comes at a steep cost to the ordinary internet user. Today’s internet ecosystem, while highly profitable for a few corporations, creates incentives for major platforms to censor end users who have nowhere else to go. It is also incompatible with data privacy, insecure against cybercrime, and extremely fragile.

The web’s fragility in particular presents a big problem for its long-term sustainability: we’re creating datasets that will be important for humanity 1,000 years from now, but we aren’t safeguarding that data in a way that is future-proof. Link rot plagues the web today, with one study finding that over 98% of web links decay within 20 years. We are exiting the plastic era and entering the data era, but at this rate our data won’t outlast our disposable straws.

To build a stronger, more resilient and more private internet, we need to decentralize the web by putting users back in control of their data. The web that we deserve isn’t the centralized web of today, but the decentralized web of tomorrow. And the decentralized web of tomorrow will need to last the next 1,000 years, or more.

Our team has been working for several years to make this vision of a decentralized web a reality by changing the way that apps, developers, and ordinary internet users make and share data. We couldn’t be doing this today without the InterPlanetary File System (IPFS)—a crucial tool in our toolbox that’s helping us tear down the major technological hurdles to building a decentralized web. To see why, we need to understand both the factors driving centralization on the web today, and how IPFS changes the game.

In fact, I want to make a bold prediction: in the next one to two years, we’re going to see every major web browser shipping with an IPFS peer by default. This has already started with the recent announcement that Opera for Android will now support IPFS out of the box. This kind of deep integration is going to catalyze a whole range of new user and developer experiences in both mobile and desktop browsers. Perhaps more importantly, it is going to help us all safeguard our data for future netizens.

Here’s how:

With the way the web works now, if I want to access a piece of data, I have to go to a specific server location. Content on the internet today is indexed and browsed based on where it is. Obviously, this method of distributing data puts a lot of power into the hands of whoever owns the location where data is stored, just as it takes power out of the hands of whoever generates data. Major companies like Google and Amazon became as big as they are by assuming the role of trusted data intermediaries, routing all our internet traffic to and through their own central servers where our data is stored.

Yet, however much we may not like “big data” collecting and controlling the internet’s information, the current internet ecosystem incentivizes this kind of centralization. We may want a freer, more private and more democratic internet, but as long as we continue to build our data economy around trusted third-party intermediaries who assume all the responsibilities of data storage and maintenance, we simply can’t escape the gravitational pull of centralization. Like it or not, our current internet incentives rely on proprietary platforms that disempower ordinary end users. And as Mike Masnick has argued in his essay "Protocols, Not Platforms: A Technological Approach to Free Speech", if we want to fix the problems with this web model, we’ll have to rebuild the internet from the protocol layer up.

That’s where IPFS comes in.

IPFS uses “content-addressing,” an alternative way of indexing and browsing data that is based, not on where that data is, but on what it is. On a content-addressable network, I don’t have to ask a central server for data. Instead, the distributed network of users itself can answer my data requests by providing precisely the piece of data requested, with no need to reference any specific storage location. Through IPFS, we can cut out the data intermediaries and establish a data sharing network where information can be owned by anyone and everyone.
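To make the contrast concrete, here is a minimal sketch in Python of the difference between asking for data by location and asking for it by content. It uses a plain SHA-256 hash as a stand-in for an IPFS content identifier (real IPFS CIDs use a multihash format and split files into blocks, so this illustrates the idea, not the actual protocol):

```python
import hashlib

# A toy "network": a few peers, each holding whatever content it chooses.
peers = {
    "peer-a": [b"cat photo bytes", b"research dataset v1"],
    "peer-b": [b"research dataset v1"],   # the same dataset, mirrored elsewhere
    "peer-c": [b"unrelated blog post"],
}

def content_address(data: bytes) -> str:
    """Derive the address from the data itself (a stand-in for an IPFS CID)."""
    return hashlib.sha256(data).hexdigest()

def fetch_by_content(address: str) -> bytes:
    """Ask the network for 'the data whose fingerprint is X'; any peer may answer."""
    for peer, blobs in peers.items():
        for blob in blobs:
            if content_address(blob) == address:
                print(f"served by {peer}")
                return blob
    raise LookupError("no peer currently holds this content")

dataset_addr = content_address(b"research dataset v1")
print(fetch_by_content(dataset_addr))
```

With a location-based address (a URL), the request fails the moment the one named host goes away. With a content-based address, any peer holding identical bytes can answer, and the requester can re-hash what it received to verify it got exactly what it asked for.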

This kind of distributed data economy undermines the big data business model by reinventing the incentive structures of web and app development. IPFS makes decentralization workable, scalable and profitable by putting power in the hands of end users instead of platforms. Widespread adoption of IPFS would represent the major upgrade to the web that we need to protect free speech, resist surveillance and network failure, promote innovation, and empower the ordinary internet user.

Of course, the decentralized web still needs a lot of work before it is as user-friendly and accessible as the centralized web of today. But already we’re seeing exciting use cases for technology built on IPFS.

To get us to this exciting future faster, Textile makes it easier for developers to utilize IPFS to its full potential. Some of our partners are harnessing the data permanence that IPFS enables to build immutable digital archives that could withstand server failure and web decay. Others are using our products (e.g., Buckets) to deploy amazing websites, limiting their reliance on centralized servers and allowing them to store data more efficiently.

Textile has been building on IPFS for over three years, and the future of our collaboration on the decentralized web is bright. To escape the big data economy, we need the decentralized web. The improvements brought by IPFS, release after release, will help make the decentralized web a reality by making it easier to onboard new developers and users. As IPFS continues to get more efficient and resilient, its contribution to empowering the free and open web we all deserve will only grow. I can’t wait for the exponential growth we’ll see as this technology continues to become more and more ubiquitous across all our devices and platforms.

Carson Farmer is a researcher and developer with Textile.io. Twitter: @carsonfarmer


Filed Under: archiving, decentralized web, ipfs, platforms, preserving data, protocols


Reader Comments



  1. Anonymous Anonymous Coward (profile), 5 May 2020 @ 3:39pm

    I like this idea, but...

    Question, how will IP maximalists feel about this development? My feeling is that they will make sure all their IP is sequestered on their servers. Quite possibly to their disadvantage.

    Of course it will make 'taking down' things they claim are their IP more difficult as there won't be any easy 'place' to go after it. To that end, I bet they fight this advancement tooth and nail, even to the point of trying to buy legislation against it.

    "Your file, and all of the blocks within it, is given a unique fingerprint called a cryptographic hash."

    Another question is what is the relationship between the unique fingerprints and IP addresses?

  2. Anonymous Coward, 5 May 2020 @ 4:29pm

    I don't get it.

    If this is more private, how exactly is changing the location of my information from "Google's servers" to "an arbitrary number of stranger's devices" better?

    Presumably the owner of the data would also gatekeep access to their own information, but through what mechanism would they be able to recover access to their data in the inevitable event that a criminal phishes the end-user for the key, or their computer dies, or any other situation in which exclusive access to someone's key-holding device (presumably their PC and/or phone) is compromised?

  3. Anonymous Coward, 5 May 2020 @ 4:37pm

    A number of things wrong with this

    Data storage, data ownership, and data use are all highly centralized under the control of a few dominant corporate entities on the web, like Facebook, Google, and Amazon.

    This is simply not true, apart from the questionable inclusion of Facebook. Yes, if you post something to Facebook that data is stored on Facebook's servers and they maintain much of the control over that data. You, as the poster, are still free to delete or modify it so not all control is lost. But bigger than that, is anything, anything at all, posted to Facebook worth preserving for future generations?

    As for Google, they merely index data found around the internet. They don't house it. The data owner still has their data wherever they choose to put it and remains in complete control of it. Google's search engine came about because what we had before that, shared cross-linking to other sites, was trash. All Google search does is index all of that data so it is easier to find than following a poor chain of shared links. While Google may be a behemoth, it is such because they make a ton of money selling ads which all those data providers put all over their own sites as a way of making money, shared with Google. It is not huge because of some imagined data theft and storage.

    Amazon's AWS and related services are nothing more than server space that people rent in order to host their own data. Amazon doesn't own that data at all. And people are free to choose from countless other hosting options and even host their data on a server in their own closet.

    The entire basis of this argument is a farce.

    The web’s fragility in particular presents a big problem for the long-term sustainability of the web: we’re creating datasets that will be important for humanity 1000 years from now, but we aren’t safeguarding that data in a way that is future-proof.

    The only fragility in the web lies in the DNS system which is actually fairly robust. As for preserving data, well, that's an entirely different problem to that posed by this article. Data preservation, by definition, means backing it up and storing it in (surprise!) centralized data stores. Copyright law could easily stand in the way of such an effort as it has for the Internet Archive. Whatever the solution to this problem, data redundancy is inherently and diametrically opposed to data owners retaining full control of their data. It's directly antithetical.

    To build a stronger, more resilient and more private internet, we need to decentralize the web by putting users back in control of their data.

    Again, users are already in control of their data, at least any data that matters. You might argue that Twitter discussions/debates with politicians are important to preserve for posterity but that's easily answered by noting all of the news outlets that cheerfully capture those tweets in articles that will still be around, barring a data backup error, for a very long time.

    And again, data storage is already, as it has always been, fully decentralized. The only thing this article seems to be legitimately advocating for is a different kind of search engine. While that probably has some value, none of the arguments supporting this assumed need are at all convincing or even related to the proposed solution.

    The article reads like "I need a new car because astrophysics is purple." Sure, maybe you do need a new car but wtf?

  4. Anonymous Coward, 5 May 2020 @ 4:40pm

    In fact, I want to make a bold prediction: in the next one to two years, we’re going to see every major web-browser shipping with an IPFS peer, by default.

    That is indeed bold. I heard a similar prediction about Tor during the Snowden days. Didn't happen. I don't really get what's special about IPFS. Seems pretty similar to Freenet from 2 decades ago. Despite the claim IPFS is "more private", I see no information on the site about that. It looks kind of like BitTorrent to me, and it's well known that strangers can see what you download over BT.

  5. Anonymous Coward, 5 May 2020 @ 4:46pm

    Re: A number of things wrong with this

    Another thought:

    I challenge the author of this article to post it to slashdot. I'm willing to bet that not only will all of the above be repeated there (in far greater detail, both technical and savagely insulting) but the article will also be shredded as shameless self-promotion. If a technical article doesn't survive exposure there it shouldn't be posted elsewhere and the author should probably sit down for some deep self-reflection and a complete rethink of the idea.

  6. Anonymous Coward, 5 May 2020 @ 4:51pm

    Re:

    There's no specific claim that IPFS is more private here, or elsewhere. IPFS (the protocol) is not more private. The data you add to IPFS can be encrypted or not. The real benefit/claim here is about control IMO.

  7. Anonymous Coward, 5 May 2020 @ 5:30pm

    if I read this right

    Isn't it just basically a slightly more polished version of Freenet, which has been around for 20 years?

  8. Cpt Feathersword (profile), 5 May 2020 @ 5:34pm

    How is this different from BitTorrent?

  9. Anonymous Coward, 5 May 2020 @ 7:45pm

    Seems more likely there will be an extinction than 1000 years of anything if you understand the trajectory of web architecture.

  10. Anonymous Coward, 5 May 2020 @ 8:02pm

    Found while RTFM: pinning services.
    So to ensure persistent availability of important data, you'll pay some data center to keep it around for you. How is this different from today?
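    For context on what "pinning" means mechanically: it is simply an instruction to a particular IPFS node to keep a copy of some content and exclude it from garbage collection. A minimal sketch in Python, assuming a local IPFS daemon exposing its standard HTTP RPC API on port 5001 (the CID below is a hypothetical placeholder):

```python
import requests

# Hypothetical content identifier; in practice you would use one returned by "ipfs add".
CID = "QmYourContentIdentifierHere"

# Tell the local node: keep a copy of this content and don't garbage-collect it.
# A commercial "pinning service" runs the same operation on machines that stay
# online around the clock.
resp = requests.post(
    "http://127.0.0.1:5001/api/v0/pin/add",
    params={"arg": CID},
    timeout=30,
)
print(resp.json())
```

    The design difference from a conventional host is that anyone else can pin the same CID, and readers find whichever copy happens to be reachable under the same address.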

  11. Anonymous Coward, 5 May 2020 @ 8:25pm

    Re: Re:

    The quote is "To build a stronger, more resilient and more private internet, we need to decentralize the web by putting users back in control of their data. … Our team has been working for several years to make this vision of a decentralized web a reality by changing the way that apps, developers, and ordinary internet users make and share data."

    And you're right, they never claim IPFS achieves or even attempts to achieve that goal. Seems misleading for them to repeatedly talk up such goals in the context of IPFS, without saying which ones IPFS does or does not try to achieve.

  12. Anonymous Coward, 5 May 2020 @ 8:45pm

    Re:

    you'll pay some data center to keep it around for you. How is this different from today?

    If that server goes down, your computer may automatically be able to locate the data elsewhere (on another server, or on a peer).

    I wonder why they say "your" data. Does anything stop me (or, say, archive.org) from "pinning" someone else's published data?

  13. frank87 (profile), 5 May 2020 @ 11:32pm

    Re: Re: Re:

    That's private in contrast to corporate. Not private as in privacy.

  14. frank87 (profile), 5 May 2020 @ 11:37pm

    Just like Bittorrent

    The experience with distributed data storage is that links get stale. You're lucky if a BitTorrent link that's years old still has any seeders.

  15. PaulT (profile), 6 May 2020 @ 1:09am

    Re:

    "If this is more private, how exactly is changing the location of my information from "Google's servers" to "an arbitrary number of stranger's devices" better?"

    I think you're looking at the wrong definition of "private". The idea is not to make things private in terms of who can access the data; the idea is to move the data from corporate control to control by private entities.

  16. PaulT (profile), 6 May 2020 @ 1:13am

    Re: Just like Bittorrent

    A lot of that has to do with the way people access torrents. Most people are leechers, who grab the file then disconnect when they get what they came for. There's no incentive to stay seeding, and in fact if they're accessing illegal content there's an incentive to seed for as short a time as possible. You see different ratios when accessing legal content such as Linux ISOs.

    However, if the point is not to access individual files, but rather to provide overall data storage for legal purposes, there's no incentive to leech and so people would agree to leave things connected.

  17. Molly, 6 May 2020 @ 1:53am

    Re: A number of things wrong with this

    You've got some serious blinders on about the problems the web is facing today. "The only fragility in the web lies in DNS" is sadly far from true - just look at all the regularly broken links as data shifts location, all the tools that become unusable once an owner pushes an update, our inability to track and version changes to data we care about, and the centralized nature of most apps and tools we use. These are serious flaws and brittleness. Using content-addressed data, and using peer-to-peer networks for direct collaboration between devices without central middlemen, helps alleviate that central control and dependence. See Juan Benet talk about this more, and the underlying technology at play here: https://www.youtube.com/watch?v=2RCwZDRwk48&list=PLuhRWgmPaHtSgYLCqDwnhsQV6RxKDrkqb&index=2

    I think you take "Google" at very face value (aka, the search engine) - but you neglect to account for the pervasive suite of centralized tools and services offered (Gmail, Docs, YouTube, Hangouts, Chrome), all of which store their application data (and metadata about your ads profile) on central Google servers. Having worked on tools AT Google, architecting these products to work offline and peer-to-peer is infeasible because of the gravity this centralized model exerts. Better tools can be built on a more resilient and flexible system.

  18. Anonymous Coward, 6 May 2020 @ 1:58am

    Re: Re: Re:

    You can see more about Textile Threads here (https://docs.textile.io/threads/introduction/) - which handle data encryption on IPFS.

  19. Scary Devil Monastery (profile), 6 May 2020 @ 2:10am

    Re: I like this idea, but...

    "Question, how will IP maximalists feel about this development?"

    How do you think? From the pov of the incumbent gatekeepers the ideal situation is one where every book and media recording in the world is burned at six month intervals to make room for the Next Big Thing.

    Old accessible entertainment and media is a competitor. It's that simple.

  20. Scary Devil Monastery (profile), 6 May 2020 @ 2:16am

    Re: if I read this right

    "Isn't it just basically a slightly more polished version of Freenet, thats been around for 20 years?"

    More or less. A few bells and whistles have been added but essentially it's still the same idea.

    At some point it may become convenient enough to work, but until that point I wouldn't hold my breath. The current internet infrastructure isn't going to support IPFS as a scalable long-term solution until every PC user both uses IPFS and physically owns a part of the solution, rendering long-term storage viable.

    I can only imagine the flood of synchronization traffic trying to compensate for billions of users continually switching out their PCs and changing their hard drives...

  21. Scary Devil Monastery (profile), 6 May 2020 @ 2:17am

    Re: How is this different from BitTorrent?

    "How is this different from BitTorrent?"

    A very good question.

  22. flyinginn (profile), 6 May 2020 @ 4:05am

    Ignoring the technical feasibility issues for a moment, this reminds me of the introduction of the Dewey Decimal system for library location. Instead of describing availability by location ("American Farm Tractor" by Randy Leffingwell is on floor 3, stack 14, shelf 2) it can be located by subject (631.3.x) anywhere that has a copy. The content with IPFS could be anything but there's always the risk that it will be misfiled. It seems to be the opposite of an IP, which is content-agnostic and inherently transient.

  23. Anonymous Coward, 6 May 2020 @ 4:54am

    Re: Being Productive in the Lockdown with Talento.

    Sorry, this is not your private advertising platform.

  24. Anonymous Coward, 6 May 2020 @ 7:51am

    Don't Mistake Foolishness for Fearlessness

    "...bold prediction..."

    As someone who's been contributing bandwidth and storage volume to Freenet for over a decade, I would read "bold" as "silly in this context."

  25. Anonymous Coward, 6 May 2020 @ 8:13am

    Re: Re: A number of things wrong with this

    all the regularly broken links as data shifts location

    Existing search engines are pretty good at picking up the new data (less so about forgetting the old). While IPFS might present an alternate means of locating that moved data it's not likely much faster than existing technology. This is a minor improvement, at best.

    all the tools that become unusable once an owner pushes an update

    This is a problem, for sure, but it's not a problem with the internet, it's a problem with product development. IPFS won't do a damn thing to resolve this. And thanks to greed, nothing else likely will either.

    our inability to track and version changes to data we care about

    This is antithetical to data retention by the owner and therefore not in line with the article author's goals. However, absolutely nothing is stopping you from doing this right now. There is no "inability" here.

    the centralized nature of most apps and tools we use

    This is just point #2 repeated in different words.

    Gmail, Docs and YouTube are all solutions the public wanted as replacements for equivalent tools that are slightly more cumbersome to manage. For example, I still use a non-gmail client and manage my own email servers. I also use LibreOffice instead of Docs despite the crappy UX. I don't really care about the videos (which could easily be hosted on a home server instead of Google's but people would rather try to make some cash from Google's ads).

    You're not describing problems with the internet. The article offers an architecture-level solution to a problem that doesn't exist, and all of the arguments in defense of this idea talk about high-level issues that have nothing to do with the internet itself, only with its users. And many or most of those arguments are just wrong. Good luck changing the users. IPFS is certainly not the solution to that problem.

  26. JJ, 6 May 2020 @ 8:19am

    Worse privacy and security than the regular web?

    The IPFS website is paragraph after paragraph about how cool and wonderful and awesome IPFS is, and also has lots of low-level technical info, but makes very little attempt to answer questions that informed readers will almost certainly have: How does this system compare to other similar systems? Why is it likely to succeed where others have failed? How does it plan to deal with the threats that are likely to arise?

    I've found that when a project isn't interested in answering those questions clearly, thoughtfully, and openly, it's because it doesn't have good answers. Frank, critical discussion of IPFS's pros, cons, threats, and solutions needs to be front and center in their communication.

    It looks like from a privacy and security point of view, IPFS is much like bittorrent, which means that any peer can see what data you're accessing, and also what data you're "seeding" or hosting. It becomes trivial for governments and corporations to spy on who is accessing and hosting what content. Since IPFS only works if users agree to publicly host and share content, this means that anything controversial will be dangerous to access, and even more dangerous to provide access to.

    IPFS does have room for adding tor-like security layers on top of it, but it seems like that would destroy most of the benefits of using the system.

    Worse still, the system does not seem to attempt to guarantee a certain level of redundancy or availability for any particular data. Every user hosts only the data they choose, for only as long as they choose. (Compare this to existing distributed blockchain-based file storage systems, which are designed to guarantee file availability.)

    So, the content that is most likely to be preserved is the content that is most popular and least controversial - i.e. content that is pretty likely to be preserved anyway, and very likely the content that is least important to preserve.

  27. Anonymous Coward, 6 May 2020 @ 8:51am

    Re: Re: Re: A number of things wrong with this

    Existing search engines are pretty good at picking up the new data

    That does not solve the broken link problem. IPFS does, because the data is identified by a hash, and not a location. "Give me the data known as..." works when the data has been moved, while "give me the data at link..." fails.

  28. Anonymous Coward, 6 May 2020 @ 9:13am

    Re: Re:

    Nope, nothing stopping this. In fact, it's a great feature of IPFS. Collaborative/collective preservation of digital media. If something is important to you, you are able to archive it for others in that sense.

  29. Anonymous Coward, 6 May 2020 @ 9:16am

    Re:

    Very insightful. This is a great analogy that has been used before to describe IPFS. However, the fact is that there is no risk of misfiling, because the content address is the hash of the content itself. It is a unique fingerprint that cannot feasibly be produced from any other bit of digital data.
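    A quick illustration of why a "misfiled" copy is detectable, again using a plain SHA-256 hash as a stand-in for an IPFS content identifier (the book title is just a nod to the analogy above):

```python
import hashlib

def address(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

original = b"American Farm Tractor, chapter 1"
tampered = b"American Farm Tractor, chapter 1!"   # one byte different

print(address(original)[:16])   # two completely unrelated fingerprints
print(address(tampered)[:16])

def retrieve(addr: str, supplied: bytes) -> bytes:
    # Whoever hands us the bytes, we can check they really match the address.
    if address(supplied) != addr:
        raise ValueError("content does not match its address; rejected")
    return supplied

retrieve(address(original), original)        # accepted
try:
    retrieve(address(original), tampered)    # a "misfiled" copy is rejected
except ValueError as err:
    print(err)
```

    The retriever never has to trust the peer that supplied the bytes; the address itself is the check.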

  30. Rekrul, 6 May 2020 @ 9:17am

    Re: Re: Just like Bittorrent

    However, if the point is not to access individual files, but rather to provide overall data storage for legal purposes, there's no incentive to leech and so people would agree to leave things connected.

    Even with 100% legal files, not everyone has the storage capacity to leave everything on their system forever. I filmed a few gameplay videos a while back, but they're sitting on an external drive that isn't connected very often. Many things I have were burned to DVDs and then deleted from my system.

  31. Rekrul, 6 May 2020 @ 9:25am

    Edonkey2K had a system like this where you point it at a directory and it shares everything in there, with files being accessed by their hash and not just their name. Availability of files was spotty at best and has diminished over time. BitTorrent allows users to share individual files in a similar way, but any torrent more than a few months old is almost certainly dead by now.

    I agree that having central points of failure for data is a bad idea, but depending on users keeping that data available for others to access isn't any better. Only the most popular files will be kept online and anything less popular or obscure will disappear quickly. People will delete some files simply because they need to free up space on their drive.

    How is that any better than what we have now?

  32. Anonymous Coward, 6 May 2020 @ 9:28am

    Re: A number of things wrong with this

    Again, users are already in control of their data, at least any data that matters.

    That’s just silly. I don’t think anyone is in a position to decide whose data matters. That’s the whole point.
    Do you really think users have more control over their data and metadata than Google or Facebook? If either of them censors my content, do I have control over that?

  33. PaulT (profile), 6 May 2020 @ 9:37am

    Re: Re: Re: Just like Bittorrent

    "Even with 100% legal files, not everyone has the storage capacity to leave everything on their system forever."

    Obviously. But, an intelligently distributed system would mean that people dipping in and out would not affect things too much, while most people have a large amount of storage they're not using at all, which only increases with each system they buy.

    The question is really how the data to be distributed is prioritised, and the type of data to be distributed. Obviously, there's a different setup if you're talking about video vs textual data, for example.

  34. PaulT (profile), 6 May 2020 @ 9:38am

    Re: Re: Being Productive in the Lockdown with Talento.

    That's why the flag button is there, it's unlikely that a reply to that comment would be read.

  35. Anonymous Coward, 6 May 2020 @ 9:39am

    Re: Re: A number of things wrong with this

    If either of them censor my content, do I have control over that?

    No but you should still have the content and can post it elsewhere. They are not publishers and do not demand control over your content, they just have the right to not show it on their site.

    Why do you keep on demanding that they display your posters in their window?

  36. Rocky, 6 May 2020 @ 9:54am

    Re: Worse privacy and security than the regular web?

    Worse still, the system does not seem to attempt to guarantee a certain level of redundancy or availability for any particular data. Every user hosts only the data they choose, for only as long as they choose. (Compare this to existing distributed blockchain-based file storage systems, which are designed to guarantee file availability.)

    We already have CDNs today; it would be quite easy for them to add IPFS, which would definitely alleviate any redundancy problems.

  37. Anonymous Coward, 6 May 2020 @ 10:07am

    Re: Re: Re: Re:

    That's private in contrast to corporate. Not private as in privacy.

    Since when are there degrees of "private" in that sense? Also, the word "privacy" does appear explicitly:
    "Today’s internet ecosystem … [is] incompatible with data privacy"

  38. Anonymous Coward, 6 May 2020 @ 10:13am

    Re: Re: Re: Re:

    Encryption is only part of privacy. Techdirt.com uses HTTPS, but people can still see I'm accessing the site. Given all the embedded media, they might get a good idea of which stories I'm reading. Timing might reveal which comments are mine.

    Perhaps I'm just dense, but I don't understand from that page how Techdirt could publish stories and accept comments while protecting the privacy of readers and commenters. There's mention of relaying, but not necessarily onion-relaying.

  39. Anonymous Coward, 6 May 2020 @ 11:07am

    Re: Re: Re: Re: A number of things wrong with this

    That does not solve the broken link problem. IPFS does

    If someone is still sharing the data. Who's never run into an unseeded torrent?

    because the data is identified by a hash, and not a location

    Will people be willing to publish under this model? Companies love to surround their pages with navigation bars, related headlines, and other shit not part of the content being requested. Not to mention post-publication editing, which might even be required for libel-related reasons.

  40. Anonymous Coward, 6 May 2020 @ 11:15am

    Re: Re: if I read this right

    The current internet infrastructure isn't going to support IPFS as a scalable long-term solution until every PC user both uses IPFS and physically owns a part of the solution, rendering long-term storage viable.

    This isn't really an infrastructure problem. I'm sure this could work with 10% of people, maybe much less, running a server—to the extent it works at all. As you hint, bad algorithms are a still-unsolved problem. Lots of people have come up with stuff that works in labs or in small groups of dedicated users.

    It's worth keeping in mind that this doesn't have to be perfect, it only has to be better than what exists now, which in many ways isn't great. Things are reliable for popular and recent content, whereas the success rate for accessing years-old links is abysmal. Freenet was mostly worse, long ago when I tried. BitTorrent is better for bandwidth efficiency and speed, with recent content; but it's worse for privacy, and stuff disappears faster than with the web.

  41. Anonymous Coward, 6 May 2020 @ 12:20pm

    Re: Re: Re: Re: Just like Bittorrent

    But, an intelligently distributed system would mean that people dipping in and out would not affect things too much

    Part of that might be noticing when this rarely-connected drive gets connected, and sharing files during that time. Personally, a lot of the data I get from torrents is still around—just not in its original location or directory layout; and maybe some files are missing, which means I can't seed other files that were sharing blocks with them.

  42. Anonymous Coward, 6 May 2020 @ 12:22pm

    Re: Re:

    This is a great analogy

    I like it, but I do have to wonder whether people these days are more likely to know about the Dewey system or cryptographic hashes.

  43. Anonymous Coward, 6 May 2020 @ 12:26pm

    Re:

    How is that any better than what we have now?

    As noted elsewhere, it would make archiving easier.

    Content disappears today when whoever originally posted it stops caring. Maybe archive.org has a copy, or maybe they have a copy of something totally different that once used that URL. With IPFS, they or anyone else could run the mirror; people will be able to find it automatically and know they're getting the correct content. If this works.

  44. Anonymous Coward, 6 May 2020 @ 12:59pm

    Re: Re:

    Corporations are private entities.

  45. Anonymous Coward, 6 May 2020 @ 2:22pm

    Re: Re: Re:

    Wrong, corporations are public entities owned by the public, that is shareholders.

  46. Anonymous Coward, 6 May 2020 @ 2:25pm

    Re: Re: Re: Re: Re: A number of things wrong with this

    Publishing under this model, and initial searching, will be no different than today. You search etc. and find an article, and use cut and paste to create a link, be that a URL or a hash.

  47. Ehud Gavron (profile), 6 May 2020 @ 3:08pm

    Interplanetary Pot Filled Smoke

    Great undefined buzzwords that all feel good in a long time -- can't beat that.

    Interplanetary Pot Filled Smoke sounds great. Hope it works out well for

    • the people who give you money
    • the money you take from them

    Maybe... just maybe... if you're REALLY serious about a content based search that has nothing to do with "servers" (oh my aching gasping laughing chest) you should hire some people who are experts at this ... oh wait... you did? They told you this wasn't viable? Oh, sorry.

    /backs out of room slowly.

    E

  48. Anonymous Coward, 6 May 2020 @ 5:22pm

    Re: Re: Re: Re: Re: Re: A number of things wrong with this

    Publishing under this model, and initial searching, will be no different than today.

    If you follow a link to a document identified by hash, it's not going to be able to show you the site's top 10 articles as of that moment or whatever else they want to push. It will be more like a newspaper, where everyone sees the same thing. If there's an ad, it'll be the same for everyone, unless they're using some hybrid model that pulls that via a non-hash protocol.

  49. Anonymous Coward, 6 May 2020 @ 5:25pm

    Re: Re: Re: A number of things wrong with this

    Why do you keep on demanding that they display your posters in their window?

    Where did you see a demand? Things like IPFS will avoid the need for that. They are the "elsewhere" you mentioned.

  50. Ehud Gavron (profile), 6 May 2020 @ 6:02pm

    Re: Re: Re: Re:

    Wrong, corporations are public entities owned by the public, that is shareholders.

    Please don't be rude while displaying arrogance and a lack of understanding.

    Corporations are entities which allow some portion to be owned by non-qualified public investors. They are not "public" in the sense that they are owned by "the public" such as a utility.

    Shareholders -- depending on the stock -- may have zero voting, zero dividends, and zero rights -- except, of course to receive voluminous writings and to resell their shares, hopefully for more than they paid for them.

    Don't confuse a corporation with shares on the market which we typically do call "a public company" with "a public entity owned by the public" -- which does not exist.

    Try starting your answer with something you wouldn't use on your wife, kids or friends. "WRONG!!! BZZT!!!" isn't a good introduction.

    E

  51. Anonymous Coward, 6 May 2020 @ 11:11pm

    Re: Re: Re: Re: Re:

    There are different types of corporations. Some are private, some are public, some are nontraditional in the US and arranged around a charter rather than a different form of regulatory filing.

    It is a complex and boring area of domestic and international law.

  52. Scary Devil Monastery (profile), 7 May 2020 @ 12:56am

    Re: Re: Re: if I read this right

    "I'm sure this could work with 10% of people, maybe much less, running a server—to the extent it works at all. As you hint, bad algorithms are a still-unsolved problem. Lots of people have come up with stuff that works in labs or in small groups of dedicated users."

    10% is a lot unless you've got the sort of ubiquitous penetration of the online population that, say, microsoft has.

    For this sort of decentralized private network to scale well you need a lot of changes in how the average online user participates in the general network. IPFS needs to be piggybacked on top of multiple widespread applications used by everyone, for instance. Historically that's never worked well until MS managed to integrate Xbox Live with Windows 10.

    Secondly you need sufficient amounts of people to donate significant amounts of hard drive space - and there we hit the freenet issue where the solution simply will not scale. You don't need the storage space sufficient to store all data on the internet - you need multiple times that storage space to ensure redundancy with individual users filling the role of hard drives in a RAID 6+ array.

    "It's worth keeping in mind that this doesn't have to be perfect, it only has to be better than what exists now, which is many ways isn't great."

    The question being whether we even can realistically shoot for something significantly better until every netizen has become, effectively, a server park all their own. Sure, storage has dropped in price a lot and most people have more capacity than they use...but not enough to make any sort of difference.

    Until that is significantly changed, IPFS is simply a new type of Freenet - useful for filesharers perhaps, and for the top layer of media considered "interesting" or new. But not a backup option for the net as a whole.

    As you implied, scaling is what usually buries initiatives like these. In theory it's genius. In practice the world still isn't ready.

  53. Scary Devil Monastery (profile), 7 May 2020 @ 1:00am

    Re: Re: Re: Re: Just like Bittorrent

    "...But, an intelligently distributed system would mean that people dipping in and out would not affect things too much..."

    Yeah, but as I usually say, even if you turn every netizen's PC into a hard drive node for a RAID 6+ array, that still means you need enough storage space to store the entirety of the internet multiple times in order for that redundancy to exist.

    And you can guess what happens to network bandwidth when the synchronization efforts alone start the equivalent of DDoSing every router and server on the network.

  54. PaulT (profile), 7 May 2020 @ 1:35am

    Re: Re: Re: Re: Re: Just like Bittorrent

    "still means you need enough storage space to store the entirety of the internet multiple times in order for that redundancy to exist"

    If there's zero planning put into what is stored and how, sure.

    "And you can guess what happens to network bandwidth when the synchronization efforts alone starts the equivalent of DDoSing every router and server on the network"

    So, it would be designed not to do that in order to avoid such an obvious problem?

    Your objections seem to rest on the idea that nobody planning the system has any idea how anything works in reality.

  55. Scary Devil Monastery (profile), 7 May 2020 @ 7:17am

    Re: Re: Re: Re: Re: Re: Just like Bittorrent

    "If there's zero planning put into what is stored and how, sure."

    Well, yes, but it's pretty much given that there will be zero planning unless you manage to somehow curate the content. At this point I'll just refer you to a variant of Mike's arguments as to why you can't moderate at scale. By the time we have the tech enabling us to collate and sort all the data online we'll have the tech needed to auto-moderate all of the internet as well. And, I suspect, also the tech needed for real-life star trek replicators given how far off in sci-fi territory that is.

    "So, it would be designed not to do that in order to avoid such an obvious problem?"

    Then you face the unavoidable flip side of avoiding said problem - that if a set of data is taken offline that data won't be available any longer.

    Look, people keep saying "distributed data" and forgetting that said data is still stored - in multiple copies, usually, in multiple locations. "Cloud-based storage" really just means "A whole buttload of server farms storing the same sets of data ten times over".
    To store data offline and have it available at a moment's notice either you have it stored in a place which will always be accessible - and this rules out every private user - or you need to have it in multiple places simultaneously.

    If you have it in multiple places simultaneously then every time anyone changes the "source" copy - and good luck determining which one that'll be after a while - there will be a massive wave of data transfers as the new setup mirrors itself across the network. For every file so affected.

    I dunno about you but for me the idea of the entire damn internet synchronizing itself repeatedly looks like the vision of a million people simultaneously sticking etherkillers into their national network trunks.

    "Your objections seem to rest on the idea that nobody planning the system has any idea how anything works in reality."

    No, my objection rests on the fact that multiple private entities, corporate and individual, have pursued this issue many, MANY times over, starting from old Freenet and continuing to the present day. It's a brilliant solution which has always collided with the fact that it doesn't scale well.

    That doesn't mean we shouldn't keep trying and applaud any serious attempt to take this further. But it does mean we should remain aware of what is currently possible and not.

    IPFS is brilliant. A worthy addition to the BitTorrent protocol. And we do need more decentralization of the web.
    It will, in itself, do very little for data preservation, however, simply because in the end the data still has to be stored somewhere. And if that somewhere isn't a corporate storage facility or webmotel then that somewhere must be voluntarily donated netizen hard drive storage.

    And there we run into the same issue freenet encountered. It's great if you've got enough people donating storage for the data to be preserved.

    It's just scale. Data compression has limits. No matter how clever you are you can't expect to hold the atlantic ocean in a few thousand drinking glasses. We're literally in LENpeg compression territory here.

  56. PaulT (profile), 7 May 2020 @ 7:51am

    Re: Re: Re: Re: Re: Re: Re: Just like Bittorrent

    "Well, yes, but it's pretty much given that there will be zero planning unless you manage to somehow curate the content."

    No, there are ways to get around that. I can think of quite a few, such as hashing larger files and using that to get around trying to store full copies of every site that uses the file in question. Also, you can be choosy about which sites are preserved; you don't have to plan to store multiple copies of YouTube for this to be viable for a huge number of independent sites.

    "At this point I'll just refer you to a variant of Mike's arguments as to why you can't moderate at scale."

    Nothing I'm talking about has anything to do with moderation.

    "Then you face the unavoidable flip side of avoiding said problem - that if a set of data is taken offline that data won't be available any longer."

    No, the entire point is to decentralise storage so that doesn't happen.

    "To store data offline and have it available at a moment's notice"

    What does that have to do with the subject here?

    "the idea of the entire damn internet synchronizing itself repeatedly"

    Again, you're making a major assumption that's not on the table. Why would everything be constantly syncing, rather than periodic checks to ensure that sufficient copies are available? The latter wouldn't necessarily be more overhead than existing heartbeat and other checks that are already prevalent.
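    As a rough sketch of the kind of periodic check being described here (hypothetical logic, not an actual IPFS mechanism; the peer names, content addresses, and replication target are invented for illustration):

```python
import random

TARGET_REPLICAS = 3   # illustrative policy: keep at least three live copies

# Pretend inventory: content address -> peers believed to hold a copy.
inventory = {
    "content-hash-aaa": {"peer-1", "peer-2", "peer-3"},
    "content-hash-bbb": {"peer-4"},
}

def peer_is_online(peer: str) -> bool:
    """Stand-in for a real reachability probe (DHT lookup, ping, etc.)."""
    return random.random() > 0.2

def sweep(inventory: dict) -> None:
    """Run once per hour or day from a scheduler; the cost scales with the
    number of checks, not with re-transferring the data itself."""
    for cid, holders in inventory.items():
        alive = {p for p in holders if peer_is_online(p)}
        if len(alive) < TARGET_REPLICAS:
            print(f"{cid}: only {len(alive)} live copies, ask another volunteer to pin")
        else:
            print(f"{cid}: healthy ({len(alive)} live copies)")

sweep(inventory)
```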

  57. Scary Devil Monastery (profile), 8 May 2020 @ 3:27am

    Re: Re: Re: Re: Re: Re: Re: Re: Just like Bittorrent

    "No, there are ways to get around that. I can think of quite a few, such a hashing larger files and using that to get around trying to store full copies of every site that uses the file in question."

    Then you lose data as a result. I can think of several webpages where data considered extremely relevant to various stakeholders would be distributed between the actual plaintext in the html script, in assorted downloads linked to through the site, and in older versions of the same page, archived and accessible only through separate links.

    Let's assume, for a second, that not every host holding data indexes and curates the data they present. Much as it is today, in fact. You'll still be stuck needing to store and retain absolutely everything - or face the likely issue of loss.
    Sure, you can swing an axe and expect the majority of online information to be on the right side of the 80-20 divide...but that's what we currently have, with most of the data stored with the four Big Ones. I'll concur that wresting that data away from decidedly partial interests is useful but it's not a 100%, 80%, or even 60% solution. It's "better than nothing" and will remain at that state until we see a radical shift in how the average netizen handles and contributes to both their own storage and that of others.

    "...you don't have to plan to store multiple copies of YouTube for this to be viable for a huge number of independent sites."

    As the ContentID example shows, what you WILL end up with ARE multiple copies. Minor changes in a file, deliberate or not, produce a different hash. Every version of a file produces enough of a difference to not be recognized as a copy of an original by any algorithm currently in use.
    It's not as if there's something like a universal internet OS ensuring that, as is the case on an individual device, every new version of a file has a unique identifier set to discern which file is a copy of another one or not. Nor, I hold, would we WANT a system like that.

    "Nothing I'm talking about has anything to do with moderation."

    It's the same argument - and the same logic. The ability to properly plan for storing the whole of the internet, without ending up with multiple copies of files which are, essentially, the same content with very minor variations, requires curation at massive scale. The exact same reason why moderation at scale doesn't work well - because you can't program any algorithm precisely enough without turning it into an actual person - who then needs to be 100% objective, to boot. And we're short of actual objective people to do the curation.

    "No, the entire point is to decentralise storage so that doesn't happen."

    Meaning multiple redundancy - which means multiple copies of everything you want stored. That's how decentralization works.
    It's still the old Raid array problem where the requirement of having access to the data even if some hard drives crash or are unplugged, inevitably means you spend several times more storage on keeping redundancies.

    "What does that have to do with the subject here?"

    Me mangling my sentences. Should have been "to store data and have it available even when the origin is offline...". Mea Culpa.

    "Why would everything be constantly syncing, rather than periodic checks to ensure that sufficient copies are available? The latter wouldn't necessarily be more overhead than existing heartbeat and other checks that are already prevalent."

    Because every time any holder of a file alters anything in it - deliberately or by normal data corruption, it will generate a conflict in which version is to be considered correct and force an update - unless you want to introduce multiple file versions arising spontaneously in addition to the issue of needing copies to maintain redundancy.
    It's again a question of scale. Microsoft and other companies have spent god alone knows how much money and effort developing and pushing decentralized storage solutions onto corporate and private consumers. For private consumers this tends to work fine. It's a 1:1 storage solution or just, bluntly put, a virtual extra hard drive.

    For corporations, despite the overwhelming advantage of usually having a closed-ecology intranet, unique identifiers for every user, usually a known pre-defined platform and application setup, and with full administrative control over the laptops of the individual users, the solution is messy, to say the least, generating no end of maintenance and administration issues.

    And that's just trying to keep a 1:1 storage while not giving a rat's ass whether 1 or 10000 people retain copies of the same file. The obvious solution - running a shared drive - only works by assuming that every user either has restricted read-only access, or full control over read and write - at which point mayhem happens, even when the drive is shared only among a dozen people.

    This is not in the end an issue of technology. It's both a technical problem of trying to reconcile two fundamentally opposed logics (standardized control vs individual freedom), and a people problem (people set up, index and flag their data entirely according to their own preferences) that we won't solve easily by simply nerding harder.
    And every option that would make it just a little easier tends to be one we do not, under any circumstances, want.

    Namely putting all the end point under centralized control. This is the sort of solution techies usually present as a joke or impossibility - and which people like Bill Barr then run with.

    After seeing the various attempts at creating this wonderful decentralized environment - early filesharing clients, freenet, bittorrent, etc - and their respective attempts to accomplish full decentralization - what we've ended up with is currently what appears to work. It's not as if this is a new Holy Grail we just haven't pursued enough.

    So as I said, I still think it's incredibly positive that people keep making efforts. IPFS is a fine enabler and force multiplier for online freedom already as is.

    And someday in the future we may have an entirely different structure of the online environment which allows it to assist in building an internet where data "storage" is fully fluid and never tied to any individual location. But until that time- and at the minimum we need a few paradigm shifts in individually available storage and network technology and infrastructure before we are at the point where that becomes possible - it will remain an application mainly used by enthusiasts and the politically engaged.

    Doesn't make it worthless, even if it should turn out that today all it might be good for is to keep long-dead torrent links alive.

  58. PaulT (profile), 8 May 2020 @ 7:32am

    Re: Re: Re: Re: Re: Re: Re: Re: Re: Just like Bittorrent

    "Then you lose data as a result."

    Why would hashing data and using that to stop creating unnecessary duplication lead to a loss of data? I can understand concerns of versioning and bloat, but I don't see how this method of mitigation causes data loss.

    "but that's what we currently have, with most of the data stored with the four Big Ones"

    Except - and this is the entire point - the data is no longer under their control. Even if that's all that's achieved, it's still achieved one of the stated goals.

    "Meaning multiple redundancy - which means multiple copies of everything you want stored"

    Yes...? That is what we're discussing here - decentralised storage so that it's not dependent on a single node or provider.

    "As the ContentID example shows what you WILL end up with ARE going to be multiple copies."

    Yes, but you end up with far less unnecessary duplication than you would without controlling for that.

    "Because every time any holder of a file alters anything in it - deliberately or by normal data corruption, it will generate a conflict in which version is to be considered correct and force an update"

    ...and most files aren't changed with any kind of regularity after they are created, and most will be of relatively trivial size. Some websites will present different challenges, but most private webpages really aren't updated all that regularly after initial publication. Versioning would be something to be considered, but you're not talking about millions of copies of the same files, or at least you shouldn't be. Someone would have to do the maths to find the optimal number, but it should be low enough that incremental changes aren't going to cause huge numbers of updates.
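
    For a back-of-the-envelope version of that maths, in Python - the availability numbers are invented, and the assumption that nodes go offline independently is a simplification:

        # If each volunteer node is online with probability p, then k independent
        # replicas leave a file unreachable with probability (1 - p) ** k.
        # The probabilities below are made-up example figures.
        def replicas_needed(p_online: float, target: float) -> int:
            k = 1
            while 1 - (1 - p_online) ** k < target:
                k += 1
            return k

        print(replicas_needed(0.30, 0.999))   # flaky home nodes: 20 copies
        print(replicas_needed(0.90, 0.999))   # mostly-on nodes: 3 copies

    Even with pessimistic numbers, the count stays a long way from "millions of copies of the same files".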

    "The obvious solution - running a shared drive - only works by assuming that every user either has restricted read-only access, or full control over read and write - at which point mayhem happens, even when the drive is shared only among a dozen people."

    That has nothing to do with the system under discussion. Everything that will be stored is already on the public internet, so why would things like access restriction be required? Same with versioning - you need some control over which version is being distributed, but we're talking about distribution of the latest version, not archival backups.

    "a people problem (people set up, index and flag their data entirely according to their own preferences)"

    You again seem to be making a lot of fundamental assumptions that have nothing to do with what I'm talking about. I'm talking about distribution of files as they are published, which would require no more user interaction than indexing currently requires. I'm thinking along the lines of rsync - sure, you might need to make some changes at some point, but by and large, once the cron script is set up you shouldn't need to do anything to control it from the user end unless your requirements change.
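
    To sketch what I mean in Python rather than rsync - the paths and the "announce" step are placeholders, not any particular tool's API:

        import hashlib
        import json
        import pathlib

        # Scheduled job: walk the already-published files, hash anything new or
        # changed, and hand the content hashes off to whatever does distribution.
        PUBLISHED = pathlib.Path("public_html")        # placeholder publish directory
        INDEX = pathlib.Path("published_hashes.json")  # placeholder state file

        def scan_and_announce():
            index = json.loads(INDEX.read_text()) if INDEX.exists() else {}
            for path in PUBLISHED.rglob("*"):
                if path.is_file():
                    digest = hashlib.sha256(path.read_bytes()).hexdigest()
                    if index.get(str(path)) != digest:
                        index[str(path)] = digest
                        print(f"announce {digest}  {path}")  # distribution layer goes here
            INDEX.write_text(json.dumps(index, indent=2))

        if __name__ == "__main__":
            scan_and_announce()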

    "Doesn't make it worthless, even if it should turn out that today all it might be good for is to keep long-dead torrent links alive."

    There's no fundamental difference between that and what we're discussing. It's all a matter of implementation, which is not a problem unless you keep assuming that it somehow instantly has to cover the entire internet on day one and requires vastly more management than any current file management system. The only reason torrents go dead is that people stop seeding, which shouldn't be a problem with a system that automates the availability of files. Plus, as I've mentioned before, the main reason torrents go dead is that people don't want to be caught seeding files they shouldn't be once they've leeched what they need; it's not as much of a problem with legal content unless it's severely outdated.


  59. identicon
    Rekrul, 8 May 2020 @ 2:07pm

    Re: Re: Re: Re: Just like Bittorrent

    "Obviously. But, an intelligently distributed system would mean that people dipping in and out would not affect things too much,"

    Only if the files they have are duplicated elsewhere. As I understand it, the whole system depends on people storing local copies of files. If enough people don't do that, then there aren't sufficient copies online to keep the files available. Unless of course it automatically makes a local copy of every file you access, every web site you look at, every video you view, etc. Which seems like a recipe for disaster when people start inadvertently filling up their hard drives just from daily usage, or they happen to stumble across something illegal and it gets saved to their system and then made available for the rest of the world to access.

    "while most people have a large amount of storage they're not using at all, which only increases with each system they buy."

    Except that when most people buy a new system, they just chuck the old one, drives included. Or they take a sledgehammer to them, since CSI and NCIS have taught them that data can NEVER be erased from a hard drive. All of the data that they will have collected just goes in the trash and they start from scratch. Sure, they might back up photos and personal files, but most people don't even know how to do that.

    "The question is really how the data to be distributed is prioritised, and the type of data to be distributed. Obviously, there's a different setup if you're talking about video vs textual data, for example."

    There are text files and copies of web sites that I saved on previous systems that are now sitting on external drives or burned to disc. Also, for this system to work, the data has to remain unchanged. What happens when someone edits a file to correct typos or add notes? What if they take a bunch of files on a similar topic and Zip them together for ease of organization? What if they convert them to PDF for printing? I recently discovered that Staples can't print saved web pages, everything has to be PDF, Word or RTF.


  60. icon
    PaulT (profile), 8 May 2020 @ 10:43pm

    Re: Re: Re: Re: Re: Just like Bittorrent

    "Only if the files they have are duplicated elsewhere. "

    Which would be the entire point of such a distributed system, yes. Why would a system that's intended to decentralise the web have single points of failure?

    "If enough people don't do that, then there aren't sufficient copies online to keep the files available"

    Yes, which is why there would be an impetus both to reduce the amount of storage required and to encourage sharing.

    "Which seems like a recipe for disaster when people start inadvertently filling up their hard drives just from daily usage, or they happen to stumble across something illegal and it gets saved to their system and then made available for the rest of the world to access."

    It's only a recipe for disaster if it's badly managed and people start getting prosecuted for what's essentially a large cache for sites they've never visited. Even so, I'd suggest the first form of the system would concentrate on things like text and smaller images and expand from there once the proof of concept has been done.

    "Except that when most people buy a new system, they just chuck the old one, drives included."

    Yes, they do... and they buy them so infrequently that the new system has a massive amount of storage compared to the old one. Apart from a slight bump as people move from spinning drives to SSDs, capacities are going up constantly and exponentially.

    "All of the data that they will have collected just goes in the trash and they start from scratch"

    Yes... and that doesn't matter to a truly distributed system since you just create another copy from another source. It would be like replacing a drive in a RAID array. The data remains available even as you have to destroy one of the original drives. You're only in trouble if every node goes away, but a properly designed system should make that vanishingly unlikely to happen.
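
    A toy version of that RAID analogy in Python, with made-up node names and an arbitrary replication factor of three:

        REPLICAS = 3
        nodes = {"node-a": {"cid-1"}, "node-b": {"cid-1"},
                 "node-c": {"cid-1"}, "node-d": set()}

        def retire(node: str):
            # A node (and its drive) goes in the trash; re-copy what it held
            # from the surviving holders until the replica count is restored.
            lost = nodes.pop(node)
            for cid in lost:
                holders = [n for n, held in nodes.items() if cid in held]
                spares = [n for n, held in nodes.items() if cid not in held]
                while holders and spares and len(holders) < REPLICAS:
                    target = spares.pop()
                    nodes[target].add(cid)
                    holders.append(target)

        retire("node-c")   # someone junks their old PC
        print(nodes)       # cid-1 is still held by three nodes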

    "What if they take a bunch of files on a similar topic and Zip them together for ease of organization? "

    Are those files served on the public web? If not, they're irrelevant to the aim of what we're talking about. If so, they would fit within the realm of what we're discussing. I'm sure it could be accounted for, however. There are standard Linux tools that allow you to view the contents of compressed files without opening them, so you could conceivably omit these files if the originals are available elsewhere. Password-protected files are a different story, but at that point it's bad management by the site owner, and there's no accounting for that no matter who's serving the data.
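
    For what it's worth, that kind of peek doesn't even need special tools - a few lines of Python's standard library do the same thing ("bundle.zip" is just an example name):

        import zipfile

        # List an archive's contents without extracting it, so byte-identical
        # originals that are already available elsewhere could simply be skipped.
        with zipfile.ZipFile("bundle.zip") as archive:   # placeholder archive name
            for info in archive.infolist():
                print(info.filename, info.file_size)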

    "I recently discovered that Staples can't print saved web pages, everything has to be PDF, Word or RTF."

    Probably because web pages don't always format naturally to the printed page and waste a lot of paper if printed as-is. But, again, that's irrelevant. If you want to print it, you create the PDF. The fact that the page you're looking at was created using a different type of storage should be irrelevant to your ability to do things in your browser.


  61. identicon
    Hankon, 10 May 2020 @ 3:41am

    if you're looking for cybersecurity, then join Utopia. This is a reliable and secure browser where everything remains secret!


  62. icon
    Scary Devil Monastery (profile), 11 May 2020 @ 6:42am

    Re: Re: Re: Re: Re: Re: Re: Re: Re: Re: Just like Bittorrent

    "Why would hashing data and using that to stop creating unnecessary duplication lead to a loss of data?"

    Umm... let's go back a bit and try this again: hashing the data only means you've got a checksum which matches one given file. It will identify that file and that file only.

    I've got a file. It's a media file called "Ten Hours of Paint Drying" - or THOPD. It's an old DivX file at 20 GB, 4k resolution. It provides the checksum "X".
    My friend streamrips this file but thinks 720p resolution is good enough, saving a lot of space. The new checksum is "x".
    I see this happening and think that, OK, he's got a point, the file's a bit big, so I recode it in 2k resolution and in H.264. New checksum: "Xx".

    Parallel to this development, some other dude films the very same spot of drying paint and encodes it in 4k, mp4. Checksum of that one: "Y". Rinse and repeat.

    Now you have about fifteen different hashes which ALL refer to the exact same content, in various types of encoding and detail. This does not save space. It only means IPFS runs into the exact same phenomenon as can be observed on normal torrent index pages, where you search for a specific file by name and come up with 58 versions, each with a unique torrent pointing its way.

    So you invariably end up with the problem of either somehow having to curate the hashed content, OR preserving ALL of it. Any measure aimed at curation will fail to some degree, and so there'll be loss of data. Preserving all of it means storing effectively identical copies of more or less everything - even something as simple as a text file, which can be encoded as .cbr, .pdf, .odt, etc., ad nauseam.
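
    To spell the hashing point out in Python - the byte strings are stand-ins for real bitstreams:

        import hashlib

        # Content addressing only deduplicates byte-identical data; re-encode the
        # "same" video and the store sees brand-new content with its own address.
        divx_4k = b"...DivX 4k bitstream of ten hours of paint drying..."   # toy stand-in
        h264_2k = b"...H.264 2k bitstream of the same ten hours..."         # toy stand-in

        print(hashlib.sha256(divx_4k).hexdigest()[:16])   # one address
        print(hashlib.sha256(h264_2k).hexdigest()[:16])   # a completely unrelated one

    Nothing in the two hashes tells the network these are the same work, so both get stored and both get replicated.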

    "Except - and this is the entire point - the data is no longer under their control. "

    Isn't it? Who owns the storage space? Because one thing I can say right now - it's not "decentralized" if all of it is still in the same drive racks in some Redmond server park.
    And the storage need isn't covered by private individuals either.

    "Yes, but you end up with far less unnecessary duplication than you would without controlling for that."

    Torrent indexers use the exact same method and they still end up with a dozen pages of checksum hashes all pointing to separately encoded versions of one and the same work.

    "Everything that will be stored is already on the public internet, why would things like access restriction be required?"

    Case example - storage of file X. Multiple copies are created to maintain redundancy. The origin data repository where X resides drops out of the web. For reasons deliberate or not, several of those copies are altered enough that they no longer match the original checksum. So do you:

    • Register the altered versions as new uniques?
    • Try to synchronize the changes across all stored copies?
    • Restore the altered versions to whatever is stored under the original checksum?

    Any of those options opens a whole new can of worms and headaches. The simple solution is that anything online is to be stored in perpetuity and that altered data is to be considered new data. But you can imagine, I hope, how THAT affects storage needs.
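
    Which is, to be fair, the one part that stays mechanical - a retrieved copy either verifies against the hash it was requested under or it doesn't, as in this sketch (toy bytes, obviously):

        import hashlib

        def fetch_ok(requested_hash: str, received: bytes) -> bool:
            # A retrieved copy either matches the address it was asked for or it doesn't.
            return hashlib.sha256(received).hexdigest() == requested_hash

        page = b"original page"                    # toy content
        addr = hashlib.sha256(page).hexdigest()

        print(fetch_ok(addr, page))                # True  - intact copy
        print(fetch_ok(addr, b"original page!"))   # False - altered or corrupted copy

    The altered copy simply counts as different content - which is exactly the "altered data is new data" outcome, with the storage bill that implies.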

    "Same with versioning - you need some control over which version is being distributed, but we're talking about distribution of the latest version, not archival backups."

    Pretty sure this debate has derailed completely from the OP by now, since that illustrated the use of IPFS primarily for archiving purposes. How would you go about curating the latest version when you do not, in fact, have any common denominator telling you that checksums X and x are one and the same, because it's really just a 4k DivX repackaged as a 2K mkv?

    The BitTorrent approach retains simplicity and brute-forces the issue by simply treating every checksum as a unique identifier. Hence you can end up with 58 "versions" of a file which, decoded, all turn out to be identical, but which until then are considered 58 unique files by every protocol involved.

    "I'm talking about distribution of files as they are published..."

    And that is where the "people problem" shows up, which leads us right back to my initial assertion:

    "turning every netizen's PC into a hard drive node for a RAID 6+ array that still means you need enough storage space to store the entirety of the internet multiple times in order for that redundancy to exist."

    To which you replied:

    "If there's zero planning put into what is stored and how, sure."

    I think we're on the same track, then, because distributing files as they are published does mean we get the "zero planning" bit - no way in hell do we find a way to curate which data set is unique and which is not. Leading to versioning across multiple users, all multiplied by the necessary redundancy.

    And my main doubt here is whether we can manage to make that work at scale when the only available actual storage we have is voluntary donations by individual users.

    "...assuming that it somehow instantly has to cover the entire internet on day one and requires vastly more management than any current file management system."

    Most current decentralized file management systems are maintenance-heavy even when they only cover a single corporation in their network, and DO rely on using intermediate server storage solutions. I'm not sure "current" systems are good ballpark comparison points, honestly.

    For the solution discussed here we are cutting away all of the maintenance, the intermediate storage solutions, and a lot of the infrastructure enabling peer-to-peer recognition when it comes to synchronization. DHTs are admittedly a proof of concept that maintenance and synchronization can become non-issues.

    But proper indexing and storage? That's going to be tricky. We don't have the divination magic required for the former to become something even remotely searchable, any better than Tribler's attempts at rendering DHT the backbone of indexing...
    ...and the storage issue requires, imho, a radical shift in general netizen purchasing behavior. Not everyone wants to, or can, dedicate a separate hard drive to doing their part in archiving the internet.


  63. identicon
    Lilly, 9 Jun 2020 @ 4:12am

    Yes! I absolutely agree with that. I was surprised when I found out that there is an Internet that is different from ours. https://utopia.fans/blog/centralized-vs-decentralized-network-whats-better-for-your-security Since then, I want to know more about decentralized Internet. This is a very interesting opportunity to increase the security of your data and to ensure complete anonymity on the Internet.


  64. icon
    Ehud Gavron (profile), 9 Jun 2020 @ 7:05am

    Utopia spam

    The spam is just getting worse.

    There is no such thing as a "centralized Internet", which necessarily means no such thing as a "decentralized Internet".

    This is not "a very interesting opportunity to increase the security of your data" or anything.

    There is nothing that will "ensure complete anonymity on the Internet".

    E


  65. identicon
    Anonymous Coward, 9 Jun 2020 @ 7:47am

    Re: Utopia spam

    Of course it's getting worse if you're specifically trawling for month-old articles to whine about.

    "There is no such thing as a "centralized Internet""

    It's rare, but wait till there's a region outage on AWS or a major Microsoft service outage and see how many people complain about all their services being down. That's not how it's intended, but in practice it's more centralized than you'd wish to believe.


  66. icon
    Ehud Gavron (profile), 9 Jun 2020 @ 11:24am

    AWS or Microsoft is not "the internet"

    The Internet is not AWS and it's not Google and it's not Microsoft and it's not DNS and it's not any one anything. It's a network of networks, and there's nothing centralized about it. Pretending otherwise has nothing to do with "what [I'd] wish to believe" or with reality.

    KTB


  67. icon
    PaulT (profile), 9 Jun 2020 @ 11:19pm

    Re: AWS or Microsoft is not "the internet"

    "The Internet is not AWS"

    No, but a very large amount of its content depends on AWS.

    "It's a network of networks, and there's nothing centralized about it"

    Wrong. While the whole thing is not, and should not be, centralised, there are large numbers of centralised networks and bottlenecks. The amount of stuff hosted on AWS does not mean that it's completely centralised, of course, but when a single provider outage can affect thousands, if not millions, of websites, then it's not truly decentralised either.


  68. icon
    Ehud Gavron (profile), 10 Jun 2020 @ 2:38am

    Me about the Internet: "It's a network of networks, and there's nothing centralized about it"

    Idiot: Wrong

    Idiot continues talking about AWS.

    As I said before, the Internet is an interconnection of networks. AWS, Google Cloud Services, and Microsoft Azure ARE NOT THE INTERNET and NEVER WILL BE. They are services ON the Internet.

    The Internet is not centralized. It never was, even in the DARPA days, and hopefully never will be.

    Your desire to pretend that YOUR reliance on some service requires that service to be up is something for you to deal with through your choice of vendor. It has nothing to do with the Internet, isn't centralization, and for practical purposes has nothing to do with one company's geolocated servers.

    Just admit it. The Internet isn't centralized. The services you pretend ARE the Internet (e.g. "AWS, Microsoft") are neither the Internet nor centralized... but they may be under the control of one global company (Amazon, Google, Microsoft, Facebook). Nobody but a European fool thinks "that's the Internet".

    To sum up: As I said before, the Internet is not centralized. I'm done attempting to discuss with those who don't care, don't listen, and continue to repeat their claim. Arguing with a pig is stupid -- it wastes your time and annoys the pig.

    Good morning, piggie.

    E


  69. icon
    PaulT (profile), 10 Jun 2020 @ 2:52am

    Re:

    Well, calling me names doesn't help discuss the actual argument I was making, but it does confirm that you are an idiot who completely missed the subject of the conversation you dived into a month after it started.

    "The Internet isn't centralized"

    As a whole, no. However, a huge amount of it is now dependent on a handful of corporations. This is a problem. Are you denying this, or are you just whining about semantics?

    "Nobody but a European fool thinks "that's the Internet"."

    Now, you're just being a bigoted twat as well. Nationality has nothing to do with the subject being discussed, except you chose to whine about old news to make yourself feel superior to the words actually being said.

    "Arguing with a pig is stupid -- it wastes your time and annoys the pig."

    I refer to the analogy about playing chess with a pigeon. It doesn't matter how clever or insightful the moves are, the pigeon will just knock over the pieces and think it's won.

    Enjoy your fake victory. I'll be here to discuss the old conversation you dived into if you can be bothered not to be a dick while doing so. I suggest starting with the actual points being made, not the ones you imagined were being made.


  70. icon
    Ehud Gavron (profile), 10 Jun 2020 @ 3:29am

    Me about the Internet: "It's a network of networks, and there's nothing centralized about it"

    Idiot: Wrong

    Idiot continues talking about AWS.

    Idiot adds:

    Well, calling me names doesn't help discuss the actual argument I was making

    Right. You were making an idiotic argument that AWS/Google/Microsoft/Facebook is "the centralized Internet".

    It's idiotic, indefensible, and you should stop there and repent.

    Instead you should probably look up "it's" vs "its" and "dived" vs "dove" and reflect that when you're trying to appear intellectually superior (oh, SORRY, "superioUr") you should at least appear literate.

    Best small-side-of-the-pond-wishes to you,

    E
    BTW, the Bible calls them doves... the symbols of peace, ass.


  71. icon
    PaulT (profile), 10 Jun 2020 @ 3:35am

    Re:

    "Right. You were making an idiotic argument that AWS/Google/Microsoft/Facebook is "the centralized Internet"."

    That was not the argument that I, or the article that got your panties in such a twist, was making.

    Sorry you decided to be a twat instead of addressing the actual one.


  72. icon
    PaulT (profile), 10 Jun 2020 @ 3:59am

    Re:

    Just as an example, btw:

    https://www.theregister.com/2017/03/01/aws_s3_outage/

    Now, look at this paragraph:

    This is by no means an exhaustive list of things that fell over or were wobbly today, due to the S3 downtime, but here's a start: Docker's Registry Hub, Trello, Travis CI, GitHub and GitLab, Quora, Medium, Signal, Slack, Imgur, Twitch.tv, Razer, heaps of publications that stored images and other media in S3, Adobe's cloud, Zendesk, Heroku, Coursera, Bitbucket, Autodesk's cloud, Twilio, Mailchimp, Citrix, Expedia, Flipboard, and Yahoo! Mail (which you probably shouldn't be using anyway). Readers also reported that Zoom.us and some Salesforce.com services were having problems, as were Xero, SiriusXM, and Strava.

    Note how many things on that list are not merely websites, but tools, platforms and communication infrastructure used by thousands, if not millions of other companies. It's like a who's who of every service used by startups, as well as many other larger entities.

    Call it what you want, but the web in the modern sense sure as fuck is not decentralised, even if you think that centralised is the wrong way to refer to it.


  73. icon
    Ehud Gavron (profile), 10 Jun 2020 @ 4:30am

    Re: Re:

    You really should reach up there and unbunch your panties. Oh, sorry, knickers.

    E


  74. icon
    PaulT (profile), 10 Jun 2020 @ 4:37am

    Re: Re: Re:

    So, no response to the very real issue of thousands, if not millions of sites across the web being impacted by an outage at a single provider who they might not directly use, all because you're fixated on the use of a single word?

    I am glad a lot of those actually in charge of the way the web works aren't as dumb or petty as you, but that doesn't change the fundamental issue being discussed.


  75. icon
    Ehud Gavron (profile), 10 Jun 2020 @ 5:21am

    Paul once again opened his talky-talky without engaging his thinky-thinky and said:

    I am glad a lot of those actually in charge of the way the web works aren't as dumb or petty as you, but that doesn't change the fundamental issue being discussed.

    No, the "Internet" is not "the way the web works." The Internet is not centralized. Neither is "the web."

    Call me dumb -- that isn't the word you mean. Dumb means incapable of speech.

    Call me petty -- that's just an ad hominem.

    It doesn't change the fact that I've founded, participated in, and sold Internet companies since before you even got on the net.

    But hey, knock yourself out.

    Good night,

    E
    P.S. I note you responded to nothing I wrote, provided no links, didn't fix your grammar, etc. That's telling! Don't call anyone "dumb" again. It's wrong and it's insulting to people who can't talk.


  76. icon
    PaulT (profile), 11 Jun 2020 @ 12:19am

    Re:

    "I note you responded to nothing I wrote"

    Says the guy who ignored the very clear explanation of why you were wrong, instead responding with whining about the words used.

    My fault for trying to honestly debate with a guy trawling for old articles he can use to whine about a word used, I guess.

    "Dumb means incapable of speech"

    Words have multiple definitions. You should read a dictionary some time before using semantic arguments about words as your only issue.


  77. icon
    PaulT (profile), 11 Jun 2020 @ 12:20am

    Re:

    dumb[ duhm ]
    adjective, dumb·er, dumb·est.
    lacking intelligence or good judgment; stupid; dull-witted.

    Yeah, it fits.


  78. icon
    Ehud Gavron (profile), 11 Jun 2020 @ 2:49am

    lacking intelligence or good judgment; stupid; dull-witted.

    Yeah, it fits.

    I suppose when you go to the Paul T Dictionary of Stupid Definitions you can make it fit. Glad you found yourself.

    In the real world the definition is
    adjective
    adjective: dumb; comparative adjective: dumber; superlative adjective: dumbest

    1. temporarily unable or unwilling to speak. "they stood dumb while the attacker poured out a stream of abuse"

    https://www.google.com/search?q=definition+of+dumb

    You're an argumentative person who insists on trying to be right, even when asked many times to provide sources. You don't provide sources because there aren't any.

    Stop making up this stupid silly nonsense and go back to your ditch-digging. You're doing a wonderful job here so I'm guessing that's what you do for a living.

    Also next time you want to play footloose with a dictionary, use a real one, not whatever you made up from EuroDictionary™.

    E


  79. icon
    PaulT (profile), 11 Jun 2020 @ 5:16am

    Re:

    "1. temporarily unable or unwilling to speak. "they stood dumb while the attacker poured out a stream of abuse""

    Oh, for fuck's sake...

    You see that 1. next to the definition you picked? That indicates there are multiple definitions. As in, you are lying if you pretend that only one exists.

    Are you actually this stupid?

    No wonder you lost your shit over the use of a single word in a month-old thread when even the use of a dictionary is beyond you.


  80. identicon
    Anonymous Coward, 11 Jun 2020 @ 4:07pm

    Re: Re:

    Like I said.

    The Internet isn't centralized. I provided links.

    Amazon/Google/Facebook are a small part of stuff that uses the Internet. They're not centralized either.

    You're incapable of defining "dumb" to be "stupid" without making it up.

    Give up and go back to digging ditches. This one is getting pretty deep.

    Like I explained to you, Junior, you know nothing about centralization of the Internet. There are two key rules here:

    • He who asserts must prove. I've provided links. You - none.
    • He who opines without knowledge is an empty kettle making a lot of noise. That's you.

    Have a great night!

    E


  81. icon
    flyinginn (profile), 17 Nov 2020 @ 6:01pm

    Re: Re: Dewey Decimal System

    The hash ensures uniqueness, as does the analogous legacy library index. But the misfile is physical. If the hash-identified copy is no longer online, it's gone. So redundancy arises, and with it the need for oceanic tides of data synchronisation, which in turn demand rate-limiting data 'bulkheads'.


