The Data Universe™ is a large-scale, distributed, content-addressable, secure data storage and resource sharing system.

"The Universe abhors
Free Disk Space,
Unused Bandwidth, and
Idle Processors."

-- Brian McMillin

"...All those moments will be lost in time, like tears in rain..."

-- Roy Batty, Blade Runner

"The only files you don't have dozens of copies of are the important ones."

-- Brian McMillin

"The adage 'If it ain't broke, don't fix it.' is a recipe for mediocrity."

-- Brian McMillin

"Anyone who was concerned about Y2K must have been operating under the delusion that this high-tech stuff was actually working in the first place."

-- Brian McMillin

What does all that mean?

Large-Scale: The conceptual memory address space is greater than 128 bits. If every person on Earth shot uncompressed video for 1000 years, it would not use a meaningful fraction of this storage capacity.
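
A rough sanity check of that claim can be sketched in a few lines. Every constant here is an illustrative assumption rather than a figure from the design: a world population of 8 billion, uncompressed 1080p video at 30 frames per second with 3 bytes per pixel, and one address per byte in an address space of exactly 128 bits.

```python
# Back-of-the-envelope scale check. All constants are assumptions.
ADDRESS_BITS = 128                        # the text says "greater than 128 bits"
PEOPLE = 8_000_000_000                    # assumed world population
BYTES_PER_SECOND = 1920 * 1080 * 3 * 30   # assumed uncompressed 1080p, 30 fps
SECONDS = 1000 * 365.25 * 24 * 3600       # 1000 years


def fraction_used(address_bits: int = ADDRESS_BITS) -> float:
    """Fraction of a byte-addressed space filled by the assumed video load."""
    total_bytes = PEOPLE * BYTES_PER_SECOND * SECONDS
    return total_bytes / 2 ** address_bits


if __name__ == "__main__":
    print(f"128-bit space used: {fraction_used():.2e}")
    print(f"64-bit space used:  {fraction_used(64):.2e}")
```

Under these assumptions the 128-bit fraction comes out on the order of one part in ten billion, while the same load would overflow a 64-bit space many times over.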

Distributed: Every computer on the Internet is welcome to share resources in the Data Universe, and would benefit from doing so.

Content-Addressable: All data is broken into blocks of manageable size, each of which is given a unique identifier based on its content. This identifier allows blocks to be copied and retrieved as needed throughout the network.

Secure Data Storage (1): All data blocks automatically replicate to other computers, eliminating the need for specific backups.
Secure Data Storage (2): All data blocks may be retrieved by any computer in the network, eliminating the possibility that a computer or network failure would leave the data inaccessible.
Corollary: "The [network will interpret] censorship as damage and route around it." -- John Gilmore
Secure Data Storage (3): All data blocks may optionally be encrypted for privacy.
Secure Data Storage (4): All data blocks are identified by their actual content, making it "computationally infeasible" to corrupt, revise or spoof data entered into the Universe. Subsequent versions of a file or document will have new, unique identifiers, and all versions will co-exist and be accessible within the Universe.
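
Content-based identification and the integrity check it makes possible can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual implementation: the text names neither a hash function nor a block size, so SHA-256 and 64 KiB are stand-ins, and `BlockStore` is a hypothetical in-memory substitute for one computer's share of the Universe.

```python
import hashlib

BLOCK_SIZE = 64 * 1024  # hypothetical block size; the text does not specify one


def block_id(block: bytes) -> str:
    """Identifier derived purely from content (SHA-256 is an assumption)."""
    return hashlib.sha256(block).hexdigest()


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list:
    """Cut data into consecutive blocks of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class BlockStore:
    """Toy in-memory stand-in for one computer's block storage."""

    def __init__(self):
        self._blocks = {}

    def put(self, block: bytes) -> str:
        ident = block_id(block)
        self._blocks[ident] = block
        return ident

    def get(self, ident: str) -> bytes:
        block = self._blocks[ident]
        # Anyone can recompute the identifier, so tampering is detectable.
        if block_id(block) != ident:
            raise ValueError("block content does not match its identifier")
        return block


if __name__ == "__main__":
    store = BlockStore()
    first = store.put(b"report, version 1")
    second = store.put(b"report, version 2")
    print(first != second)   # a revision gets a new identifier
    print(store.get(first))  # the earlier version is still retrievable
```

Because the identifier is recomputed on retrieval, a block that has been altered in storage or in transit fails the check rather than being silently accepted.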

Resource Sharing: Shared resources include:
(1) Disk space,
(2) Processing Distributed Queries,
(3) General Computational Capability, and
(4) Network communication bandwidth.

Occasionally Asked Questions

Is this just a concept? Yes, but... I do have a preliminary implementation that shows how most things should work. There are lots of opportunities to model performance as the system scales up and to implement practical protocols for the real world.

How can anybody make any money with this? The Data Universe is an enabling technology, much like the Internet itself or the World Wide Web. Companies that design specialized hardware, high-performance software, improved user interfaces, or optimal retrieval strategies add value that customers will pay for.

This sounds a lot like a Peer-To-Peer Network? Yes. It is truly peer-to-peer: it has no central servers, as BitTorrent does, and no master computers, as Napster, Kazaa, and WinMX do. It can be used to share pictures, music, video, individual files or copies of entire CDs or DVDs. There are major differences, though. All files are broken into convenient Blocks containing data or file descriptions. These blocks are scattered redundantly across the network, allowing universal access and making total deletion virtually impossible. Annotations may be added at any time in the form of new file descriptions for existing files. Pieces of a file can come from many hosts, and the complete file need not reside on any one host.
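
The idea that a file is a description block pointing at data blocks held by different hosts can be sketched as below. Everything named here is hypothetical: hosts are plain dictionaries, the description is a JSON list of identifiers, SHA-256 is an assumed hash, and the round-robin placement stores each block only once, so the redundancy described above is left out for brevity.

```python
import hashlib
import json


def block_id(block: bytes) -> str:
    """Content-derived identifier (SHA-256 is an assumption)."""
    return hashlib.sha256(block).hexdigest()


def publish(data: bytes, hosts: list, block_size: int = 4) -> str:
    """Scatter data blocks across hosts and store a description block.

    Returns the identifier of the description block.
    """
    ids = []
    for n, start in enumerate(range(0, len(data), block_size)):
        block = data[start:start + block_size]
        ident = block_id(block)
        hosts[n % len(hosts)][ident] = block  # round-robin placement
        ids.append(ident)
    description = json.dumps({"blocks": ids}).encode()
    desc_id = block_id(description)
    hosts[len(ids) % len(hosts)][desc_id] = description
    return desc_id


def fetch_block(ident: str, hosts: list) -> bytes:
    """Ask each host in turn; accept the first copy that matches its identifier."""
    for host in hosts:
        block = host.get(ident)
        if block is not None and block_id(block) == ident:
            return block
    raise KeyError(ident)


def retrieve(desc_id: str, hosts: list) -> bytes:
    """Fetch the description, then reassemble the file from its blocks."""
    description = json.loads(fetch_block(desc_id, hosts))
    return b"".join(fetch_block(i, hosts) for i in description["blocks"])


if __name__ == "__main__":
    hosts = [{}, {}, {}]
    desc = publish(b"hello data universe", hosts)
    print(retrieve(desc, hosts))
    print([len(h) for h in hosts])  # no single host holds every block
```

An annotation, in this picture, would be one more description block that lists the same data-block identifiers alongside new metadata; the data blocks themselves are untouched.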

Won't this tend to fill all available disk space? Yes. Unused disk space contributes nothing but cost. Using the disk space automatically provides redundancy, improves data availability, reduces access latency, prevents "hotspot" bottlenecks and eliminates censorship. Need to add new data blocks? Just write them to disk. Any blocks that get overwritten can be retrieved from elsewhere in the Universe.

Is this Open Source? You betcha! The goal is to have published, peer-reviewed standards and open-source reference implementations. I don't see how you could develop reliable, trustworthy software otherwise.

Could this be used as the File System on a single computer? Yes. Making the hard drive into a Content Addressable memory is a goal that the industry is currently stumbling toward. The fact that this approach is inherently "Self Journaling" is an added bonus and means that previous versions of software, databases or documents can be retrieved at any time.
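
One way to picture the "Self Journaling" property is a store in which saving never overwrites anything, so every earlier version stays reachable. The sketch below is an assumption-laden illustration: the `previous` link inside each version record, the JSON record format, and the SHA-256 identifiers are all hypothetical choices, not details taken from the design.

```python
import hashlib
import json
from typing import Optional

store = {}  # toy content-addressed disk: identifier -> block


def put(block: bytes) -> str:
    """Store a block under its content-derived identifier."""
    ident = hashlib.sha256(block).hexdigest()
    store[ident] = block  # identical content maps to the same key
    return ident


def save_version(content: bytes, previous: Optional[str] = None) -> str:
    """Store content plus a small record linking to the prior version."""
    record = {"content": put(content), "previous": previous}
    return put(json.dumps(record, sort_keys=True).encode())


def history(version_id: Optional[str]) -> list:
    """Walk back from a version to the first one, newest first."""
    versions = []
    while version_id is not None:
        record = json.loads(store[version_id])
        versions.append(store[record["content"]])
        version_id = record["previous"]
    return versions


if __name__ == "__main__":
    v1 = save_version(b"draft")
    v2 = save_version(b"final", previous=v1)
    print(history(v2))  # both versions remain retrievable
```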

Can this be used for Streaming Video? Yes. A video feed becomes a permanent archive that can be retrieved at any time. The network will tend to optimize the bandwidth usage of each connection. Copies of video segments are automatically shared, eliminating the concept of a single server as the source of the feed for all users. Essentially, an unlimited number of users may view a feed without degrading the performance of a "hot spot" in the network.

You're kidding. Right? I did not say that you could get an unlimited number of real-time streaming video viewers. Each viewer would be generating queries and retrieving data blocks (video segments) from other users that had already received them. This implies an exponential cascade of data available to be shared. The trick is to find these blocks in a timely manner. Streaming video will tend to "run behind" by some factor. (But you probably want to skip the commercials, anyway...)
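
The "exponential cascade" and the resulting lag can be estimated with a toy model. The assumptions are mine, not the design's: one initial holder of a video segment, every holder passing it to a fixed number of new viewers per round (`fanout`), and a fixed time per round (`seconds_per_round`). Real networks would be messier, and finding the blocks in time is the hard part this model ignores.

```python
def rounds_to_reach(viewers: int, fanout: int = 2) -> int:
    """Rounds until `viewers` peers hold a segment, starting from one holder.

    Each round, every current holder passes the segment to `fanout` new peers,
    so the number of holders is multiplied by (1 + fanout).
    """
    holders, rounds = 1, 0
    while holders < viewers:
        holders *= 1 + fanout
        rounds += 1
    return rounds


def lag_seconds(viewers: int, fanout: int = 2,
                seconds_per_round: float = 1.0) -> float:
    """How far the last viewer 'runs behind' under the toy model."""
    return rounds_to_reach(viewers, fanout) * seconds_per_round


if __name__ == "__main__":
    for n in (10, 1_000, 1_000_000):
        print(n, rounds_to_reach(n), lag_seconds(n))
```

The point of the model is only that the lag grows with the logarithm of the audience size rather than with the audience size itself.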

Getting all this to work seamlessly will earn extra credit for the interested individual.

