A Tool to Build Pandora's Box

October 8th, 2011

Despite the benefits of openness and the difficulties inherent in restraining the passage of information, there will always be people who build their business models on secrecy and restricting access to information. The existential risk these folk introduce is that the information they curate behind their paywalls will vanish when their organizations do - even today, dying organizations usually don't see fit to free the information they produced, but rather let it evaporate. For larger organizations there are legal costs inherent in doing the right thing, and for smaller organizations there are real costs in time and effort.

So information dies, and this is a grand waste. The threat is especially noticeable in the scientific field, which hosts a grand and ongoing battle over closed journals, but just as much information in the form of code, research, documentation, and even simple human communications is lost in the everyday failures of startups and larger companies.

But this could all be avoided, or at the very least minimized, in the same way that we collaborate to minimize the loss of openly published information. Consider this scenario:

  • An RFC is put forward for a standard protocol for identifying, encrypting, and publishing data. It would be a very simple thing: the end result is a block of encrypted data or an archive of blocks of encrypted data that is then made available through something perhaps akin to an RSS feed, perhaps akin to the Tor network. The data is associated with some form of URL, perhaps negotiated in a distributed way. Encryption keys are long, randomly generated in the software internals, and discarded after one use - even the publisher doesn't know what they are, and each piece of content is encrypted with a different key.
  • Much like RSS, this functionality becomes broadly incorporated into tools for everything from web publishing to editing to email. Organizations and individuals can choose to publish as much or as little as they desire of the information they generate into encryption feeds.
  • The existence of encryption feeds allows interested archiving groups to amass Pandora's Boxes of concealed information - which are in effect backup copies of everything that is published into the feeds, but only accessible in a practical way to people in the future using more advanced technology than is available now.
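The publishing step in the scenario above can be sketched in a few lines. This is only an illustration under stated assumptions, not a proposed standard: the names `publish`, `FeedEntry`, and `keystream_xor`, the `example://` URL, and the idea of identifying a block by the hash of its ciphertext are all hypothetical. The SHAKE-256 keystream is a standard-library stand-in for whatever vetted cipher a real protocol would specify.

```python
import hashlib
import secrets
from dataclasses import dataclass


def keystream_xor(data: bytes, key: bytes) -> bytes:
    """XOR data with a SHAKE-256 keystream derived from key.

    Illustrative stand-in only; a real protocol would name a vetted cipher.
    Applying the function twice with the same key returns the original data.
    """
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


@dataclass(frozen=True)
class FeedEntry:
    """One item in a hypothetical encryption feed."""
    url: str           # identifier associated with the content
    block_id: str      # SHA-256 of the ciphertext, so archives can deduplicate
    ciphertext: bytes  # the encrypted block itself


def publish(url: str, content: bytes) -> FeedEntry:
    """Encrypt content under a fresh random key and return the feed entry.

    The key is never returned or stored, so even the publisher cannot
    decrypt the entry afterwards.
    """
    key = secrets.token_bytes(32)  # new 256-bit key for every piece of content
    ciphertext = keystream_xor(content, key)
    # Drops only the local reference; Python gives no guarantee that the
    # key bytes are wiped from memory.
    del key
    block_id = hashlib.sha256(ciphertext).hexdigest()
    return FeedEntry(url=url, block_id=block_id, ciphertext=ciphertext)


if __name__ == "__main__":
    entry = publish("example://feed/item-1", b"quarterly research notes")
    print(entry.url, entry.block_id, len(entry.ciphertext))
```

Because every call draws a new key, publishing the same content twice yields two unrelated ciphertexts, which matches the requirement that each piece of content is encrypted with a different key. An archiving group only ever needs the `FeedEntry` values; nothing in them helps with decryption today.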

I'm sure you can think of half a hundred reasons as to why there would be archiving groups building Pandora's Box given the presence of encryption feeds: everything from Google's rationale for their caches to individual packrats or interested non-profits who want to help the greater goal of retaining information over the long term. Equally, I'm sure you can think of half a hundred reasons why there might evolve a social pressure for publishers to use the encryption tools: a point at which failing to set up encryption feeds becomes gauche and a sign of backwardness, and the cost of feeding Pandora's Box is the same as the cost of an RSS feed, which is to say largely negligible if you're already running an IT infrastructure.

The point of this exercise is to think of a way in which closed data warehouses and information-generating concerns can publish their content for posterity in an ongoing way at the same time as they operate paywalls and restrict access in the present. At this stage of history, publishing into Pandora's Box with long encryption keys will likely give a two-decade window before decryption via quantum computing is a going concern - and possibly longer than that if encryption feeds are both ubiquitous and anonymous in the same sense as use of the Tor network is anonymous. If Pandora's Box is exceedingly large, then finding any specific piece of data as a practical concern will require further evolution in the power of computing beyond merely efficient decryption.

Building Pandora's Box is one of those interesting ideals which, much like irrigating the Sahara, is well within our present capabilities and will provide benefits to our descendants. It sits on the verge of something that might happen, might have already happened had things worked out differently, but seems to be a long way from where we stand now.

Still, the destruction of data for the sake of business models is something that we could do more to avoid. There are technological solutions to that problem, ones that are so cheap to implement for the data-hoarders and data-discarders that the cost could be balanced in their eyes against the goodwill they would generate by taking part. These projects are therefore worth considering.