Long Term Archiving

Recently, my old friends at UHY Hacker Young made the news as they celebrated the closure of a 34-year-old Corporate Recovery case – there was only one person still at the firm who had been on the case when it started, and he is now the Managing Partner.

My immediate thought (which I admit says more about me than it does about anything else) was “I wonder how many different WP systems have been used on that file in its life?” You can envisage how the file starts with yellowing typewritten appointment documents, and moves through early impact printers with WordStar, WordPerfect, the first versions of Microsoft Word, and on to every iteration of Microsoft Office.

If that client file had been held in electronic files from the earliest days (just possible in 1975, I suppose) would they still be accessible?

One of the issues occupying the minds of Document Management professionals is the long-term viability of electronic documents. How can you be sure that the electronic file you store today will still be readable in ten or twenty years? Will your Office 2003 document be readable by a copy of Office 2023? I just checked – Word 2007 can still open WordPerfect 5 documents, but WordStar is not on the list, let alone things like IBM DisplayWrite, Samna Word, BusiPost, MultiMate, Symphony, or the other long-extinct systems that we all used in the 1980s.

There is a classic case study that is often referenced – The BBC Domesday Project.

In 1984, the BBC initiated an educational project to create an ‘Electronic Domesday Book’. Schools all over the country were invited to contribute to a collection of photos, articles and drawings from schoolchildren. This information was digitised and written to a 12″ laser disk that could be accessed via appropriate software on a BBC Microcomputer with a custom-built Philips laser-disk player – one of the first generation of consumer optical players. Nowadays we all just use Google Earth, but then it demonstrated a real breakthrough in what IT could do to collate data from multiple sources and make it navigable by anybody.

The project ran its course – everyone was very happy and life went on. The laser disk player, as is the way of things, was superseded by CD and DVD, and went out of production.

Some years later, it became clear that there was a real risk that the project would be lost to future generations because the analogue laser-disk format had fallen out of use, and it transpired that nobody (including Philips themselves) could actually locate a working example of the player. Every school had gradually replaced their BBC micros, chucked the players out, and moved on. Nobody had thought to preserve a player, because everyone assumed that there were lots of them floating about. There weren’t.

It took a real effort by enthusiasts to re-unite a working Domesday system. Further work was then needed to extract the information and transcode it into digital format so that it could be stored onto modern digital media at the National Archives.

For more details on the restoration project for the BBC Domesday system, read this…

My point is – today’s ubiquitous storage format can vanish surprisingly rapidly – one day the shops are full of cassette tapes, and suddenly, they aren’t. Minidisk? DAT tapes? VHS tapes? They’re all gone or going, and they’ve all been used at some time as computer archive media.

Where your business documents are concerned, the problem is the same – I know firms that still have collections of WordPerfect documents on their network, but no working copy of WordPerfect. Luckily Microsoft Office can still read those files, but that cannot be assured for the future.
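If you want to know how exposed you are, a first step is simply to count what formats are sitting on the file server. The sketch below is my own illustration, not a finished tool: it looks at the first few bytes of each file and matches them against a deliberately short list of well-known file signatures. The function names and the list itself are assumptions for the example, and anything not on the list is just reported as unknown.

```python
from collections import Counter
from pathlib import Path

# Leading "magic" bytes for a few document formats.
# This list is illustrative and far from complete.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\xffWPC", "WordPerfect (5.x or later)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 container (e.g. Word 97-2003 .doc)"),
    (b"PK\x03\x04", "ZIP container (e.g. .docx, .odt)"),
    (b"{\\rtf", "Rich Text Format"),
]


def identify(header: bytes) -> str:
    """Return a format name based on the first bytes of a file."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def inventory(root: str) -> Counter:
    """Count files under `root` by detected format."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            with path.open("rb") as fh:
                counts[identify(fh.read(8))] += 1
    return counts
```

Calling `inventory()` on a folder path returns a tally by format. A large "unknown" count, or a pile of files in a format you no longer have software for, is the cue to start converting while something can still read them.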

A while back, Adobe engaged in a project with the ISO to develop a file format (PDF/A) that would remain readable over very long periods. The file format is based on good old PDF, but it includes extra information (fonts, for example, must be embedded in the file itself) to ensure that future computer systems don’t need access to ANY external resources to make the files accessible. An increasing number of PDF products now support the PDF/A format, and many governments now mandate use of this format for their own filing. It’s still not very well-known, but support is growing, and I have started to gently encourage the use of PDF/A over ‘normal’ PDF wherever possible.
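How do you tell whether a given PDF at least claims to be PDF/A? The format records its conformance level in the file's XMP metadata, using properties named `pdfaid:part` and `pdfaid:conformance`. The following is a rough sketch that scans the raw bytes for those properties. It assumes the metadata is stored uncompressed and in a single-byte-compatible encoding, which will not hold for every file, and the function name is my own. It only detects the claim; confirming that a file really conforms needs a dedicated validator.

```python
import re
from typing import Optional

# The value may be written as an XML element (<pdfaid:part>1</pdfaid:part>)
# or as an attribute (pdfaid:part="1"), so both forms are matched.
_PART = re.compile(rb"pdfaid:part\s*(?:=\s*[\"']|>)\s*(\d)")
_CONF = re.compile(rb"pdfaid:conformance\s*(?:=\s*[\"']|>)\s*([A-Za-z])")


def claimed_pdfa_level(data: bytes) -> Optional[str]:
    """Return e.g. 'PDF/A-1b' if the bytes declare PDF/A, else None."""
    part = _PART.search(data)
    if part is None:
        return None
    level = part.group(1).decode("ascii")
    conf = _CONF.search(data)
    if conf is not None:
        level += conf.group(1).decode("ascii").lower()
    return "PDF/A-" + level
```

In use, you would read a file's bytes (for example with `Path(name).read_bytes()`) and pass them in. A result of `None` means no PDF/A declaration was found by this simple scan, not that the file is definitely an ordinary PDF.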

Accountants only have to worry about keeping stuff for a few decades (unless you’re Hacker Young!) but records in the nuclear industry have a statutory 150-year life, and things like military records are retained indefinitely for historical reasons.

Just as a side point – keeping stuff on paper is NOT a panacea – ask anybody hoping to access the UK 1931 census records, or patrons of Iron Mountain’s Bromley document store in 2006. Paper cannot be easily backed up – even an outdated backup beats a smoking ruin.

What to take from this?

There is a truism that a backup is not a backup until it’s been successfully restored. This is true not just of the physical media, but also of the format in which the data is stored. Guarding your data is fine, but equal attention must be given to preserving the means to read that data.
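For the physical half of that truism, a restore test can be automated. Here is a minimal sketch, assuming the backup is an archive that Python's `shutil.unpack_archive` understands (zip or tar, say) and that the original folder is still available to compare against. The function names are mine. It restores the archive into a scratch directory and compares SHA-256 checksums file by file.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import Dict


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def restore_matches(original_dir: str, archive_path: str) -> bool:
    """Restore the archive to a scratch directory and compare with the original."""
    with tempfile.TemporaryDirectory() as scratch:
        shutil.unpack_archive(archive_path, scratch)
        return manifest(Path(scratch)) == manifest(Path(original_dir))
```

A `True` result only proves the bytes came back intact. It says nothing about whether anything can still open them, which is the other half of the problem.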