Open document standards
IBM recently lent considerable high-profile backing to ODF (the Open Document Format), announcing that its Workplace Managed Client, a thin-client alternative to Microsoft Office, will support ODF in version 2.6 next year. IBM’s product joins a growing swell of other products supporting ODF, including offerings from OpenOffice.org and Sun Microsystems.
Personally, I’ve had to deal with too many proprietary formats over the years — ODF offers an XML-based open standard that might solve many problems.
This could be the tipping point for a shift away from proprietary document formats, such as those of Microsoft Office, toward more open and interchangeable technologies. Many countries with emerging economies, such as China, Brazil and India, as well as local and state governments in the United States, are starting to demand open standards. In fact, a recent highly publicized proposal in Massachusetts would require compliance with ODF, and that probably means a shift away from Microsoft Office.
ODF has been receiving a huge amount of attention industry-wide, garnering increasing support from Adobe Systems, Apple, Computer Associates, Corel, Google, Intel, Nokia, OpenOffice.org, Oracle, Red Hat and others — all of whom attended IBM and Sun’s global kickoff for ODF in early November.
Microsoft has countered with its own brand of “openness.” Its Office Open XML format has been submitted to the International Organization for Standardization (ISO) and is expected to be a key part of Office 12. Even so, Microsoft is notorious for maintaining closed standards that serve its own agenda but few others’. It seems likely that the Office Open XML initiative is little more than a marketing-driven defensive technique to fend off what may well be a better, but non-Microsoft, standard.
Having dealt with Microsoft file formats personally (I once wrote a translator for Microsoft Word’s files), I’ve come to respect the complexity of the problem. The Word file format is remarkably efficient at dealing with exceptionally large documents that cannot possibly fit in memory. In fact, it enables many operations on a document by manipulating the file directly, without loading the file into memory at all. It’s almost like a “document file system.” Certainly one of Word’s strengths is its robustness and ability to handle very large files, although some might argue that the very format we are discussing creates a lot of “bloat” in Microsoft documents. Unfortunately, an XML-based persistence model likely makes such file-based optimizations impossible.
Given the realities of using XML as the exclusive storage format, it seems unreasonable to expect all documents to be represented in pure XML. XML was not designed to be an efficient format. Restrictions imposed by XML’s intrinsic nature would make file processing operations intensely difficult, and that means transferring documents wholly into memory. Yet universal support for long-term storage in ODF is an excellent goal: a format that is platform independent, non-proprietary and universal is desperately needed. Imagine how much more compatible our favorite word processing tools would be if we had a common, well-defined medium for communication.
One of the most important things the ODF standards body could do to improve adoption would be to focus on some of these usability issues. XML is not intuitively suited to optimized disk operations. Even so, I’m convinced that some hard work in this area could result in an “annotated XML-based format”: in essence, a mechanism for reading and working with portions of a file, supported by an annotation segment that amounts to journaling (that is, a “scratch area” that records operations that can be consolidated at a later time). Putting such attention into the standard would increase its viability as an active document format, making it more usable for memory-intensive applications and side-stepping the issue of transient document representation (that “scratch disk” again).
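To make the journaling idea a little more concrete, here is a minimal sketch in Python. It is not part of ODF or of any proposal I know of; the class name `JournaledDocument`, the `<p id="...">` element layout and the one-JSON-object-per-line journal file are all assumptions made up for illustration. The point is only that an edit can be recorded in a small side file without rewriting the XML, a single paragraph can be read by streaming through the base file, and the recorded operations can be folded back in later.

```python
import json
import os
import xml.etree.ElementTree as ET


class JournaledDocument:
    """Hypothetical sketch: an XML base file plus an append-only journal
    (the "scratch area"). Not an ODF API; all names are illustrative."""

    def __init__(self, base_path, journal_path):
        self.base_path = base_path
        self.journal_path = journal_path

    def record(self, para_id, new_text):
        # Append one edit operation; the base XML file is not touched.
        with open(self.journal_path, "a", encoding="utf-8") as journal:
            journal.write(json.dumps({"id": para_id, "text": new_text}) + "\n")

    def pending(self):
        # Replay the journal; a later entry for the same paragraph wins.
        edits = {}
        if os.path.exists(self.journal_path):
            with open(self.journal_path, encoding="utf-8") as journal:
                for line in journal:
                    op = json.loads(line)
                    edits[op["id"]] = op["text"]
        return edits

    def read(self, para_id):
        # Current text of one paragraph: check the journal first, then
        # stream through the base file instead of building the whole tree.
        edits = self.pending()
        if para_id in edits:
            return edits[para_id]
        with open(self.base_path, "rb") as base:
            for _, elem in ET.iterparse(base):
                if elem.tag == "p" and elem.get("id") == para_id:
                    return elem.text
                elem.clear()  # discard content of elements already passed
        return None

    def consolidate(self):
        # Fold all journaled edits into the base file, then drop the journal.
        # For brevity this step loads the whole tree; a real design would
        # presumably stream here as well.
        edits = self.pending()
        tree = ET.parse(self.base_path)
        for elem in tree.getroot().iter("p"):
            if elem.get("id") in edits:
                elem.text = edits[elem.get("id")]
        tree.write(self.base_path, encoding="utf-8")
        if os.path.exists(self.journal_path):
            os.remove(self.journal_path)
```

In use, an application would call `record()` on every edit, `read()` whenever it needs a paragraph, and `consolidate()` at save time or when the journal grows large. A crash between edits would lose nothing, since every operation is already on disk in the journal.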
ODF may not seem to be the place to deal with manipulation of the document, but I stand by the recommendation. As a purely storage-oriented format, ODF will be relegated to “yet another import or export filter we need to support.” The successful standard will be one that inserts itself directly into the native storage model, becoming a viable format for storing documents of any size.














