Understanding the OpenDocument Format

The Open Document Format for Office Applications (OpenDocument) is the native format used for LibreOffice word processing files, spreadsheets, charts, and presentations. It is an open format developed as a standard for office applications. OpenDocument was released by a Technical Committee (TC) of the Organization for the Advancement of Structured Information Standards (OASIS) consortium in 2005 and approved as an ISO/IEC International Standard in 2006 as ISO/IEC 26300:2006.

When OpenDocument was developed, the most popular formats for office documents were those of Microsoft Office. At the time, these formats were closed and proprietary, yet they were so common that they became a pseudo-standard for saving, sharing, and transmitting document files. (See the box entitled “Standards and Pseudo-Standards.”) The widespread use of these proprietary file formats increased Microsoft’s power over the market, allowing the company to dominate and eliminate competitors.
 

Standards and Pseudo-Standards

When software makers are forced to adapt their formats to a dominant product that is not based on open standards, the dominant vendor can use this position of power to influence the market. Often the dominant product becomes a de facto pseudo-standard that has never been vetted or approved but is simply adopted by the market out of necessity. Any flaws in the dominant product are then echoed across the whole industry. For instance, the 40-year-old “leap year bug,” which was inherited from VisiCalc and Lotus 1-2-3 and passed on to Excel, causes the market-leading spreadsheet to treat 1900 as a leap year, although it was not one. This can cause errors in applications that count days or measure time ranges. Developers of other office suites had to replicate the bug for compatibility. According to the vendor, the faulty behavior has to be maintained, even across new releases of the product, because fixing it would cause too many problems. Users are therefore left with the flaw in their spreadsheets: the vendor protects its commercial interests rather than the needs of users who have paid for a software license.
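To see how the inherited bug distorts day counting, here is a minimal Python sketch. It assumes the “1900 date system,” in which serial day 1 is 1 January 1900 and serial 60 is the nonexistent 29 February 1900. The helper names are hypothetical and are not part of any spreadsheet’s API.

```python
from datetime import date

# Hypothetical helpers modeling the "1900 date system" described above.
# Serial 1 is 1 January 1900, and the system also counts a 29 February 1900
# that never existed (serial 60).
_REAL_DAY_ZERO = date(1899, 12, 31)     # serial 1 = 1 January 1900
_SHIFTED_DAY_ZERO = date(1899, 12, 30)  # absorbs the phantom 29 February


def serial_1900_system(d: date) -> int:
    """Day number that a spreadsheet carrying the inherited bug assigns to d."""
    if d < date(1900, 1, 1):
        raise ValueError("the 1900 date system starts on 1 January 1900")
    if d < date(1900, 3, 1):
        # Before the phantom leap day, the count is still correct.
        return (d - _REAL_DAY_ZERO).days
    # From 1 March 1900 onward, every serial is one higher than the true count.
    return (d - _SHIFTED_DAY_ZERO).days


def true_serial(d: date) -> int:
    """Day number using the real Gregorian calendar, with the same day 1."""
    return (d - _REAL_DAY_ZERO).days


# A one-day interval that crosses the phantom leap day is counted as two days.
start, end = date(1900, 2, 28), date(1900, 3, 1)
print(serial_1900_system(end) - serial_1900_system(start))  # 2
print(true_serial(end) - true_serial(start))                # 1
```

After February 1900 the offset is constant, so another spreadsheet can match these serials for modern dates by counting from 30 December 1899, the default day zero in LibreOffice Calc. Only dates in the first two months of 1900 still disagree.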

 

OpenDocument evolved from the XML specification for the OpenOffice.org office suite, a forerunner of LibreOffice. The OpenOffice.org developers wanted a free and open office format that could compete with Microsoft’s file formats. Since then, OpenDocument has become popular throughout the world as a free alternative for storing and transmitting document data.

Legal issues and competition from other tools eventually led Microsoft to open up its once-hidden formats. The Microsoft Office Open XML format (OOXML or MOX) is documented openly enough that developers can inspect it and write programs that work with it, but Microsoft continues to maintain some control over the standard.

OpenDocument, on the other hand, is independent of LibreOffice or any other software project. The open standard process means that OpenDocument is now supported by dozens of other office tools in addition to LibreOffice.

What Is It?

A document format describes the structure of the data saved in a document file. The data inside a modern office document consists of much more than letters and numbers. The document must store information on fonts, character tags, paragraph tags, and pagination, as well as headers and footers, footnotes, tables of contents, and revision tracking. The format also records information on document encryption, digital signatures, and macros. In addition, the format must be resilient, because each file will be read and edited multiple times by different applications, including applications that were not involved in creating it. At the end of the process, the file must preserve its integrity and characteristics and, most importantly, guarantee that its contents are preserved independently of the application.

OpenDocument is not a single format but a collection of formats for the various file types needed in an office productivity suite. See Table 1 for a list of OpenDocument file types and file extensions.

Table 1: OpenDocument File Types and Extensions
File Type      Extension
Text           .odt
Spreadsheet    .ods
Presentation   .odp
Drawing        .odg
Chart          .odc
Formula        .odf
Image          .odi
Database       .odb
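Each file type in Table 1 also has a registered media type. A ZIP-packaged OpenDocument file stores it as plain text in an entry named mimetype, which makes it possible to identify a file without relying on its extension. The following Python sketch assumes a ZIP package rather than a single flat XML file. The function names are illustrative assumptions, not part of any library.

```python
import zipfile

# Media types defined by the OpenDocument specification for the Table 1 extensions.
ODF_MEDIA_TYPES = {
    ".odt": "application/vnd.oasis.opendocument.text",
    ".ods": "application/vnd.oasis.opendocument.spreadsheet",
    ".odp": "application/vnd.oasis.opendocument.presentation",
    ".odg": "application/vnd.oasis.opendocument.graphics",
    ".odc": "application/vnd.oasis.opendocument.chart",
    ".odf": "application/vnd.oasis.opendocument.formula",
    ".odi": "application/vnd.oasis.opendocument.image",
    ".odb": "application/vnd.oasis.opendocument.base",
}


def extension_for(media_type: str) -> str:
    """Map a media type string back to the matching Table 1 extension."""
    wanted = media_type.strip()
    for extension, known in ODF_MEDIA_TYPES.items():
        if known == wanted:
            return extension
    raise ValueError(f"not an OpenDocument media type: {wanted!r}")


def detect_odf_type(path: str) -> str:
    """Identify a packaged OpenDocument file from its 'mimetype' entry.

    Raises KeyError if the ZIP has no 'mimetype' entry (for example, a plain ZIP).
    """
    with zipfile.ZipFile(path) as package:
        media_type = package.read("mimetype").decode("ascii")
    return extension_for(media_type)
```

For example, `detect_odf_type("budget.ods")` (a hypothetical file name) would return `".ods"` for a spreadsheet package. Templates and other variants use different media types and fall outside this sketch.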

 

A Short History of OpenDocument

The first meeting to discuss the OpenDocument standard was held on December 16, 2002. The OpenDocument 1.0 specification was published on May 1, 2005, and then submitted to ISO/IEC Joint Technical Committee 1 (JTC1) on November 16, 2005, under the Publicly Available Specification (PAS) procedure. On May 3, 2006, after a six-month review period, OpenDocument unanimously passed its Draft International Standard (DIS) ballot in JTC1 (ISO/IEC JTC1/SC34), with broad participation, and was “approved for release as an ISO and IEC International Standard” under the name ISO/IEC 26300:2006.

OpenDocument 1.2, approved as ISO/IEC 26300:2015 on June 17, 2015, adds accessibility features, RDF-based metadata, a spreadsheet formula description based on OpenFormula, support for digital signatures, and some features suggested by the public. OpenDocument 1.3 was approved as an OASIS committee specification on January 21, 2020, and as a full OASIS standard in June 2021. Future versions of LibreOffice will add support for its new features, such as improved digital signatures and OpenPGP-based encryption, plus improvements in areas such as change tracking.

Inside OpenDocument

An OpenDocument file is typically stored and distributed as a ZIP container, which holds several XML files and the associated binary content, such as images or other media. Each OpenDocument file contains four key XML files: META-INF/manifest.xml, the table of contents for the ZIP container; meta.xml, with document-level metadata; styles.xml, with style definitions for the document; and content.xml, with all of the document’s structured content. OpenDocument thus provides a flexible container that can embed almost any existing digital object and allow it to be used in an office document.
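Because content.xml is ordinary XML, standard tools can read a document’s text directly. The following minimal Python sketch assumes a ZIP-packaged text document. It collects headings (text:h) and paragraphs (text:p) by matching the text namespace URI from the OpenDocument specification. The function names are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace URI that OpenDocument uses for text elements such as <text:p> and <text:h>.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
_WANTED_TAGS = {f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h"}


def paragraphs_from_content(content_xml: bytes) -> list[str]:
    """Return the text of every heading and paragraph in content.xml, in document order.

    Simplifications: extra spaces, tabs, and line breaks that ODF encodes as
    child elements (<text:s>, <text:tab>, <text:line-break>) are skipped, and
    nested content such as footnotes is not treated specially.
    """
    root = ET.fromstring(content_xml)
    return ["".join(el.itertext()) for el in root.iter() if el.tag in _WANTED_TAGS]


def paragraphs_from_package(path: str) -> list[str]:
    """Read content.xml out of the ZIP container and extract its paragraphs."""
    with zipfile.ZipFile(path) as package:
        return paragraphs_from_content(package.read("content.xml"))
```

Inline formatting such as <text:span> is flattened into the surrounding paragraph, because itertext() concatenates all nested text.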

If you want to see what an OpenDocument file really looks like, start with a LibreOffice Writer document and change the .odt extension at the end of the file name to .zip. Then use a ZIP archive tool to look inside the container (Figure 1).

Figure 1
Figure 1: To view the contents of an OpenDocument file, give the file name a .zip extension and open it with a ZIP archive tool.
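Renaming the file is a convenience for archive tools that go by extension. Python’s standard zipfile module, for example, opens the container directly whatever the extension. The following minimal sketch lists each entry with its size and whether it is stored uncompressed. The function name and command-line usage are illustrative assumptions.

```python
import sys
import zipfile


def list_package(path: str) -> list[tuple[str, int, bool]]:
    """Return (entry name, uncompressed size in bytes, stored uncompressed?) per entry."""
    with zipfile.ZipFile(path) as package:
        return [
            (info.filename, info.file_size, info.compress_type == zipfile.ZIP_STORED)
            for info in package.infolist()
        ]


if __name__ == "__main__":
    # Usage: python list_package.py MyDocument.odt
    for name, size, stored in list_package(sys.argv[1]):
        print(f"{name:<30} {size:>10}  {'stored' if stored else 'compressed'}")
```

In a typical package, the mimetype entry appears first and is stored uncompressed, followed by entries such as content.xml, styles.xml, meta.xml, and META-INF/manifest.xml.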

OpenDocument Advantages

The open XML format used by OpenDocument means that software vendors can develop applications that are interoperable by design. Open, XML-based formats like OpenDocument also allow stricter security checks than older binary and pseudo-standard formats. Because of these advantages, several governments have selected OpenDocument as the standard for document exchange within public administration. The United Kingdom, the Netherlands, France, Sweden, and Taiwan are leading the adoption of OpenDocument in the public sector, and other governments are exploring it as well.

OpenDocument Future

In a world where most documents are exchanged in digital form, open standards should be a priority in schools and all educational institutions. Governments need to take a more proactive role in promoting open standards in education and the public sphere. Free software organizations are doing as much as they can to fill the gap, but the cause of open standards requires active advocacy. The Document Foundation has re-launched the OpenDocument Adoption Technical Committee (Figure 2), with the support of other free software projects, companies, and public administrations. Continued investment in open standards will help protect users and developers from the security, interoperability, and efficiency problems associated with closed formats.

Figure 2
Figure 2: The OASIS OpenDocument Technical Committee provides information for organizations that are considering adopting OpenDocument-based tools.

 

This article originally appeared in LibreOffice Expert and is reprinted here with permission.

