Open Document Format for Dynamic Spreadsheet Compilation

This article gives an overview the process of using Open Document format (ODF) to create spreadsheets. I will describe in general the ODF format, pros and cons, and process of developing such a library.

Open Document Version 1.2 was used for this project (the latest Version of ODF available at the time of this writing is 1.3).

During the course of developing spreadsheet data export capability for a recent project, I discovered Open Document Format. Open Document Format is a xml approach to document creation. It is used to great effect by OpenOffice.org (http://www.openoffice.org).

ODF Overview:

A minimal .odf format spreadsheet file contains the following files/directories:

  • content.xml
  • styles.xml
  • meta.xml
  • mimetype
  • META-INF/manifest.xml

The most important files of this set are content.xml (which contains all of the data and some style markup) and styles.xml (which contains additional style markup not contained in content.xml).

ODF Pros and Cons:

Pros:

  • Writing ODF format is computationally fast since it merely adds xml markup to the plain text data.
  • ODF is readable and writeable by Microsoft Excel and OpenOffice.org as well as many other applications ().
  • Small file size (the xml fileset is stored in a zip archive).
  • Many of the features of Microsoft Excel are supported including:
    • Header/Footer
    • Paper Size and Orientation
    • Cell/Row/Column Style (color,font size, border, merged cells)
    • Page Scale During Print

Cons:

  • Microsoft Excel 2007’s implementation although viable is less robust than that offered by OpenOffice.org.
  • Missing support for Excel macros means that computations must be done on the server side and results in a static spreadsheet document
  • Differences in formulas between Excel and OpenOffice.org mean that the document must effectively be written for Excel or the other applications or that computations must be done on the server side (i.e.: openoffice SUM(1; 2; 3), excel SUM(1,2,3))

Library Design and Implementation:

My data export library was based on OpenOffice.org’s approach, and follows this http://develop.opendocumentfellowship.com/ tutorial. It was designed using a cookie-cutter type approach that involves creating a template file using OpenOffice.org, extracting the strings that contain the overall file format, extracting the strings that contain the style, and repetatively substituting the composite string for each row of the excel. When new formats or features were needed, the feature was selected in OpenOffice.org, the output file examined, and the relevant strings extracted and templated as necessary. This approach requires some time, because once the feature has been selected and added to the source the output must then be tested with Microsoft Excel to ensure compatibility. A better approach would be to select the feature in Microsoft Excel and then test with OpenOffice.org. This is because OpenOffice.org accepts the Excel format most of the time whereas the converse is not true.

Another more flexible and long-term approach to constructing this library would be to create a grammer based on the xml tags, and write the xml directly. I have also read about others creating ODF libraries using .odt (open document template).