Opening ODT and DOCX – are they human readable?

by Shane Perris on Friday, 18 January, 2008

in how-to,tutorials

One of the supposed benefits of XML is that documents saved in an XML-based format can be opened as text files and read by normal people, allowing the content to be recovered even if the formatting is unavailable. Tired of wondering just how human readable either format was, I decided to take a look for myself.

I created the same simple document twice, once in OpenOffice as .odt (OpenDocument Text) and once in MS Office 2007 as .docx. It had a heading, some paragraphs, an unordered list and an ordered list. The filler text came from the Lorem Ipsum generator that can be found at Lipsum.

Screenshot of an open document text file.

Screenshot of a Microsoft Office 2007 docx file

.odt is followed by .docx

To start off with, I opened both documents in WordPad to see what they looked like. Not at all human readable.

Screenshot of an odt file opened up in a text editor

Screenshot of a docx file opened up in a text editor

A quick trawl through Google revealed that .odt is a container format: a ZIP archive that compresses all the relevant file parts into one file. I changed the file extension from .odt to .zip and opened it up to have a look.
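The same look inside can be sketched in a few lines of Python, without renaming anything, using the standard zipfile module. This is only a minimal sketch: the function names and the file name sample.odt in the usage note are made up for illustration.

```python
import zipfile


def list_parts(path):
    """Return the names of the files stored inside a ZIP-based document."""
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()


def read_part(path, part_name):
    """Return one stored file from the archive, decoded as UTF-8 text."""
    with zipfile.ZipFile(path) as archive:
        return archive.read(part_name).decode("utf-8")
```

Calling list_parts("sample.odt") should show the stored files by name, and read_part("sample.odt", "content.xml") should return the XML that holds the document text, assuming the file is a normal, unencrypted .odt.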

Screenshot of an odt file opened up as a zip file

Screenshot of the xml of an odt file

What worked for one format might work for the other. I took a punt, changed the file extension from .docx to .zip, held my breath, crossed my fingers, closed my eyes and double-clicked…

Screenshot of a docx file opened up as a zip file

Screenshot of the xml in a docx file

…and discovered that in .docx, the goodies are there, albeit buried a little deeper.
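To make "buried a little deeper" concrete: in an .odt the body text normally sits in content.xml at the top level of the archive, while in a .docx it normally sits one folder down, in word/document.xml. The sketch below checks which of the two a file contains. The function name and the BODY_PARTS tuple are my own illustration, not anything either format defines.

```python
import zipfile

# Usual location of the main body text in each format (illustrative constant).
BODY_PARTS = ("content.xml", "word/document.xml")


def find_body_part(path):
    """Return the name of the part holding the body text, or None if absent."""
    with zipfile.ZipFile(path) as archive:
        names = set(archive.namelist())
    for candidate in BODY_PARTS:
        if candidate in names:
            return candidate
    return None
```

A result of None would suggest the file is not an ordinary .odt or .docx, or that it stores its body somewhere this sketch does not know about.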

Both .odt and .docx are human readable, after a fashion. If for some reason in the distant (or not-so-distant) future either format is unreadable in its container form, with some effort the data could be extracted. It may even be possible to extract large parts of the formatting, but that’s beyond my ability to assess.
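As a rough idea of what "with some effort" could look like, here is a sketch that pulls plain text out of either XML file by collecting the text inside every paragraph (p) or heading (h) element and throwing the formatting away. It assumes the input is the raw content.xml or word/document.xml, matches elements by local name only, and would repeat text if paragraphs were ever nested (for example inside text boxes). The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def paragraphs_from_xml(xml_data):
    """Return the text of each paragraph or heading element, in document order."""
    root = ET.fromstring(xml_data)
    paragraphs = []
    for element in root.iter():
        # Tags look like "{namespace}p"; keep only the part after the brace.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name in ("p", "h"):
            text = "".join(element.itertext()).strip()
            if text:  # skip empty paragraphs
                paragraphs.append(text)
    return paragraphs
```

Lists, tables and styling are all flattened by this approach, which fits the point above: the words are recoverable, while the formatting takes more work.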

In my assessment, .odt comes out ahead slightly in the human readable stakes: it isn’t buried quite so deep and comes with less additional XML-related formatting and overhead. As to which is the better format overall, I’ll leave that as an exercise for the reader (although I wish I could create .odt inside of Office 2007 – I do love the new Office user interface).
