Vladimir Klepov as a Coder

Quick Tip: docx is a zip Archive

Microsoft Office's docx files are actually zip archives with a bunch of XML files and all the attached media. Super useful, everyone should know it!

When I tell my colleagues, friends, or students about it, they don't take me seriously the first time. So, here we go again. If you have a docx (or xlsx, or pptx) file, you can run unzip proj.docx -d proj (or use any other unarchiver) and get a folder with all the stuff that makes up the document.
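If you would rather peek inside without unpacking anything, here is a minimal sketch using Python's standard zipfile module. The function name list_docx_parts is made up for this illustration, and proj.docx is the same example file name as in the command above.

```python
import zipfile


def list_docx_parts(path):
    """Return the names of all files stored inside a docx (zip) archive."""
    # A docx is opened exactly like any other zip file.
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()
```

Calling list_docx_parts("proj.docx") on a typical Word document should return entries such as [Content_Types].xml, word/document.xml and, if the document has images, files under word/media/. The exact set depends on what the document contains.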

From here, you can:

  • quickly grab all the media from word/media
  • work with the document (word/document.xml) via an XML parser (or grep / sed, but it's a secret)
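Both ideas from the list can be sketched with nothing but the Python standard library. This is an illustration under assumptions, not a complete docx reader: the helper names extract_media and paragraph_texts are invented here, and the namespace below is the one used by regular (transitional) docx files, so documents saved in the strict Open XML variant would need a different one.

```python
import xml.etree.ElementTree as ET
import zipfile
from pathlib import Path

# WordprocessingML namespace used by word/document.xml in regular docx
# files (assumption: the file is not saved as "Strict Open XML").
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def extract_media(docx_path, out_dir):
    """Copy every file under word/media/ into out_dir; return the saved paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    with zipfile.ZipFile(docx_path) as archive:
        for name in archive.namelist():
            # Skip directory entries, keep only real files.
            if name.startswith("word/media/") and not name.endswith("/"):
                target = out / Path(name).name
                target.write_bytes(archive.read(name))
                saved.append(target)
    return saved


def paragraph_texts(docx_path):
    """Return the plain text of each paragraph (w:p) in word/document.xml."""
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is spread over w:t elements inside its runs.
        paragraphs.append("".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t")))
    return paragraphs
```

For example, extract_media("proj.docx", "media") saves the images into a media folder, and paragraph_texts("proj.docx") gives you the document text one paragraph per list item. Formatting, tables and nested text boxes are ignored or flattened here, which is usually fine for a quick grab.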

And do all the other marvellous stuff — no Office or even GUI needed. Now go and spread the light of this newfound knowledge and never complain about docx again!
