Quick Tip: docx is a zip Archive
Microsoft Office's docx files are actually zip archives containing a bunch of XML files plus all the attached media. Super useful, and everyone should know it!
When I tell my colleagues, friends, or students about it, they don't take me seriously the first time. So, here we go again. If you have a docx (or xlsx, or pptx) file, you can unzip it with `unzip proj.docx -d proj` (or any other unarchiver) and get a folder with all the stuff that makes up the document.
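If you'd rather do the same thing from code, here is a minimal sketch of the `unzip` step using Python's standard-library `zipfile` module (my choice of tool, not something the tip prescribes); `unpack_office_file` is a hypothetical helper name. For a typical docx, the returned names include entries such as `[Content_Types].xml`, `_rels/.rels`, and `word/document.xml`.

```python
import zipfile


def unpack_office_file(path: str, dest: str) -> list[str]:
    """Unzip a .docx/.xlsx/.pptx into `dest`, like `unzip proj.docx -d proj`.

    Returns the archive member names so you can see what the file is made of.
    """
    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()  # every part stored in the Office file
        archive.extractall(dest)    # write them all out under `dest`
    return names
```

Calling `unpack_office_file("proj.docx", "proj")` mirrors the `unzip` command above and also hands back the list of parts.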
From here, you can:
- quickly grab all the media from `word/media`
- work with the document itself (`word/document.xml`) via an XML parser (or grep / sed, but that's a secret)
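Grabbing the media doesn't even require unpacking the whole archive. A minimal sketch, again assuming Python's `zipfile`; `extract_media` is a hypothetical helper that copies everything under `word/media/` into a flat output folder. For xlsx and pptx files the equivalent folders are typically `xl/media/` and `ppt/media/`.

```python
import zipfile
from pathlib import Path


def extract_media(docx_path: str, out_dir: str) -> list[Path]:
    """Copy every file under word/media/ out of a .docx into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    with zipfile.ZipFile(docx_path) as archive:
        for info in archive.infolist():
            # Skip directory entries and anything outside the media folder.
            if info.is_dir() or not info.filename.startswith("word/media/"):
                continue
            # Keep only the file name, e.g. "word/media/image1.png" -> "image1.png".
            target = out / Path(info.filename).name
            target.write_bytes(archive.read(info))
            saved.append(target)
    return saved
```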
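As for the XML route, the body text lives in `word/document.xml` under the WordprocessingML namespace: paragraphs are `<w:p>` elements, and their text sits in `<w:t>` elements inside runs. The sketch below uses Python's `xml.etree.ElementTree`; `docx_paragraphs` is a hypothetical helper name, and it deliberately ignores tabs, line breaks, and text boxes, so treat it as a starting point rather than a full text extractor.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace behind the w: prefix in word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(docx_path: str) -> list[str]:
    """Return the plain text of each paragraph (<w:p>) in word/document.xml."""
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{W_NS}p"):
        # A paragraph's text is split across runs; join every <w:t> inside it.
        paragraphs.append("".join(t.text or "" for t in p.iter(f"{W_NS}t")))
    return paragraphs
```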
And do all the other marvellous stuff — no Office or even GUI needed. Now go and spread the light of this newfound knowledge and never complain about docx again!
Hello, friend! My name is Vladimir, and I love writing about web development. If you got down here, you probably enjoyed this article. My goal is to become an independent content creator, and you'll help me get there by buying me a coffee!