Converting documents from the .tex format of LaTeX to the .docx format of MS Word
Preface on LaTeX and MS Word
I have been using LaTeX for about thirty years to write documents of all kinds: court documents, reports, research, projects and more.
Once the challenging initial learning curve has been overcome, one is very unlikely to go back to “traditional” word processors: the extraordinary typographic quality and the considerable time savings become indispensable.
Sometimes, however, it is necessary to share material written in LaTeX with friends and colleagues who have not yet had the opportunity to appreciate it.
In such cases, the text must be converted into a format that word processing software can read. I am referring mainly to the .docx format, typical of MS Word but also readable by similar programs, such as LibreOffice Writer, which I used for this article.
There is more than one way to do the conversion, and none of them is fully automatic: sometimes the result has to be completed with some minor manual intervention.
In this article I explain, without claiming completeness, how I approached and solved this problem.
Conversion through htlatex
In the TeXLive distribution there is the htlatex command.
The function of htlatex is to convert .tex format to .html format.
Exporting is very simple. Just type on the command line:
htlatex file_name.tex
No options and no output file name are needed: htlatex generates everything by itself.
The procedure produces several files: one of them is in .html format and can be opened directly in MS Word and then saved in .docx format.
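The two steps can also be chained from the command line. What follows is only a minimal sketch, not the procedure I used for this article: it assumes that LibreOffice's soffice executable is on the PATH and that the export filter named "MS Word 2007 XML" is available in the installed version. The function names html_name and tex_to_docx_via_html are hypothetical.

```shell
# Print the name of the .html file that htlatex derives from the .tex name.
html_name() {
    printf '%s\n' "${1%.tex}.html"
}

# Run htlatex, then let LibreOffice convert the resulting HTML to .docx
# without opening the graphical interface.
# Assumption: the filter name "MS Word 2007 XML" may differ between versions.
tex_to_docx_via_html() {
    htlatex "$1" &&
        soffice --headless --convert-to 'docx:MS Word 2007 XML' "$(html_name "$1")"
}
```

With these definitions loaded in the shell, tex_to_docx_via_html file_name.tex should leave file_name.docx next to the other files generated by htlatex.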
The .docx version may be fine as long as the document contains no numbered lists (typical of the witness chapters in court documents): in these lists a carriage return is added between the sequence number and the content.
Each chapter must then be edited to remove the carriage return, which can be time-consuming in complex documents, especially those with first- and second-level numbering.
Conversion through Pandoc
An alternative is Pandoc, a universal cross-platform converter.
For this conversion, type the following command:
pandoc inputfile.tex -o outputfile.docx
Pandoc is very efficient, but its output sometimes needs a few minor tweaks to get the job done.
Particularly in documents structured with sections, subsections, numbered lists, and indexes (typical of court documents), I had to make the following changes:
- Apply “justified” alignment to the body text: simply edit one paragraph that uses that style and apply the change to all similar paragraphs.
- Change the formatting of the First and (if present) Second Level headings by changing the color from blue to black. Again, the modification of a single First (and Second) Level heading can be extended to all headings on the same level.
- Delete some unnecessary additions at the beginning of the document.
- Reinsert the Table of Contents (in the case of complex documents).
- Fix the numbering of trial chapters and produced documents. In the case of sub-numbering, “structure” numbering should be applied to obtain the sequence 1.1, 1.2, 2.1, 2.2, etc.
In the end you get a document in a format compatible with Microsoft Word or LibreOffice Writer, with the same structural characteristics as the original in LaTeX.
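Some of these manual adjustments can probably be reduced by using Pandoc's own options: --toc (table of contents), --number-sections (numbered headings) and --reference-doc (a .docx file from which the styles are taken). The following is a sketch under assumptions, not the procedure I followed: custom-reference.docx is a hypothetical file name, the function names are hypothetical, and the result still has to be checked by hand.

```shell
# Step 1 (once): export Pandoc's default style template so that it can be
# edited in Word or LibreOffice Writer, for example by making the
# "Body Text" and "First Paragraph" styles justified and the
# "Heading 1" and "Heading 2" styles black.
make_reference() {
    pandoc -o custom-reference.docx --print-default-data-file reference.docx
}

# Print the name of the .docx file that corresponds to a .tex file.
docx_name() {
    printf '%s\n' "${1%.tex}.docx"
}

# Step 2: convert with a table of contents, numbered headings and the
# styles taken from the edited template.
tex_to_docx() {
    pandoc "$1" --from=latex --toc --number-sections \
        --reference-doc=custom-reference.docx \
        -o "$(docx_name "$1")"
}
```

After running make_reference once and editing the template, tex_to_docx inputfile.tex writes inputfile.docx. Note that --number-sections acts on headings, not on the items of numbered lists, so the numbering of trial chapters and produced documents may still need to be fixed manually.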
Document export examples
To make the description clearer, I give four examples.
In the first example, I show the PDF obtained directly from a LaTeX document:
In the second example, I show the outcome of the conversion obtained by applying the htlatex command and then opening the HTML with LibreOffice Writer:
In the third example, I show the raw outcome of converting .tex to .docx with the Pandoc command:
In the fourth example, I show the reworked outcome of the previous conversion:
Thank you for your attention.