Text documents: from PDF to vector images

Subject of this article

Recently I needed to convert some PDF documents, containing text generated by LaTeX, into vector images on GNU/Linux.

Leaving aside online conversion services, I found three interesting solutions: two command-line tools (pdf2svg and pdftocairo) and one well-known graphical application (Inkscape).

In this article I report my evaluations, highlighting some differences that depend on the source of the PDF documents and on the behaviour of three Linux distributions.

Some interesting references on the subject:

pdf2svg

It is a command-line tool that is very easy to use, reliable and fast.

The command syntax is as follows:

pdf2svg <in file.pdf> <out file.svg> [<page no>]

The optional last argument is the number of the page to export.

Ideal for quick and direct operations from PDF to SVG.

It does not export to other formats, but if you simply need to export a PDF to a vector image, it is the fastest and most effective solution.
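As a minimal sketch of the scheme above, the two shell functions below wrap the most common calls. The function names and file names are placeholders of mine. The second function relies on the `all` page argument and the `%d` pattern in the output name, which is how pdf2svg exports every page in one run.

```shell
# Export a single page of a PDF to SVG (the page defaults to 1).
# Usage: pdf_page_to_svg input.pdf output.svg [page]
pdf_page_to_svg() {
    pdf2svg "$1" "$2" "${3:-1}"
}

# Export every page, one SVG file per page.
# pdf2svg replaces %d in the output name with the page number.
# Usage: pdf_all_to_svg input.pdf output-prefix
pdf_all_to_svg() {
    pdf2svg "$1" "$2-%d.svg" all
}
```

For example, `pdf_page_to_svg article.pdf page3.svg 3` would export only the third page, while `pdf_all_to_svg article.pdf page` would write one `page-N.svg` file for each page.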

I should note, however, that the author of the software does not seem to encourage the use of this software.

On the pdf2svg home page you can read the following warning:

Note: since this utility was written, the maintainers of Poppler have written a utility that works on the same principle: pdftocairo. I recommend that you use their utility since it is better maintained than mine.

pdftocairo

This is another command-line tool, and precisely the one recommended by the author of pdf2svg, as reported above.

It is part of the poppler-utils package and may already be installed by default in your Linux distribution.

It is very rich in options and can export to various formats, not just .svg.

The usage pattern is as follows:

pdftocairo [options] PDF-file [output-file]
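The usage pattern can be illustrated with a short sketch. The function names and file names are placeholders of mine; the options used (`-svg`, `-png`, `-f`, `-l`, `-r`) are standard pdftocairo options. For SVG the output argument is a full file name, while for image formats it is only a prefix, to which pdftocairo adds the page number and the extension.

```shell
# Convert one page of a PDF to SVG with pdftocairo.
# -f and -l set the first and last page; giving both the same value selects one page.
# Usage: cairo_page_to_svg input.pdf output.svg [page]
cairo_page_to_svg() {
    pdftocairo -svg -f "${3:-1}" -l "${3:-1}" "$1" "$2"
}

# Render every page to PNG at the given resolution in DPI (150 if omitted).
# For image formats the last argument is a file name prefix, not a full file name.
# Usage: cairo_to_png input.pdf output-prefix [dpi]
cairo_to_png() {
    pdftocairo -png -r "${3:-150}" "$1" "$2"
}
```

For example, `cairo_page_to_svg article.pdf page2.svg 2` would convert the second page only, and `cairo_to_png article.pdf page 300` would render all pages at 300 DPI.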

But it is not without problems.

On Fedora and Arch Linux, converting PDFs to .svg produced blank (white) pages, but only for PDFs generated by LaTeX.

Other users have reported the same problem online, as in this post.

Exporting to other image formats (.jpg and .png) caused no problems.

Nor were there any problems exporting to .svg text documents generated by sources other than LaTeX (for example, documents generated by LibreOffice Writer).

On Fedora the problem disappeared after installing the package recommended on this page, namely:

$ sudo dnf install perl-File-Copy

On Arch Linux, however, installing the corresponding component does not work and returns the message “Could not find all required packages”.

I have therefore not found a solution for that distribution, although I am almost sure that one exists (if any reader has solved this problem on Arch, please let me know).

On Ubuntu the problem has never occurred, from the first use onwards.

Inkscape

For managing vector images in general, Inkscape is the reference tool.

Among its many functions, the software also lets you open PDFs, choosing the specific page to display and even the conversion method, and save the result in .svg format.

The advantage is that the vector image is immediately available for editing and processing.

The vector format (.svg), in fact, makes it possible to extract, move and mix the text fragments in the images, as if they were digital ‘post-its’.
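Although I used Inkscape in graphical mode, the same conversion can probably also be scripted from the command line. The sketch below assumes Inkscape 1.0 or later, where the `--export-type` and `--export-filename` options exist (the older 0.9x series used different option names). The function name and file names are placeholders of mine, and I am assuming the default import behaviour, which takes the first page of the PDF.

```shell
# Convert a PDF to SVG without opening the Inkscape window.
# Assumes Inkscape 1.0 or later; the default PDF import settings are used.
# Usage: inkscape_pdf_to_svg input.pdf output.svg
inkscape_pdf_to_svg() {
    inkscape --export-type=svg --export-filename="$2" "$1"
}
```

For example, `inkscape_pdf_to_svg article.pdf article.svg`. The resulting file can then be opened in Inkscape for editing.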

In summary

  • pdf2svg for quick operations to the .svg format only.
  • pdftocairo for operations involving other image formats.
  • Inkscape for exporting and contextual editing of vector images.
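The first two points of this summary can be sketched as a small shell function that picks the tool from the extension of the output file. It is only an illustration under my own assumptions: the function name is hypothetical, only the first page is converted, and only the .svg, .png and .jpg extensions are handled.

```shell
# Pick a converter from the output file extension (first page only).
# Usage: pdf_convert input.pdf output.svg|output.png|output.jpg
pdf_convert() {
    case "$2" in
        # Vector output: pdf2svg, page 1.
        *.svg) pdf2svg "$1" "$2" 1 ;;
        # Raster output: pdftocairo appends the extension itself, so it is stripped here.
        # -singlefile writes only the first page and adds no page digits to the name.
        *.png) pdftocairo -png -singlefile "$1" "${2%.png}" ;;
        *.jpg) pdftocairo -jpeg -singlefile "$1" "${2%.jpg}" ;;
        *) echo "unsupported output format: $2" >&2; return 1 ;;
    esac
}
```

For example, `pdf_convert article.pdf article.svg` would call pdf2svg, while `pdf_convert article.pdf article.png` would call pdftocairo.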

Thank you for your attention.

Lawyer
