January 8, 2026

Two open source tools for editing PDF documents: GUI versus CLI.

1. Introduction

PDF is the static document format par excellence.

However, even PDF documents can be modified, for example by deleting, adding, or reversing pages, inserting text, and more.

There are various applications, including free ones, for performing these operations.

In this article, I compare two tools for page manipulation side by side: one graphical, PDF Arranger, and one command-line, PDFtk.
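To give an idea of the command-line side, here is a minimal sketch that builds PDFtk argument lists for three of the page operations mentioned above (deleting, reversing, merging). The function names and file names are hypothetical and only serve the illustration; the `cat`, `end`, and `output` keywords are standard PDFtk syntax. The commands are only built, not executed.

```python
def delete_page(src, dst, page, total_pages):
    """Build a pdftk command that copies src to dst without one page.

    total_pages is supplied by the caller (an assumption of this sketch),
    so that deleting the first or the last page yields valid ranges.
    """
    if total_pages < 2:
        raise ValueError("cannot delete the only page of a document")
    if not 1 <= page <= total_pages:
        raise ValueError("page out of range")
    ranges = []
    if page > 1:
        # Pages before the deleted one: "1" or "1-N".
        ranges.append("1" if page == 2 else f"1-{page - 1}")
    if page < total_pages:
        # Pages after the deleted one, up to the end of the document.
        ranges.append(f"{page + 1}-end")
    return ["pdftk", src, "cat", *ranges, "output", dst]


def reverse_pages(src, dst):
    """Build a pdftk command that writes the pages of src in reverse order."""
    return ["pdftk", src, "cat", "end-1", "output", dst]


def merge(sources, dst):
    """Build a pdftk command that concatenates several PDFs into dst."""
    return ["pdftk", *sources, "cat", "output", dst]
```

Each list can be passed to `subprocess.run(cmd, check=True)` on a system where PDFtk is installed. In PDF Arranger the same operations are done by selecting and dragging page thumbnails.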

April 26, 2024

A Python program to copy text from various PDFs and collect it into a single Markdown document.

1. Subject of this article.

The goal is to write a simple program that collects the text contained in various PDFs generated directly by word processors and inserts the fragments into a single Markdown document, separating them with second-level headings that correspond to the names of the source documents.
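The assembly step described above can be sketched as follows. This is not the program from the article: the function names are hypothetical, and the text extraction is left to a caller-supplied `extract_text` function (for instance a wrapper around a library such as pypdf or a tool such as pdftotext), which is an assumption of this sketch.

```python
from pathlib import Path


def build_markdown(fragments):
    """Join (source_name, text) pairs into one Markdown document.

    Each fragment is introduced by a second-level heading carrying
    the name of the document it came from.
    """
    sections = []
    for name, text in fragments:
        sections.append(f"## {name}\n\n{text.strip()}\n")
    return "\n".join(sections)


def collect(pdf_paths, extract_text):
    """Extract the text of each PDF and assemble the Markdown document.

    extract_text is any callable taking a path and returning a string;
    the heading is the file name without its extension.
    """
    return build_markdown(
        (Path(path).stem, extract_text(path)) for path in pdf_paths
    )
```

Keeping the extraction pluggable makes it possible to try the assembly logic on plain strings before involving any PDF library.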

March 9, 2024

Reducing the size of single or multiple PDF documents in GNU/Linux Bash and Python

Abstract: Compressing PDF documents is a useful way to reduce the space these files occupy and to make them easier to transmit and store. In this article, starting from a page devoted to compressing single PDFs, I present two methods for compressing multiple PDF documents. The reference page is “Linux shell script to reduce PDF file size” (simple verification required to enter); it lets you operate on single PDFs with Bash commands in the GNU/Linux terminal. Building on it, I extended the procedure to operate on multiple PDFs. Finally, I present a simple Python application with a graphical interface. I admit that I asked ChatGPT and Copilot for some help.
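As a rough sketch of what extending a single-file procedure to several files can look like, the snippet below builds one Ghostscript command per PDF. The use of Ghostscript is an assumption here (the abstract does not name the tool), and the function names and the `_compressed` suffix are hypothetical; the options shown are standard Ghostscript `pdfwrite` options.

```python
from pathlib import Path

# Quality presets accepted by Ghostscript's -dPDFSETTINGS option.
PRESETS = {"screen", "ebook", "printer", "prepress", "default"}


def gs_compress_command(src, dst, quality="ebook"):
    """Build a Ghostscript command that rewrites src as a smaller dst."""
    if quality not in PRESETS:
        raise ValueError(f"unknown quality preset: {quality}")
    return [
        "gs",
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.4",
        f"-dPDFSETTINGS=/{quality}",
        "-dNOPAUSE",
        "-dQUIET",
        "-dBATCH",
        f"-sOutputFile={dst}",
        str(src),
    ]


def batch_commands(sources, out_dir, quality="ebook"):
    """Build one command per source PDF, writing results into out_dir."""
    out_dir = Path(out_dir)
    return [
        gs_compress_command(
            src, out_dir / f"{Path(src).stem}_compressed.pdf", quality
        )
        for src in sources
    ]
```

Each command can then be run with `subprocess.run(cmd, check=True)`; a graphical front end would only need to collect the file list and the preset before calling `batch_commands`.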

January 8, 2024

Automatic generation of hyperlinks between PDF documents in LaTeX, using Vim's regular expressions.

Vim is an editor with seemingly endless capabilities. Thanks to its built-in regular expressions, it can also generate LaTeX hyperlinks to other locally stored documents. For lawyers, this means linking a legal document with its related evidentiary materials. This article analyzes the procedure.

1. Subject of this article.

Sometimes a main PDF document must include a list of documents, each of which can be opened through its own dedicated hyperlink.
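The article performs the substitution with Vim's regular expressions; the same idea can be illustrated in Python, used here only as a stand-in. The sketch turns a list of PDF file names into LaTeX list items. The folder name `attachments`, the function name, and the use of hyperref's `run:` link prefix are assumptions of this example, not details taken from the article.

```python
import re

# One file name per line, ending in ".pdf"; the stem is captured.
PDF_NAME = re.compile(r"^(?P<stem>.+)\.pdf$")


def to_latex_items(filenames, folder="attachments"):
    r"""Turn PDF file names into \item lines with one \href each.

    Names that do not end in ".pdf" are skipped. Underscores are escaped
    in the visible label only, since LaTeX treats "_" as special in text.
    """
    items = []
    for name in filenames:
        match = PDF_NAME.match(name.strip())
        if match is None:
            continue
        stem = match.group("stem")
        label = stem.replace("_", r"\_")
        items.append(rf"\item \href{{run:./{folder}/{stem}.pdf}}{{{label}}}")
    return "\n".join(items)
```

The resulting lines are meant to be pasted inside an `itemize` or `enumerate` environment in a document that loads the hyperref package.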

June 26, 2023

A powerful OCR system for PDF documents under GNU/Linux, managed from the command line and refined with Vim.

Introduction.

The idea came from reading this article about optical character recognition (OCR) of images and PDFs in the GNU/Linux environment, managed from the command line.

The PDF documents in question are, of course, those scanned from paper originals, i.e., not saved directly from a document in digital format. For the latter, no OCR is needed.

The article is very well written and the end result is very good.

I wondered if it would be possible to aggregate all the steps into a single text command.
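One way to picture such an aggregation is to chain the individual steps with `&&`, so that each step runs only if the previous one succeeded. The sketch below is an assumption-laden illustration: the excerpt does not name the tools, so `ocrmypdf` and `pdftotext` are stand-ins chosen for this example, and the function names and output file names are hypothetical.

```python
import shlex


def chain(*commands):
    """Join several argument lists into one shell line using '&&'.

    shlex.join quotes each argument, so file names with spaces survive.
    """
    return " && ".join(shlex.join(command) for command in commands)


def ocr_one_liner(scan, lang="eng"):
    """Build a single shell line: OCR a scanned PDF, then dump its text."""
    base = scan[:-4] if scan.lower().endswith(".pdf") else scan
    searchable = f"{base}_ocr.pdf"
    text = f"{base}.txt"
    return chain(
        ["ocrmypdf", "-l", lang, scan, searchable],  # add a text layer
        ["pdftotext", "-layout", searchable, text],  # extract plain text
    )
```

The resulting text file is the natural place for the final refinement in Vim.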

May 2, 2022

Text documents: from PDF to vector images

Subject of this article

Recently I needed to convert some PDF documents, containing text generated by LaTeX under GNU/Linux, into vector images.

Leaving aside online conversion services, I found three interesting solutions: two command-line tools (pdf2svg and pdftocairo) and one very well-known graphical application (Inkscape).

In this article I report my evaluations, highlighting some differences that derive from the source of the PDF documents and from the behaviour of three Linux distributions.
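For the two command-line tools, a minimal sketch of the invocations for converting a single page to SVG might look like this. The function names and file names are hypothetical; the argument order for pdf2svg and the `-svg`, `-f`, and `-l` options of pdftocairo follow the tools' usual usage. The commands are only built, not executed.

```python
def pdf2svg_command(src, dst, page=1):
    """Build a pdf2svg command converting one page of src to an SVG file."""
    return ["pdf2svg", src, dst, str(page)]


def pdftocairo_command(src, dst, page=1):
    """Build a pdftocairo command converting one page of src to SVG.

    -f and -l select the first and last page to convert; setting both
    to the same value restricts the output to a single page.
    """
    return ["pdftocairo", "-svg", "-f", str(page), "-l", str(page), src, dst]
```

Running both commands on the same input and opening the two SVG files side by side is a quick way to compare how each tool handles the fonts embedded by LaTeX.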