Below you will find pages that utilize the taxonomy term “PDF”
A Python program to copy text from various PDFs and collect it into a single document in Markdown language.
1. Subject of this article.
The goal is to write a simple program that collects the text contained in several PDFs generated directly from word-processing programs and inserts the fragments into a single Markdown document, separating them with second-level headings that correspond to the names of the source documents.
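A minimal sketch of that idea follows. It is not the article's own program: the third-party `pypdf` library, the function names and the `collected.md` output name are all assumptions made for illustration.

```python
from pathlib import Path


def extract_text(pdf_path):
    """Return the text layer of a digitally generated PDF."""
    # Assumption: pypdf is installed; the summary does not name a library.
    from pypdf import PdfReader

    reader = PdfReader(str(pdf_path))
    # extract_text() can yield nothing for pages without a text layer.
    return "\n".join((page.extract_text() or "") for page in reader.pages)


def build_markdown(fragments):
    """Join (source name, text) pairs, one second-level heading per source."""
    sections = []
    for name, text in fragments:
        sections.append(f"## {name}\n\n{text.strip()}\n")
    return "\n".join(sections)


def collect_pdfs(folder, output="collected.md"):
    """Hypothetical driver: merge every PDF in `folder` into one Markdown file."""
    pdfs = sorted(Path(folder).glob("*.pdf"))
    # The file name without its extension is used as the heading text.
    fragments = [(p.stem, extract_text(p)) for p in pdfs]
    Path(output).write_text(build_markdown(fragments), encoding="utf-8")
```

Keeping `build_markdown` separate from the extraction step means the heading layout can be checked without any PDF at hand.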
Reducing the size of single or multiple PDF documents in GNU/Linux Bash and Python
Abstract: Compressing PDF documents is a useful way to reduce the space these files occupy and to make them easier to transmit and store. In this article, starting from a page devoted to compressing single PDFs, I present two methods for compressing multiple PDF documents. The reference page is “Linux shell script to reduce PDF file size” (simple verification required to enter), which lets you operate on single PDFs with command-line Bash code in the GNU/Linux terminal. Building on it, I tried to extend the procedure to operate on multiple PDFs. At the end I present a simple Python application with a graphical interface. I admit that I asked ChatGPT and Copilot for some help.
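As a rough illustration of the batch step, the sketch below loops over a folder and calls Ghostscript once per file. Ghostscript is an assumption here (the abstract does not name the tool behind the referenced script), and the `_small` suffix and function names are hypothetical.

```python
import subprocess
from pathlib import Path


def gs_command(src, dst, quality="ebook"):
    """Build a Ghostscript command line that rewrites `src` as a smaller PDF."""
    return [
        "gs",
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.4",
        f"-dPDFSETTINGS=/{quality}",  # screen, ebook, printer or prepress
        "-dNOPAUSE",
        "-dQUIET",
        "-dBATCH",
        f"-sOutputFile={dst}",
        str(src),
    ]


def compress_folder(folder, suffix="_small"):
    """Compress every PDF in `folder`, writing e.g. report_small.pdf beside it."""
    for src in sorted(Path(folder).glob("*.pdf")):
        if src.stem.endswith(suffix):
            continue  # skip files produced by a previous run
        dst = src.with_name(f"{src.stem}{suffix}.pdf")
        # check=True raises CalledProcessError if Ghostscript fails.
        subprocess.run(gs_command(src, dst), check=True)
```

Writing the result to a new file rather than overwriting the original makes it easy to compare sizes and keep the source if the compressed copy is not acceptable.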
Automatic generation of hyperlinks in LaTeX environment, using Vim's Regular Expressions, between PDF documents.
Vim is an editor with endless capabilities. Thanks to its built-in Regular Expressions, it can also generate LaTeX hyperlinks to other locally stored documents. For lawyers, this means linking a legal document to its related evidentiary materials. This article analyzes the procedure.
- 1. Subject of this article.
- 2. Main document configuration.
- 3. RegEx formula for automatic link generation.
- 4. Explanation of the RegEx formula.
- 5. Management of “underline character”.
- 6. Links within the text.
1. Subject of this article.
Sometimes a main PDF document needs to include a list of other documents, each of which can be opened through its own dedicated hyperlink.
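The article builds these links with a Vim substitution, which is not reproduced in this summary. The sketch below swaps in Python's `re` module to show the same kind of transformation; the `\href{run:...}` form from the hyperref package, the `itemize` wrapper and the function names are assumptions.

```python
import re


def latex_link(filename):
    """Turn a PDF file name into one LaTeX list item linking to that file."""
    label = re.sub(r"\.pdf$", "", filename)  # visible text: name without extension
    label = label.replace("_", r"\_")  # underscores must be escaped in LaTeX text
    return rf"\item \href{{run:./{filename}}}{{{label}}}"


def latex_list(filenames):
    """Wrap one link per file in an itemize environment."""
    lines = [r"\begin{itemize}"]
    lines += [latex_link(name) for name in filenames]
    lines.append(r"\end{itemize}")
    return "\n".join(lines)
```

Only the visible label is escaped; the link target keeps the file name unchanged so that it still matches the file on disk.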
Powerful OCR system under GNU/Linux for PDF documents managed from command line and with refinement by Vim.
Introduction.
The idea came from reading this article about optical character recognition (OCR) of images and PDFs in the GNU/Linux environment, managed from the command line.
Obviously, the PDF documents in question are those scanned from paper originals, i.e., not obtained by saving a document directly in digital format. For the latter, no OCR is needed.
The article is very well written and the end result is very good.
I wondered if it would be possible to aggregate all the steps into a single text command.
Text documents: from PDF to vector images
Subject of this article
Recently I needed to convert some PDF documents, containing text generated by LaTeX on GNU/Linux operating systems, into vector images.
Setting aside online conversion services, I found three interesting solutions: two in command-line mode (pdf2svg and pdftocairo) and one, very famous, in graphical mode (Inkscape).
In this article I report my evaluations highlighting some differences deriving from the source of the PDF documents and the behaviour of three Linux distributions.
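For comparison purposes, the two command-line tools can be driven from a short Python helper. This is only a sketch: the output file names and function names are hypothetical, and a single page is converted per call so that each SVG holds exactly one page.

```python
import subprocess
from pathlib import Path


def svg_commands(pdf, page=1):
    """Return one command line per tool for converting a single PDF page to SVG."""
    stem = Path(pdf).stem
    page = str(page)
    return {
        # pdf2svg takes: input, output, optional page number
        "pdf2svg": ["pdf2svg", str(pdf), f"{stem}_pdf2svg.svg", page],
        # pdftocairo selects the page range with -f (first) and -l (last)
        "pdftocairo": [
            "pdftocairo", "-svg", "-f", page, "-l", page,
            str(pdf), f"{stem}_pdftocairo.svg",
        ],
    }


def convert_with_both(pdf, page=1):
    """Run both converters so the resulting SVG files can be compared."""
    for command in svg_commands(pdf, page).values():
        subprocess.run(command, check=True)
```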