This paper was originally a presentation in French delivered at the PDF Day – France by Loïc Carrère, CEO of ORPALIS, in April 2019.
Organized by the PDF Association, the PDF Days are the meeting place of the PDF industry, where experts give educational (non-commercial) presentations, panels, and discussion-based sessions about the format.
The richness of PDF offers many opportunities to reduce the weight of existing documents.
Organizations need to meet more and more legal requirements for archiving and data retention and often adopt a strategy to reduce the amount of storage used by their existing documents.
The PDF Optimization in-depth series
This series of articles will address the issues and constraints of such an approach, as well as various optimization methods that can be applied.
We will try to describe as many optimization techniques as possible, with or without loss of data, which can be adapted to one’s expectations. We will illustrate them with case studies covering documents of different kinds (documents with vector content and documents containing only images).
Throughout, we will focus on the following question: can compression allow data loss? If so, to what extent?
Choosing lossless or lossy compression
We will introduce several methods of compression, some without loss of data, others with degradation. For this second category, it will be necessary to decide in advance whether the loss of data is tolerable and, if so, to what extent.
When reducing the file size of PDF documents, images are the first logical candidates for compression. The reason is simple: an image can be compressed into an approximate representation of the original data without losing its meaning.
Did you know that a 50% compression setting applied to a single image can decrease the file size of that image by as much as 90%?
Lossy compression often reduces the file size drastically, but at the expense of an irreversible loss of information. Some of the removed data is redundant. Some of it is not, but its absence is mostly unnoticeable to users in the result. There is no way back once you have applied lossy compression.
If you intend to perform further processing on the image, do not select lossy compression.
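The difference between lossless and lossy compression described above can be illustrated with a minimal sketch. It is not how a PDF optimizer actually recompresses images. It assumes a synthetic 8-bit grayscale image, uses Python's standard `zlib` module (Deflate, the algorithm behind PDF's FlateDecode filter) for the lossless step, and stands in for a real lossy codec such as JPEG with a crude quantization step. The helper names `make_gradient` and `quantize` are hypothetical and exist only for this illustration.

```python
import zlib


def make_gradient(width: int, height: int) -> bytes:
    """Build a synthetic 8-bit grayscale image (assumption: a smooth
    horizontal gradient with a small deterministic ripple on top)."""
    pixels = bytearray()
    for y in range(height):
        for x in range(width):
            base = (x * 255) // (width - 1)
            ripple = (x * 31 + y * 17) % 7
            pixels.append(min(255, base + ripple))
    return bytes(pixels)


def quantize(pixels: bytes, step: int) -> bytes:
    """Lossy step: round every sample down to a multiple of `step`.
    The discarded low-order detail cannot be recovered afterwards."""
    return bytes((p // step) * step for p in pixels)


original = make_gradient(256, 256)

# Lossless: compress, decompress, and get back exactly the same bytes.
lossless = zlib.compress(original, 9)
restored = zlib.decompress(lossless)

# Lossy (simulated): discard detail first, then compress what is left.
degraded = quantize(original, 16)
lossy = zlib.compress(degraded, 9)
restored_lossy = zlib.decompress(lossy)

# Largest per-sample difference between the original and the lossy result.
max_error = max(abs(a - b) for a, b in zip(original, restored_lossy))

print("raw size:           ", len(original))
print("lossless size:      ", len(lossless))
print("lossy size:         ", len(lossy))
print("lossless round trip:", restored == original)
print("lossy round trip:   ", restored_lossy == original)
print("max sample error:   ", max_error)
```

In this sketch the lossless round trip reproduces the original bytes exactly, while the quantized version compresses to a smaller size but can no longer be turned back into the original. A real lossy codec chooses what to discard far more carefully, so the loss is much less visible for a comparable saving.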