@rl_dane the surprising steps are the lossy ones ;-)
* the (lossy) downsampling to 1bpp and (lossy) thresholding are what let the "lossless" run-length encoding (or whatnot) compress at such a high ratio
* the OCR step likely wasn't lossless either: every very-slightly-unique splotch on the page that looked _close enough_ to a prototypical `a`/`b`/`c`/… probably got replaced with a shared version of said ~letter instead
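
A toy sketch of both points, in case it helps. This is not whatever the actual tool does: the function names, the cutoff of 128, the `max_diff` tolerance, and the 2x2 "glyphs" are all made up for illustration.

```python
from itertools import groupby

def threshold(gray, cutoff=128):
    """Lossy: collapse 8-bit grayscale pixels to 1 bit each (dark -> 1)."""
    return [[1 if px < cutoff else 0 for px in row] for row in gray]

def rle_encode(bits):
    """Lossless: store each row as (bit value, run length) pairs."""
    return [[(bit, len(list(run))) for bit, run in groupby(row)] for row in bits]

def rle_decode(runs):
    """Exact inverse of rle_encode."""
    return [[bit for bit, n in row for _ in range(n)] for row in runs]

def hamming(a, b):
    """Number of differing pixels between two same-sized bitmaps."""
    return sum(x != y for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def substitute(glyph, prototypes, max_diff=1):
    """Lossy: swap a glyph for its nearest prototype if it is close enough."""
    proto = min(prototypes.values(), key=lambda p: hamming(glyph, p))
    return proto if hamming(glyph, proto) <= max_diff else glyph

# Hypothetical 2x5 grayscale scan: paper noise disappears at the threshold.
gray = [[250, 240, 30, 20, 245],
        [255, 120, 10, 200, 251]]
bits = threshold(gray)      # the original gray values are gone for good
runs = rle_encode(bits)     # long runs of 0s/1s are why this packs so well
restored = rle_decode(runs) # gets back bits exactly, but never gray

# Hypothetical prototype "a" and a splotch that differs from it by one pixel.
prototypes = {"a": [[1, 1], [1, 0]]}
splotch = [[1, 1], [1, 1]]
shared = substitute(splotch, prototypes)
```

Here `restored == bits` holds, so the run-length stage really is lossless, but only relative to the already-thresholded image. `shared` comes back as the prototype rather than the original splotch, which is the quiet lossy bit.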