@rl_dane the surprising steps are the lossy ones ;-)
* the (lossy) downsampling to 1bpp and (lossy) thresholding enabled "lossless" run-length encoding or whatnot to compress at such a high ratio
* the OCR step likely also wasn't lossless — for every very-slightly-unique splotch on the page with a visual pattern _close enough_ to a prototypical `a`/`b`/`c`/… (visually) it probably got replaced with a shared version of said ~letter instead
To your first point, you're absolutely right. Thresholding yields far more than 8:1 compression because PNG crunches bilevel graphics far better than grayscale.
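Here's a rough sketch of why the win is bigger than the 8:1 you'd get from bit depth alone. It's not what any particular tool does: the fake "scan" from `make_scan` is invented for illustration, and `zlib` stands in for PNG's DEFLATE stage (real PNG also applies per-row filters first). The helper names and the cutoff of 128 are all assumptions.

```python
import random
import zlib


def make_scan(width=256, height=256, seed=0):
    """Hypothetical 8-bit grayscale 'scan': dark stripes on a light page, plus sensor noise."""
    rng = random.Random(seed)
    rows = []
    for y in range(height):
        row = bytearray()
        for x in range(width):
            ink = (y // 8) % 2 == 0 and (x // 4) % 3 == 0  # made-up "text" pattern
            base = 40 if ink else 220
            row.append(base + rng.randint(-20, 20))  # noise stays within 0..255
        rows.append(bytes(row))
    return rows


def threshold_and_pack(rows, cutoff=128):
    """Lossy step: reduce each pixel to 1 bit, then pack 8 pixels per byte.

    Assumes the row width is a multiple of 8.
    """
    packed = bytearray()
    for row in rows:
        for i in range(0, len(row), 8):
            byte = 0
            for px in row[i:i + 8]:
                byte = (byte << 1) | (1 if px >= cutoff else 0)
            packed.append(byte)
    return bytes(packed)


def compare_sizes():
    """Return raw and DEFLATE-compressed sizes for the grayscale and bilevel versions."""
    rows = make_scan()
    gray = b"".join(rows)
    bilevel = threshold_and_pack(rows)
    return {
        "gray_raw": len(gray),
        "bilevel_raw": len(bilevel),
        "gray_deflated": len(zlib.compress(gray, 9)),
        "bilevel_deflated": len(zlib.compress(bilevel, 9)),
    }


if __name__ == "__main__":
    for name, size in compare_sizes().items():
        print(f"{name}: {size} bytes")
```

The raw sizes differ by exactly 8:1, but the compressed sizes differ by much more: thresholding throws away the noise, and the noise is the part DEFLATE can't crunch.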
To your second point, you're describing the #JBIG2 lossy compressor for scanned documents and monochrome images, and yeah, that's super cursed. I'd be surprised if that's what ocrmypdf is doing, but it's possible? ¯\_(ツ)_/¯
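For anyone wondering what "replaced with a shared version" looks like, here's a toy sketch of the symbol-substitution idea. It's purely illustrative: real encoders use far more careful matching, and `substitute_symbols`, the Hamming-distance test, and the `max_diff` threshold are all made up for this example.

```python
def hamming(a, b):
    """Count the pixel positions where two equal-sized glyph bitmaps differ."""
    return sum(x != y for x, y in zip(a, b))


def substitute_symbols(glyphs, max_diff=2):
    """Toy lossy symbol coding: reuse an earlier glyph when a new one is 'close enough'.

    Returns (prototypes, indices), where indices[i] names the prototype
    that would be drawn in place of glyphs[i].
    """
    prototypes = []
    indices = []
    for glyph in glyphs:
        for i, proto in enumerate(prototypes):
            if len(proto) == len(glyph) and hamming(proto, glyph) <= max_diff:
                indices.append(i)  # lossy: the original pixels are discarded
                break
        else:
            prototypes.append(glyph)
            indices.append(len(prototypes) - 1)
    return prototypes, indices


# Two 3x5 glyphs that differ by a single pixel.
SIX = "###" "#.." "###" "#.#" "###"
EIGHT = "###" "#.#" "###" "#.#" "###"

if __name__ == "__main__":
    print(substitute_symbols([SIX, EIGHT, SIX], max_diff=2))  # lossy matching
    print(substitute_symbols([SIX, EIGHT, SIX], max_diff=0))  # exact matching only
```

With the loose threshold the `8` gets drawn as a `6`, since they're only one pixel apart at this resolution. That failure mode is what makes the lossy variant cursed; with `max_diff=0` only identical glyphs are shared and nothing is lost.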