• pwalker@discuss.tchncs.de
    2 days ago

    Some years ago I heard about a German guy who found a mind-boggling bug in Xerox scanners, and the whole story of how they tried to play it down is really insane. Definitely worth watching, unfortunately only with German audio: https://youtu.be/7FeqF1-Z1g0

    • GamingChairModel@lemmy.world
      2 days ago

      He’s written up his findings in English, for anyone who prefers English over German or text over video.

      But basically the JBIG2 image compression algorithm used in those scanners looked for certain repeating patterns, and incorrectly compressed certain portions of the image into “close enough” blocks of pixels. Unfortunately, that meant that scanned number data wasn’t guaranteed to be accurate, even when the decoded output clearly looked like a number with no distortion or noise.

      It’s worth the full read.

      • Kay Ohtie@pawb.social
        2 days ago

        You left out what I feel is the best part: even in the “uncompressed” mode, with the lossy compression supposedly disabled, it was still happening sometimes.

        • GamingChairModel@lemmy.world
          21 hours ago

          To be precise, the “lossless” compression is still a compression algorithm. They just didn’t implement the steps that actually make the compression algorithm lossless.

          From the write up:

          JBIG2, the image format used in the affected PDFs, usually has lossless and lossy operation modes. “Pattern Matching & Substitution” (PM&S) is one of the standard operation modes for lossy JBIG2, and “Soft Pattern Matching” (SPM) for lossless JBIG2 (Read here or read the paper by Paul Howard et al.). In the JBIG2 standard, the named techniques are called “Symbol Matching”.

          PM&S is lossy, SPM lossless. Both operation modes have the basics in common: images are cut into small segments, which are grouped by similarity. For every group, only a representative segment is saved; it gets reused in place of the other group members, which may cause character substitution. Unlike PM&S, SPM corrects such errors by additionally saving difference images containing the differences of the reused symbols in comparison to the original image. This correction step seems to have been left out by Xerox.
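
To make the difference described in that quote concrete, here is a toy Python sketch. It is not the real JBIG2 coder and not Xerox's implementation: the 3x5 glyph bitmaps, the `threshold` parameter, and the function names are all made up for illustration, and a plain XOR image stands in for the "difference images" the write-up mentions.

```python
from typing import List, Optional, Tuple

Glyph = Tuple[Tuple[int, ...], ...]  # small binary bitmap, rows of 0/1 pixels


def glyph(*rows: str) -> Glyph:
    """Build a bitmap from strings like '101' (hypothetical helper)."""
    return tuple(tuple(int(c) for c in row) for row in rows)


def hamming(a: Glyph, b: Glyph) -> int:
    """Number of pixels in which two same-sized glyphs differ."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def xor(a: Glyph, b: Glyph) -> Glyph:
    """Pixel-wise difference image."""
    return tuple(tuple(pa ^ pb for pa, pb in zip(ra, rb)) for ra, rb in zip(a, b))


def encode(glyphs: List[Glyph], threshold: int, lossless: bool):
    """Group similar glyphs; store one representative per group.

    lossless=False mimics pattern matching & substitution (no correction).
    lossless=True additionally stores a difference image per glyph.
    """
    dictionary: List[Glyph] = []
    refs: List[Tuple[int, Optional[Glyph]]] = []
    for g in glyphs:
        # Reuse the first stored representative that is "close enough".
        idx = next((i for i, rep in enumerate(dictionary)
                    if hamming(g, rep) <= threshold), None)
        if idx is None:
            dictionary.append(g)
            idx = len(dictionary) - 1
        residual = xor(g, dictionary[idx]) if lossless else None
        refs.append((idx, residual))
    return dictionary, refs


def decode(dictionary: List[Glyph], refs) -> List[Glyph]:
    """Rebuild glyphs; apply the difference image when one was stored."""
    return [dictionary[i] if r is None else xor(dictionary[i], r)
            for i, r in refs]


# A "6" and an "8" that differ in exactly one pixel.
SIX = glyph("111", "100", "111", "101", "111")
EIGHT = glyph("111", "101", "111", "101", "111")

lossy = decode(*encode([SIX, EIGHT], threshold=1, lossless=False))
exact = decode(*encode([SIX, EIGHT], threshold=1, lossless=True))
```

With these toy glyphs, `lossy` comes back as two clean-looking sixes (the 8 was silently replaced), while `exact` restores the original 6 and 8. Skipping the correction step leaves you with the first behaviour even if the mode is labelled lossless.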

          • Kay Ohtie@pawb.social
            20 hours ago

            TIL! Thank you for the added detail. I hadn’t read the full write-up, but I had watched his presentation in English, and it was wild to hear.