The real story here is the highlighting of the flawed attempt to redact a document.
Happens a lot.
I used to work at a company that requested a lot of documents from municipal governments. We found random people's SSNs on more than one occasion.
The only foolproof way I trust is to redact it digitally, print it on paper, and scan it.
I’ve used bad redaction to my advantage at work to make money, so I’m all for other people using bad redaction techniques :)
Print single-sided. Otherwise there's a risk that nominally invisible bleed through from the other side can be enhanced. It's better to just convert a PDF to images directly and redact that.
That is the EXACT process I automated the non-redacting parts of using cron jobs and task folders at a past job.
Flatten everything to a set of just images.
Have normal human staff draw black boxes over anything to be redacted.
Compose a new 'PDF' that's a set of 'scanned' images.
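For what it's worth, here is a minimal sketch of the middle step only (the black box), assuming the page has already been rasterized into rows of RGB tuples. `redact_box` and the tiny test page are made up for illustration; the flatten and recompose steps depend on whatever PDF tooling you have and are not shown.

```python
# Hypothetical helper, not the original workflow's code: a page is a list of
# rows, each row a list of (r, g, b) tuples, as you might hold it after
# rasterizing a PDF page.
BLACK = (0, 0, 0)

def redact_box(pixels, left, top, right, bottom):
    """Return a copy of the page with [left, right) x [top, bottom) set to black."""
    out = [list(row) for row in pixels]
    for y in range(max(top, 0), min(bottom, len(out))):
        row = out[y]
        for x in range(max(left, 0), min(right, len(row))):
            # Overwrite rather than blend, so nothing of the old value survives.
            row[x] = BLACK
    return out

# 4x3 white page with one dark "text" pixel at x=1, y=1.
page = [[(255, 255, 255)] * 4 for _ in range(3)]
page[1][1] = (10, 10, 10)

# Cover x in 0..2 on row 1.
redacted = redact_box(page, 0, 1, 3, 2)
```

The point of overwriting instead of compositing is that every pixel inside the box ends up byte-identical, whatever was underneath.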
I wonder if using certain kinds of inks could cause slight differences in reflectivity over the redacted text, leaving artifacts that could be used to reconstruct the text in scanned documents? Seems like applying strips of opaque tape over the redacted text might be the most certain method, though maybe overkill after all.
This was sort-of the winning solution to an underhanded C contest to redact an image. Hazily remembered, but the winner used a trick where already black pixels got redacted to one color black and already white pixels got to an ever so slightly different black. Reversing the image would then make it trivial to read the original black-on-white text.
I remember that one: the two blacks were not slightly different, they were both exactly black but written in different ways.
The image was in PPM format, which stores the color components of the pixels as ASCII text (so a white pixel is stored as "255 255 255" and a black one is "0 0 0"). To redact the image, the code replaced every digit of the numbers with '0', so white became "000 000 000" and black stayed as "0 0 0". Both are black and indistinguishable if you're viewing the image, but you can tell them apart by looking at the file text.
Sadly the UCC homepage seems to have vanished, but I found this account from the author: http://notanumber.net/archives/54/underhanded-c-the-leaky-re...
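To make the leak concrete, here is a small Python illustration of the behaviour described above. It is not the contest entry (that was C), and it assumes the string holds only the ASCII sample values, not the PPM header. The function names are invented for this sketch.

```python
def leaky_redact(samples):
    """Replace every digit with '0', keeping each token's length intact."""
    return "".join("0" if ch.isdigit() else ch for ch in samples)

def recover_mask(redacted):
    """True where the original sample had more than one digit (value >= 10)."""
    return [len(token) > 1 for token in redacted.split()]

def safe_redact(samples):
    """Write one canonical '0' per sample so token length carries no information."""
    return " ".join("0" for _ in samples.split())

# One white pixel, one black pixel, one white pixel.
samples = "255 255 255 0 0 0 255 255 255"

leaked = leaky_redact(samples)      # every token parses as 0, so it renders black
mask = recover_mask(leaked)         # but the token lengths still give the image away
clean = safe_redact(samples)
```

Any viewer parses "000" and "0" as the same value, so the image looks fully black, while the token lengths in the file still separate bright samples from dark ones.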
Not 100% sure offhand, but I _think_ the final step in my process chain (repack everything into a PDF) would have converted the input image formats and thus defeated that type of input. As it was, they were effectively 'redacted' using MS Paint to clobber over the rasterized data, so I was more concerned with minimizing the file size.
Ah right. Had to be sneaky enough to escape being outright flagged. A not-quite-black would have failed the test.
You don't need to be that paranoid. Converting to a raster image format is sufficient.
What if you just block out text in PDF, then Print to PDF - does that retain the text behind the black block?
If it does, then Export to PNG almost certainly removes it (while also removing all other selectable text)
That sounds pretty foolproof so long as your black box fill method doesn't fill with a 99% opacity, or a flood fill leaves behind a few invisible anti-aliased pixels, or the merge operation of the black box doesn't result in some multiplication leaving a few bits of difference. Even if you erased the layer below, then filled above, I've had erasures vary in the bits outside the alpha channel messing up games using the texture info.
Overall, I kind of understand the paranoia even though in principle it does sound pretty foolproof.
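A rough sketch of the 99% opacity worry, assuming plain "over" compositing on a single 8-bit channel with ordinary rounding (real editors may round or blend differently). `blend` and `stretch` are illustrative names, not any tool's API.

```python
def blend(src, dst, alpha):
    """Composite src over dst at the given opacity, for one 8-bit channel."""
    return round(src * alpha + dst * (1 - alpha))

def stretch(value, lo, hi):
    """Linear contrast stretch mapping [lo, hi] onto [0, 255]."""
    return round((value - lo) * 255 / (hi - lo))

PAPER, INK = 255, 0

# A black box at 99% opacity leaves a small residue of what was underneath.
box_over_paper = blend(0, PAPER, 0.99)
box_over_ink = blend(0, INK, 0.99)

# Stretching the contrast inside the box brings the text back.
recovered_paper = stretch(box_over_paper, box_over_ink, box_over_paper)
recovered_ink = stretch(box_over_ink, box_over_ink, box_over_paper)

# At full opacity both cases collapse to the same value and nothing is left.
opaque_paper = blend(0, PAPER, 1.0)
opaque_ink = blend(0, INK, 1.0)
```

Under these assumptions the box reads 3 over paper and 0 over ink, which is invisible on screen but stretches back out to full white-on-black.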
Wouldn’t all those fears apply to printing as well?
The alpha channel ones would not apply to printing, and overall printing is an extremely lossy operation, where all those minute details get washed out in approximate ink levels and the muddiness of the physical world. It might not be totally foolproof, especially for a very accurate print process (don't use your photo printer maybe), but it's probably many orders of magnitude noisier.
I think if you're really concerned, you'd print it once, apply physical black tape on it (or cut out with a razor), then scan that :)
Printing would presumably have enough imprecision to mask those.
>What if you just block out text in PDF, then Print to PDF - does that retain the text behind the black block?
Possibly depending on the application you use to print and the printer driver. Acrobat has some unexpected behaviours when printing.
> I’ve used bad redaction to my advantage at work to make money
You've certainly piqued my curiosity. Can you say any more?
I sell construction work. Sometimes my customers will have me price up something that someone else priced to them and they will send me a competitor’s redacted scope letter with the pricing blanked out so I can bid ‘apples to apples’ aka the same scope of work.
I’ve unredacted proposals using the ‘unflatten’ command in Bluebeam Revu (which is by far the best PDF editor) which allowed me to underbid my competitor and win the job (and at a higher price than I would’ve submitted).
Definitely an ethical grey area, but an edge is an edge ;)
I really don't think this is grey. I think these cases have clear legal implications, though I'm not a lawyer. You are circumventing redaction, and regardless of how boneheaded the redaction was, the intent was clear.
I'd not do this if I were you.
The information was in the document they sent me; they should’ve removed it completely if they didn’t want me to see it. The situation is identical to them mailing me a paper copy with a black piece of paper scotch taped over the price.
There are zero legal implications; it was a private contract. My customers regularly tell me the exact price that my competitors have submitted to them and that isn’t illegal.
There are probably legal implications for attorneys circumventing redaction in legal documents, but construction proposal letters have no protections against unredaction.
Morally gray, sure.
Legally, I can't see what's wrong with using information that you have, even if the other party didn't intend for you to have it. Lawyers themselves will use information in court that was accidentally sent to them by a counter-party, and that the other lawyer never intended them to have.
It may be technically an issue with some government bids, if you need to file an affidavit certifying you had no such knowledge.
But how would they prove it? And, doing so would reveal that they fucked up in the first place by sending it to you.
I would be really surprised if there was a law against this, and even if there was, who really cares? As long as you don't make it super obvious (like consistently bidding 1p under the competition), nobody will know.
There's no way there's a legal case that can be made against him imo
Probably trading of some sort?
I should go sell to an intelligence agency a malicious PDF editor that covertly shares the plain text version any time someone uses the block out tool.
There are billions of PDF files out there, but the ones being redacted are the most valuable of the lot.
Did they mean to redact though? The slide makes it seem like they are hidden because they are being adjusted, not because they are secret.