> Years ago it would've required a supercomputer and a PhD to do this stuff
This isn't actually true. You could do this 20 years ago on a consumer laptop, and you don't need the information you get for free from text moving under a filter either.
What you need is the ability to reproduce the conditions under which the image was generated and pixelated/blurred. If a pixelated block only encompasses, say, 4 characters, then you only need to search over those 4 characters first, and then you can proceed to the next few characters represented under the next pixelated block.
You can think of pixelation as a bad hash which is very easy to find a preimage for.
No motion necessary. No AI necessary. No machine learning necessary.
The hard part is recreating the environment, though, and AI just means you can skip that effort and know-how.
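To make the "bad hash" framing concrete, here's a toy sketch in Python. It swaps real glyph rendering for character codes, so every detail is illustrative, but the shape of the attack (search each block independently) is the same:

```python
from itertools import product
import string

BLOCK = 2  # characters per "mosaic block" (assumed for the toy)

def pixelate(text):
    # the "bad hash": one rounded average per block of characters
    return [round(sum(map(ord, text[i:i + BLOCK])) / BLOCK)
            for i in range(0, len(text), BLOCK)]

def crack(hashed, alphabet=string.ascii_lowercase):
    # search each block independently -- the cost grows per block, not globally
    recovered = ""
    for target in hashed:
        for combo in product(alphabet, repeat=BLOCK):
            if round(sum(map(ord, combo)) / BLOCK) == target:
                recovered += "".join(combo)
                break
    return recovered

print(crack(pixelate("secret")))  # finds *a* preimage; collisions are expected
```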
In fact, there was a famous de-censoring that succeeded because the censoring was a simple "whirlpool" algorithm that was very easy to unwind.
If media companies want to actually censor something, nothing does better than a simple black box.
> nothing does better than a simple black box
You still need to be aware of the context that you're censoring in. Just adding black boxes over text in a PDF will hide the text on the screen, but might still allow the text to be extracted from the file.
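For instance, a quick sketch with pdfminer.six (redacted.pdf is a hypothetical file whose black boxes were drawn over live text):

```python
# pdfminer.six pulls the text layer regardless of what's drawn on top of it.
from pdfminer.high_level import extract_text

print(extract_text("redacted.pdf"))  # the "hidden" text often comes right out
```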
Indeed. And famously, using black boxes over individual words in a non-monospaced font is also susceptible to a dictionary attack based on the widths of the black boxes in the image.
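A sketch of that width attack with Pillow; the font, size, and measured width are all assumptions you'd have to recover from the document:

```python
from PIL import ImageFont

font = ImageFont.truetype("DejaVuSans.ttf", 12)  # assumed document font/size
box_width_px = 61.0                               # measured from the image
tolerance = 0.5                                   # rendering slop

words = ["attack", "payroll", "hamster", "winter"]  # your dictionary
matches = [w for w in words
           if abs(font.getlength(w) - box_width_px) < tolerance]
print(matches)  # candidate words whose rendered width fits the box
```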
And even taking a sharpie and drawing a black box doesn't mean the words can't be seen at a certain angle, or recovered by removing the sharpie ink but not the printed ink.
Really, if you need to censor something, create a duplicate without the original content. Preferably with the originals literally absent, as even the size of the black box is an information leak.
No need for the non-monospaced requirement - it reduces the search space, but the attack works even without that reduction.
The additional leakage provided by non-monospace is rather large. With monospace all you know is the character count.
Curious: does anyone know if the specific censoring tool in the macOS viewer has this problem? I had assumed not, because they warn you when using the draw-shapes tool that text below it can be recovered later, and they don't give that warning for the censoring tool.
Ah yes, Mr. Swirl Face.
This was pretty different though. The decensoring algorithm I'm describing is just a linear search. But pixelation is not an invertible transformation.
Mr. Swirl Face just applied a swirl to his face, which is invertible (-ish, with some data lost), and could naively be reversed. (I am pretty sure someone on 4chan did it before the authorities did, but this might just be an Internet Legend).
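scikit-image even ships a swirl transform, which makes it easy to see why this was recoverable (a sketch on a stock image; the real swirl parameters had to be estimated):

```python
from skimage import data
from skimage.transform import swirl

face = data.astronaut()
censored = swirl(face, strength=10, radius=120)        # the "censoring"
recovered = swirl(censored, strength=-10, radius=120)  # approximate inverse
# recovery is only (-ish): interpolation loses a little data on each pass
```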
A long while ago, taking an image (typically porn), scrambling a portion of it, and having others try to figure out how to undo the scrambling was a game played on various chans.
Christopher Paul Neil is a real person who went to jail.
Shhhhh!!!!!
Yeah, but if you read about him, it serves as a rallying cry for right-wing types, since he's an example of the Canadian legal system's extreme leniency. This guy should be in prison forever, and he's been free since 2017. Look at his record of sentencing. I love being a bleeding-heart liberal/progressive and all, but this is too far.
Furthermore, don't look too hard at Israel and its policy of being very, very open to pedophiles and similar types.
A completely opaque shape or emoji does it. A simple black box overlay is not recommended unless that's the look you're going for. Also, slightly transparent overlays come in all shapes and colors, and it's hard to tell by eye whether a given overlay is fully opaque, so either way you need to verify that it's 100% opaque.
Noob here, can you elaborate on this? If you take, for example, a square of 25px and change the value of each individual pixel to the average color of the group, most of the data is lost, no? If the group of pixels is big enough, can you still undo it?
Yeah most of the information is lost, but if you know the font used for the text (as is the case with a screencast of a macOS window), then you can try every possible combination of characters, render it, and then apply the averaging, and see which ones produce the same averaged color that you see. In addition, in practice not every set of characters is equally likely --- it's much more likely for a folder to be called "documents" than "MljDQRBO4Gg". So that further narrows down the amount of trying you have to do. You are right, of course, that the bigger the group of pixels, the harder it gets: exponentially so.
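A minimal sketch of that loop with Pillow. The font, canvas, and block size are assumptions standing in for the real environment you'd have to reproduce:

```python
from itertools import product
import string
from PIL import Image, ImageDraw, ImageFont

FONT = ImageFont.truetype("DejaVuSans.ttf", 16)  # assumed: the target's font
BLOCK = 8                                         # assumed: mosaic block size

def render(text):
    img = Image.new("L", (64, 24), 255)  # assumed canvas; white background
    ImageDraw.Draw(img).text((0, 0), text, font=FONT, fill=0)
    return img

def pixelate(img):
    small = img.resize((img.width // BLOCK, img.height // BLOCK),
                       Image.Resampling.BOX)  # box filter = per-block average
    return list(small.getdata())

def crack(target_blocks, alphabet=string.ascii_lowercase, length=4):
    best, best_err = None, float("inf")
    for combo in product(alphabet, repeat=length):  # brute force; a sketch
        guess = "".join(combo)
        err = sum((a - b) ** 2
                  for a, b in zip(pixelate(render(guess)), target_blocks))
        if err < best_err:
            best, best_err = guess, err
    return best  # closest match, not necessarily unique (collisions!)

# target_blocks would be read off the pixelated region of the screenshot:
print(crack(pixelate(render("demo"))))
```

Weighting candidates by likelihood (the "documents" vs. "MljDQRBO4Gg" point) just means changing the order in which `crack` enumerates guesses.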
Yeah, a screencast of a macOS window is probably one of the best-case scenarios for this.
As a toy example, if you have pixels a, b, c and you blur to (a+b)/2, (b+c)/2, you can recover c - a (twice the difference of the two outputs). You might be able to get a good estimate of the boundary condition ("a"), e.g. you can just use (a+b)/2 as an approximation for "a", so the recovered result might look fairly close.
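In numpy, with that boundary estimate (everything here is the toy 2-tap blur, not a real kernel):

```python
import numpy as np

x = np.array([10.0, 50.0, 30.0, 90.0, 20.0])  # original "pixels"
y = (x[:-1] + x[1:]) / 2                       # the toy blur

x_hat = [y[0]]                        # boundary estimate: use (a+b)/2 for "a"
for yi in y:
    x_hat.append(2 * yi - x_hat[-1])  # from y_i = (x_i + x_{i+1}) / 2
print(np.round(x_hat, 1))  # close to x, up to an alternating boundary error
```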
You'll basically want to look up the area of deconvolution. You can interpret it in linear algebra terms as trying to invert an ill-conditioned matrix, or in signal processing terms as trying to multiply by the inverse of the PSF. In real-world cases the main challenge is doing so without blowing up any error that comes from quantization noise (or other types of noise).
See https://bartwronski.com/2022/05/26/removing-blur-from-images... and https://yuzhikov.com/articles/BlurredImagesRestoration1.htm
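scikit-image ships a regularized Wiener deconvolution that handles exactly that noise-amplification problem. A sketch on a stock image (real cases need the true PSF, which you'd have to estimate):

```python
import numpy as np
from scipy.signal import convolve2d
from skimage import data, restoration

img = data.camera() / 255.0
psf = np.ones((5, 5)) / 25                   # assumed: a 5x5 box-blur PSF
blurred = convolve2d(img, psf, mode="same")
blurred += np.random.normal(scale=0.5 / 255, size=blurred.shape)  # ~quantization noise

deblurred = restoration.wiener(blurred, psf, balance=0.01)  # regularized inverse
```

The `balance` parameter trades sharpness against amplifying that noise, which is the ill-conditioning problem in practice.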
Depends how much data you have. If a 25x25 square is blurred to a single pixel, it takes more discrete information to de-blur than if it were a 2x2 square. So you'd need a longer video with more going on, but you can still get there.
How many pixels are in `I` vs. how many are in `W`? Different averages as a result. With subpixel rendering, kerning, etc., there are minute differences between the averages of `IW` and `WI` as well, so order can be observed. It is almost completely based on having so much extra knowledge: the background of the text, the color of the text, the text renderer, the font, etc. There is a massive number of unknowns if it is a random picture, but with all this extra knowledge, it massively cuts down on the number of things we need to try: if it's 4 characters, we can enumerate all possibilities, apply the same mosaic, and find the closest match in nearly no time.
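You can check the per-block version of this directly (Pillow sketch; the font path and sizes are assumptions):

```python
from PIL import Image, ImageDraw, ImageFont

font = ImageFont.truetype("DejaVuSans.ttf", 24)  # assumed font

def block_means(text):
    img = Image.new("L", (48, 32), 255)
    ImageDraw.Draw(img).text((0, 0), text, font=font, fill=0)
    # two horizontal "mosaic blocks": their averages reveal glyph order
    return list(img.resize((2, 1), Image.Resampling.BOX).getdata())

print(block_means("IW"), block_means("WI"))  # same characters, different blocks
```

Whether the single overall average also separates `IW` from `WI` depends on the text engine's kerning, which is exactly the environment knowledge being described.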
It's not that you're utterly wrong; some transformations are irreversible, or close to. Multiplying each pixel's value by 0, assuming the result is exactly 0, is a particularly clear example.
But others are reversible because the information is not lost.
The details vary per transformation, and sometimes reversibility depends on the transformation having been imperfectly implemented. Other times it's just that data is moved around and scaled by some reversible multiplicative factor. And so on.
TLDR: Most of the data is indeed "lost". If the group of pixels is big enough, this method alone becomes infeasible.
More details:
The larger the group of pixels, the more characters you'd have to guess, and so the longer this would take. Each additional character makes it combinatorially more difficult.
To make matters worse, by the pigeonhole principle, you are guaranteed to have collisions (i.e. two different sets of characters which pixelate to the same value). E.g. a space of just 6 characters, even if limited to a-zA-Z0-9, is 62^6 = 56,800,235,584 combinations, while you can expect at most ~2048 color values for them to map to.
(Side note: that's 2048 colors, not 256, between #000000 and #FFFFFF. This is because the pixelation / mosaic algorithm rounds each RGB channel independently, so there are eight possible values between, say, #000000 and #010101: #000000, #000001, #000100, #010000, #010001, #010100, #000101, and #010101.
Realistically, in scenarios where you wouldn't have pixel-perfect reproduction, you'd need to generate all the combos and sort by closest to the target color, possibly also weighted by a prior on the content of the text. This is even worse, since you might have too many combinations to store.)
So, at 25 pixel blocks, encompassing many characters, you're going to have to get more creative with this. (Remember, just 6 alphanumeric characters = 56 billion combinations.)
Thinking about this as "finding the preimage of a hash", you might take a page from the password cracking toolset and assume priors on the data. (I.e. Start with blocks of text that are more likely, rather than random strings or starting from 'aaaaaa' and counting up.)
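E.g. a candidate generator that tries a wordlist before falling back to exhaustive search (the wordlist itself is a made-up prior):

```python
from itertools import chain, islice, product
import string

wordlist = ["secret", "backup", "photos"]  # assumed prior: likely 6-char names

def candidates(length, alphabet=string.ascii_lowercase):
    likely = (w for w in wordlist if len(w) == length)
    brute = ("".join(c) for c in product(alphabet, repeat=length))
    return chain(likely, brute)  # high-prior guesses first, then everything

print(list(islice(candidates(6), 5)))
# ['secret', 'backup', 'photos', 'aaaaaa', 'aaaaab']
```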
This gets exponentially harder with a bigger blur radius, though.
Something like https://github.com/Jaded-Encoding-Thaumaturgy/vapoursynth-de..., you mean?