Holy mackerel! Image analogies
October 30, 2001 10:26 PM
Holy mackerel! Image analogies are an NYU-developed technique for "teaching" the computer an image filter. Their software can do things like fill in the blank in the analogy (photo of a swan):(pastel rendering of a swan)::(photo of a landscape):________. I'm not doing it justice. Their site has some compelling examples of what they can do. Gee-whiz factor of 8.5!
sylloge, I wish there were some progress in the learn-by-analogy stuff, especially in real time for repeated user actions.
posted by cps at 11:36 PM on October 30, 2001
Holy shi-ot! This is VERY cool... I kept thinking 'hoax' but it's right there on NYU's site... Wow... Adobe's Macromedia's gonna be pissed!
posted by fooljay at 11:38 PM on October 30, 2001
> Gee-whiz factor of 8.5!
And a Cheez Whiz factor of at least 7.0.
posted by pracowity at 11:48 PM on October 30, 2001
Hrm. This was on Slashdot a couple of months ago. I think one of the coolest things was the 'texture enlargement' feature. Imagine 10 or 20 years from now, when computers are fast and powerful enough to do this on the fly to old PC and PlayStation games.
posted by delmoi at 12:37 AM on October 31, 2001
This project is brilliant! I hope the applications, though, go beyond the normal "We can make money off of this by making plugins" spiel. That Potomac example is out of this world... this is a 9.5!
posted by nakedjon at 12:38 AM on October 31, 2001
In case anyone is wondering and can't be bothered to read the paper, the technique is similar to fractal compression - the analogy is effectively the compression process.
At least, that's my take on it after giving the details a quick scan. If anyone has a more informed summary, please correct me.
posted by andrew cooke at 2:44 AM on October 31, 2001
One step for mankind toward making the stupid lasso tool obsolete.
posted by elle at 3:46 AM on October 31, 2001
You know, this sort of thing spotlights some interesting copyright issues.
So you come up with a proggy that can look at un-filtered picture A and filtered picture B, and the program itself can mimic the process used by the original filter. Which, of course, renders the original filter useless, except for the original's likely superior computational efficiency - which is becoming less valuable as CPU speeds ramp up.
It is not hard to imagine building a look-n-learn system that could, for instance, "digest" Microsoft Word, and formulate a program identical to it in every way. Yet, the resultant code would not be derivative. There would be no reverse engineering going on. Legal? Interesting.
posted by Opus Dark at 3:54 AM on October 31, 2001
That tool works by looking at how short range structure is modified between the two trial images (in particular, it looks at rearranging the pixels in A to give B).
There's no way to extend that to do what you want for code. First, it does nothing on large scales and second, Word isn't a modified version of something else (a closer analogy would be to argue that by comparing Word 2001 and Word 2002 and applying the same changes to another program, you'd get the same bugfixes/enhancements).
The argument that "something may be possible" like you describe still stands - I'm just saying that what you see here isn't anything like what you're thinking of. It's neat, but not as smart as it looks.
posted by andrew cooke at 3:59 AM on October 31, 2001
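The short-range matching described in the comment above can be sketched in a few lines. This is only a rough illustration under simplifying assumptions, not the NYU software: images are plain lists of grayscale rows, the search is brute force, and the function names (`neighborhood`, `make_analogy`) are made up for this example. The published method is more elaborate (it also matches against the output synthesized so far and works across multiple scales), none of which is attempted here.

```python
def neighborhood(img, y, x, radius=1):
    """Return the (2*radius+1)^2 values around (y, x), clamping at the borders."""
    h, w = len(img), len(img[0])
    return [
        img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
        for dy in range(-radius, radius + 1)
        for dx in range(-radius, radius + 1)
    ]


def distance(p, q):
    """Sum of squared differences between two neighborhoods."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def make_analogy(a, a_prime, b, radius=1):
    """Fill in the blank in a : a_prime :: b : ?

    For each pixel of b, find the most similar neighborhood in a and
    copy the pixel at that same position from a_prime.
    """
    # Pair every neighborhood of the unfiltered example with its filtered pixel.
    examples = [
        (neighborhood(a, y, x, radius), a_prime[y][x])
        for y in range(len(a))
        for x in range(len(a[0]))
    ]
    result = []
    for y in range(len(b)):
        row = []
        for x in range(len(b[0])):
            target = neighborhood(b, y, x, radius)
            _, value = min(examples, key=lambda e: distance(e[0], target))
            row.append(value)
        result.append(row)
    return result


# Hypothetical example pair: the "filter" inverts brightness.
a = [[0, 50], [100, 200]]
a_prime = [[255, 205], [155, 55]]
b = [[200, 0], [50, 100]]

# With radius=0 each neighborhood is a single pixel, so the sketch
# reduces to a per-value lookup learned from the example pair.
b_prime = make_analogy(a, a_prime, b, radius=0)
```

Everything the sketch knows comes from comparing small windows, which is why it has nothing to say about large-scale structure.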
> It is not hard to imagine building a look-n-learn system that could, for instance, "digest" Microsoft Word, and formulate a program identical to it in every way.
It's an interesting idea, though I couldn't say how feasible. The thing is, when you pass the program an image, a single pass over the image yields all the information the program is going to get about the image, i.e. the image has only one state. If the "software analogy" synthesizer were to work based on observing Word's behavior, it would be (prohibitively?) difficult to show the synthesizer enough states of Word for it to produce an act-alike. You could, in theory, run over the code, but what would the analogy be? And would the results be precise enough to be usable as machine code?
I think it would be interesting to see this applied to image compression: the Potomac example showed that the system is capable of producing convincing aerial photo images from images which look to me to be much more friendly to existing compression algorithms. Maybe there are images for which it would be more efficient to transmit one of those "texture-by-numbers" images with an annotation saying, "psst! this is an aerial photo!"
posted by MonkeyMeat at 4:16 AM on October 31, 2001
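A quick way to see why a "texture-by-numbers" map might be friendlier to existing compressors than the photo it stands for is to run both through a general-purpose compressor. This is a toy comparison under invented assumptions: the 64x64 size, the two regions, and the random noise standing in for photographic detail are all made up, and `zlib` is used only as a convenient stand-in for "existing compression algorithms".

```python
import random
import zlib

random.seed(0)  # fixed seed so the sketch is repeatable
WIDTH, HEIGHT = 64, 64

# Hypothetical label map: left half labelled 0 (say, water), right half 1 (land).
label_map = bytes(
    0 if x < WIDTH // 2 else 1
    for _ in range(HEIGHT)
    for x in range(WIDTH)
)

# Stand-in for the textured photo: same two regions, but every pixel
# carries noisy detail on top of the region's base brightness.
photo = bytes(
    (40 if x < WIDTH // 2 else 160) + random.randrange(64)
    for _ in range(HEIGHT)
    for x in range(WIDTH)
)

label_size = len(zlib.compress(label_map))
photo_size = len(zlib.compress(photo))

print(f"label map: {len(label_map)} bytes raw, {label_size} compressed")
print(f"photo:     {len(photo)} bytes raw, {photo_size} compressed")
```

The catch is that the receiver would also need the example pair to turn the labels back into texture, so the saving only matters if that pair is shared or reused across many images.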
I thought “fake,” too, until I saw all of the math in the paper. Not that I understand it, but if you can fake the math, who needs to fake the images...
One caveat (from the PDF paper):
“The finished painting consists of 11 glazes, using a total of 2750 iterations of the simulator, rendered at a resolution of 640 by 480 pixels in 7 hours on a 133 MHz SGI R4600 processor.”
7 Hours! Yow! Not quite the interactive application I might hope for...
posted by jpburns at 4:58 AM on October 31, 2001
My favorite part (boosting the score to 9.0 at least):
the system can do what we always scoff at in movies
How many times do I laugh out loud in late 'high tech thrillers' where there are still images or grainy video (from a surveillance camera, say) that the young, hiply-dressed computer hacker can zoom in on and enhance so that we can see the killer / terrorist / keystrokes / eye-color?
Answer: a lot.
Well, scoff no more.
posted by zpousman at 5:27 AM on October 31, 2001
(On scoffing at movies) The method shown here cannot be any better than the "best" image restoration algorithm which (for some obscure mathematical definition of "best") is already known - Maximum Entropy (top left example).
Is it obvious I'm supposed to be testing software today? :-/
posted by andrew cooke at 6:20 AM on October 31, 2001
Actually, andrew, it could be better in one sense. This image enhancement is not better _and_ accurate. You train the engine with a trainer image and it uses that data to enhance another image.
As a result it badly distorts the second image with data coming from the trainer. It can be better in that it can enhance an image so that it _looks_ tons better than a straight-ahead enhancer could ever make it.
But the caveat is that it could make your blurry family picture look like a razor sharp photo of people made of wood in a forest scene or a spy photo from space.
posted by n9 at 6:44 AM on October 31, 2001
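The point in the comment above, that the "enhanced" detail comes from the trainer rather than from the picture being enhanced, shows up even in a one-dimensional toy. This is a sketch under invented assumptions: the signals, the box blur, and the names `blur` and `enhance_by_example` are all hypothetical, and it is not the method from the paper.

```python
def window(signal, i, radius=1):
    """Samples around index i, clamping at the ends."""
    n = len(signal)
    return [signal[min(max(i + d, 0), n - 1)] for d in range(-radius, radius + 1)]


def blur(signal):
    """Simple 3-tap box blur, used here only to manufacture a training pair."""
    return [sum(window(signal, i)) / 3 for i in range(len(signal))]


def enhance_by_example(blurry_train, sharp_train, blurry_input, radius=1):
    """For each input sample, find the closest blurry training window
    and emit the sharp training sample that goes with it."""
    examples = [
        (window(blurry_train, i, radius), sharp_train[i])
        for i in range(len(blurry_train))
    ]
    result = []
    for i in range(len(blurry_input)):
        target = window(blurry_input, i, radius)
        best = min(
            examples,
            key=lambda e: sum((a - b) ** 2 for a, b in zip(e[0], target)),
        )
        result.append(best[1])
    return result


# Hypothetical trainer: hard-edged stripes and their blurred version.
sharp_train = [0, 0, 9, 9, 0, 0, 9, 9]
blurry_train = blur(sharp_train)

# The signal to "enhance" is a smooth ramp with no stripes in it at all.
ramp = [1, 2, 3, 4, 5, 6, 7, 8]
enhanced = enhance_by_example(blurry_train, sharp_train, ramp)

# Every output sample was copied from the sharp trainer.
print(enhanced, set(enhanced) <= set(sharp_train))
```

Whatever the result looks like, each value in it was lifted from the trainer, so it can look crisp without being faithful to the original ramp.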
This blows my goddamn mind.
posted by untuckedshirts at 7:29 AM on October 31, 2001
it's only a matter of time before this succumbs to will's law of the lowest common denominator:
"technology becomes mainstream when somebody finds an application for it in the creation or dissemination of pr0n"
posted by willconsult4food at 9:01 AM on October 31, 2001
Very cool indeed -- now, if only we could do this for formatting documents (I wasted a few hours today in a cycle of cut-paste-apply style-redo formatting lost in the style change-repeat).
posted by sylloge at 10:46 PM on October 30, 2001