"a picture of a person walking through the sky", and other errors
February 3, 2015 2:09 PM

INTERESTING.JPG is an AI trying its hardest to describe the contents of random news photos. Sometimes it does quite well. Sometimes it thinks ice is sheep. See also: Novice Art Blogger. See also, if you're daring: the super duper completely not-safe-for-work porn-analysis robot @NSFW_JPG. Via mefi's own cmyr on Projects.
posted by cortex (40 comments total) 48 users marked this as a favorite

"a motorcycle that is carrying a small amount of bicycles on the back of it" is now my favorite thing.
posted by Foosnark at 2:24 PM on February 3, 2015 [6 favorites]

posted by blue_beetle at 2:27 PM on February 3, 2015 [2 favorites]

MetaFilter: a group of people are trying to have their uniforms fight
posted by delfin at 2:32 PM on February 3, 2015 [2 favorites]


(picture is of a sack of laundry forgotten in the freezing rain)
posted by boo_radley at 2:32 PM on February 3, 2015 [11 favorites]

"a young girl is brushing her teeth"
posted by asterix at 2:43 PM on February 3, 2015 [10 favorites]

These are strangely poetic and unexpectedly lovely.
posted by treepour at 3:08 PM on February 3, 2015 [1 favorite]

This is so much better for being inaccurate.
posted by JHarris at 3:30 PM on February 3, 2015

(That said, if it turns out to be another horse_ebooks-style hoax, yeah, I will express Google Reader cancellation levels of disgust.)
posted by JHarris at 3:31 PM on February 3, 2015 [3 favorites]

I imagine something like this happens when someone is trying to relate a fantastic personal vision in everyday language.
posted by King Sky Prawn at 3:32 PM on February 3, 2015 [2 favorites]

a bouquet of flowers that are on a wall .

posted by Room 641-A at 3:37 PM on February 3, 2015

this NSFW one is hilariously accurate (if somewhat understated).
posted by mrjohnmuller at 3:41 PM on February 3, 2015 [14 favorites]

I've occasionally wondered if we will create AI and have it turn out all Jersey Shore.
posted by srboisvert at 3:45 PM on February 3, 2015 [1 favorite]

Now imagine these are descriptions offered by a toddler with a surprisingly large vocabulary.

(Ironically, I can't think of an adjective for that. Vocabulous?)
posted by lucidium at 3:47 PM on February 3, 2015 [1 favorite]

(That said, if it turns out to be another horse_ebooks-style hoax, yeah, I will express Google Reader cancellation levels of disgust.)

I took two of the images and fed them back into the AI service used, and it gave me the same text as the Twitter posts. If this is fake, this conspiracy goes all the way to the top!
posted by ymgve at 4:04 PM on February 3, 2015 [7 favorites]

It's never a pair of scissors, you dumb robot!
posted by codacorolla at 4:07 PM on February 3, 2015 [2 favorites]

dare you to click on: three construction workers are working on one another
posted by ennui.bz at 4:17 PM on February 3, 2015 [3 favorites]

Yeah, there's no real reason to think it's anything other than as advertised: a visual analysis tool trained on a large set of data and doing a pretty good job at some basics while still being way, way out of its depth compared to the complexity of the human mind. A hoax would be more consistently funny or more consistent about its weirdness.

We talk about it a little bit on the most recent mefi podcast; I was really charmed by the whole thing because I think it's a great example of both accomplishing a lot in terms of computer learning and of how easy it is for such a thing to fall down on a difficult task.

What I love about the NSFW one is it's being thrown at a very specific visual context for which it has presumably received zero training. The Reuters photographs at least have a pretty decent chance of having some of the trained elements appearing on a regular basis, but unless you're going to specifically dump a whole bunch of context-specific training into it re: people doing porny stuff, it'd never even have a chance to guess what's going on.
posted by cortex at 4:20 PM on February 3, 2015 [5 favorites]

this NSFW one is hilariously accurate (if somewhat understated).

I mean, it's a hot picture, and the caption is brilliant on its own, and somehow the two together make something sublime. [this is good] etc.
posted by feckless fecal fear mongering at 4:30 PM on February 3, 2015 [1 favorite]

"a man sitting at a game holding a box of food ."

Also-rans:
  • a man is sitting at a table while eating a hot dog .
  • a man sits at a table with a large piece of food on it .
  • a man eating food from a box at a table .
  • a man holding a plate of food at a desk .
  • a man sitting at a table with a couple of sandwiches .

posted by jjwiseman at 4:30 PM on February 3, 2015 [1 favorite]

What I love about the NSFW one is it's being thrown at a very specific visual context for which it has presumably received zero training.

this seems consistent with the toddler hypothesis from upthread
posted by indubitable at 4:34 PM on February 3, 2015

Yeah but I really want to see one trained only on porn and then turned loose on AP photos, if only so that I can see a robot describe a bunch of politicians as sucking cocks.
posted by agentofselection at 4:37 PM on February 3, 2015 [10 favorites]

That image classifier is awesome, just tried feeding it various images from imgur. While it's way off the mark on some images, some leave me pretty impressed.

many sausages or hot dogs cooking on a grill - nailed it

two adorable dogs side on opposite sides of a pot filled with beautiful flowers - only one dog, but it identified the flower pot

the cat is taking a nap beside of the black dog - it managed to realize there is both a dog and cat in the picture
posted by ymgve at 4:41 PM on February 3, 2015 [1 favorite]

And now I'm slightly disappointed - the "fancy" sentences describing the pictures don't seem to be computer generated, they are picked from (I assume) human-generated descriptions of the image in the AI's data bank which it deems to be the "closest" picture.

Like this one
posted by ymgve at 5:13 PM on February 3, 2015 [1 favorite]
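A minimal sketch of the retrieval behaviour ymgve describes, where nothing is generated and the system simply hands back the human-written caption attached to the "closest" stored picture. It assumes each image has already been reduced to a feature vector and that "closest" means highest cosine similarity; the function names, toy vectors, and caption bank are hypothetical and are not taken from the actual service.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_caption(query, bank):
    """Return the stored caption whose image features best match query.

    bank is a list of (feature_vector, caption) pairs (hypothetical data).
    The winning caption is returned verbatim; no new sentence is built.
    """
    _, best_caption = max(bank, key=lambda item: cosine(query, item[0]))
    return best_caption


# Made-up 3-dimensional "features"; a real system would use
# high-dimensional vectors produced by an image network.
bank = [
    ([0.9, 0.1, 0.0], "many sausages or hot dogs cooking on a grill"),
    ([0.1, 0.8, 0.3], "the cat is taking a nap beside of the black dog"),
    ([0.0, 0.2, 0.9], "a bouquet of flowers that are on a wall ."),
]

print(nearest_caption([0.2, 0.7, 0.4], bank))
```

Under this assumption every output would be a sentence some human once wrote, which is why jjwiseman and advil's generative explanation below predicts something different: novel combinations that never appeared verbatim in the training captions.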

I think if it were trying to generate the sentences from scratch, we would see more grammatical failure.

So this means somewhere out there is a photo of "a motorcycle that is carrying a small amount of bicycles on the back of it" that was used in the training set.
posted by RobotHero at 5:34 PM on February 3, 2015

I'm guessing the generator can generalize from examples and mix & match, so it's truly generative in terms of captions, not just spitting back the closest existing caption.
posted by jjwiseman at 5:36 PM on February 3, 2015

So this means somewhere out there is a photo of "a motorcycle that is carrying a small amount of bicycles on the back of it" that was used in the training set.

I suspect the approach is similar to the one in this thread; the results are certainly along those lines, and this is something I think a bunch of groups have been working on lately. If this is right, there is a network that is purely language (representing something like word co-occurrence statistics), tied (making one big network) to a network that is trained on lots of images annotated with natural language descriptions. My understanding is that these models roughly learn how parts of images correspond to parts of the annotated descriptions, and then use their language network to generate descriptions when presented with novel images. There probably were images in the training set with one or more bicycles and images with one or more motorcycles, but no image combining the two would be needed to generate this description.

This is unlikely to be a fake, as technology like this definitely exists (the separate deep learning technologies for vision and language have each been around for a little while, and there seems to have been a bit of a race recently to figure out how to tie them together). Sadly the Stanford group still doesn't have a freeform demo page, so we can't yet race the two.
posted by advil at 6:01 PM on February 3, 2015 [2 favorites]

Yeah, I don't think Bill and Elon need to worry for a while.

(Unless of course the AI is purposefully putting out 75% garbage in order to lure us all into a false sense of security. That's always possible.)
posted by Tell Me No Lies at 6:24 PM on February 3, 2015

I want to see this describe what pictures of the erowid recruiting Twitter feed look like.

Ok, so that doesn't make any sense, but it would be awesome.
posted by oceanjesse at 6:30 PM on February 3, 2015 [1 favorite]

a young girl is brushing her teeth
a man holding a dog with his teeth
a young girl is brushing her teeth
a man wearing a black shirt with his teeth
a young girl is holding her teeth
a young girl is about to brush her teeth
a person taking a picture of a bite of his teeth

When I upload a picture of teeth to the AI service, it doesn't even use the word teeth. Curious.
posted by jessamyn at 6:31 PM on February 3, 2015

That's funny, I uploaded a picture of someone brushing their teeth and it came back with "bukkake, definitely bukkake".
posted by dr_dank at 6:46 PM on February 3, 2015 [6 favorites]

The results for a selection of images of "a young girl is brushing her teeth": 72% Nipple, 61% lipstick, 53% Band-Aid, 32% Diaper, 16% lollipop (sfw)
posted by metaphorever at 7:04 PM on February 3, 2015

three construction workers working on one another - actually, if you squint, the fireball does resemble three people wearing orange doing things to one another.
posted by univac at 8:33 PM on February 3, 2015

Of the random ones I looked through, it seemed to do a lot better on the porn than the non-porn. Maybe because there's less variation in the images?
posted by lollusc at 10:03 PM on February 3, 2015

two adorable dogs side on opposite sides of a pot filled with beautiful flowers - only one dog, but it identified the flower pot

"Curiously enough, the only thing that went through the mind of the bowl of petunias as it fell was 'Oh no, not again.'"
posted by dry white toast at 10:52 PM on February 3, 2015 [1 favorite]

Can someone feed it some paintings please? I would but I am on a phone...
posted by thegirlwiththehat at 1:22 AM on February 4, 2015 [1 favorite]

It's not images, but you might check out CMU's Nell - https://twitter.com/cmunell. It's publicly trainable. It's trying to build a conceptual framework on ideas and language.
posted by Samizdata at 9:16 AM on February 4, 2015 [1 favorite]

If anyone is interested, here's the paper describing the caption-generating system this is using: Multimodal Neural Language Models. It's very recent work.
posted by cmyr at 10:06 AM on February 4, 2015 [5 favorites]

Okay, more sophisticated than I first thought. And some of the posts do seem grammatically fine but I can't imagine the photograph that would make that description make sense, so I should have granted more flexibility to that.

"a group of men in a country being led by a building ."
posted by RobotHero at 2:01 PM on February 4, 2015

"a group of men in a country being led by a building ."

Probably someone leading a group of men by/past a building. The running man in front sort of resembles a drum major, for example.
posted by taz at 1:20 AM on February 5, 2015

Oh I assumed it meant the building was leading them.
posted by RobotHero at 8:33 AM on February 5, 2015

