Netflix Contest Over Due to Privacy Concerns
March 13, 2010 7:24 AM

Netflix has ended the $1 million Next Big Thing contest, which would have rewarded a team to improve their recommendation engine.

It was a sequel to an earlier contest to improve their recommendation engine by 10%. The contest was shut down as part of a settlement of a lawsuit over privacy concerns, led by a closeted lesbian mother.
posted by mccarty.tim (58 comments total) 15 users marked this as a favorite
 
So they just shut down the whole prom?
posted by East Manitoba Regional Junior Kabaddi Champion '94 at 7:34 AM on March 13, 2010 [44 favorites]


As a Netflix customer, I'm pretty disappointed. I thought the contest was one of the cooler things they did. This is why we can't be referred to nice things.
posted by mccarty.tim at 7:44 AM on March 13, 2010 [15 favorites]


This was a lot more info than the first contest, which had no demographic data:
The data set of more than 100 million entries will include information about renters’ ages, gender, ZIP codes, genre ratings and previously chosen movies.
If Netflix is so into this idea, I don't see why they don't change their privacy policy to allow it.
posted by smackfu at 7:56 AM on March 13, 2010


It seems to me that they could've just redacted the ZIP codes and everything would've been fine. Those shouldn't play a role in movie suggestion algorithms anyway.
posted by The Winsome Parker Lewis at 8:01 AM on March 13, 2010


Honestly, how can rental history out anyone? I used to have a gay roommate and we split netflix, so I still get recommended tons of movies geared towards gay people. I have watched plenty of movies that fall into this category (I'm a big fan of Todd Haynes, it's inexplicable) but I am not gay, and If anyone thinks otherwise I could certainly care less or convince them otherwise (if any really really hot ladies need me to prove to them that I am straight, memail me, just let's not tell my girlfriend, okay?). But this is stupid. Shit, I've watched Weeds, The Union, and Smiley Face recently on Netflix, and I have Evil Bong in my queue, but I have never smoked pot in my life. Great now future employers with too much time on their hands might "find out" that i am a huge pothead. Oh noes.

I mean, unless she was getting lesbian porn (and they don't offer hardcore porn), I don't see what the deal is. And are people even really that concerned with what other people have rented?
posted by djduckie at 8:02 AM on March 13, 2010


Ask my username what I think of protecting your identity online.
posted by mccarty.tim at 8:03 AM on March 13, 2010


The data set of more than 100 million entries will include information about renters’ ages, gender, ZIP codes, genre ratings and previously chosen movies.

Couldn't they have encoded these things? Like writing all the zips backwards or something, so they are still in the same groupings but irreversible (obviously backwards is not the best way to accomplish that). Give code numbers for the movies without identifying what they are, etc.
posted by DU at 8:03 AM on March 13, 2010 [2 favorites]


Their privacy policy protects it, but this law has the last word.
posted by Benjy at 8:04 AM on March 13, 2010


Great now future employers with too much time on their hands might "find out" that i am a huge pothead. Oh noes.

If you don't think employers make idiotic-but-life-destroying decisions, you haven't been in the workforce long.
posted by DU at 8:05 AM on March 13, 2010 [3 favorites]


I've been following this story. It's a shame, the first Netflix contest had really interesting results. It reminds me a bit of the AOL search log release, a great example of how even anonymized data in aggregate can end up being very personal. See also: Identifying John Doe and Total Information Awareness.

Privacy through obscurity really is over. As a culture we lack the understanding of this new reality. Companies like Netflix (or Google or your ISP) have a special responsibility to protect data, even anonymized data. But ultimately that's going to be a losing battle.
posted by Nelson at 8:06 AM on March 13, 2010


But ultimately that's going to be a losing battle.

This fatalism is only warranted if Huge Corps are allowed to remain in existence with the political power they enjoy now.

"Bringing corporations under control" is the solution to so many of the world's current problems it really should be in the top 3, maybe top 1, of the list.
posted by DU at 8:09 AM on March 13, 2010 [5 favorites]


This lawsuit was nothing but a money generator for the lawyers. It's a site that offers a service to its customers; being mail order, it doesn't have a bias. Netflix caved due to cost, which will be passed on to me, its user. F' You.
posted by Mblue at 8:09 AM on March 13, 2010 [1 favorite]


Honestly, how can rental history out anyone?

Logic tells us that someone who rents a lot of lesbian-themed dramas and comedies and chewed through all of The L Word over a couple of months could easily be straight. But whoever said that homophobia involved logic? "Rents a ton of gay stuff" is definitely meaningful evidence that someone is gay, and depending on the circumstances can absolutely qualify as outing.
posted by Tomorrowful at 8:12 AM on March 13, 2010


I guess I can see where ZIP code might be useful for providing suggestions for certain types of (average?) movie-watchers. People in the rural midwest might be more inclined to watch Paul Blart: Mall Cop than people in, say, the Bay area. But speaking as someone who would rather watch movies based on my individual preferences than regional trends — I like some weird, unpopular, and foreign stuff — I would prefer that they didn't take my location into account at all.
posted by The Winsome Parker Lewis at 8:13 AM on March 13, 2010 [2 favorites]


I don't want anyone to know I watched Revenge of the Fallen
posted by Damienmce at 8:19 AM on March 13, 2010 [2 favorites]


The Winsome Parker Lewis: "It seems to me that they could've just redacted the ZIP codes and everything would've been fine. Those shouldn't play a role in movie suggestion algorithms anyway."

At the bottom of my suggested movies there is a Local Favorites for Brooklyn, New York section. I assume that it's pretty zipcode specific. Most of the suggestions are for Russian/Yiddish language films. I live in a predominantly Orthodox Jewish area that isn't that large.
posted by Splunge at 8:24 AM on March 13, 2010


based on my individual preferences than regional trends

Yeah, but your individual preferences are shaped, at least to a certain extent, by the people you interact with and talk to, who still tend to be people you know in real life. Possibly because you're more willing to trust the opinions of someone you know rather than some shut-in on Metafilter who has nothing better to do on a Saturday morning than argue with your fair point.
posted by yerfatma at 8:25 AM on March 13, 2010


Damn. I was thinking about entering that contest too! It was just a drawing, right?

Once again, lesbians keeping me from being a millionaire.
posted by cjorgensen at 8:25 AM on March 13, 2010 [1 favorite]


This fatalism is only warranted if Huge Corps are allowed to remain in existence with the political power they enjoy now.

No, I don't think you can blame corporations or governments here. Data that does not threaten privacy in isolation will threaten privacy in aggregation. It's a fundamental law of information, just like "information wants to be free" is fundamentally true. Trying to regulate against companies aggregating data, publishing databases, or analyzing aggregate data will help short term. But long term it's a losing battle just as much as requiring DRM for music and video is a losing battle.

(And to head off the inevitable criticism: "information wants to be free" is not a moral statement that information should be free. It's just an observation that it's impossible to lock information up, to prevent data from escaping whatever restricted container someone tries to bottle it up in. That reality is why I believe it's impossible to prevent data aggregation. And once data is aggregated, privacy ends.)
posted by Nelson at 8:34 AM on March 13, 2010 [2 favorites]


whole point about these competitions is that these equations actually have to work in real life to win.

Wrong and right. The whole point is to suggest movies you'll like and want to rent, thus increasing rentals and profit, which is the business. Users freely give feedback, including their sexual orientation, so you know when they have a bias. I'm retired military, when I give a war film bad reviews, it doesn't mean anything , it just means I didn't like the film, but you know I have a point of view.
posted by Mblue at 8:39 AM on March 13, 2010


but I am not gay, and If anyone thinks otherwise I could certainly care less

In a lot of states, you could lose your job, your housing, custody of your child. You don't have a reason to care, but a lot of actual gay people do.
posted by rtha at 8:47 AM on March 13, 2010 [2 favorites]


Comparing the first contest dataset to other databases was enough to identify many users. A birthdate, gender, & zip code together are enough to uniquely identify 87% of the US population, & Netflix was going to just list age instead of birthdate - which isn't enough.
posted by Pronoiac at 9:04 AM on March 13, 2010
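
A rough sketch of the uniqueness claim above, for anyone who wants to poke at it. The records are made up and the function name `unique_fraction` is hypothetical; it only counts how many rows are the sole holder of a given combination of fields, and says nothing about the real Netflix data or the 87% figure.

```python
from collections import Counter

def unique_fraction(records, keys):
    """Fraction of records whose combination of `keys` appears exactly once."""
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    unique = sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)
    return unique / len(records)

# Toy, invented records (not real people, not real data).
people = [
    {"zip": "00001", "gender": "F", "age": 33, "birthdate": "1976-05-02"},
    {"zip": "00001", "gender": "F", "age": 33, "birthdate": "1976-11-19"},
    {"zip": "00001", "gender": "M", "age": 33, "birthdate": "1976-05-02"},
    {"zip": "00002", "gender": "F", "age": 41, "birthdate": "1968-01-30"},
]

# Coarser field (age) versus finer field (birthdate).
by_age = unique_fraction(people, ["zip", "gender", "age"])
by_birthdate = unique_fraction(people, ["zip", "gender", "birthdate"])
```

In this toy set, swapping birthdate for age makes fewer rows unique, but some rows are still one of a kind.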


I am very gay supportive and try to keep track of stuff, but I guess I was unaware that it was that severe in some places. I didn't mean to sound unsupportive. I mean I do care about what affects gay people, I just don't see a connection between the two things in this instance. (And then I would argue that the person making this complaint would only have a case if they were in any sort of danger of the above mentioned scenarios.)
posted by djduckie at 9:07 AM on March 13, 2010


Dammit- and my team, SETEC ASTRONOMY, was making such great progress...

I do wonder sometimes if being completely exposed for who we are might ultimately be a good thing. Yes, on the one hand I can recite the value of privacy, etc... but then I wonder, "Do I have any secrets? Why?" But so long as total information sharing is voluntary and incomplete- so long as we are not all fully telepathic- privacy remains important. Having said that, it's still a weak lawsuit (as people have said, the data could have been scrubbed in a way to alter/encode/randomize what little might have made things personally identifiable), and this woman staying closeted is a mistake for her personally and as a member of society.
posted by hincandenza at 9:27 AM on March 13, 2010


Honestly, how can rental history out anyone? I used to have a gay roommate and we split netflix

Let's see, how many 33-year-old women in zip code XXXXX have Netflix and rent 90% lesbian porn?

This contest was a stupid, stupid idea. It would have been easy to uniquely identify people with this data, just like the AOL search data leak.

This lawsuit was nothing but a money generator for the lawyers. It's a site that offers a service to its customers; being mail order, it doesn't have a bias. Netflix caved due to cost, which will be passed on to me, its user. F' You.

Wtf are you talking about? You can't just run around breaking the law (someone pointed out the Video Privacy Protection Act).

Secondly, the lawsuits are obviously going to cost less than the $1 million prize. Netflix should allow people to opt into this data dump.

At the bottom of my suggested movies there is a Local Favorites for Brooklyn, New York section. I assume that it's pretty zipcode specific. Most of the suggestions are for Russian/Yiddish language films. I live in a predominantly Orthodox Jewish area that isn't that large.

Are they broken down by age and gender?
posted by delmoi at 9:36 AM on March 13, 2010 [2 favorites]


At the bottom of my suggested movies there is a Local Favorites for Brooklyn, New York section. I assume that it's pretty zipcode specific. Most of the suggestions are for Russian/Yiddish language films. I live in a predominantly Orthodox Jewish area that isn't that large.

The "Local Favorites for Berkeley, California" list in my account is pretty hilariously typical of Berkeley: foreign films, indie dramas and political documentaries.

I'm retired military, when I give a war film bad reviews, it doesn't mean anything , it just means I didn't like the film, but you know I have a point of view.

This reminds me: the "voted most helpful" review for For All Mankind (which is great, btw) begins with the line, "I worked in Mission Control during the Apollo 16 & 17 and Skylab Missions. Pretty neat, and he gives it five stars.
posted by brundlefly at 9:38 AM on March 13, 2010


(Imagine a close quote after "Skylab Missions.")
posted by brundlefly at 9:39 AM on March 13, 2010


No one's mentioned the Video Privacy Protection Act of 1988 yet? Fun trivia: for 14 years, your video rental records had better (federal) privacy protection than your medical records.

On preview: what delmoi said.
posted by mhum at 9:43 AM on March 13, 2010 [1 favorite]


I do wonder sometimes if being completely exposed for who we are might ultimately be a good thing

David Brin explores this idea in The Transparent Society, which started as an article in Wired. The article is mostly just "privacy is over", stated rather obnoxiously; the book explores more the "ok, now what?" part. His view is controversial, to say the least, but I think it's underappreciated. If you believe that privacy really is over as a natural consequence of information processing technologies, then it's time to start talking about Plan B.
posted by Nelson at 9:44 AM on March 13, 2010 [1 favorite]


"Netflix has ended the $1 million Next Big Thing contest, which would have rewarded a team to improve their recommendation engine."

Spoiler alert!
posted by alvarete at 9:50 AM on March 13, 2010


Couldn't they have encoded these things?

I've worked with scrambled geo-codes before, and given enough information I've suspected it would be trivial to decode many of them. As a researcher, you sign off that you won't do it, so I have never checked my guesses' accuracy.
posted by a robot made out of meat at 9:50 AM on March 13, 2010
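
One way a scrambled code might be guessed, as a minimal sketch and not anything Netflix or any researcher is known to have done: if each ZIP maps consistently to one code, the group sizes survive, so codes can be lined up against public population rankings. The key, the ZIP list, and the population ranking below are all invented for illustration.

```python
import hashlib
import hmac
from collections import Counter

SECRET = b"hypothetical-key"  # assumption: the releaser keeps this private

def scramble(zip_code):
    """Replace a ZIP with a consistent but opaque code (keyed hash)."""
    return hmac.new(SECRET, zip_code.encode(), hashlib.sha256).hexdigest()[:8]

# Invented released data: one scrambled ZIP per subscriber.
released = [scramble(z) for z in ["10001"] * 5 + ["94704"] * 3 + ["07851"] * 1]

# Invented public knowledge: ZIPs ranked from most to fewest residents.
public_population_rank = ["10001", "94704", "07851"]

# Line up the most common codes with the most populous ZIPs.
guessed = {
    code: real
    for (code, _count), real in zip(Counter(released).most_common(),
                                    public_population_rank)
}
```

Real data would be far noisier than this, but the point stands that keeping "the same groupings" also keeps a handle for matching.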


I worry that someone sometime will analyse my MeFi favourites and see what an enormous nerd I am.

And this is a real own goal on Netflix's part... I'm sure I saw somewhere that they could have easily made the data blind, but I'm not enough of a stats-head to remember how.
posted by fearfulsymmetry at 9:57 AM on March 13, 2010


Paul Ohm wrote an interesting article about anonymized datasets.

Also, the original Netflix de-anonymization paper and a FAQ that might answer some questions in this thread.
posted by null terminated at 10:09 AM on March 13, 2010


Are they broken down by age and gender?

No, they are not. I wasn't making a value judgement, merely stating a fact.
posted by Splunge at 10:46 AM on March 13, 2010


No, they are not. I wasn't making a value judgement, merely stating a fact.

Actually, they could be. I'm sure they're better off making suggestions towards a 25 year old dude in zip code than just person in zip code. Part of improving their algorithm is probably figuring out how well age correlates to movie preference.
posted by graventy at 10:58 AM on March 13, 2010


This seems like much ado about nothing.

First, the Netflix database alone reveals nothing. There are at least 10,000 households per ZIP code, about 25,000 individuals. Age and gender would provide hundreds of matches per ZIP code.

Second, no one is required to give their age and gender to Netflix. You may volunteer that information as a movie reviewer, but it is not required.

Third, the de-anonymization paper does not identify people directly. It only identifies a correlation between two databases, for example Netflix and IMDB. So if you write the same review on both sites, it might be possible to suggest that the same person wrote both. Or if you review the same list of movies on both sites. Similarly if you write the same comment or similar comments using the same phrases on Metafilter and your personal blog about de-clawing cats, a simple google search might be able to link the two. Even if you could link the two, it still doesn't reveal your identity unless you publicly reveal your identity on at least one of the sites.

The simple solution is just not to give Netflix your age and gender if you don't want to.
posted by JackFlash at 11:13 AM on March 13, 2010


That's great if you live in a big city. My hometown zip code? 1,900 people.
posted by graventy at 11:16 AM on March 13, 2010


Also worth noting? Probably not a great place to be outed.
posted by graventy at 11:19 AM on March 13, 2010


Netflix is a silver-tongued flatterer. It claims I like "Cerebral Comedies" and "Mind-bending Dramas", but then says I want to see "Crank II: High Voltage". By all means, improve the recommendation engine.
posted by acrasis at 11:32 AM on March 13, 2010 [3 favorites]


You do want to see Crank II. It's great fun. Although, it probably fits into the "Really Dumb Action" category.
posted by graventy at 11:37 AM on March 13, 2010 [1 favorite]


I think that Crank II could fit into both "Cerebral Comedies" and "Mind-bending Dramas" quite well.
posted by brundlefly at 11:51 AM on March 13, 2010 [1 favorite]


hippybear, are you volunteering?
posted by Pronoiac at 11:54 AM on March 13, 2010


Maybe she was mad Netflix recommended Pretty Woman. She HATES Pretty Woman.
posted by b2walton at 1:16 PM on March 13, 2010


Can someone explain to me how the "outing" would work exactly?

I mean, let's say that a person has a queue that is full of gay or lesbian cinema, and we know how old they are, what zip code they live in, what gender they are, and that they really liked The L Word, Fraggle Rock, and The Goonies. To suspect someone based on this info we would need to know the person, have access to either their Netflix account or have some other means of knowing exactly what movies they are watching (something someone this paranoid would seem to keep secret). It seems like if you had the info to make an assumption based on the data they would give you, then you already had the data through a different means.

Being sincere, could someone explain this to me?
posted by djduckie at 2:08 PM on March 13, 2010 [1 favorite]


Crank 1 is just so much better constructed.
posted by Artw at 2:16 PM on March 13, 2010


and this woman staying closeted is a mistake for her personally and as a member of society.

But that's her decision to make or not make and her lesson to learn, or not learn. The lawsuit may be bogus, but on a personal level, I can see wanting to protect yourself from having that decision taken away from you.
posted by billyfleetwood at 2:17 PM on March 13, 2010


and this woman staying closeted is a mistake for her personally and as a member of society

It's her choice, and more importantly coming out can have severe personal consequences. I don't think she should have to put her family at risk if she doesn't want to --- I admire those who choose to be open and push back against this sort of thing, but I also understand not wanting your whole life to be turned upside down because of it. It's quite possible that the personal effect would in fact be quite negative for her, even if it did contribute in some small way to the society as a whole.

From her lawsuit: "believes that, were her sexual orientation public knowledge, it would negatively affect her ability to pursue her livelihood and support her family and would hinder her and her children’s’ ability to live peaceful lives within Plaintiff Doe’s community"

In Ohio that may well be the case. Certainly it's still the case in many other places I know.
posted by wildcrdj at 2:31 PM on March 13, 2010


I'm in the recommendations business, and several of my relatives asked me why I didn't do the Netflix contest. Netflix certainly got the word out, and I think this contest was a great investment on their part.

I didn't participate. My reasoning was that competing for a $1M prize, with a ton of other people, was like playing a low-odds lottery for a relatively small payout.

If I can make something better than Netflix, I'm sure I can make more than $1M from it.

But this post highlights the other problem. You might spend a year working on their task, and at the end, they call it quits.
posted by zippy at 2:46 PM on March 13, 2010


First, the Netflix database alone reveals nothing. There are at least 10,000 households per ZIP code, about 25,000 individuals. Age and gender would provide hundreds of matches per ZIP code.

Not quite. There are plenty of small ZIP codes. For instance, 07851 had only 250 people in the 2000 census. Add gender and you have it down to the 114 females. Add age and you can probably narrow it down to a few people. Just hope someone else is your age to preserve even that little anonymity.
posted by smackfu at 3:06 PM on March 13, 2010


Can someone explain to me how the "outing" would work exactly?

Presumably:
1. I think my neighbor Martha is a lesbian, and I want to protect her kids from that awful awful lifestyle.
2. We live in a small town and she subscribes to Netflix.
3. Knowing her age and zip code, I could pretty easily track down the movies she's rented.
4. I distribute...fliers, or something? Maybe send a few "concerned citizen" notes to her church/place of employment? Let her family know?

There's a lot of ways the information could be used in a damaging way, and we've got to 'protect' those children.

Probably a good idea to opt out of this kind of crap, if you can.
posted by graventy at 3:14 PM on March 13, 2010


djduckie: De-anonymization of the original Netflix dataset was done the following way:

1) A netflix user rates movies on netflix. She considers some ratings to be private and some public.
2) This same user publishes a few of the same ratings on another site. This might be IMDB.com, a blog, etc. For example, a user might have the following ratings:
  • The L Word, Season 1: 4/4
  • Being Lesbian: 2/4
  • On the Waterfront: 4/4
  • His Girl Friday: 3/4
  • ...
This user assumes her ratings for The L Word and Being Lesbian are private. On her blog, she duplicates her ratings for On the Waterfront and His Girl Friday.

3) Since the Netflix dataset gives an entire rating history for each user, knowing a few ratings can narrow down the set of users to which she belongs. The surprising result is how few movie ratings need to be known before the data can be de-anonymized.


From the Paul Ohm paper linked above (pg. 20):
If an adversary knows the precise ratings a person in the database has assigned to six obscure movies, and nothing else, he will be able to identify that person 84% of the time. If he knows approximately when (give or take two weeks) a person in the database has rated six movies, whether or not they are obscure, he can identify 99% of the people in the database. In fact, knowing when ratings were assigned turns out to be so powerful, that knowing only two movies a rating user has viewed (with the precise ratings and the rating dates give or take three days), an adversary can reidentify 68% of the users.

To summarize, the next time your dinner party host asks you to list your six favorite obscure movies, unless you want everybody at the table to know every movie you have ever rated on Netflix, say nothing at all.
This attack is unrelated to location information.
posted by null terminated at 3:22 PM on March 13, 2010 [2 favorites]
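
The three steps above can be sketched roughly as follows. This is a toy illustration with invented user ids and ratings, and a plain exact-match filter (the hypothetical `candidates` function); it is not the scoring method from the de-anonymization paper, which also handles dates and imprecise ratings.

```python
def candidates(dataset, known, tolerance=0):
    """Return user ids whose rating history is consistent with `known` ratings."""
    matches = []
    for user_id, ratings in dataset.items():
        if all(
            title in ratings and abs(ratings[title] - stars) <= tolerance
            for title, stars in known.items()
        ):
            matches.append(user_id)
    return matches

# Invented "anonymized" release: opaque ids, full rating histories.
anonymized = {
    "user_17": {"The L Word, Season 1": 4, "Being Lesbian": 2,
                "On the Waterfront": 4, "His Girl Friday": 3},
    "user_42": {"On the Waterfront": 4, "His Girl Friday": 1,
                "Paul Blart: Mall Cop": 3},
    "user_88": {"On the Waterfront": 2, "Crank": 4},
}

# Ratings the same person posted publicly, e.g. on a blog.
public_blog = {"On the Waterfront": 4, "His Girl Friday": 3}

one_rating = candidates(anonymized, {"On the Waterfront": 4})
two_ratings = candidates(anonymized, public_blog)
```

With one known rating two users still fit; with two known ratings only one does, and that user's private ratings come along with the match.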


I don't want to dismiss the privacy concerns here, because there are some real issues, but based on what null terminated has to say, if you have a significant secret in your life that you don't want broken, it looks like you'd better not rate movies on Netflix at all.
posted by immlass at 3:38 PM on March 13, 2010


This is why we can't have nice things.
posted by Cool Papa Bell at 3:52 PM on March 13, 2010


I really don't get this. She's a lesbian right? Who uses the internet? Google? Social networking sites? Surely Netflix is not the only source of lesbian content that she has accessed on the internet. I'm pretty sure it would be easier to "prove" she is a lesbian by taking a look at her internet activity and her online acquaintances. Hack her wireless. Her email. Search her trash for credit card statements from hotlesbians.com. Just about anything other than her Netflix account would probably give you a better picture of her sexual activity and would contain specific identifying information such as her name or IP. General small town and online gossip is probably as accurate as Netflix if you want to start guessing the orientation of people you know. Who the hell is going to go through all this trouble anyway? If your employer/neighbor is a homophobe all they need is a gossip rumor to make your life miserable. Leet hax0rz skills not necessary.
posted by Procloeon at 7:14 PM on March 13, 2010


I would just like to point out that I, a straight male, have also ended up with a taste preference for "Gay and Lesbian Dramas."* So Netflix's gaydar isn't quite working at 100%.

I believe Season 1 of "The L Word" to be the culprit.
posted by nathancaswell at 10:11 AM on March 14, 2010


Although to be fair, I did watch it with my lesbian roommates.
posted by nathancaswell at 10:12 AM on March 14, 2010


hippybear - I read that as chastising those who weren't out, & in context, dismissing fears of being outed.

Sorry about that - it was an unfair reading.
posted by Pronoiac at 8:10 PM on March 14, 2010


How is this going to protect anyone's privacy? The data is already out in the public ...
posted by Maztec at 11:13 PM on March 14, 2010



