Saturday, 19 September 2015

Not Even the People Who Write Algorithms Really Know How They Work

Sometimes there’s a little crack in the web that is just big enough to catch a glimpse of who the robots running the show think you are.
You might deduce, for example, that the tracking software that watches you browse has figured out you’re shopping for a Halloween costume. Lo and behold, ads for gorilla suits and fairy wings start popping up in the margins of every other website you visit. Or maybe you just rewatched a bunch of Twilight Zone episodes on Netflix. It makes sense that the site then recommends Black Mirror and Quantum Leap.
But much of the time, there’s no way to tell why information is filtered the way it is online. Why is one person’s status update on Facebook prioritized in your News Feed over another’s? Why does Google return a different order of search results for you than for the person sitting next to you, googling the same thing?
These are the mysteries of the algorithms that rule the web. And the weird thing is, they aren’t just inscrutable to the people clicking and scrolling around the Internet. Even the engineers who develop algorithms can’t tell you exactly how they work.
And it’s going to get more convoluted before it gets clearer. In fact, for a few reasons, it may never get clearer at all. First of all, there’s virtually no regulation of data collection in the United States, meaning companies can create detailed profiles of individuals based on huge troves of personal data, without those individuals knowing what’s being collected or how that information is being used. “This is getting worse,” said Andrew Moore, the dean of computer science at Carnegie Mellon University.
Which means, Moore told me, we are “moving away from, not toward the world where you can immediately give a clear diagnosis” for what a data-fed algorithm is doing with a person’s web behaviors. I once explored the idea that we might eventually be able to subscribe to one algorithm over another on Facebook as a way to know exactly how the information filter was working. A nice thought experiment, perhaps, but one that assumes the people who write algorithms know, with any precision or at the level of an individual user, how they work.
“You might be overestimating how much the content-providers understand how their own systems work,” said Moore, who is also a former vice president at Google. He didn’t want to talk about Google in particular, but he did present another hypothetical: Imagine a company showing movie recommendations.
“You might want to say, ‘Why did you recommend this movie?’ When you're using machine-learning models, the model trains itself by using huge amounts of information from previous people,” he said. “Everything from the color of the pixels on the movie poster through to maybe the physical proximity to other people who enjoyed this movie. It’s the averaging effect of all these things.”
These things, the bits of information that a machine-learning model picks through and prioritizes, might include 2,000 data points or 100,000 of them. “One of the researchers at Carnegie Mellon,” Moore said, “just launched a new machine-learning system which can handle putting together tens of billions of little pieces of evidence.”
Which means the systems that determine what you see on the web are becoming more complex than ever. Factor in questions about how those algorithms might hurt people and the picture is murkier still. Consider, for example, Facebook's patent for technology that could trace a person’s social network—a tool that lenders could use to consider the credit ratings of a person’s Facebook friends in deciding whether to approve a loan application. “If the average credit rating of these members is at least a minimum credit score, the lender continues to process the loan application,” Facebook wrote in the patent filing. “Otherwise, the loan application is rejected.”
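The screening rule quoted from the patent filing is simple enough to state in a few lines of code. What follows is a hypothetical sketch, not Facebook’s implementation: the function name, the plain list of scores, and the threshold values in the example are assumptions. Only the comparison of the network’s average credit rating against a minimum score comes from the quoted text.

```python
from statistics import mean


def screen_loan_application(friend_credit_scores, minimum_credit_score):
    """Hypothetical sketch of the rule quoted from the patent filing.

    Returns True if the loan application "continues to process" and False
    if it is rejected, based only on the credit ratings of the applicant's
    network members.
    """
    if not friend_credit_scores:
        # The quoted passage doesn't say what happens with no network data,
        # so this sketch refuses to decide rather than guess.
        raise ValueError("no network credit scores available")

    # Average the credit ratings of the applicant's network members.
    average_rating = mean(friend_credit_scores)

    # Continue only if the average meets the lender's minimum score.
    return average_rating >= minimum_credit_score


# Illustrative numbers only (assumed, not from the filing).
print(screen_loan_application([720, 650, 580], 620))  # average 650 -> True
print(screen_loan_application([600, 610, 590], 620))  # average 600 -> False
```

Notice that, in this quoted step as written, the applicant’s own record never enters the comparison: two people with identical finances can get different answers purely because of who their friends are.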
“That is a really difficult problem,” Moore said. "You’re asking a computer that’s obviously not that smart in the first place to predict whether this person is a risk based on what we know about them—but [you’re telling it], ‘Please exclude these features that, as a society, we think would be illegal.’ But it’s very hard or impossible for the engineers to know for sure that the computer hasn’t inadvertently used some piece of evidence which it shouldn’t.”
All this means that as algorithms become more complex, they become more dangerous. The assumptions these filters make end up having real impact on the individual level, but they’re based on oceans of data that no one person, not even the person who designed them, can ever fully interpret.
