Work – Page 5

Unintended Consequences

21 February, 2019

I spent yesterday in a workshop with a bunch of very smart folks who spend their days trying to break things. You can debate at some length the ethics of this behavior, but that’s not really the point as far as I’m concerned. What really struck me in the workshop was humanity’s ability to create astonishingly complex systems, and our corresponding inability to understand them fully. Complexity carries a cost, as it turns out, and the full bill is rarely apparent until we are well down the road.

Our keynote speaker at the workshop was Daniel Gruss, a professor from the Graz University of Technology in Austria. I’ve come to know Daniel a bit since he and a separate team led by Daniel Genkin at the University of Michigan simultaneously found the Meltdown and Spectre vulnerabilities, which were published in January 2018. Daniel’s sly smile when he presents on these topics (he appears to find it all highly amusing) belies his real concern over these vulnerabilities and the systems and institutions that keep producing them.

Here is the link to Daniel’s talk at the workshop. It’s worth watching.

The key thing to understand about micro-architecture vulnerabilities, and in fact many IT security issues, is that they come about in response to customer demand for features and performance. Spectre, in particular, is only possible because processor makers long ago introduced a feature called “speculative execution,” which, to be honest, is an amazing feat of technology in its own right. Speculative execution uses otherwise idle processor time to guess what a program is going to do next, before it actually does it. This works because retrieving data from main memory takes far longer than executing a few instructions that may turn out not to be needed. When speculative execution wins, it is because the processor has already executed the code that turns out to be the right code once the data from slow memory comes back. When it loses, it is because the data points down the other code path, in which case the processor throws away the results of its gamble and continues as if nothing had happened. The processor guesses right often enough that this feature yields major performance gains, which is why it is present in virtually every modern high-performance CPU.
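For readers who like to see the mechanism spelled out, here is a minimal Python sketch of the guess, then commit-or-squash cycle described above. It is a toy model under loudly stated assumptions, not how any real CPU is built: the `ToyCore` class, the cycle costs, and the idea of representing a code path as a list of register writes are all hypothetical. While a slow load decides the branch, the core runs its guessed path on a scratch copy of the registers, then either keeps that copy or throws it away and runs the correct path.

```python
from dataclasses import dataclass, field

# Illustrative costs only; these are assumptions, not figures for any real CPU.
MEMORY_LATENCY = 100  # cycles to fetch the branch condition from main memory
INSTR_COST = 1        # cycles per instruction executed after the load resolves


@dataclass
class ToyCore:
    """Hypothetical core: a register dictionary plus a running cycle count."""
    registers: dict = field(default_factory=dict)
    cycles: int = 0

    def run_branch(self, condition, taken_path, not_taken_path, predict_taken):
        """Resolve one branch whose condition arrives from slow memory.

        Each path is a list of (register, value) writes. The guessed path runs
        on a scratch copy of the registers while the load is outstanding, so
        its cost is hidden (assuming it is shorter than MEMORY_LATENCY).
        """
        guessed = taken_path if predict_taken else not_taken_path
        scratch = dict(self.registers)
        for reg, value in guessed:
            scratch[reg] = value          # speculative work during the wait

        self.cycles += MEMORY_LATENCY     # the condition finally arrives
        if condition == predict_taken:
            self.registers = scratch      # right guess: commit the scratch copy
            return "commit"

        # Wrong guess: discard the scratch copy and run the correct path now.
        correct = taken_path if condition else not_taken_path
        for reg, value in correct:
            self.registers[reg] = value
            self.cycles += INSTR_COST
        return "squash"


fast = ToyCore()
print(fast.run_branch(True, [("r1", 1), ("r2", 2)], [("r1", -1)], predict_taken=True))  # commit
print(fast.registers, fast.cycles)  # {'r1': 1, 'r2': 2} 100

slow = ToyCore()
print(slow.run_branch(False, [("r1", 1), ("r2", 2)], [("r1", -1)], predict_taken=True))  # squash
print(slow.registers, slow.cycles)  # {'r1': -1} 101
```

In this simplified model a right guess hides the guessed path entirely behind the memory wait, while a wrong guess costs the same as not speculating at all. Real processors also pay a flush penalty on a wrong guess, but the basic bet is the same.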

The downside of speculative execution is that when the processor guesses wrong, it throws away the results of its gamble but not all of the side effects: data touched along the abandoned path can remain in the processor cache until it is evicted later. Because reading from the cache is much faster than reading from main memory, a determined attacker who carefully times her own memory accesses can work out which data was touched, and from that infer data she shouldn’t be able to read, including things like your encryption keys and your password. This is called a “side-channel attack” because it relies on a side effect of normal processing (the abandoned data in the cache) rather than a direct vulnerability in the program that is running.
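The cache leak can be sketched the same way. Again this is a hypothetical toy: Python has no speculative execution of its own, so the “speculation” is simulated by performing the load before the bounds check is evaluated. The `ToyCache` class, the `PROBE_BASE` address, and the hit and miss latencies are illustrative assumptions, not real hardware values. The sketch shows one narrow point: the victim’s out-of-bounds result is discarded, yet the cache still remembers which address it touched, and timing the cache is enough to recover the value.

```python
PROBE_BASE = 1000  # hypothetical start of a probe region the attacker can also touch


class ToyCache:
    """Remembers which addresses were touched; hits are fast, misses slow."""
    HIT, MISS = 5, 100  # illustrative cycle counts, not real hardware figures

    def __init__(self):
        self.lines = set()

    def access(self, address):
        """Return the latency of this access, then leave the line cached."""
        latency = self.HIT if address in self.lines else self.MISS
        self.lines.add(address)
        return latency


def victim(cache, memory, array_len, index):
    """Models `if index < array_len: touch(PROBE_BASE + memory[index])` on a
    core that guesses the bounds check passes before it has been evaluated."""
    speculative_value = memory[index]             # runs before the check resolves
    cache.access(PROBE_BASE + speculative_value)  # value-dependent cache fill
    if index < array_len:
        return speculative_value                  # right guess: result committed
    return None                                   # wrong guess: result discarded


def probe(cache, candidates):
    """Time one access per candidate value; the already-cached one stands out."""
    timings = {v: cache.access(PROBE_BASE + v) for v in candidates}
    return min(timings, key=timings.get)


memory = [3, 1, 4, 42]  # the "array" is memory[0:3]; memory[3] plays the secret
cache = ToyCache()
print(victim(cache, memory, array_len=3, index=3))  # None: nothing returned
print(probe(cache, range(256)))                     # 42: recovered from timing alone
```

The victim never hands the secret to anyone; the attacker reconstructs it purely from which probe address comes back fast, which is what makes this a side channel rather than a bug in the victim’s logic.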

It is important to note that this kind of attack is significantly harder to carry out than a simple phishing expedition that fools users into typing their username and password somewhere they shouldn’t. The fact that it is possible to learn private facts without first breaking into a system should be scary for large institutions — governments, banks, the large cloud providers — but less so for an average person with a laptop and a phone. Nonetheless, it is a real vulnerability, and although operating system vendors like Red Hat have done heavy work making it harder to exploit, the vulnerabilities are on the chip and there is only so much that can be done to mitigate them. What’s worse is that chipmakers don’t seem to be in any rush to produce real fixes to the problem, and they justify this by saying — accurately — that customers don’t care. Even large institutions with lots to lose are more interested in chasing performance than in protecting their data (or, more importantly, that of their citizens or their customers).

There is another important piece of this puzzle that Daniel, like other security researchers, is passionate about. With one notable exception that I will touch on in a moment, all processors are based on proprietary designs that are guarded very closely. This might not be a problem if processors were simply engines that churned through programming instructions in a linear way, but they are far more complex than this. The speculative execution and branch prediction features I describe above are just two of the hundreds or even thousands of complex subsystems on the chip, all there to improve performance, none of them well documented. Daniel and the rest of the security community do their work by sniffing in the dark, trying to guess where a vulnerability might lie. They do this because if they find a vulnerability and tell everyone about it, we can hopefully prevent it from being used against us by adversaries with bad intent. There is no doubt that these adversaries exist in government, organized crime, and elsewhere, that the researchers they employ are every bit as skilled as Daniel, and that when they find a way to break into a system they are not going to be telling the press about it. So the processor makers, by keeping their designs and the security vulnerabilities inherent in them private, expose all of us to an unknown, unquantifiable risk.

I mentioned an exception. It turns out that there is a growing open-source community around a freely available Instruction Set Architecture called RISC-V. An Instruction Set Architecture is simpler than it sounds: it is essentially the set of instructions that a processor is guaranteed to implement. (The Intel ISA we have been using since the IBM PC came out is called the x86 architecture.) Anyone is free to take the RISC-V ISA and build a processor design that implements it, and in fact several folks in academia and industry are doing exactly that. Some of these processor designs are themselves open source, such as BlackParrot (BU/UW), Rocket (UC Berkeley), SweRV (Western Digital), and Ariane (ETH Zurich). Ideally these groups would open source not just the design but also the toolchain and the physical tools they use to build the actual chips, but one step at a time.
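To make “the set of instructions a processor is guaranteed to implement” a little more concrete, here is a minimal sketch of a single instruction from the RISC-V RV32I base integer set, ADDI (add immediate). The bit layout follows the I-type encoding in the published RISC-V specification; the function names and the plain Python list standing in for the register file are hypothetical. The idea is that any design claiming to implement RV32I, open or proprietary, must give this 32-bit word the same meaning, however differently it is built internally.

```python
OP_IMM = 0b0010011      # RV32I major opcode for register-immediate ALU instructions
FUNCT3_ADDI = 0b000     # selects ADDI within OP-IMM
XLEN_MASK = 0xFFFFFFFF  # RV32 registers are 32 bits wide


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def decode_i_type(word):
    """Split a 32-bit I-type instruction word into its fields."""
    return {
        "opcode": word & 0x7F,                         # bits 6:0
        "rd": (word >> 7) & 0x1F,                      # bits 11:7
        "funct3": (word >> 12) & 0x7,                  # bits 14:12
        "rs1": (word >> 15) & 0x1F,                    # bits 19:15
        "imm": sign_extend((word >> 20) & 0xFFF, 12),  # bits 31:20
    }


def execute_addi(word, regs):
    """Apply one ADDI to a 32-entry register list (hypothetical register file)."""
    f = decode_i_type(word)
    if f["opcode"] != OP_IMM or f["funct3"] != FUNCT3_ADDI:
        raise ValueError(f"not an ADDI instruction: {word:#010x}")
    if f["rd"] != 0:  # x0 is hard-wired to zero, so writes to it are ignored
        regs[f["rd"]] = (regs[f["rs1"]] + f["imm"]) & XLEN_MASK
    return regs


regs = [0] * 32
execute_addi(0x00500093, regs)  # addi x1, x0, 5
execute_addi(0xFFF08113, regs)  # addi x2, x1, -1
print(regs[1], regs[2])         # 5 4
```

Everything a specification leaves out, such as how many instructions run at once or whether the chip speculates, is up to the designer, which is exactly where the optimizations (and the vulnerabilities) live.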

Now, to be relevant, RISC-V processors have to include optimizations that can compete with the proprietary chipmakers’ designs, which means they too will certainly have vulnerabilities. (Daniel Gruss is fond of saying that every optimization introduces an opportunity for a security flaw.) The difference is that with an open RISC-V design, everyone can see them. So, even though these are incredibly complex systems and it is unlikely that even a very skilled engineer can understand the entire thing all at once, a community of people working together may be able to do real analysis that will mitigate all kinds of security flaws… and that, in turn, will allow customers to understand exactly what security they are giving away in order to get better performance.

Daniel Genkin, the Spectre researcher at U of M, spends a lot of time arguing that we need to stop treating security as an all-or-nothing thing and instead start describing it in terms of contracts. Something like “This chip has these performance characteristics, and also limits speculative execution to these kinds of safe operations or this kind of non-private data.” You can imagine that this would at last let customers start making intelligent decisions about how to spend their purchasing dollars. I suspect that as the world becomes home to more processor types and architectures, processor makers will start offering these kinds of tradeoffs, especially if their chips are based on an open design where they can actually show buyers what they’re getting. Operating in this way won’t eliminate vulnerabilities by any means, but it would at least let us say we’re trying.

UPDATE: I missed an important part of what RISC-V is — it is an ISA, not a processor design. There are a number of open-source processor designs that implement the RISC-V ISA.

It’s My Birthday

20 February, 2019

And I am spending it in the best way possible: sitting in a micro-architecture security conference that I organized!

Here is a photo of some birthday calamondins. That is all.

(They’re not quite ripe yet… Once they ripen they are the best garnish ever for a Gin and Tonic.)

Where Does The Truth Lie?

18 February, 2019

It is, of course, very difficult to know anything for certain. Authorities as widely separated as Werner Heisenberg and Laurence Sterne have written at length about the impossibility of measuring everything (in the first case) and the impossibility of fully understanding anything (in the second). This poses problems for engineers who are in the business of making decisions about precisely what to do based on a set of supposed “facts.” It poses even more problems for managers like me, whose faculties are so withered that telling fact from fiction is usually an exercise in blindfolded dart-throwing.

A good manager, faced with her inability to distinguish engineering fact from engineering myth, builds a network of people she can trust to do that for her. Naturally this presents all kinds of problems with reinforced bias, institutional inertia, and so on, but it’s honestly the only recourse for someone trying to manage a technical team without the time to get down into the details and see for herself. In my case, I try to get a picture of the truth by gauging how self-aware, and how self-assured, the engineers I talk to are. The most reliable information generally comes from the engineers who are more self-aware than self-assured; put another way, never trust anyone who is absolutely certain they are correct.

You might ask, why, as a manager, do I even need to understand the truth at all? Can’t I just delegate that to some senior people, let them direct everything, and sit back and collect my paycheck? To some extent the answer is yes, I can and I should. Unfortunately, as someone trying to prioritize which research projects Red Hat focuses on, I do actually need to know enough about the facts and what’s coming to make a reasonable judgement. This is especially true because I am, more or less out of necessity, the only one with the full picture of everything we’re doing. I think what I’m saying is that it is impossible to understand the big picture well while also being aware of all the details. It’s not just a matter of mental capacity. Creating a big picture requires eliding some details and approximating others, and that activity is incompatible with knowing all the details of the case for certain.

Here’s a real example of what I’m talking about. A very senior Red Hat engineer, whom I trust implicitly partly out of awe and partly because I know a whole bunch of other people who also trust him, has taken a strong position that we are approaching the end of the era of the general-purpose CPU. Without getting too technical, if his position is correct, then we are in for some massive upheavals in the business of computing in the next few years. Everything from the design of the hardware we buy to the way we build and distribute software will need to change in significant ways to accommodate a new world of purpose-built processors.

Another very senior Red Hat engineer, whom I also trust because he has forgotten more about actual processor engineering than I will ever know, believes this is bunk. His position is that we are still ten years out from the theoretical limit of processor scaling, and that people have wrongly predicted the end of the general-purpose CPU several times before. Every one of those times, the processor manufacturers have tricked their way through to the next generation, and the major change my first source is anticipating has not appeared.

The facts in this case are particularly important because I am about to schedule a half-day, very public discussion at the Red Hat Summit around the end of the general-purpose CPU, and if it becomes obvious between now and then that the idea is bunk, I’m going to look foolish, and so will Red Hat.

I’m not sure there is a real solution to this. Sometimes I am going to back the wrong horse, and sometimes I’ll pick the right one and look like a genius. I guess the important thing is not to go too long on any single position unless you’re prepared to lose big, no matter how much you trust the person you’re relying on. The only alternative is to dive into the weeds on every decision and try to understand it all yourself, which is not only exhausting but also ineffective as described above.

I have one other point on this. If you read the above carefully you will notice that all the work I have to do in reaching a decision is social. I have to decide whom I trust, whether they have a hidden agenda (whether they are aware of it or not), their degree of bias and in which direction, and so on. If there is a technical component of this decision at all, it is well outweighed by the social component. I believe this is typical of the decisions managers and even engineers make. So, the next time someone tells you that the right way to come to an agreed solution to a problem is to have a rational debate on the merits, tell them there is no such thing. The best you can do is listen carefully, take a deep breath, and throw for the bullseye.
