Oakland: Practical Timing Side Channel Attacks against ASLR
Kernelspace ASLR; you only get ONE try to guess the jump, so brute force does not work
Idea: timing side channel attack against ASLR, since usermode and kernelmode share hardware
Approach: Double page fault probing: measure how long it takes before an exception happens for the entirety of kernel space, 2s. This gives you a very good idea if things are mapped somewhere. Correlation exists in page granularity
The reason this works is because of TLB: TLB entry is created for protected memory, but next time the page fault is delivered faster. But if it's unmapped, then no TLB is entry and the fault is not faster.
Can identify what it is using an "allocation signature", since you know how the code is laid out and how big it is (only need 100K)
Approach: Cache probing with set associative caches. Execute short system call, invalidate cache set si, and then check system call time. These measurements tell you L3 index, which tells you about the physical address, and they overlap with 12-16 of kernel, which are the ONLY randomized bits (in Windows)! It takes 2 seconds to carry out.
Mitigation: separate cache for privileged and non-privileged (this sucks! We've talked about this in IFC work before)
Note: higher randomization = higher fragmentation
Mitigation: randomize virtual address
Q: Relevance of attack on windows. The Windows kernel has so much information disclosure, callbacks, NT system query information addresses, so it's just so bad. Why not do a different OS? A: Many of these come from compatibility issues, i.e. 10 year old APIs. If I were an attacker, I would just use this information, but even if you have a perfect information, e.g. perfectly implemented kernel, it's still possible to attack. so this is still a strong statement. Q: Interaction between two attacks, e.g. page fault and cache, do they interfere when you measure the time? Do you see any interaction? You have a cache-hit/cache-miss, either an interaction or two, or are they separate? A: The attacks are separate, but you always have interaction between cache measurements, that's why you do it multiple times. They have some noise, but the timings are really small (500 clocks), it isn't likely something happens during this. You have to do it multiple times because within the CPU, the caches are complicated, with many concurrent things going on.
Q: Can you accomplish the cache partitioning in software? A: Interesting idea. I'm not sure if I understand the approach. Q: Only give pages to userspace which will not map onto the same cache lines. A: This would be possible. The thing is, you have to allocate a buffer from usermode to do this, and you need to know the physical address in usermode. If you are able to hide the physical address from attacker, it's a complicated topic; discussed in paper, but this wouldn't be possible at all.
Q: Under what circumstances that it check kernel memory, and is there a way to find out if you are being probed? Can we combine probing with defense? A: You can try to detect these access patterns; they don't tend to happen. You could try, but in the end, this would mean it would take longer to do the probes to evade detection. You could also try terminating the application, but you will still get compatibility issues.
Spender commentary: http://forums.grsecurity.net/viewtopic.php?f=7&t=3367