Lectures - Week 5 (Mixed)
One of the most fundamental concepts in security is the idea of a vulnerability - a flaw in the design of a system which can be used to compromise (or cause an unintended usage of) the system. A large majority of bugs in programming are a result of memory corruptions which can be abused to take control - the most ubiquitous example of this is the buffer overflow; the idea that you can overwrite other data alongside a variable which can change both data and control of a program. The common case is when programmers fail to validate the length of the input when reading in a string in C. Another fairly common bug relates to the overflow of unsigned integers; failing to protect against the wraparound can have unintended consequences in control flow.
Richard also mentioned in the 2016 lectures the idea of a NOP sled which I found quite interesting. The idea is that due to run time differences and randomisation of the stack, the address the program will jump to (from the return address) can sometimes be difficult to predict. So to make it more likely it will jump where the attack wants, he converts a large set of memory to NOP (no operation) instructions which will just skip to the next one; then finally after the âNOP sledâ his code will execute.
printf(â%s Printf Vulnerabilitiesâ);
One of the most hilarious programming vulnerabilities related to the usage of the printf function. Basically if you have an input which is accepted from the terminal and you plug this (without parsing) into a printf, an attacker could potentially feed in an input such as â%sâ. (i.e. the title) Now since you havenât specified a 2nd argument, it will just keep reading all the variables in memory until you hit a â\0âł. In fact you can abuse this even further to overwrite memory with the â%nâ format string - it will overwrite an integer with the number of characters written so far.
Many of the bugs we know of today are actually reported in online databases such as the National Vulnerability Database or Common Vulnerability & Exposures (CVE) Databases. There is actually lots of pretty cool examples online in these, however most of these have been actually fixed - we call them zero day vulnerabilities if the vendor hasnât fixed them (and if they are then abused then zero day exploits).
When working in security, itâs important to understand the potential legal consequences associated with publicly releasing damaging vulnerabilities in software. This is where responsible disclosure comes in - the idea that if you find a bug you disclose it to a software vendor first and then give them a reasonable period of time to fix it first. I think I discussed an example from Googleâs Project Zero Team a couple weeks ago - however just from a quick look there was a case in March where their team released the details on a flaw in the macOSâs copy-on-write (CoW) after the 90 day period for patching. (itâs important to note they gave them reasonable time to fix it)
This was a pretty cool website we got referred to mainly regarding the top bugs relating to web security (link); Iâll give a brief overview here:
Injection - sends invalid data to get software to produce an unintended flow of control (i.e. SQL injection)
Broken authentication - logic issues in authentication mechanisms
Sensitive data exposure - leaks in privacy of sensitive customer data
XML External Entities (XXE) - parsing XML input with links to external bodies
Broken action control - improper access checks when accessing data
Security misconfigurations - using default configs, failing to patch flaws, unnecessary services & pages, as well as unprotected files
Cross-Site Scripting (XSS) - client injects Javascript into a website which is displayed to another user
Insecure deserialisation - tampering with serialization of user data
Using components with known vulnerabilities - out of date dependencies
Insufficient logging and monitoring - maintaining tabs on unusual or suspicious activity, as well as accesses to secure data
Just a couple of the bugs that were explored in some of the 2016 lecture videos:
Signed vs unsigned integers casts - without proper checks can lead to unintended control flow
Missing parenthesis after if statement - only executes next line and not all within the indentation
Declaring array sizes wrong - buf[040] will be interpreted as base 8
Wrong comparators - accidentally programming â=â when you intended â==â
A lot of the more common bugs we used to have are getting a lot easier to detect in the compilation process; GCC has a lot of checks built in. Valgrind is also a really awesome tool to make sure your not making any mistakes with memory.
I actually discussed this idea already in the week 1 lectures here - just for the sake of revision I will give a basic overview here. The basic idea is that WEP uses a stream cipher RC4 which XORs the message with a key; however the issue is that we know information about the structure of TCP/IP packets. Within a local network the local IPs are usually of the form A.B.C.D (i.e. 192.168.20.4 for a specific computer) where each letter represents a byte in the range 0-255. (0-255 are usually reserved and not used for computers in the network) Due to subnetting (i.e. with a subnet mask 255.255.255.0 on private network) the last byte D is usually the only one that changes - this means we effectively have 254 combinations.
Since we know where the destination address is located within the packet, an attacker can potentially record a packet and modify this last byte - they can send out all 256 possible combinations to the router (remember itâs encrypted so we canât limit it to 254). The router will then decrypt the message and then encrypt it with the key used for communications with the attacker - and voila the system is compromised.
Richard gave a brief overview of the basis of many of our hash functions which is the Merkle-Damgard construction. The basic idea behind it is to break the message into blocks - the size varies on the hash type and if its not a multiple of the required length then we need to apply a MD-compliant padding function. This usually occurs with Merkle-Damgard strengthening which involves encoding the length of the original message into the padding.
To start the process of hashing we utilise an initialisation vector (number specific to the algorithm) and combine it with the first message block using a certain compression function. The output of this is then combined with the 2nd message block and so forth. When we get to the end we apply a finalisation function which typically involves another compression function (sometimes the same) which will reduce the large internal state to the required hash size and provide a better mixing of the bits in the final hash sum.
I think after looking at the Merkle-Damgard construction it now becomes pretty obvious why using MACs of the form h(key|data) where the length of the data is known are vulnerable to length-extension attacks. All you need to be able to reverse in the above construction is the finalisation function and the extra padding (which is dependent upon the length which weâre assuming we know); then you can keep adding whatever message blocks you want to the end!
The whole idea behind these signatures is providing authentication - the simplest method of this is through asymmetric key encryption (i.e. RSA). If your given a document, you can just encrypt it with your private key - to prove to others that you indeed signed it, they can attempt to decrypt it with your public key. There is a problem with this approach however - encryption takes a lot of computation and when the documents become large it gets even worse. The answer to this is to use our newfound knowledge of hashing for data integrity - if we use a hash (âsummary of the documentâ), we can just encrypt this with our private key as a means of signing it!
One of the hardest issues we face with the âinterwebsâ is that it is very difficult to authenticate an entity on the other end. Weâve sort of scrambled together a solution to this for verifying websites - certificate authorities. (I could go on for ages about the problems with these being âsingle points of failureâ but alas I donât have time)
The idea behind these bodies is that a website will register with the entity with a specific public key. The CA will then link this public key (in a âbig olâ secure databaseâ) with the âidentityâ of the website. To understand how it works its best to consider the example of when you access any website with HTTPS. (i.e. SSL) When you visit www.example.com, they will then provide their public key and a digital signature of key (signed by the cert authorityâs private key) in the form of a X.509 certificate. The user will have access to CAâs public key as part of their browser and will then be able to verify the identity of the website. (the cert is encrypted with the CAâs private key - see above image) An attacker is unable to fake it as they donât know the certificate authoritiesâ private key.
Attacking Hashed Passwords
Given there is only a limited number of potential hashes for each algorithm, there is a growing number of websites online which provide databases of plaintext and their computed hashes - these are what we refer to as hash tables. We can check a hash very quickly against all the hashes in this database - if we find a match, we either know the password or have found a collision.
Rainbow tables are a little more complex - in order to make one you need a hashing function (the same as the password) and a reduction function; the latter is used to convert the hash into text (i.e. a base64 encode and truncation). These tables are made of a number of âchainsâ of a specified length (letâs say we choose 1,000,000) - to create a chain you start with a random seed then apply both the above functions to this seed. You then iteratively do this process another 1,000,000 times (chosen length) and store the final seed and value (only these). In order to try and determine a match to the rainbow table, you apply the same two functions to the password for the required length - however, at each step you compare the hash to the result of each of the chains in the table. If you find a match, you can reproduce the password.
Both of the attacks against password hashes described above rely on an attacker doing a large amount of work in advance, which they will hopefully be able to use in cracking many passwords in the future. (itâs basically a space-time tradeoff) An easy way we can destroy all the work they have done is through a process known as salting. Basically what you do is generate a random string which you concatenate with the password when creating a hash - you need to store this alongside the password in order to check it in future. This means an attacker canât use pre-computed tables on your hash; they have to do all the work again for your particular salt!
Richard discussed another interesting concept called âkey stretchingâ in the lectures - itâs basically the idea that you can grab a âweak password hashâ and continuously hash it with a combination of the (âhashâ + âpasswordâ + âsaltâ). This process of recursively hashing makes it insanely difficult for an attacker to bruteforce. This is combined with the effects of a âsaltâ which (on its own) renders rainbow tables (âpre-computed hashesâ) useless.
Signing Problems with Weak Hashes
One of the problems with using a hash which is vulnerable to second-preimage attacks is that it becomes a lot easier to sign a fake document. Consider the example of a PDF document certifying that I give you $100. If you wanted you could modify the $100 to $100,000, however this would change the resultant hash. However since itâs a PDF you could modify empty attribute fields or add whitespace such that you can modify the hash an enormous amount of times (i.e. to bruteforce the combinations). Since the hash is vulnerable to second-preimage this means that given an input x (the original signed document) we are able to find an xâ (the fake signed document) such that h(x) = h(xâ).
Dr Lisa Parker (guest speaker)
I wasnât able to make the morning lecture, although I will try and summarise my understanding of the key points from this talk:
More holistic approaches to systems improvement have better outcomes (âgrassroots approachâ is just as important as targeted)
Unconscious bias is present across a wide variety of industries (i.e. judges harsher before lunch, doctors prescribing drugs for free lunch)
Codes of conduct intended to reduce corruption; pharmaceuticals try to dodge with soft bribes, advertising, funding research
Transparent reporting reduces malpractice
Enforcing checklists useful for minimising risk
OPSEC Overview (extended)
We traditionally think of OPSEC has been based in the military, however many of the principles can apply in more everyday scenarios:
Identifying critical information
Analysis of vulnerabilities
Application of appropriate OPSEC measures
A lot of the ideas behind gathering information (recon) revolve around collecting ârandom dataâ, which at first may not appear useful, however after managing to piece them together, they are. One of the quotes from Edward Snowden (I think) I found quite interesting, âIn every step, in every action, in every point involved, in every point of decision, you have to stop and reflect and think, âWhat would be the impact if my adversary were aware of my activities?ââ. I think itâs quite powerful to think about this - however at the same time we donât want to live with complete unrealistic paranoia and live as a hermit in the hills.
One of the easiest ways to improve your OPSEC is through limiting what you share online, especially with social media sites. Some of the basic tips were:
Donât share unless you need to
Ensure it canât be traced (unless you want people to know)
Avoid bringing attention to yourself
You can try and conceal your identity online through things like VPNs and Tor Browser. It is important that in identities you have online that you donât provide a means to link them in any way (i.e. a common email) if you donât want someone to be able to develop a âbigger pictureâ about you. For most people, I think the best advice with regards to OPSEC, is to âblend inâ.
I am really not surprised that the usage of common names, dates and pets is as common as it is with passwords. Most people just like to take the lazy approach; that is, the easiest thing for them to remember that will âpass the testâ. Linking closely with this is the re-use of passwords for convenience - however for security this is absolutely terrible. If your password is compromised on one website and your a âworthy targetâ, then everything is compromised.Â
HaveIBeenPwned is actually a pretty cool website to see if youâve been involved in a breach of security. I entered one of my emails, which is more of a âthrowaway oneâ I use for junk-ish accounts on forums and whatnot - it listed that I had been compromised on 11 different websites. I know for a fact that I didnât use the same password on any of those; secondly for most of them I didnât care if they got broken.
I think offline password managers are an âalright wayâ to ensure you have good unique passwords across all the sites you use. (be cautious as they can be a âsingle point of failureâ) However when it comes to a number of my passwords which I think are very important - Iâve found just randomly generating them and memorising them works pretty well. Another way is to form long illogical sentences and then morph them with capitalisation, numbers and symbols. You want to maximise the search space for an attacker - for example if your using all 96 possible characters and you have a 16-character password then a bruteforce approach would require you to check 2^105 different combinations (worst-case).
The way websites store our passwords is also important to the overall security - they definitely shouldnât be stored in plaintext, should use a âsecure hash functionâ (i.e. not MD5) and salted. Iâm pretty sure I ranted about a mobile carrier that I had experiences with earlier in my blog, that didnât do this. This means if the passwords were âinevitablyâ all stolen from the server, the attacker just gets the hashes, and they canât use rainbow tables because you hashed them all. Personally, I really like the usage of multi-factor authentication combined with a good password (provided those services donât get compromised right?). Although, you want to avoid SMS two-factor as itâs vulnerable to SIM hijacking.