How do I get started with CTFs?
This is a pretty common question from people who want to get into the CTF world (or hacking in general) but don't really know where to start. (If you don't even know what a CTF is, or don't understand how capture the flag relates to computer hacking, check out https://ctftime.org/ctf-wtf/ . We mostly do Jeopardy.) There are lots of books and video tutorials out there, but a disappointingly large number of them either teach you one very specific canned tool ("Become a Wireshark Wizard", "Metasploit for Fun and Profit") or are devoid of working examples to practice on, and sometimes both. And practice is really the crucial thing here. This is why I always recommend practice sites and old problems instead (for everyone I know who plays competitively, this is basically where they got a lot of their education). So the below should be a guide for where to get started.
The correct starting place will vary a lot depending on your current background, and what you want to explore. So, starting with the very basics:
0. Know some quick-to-write programming language. Well.
This can be anything reasonably flexible. Nowadays the standard starter language seems to be Python. I deserve some flak for it, but I do most of my CTF work in Java and Mathematica. (The latter is a closed platform, but if you're a college student you can probably get it free, and it's great for big integer actions or algorithmically oriented problems.) C is also fine and will help for more complicated things down the line, but is probably not the most efficient for many easy tasks.
The language doesn't matter so much as being able to quickly do simple tasks in it. As some examples, you should be able to easily:
Read in a file, 4 bytes at a time, and print each of those as a 4-byte integer
Write a matrix multiplication function that works on arbitrary size matrices
Read in a page from a given URL, find all the URLs mentioned that end in ".png", and download those images to a directory
Solve the first 15 problems on Project Euler: https://projecteuler.net/archives
This doesn't mean you need to be able to do all of these right now. I can never remember the syntax for downloading a webpage right away. But you should be able to do each of those above 4 tasks in an hour or less, allowing yourself free access to Google and StackOverflow for anything short of a complete given solution.
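To give a sense of scale for the first task in that list, here is a minimal Python sketch. The function name read_u32_le is made up for illustration, and I'm assuming "4-byte integer" means unsigned little-endian; swap the struct format string if your file is big-endian or signed.

```python
import struct

def read_u32_le(path):
    """Yield each complete 4-byte chunk of a file as an unsigned little-endian integer."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(4)
            if len(chunk) < 4:
                # End of file; a trailing partial chunk (1-3 bytes) is ignored.
                break
            # "<I" means little-endian, unsigned, 4 bytes.
            yield struct.unpack("<I", chunk)[0]
```

Calling `for value in read_u32_le("some_file.bin"): print(value)` prints one integer per line (the file name is a placeholder). If that took you a couple of minutes with the struct docs open, you're in good shape.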
If you're not currently at the stage of knowing such a programming language well, then your education is somewhat beyond the scope of this post for now. Take an intro CS course at your university, or look at the incredibly numerous tutorials online. And again, Project Euler is a great way to start learning about simple little tasks like what you'll often need to do. In particular, you do not need to be familiar with any principles of design or git or anything like that to be a good cracker.
1. Learn basic Linux commands
Like it or not, the world's servers and professional programming environments all run on Linux. And everything interesting you do on Linux is done via the command line. (Not to mention that, usually when you're breaking into another server, you don't have any GUI.)
If you've never used a command line before, then http://ryanstutorials.net/linuxtutorial/ can get you started with that. Once you understand the basic navigation, a nice place to get started playing around is http://overthewire.org/wargames/bandit/ . This will also give you a place to test if you don't have a personal installation of Linux handy.
If you want a copy of Linux on a machine of your own (and you probably will, very quickly!) your main options are installing a virtual machine, dual-booting, or finding a shell server you can get a personal account on. The first one means running Linux inside of your main OS; the second one means installing it like a normal OS, alongside your main one. You can find plenty of tutorials on both of these online, and their advantages/disadvantages. The third option means finding a club or group that has a server you can get an account on. (Good options are your school's computer clubs, CS department, or maybe even your dorm.)
Now that you've got the rudiments, you have a few options for what to pursue. If you want to start "hacking" right away, the most accessible challenges will be those that are just exploring messed up system configurations. These will give you a good familiarity with Linux, but don't expect to see any challenges like this in any CTF. This is what http://overthewire.org/wargames/leviathan/ offers. Spend a bit of time on it if you want. But beyond that, you will need to learn more skills.
Sections 3 through 6 below don't have many example problems. The place I really recommend starting off with is picoCTF. Go to https://2013.picoctf.com/compete#Title ; I made a throwaway account for people reading this tutorial. The Team Name is TumblrTutorialThrowaway, and the password is the sum of all primes up to (and including) 31337. At the time of this posting, only the first few problems are unlocked, but I expect to see them all open up quickly. Go through and try to do lots of problems. They'll give you good introductions to applying the material mentioned below.
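If the password puzzle has you stuck, one way to approach it is a Sieve of Eratosthenes. This is only a sketch (the function name sum_primes_up_to is made up), and it's shown on a small limit so the actual password is still yours to compute.

```python
def sum_primes_up_to(limit):
    """Sum all primes p with 2 <= p <= limit using a Sieve of Eratosthenes."""
    if limit < 2:
        return 0
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for n in range(2, int(limit ** 0.5) + 1):
        if is_prime[n]:
            # Cross off every multiple of n, starting at n*n.
            for multiple in range(n * n, limit + 1, n):
                is_prime[multiple] = False
    return sum(n for n, prime in enumerate(is_prime) if prime)

print(sum_primes_up_to(10))  # 2 + 3 + 5 + 7 = 17
```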
Beyond that set of problems, there are naturally other CTFs still up (newer picoCTFs, HSCTF on the easy end; and lots and lots of competitive ones). The hub for all CTFs is http://ctftime.org/ . Go there, pick a CTF at random (in general higher rating == harder problems), choose a problem with a category you like, and see if it still works. You can expect all problems to still work except for exploitation and web, which require running a server, and so might be dead. These practice problems usually also have writeups, which are great if you have no idea where to start yet.
In no particular order, various skills (which correspond to different problem types) follow, and each of them has no prerequisites on the others unless stated.
2.1 Learn C. Understand memory. Know the stack.
This is where you can start digging into classical "exploitation", often referred to as "pwning". You'll have to learn the C language (that is, not C++) well enough to do tasks like:
Read in a file, read in 4-byte integers, print them out in hexadecimal
Build and safely deallocate a linked list
Recursively navigate a directory structure
Understand what a void * is, the difference between unsigned char and char, and what it means to "overflow a buffer".
Know what fork() or exec() are, give an example use of each
Know what a function pointer is, give an example use
Once you've got that, you can try simple memory exploitation. Terms you'll want to look up are "buffer overflow" and "stack smashing"; you'll also want to understand exactly how the stack is structured. Then you really get started. This is what http://overthewire.org/wargames/narnia/ (and other challenges on the site: Behemoth and Utumno) falls under. http://smashthestack.org/ also has a variety like this, but a lot of it is much harder. An important feature here is that your challenges will have source code. This is not realistic for almost any competition environment.
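Even when the target is a C program, the input that smashes its stack is often built in a scripting language. As a rough sketch: a classic overflow payload is filler bytes up to the saved return address, followed by the address you want execution to jump to. The offset (76) and the address (0x080484B6) below are invented for illustration, and I'm assuming a 32-bit little-endian target; on a real challenge you find both values yourself.

```python
import struct

def build_overflow_payload(offset, new_return_address):
    """Filler up to the saved return address, then the new address (32-bit little-endian)."""
    padding = b"A" * offset
    # "<I" packs the address as 4 bytes, least significant byte first.
    return padding + struct.pack("<I", new_return_address)

# Hypothetical values, not taken from any particular challenge.
payload = build_overflow_payload(offset=76, new_return_address=0x080484B6)
print(payload)
```

Note how the address comes out "backwards" in the byte string. That byte order is one of the first things that trips people up.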
2.2 Learn assembly. Learn to reverse engineer. Prereq: 2.1
This is probably still something you can get from a course at your university, to a small degree. However, that course will be oriented towards writing assembly, not reading it. There are many tutorials out there on reverse engineering assembly. You should learn how to use a disassembler (such as objdump), starting with simple programs you wrote yourself. You should learn how to use gdb for watching a program as it executes. Learn how to read registers and memory; how to set breakpoints. You can now try exploitation problems that don't have source code (like more of Smash the Stack, or OverTheWire's "Maze" category). There are also many problems where reverse engineering is the whole point of the problem. For this, check out a site like http://crackmes.de/ .
If you're getting more serious about this, you'll want a more dedicated disassembler. IDA is the de facto top disassembler. It also has plugins (HexRays) which let it produce a good amount of C code. There are other free alternatives (https://retdec.com/decompilation/) but they're generally less powerful. There are cracks of IDA online, but be careful, of course. It's also worth looking into decompilers for other bytecodes: Python, Java, and .NET decompilers are all good to have on hand for a CTF, especially entry-level ones.
2.3 Advanced exploitation and reversing. Prereq: 2.2
There are many more topics in exploitation that are things you never really need to learn about otherwise. Check out terms like:
ROP (Return-Oriented Programming)
Format string exploitation
ASLR circumvention
Stack canary circumvention
Self-modifying code / polymorphic code (handle this with memory dumps)
Each of these can quickly turn up results online, with old problems to practice on.
3 Cryptography
Obviously this is a whole profession unto itself. But there are many great CTF problems that just require carefully thinking about some math. First thing is to get familiar with how cryptosystems work. Start with simple ones like AES. You don't need to know the exact internals of the "block function", but learn about AES-ECB vs. AES-CBC, for instance, and understand the difference. Then move on to understanding Diffie-Hellman and RSA. Try implementing RSA yourself (encryption and decryption).
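For the RSA exercise, a toy version fits in a few lines. This is a sketch of textbook RSA only: the primes are tiny, there is no padding, and the function names are made up, so it's for understanding and not for protecting anything. It assumes Python 3.8 or newer, where pow() can compute modular inverses.

```python
def make_toy_keypair(p, q, e):
    """Build a textbook RSA key pair from two primes and a public exponent."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # private exponent: inverse of e modulo phi (Python 3.8+)
    return (n, e), (n, d)

def encrypt(public_key, m):
    n, e = public_key
    return pow(m, e, n)  # c = m^e mod n

def decrypt(private_key, c):
    n, d = private_key
    return pow(c, d, n)  # m = c^d mod n

public_key, private_key = make_toy_keypair(p=61, q=53, e=17)
ciphertext = encrypt(public_key, 65)
assert decrypt(private_key, ciphertext) == 65
```

Once this makes sense, a good follow-up is asking what breaks when p and q are small, shared between keys, or close together. Those are exactly the flaws CTF problems like to hand you.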
Next is attacking them. These will be practical attacks that exploit a flat-out problem in the implementation -- not a fancy cryptanalysis attack like linear or differential cryptanalysis. So you don't need to worry about those. Instead, there are certain well-established attacks to be familiar with, such as
Length extension attacks (for hash functions)
Attacks on Electronic Code Book mode encryption
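As a small taste of the ECB item: in ECB mode, identical plaintext blocks encrypt to identical ciphertext blocks under the same key, so repeated blocks in a ciphertext are a strong hint about the mode. Here is a sketch of that check. The function names are made up, and the 16-byte block size assumes AES.

```python
from collections import Counter

def count_repeated_blocks(ciphertext, block_size=16):
    """Count how many blocks are duplicates of an earlier block."""
    blocks = [ciphertext[i:i + block_size]
              for i in range(0, len(ciphertext), block_size)]
    counts = Counter(blocks)
    return sum(c - 1 for c in counts.values() if c > 1)

def looks_like_ecb(ciphertext, block_size=16):
    """Heuristic: any repeated block suggests ECB (random-looking data rarely repeats)."""
    return count_repeated_blocks(ciphertext, block_size) > 0
```

This is only a detector. The actual attacks (byte-at-a-time decryption, cut-and-paste of blocks) build on the same observation.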
4 Web
Web problems are a classic groaner. They abound. So you can, for starters,
Learn some PHP, learn its most classic bugs. Learn to abuse them.
Learn XSS attacks. These have many example problems and tutorials online, but are often viewed as too easy for CTFs. There are more advanced attacks, though, like
Command-injection attacks
Misconfigured file permissions (.htaccess, .htpasswd, directory listings)
Directory traversal attacks
PHP object injection attacks
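To make one of these concrete, here is a sketch of directory traversal. A server that glues user input onto a base directory can be walked out of that directory with "../". The paths and function names below are hypothetical, and I use posixpath so the behavior is the same on any OS.

```python
import posixpath

def naive_resolve(base_dir, user_path):
    """Vulnerable: joins attacker-controlled input straight onto the base directory."""
    return posixpath.normpath(posixpath.join(base_dir, user_path))

def stays_inside(base_dir, user_path):
    """Check that the resolved path is still under base_dir."""
    base = posixpath.normpath(base_dir)
    resolved = naive_resolve(base, user_path)
    return posixpath.commonpath([base, resolved]) == base

print(naive_resolve("/var/www/files", "../../../etc/passwd"))  # /etc/passwd
print(stays_inside("/var/www/files", "../../../etc/passwd"))   # False
print(stays_inside("/var/www/files", "img/cat.png"))           # True
```

A real server would also have to worry about symlinks, URL encoding, and similar tricks, which this purely string-based check ignores. Those gaps are where the CTF problems live.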
5 Language-specific challenges
There are some challenges where you just have to know the specifics of a language well. The classic examples of these are "jails": a program uses an eval unwisely and you have to abuse it. For some examples with Python, you can try the picoCTF tutorial at https://2013.picoctf.com//problems/pyeval/stage1.html (increment the number in the URL for harder versions, up to stage 4). A more professional-level challenge is at https://blog.inexplicity.de/plaidctf-2013-pyjail-writeup-part-i-breaking-the-sandbox.html . If you'd like to see an example in a different language, there's a PHP jail at http://blog.dornea.nu/2016/06/20/ringzer0-ctf-jail-escaping-php/ . There are also jails in JavaScript, Ruby, Bash, or any language with an eval command.
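To see why jails are hard to build, here is a sketch of a deliberately weak Python one (naive_jail is a made-up name, not taken from any of the linked problems). It removes the builtins, which blocks the obvious calls, but attribute access still works, so an empty tuple is enough to reach every class loaded in the interpreter.

```python
def naive_jail(expression):
    """A deliberately weak sandbox: eval with all builtins removed."""
    return eval(expression, {"__builtins__": {}}, {})

# Using a builtin by name fails, because the name lookup finds nothing...
try:
    naive_jail("open('/etc/passwd')")
except NameError as exc:
    print("blocked:", exc)

# ...but () -> tuple -> object -> all subclasses of object is still reachable.
classes = naive_jail("().__class__.__base__.__subclasses__()")
print(len(classes), "classes reachable from an empty tuple")
```

From that list of classes, real escapes go hunting for one that leads back to file or process access. Which one works depends on the Python version, and finding it is most of the fun.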
Then there are deserialization problems (Java's ObjectInputStream, or Python's pickle system), which often require similar knowledge of the language. There are problems just about understanding really odd languages: Can you understand Brainfuck? What about Glass (https://esolangs.org/wiki/Glass)? Can you read obfuscated Erlang, or reverse engineer Haskell that was compiled down to ARM assembly?
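For the pickle case, the core idea fits in a few lines: an object can tell pickle which callable to invoke when it is loaded. The sketch below uses the harmless len as a stand-in, and the class name Payload is made up. An attacker would pick something far less friendly.

```python
import pickle

class Payload:
    def __reduce__(self):
        # Tells pickle: "to rebuild me, call len('attacker chose this call')".
        return (len, ("attacker chose this call",))

blob = pickle.dumps(Payload())

# Whoever loads the blob runs the call chosen by whoever created it.
result = pickle.loads(blob)
print(result)
```

Notice that loading does not even give back a Payload object; it gives back whatever the chosen call returned. That is why unpickling untrusted data is treated as code execution.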
6 Forensics and Miscellaneous
These are pretty open ended. A lot of this comes from knowing steganography (low-bit encoding, for instance), file carving, and file system structure (if I give you the middle half of an .iso, can you read it?). Some of it comes from network knowledge (can you use Wireshark?). A lot of it comes from knowing the internals of certain file structures (comments sections in JPEGs, PNGs, or MP3s; how a PNG compresses data). And a lot of it is random bullshit. Really the best option is to read some documentation on file structure, and try old problems: https://ctftime.org/tasks/?tags=misc&hidden-tags=misc
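As an example of the low-bit encoding mentioned above, here is a sketch that hides a message in the least significant bit of each byte of some cover data, then reads it back. The cover here is just a byte string standing in for raw pixel or audio samples, and the function names and bit order (most significant bit first) are my own choices; real challenges vary on all of these.

```python
def embed_lsb(cover, message):
    """Hide message bits (most significant bit first) in the lowest bit of each cover byte."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover too small for message")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit  # clear the low bit, then set it to the message bit
    return bytes(stego)

def extract_lsb(stego, message_length):
    """Recover message_length bytes from the low bits of stego."""
    out = bytearray()
    for i in range(message_length):
        value = 0
        for byte in stego[i * 8:(i + 1) * 8]:
            value = (value << 1) | (byte & 1)
        out.append(value)
    return bytes(out)

cover = bytes(range(256))  # stand-in for raw image samples
stego = embed_lsb(cover, b"flag{lsb}")
assert extract_lsb(stego, 9) == b"flag{lsb}"
```

Each cover byte changes by at most 1, which is why this kind of hiding is invisible to the eye in an image.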
Feel free to email us! [email protected]. We'll try to point you in the right direction. :)