2019-8-2 17:09 2032


2019-8-2 17:09
本章是二进制漏洞利用的第四篇,原文地址: https://azeria-labs.com/process-memory-and-memory-corruption/


The prerequisite for this part of the tutorial is a basic understanding of ARM assembly (covered in the first tutorial series “ARM Assembly Basics“). In this chapter you will get an introduction into the memory layout of a process in a 32-bit Linux environment. After that you will learn the fundamentals of Stack and Heap related memory corruptions and how they look like in a debugger.


The examples used in this tutorial are compiled on an ARMv6 32-bit processor. If you don’t have access to an ARM device, you can create your own lab and emulate a Raspberry Pi distro in a VM by following this tutorial: Emulate Raspberry Pi with QEMU. The debugger used here is GDB with GEF (GDB Enhanced Features). If you aren’t familiar with these tools, you can check out this tutorial: Debugging with GDB and GEF.

本教程所使用的例子是在ARMv6 32位处理器上编译的,如果你手头没有一个ARM设备,你可以创建自己的实验环境或是使用在虚拟机中创建的虚拟树莓派环境,你可以参考这个教程“使用QEMU模拟树莓派”,这里的调试器使用的GEF(GDB增强组件)加持下的GDB调试器,如果你仍不熟悉这些工具,你可以参考这篇教程:使用GDB和GEF调试


Every time we start a program, a memory area for that program is reserved. This area is then split into multiple regions. Those regions are then split into even more regions (segments), but we will stick with the general overview. So, the parts we are interested are:


1.Program Image



1. 程序镜像

2.  堆区

3.  栈区

In the picture below we can see a general representation of how those parts are laid out within the process memory. The addresses used to specify memory regions are just for the sake of an example, because they will differ from environment to environment, especially when ASLR is used.


Program Image region basically holds the program’s executable file which got loaded into the memory. This memory region can be split into various segments: .plt, .text, .got, .data, .bss and so on. These are the most relevant. For example, .text contains the executable part of the program with all the Assembly instructions, .data and .bss holds the variables or pointers to variables used in the application, .plt and .got stores specific pointers to various imported functions from, for example, shared libraries. From a security standpoint, if an attacker could affect the integrity (rewrite) of the .text section, he could execute arbitrary code. Similarly, corruption of Procedure Linkage Table (.plt) and Global Offsets Table (.got) could under specific circumstances lead to execution of arbitrary code.


.plt, .text, .got, .data, .bss以及其他段,这些是最相关的。比如.text段保存了包含了汇编指令的程序的可执行部分,.data和.bss段保存了变量或程序中被使用到的变量的指针,.plt和.got段存储了例如从共享库中导入的函数的函数指针,从安全的角度看,如果攻击者能影响到文件的完整性(重写).text段,我们就可以执行任意代码。同样的,进程连接表(.plt)和全局偏移表(.got)的损坏也能导致在特定环境下执行任意代码

The Stack and Heap regions are used by the application to store and operate on temporary data (variables) that are used during the execution of the program. These regions are commonly exploited by attackers, because data in the Stack and Heap regions can often be modified by the user’s input, which, if not handled properly, can cause a memory corruption. We will look into such cases later in this chapter.


In addition to the mapping of the memory, we need to be aware of the attributes associated with different memory regions. A memory region can have one or a combination of the following attributes: Read, Write, Execute. The Read attribute allows the program to read data from a specific region. Similarly, Write allows the program to write data into a specific memory region, and Execute – execute instructions in that memory region. We can see the process memory regions in GEF (a highly recommended extension for GDB) as shown below:


azeria@labs:~/exp $ gdb program
gef> gef config context.layout "code"
gef> break main
Breakpoint 1 at 0x104c4: file program.c, line 6.
gef> run
gef> nexti 2
-----------------------------------------------------------------------------------------[ code:arm ]----
      0x104c4 <main+20>        mov    r0,  #8
      0x104c8 <main+24>        bl     0x1034c <malloc@plt>
->    0x104cc <main+28>        mov    r3,  r0
      0x104d0 <main+32>        str    r3,  [r11,  #-8]
gef> vmmap
Start      End        Offset     Perm Path
0x00010000 0x00011000 0x00000000 r-x /home/azeria/exp/program <---- Program Image
0x00020000 0x00021000 0x00000000 rw- /home/azeria/exp/program <---- Program Image continues...
0x00021000 0x00042000 0x00000000 rw- [heap] <---- HEAP
0xb6e74000 0xb6f9f000 0x00000000 r-x /lib/arm-linux-gnueabihf/libc-2.19.so <---- Shared library (libc)
0xb6f9f000 0xb6faf000 0x0012b000 --- /lib/arm-linux-gnueabihf/libc-2.19.so <---- libc continues...
0xb6faf000 0xb6fb1000 0x0012b000 r-- /lib/arm-linux-gnueabihf/libc-2.19.so <---- libc continues...
0xb6fb1000 0xb6fb2000 0x0012d000 rw- /lib/arm-linux-gnueabihf/libc-2.19.so <---- libc continues...
0xb6fb2000 0xb6fb5000 0x00000000 rw-
0xb6fcc000 0xb6fec000 0x00000000 r-x /lib/arm-linux-gnueabihf/ld-2.19.so <---- Shared library (ld)
0xb6ffa000 0xb6ffb000 0x00000000 rw-
0xb6ffb000 0xb6ffc000 0x0001f000 r-- /lib/arm-linux-gnueabihf/ld-2.19.so <---- ld continues...
0xb6ffc000 0xb6ffd000 0x00020000 rw- /lib/arm-linux-gnueabihf/ld-2.19.so <---- ld continues...
0xb6ffd000 0xb6fff000 0x00000000 rw-
0xb6fff000 0xb7000000 0x00000000 r-x [sigpage]
0xbefdf000 0xbf000000 0x00000000 rw- [stack] <---- STACK
0xffff0000 0xffff1000 0x00000000 r-x [vectors]

the Heap section in the vmmap command output appears only after some Heap related function was used. In this case we see the malloc function being used to create a buffer in the Heap region. So if you want to try this out, you would need to debug a program that makes a malloc call (you can find some examples in this page, scroll down or use find function).

Additionally, in Linux we can inspect the process’ memory layout by accessing a process-specific “file”:


azeria@labs:~/exp $ ps aux | grep program
azeria   31661 12.3 12.1  38680 30756 pts/0    S+   23:04   0:10 gdb program
azeria   31665  0.1  0.2   1712   748 pts/0    t    23:04   0:00 /home/azeria/exp/program
azeria   31670  0.0  0.7   4180  1876 pts/1    S+   23:05   0:00 grep --color=auto program
azeria@labs:~/exp $ cat /proc/31665/maps
00010000-00011000 r-xp 00000000 08:02 274721     /home/azeria/exp/program
00020000-00021000 rw-p 00000000 08:02 274721     /home/azeria/exp/program
00021000-00042000 rw-p 00000000 00:00 0          [heap]
b6e74000-b6f9f000 r-xp 00000000 08:02 132394     /lib/arm-linux-gnueabihf/libc-2.19.so
b6f9f000-b6faf000 ---p 0012b000 08:02 132394     /lib/arm-linux-gnueabihf/libc-2.19.so
b6faf000-b6fb1000 r--p 0012b000 08:02 132394     /lib/arm-linux-gnueabihf/libc-2.19.so
b6fb1000-b6fb2000 rw-p 0012d000 08:02 132394     /lib/arm-linux-gnueabihf/libc-2.19.so
b6fb2000-b6fb5000 rw-p 00000000 00:00 0
b6fcc000-b6fec000 r-xp 00000000 08:02 132358     /lib/arm-linux-gnueabihf/ld-2.19.so
b6ffa000-b6ffb000 rw-p 00000000 00:00 0
b6ffb000-b6ffc000 r--p 0001f000 08:02 132358     /lib/arm-linux-gnueabihf/ld-2.19.so
b6ffc000-b6ffd000 rw-p 00020000 08:02 132358     /lib/arm-linux-gnueabihf/ld-2.19.so
b6ffd000-b6fff000 rw-p 00000000 00:00 0
b6fff000-b7000000 r-xp 00000000 00:00 0          [sigpage]
befdf000-bf000000 rw-p 00000000 00:00 0          [stack]
ffff0000-ffff1000 r-xp 00000000 00:00 0          [vectors]

Most programs are compiled in a way that they use shared libraries. Those libraries are not part of the program image (even though it is possible to include them via static linking) and therefore have to be referenced (included) dynamically. As a result, we see the libraries (libc, ld, etc.) being loaded in the memory layout of a process. Roughly speaking, the shared libraries are loaded somewhere in the memory (outside of process’ control) and our program just creates virtual “links” to that memory region. This way we save memory without the need to load the same library in every instance of a program.



A memory corruption is a software bug type that allows to modify the memory in a way that was not intended by the programmer. In most cases, this condition can be exploited to execute arbitrary code, disable security mechanisms, etc. This is done by crafting and injecting a payload which alters certain memory sections of a running program. The following list contains the most common memory corruption types/vulnerabilities:


Buffer Overflows

Stack Overflow

Heap Overflow

Dangling Pointer (Use-after-free)

Format String

1.       缓冲区溢出

  a)         栈溢出

  b)         堆溢出

2.       指针悬挂(释放后再次利用UAF)

3.       格式化字符串

In this chapter we will try to get familiar with the basics of Buffer Overflow memory corruption vulnerabilities (the remaining ones will be covered in the next chapter). In the examples we are about to cover we will see that the main cause of memory corruption vulnerabilities is an improper user input validation, sometimes combined with a logical flaw. For a program, the input (or a malicious payload) might come in a form of a username, file to be opened, network packet, etc. and can often be influenced by the user. If a programmer did not put safety measures for potentially harmful user input it is often the case that the target program will be subject to some kind of memory related issue.



Buffer overflows are one of the most widespread memory corruption classes and are usually caused by a programming mistake which allows the user to supply more data than there is available for the destination variable (buffer). This happens, for example, when vulnerable functions, such as gets, strcpy, memcpy or others are used along with data supplied by the user. These functions do not check the length of the user’s data which can result into writing past (overflowing) the allocated buffer. To get a better understanding, we will look into basics of Stack and Heap based buffer overflows.


Stack Overflow

Stack overflow, as the name suggests, is a memory corruption affecting the Stack. While in most cases arbitrary corruption of the Stack would most likely result in a program’s crash, a carefully crafted Stack buffer overflow can lead to arbitrary code execution. The following picture shows an abstract overview of how the Stack can get corrupted.


As you can see in the picture above, the Stack frame (a small part of the whole Stack dedicated for a specific function) can have various components: user data, previous Frame Pointer, previous Link Register, etc. In case the user provides too much of data for a controlled variable, the FP and LR fields might get overwritten. This breaks the execution of the program, because the user corrupts the address where the application will return/jump after the current function is finished.


To check how it looks like in practice we can use this example:


/*azeria@labs:~/exp $ gcc stack.c -o stack*/
#include "stdio.h"

int main(int argc, char **argv)
char buffer[8];

Our sample program uses the variable “buffer”, with the length of 8 characters, and a function “gets” for user’s input, which simply sets the value of the variable “buffer” to whatever input the user provides. The disassembled code of this program looks like the following:


Here we suspect that a memory corruption could happen right after the function “gets” is completed. To investigate this, we place a break-point right after the branch instruction that calls the “gets” function – in our case, at address 0x0001043c. To reduce the noise we configure GEF’s layout to show us only the code and the Stack (see the command in the picture below). Once the break-point is set, we proceed with the program and provide 7 A’s as the user’s input (we use 7 A’s, because a null-byte will be automatically appended by function “gets”).

我们怀疑gets函数执行完成乎可能会导致内存损坏。为了研究这个问题,在gets函数的分支指令后下一个断点。本例中是在地址 0x0001043c处下断。为了减少干扰我们将GEF的布局设置为只显示代码和堆栈。当断点下好后,我们继续运行程序并输入7个A作为用户输入(我们使用7A,因为一个空字节将被函数“get”自动附加)

When we investigate the Stack of our example we see (image above) that the Stack frame is not corrupted. This is because the input supplied by the user fits in the expected 8 byte buffer and the previous FP and LR values within the Stack frame are not corrupted. Now let’s provide 16 A’s and see what happens.


In the second example we see (image above) that when we provide too much of data for the function “gets”, it does not stop at the boundaries of the target buffer and keeps writing “down the Stack”. This causes our previous FP and LR values to be corrupted. When we continue running the program, the program crashes (causes a “Segmentation fault”), because during the epilogue of the current function the previous values of FP and LR are “poped” off the Stack into R11 and PC registers forcing the program to jump to address 0x41414140 (last byte gets automatically converted to 0x40 because of the switch to Thumb mode), which in this case is an illegal address. The picture below shows us the values of the registers (take a look at $pc) at the time of the crash.

第二个例子中,可见(上图)当我们使用gets函数提供 了过多的数据时,它并没有停止在目标缓冲区的边界处而是一直向栈下方写了下去,从而导致之前的FP值和LR值被损坏。当我们继续运行程序时会产生崩溃(导致“段错误”),因为当执行到当前函数的尾声阶段时,之前的FP和LR的值会被弹出栈分别进入R11和PC寄存器,从而强制让程序跳转到地址0x41414140(最后一个字节由于切换到Thumb模式,会自动转换成0x40),在本例中显然这是一个不合法的地址。这张提下面展示了崩溃时寄存器的值(看下PC)。

Heap Overflow

First of all, Heap is a more complicated memory location, mainly because of the way it is managed. To keep things simple, we stick with the fact that every object placed in the Heap memory section is “packed” into a “chunk” having two parts: header and user data (which sometimes the user controls fully). In the Heap’s case, the memory corruption happens when the user is able to write more data than is expected. In that case, the corruption might happen within the chunk’s boundaries (intra-chunk Heap overflow), or across the boundaries of two (or more) chunks (inter-chunk Heap overflow). To put things in perspective, let’s take a look at the following illustration.


As shown in the illustration above, the intra-chunk heap overflow happens when the user has the ability to supply more data to u_data_1 and cross the boundary between u_data_1 and u_data_2. In this way the fields/properties of the current object get corrupted. If the user supplies more data than the current Heap chunk can accommodate, then the overflow becomes inter-chunk and results into a corruption of the adjacent chunk(s).

如上图所示,当用户与能力向u_data_1 提供更多数据并跨越到u_data_1 和 u_data_2之间时,发生块间堆溢出。这样当前对象的字段/属性会损坏。如果用户提供的数据超过当前堆块能容纳的,那么溢出将变成块间溢出并且导致相邻块的损坏。

Intra-chunk Heap overflow

To illustrate how an intra-chunk Heap overflow looks like in practice we can use the following example and compile it with “-O” (optimization flag) to have a smaller (binary) program (easier to look through).


/*azeria@labs:~/exp $ gcc intra_chunk.c -o intra_chunk -O*/
#include "stdlib.h"
#include "stdio.h"

struct u_data                                          //object model: 8 bytes for name, 4 bytes for number
 char name[8];
 int number;

int main ( int argc, char* argv[] )
 struct u_data* objA = malloc(sizeof(struct u_data)); //create object in Heap

 objA->number = 1234;                                 //set the number of our object to a static value
 gets(objA->name);                                    //set name of our object according to user's input

 if(objA->number == 1234)                             //check if static value is intact
  puts("Memory valid");
 else                                                 //proceed here in case the static value gets corrupted
  puts("Memory corrupted");

The program above does the following:

Defines a data structure (u_data) with two fields

Creates an object (in the Heap memory region) of type u_data

Assigns a static value to the number’s field of the object

Prompts user to supply a value for the name’s field of the object

Prints a string depending on the value of the number’s field


1.       定义了一个具有两个字段的结构体u_data

2.       在堆中创建了一个u_data类型的对象

3.       给对象的数字字段分配一个静态值

4.       提示用户给对象的名称字段提供一个值

5.       根据数字字段的的值打印字符串

So in this case we also suspect that the corruption might happen after the function “gets”. We disassemble the target program’s main function to get the address for a break-point.


In this case we set the break-point at address 0x00010498 – right after the function “gets” is completed. We configure GEF to show us the code only. We then run the program and provide 7 A’s as a user input.


Once the break-point is hit, we quickly lookup the memory layout of our program to find where our Heap is. We use vmmap command and see that our Heap starts at the address 0x00021000. Given the fact that our object (objA) is the first and the only one created by the program, we start analyzing the Heap right from the beginning.


The picture above shows us a detailed break down of the Heap’s chunk associated with our object. The chunk has a header (8 bytes) and the user’s data section (12 bytes) storing our object. We see that the name field properly stores the supplied string of 7 A’s, terminated by a null-byte. The number field, stores 0x4d2 (1234 in decimal). So far so good. Let’s repeat these steps, but in this case enter 8 A’s.


While examining the Heap this time we see that the number’s field got corrupted (it’s now equal to 0x400 instead of 0x4d2). The null-byte terminator overwrote a portion (last byte) of the number’s field. This results in an intra-chunk Heap memory corruption. Effects of such a corruption in this case are not devastating, but visible. Logically, the else statement in the code should never be reached as the number’s field is intended to be static. However, the memory corruption we just observed makes it possible to reach that part of the code. This can be easily confirmed by the example below.


Inter-chunk Heap overflow

To illustrate how an inter-chunk Heap overflow looks like in practice we can use the following example, which we now compile withoutoptimization flag.


/*azeria@labs:~/exp $ gcc inter_chunk.c -o inter_chunk*/
#include "stdlib.h"
#include "stdio.h"

int main ( int argc, char* argv[] )
 char *some_string = malloc(8);  //create some_string "object" in Heap 在堆中创建some_string
 int *some_number = malloc(4);   //create some_number "object" in Heap在堆中创建some_ number

 *some_number = 1234;            //assign some_number a static value 给some_number赋静态值
 gets(some_string);              //ask user for input for some_string 请求用户输入并赋值给some_string

 if(*some_number == 1234)        //check if static value (of some_number) is in tact
 puts("Memory valid");
 else                            //proceed here in case the static some_number gets corrupted
 puts("Memory corrupted");

The process here is similar to the previous ones: set a break-point after function “gets”, run the program, supply 7 A’s, investigate Heap.


Once the break-point is hit, we examine the Heap. In this case, we have two chunks. We see (image below) that their structure is in tact: the some_string is within its boundaries, the some_number is equal to 0x4d2.


Now, let’s supply 16 A’s and see what happens.


As you might have guessed, providing too much of input causes the overflow resulting into corruption of the adjacent chunk. In this case we see that our user input corrupted the header and the first byte of the some_number’s field. Here again, by corrupting the some_number we manage to reach the code section which logically should never be reached.



最后于 2019-8-4 14:11 被amzilun编辑 ,原因:
打赏 + 5.00
打赏次数 1 金额 + 5.00
赞赏  junkboy   +5.00 2019/08/02
最新回复 (0)
登录 | 注册 方可回帖