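PE is short for Portable Executable. PE is the dominant executable file format on 32-bit and 64-bit Windows; the most common PE files are EXE and DLL files. The PE data structures are defined in C:\Program Files (x86)\Microsoft SDKs\Windows\v7.1A\Include\WinNT.h. In file order, a PE file consists of the DOS header, the DOS stub, the PE header, the section table, and the sections.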
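DOS Header
Every PE file begins with a small DOS program. When the file is run under DOS, DOS recognizes it as a valid executable and runs the DOS stub, which on systems that do not support PE files typically prints "This program cannot be run in DOS mode". A PE file starts with an IMAGE_DOS_HEADER (the MZ header), whose structure is as follows: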
typedef struct _IMAGE_DOS_HEADER { // DOS .EXE header
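The DOS stub follows the DOS MZ header; under DOS, the code in the stub runs once the MZ header has been recognized. Take a simple "Hello, world" program as an example. After compiling it with g++, open the executable in a hex editor: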
00000000h: 4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00 ; MZ..............
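e_magic is 4D 5A, which corresponds to the characters MZ; WinNT.h defines it as IMAGE_DOS_SIGNATURE (0x5A4D).
At file offset 3C, e_lfanew gives the position of the PE header. Its value is 00 00 00 80, and at offset 80 the characters PE can indeed be seen.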
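The two checks above (the MZ magic at offset 0 and e_lfanew at offset 3C) can be sketched in a few lines of Python. This is only an illustration: the function name `read_e_lfanew` is made up for this example, and the test buffer is synthetic. It reuses the first 16 bytes of the dump above, sets e_lfanew to 0x80, and fills everything else with zeros.

```python
import struct

IMAGE_DOS_SIGNATURE = 0x5A4D  # "MZ" read as a little-endian WORD


def read_e_lfanew(image: bytes) -> int:
    """Check the MZ magic and return e_lfanew (offset of the PE header)."""
    if len(image) < 0x40:
        raise ValueError("too short to hold an IMAGE_DOS_HEADER")
    (e_magic,) = struct.unpack_from("<H", image, 0)
    if e_magic != IMAGE_DOS_SIGNATURE:
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    return e_lfanew


# Synthetic buffer (assumption): real first 16 bytes, zero padding elsewhere.
header = bytearray(0x84)
header[0:16] = bytes.fromhex("4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00")
struct.pack_into("<I", header, 0x3C, 0x80)  # e_lfanew = 0x80, as in the text
header[0x80:0x84] = b"PE\0\0"               # PE signature at that offset

pe_offset = read_e_lfanew(bytes(header))
print(hex(pe_offset), bytes(header[pe_offset:pe_offset + 4]))
```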
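Disassembling the stub with IDA shows that it calls int 21h with function 9, which writes the string at DS:DX to standard output. The $-terminated string "This program cannot be run in DOS mode" is located at offset 0E, relative to the start of the stub. The stub then executes
mov ax, 4C01h
to terminate the program.
LordPE shows the complete contents of the DOS header: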
->DOS Header
PE Header
The structure of the PE header is as follows:
typedef struct _IMAGE_NT_HEADERS {
00000080h: 50 45 00 00 4C 01 08 00 F4 D1 D7 57 00 B4 00 00 ; PE..L......W....
Signature
A 4-byte signature identifying the file as a PE image. It is defined as #define IMAGE_NT_SIGNATURE 0x00004550 // PE00, in the same place as IMAGE_DOS_SIGNATURE.
FileHeader
typedef struct _IMAGE_FILE_HEADER {
- Machine The architecture type of the computer. An image file can only be run on the specified computer or a system that emulates the specified computer. Common values are:
Value | Meaning |
---|---|
0x014c | Intel 386 (x86) |
0x0200 | Intel Itanium (IA-64) |
0x8664 | x64 (AMD64) |
Here the value is 01 4C (stored little-endian as 4C 01). More values are defined in WinNT.h as IMAGE_FILE_MACHINE_* constants.
NumberOfSections The number of sections. This indicates the size of the section table, which immediately follows the headers. Note that the Windows loader limits the number of sections to 96.
TimeDateStamp The low 32 bits of the time stamp of the image. This represents the date and time the image was created by the linker. The value is represented in the number of seconds elapsed since midnight (00:00:00), January 1, 1970, Universal Coordinated Time, according to the system clock.
PointerToSymbolTable The offset of the symbol table, in bytes, or zero if no COFF symbol table exists.
NumberOfSymbols The number of symbols in the symbol table.
SizeOfOptionalHeader The size of the optional header, in bytes. This value should be 0 for object files.
Characteristics The attributes of the file, a bit mask of the IMAGE_FILE_* flags defined in WinNT.h.
The complete contents of the FileHeader:
->File Header
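As a rough illustration of the fields described above, the 16 bytes shown at offset 80h are enough to decode the signature and the first three FileHeader members. The helper name `parse_nt_prefix` is invented for this sketch; it does not read the remaining FileHeader fields because the dump does not include all of them.

```python
import struct
from datetime import datetime, timezone

# The 16 bytes at file offset 0x80 from the dump above.
NT_PREFIX = bytes.fromhex("50 45 00 00 4C 01 08 00 F4 D1 D7 57 00 B4 00 00")


def parse_nt_prefix(data: bytes) -> dict:
    """Decode Signature, Machine, NumberOfSections and TimeDateStamp."""
    signature, machine, sections, stamp = struct.unpack_from("<4sHHI", data, 0)
    if signature != b"PE\0\0":
        raise ValueError("missing PE signature")
    return {
        "machine": machine,
        "number_of_sections": sections,
        "time_date_stamp": stamp,
        # TimeDateStamp counts seconds since 1970-01-01 00:00:00 UTC.
        "linked_at": datetime.fromtimestamp(stamp, tz=timezone.utc),
    }


info = parse_nt_prefix(NT_PREFIX)
print(hex(info["machine"]), info["number_of_sections"], info["linked_at"])
```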
OptionalHeader
typedef struct _IMAGE_OPTIONAL_HEADER {
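- Magic The state of the image file: 0x10b for a PE32 image (IMAGE_NT_OPTIONAL_HDR32_MAGIC), 0x20b for a PE32+ image (IMAGE_NT_OPTIONAL_HDR64_MAGIC), or 0x107 for a ROM image (IMAGE_ROM_OPTIONAL_HDR_MAGIC).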
MajorLinkerVersion The major version number of the linker.
MinorLinkerVersion The minor version number of the linker.
SizeOfCode The size of the code section, in bytes, or the sum of all such sections if there are multiple code sections.
SizeOfInitializedData The size of the initialized data section, in bytes, or the sum of all such sections if there are multiple initialized data sections.
SizeOfUninitializedData The size of the uninitialized data section, in bytes, or the sum of all such sections if there are multiple uninitialized data sections.
AddressOfEntryPoint A pointer to the entry point function, relative to the image base address. For executable files, this is the starting address. For device drivers, this is the address of the initialization function. The entry point function is optional for DLLs. When no entry point is present, this member is zero.
BaseOfCode A pointer to the beginning of the code section, relative to the image base.
BaseOfData A pointer to the beginning of the data section, relative to the image base.
ImageBase The preferred address of the first byte of the image when it is loaded in memory. This value is a multiple of 64K bytes. The default value for DLLs is 0x10000000. The default value for applications is 0x00400000, except on Windows CE where it is 0x00010000.
SectionAlignment The alignment of sections loaded in memory, in bytes. This value must be greater than or equal to the FileAlignment member. The default value is the page size for the system.
FileAlignment The alignment of the raw data of sections in the image file, in bytes. The value should be a power of 2 between 512 and 64K (inclusive). The default is 512. If the SectionAlignment member is less than the system page size, this member must be the same as SectionAlignment.
MajorOperatingSystemVersion The major version number of the required operating system.
MinorOperatingSystemVersion The minor version number of the required operating system.
MajorImageVersion The major version number of the image.
MinorImageVersion The minor version number of the image.
MajorSubsystemVersion The major version number of the subsystem.
MinorSubsystemVersion The minor version number of the subsystem.
Win32VersionValue This member is reserved and must be 0.
SizeOfImage The size of the image, in bytes, including all headers. Must be a multiple of SectionAlignment.
- SizeOfHeaders The combined size of the following items, rounded to a multiple of the value specified in the FileAlignment member.
- e_lfanew member of IMAGE_DOS_HEADER
- 4 byte signature
- size of IMAGE_FILE_HEADER
- size of optional header
- size of all section headers
CheckSum The image file checksum. The following files are validated at load time: all drivers, any DLL loaded at boot time, and any DLL loaded into a critical system process.
Subsystem The subsystem required to run this image.
// Subsystem Values
- DllCharacteristics The DLL characteristics of the image.
// DllCharacteristics Entries
SizeOfStackReserve The number of bytes to reserve for the stack. Only the memory specified by the SizeOfStackCommit member is committed at load time; the rest is made available one page at a time until this reserve size is reached.
SizeOfStackCommit The number of bytes to commit for the stack.
SizeOfHeapReserve The number of bytes to reserve for the local heap. Only the memory specified by the SizeOfHeapCommit member is committed at load time; the rest is made available one page at a time until this reserve size is reached.
SizeOfHeapCommit The number of bytes to commit for the local heap.
LoaderFlags This member is obsolete.
NumberOfRvaAndSizes The number of directory entries in the remainder of the optional header. Each entry describes a location and size.
DataDirectory A pointer to the first IMAGE_DATA_DIRECTORY structure in the data directory.
The fields of the OptionalHeader are as follows:
->Optional Header
DataDirectory
The relevant definition is as follows:
//
// Directory format.
//
typedef struct _IMAGE_DATA_DIRECTORY {
    DWORD VirtualAddress; // The relative virtual address of the table.
    DWORD Size; // The size of the table, in bytes.
} IMAGE_DATA_DIRECTORY, *PIMAGE_DATA_DIRECTORY;
The data directory entries are listed below; the offsets are relative to the start of the OptionalHeader:
Offset (PE32/PE32+) | Description |
---|---|
60h/70h | Export table address and size |
68h/78h | Import table address and size |
70h/80h | Resource table address and size |
78h/88h | Exception table address and size |
80h/90h | Certificate table address and size |
88h/98h | Base relocation table address and size |
90h/A0h | Debugging information starting address and size |
98h/A8h | Architecture-specific data address and size |
A0h/B0h | Global pointer register relative virtual address |
A8h/B8h | Thread local storage (TLS) table address and size |
B0h/C0h | Load configuration table address and size |
B8h/C8h | Bound import table address and size |
C0h/D0h | Import address table address and size |
C8h/D8h | Delay import descriptor address and size |
D0h/E0h | The CLR header address and size |
D8h/E8h | Reserved |
The entries are as follows:
DataDirectory (16) RVA Size
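The offsets in the table follow a simple pattern: each IMAGE_DATA_DIRECTORY is 8 bytes, and the array starts at 60h in a PE32 optional header or 70h in a PE32+ one. A small sketch, with the helper name `data_directory_offset` chosen for illustration only:

```python
DATA_DIRECTORY_NAMES = [
    "Export table", "Import table", "Resource table", "Exception table",
    "Certificate table", "Base relocation table", "Debugging information",
    "Architecture-specific data", "Global pointer register",
    "Thread local storage (TLS) table", "Load configuration table",
    "Bound import table", "Import address table",
    "Delay import descriptor", "CLR header", "Reserved",
]

ENTRY_SIZE = 8  # sizeof(IMAGE_DATA_DIRECTORY): two DWORDs


def data_directory_offset(index: int, pe32_plus: bool = False) -> int:
    """Offset of DataDirectory[index] from the start of the OptionalHeader."""
    if not 0 <= index < len(DATA_DIRECTORY_NAMES):
        raise IndexError("data directory index out of range")
    first = 0x70 if pe32_plus else 0x60
    return first + ENTRY_SIZE * index


for i, name in enumerate(DATA_DIRECTORY_NAMES):
    print(f"{data_directory_offset(i):02X}h/"
          f"{data_directory_offset(i, True):02X}h  {name}")
```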
Section Table
The section table immediately follows the PE header.
00000170h: 00 00 00 00 00 00 00 00 2E 74 65 78 74 00 00 00 ; .........text...
//
Name An 8-byte, null-padded UTF-8 string. There is no terminating null character if the string is exactly eight characters long. For longer names, this member contains a forward slash (/) followed by an ASCII representation of a decimal number that is an offset into the string table. Executable images do not use a string table and do not support section names longer than eight characters.
- Misc
- PhysicalAddress The file address.
- VirtualSize The total size of the section when loaded into memory, in bytes. If this value is greater than the SizeOfRawData member, the section is filled with zeroes. This field is valid only for executable images and should be set to 0 for object files.
VirtualAddress The address of the first byte of the section when loaded into memory, relative to the image base. For object files, this is the address of the first byte before relocation is applied.
SizeOfRawData The size of the initialized data on disk, in bytes. This value must be a multiple of the FileAlignment member of the IMAGE_OPTIONAL_HEADER structure. If this value is less than the VirtualSize member, the remainder of the section is filled with zeroes. If the section contains only uninitialized data, the member is zero.
PointerToRawData A file pointer to the first page within the COFF file. This value must be a multiple of the FileAlignment member of the IMAGE_OPTIONAL_HEADER structure. If a section contains only uninitialized data, set this member to zero.
PointerToRelocations A file pointer to the beginning of the relocation entries for the section. If there are no relocations, this value is zero.
PointerToLinenumbers A file pointer to the beginning of the line-number entries for the section. If there are no COFF line numbers, this value is zero.
NumberOfRelocations The number of relocation entries for the section. This value is zero for executable images.
NumberOfLinenumbers The number of line-number entries for the section.
Characteristics The characteristics of the image.
The flags are as follows:
//
The contents of the Section Table are as follows:
->Section Header Table
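To make the field layout above concrete, here is a minimal sketch that unpacks one 40-byte section header. The names `SectionHeader` and `parse_section_header` are made up for this example. The sample header is synthetic: only the name .idata, the RVA 0xE000 and the file offset 0xA600 come from this article, and the sizes and characteristics are assumed values.

```python
import struct
from collections import namedtuple

SECTION_FORMAT = "<8sIIIIIIHHI"  # 40 bytes, little-endian, no padding

SectionHeader = namedtuple(
    "SectionHeader",
    "name virtual_size virtual_address size_of_raw_data pointer_to_raw_data "
    "pointer_to_relocations pointer_to_linenumbers number_of_relocations "
    "number_of_linenumbers characteristics",
)


def parse_section_header(data: bytes, offset: int = 0) -> SectionHeader:
    """Unpack one section header starting at the given offset."""
    fields = struct.unpack_from(SECTION_FORMAT, data, offset)
    # Name is null-padded; it has no terminator when exactly 8 bytes long.
    name = fields[0].rstrip(b"\0").decode("utf-8", errors="replace")
    return SectionHeader(name, *fields[1:])


# Synthetic header: sizes (0xC00) and flags (0xC0000040) are assumptions.
raw = struct.pack(SECTION_FORMAT, b".idata", 0xC00, 0xE000, 0xC00, 0xA600,
                  0, 0, 0, 0, 0xC0000040)
section = parse_section_header(raw)
print(section.name, hex(section.virtual_address),
      hex(section.pointer_to_raw_data))
```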
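Some common sections are listed below:
Name | Meaning |
---|---|
.text | Default code section. |
.data | Default read/write data section. Global and static variables are usually placed here. |
.rdata | Default read-only data section (constants such as string literals). |
.idata | Import table; describes the functions and data imported from other DLLs. |
.edata | Export table. |
.rsrc | Resources; contains all of the module's resources, such as icons, menus, and bitmaps. Read-only. |
.bss | Uninitialized data; often merged into .data by the linker. |
.CRT | Data added to support the C runtime (CRT). |
.tls | Data for thread-local storage variables declared with __declspec(thread). |
.reloc | Base relocations for the executable. |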
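Section Alignment
In the IMAGE_OPTIONAL_HEADER structure of the PE header, SectionAlignment and FileAlignment define the alignment of sections in memory and in the file, respectively. The file alignment is usually 0x200, and on x86 systems the memory alignment is usually 0x1000.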
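Rounding a size or address up to one of these alignment values is a one-line computation. A small sketch, with `align_up` as an illustrative helper name and 0x1234 as an arbitrary sample size:

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (value + alignment - 1) & ~(alignment - 1)


# The same 0x1234 bytes occupy different amounts of space on disk and in memory.
print(hex(align_up(0x1234, 0x200)))   # FileAlignment
print(hex(align_up(0x1234, 0x1000)))  # SectionAlignment
```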
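Converting Between Memory Offsets and File Offsets
Because the memory alignment and the file alignment are not necessarily the same, the offset of a piece of data in memory and its offset in the file may differ, so it is sometimes necessary to convert between the two. The DOS header, PE header, and section table have the same offsets in both address spaces, but the sections do not. Let \(\Delta k\) denote the difference between the memory offset (RVA) and the file offset: \[\Delta k = \mathrm{RVA} - \mathrm{File\,Offset} = \mathrm{VA} - \mathrm{ImageBase} - \mathrm{File\,Offset}\] \(\Delta k\) is the same for every address within a given section (it equals the section's VirtualAddress minus its PointerToRawData), so given any one of the RVA, the file offset, and the VA, this equation yields the other two.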
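The conversion can be sketched as a lookup over the section table. The `Section` tuple and both function names are invented for this illustration; the .idata RVA (0xE000) and file offset (0xA600) are the values from the "Hello, world" example in this article, while the raw size 0xC00 is an assumed placeholder.

```python
from collections import namedtuple

Section = namedtuple(
    "Section", "name virtual_address pointer_to_raw_data size_of_raw_data")

# .idata of the "Hello, world" example; size_of_raw_data is an assumption.
SECTIONS = [Section(".idata", 0xE000, 0xA600, 0xC00)]


def rva_to_offset(rva: int, sections) -> int:
    """File offset = RVA - delta_k of the section that contains the RVA."""
    for s in sections:
        if 0 <= rva - s.virtual_address < s.size_of_raw_data:
            return rva - (s.virtual_address - s.pointer_to_raw_data)
    raise ValueError(f"RVA {rva:#x} is not backed by any section's raw data")


def offset_to_rva(offset: int, sections) -> int:
    """RVA = file offset + delta_k of the section that contains the offset."""
    for s in sections:
        if 0 <= offset - s.pointer_to_raw_data < s.size_of_raw_data:
            return offset + (s.virtual_address - s.pointer_to_raw_data)
    raise ValueError(f"offset {offset:#x} is not inside any section")


print(hex(rva_to_offset(0xE078, SECTIONS)))
print(hex(offset_to_rva(0xA8C0, SECTIONS)))
```

Checking against size_of_raw_data rather than the virtual size is a deliberate simplification: an RVA that lies in the zero-filled tail of a section has no file offset at all.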
Sections
The individual sections follow the Section Table.
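Import Table
Each DLL imported by a PE file has a corresponding Import Address Table (IAT), which contains a set of function pointers. The second entry of DataDirectory in the OptionalHeader is the Import Table. The import table starts with an array of IMAGE_IMPORT_DESCRIPTOR (IID) structures, and the array is terminated by an all-zero IID.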
IMAGE_IMPORT_DESCRIPTOR
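typedef struct _IMAGE_IMPORT_DESCRIPTOR {
OriginalFirstThunk (Characteristics) The RVA of the Import Name Table (INT). The INT is an array of IMAGE_THUNK_DATA, each of which points to an IMAGE_IMPORT_BY_NAME structure.
ForwarderChain The index of the first forwarded API, usually 0. It is used when an API that the PE file imports from a DLL in turn refers to an API in another DLL.
Name The RVA of the DLL name.
FirstThunk The RVA of the IAT, which is also an array of IMAGE_THUNK_DATA.
In the file on disk, OriginalFirstThunk and FirstThunk point to two IMAGE_THUNK_DATA arrays that are essentially identical.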
IMAGE_THUNK_DATA
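Both IMAGE_THUNK_DATA arrays, the INT and the IAT, are terminated by an all-zero entry. The IMAGE_THUNK_DATA structure is as follows:
typedef struct _IMAGE_THUNK_DATA64 {
Name | Meaning |
---|---|
ForwarderString | RVA of a forwarder string |
Function | Memory address of the imported function |
Ordinal | Ordinal of the imported function |
AddressOfData | RVA of an IMAGE_IMPORT_BY_NAME structure |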
IMAGE_IMPORT_BY_NAME
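Stores information about an imported function.
typedef struct _IMAGE_IMPORT_BY_NAME {
Hint A suggested index into the export name table of the DLL that exports the function; the loader tries this index first.
Name[1] The function name, a NULL-terminated ASCII string.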
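Import Address Table (IAT)
The PE loader rewrites the IAT. It reads OriginalFirstThunk (if present), iterates over the pointers in that array, and looks up the address of the imported function named by each IMAGE_IMPORT_BY_NAME. It then replaces the value of the corresponding IMAGE_THUNK_DATA in the IAT with the function's actual entry address, so once the PE file has been loaded into memory the IAT holds the real addresses of the imported functions.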
Consider the import table of the "Hello, world" example. The RVA of .idata is 00 00 E0 00 and its file offset is 00 00 A6 00, so \(\Delta k = \mathrm{0x3A00}\). Jumping to that offset:
0000a600h: 78 E0 00 00 00 00 00 00 00 00 00 00 34 E7 00 00 ; x...........4...
0000a610h: 9C E1 00 00 D8 E0 00 00 00 00 00 00 00 00 00 00 ; ................
0000a620h: 4C E7 00 00 FC E1 00 00 E4 E0 00 00 00 00 00 00 ; L...............
0000a630h: 00 00 00 00 E4 E7 00 00 08 E2 00 00 74 E1 00 00 ; ............t...
0000a640h: 00 00 00 00 00 00 00 00 00 E8 00 00 98 E2 00 00 ; ................
0000a650h: 88 E1 00 00 00 00 00 00 00 00 00 00 24 E8 00 00 ; ............$...
0000a660h: AC E2 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
0000a670h: 00 00 00 00 00 00 00 00 C0 E2 00 00 D8 E2 00 00 ; ................
Each IID is 0x14 bytes in size, so the IID array here has 5 elements, followed by the all-zero terminator. The first element is:
1. ImageImportDescriptor:
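The count of 5 can be checked by walking the dump in 0x14-byte steps until an all-zero descriptor appears. A minimal sketch follows; the function name `parse_import_descriptors` is made up for this example, and the bytes are the 128 bytes at file offset A600 shown above.

```python
import struct

IDATA_DUMP = bytes.fromhex(
    "78 E0 00 00 00 00 00 00 00 00 00 00 34 E7 00 00 "
    "9C E1 00 00 D8 E0 00 00 00 00 00 00 00 00 00 00 "
    "4C E7 00 00 FC E1 00 00 E4 E0 00 00 00 00 00 00 "
    "00 00 00 00 E4 E7 00 00 08 E2 00 00 74 E1 00 00 "
    "00 00 00 00 00 00 00 00 00 E8 00 00 98 E2 00 00 "
    "88 E1 00 00 00 00 00 00 00 00 00 00 24 E8 00 00 "
    "AC E2 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "
    "00 00 00 00 00 00 00 00 C0 E2 00 00 D8 E2 00 00 "
)

IID_SIZE = 0x14  # five DWORDs


def parse_import_descriptors(data: bytes, offset: int = 0) -> list:
    """Return (OriginalFirstThunk, TimeDateStamp, ForwarderChain, Name,
    FirstThunk) tuples, stopping at the all-zero terminator."""
    descriptors = []
    while offset + IID_SIZE <= len(data):
        fields = struct.unpack_from("<5I", data, offset)
        if not any(fields):
            break
        descriptors.append(fields)
        offset += IID_SIZE
    return descriptors


for oft, _stamp, _chain, name, ft in parse_import_descriptors(IDATA_DUMP):
    print(f"INT={oft:#x} Name={name:#x} IAT={ft:#x}")
```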
The INT that the first IID points to has RVA 00 00 E0 78, which corresponds to file offset 00 00 E0 78 - 00 00 3A 00 = 00 00 A6 78:
0000a670h: 00 00 00 00 00 00 00 00 C0 E2 00 00 D8 E2 00 00 ; ................
This INT has 23 elements in total. The first element, 00 00 E2 C0, is the RVA of the IMAGE_IMPORT_BY_NAME it points to; the corresponding file offset is 00 00 A8 C0. Jumping there:
0000a8c0h: CF 00 44 65 6C 65 74 65 43 72 69 74 69 63 61 6C ; ..DeleteCritical
The name of the imported function starts at 00 00 A8 C2 and reads DeleteCriticalSection. The information for all the other IIDs can be obtained in the same way.
->Import Table
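The last two steps, interpreting a 32-bit thunk and reading the IMAGE_IMPORT_BY_NAME it points to, might look like the sketch below. The helper names are illustrative. Since the dump above shows only the first 16 bytes of the entry, the rest of the name is filled in from the function name given in the text, so the sample buffer is a reconstruction rather than a copy of the file.

```python
import struct

IMAGE_ORDINAL_FLAG32 = 0x80000000  # high bit set: import by ordinal


def decode_thunk32(value: int):
    """Classify a 32-bit IMAGE_THUNK_DATA value read from the INT."""
    if value & IMAGE_ORDINAL_FLAG32:
        return ("ordinal", value & 0xFFFF)
    return ("name_rva", value)


def parse_import_by_name(data: bytes, offset: int = 0):
    """Return (Hint, Name) from an IMAGE_IMPORT_BY_NAME structure."""
    (hint,) = struct.unpack_from("<H", data, offset)
    end = data.index(b"\0", offset + 2)  # the name is NUL-terminated
    return hint, data[offset + 2:end].decode("ascii")


# Reconstructed entry: hint bytes CF 00 from the dump, name from the text.
entry = bytes.fromhex("CF 00") + b"DeleteCriticalSection\0"

print(decode_thunk32(0x0000E2C0))
print(parse_import_by_name(entry))
```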
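Export Table
An export table is usually found in DLL files; it lets other EXEs or DLLs call the functions in that DLL. The first entry of the data directory is the export table, which points to an IMAGE_EXPORT_DIRECTORY (IED) structure.
//
Characteristics Always 0.
TimeDateStamp The time at which the export table was created, in GMT.
MajorVersion The major version number, set to 0.
MinorVersion The minor version number, set to 0.
Name The RVA of the DLL name.
Base The starting ordinal of the export table, usually 1 (though not necessarily). When an exported function is looked up by ordinal, this value is subtracted from the ordinal and the result is used as an index into the Export Address Table (EAT).
NumberOfFunctions The number of entries in the EAT. An entry may be 0, which means that no code or data is exported for that ordinal.
NumberOfNames The number of entries in the Export Name Table (ENT). It is always less than or equal to NumberOfFunctions; when it is smaller, some symbols are exported by ordinal only.
AddressOfFunctions The RVA of the EAT, which is an array of RVAs.
AddressOfNames The RVA of the ENT, which is an array of RVAs pointing to ASCII strings; the strings are sorted.
AddressOfNameOrdinals The RVA of the export ordinal table, an array of WORDs that maps each index in the ENT to the corresponding entry in the export address table.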
Create a default DLL project, test_dll.dll, in VS2015 and go to the IED:
00006b20h: 00 00 00 00 0D CE EB 57 00 00 00 00 7A 7D 01 00 ; .......W....z}..
The fields are as follows:
->Export Table
The EAT RVA 0x00017D48 corresponds to file offset 00006B48:
00006b40h: 5C 7D 01 00 70 7D 01 00 8F 12 01 00 0A 10 01 00 ; \}..p}..........
The RVA of the first function is 00 01 12 8F, and its contents are
0000068fh: E9 3C 03 00 00 E9 11 37 00 00 E9 72 15 00 00 E9 ; .<.....7...r....
In assembly, these bytes are a sequence of jmp instructions (opcode E9 followed by a 32-bit relative offset).
Now look at the function names:
00006b50h: A3 12 01 00 BD 11 01 00 38 81 01 00 87 7D 01 00 ; ........8....}..
00006b60h: 9C 7D 01 00 BC 7D 01 00 DA 7D 01 00 ED 7D 01 00 ; .}...}...}...}..
00006b70h: 00 00 01 00 02 00 03 00 04 00 74 65 73 74 5F 64 ; ..........test_d
The RVA of the first name is 00 01 7D 87, that is, file offset 00 00 6B 87:
00006b80h: 6C 6C 2E 64 6C 6C 00 3F 3F 30 43 74 65 73 74 5F ; ll.dll.??0Ctest_
The corresponding string is ??0Ctest_dll@@QAE@XZ. All 5 names are as follows:
Ordinal RVA Symbol Name
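The relationship between the three export arrays can be sketched as follows. The EAT and ordinal-table values are read from the dumps above (file offsets 6B48 and 6B70). Only the first name appears in full in this article, so the other four names are hypothetical placeholders, and Base = 1 is an assumption because the Base field is not visible in the dump. A real loader can binary-search the sorted ENT; a linear search is used here for brevity.

```python
EAT = [0x0001128F, 0x0001100A, 0x000112A3, 0x000111BD, 0x00018138]
NAME_ORDINALS = [0, 1, 2, 3, 4]  # WORD array at file offset 6B70
NAMES = [
    "??0Ctest_dll@@QAE@XZ",      # from the dump
    "<name 2>", "<name 3>", "<name 4>", "<name 5>",  # placeholders
]
BASE = 1  # assumed; the usual value


def resolve_by_ordinal(ordinal: int, base: int, eat: list) -> int:
    """EAT index = ordinal - Base."""
    index = ordinal - base
    if not 0 <= index < len(eat) or eat[index] == 0:
        raise KeyError(f"no export with ordinal {ordinal}")
    return eat[index]


def resolve_by_name(name: str, names: list, name_ordinals: list,
                    eat: list) -> int:
    """ENT index -> ordinal table -> EAT entry."""
    try:
        ent_index = names.index(name)
    except ValueError:
        raise KeyError(name) from None
    return eat[name_ordinals[ent_index]]


print(hex(resolve_by_name("??0Ctest_dll@@QAE@XZ", NAMES, NAME_ORDINALS, EAT)))
print(hex(resolve_by_ordinal(5, BASE, EAT)))
```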
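Base Relocation Table
Addresses in a PE file assume that the file is loaded at its default base address; if it is loaded at a different address, they must be relocated. The base relocation table lives in the .reloc section and corresponds to IMAGE_DIRECTORY_ENTRY_BASERELOC in the data directory. It consists of relocation blocks, each of which holds the relocation information for one 4 KB page and is aligned on a DWORD boundary. Each relocation block is an IMAGE_BASE_RELOCATION structure, and the list of blocks ends with an IMAGE_BASE_RELOCATION whose VirtualAddress is 0.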
IMAGE_BASE_RELOCATION
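//
VirtualAddress The starting RVA of the relocation data; the full RVA of a relocation is the offset in its entry plus VirtualAddress.
SizeOfBlock The size of the current relocation block in bytes, including this header.
TypeOffset An array of 2-byte elements. The high 4 bits give the relocation type and the low 12 bits give the offset of the relocation. Common relocation types are:
#define | Value | Meaning |
---|---|---|
IMAGE_REL_BASED_ABSOLUTE | 0 | Padding used for alignment |
IMAGE_REL_BASED_HIGHLOW | 3 | The entire 32-bit address at the target must be fixed up; almost always the case in 32-bit images |
IMAGE_REL_BASED_DIR64 | 10 | In 64-bit PE files, the entire 64-bit address at the target must be fixed up |
As an example, here is part of the relocation table of test_dll:
00008400h: 00 10 01 00 74 00 00 00 D1 36 E1 36 F2 36 05 37 ; ....t....6.6.6.7
The VirtualAddress of the first relocation block is 00 01 10 00; its first two TypeOffset entries are shown in the table below.
Field | Relocation entry 1 | Relocation entry 2 |
---|---|---|
TypeOffset high 4 bits | 3 | 3 |
TypeOffset low 12 bits | 6D1 | 6E1 |
Low 12 bits + VirtualAddress | 000116D1 | 000116E1 |
File offset | 00000AD1 | 00000AE1 |
00000ad0h: B8 CD 10 01 10 C3 CC CC CC CC CC CC CC CC CC CC ; ................
So 10 01 10 CD and 10 01 11 E0 are the values that need to be relocated; during relocation, \(\text{actual load address} - \text{default image base}\) is added to each of them.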
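Resources
The various user-interface elements of a Windows program, including accelerators, bitmaps, cursors, and dialog boxes, are resources. The exact format of the resources is described in the comments of the definitions below.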
IMAGE_RESOURCE_DIRECTORY
//
IMAGE_RESOURCE_DIRECTORY_ENTRY
Each IMAGE_RESOURCE_DIRECTORY is immediately followed by its IMAGE_RESOURCE_DIRECTORY_ENTRY structures, whose anonymous structs use bit-field members.
//
//
IMAGE_RESOURCE_DATA_ENTRY
//
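First-Level Directory
In test_dll, the IMAGE_RESOURCE_DIRECTORY is located at file offset 00 00 7E 00:
00007e00h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 ; ................
NumberOfIdEntries is 1, and the entry follows immediately. Its Name field is 00 00 00 18; the high bit is 0, so the value is used as an ID. Its OffsetToData field is 80 00 00 18; the high bit is 1, which means that the entry refers to another directory, and the low 31 bits, 0x18, give the offset of that directory relative to the start of the resource section. The file offset of that directory is therefore 7E00+18=7E18.
Second-Level Directory
NumberOfIdEntries is again 1, and the entry's OffsetToData is 80 00 00 30, which points to 7E30.
Third-Level Directory
NumberOfIdEntries is 1. The high bit of the entry's OffsetToData is 0, and the low 31 bits point to a data entry at file offset 7E48.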
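The high-bit tests used at each level can be sketched as two small helpers. The names are illustrative, and 0x7E00 is the file offset of the resource section in test_dll as given above. The third-level value 0x48 is inferred from the stated file offset 7E48 rather than read from a dump.

```python
HIGH_BIT = 0x80000000
LOW_31_BITS = 0x7FFFFFFF
RSRC_FILE_OFFSET = 0x7E00  # start of the resource section in test_dll


def decode_name(name: int):
    """High bit set: offset of a name string; clear: an integer ID."""
    if name & HIGH_BIT:
        return ("name_offset", name & LOW_31_BITS)
    return ("id", name)


def decode_offset_to_data(value: int, rsrc_offset: int = RSRC_FILE_OFFSET):
    """High bit set: subdirectory; clear: data entry.
    The low 31 bits are relative to the start of the resource section."""
    kind = "directory" if value & HIGH_BIT else "data_entry"
    return kind, rsrc_offset + (value & LOW_31_BITS)


print(decode_name(0x00000018))
for raw in (0x80000018, 0x80000030, 0x00000048):
    kind, file_offset = decode_offset_to_data(raw)
    print(kind, hex(file_offset))
```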
Data Entry
OffsetToData = 00 01 C1 70 is the RVA of the data and Size = 00 00 01 7D; the other two fields are both 0. The data here is a piece of text:
00007f70h: 3C 3F 78 6D 6C 20 76 65 72 73 69 6F 6E 3D 27 31 ; <?xml version='1
Overall Structure
->Resource Tree (detailed dump)