The PE file format
==================
Purpose
-------
The PE ("portable executable") file format is the format of executable
binaries (DLLs and programs) for MS windows NT, windows 95 and
win32s; in windows NT, the drivers are in this format, too.
It can also be used for object files and libraries.
The format was designed by Microsoft, apparently based on a good
knowledge of COFF, the "common object file format" used for object files
and executables on several UNIXes and on VMS.
The win32 SDK includes a header file <winnt.h> containing #defines and
typedefs for the PE-format. I will mention the struct-member-names and
#defines as we go.
You may also find the DLL "imagehlp.dll" helpful. It is part of
windows NT, but documentation is scarce. Some of its functions are
described in the "Microsoft Developer Network".
General Layout
--------------
At the start of a PE file we find an MS-DOS executable ("stub"); this
makes any PE file a valid MS-DOS executable.
After the DOS-stub there is a 32-bit-signature with the magic number
0x00004550 (IMAGE_NT_SIGNATURE).
Then there is a file header (in the COFF-format) that tells on which
machine the binary is supposed to run, how many sections are in it, the
time it was linked, whether it is an executable or a DLL and so on. (The
difference between executable and DLL in this context is: a DLL cannot
be started but only used by another binary, and a binary cannot link
to an executable.)
After that, we have an optional header (it is always there but still
called "optional" - COFF uses an "optional header" for executables but
not for object files, that's why it is called "optional"). This tells us
more about how the binary should be loaded: the starting address, the
amount of stack to reserve, the size of the data segment etc.
An interesting part of the optional header is the trailing array of
'data directories'; these directories contain pointers to data in the
'sections'. If, for example, the binary has an export directory, you
will find a pointer to that directory in the array member
IMAGE_DIRECTORY_ENTRY_EXPORT, and it will point into one of the
sections.
Following the headers we find the 'sections', introduced by the 'section
headers'. Essentially, the sections' contents are what you really need to
execute a program, and all the header and directory stuff is just there
to help you find them.
Each section has some flags about alignment, what kind of data it
contains ("initialized data" and so on), whether it can be shared etc.,
and the data itself. Most, but not all, sections contain one or more
directories referenced through the entries of the optional header's
"data directory" array, like the directory of exported functions or the
directory of base relocations. Directoryless types of contents are, for
example, "executable code" or "initialized data".
+-------------------+
| DOS-stub          |
+-------------------+
| file-header       |
+-------------------+
| optional header   |
|- - - - - - - - - -|
|                   |
| data directories  |
|                   |
+-------------------+
|                   |
| section headers   |
|                   |
+-------------------+
|                   |
| section 1         |
|                   |
+-------------------+
|                   |
| section 2         |
|                   |
+-------------------+
|                   |
| ...               |
|                   |
+-------------------+
|                   |
| section n         |
|                   |
+-------------------+
DOS-stub and Signature
----------------------
The concept of a DOS-stub is well-known from the 16-bit windows
executables (which were in the "NE" format). The stub is used for
OS/2 executables, self-extracting archives and other applications, too.
For PE-files, this MS-DOS executable almost always consists of about 100
bytes that output an error message such as "this program needs windows
NT".
You recognize a DOS-stub by validating the DOS-header, which is a
struct IMAGE_DOS_HEADER. The first 2 bytes should be the sequence "MZ"
(there is a #define IMAGE_DOS_SIGNATURE for this WORD).
You distinguish a PE binary from other stubbed binaries by the trailing
signature, which you find at the offset given by the header member
'e_lfanew' (which is 32 bits long beginning at byte offset 60). For OS/2
and windows binaries, the signature is a 16-bit-word; for PE files, it
is a 32-bit-longword with the value IMAGE_NT_SIGNATURE #defined to be
0x00004550.
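The validation steps above can be sketched in C as follows. This is a minimal sketch that assumes the whole file has already been read into memory; the helper names (read_u16, read_u32, find_pe_signature) are hypothetical, and the fields are read byte by byte instead of through the <winnt.h> structs so that the code compiles on any host, regardless of its byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Read little-endian values; PE files are little-endian whatever the host is. */
static uint16_t read_u16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_u32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

#define DOS_SIGNATURE   0x5A4D      /* "MZ" read as a little-endian WORD (IMAGE_DOS_SIGNATURE) */
#define NT_SIGNATURE    0x00004550u /* "PE\0\0" read as a little-endian DWORD (IMAGE_NT_SIGNATURE) */
#define E_LFANEW_OFFSET 60          /* byte offset of 'e_lfanew' in the DOS header */

/* Returns the file offset of the PE signature, or -1 if 'buf' is not a PE file. */
static long find_pe_signature(const uint8_t *buf, size_t len) {
    if (len < E_LFANEW_OFFSET + 4 || read_u16(buf) != DOS_SIGNATURE)
        return -1;                          /* no "MZ" DOS header */
    uint32_t e_lfanew = read_u32(buf + E_LFANEW_OFFSET);
    if (e_lfanew > len - 4)
        return -1;                          /* signature would lie outside the buffer */
    if (read_u32(buf + e_lfanew) != NT_SIGNATURE)
        return -1;                          /* a stubbed binary, but not a PE file */
    return (long)e_lfanew;
}
```

If find_pe_signature returns an offset, the IMAGE_FILE_HEADER starts 4 bytes after it.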
File Header
-----------
To get to the IMAGE_FILE_HEADER, validate the "MZ" of the DOS-header
(1st 2 bytes), then find the 'e_lfanew' member of the DOS-stub's header
and skip that many bytes from the beginning of the file. Verify the
signature you will find there. The file header, a struct
IMAGE_FILE_HEADER, begins immediately after it; its members are described
below from top to bottom.
The first member is the 'Machine', a 16-bit-value indicating the system
the binary is intended to run on. Known legal values are:
    IMAGE_FILE_MACHINE_I386 (0x14c)
        for Intel 80386 processor or better
    0x014d
        for Intel 80486 processor or better
    0x014e
        for Intel Pentium processor or better
    0x0160
        for R3000 (MIPS) processor, big endian
    IMAGE_FILE_MACHINE_R3000 (0x162)
        for R3000 (MIPS) processor, little endian
    IMAGE_FILE_MACHINE_R4000 (0x166)
        for R4000 (MIPS) processor, little endian
    IMAGE_FILE_MACHINE_R10000 (0x168)
        for R10000 (MIPS) processor, little endian
    IMAGE_FILE_MACHINE_ALPHA (0x184)
        for DEC Alpha AXP processor
    IMAGE_FILE_MACHINE_POWERPC (0x1F0)
        for IBM Power PC, little endian
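The table of 'Machine' values maps naturally onto a switch statement. This is only a sketch: the function name machine_name and the description strings are made up for illustration, and only the values listed above are recognized.

```c
#include <stdint.h>

/* Map an IMAGE_FILE_HEADER 'Machine' value to a short description.
   Values not in the list above are reported as "unknown". */
static const char *machine_name(uint16_t machine) {
    switch (machine) {
    case 0x014c: return "Intel 386";                  /* IMAGE_FILE_MACHINE_I386 */
    case 0x014d: return "Intel 486";
    case 0x014e: return "Intel Pentium";
    case 0x0160: return "MIPS R3000, big endian";
    case 0x0162: return "MIPS R3000, little endian";  /* IMAGE_FILE_MACHINE_R3000 */
    case 0x0166: return "MIPS R4000, little endian";  /* IMAGE_FILE_MACHINE_R4000 */
    case 0x0168: return "MIPS R10000, little endian"; /* IMAGE_FILE_MACHINE_R10000 */
    case 0x0184: return "DEC Alpha AXP";              /* IMAGE_FILE_MACHINE_ALPHA */
    case 0x01f0: return "IBM PowerPC, little endian"; /* IMAGE_FILE_MACHINE_POWERPC */
    default:     return "unknown";
    }
}
```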
Then we have the 'NumberOfSections', a 16-bit-value. It is the number of
sections that follow the headers. We will discuss the sections later.
Next is a timestamp 'TimeDateStamp' (32 bit), giving the time the file
was created. You can distinguish several versions of the same file by
this value, even if the "official" version number was not altered. (The
format of the timestamp is not documented except that it should be
somewhat unique among versions of the same file, but apparently it is
'seconds since January 1 1970 00:00:00' in UTC - the format used by most
C compilers for the time_t.)
This timestamp is used for the binding of import directories, which will
be discussed later.
Warning: some compilers tend to set this timestamp to absurd values.
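If you accept the assumption that 'TimeDateStamp' counts seconds since January 1 1970 00:00:00 UTC, you can turn it into a readable date with the standard C library. The sketch below is hypothetical (format_timestamp is not part of any Windows API), and given the warning above, the result may well be nonsense for some files.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Format a TimeDateStamp as "YYYY-MM-DD hh:mm:ss" in UTC, ASSUMING it is
   seconds since 1970-01-01 00:00:00 UTC. Returns 0 on failure
   (for example if 'out' is too small). */
static int format_timestamp(uint32_t stamp, char *out, size_t outlen) {
    time_t t = (time_t)stamp;   /* values above 2^31-1 need a 64-bit time_t */
    struct tm *utc = gmtime(&t);  /* gmtime is not thread-safe */
    if (utc == NULL)
        return 0;
    return strftime(out, outlen, "%Y-%m-%d %H:%M:%S", utc) != 0;
}
```

A stamp of 0 formats as "1970-01-01 00:00:00", which is one of the "absurd" values mentioned above.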
The members 'PointerToSymbolTable' and 'NumberOfSymbols' (both 32 bit)
are used for debugging information. I don't know how to decipher them,
and I've found the pointer to be always 0.
'SizeOfOptionalHeader' (16 bit) is simply sizeof(IMAGE_OPTIONAL_HEADER).
You can use it to validate the correctness of the PE file's structure.
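The members described so far make up the whole 20-byte IMAGE_FILE_HEADER. Here is a sketch of reading it field by field from the bytes right after the signature. The struct file_header is a hypothetical stand-in for the <winnt.h> struct. The expected size of 224 is an assumption: it is sizeof(IMAGE_OPTIONAL_HEADER) for a 32-bit image with the usual 16 data directories.

```c
#include <stddef.h>
#include <stdint.h>

/* Same members as IMAGE_FILE_HEADER; on disk they are 20 bytes,
   little-endian, without padding. */
struct file_header {
    uint16_t Machine;
    uint16_t NumberOfSections;
    uint32_t TimeDateStamp;
    uint32_t PointerToSymbolTable;
    uint32_t NumberOfSymbols;
    uint16_t SizeOfOptionalHeader;
    uint16_t Characteristics;
};

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* 'p' points just past the 4-byte PE signature. Returns 0 if fewer than
   20 bytes are available. */
static int parse_file_header(const uint8_t *p, size_t avail, struct file_header *fh) {
    if (avail < 20)
        return 0;
    fh->Machine              = rd16(p + 0);
    fh->NumberOfSections     = rd16(p + 2);
    fh->TimeDateStamp        = rd32(p + 4);
    fh->PointerToSymbolTable = rd32(p + 8);
    fh->NumberOfSymbols      = rd32(p + 12);
    fh->SizeOfOptionalHeader = rd16(p + 16);
    fh->Characteristics      = rd16(p + 18);
    return 1;
}

/* ASSUMPTION: 32-bit optional header with 16 data directories. */
#define EXPECTED_OPTIONAL_HEADER_SIZE 224

static int optional_header_size_ok(const struct file_header *fh) {
    return fh->SizeOfOptionalHeader == EXPECTED_OPTIONAL_HEADER_SIZE;
}
```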
'Characteristics' is 16 bits and consists of a collection of flags, most
of them being valid only for object files and libraries:
Bit 0 (IMAGE_FILE_RELOCS_STRIPPED) is set if there is no relocation
information in the file. This refers to relocation information per
section in the sections themselves; it is not used for executables,
which have relocation information in the 'base relocation' directory
described below.
Bit 1 (IMAGE_FILE_EXECUTABLE_IMAGE) is set if the file is executable
(i.e. it is not an object file or a library).
Bit 2 (IMAGE_FILE_LINE_NUMS_STRIPPED) is set if the line number
information is stripped; this is not used for executable files.
Bit 3 (IMAGE_FILE_LOCAL_SYMS_STRIPPED) is set if there is no
information about local symbols in the file (this is not used
for executable files).
Bit 4 (IMAGE_FILE_AGGRESIVE_WS_TRIM) is set if the operating system
is supposed to trim the working set of the running process (the
amount of RAM the process uses) aggressively by paging it out. This
should be set if it is a daemon-like application that waits most of
the time and only wakes up once a day, or the like.
Bits 7 (IMAGE_FILE_BYTES_REVERSED_LO) and 15
(IMAGE_FILE_BYTES_REVERSED_HI) are set if the endianness of the file is
not what the machine would expect, so it must swap bytes before
reading. This is unreliable for executable files (the OS expects
executables to be correctly byte-ordered).
Bit 8 (IMAGE_FILE_32BIT_MACHINE) is set if the machine is expected
to be a 32 bit machine. This is always set.
Bit 9 (IMAGE_FILE_DEBUG_STRIPPED) is set if there is no debugging
information in the file. This is unused for executable files.
Bit 10 (IMAGE_FILE_REMOVABLE_RUN_FROM_SWAP) is set if the application
may not run from a removable medium such as a floppy or a CD-ROM. In
this case, the operating system is advised to copy the file to the
swapfile and execute it from there.
Bit 11 (IMAGE_FILE_NET_RUN_FROM_SWAP) is set if the application may
not run from the network. In this case, the operating system is
advised to copy the file to the swapfile and execute it from there.
Bit 12 (IMAGE_FILE_SYSTEM) is set if the file is a system file such
as a driver. This is unused for executable files.
Bit 13 (IMAGE_FILE_DLL) is set if the file is a DLL.
Bit 14 (IMAGE_FILE_UP_SYSTEM_ONLY) is set if the file is not
designed to run on multiprocessor systems (that is, it will crash
there because it relies in some way on exactly one processor).
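Decoding 'Characteristics' is a matter of testing each bit listed above. In this sketch the masks follow the bit numbers given in the list (bit n is 1 << n). The table and the function name print_characteristics are illustrative, and bits 5 and 6, which the list does not cover, are simply ignored.

```c
#include <stdint.h>
#include <stdio.h>

/* Flag bits of IMAGE_FILE_HEADER.Characteristics as listed above;
   names are the <winnt.h> names without the IMAGE_FILE_ prefix. */
static const struct {
    uint16_t mask;
    const char *name;
} file_flags[] = {
    { 0x0001, "RELOCS_STRIPPED" },
    { 0x0002, "EXECUTABLE_IMAGE" },
    { 0x0004, "LINE_NUMS_STRIPPED" },
    { 0x0008, "LOCAL_SYMS_STRIPPED" },
    { 0x0010, "AGGRESIVE_WS_TRIM" },        /* sic: spelled this way in <winnt.h> */
    { 0x0080, "BYTES_REVERSED_LO" },
    { 0x0100, "32BIT_MACHINE" },
    { 0x0200, "DEBUG_STRIPPED" },
    { 0x0400, "REMOVABLE_RUN_FROM_SWAP" },
    { 0x0800, "NET_RUN_FROM_SWAP" },
    { 0x1000, "SYSTEM" },
    { 0x2000, "DLL" },
    { 0x4000, "UP_SYSTEM_ONLY" },
    { 0x8000, "BYTES_REVERSED_HI" },
};

/* Print every known flag set in 'characteristics', one per line,
   and return how many were printed. */
static int print_characteristics(uint16_t characteristics, FILE *out) {
    int count = 0;
    for (size_t i = 0; i < sizeof file_flags / sizeof file_flags[0]; i++) {
        if (characteristics & file_flags[i].mask) {
            fprintf(out, "IMAGE_FILE_%s\n", file_flags[i].name);
            count++;
        }
    }
    return count;
}
```

For a Characteristics value of 0x210e, this prints IMAGE_FILE_EXECUTABLE_IMAGE, IMAGE_FILE_LINE_NUMS_STRIPPED, IMAGE_FILE_LOCAL_SYMS_STRIPPED, IMAGE_FILE_32BIT_MACHINE and IMAGE_FILE_DLL.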
Relative Virtual Addresses
--------------------------
The PE format makes heavy use of so-called RVAs. An RVA, aka "relative
virtual address", is used to describe a memory address if you don't know
the base address. It is the value you need to add to the base address to
get the linear address.
The base address is the address the PE image is loaded to, and may vary
from one invocation to the next.
Example: suppose an executable file is loaded to address 0x400000 and
execution starts at RVA 0x1560. The effective execution start will then
be at the address 0x401560. If the executable were loaded to 0x100000,
the execution start would be 0x101560.
Things become complicated because the parts of the PE-file (the
sections) are not necessarily aligned the same way the loaded image is.
For example, the sections of the file are often aligned to
512-byte-borders, but the loaded image is perhaps aligned to
4096-byte-borders. See 'SectionAlignment' and 'FileAlignment' below.
So to find a piece of information in a PE-file for a specific RVA,
you must work out which section the RVA falls into as if the file were
loaded, and then apply the RVA's offset within that section to the
section's position in the file.
As an example, suppose you knew the execution starts at RVA 0x1560, and
want to disassemble the code starting there. To find the address in the
file, you will have to find out that sections in RAM are aligned to 4096
bytes and the ".code"-section starts at RVA 0x1000 in RAM and is 16384
bytes long; then you know that RVA 0x1560 is at offset 0x560 in that
section. Find out that the sections are aligned to 512-byte-borders in
the file and that ".code" begins at offset 0x800 in the file, and you
know that the code execution start is at byte 0x800+0x560=0xd60 in the
file.
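The same lookup can be written as a loop over the section headers. This is a sketch under assumptions: the section headers (discussed later) are taken to be already parsed into the hypothetical struct section, whose member names are borrowed from IMAGE_SECTION_HEADER in <winnt.h>, and an RVA that falls into the part of a section without file data is treated as "not in the file".

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the parts of a section header used here. */
struct section {
    uint32_t VirtualAddress;   /* RVA where the section starts when loaded */
    uint32_t VirtualSize;      /* size of the section in memory */
    uint32_t PointerToRawData; /* file offset where the section's data starts */
    uint32_t SizeOfRawData;    /* size of the section's data in the file */
};

/* Translate an RVA into a file offset. Returns -1 if no section contains
   the RVA, or if the RVA lies beyond the section's data in the file. */
static long rva_to_file_offset(const struct section *secs, size_t n, uint32_t rva) {
    for (size_t i = 0; i < n; i++) {
        const struct section *s = &secs[i];
        if (rva >= s->VirtualAddress && rva - s->VirtualAddress < s->VirtualSize) {
            uint32_t delta = rva - s->VirtualAddress;   /* offset inside the section */
            if (delta >= s->SizeOfRawData)
                return -1;                              /* no backing bytes in the file */
            return (long)(s->PointerToRawData + delta);
        }
    }
    return -1;
}
```

With the numbers from the example (".code" at RVA 0x1000, 16384 bytes long, stored at file offset 0x800), rva_to_file_offset returns 0x800 + 0x560 = 0xd60 for RVA 0x1560.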
Easy if you know how it works :-)
Optional Header
---------------
Immediately following the file header is the IMAGE_OPTIONAL_HEADER
(which, in spite of the name, is always there). It contains
information about how to treat the PE-file exactly. Again, we'll go
through the members from top to bottom.
The first 16-bit-word is 'Magic' and, in all the PE-files I have
looked into, it has always had the value 0x010b.
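Because the optional header directly follows the 4-byte signature and the 20-byte file header, its 'Magic' can be checked with a little offset arithmetic. This sketch assumes the file is in memory and that you already know the signature's file offset (the value of 'e_lfanew'); check_optional_magic is a hypothetical name, and it accepts only the value 0x010b observed above.

```c
#include <stddef.h>
#include <stdint.h>

#define OPTIONAL_HEADER_MAGIC 0x010b   /* the value observed in PE-files above */

/* 'pe_sig' is the file offset of the PE signature. The optional header
   starts after the 4-byte signature and the 20-byte file header; its first
   WORD is 'Magic'. Returns 1 if Magic is 0x010b, 0 otherwise. */
static int check_optional_magic(const uint8_t *buf, size_t len, size_t pe_sig) {
    size_t off = pe_sig + 4 + 20;
    if (off > len || len - off < 2)
        return 0;                                   /* header truncated */
    uint16_t magic = (uint16_t)(buf[off] | (buf[off + 1] << 8));  /* little-endian */
    return magic == OPTIONAL_HEADER_MAGIC;
}
```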
The next 2 bytes are the version of the linker ('MajorLinkerVersion' and