Fat Binaries on Mac OS X

Hi everyone! After a few months without posting anything on this blog, I'm going to talk a bit about FAT files (Mach-O Universal Binaries) as used on Mac OS X.

File format and manipulation

If you want a detailed description of what a FAT file is, I can only recommend reading the Wikipedia article: https://en.wikipedia.org/wiki/Universal_binary. In short, it is an executable file that contains several Mach-O files for different architectures (a Mach-O file is the equivalent of an ELF on Linux or a PE on Windows).

[tlk:~]$ file /bin/ls
/bin/ls: Mach-O universal binary with 2 architectures
/bin/ls (for architecture x86_64):	Mach-O 64-bit executable x86_64
/bin/ls (for architecture i386):	Mach-O executable i386

Here, our 'ls' executable contains two versions: a 64-bit one and a 32-bit one. You can choose which of the two to run with the arch command:

[tlk:~]$ arch -32 /bin/ls -lsa .gdbinit
184 -rw-r--r--  1 tlk  staff  93882  5 aoû 21:29 .gdbinit
[tlk:~]$ arch -64 /bin/ls -lsa .gdbinit
184 -rw-r--r--  1 tlk  staff  93882  5 aoû 21:29 .gdbinit

The lipo utility manipulates FAT files. For example, it can extract a single architecture (as the output below shows, -extract still produces a FAT file, just with one architecture) and build a new FAT file from several inputs.

[tlk:~]$ lipo /bin/ls -extract i386 -output /tmp/ls-i386 && file /tmp/ls-i386      
/tmp/ls-i386: Mach-O universal binary with 1 architecture
/tmp/ls-i386 (for architecture i386):	Mach-O executable i386
[tlk:~]$ lipo /bin/ls -extract x86_64 -output /tmp/ls-x86_64 && file /tmp/ls-x86_64
/tmp/ls-x86_64: Mach-O universal binary with 1 architecture
/tmp/ls-x86_64 (for architecture x86_64):	Mach-O 64-bit executable x86_64
[tlk:~]$ lipo /tmp/ls-x86_64 /tmp/ls-i386 -create -output /tmp/ls && file /tmp/ls 
/tmp/ls: Mach-O universal binary with 2 architectures
/tmp/ls (for architecture x86_64):	Mach-O 64-bit executable x86_64
/tmp/ls (for architecture i386):	Mach-O executable i386
[tlk:~]$ md5 /tmp/ls /bin/ls       
MD5 (/tmp/ls) = 7b39f02450f7054ed2868350a6f76fd2
MD5 (/bin/ls) = 7b39f02450f7054ed2868350a6f76fd2

lipo is very handy for manipulating FAT files, and as you can see, the file it creates is byte-for-byte identical to the original (same MD5)!

As for the file format itself, it is very simple: a magic number, the number of architectures, then one record per embedded Mach-O file (CPU type and subtype, offset, size and alignment).

/*
 * Copyright (c) 1999 Apple Computer, Inc. All rights reserved.
 *
 * @APPLE_LICENSE_HEADER_START@
 * 
 * This file contains Original Code and/or Modifications of Original Code
 * as defined in and that are subject to the Apple Public Source License
 * Version 2.0 (the 'License'). You may not use this file except in
 * compliance with the License. Please obtain a copy of the License at
 * http://www.opensource.apple.com/apsl/ and read it before using this
 * file.
 * 
 * The Original Code and all software distributed under the License are
 * distributed on an 'AS IS' basis, WITHOUT WARRANTY OF ANY KIND, EITHER
 * EXPRESS OR IMPLIED, AND APPLE HEREBY DISCLAIMS ALL SUCH WARRANTIES,
 * INCLUDING WITHOUT LIMITATION, ANY WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE, QUIET ENJOYMENT OR NON-INFRINGEMENT.
 * Please see the License for the specific language governing rights and
 * limitations under the License.
 * 
 * @APPLE_LICENSE_HEADER_END@
 */
#ifndef _MACH_O_FAT_H_
#define _MACH_O_FAT_H_
/*
 * This header file describes the structures of the file format for "fat"
 * architecture specific file (wrapper design).  At the begining of the file
 * there is one fat_header structure followed by a number of fat_arch
 * structures.  For each architecture in the file, specified by a pair of
 * cputype and cpusubtype, the fat_header describes the file offset, file
 * size and alignment in the file of the architecture specific member.
 * The padded bytes in the file to place each member on it's specific alignment
 * are defined to be read as zeros and can be left as "holes" if the file system
 * can support them as long as they read as zeros.
 *
 * All structures defined here are always written and read to/from disk
 * in big-endian order.
 */

/*
 * <mach/machine.h> is needed here for the cpu_type_t and cpu_subtype_t types
 * and contains the constants for the possible values of these types.
 */
#include <stdint.h>
#include <mach/machine.h>
#include <architecture/byte_order.h>

#define FAT_MAGIC	0xcafebabe
#define FAT_CIGAM	0xbebafeca	/* NXSwapLong(FAT_MAGIC) */

struct fat_header {
	uint32_t	magic;		/* FAT_MAGIC */
	uint32_t	nfat_arch;	/* number of structs that follow */
};

struct fat_arch {
	cpu_type_t	cputype;	/* cpu specifier (int) */
	cpu_subtype_t	cpusubtype;	/* machine specifier (int) */
	uint32_t	offset;		/* file offset to this object file */
	uint32_t	size;		/* size of this object file */
	uint32_t	align;		/* alignment as a power of 2 */
};

#endif /* _MACH_O_FAT_H_ */

[tlk:~]$ lipo -detailed_info /bin/ls                                             
Fat header in: /bin/ls
fat_magic 0xcafebabe
nfat_arch 2
architecture x86_64
    cputype CPU_TYPE_X86_64
    cpusubtype CPU_SUBTYPE_X86_64_ALL
    offset 4096
    size 39584
    align 2^12 (4096)
architecture i386
    cputype CPU_TYPE_I386
    cpusubtype CPU_SUBTYPE_I386_ALL
    offset 45056
    size 35696
    align 2^12 (4096)
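
The layout above is simple enough to read back by hand. Here is a minimal Python sketch of that, not lipo's actual implementation: the names `parse_fat` and `thin_slice` are made up for this post, and the code assumes the 32-bit `fat_arch` layout from the header above, with every field stored big-endian.

```python
import struct

FAT_MAGIC = 0xcafebabe
FAT_HEADER = struct.Struct('>II')     # magic, nfat_arch
FAT_ARCH = struct.Struct('>IIIII')    # cputype, cpusubtype, offset, size, align


def parse_fat(data):
    """Return one dict per fat_arch record found in `data` (a bytes object)."""
    magic, nfat_arch = FAT_HEADER.unpack_from(data, 0)
    if magic != FAT_MAGIC:
        raise ValueError('not a FAT file (magic 0x%08x)' % magic)
    fields = ('cputype', 'cpusubtype', 'offset', 'size', 'align')
    archs = []
    for i in range(nfat_arch):
        # Records are packed back to back right after the 8-byte header.
        record = FAT_ARCH.unpack_from(data, FAT_HEADER.size + i * FAT_ARCH.size)
        archs.append(dict(zip(fields, record)))
    return archs


def thin_slice(data, cputype):
    """Return the raw Mach-O bytes of the first slice matching `cputype`."""
    for arch in parse_fat(data):
        if arch['cputype'] == cputype:
            return data[arch['offset']:arch['offset'] + arch['size']]
    raise KeyError('no slice for cputype 0x%x' % cputype)
```

Called on the contents of /bin/ls, `parse_fat` should report the same offsets and sizes as the lipo output above, with `align` given as the exponent (12) rather than 4096.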

FUNZ

lipo is all well and good, but it won't let you build files that violate the specification … So let's pull out Python and write a small script that creates some slightly unusual FAT files.
The following script creates a FAT file containing two Mach-O files (one 32-bit, one 64-bit) and adds a random number of invalid architectures to it.

#!/usr/bin/env python
# encoding: utf-8

from struct import pack, unpack
import random

with open('./fatfucked', 'wb') as fat_file:
    fat_file.write(pack('>I', 0xcafebabe))
    narchs = random.randrange(20, 0xcc)
    fat_file.write(pack('>I', narchs))
    
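    # First fat_arch record: bogus cputype/cpusubtype spelling " #fapsec" (bytes literals, so this also runs on Python 3)
    fat_file.write(pack('>IIIII', unpack('>I', b" #fa")[0], unpack('>I', b"psec")[0], 0, random.randrange(0xffffffff), random.randrange(0xffffffff)))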

    headers_to_write = []
    for i in range(narchs-3):
        headers_to_write.append((random.randrange(0xffffffff), random.randrange(0xffffffff), random.randrange(0xffffffff), random.randrange(0xffffffff), random.randrange(0xffffffff)))

    offset = random.randrange(0, 0xff000)
    offset += 0x1000 - offset % 0x1000
    to_write = []
    with open('hello32', 'rb') as macho_file:
        data = macho_file.read()
        size = len(data)
        macho_file.seek(0)
        magic, cputype, cpusubtype = unpack("<III", macho_file.read(4*3))
        to_write.append((offset, data))
        headers_to_write.append((cputype, cpusubtype, offset, size, 0))
        offset += size
        offset = offset + (0x1000 - offset % 0x1000)
    
    with open('hello64', 'rb') as macho_file:
        data = macho_file.read()
        size = len(data)
        macho_file.seek(0)
        magic, cputype, cpusubtype = unpack("<III", macho_file.read(4*3))
        to_write.append((offset, data))
        headers_to_write.append((cputype, cpusubtype, offset, size, 0))
        offset += size
        offset = offset + (0x1000 - offset % 0x1000)

    random.shuffle(headers_to_write)
    for cputype, cpusubtype, offset, size, align in headers_to_write:
        fat_file.write(pack('>IIIII', cputype, cpusubtype, offset, size, align))
    
    for offset, data in to_write:
        fat_file.seek(offset)
        fat_file.write(data)

[tlk:~]$ file fatfucked 
fatfucked: compiled Java class data, version 72.0
[tlk:~]$ lipo -detailed_info fatfucked 
lipo: truncated or malformed fat file (offset plus size of cputype (539190881) cpusubtype (7562595) extends past the end of the file) fatfucked
[tlk:~]$ xxd fatfucked | head
0000000: cafe babe 0000 0048 2023 6661 7073 6563  .......H #fapsec
0000010: 0000 0000 cdbe fc42 cba7 dfd9 dd00 433d  .......B......C=
0000020: ab4d 9af5 e2c4 3a7f 4b9f 094e 1cd4 a514  .M....:.K..N....
0000030: b9ba 1483 843a d360 b894 5a2c 1fb1 8755  .....:.`..Z,...U
0000040: 1040 0982 627c 2696 2039 bbcf 73e0 a478  .@..b|&. 9..s..x
0000050: 8304 9c9a 4df2 6eb3 43d6 9376 ce15 30dc  ....M.n.C..v..0.
0000060: d261 c493 f41f c7da 0c7b cb39 443c dbd7  .a.......{.9D<..
0000070: 8958 6af4 a42e 08dc 36c9 9e04 e1d7 cf44  .Xj.....6......D
0000080: a8c4 2663 9f44 d6df 7a6c 0692 ceb0 46ac  ..&c.D..zl....F.
0000090: ac60 c708 e0c8 5287 264c 2fd2 b565 49f0  .`....R.&L/..eI.
[tlk:~]$ arch -32 ./fatfucked 
Hello from 32bits
[tlk:~]$ arch -64 ./fatfucked
Hello from 64bits

gdb will crash miserably on this file … I'll let you try it in IDA or Hopper 😉
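
The `file` output is less surprising than it looks: Java class files start with the same 0xcafebabe magic, followed by a 16-bit minor version and a 16-bit major version. With 0x48 architectures declared, that second word reads as version 72.0. The Python 3 sketch below decodes the first 16 bytes of the xxd dump both ways. It also reproduces the numbers in the lipo error message, assuming lipo clears the top byte of cpusubtype (CPU_SUBTYPE_MASK, 0xff000000) before printing it; that matches the output above but remains an assumption.

```python
import struct

CPU_SUBTYPE_MASK = 0xff000000  # capability bits, from <mach/machine.h>

# First 16 bytes of the xxd dump above.
head = bytes.fromhex('cafebabe00000048' '2023666170736563')

# Read as a FAT header: magic, then the number of fat_arch records.
magic, nfat_arch = struct.unpack_from('>II', head, 0)

# Read as a Java class file: magic, minor_version, major_version.
_, java_minor, java_major = struct.unpack_from('>IHH', head, 0)

# First fat_arch record: the ASCII bytes " #fa" and "psec".
cputype, cpusubtype = struct.unpack_from('>II', head, 8)

print('fat: %d architectures' % nfat_arch)              # fat: 72 architectures
print('java: version %d.%d' % (java_major, java_minor))  # java: version 72.0
print('cputype %d cpusubtype %d'
      % (cputype, cpusubtype & ~CPU_SUBTYPE_MASK))       # cputype 539190881 cpusubtype 7562595
```

Since both formats share the magic, `file` has to guess from what follows it, which is presumably why the script declares at least 20 architectures; treat that threshold as a guess rather than a documented value.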

Bonus

I noticed that if the offset of the Mach-O file is not a multiple of 4096, the program crashes at the entry point (defined by the LC_UNIXTHREAD, LC_THREAD or LC_MAIN load command).

#!/usr/bin/env python
# encoding: utf-8

from struct import pack, unpack

with open('./fatsegfault', 'wb') as fat_file:
    fat_file.write(pack('>I', 0xcafebabe))
    narchs = 0x1
    fat_file.write(pack('>I', narchs))
    
    offset = 0x4242
    to_write = []
    with open('hello32', 'rb') as macho_file:
        data = macho_file.read()
        size = len(data)
        macho_file.seek(0)
        magic, cputype, cpusubtype = unpack("<III", macho_file.read(4*3))
        to_write.append((offset, data))
        fat_file.write(pack('>IIIII', cputype, cpusubtype, offset, size, 0))
    
    for offset, data in to_write:
        fat_file.seek(offset)
        fat_file.write(data)

[tlk:~]$ ./FatSegfault.py                 
[tlk:~]$ otool -lv ./fatsegfault
...
Load command 10
        cmd LC_UNIXTHREAD
    cmdsize 80
     flavor i386_THREAD_STATE
      count i386_THREAD_STATE_COUNT
	    eax 0x00000000 ebx    0x00000000 ecx 0x00000000 edx 0x00000000
	    edi 0x00000000 esi    0x00000000 ebp 0x00000000 esp 0x00000000
	    ss  0x00000000 eflags 0x00000000 eip 0x00001ef0 cs  0x00000000
	    ds  0x00000000 es     0x00000000 fs  0x00000000 gs  0x00000000
...
[tlk:~]$ ./fatsegfault          
[1]    39933 segmentation fault (core dumped)  ./fatsegfault
[tlk:~]$ gdb -q -c /cores/core.39933
gdb$ bt
#0  0x00001ef0 in ?? ()
gdb$ x/i 0x00001ef0
0x1ef0:	Cannot access memory at address 0x1ef0
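
The only structural difference from the previous script is the offset arithmetic. The sketch below uses two hypothetical helpers (`is_page_aligned`, `align_up`) and assumes 4096-byte pages, as in the `align 2^12` reported by lipo earlier. It only checks offsets and says nothing about why the loader fails.

```python
PAGE_SIZE = 0x1000  # 4096 bytes, i.e. "align 2^12"


def is_page_aligned(offset, page_size=PAGE_SIZE):
    """True when `offset` is a multiple of the page size."""
    return offset % page_size == 0


def align_up(offset, page_size=PAGE_SIZE):
    """Round `offset` up to the next page boundary (unchanged if already aligned)."""
    return (offset + page_size - 1) // page_size * page_size


print(is_page_aligned(0x4242))  # False: the offset used by the script above
print(hex(align_up(0x4242)))    # 0x5000
print(is_page_aligned(45056))   # True: the i386 slice offset in /bin/ls
```

Note that the expression `offset += 0x1000 - offset % 0x1000` in the first script always moves forward, adding a whole page when the offset is already aligned, whereas `align_up` leaves an aligned offset untouched.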

If you have the courage to dig into the reason and you find it, I'd be quite interested to know what is going on 🙂