[0day] [exploit] Compromising a Linux desktop using… 6502 processor opcodes on the NES?!

Overview
A vulnerability and a separate logic error exist in the gstreamer 0.10.x player for NSF music files. Combined, they allow for very reliable exploitation and the bypass of 64-bit ASLR, DEP, etc. The reliability is provided by the presence of a Turing-complete “scripting” environment inside a music player. NSF files are music files from the Nintendo Entertainment System. Curious? Read on...

Demonstration, and affected distributions
Here is a screenshot of the exploit triggering. Somewhat alarmingly, it does so without the user opening the exploit file -- they only have to navigate to the folder containing the file. More on that below.

xcalc_from_downloads_folder.png

You can download the file: exploit_ubuntu_12.04.5_xcalc.nsf. In the image above, the file has been renamed to “time_bomb.mp3”. We’ll cover why below.

As the filename suggests, this exploit works against Ubuntu 12.04.5. This is an old but still supported distribution. Specifically, for reproducibility, it works against _exactly_ Ubuntu 12.04.5, without further updates. If you take all the updates, you’ll get a new glibc, which changes some code offsets and the exploit will crash. The crash is of course deterministic and it would be possible to code the exploit to cater for arbitrary glibc binaries; this is left as an exercise for the reader.

The vulnerability is in libgstnsf.so, an audio decoder present in the gstreamer-0.10 distribution. Ubuntu 12.04 uses gstreamer-0.10 for all its audio handling needs. Ubuntu 14.04 is apparently affected because the default install includes gstreamer-0.10, but most media handling applications use gstreamer-1.0 which is also installed. The exact circumstances under which Ubuntu 14.04 uses the vulnerable gstreamer-0.10 are not clear. The Ubuntu 16.04 default install has only gstreamer-1.0, which is not affected by this vulnerability.

This exploit works against what I would consider the “default” install. During Ubuntu install, there’s a question along the lines of “hey, do you want mp3s to work?” and of course the correct answer is “yes”. Various extra packages are then installed including gstreamer0.10-plugins-bad. This package includes libgstnsf.so.

Wait, what, an 0day, with an exploit and all?
Yes, an 0day.
As a learning experiment, most of my bug disclosures going forward are going to be 0day. I’ve got a lot of experience participating in so-called “co-ordinated disclosure”, where the receiving vendor takes as long as they wish to fix a vulnerability. (I once waited over a year(!) for Apple to fix a Safari vulnerability.) I’ve got significantly less experience with “full disclosure”, where the public receives details of a risk at the same time as the vendor. To be clear, I’m fairly certain that the correct balance is a compromise somewhere between “full disclosure” and “co-ordinated disclosure”. The Project Zero 90-day deadlines appear to achieve this compromise nicely and there’s a lot of data backing up the policy.

Don’t worry, this particular 0day is very minor, only affecting very old Linux distributions; see above. This 0day is more about fun than impact. Future 0days may be more widespread ;-)

Philosophical 0day question
Is it still an 0day if the patch is released alongside the 0day? Here’s the patch for Ubuntu 12.04:
sudo rm /usr/lib/x86_64-linux-gnu/gstreamer-0.10/libgstnsf.so

While at first glance, this “patch” would appear to remove functionality, it does not. Your wonderful NSF files will still play. WTF? Would you believe that Ubuntu 12 and 14 ship not one but two different code bases for playing NSF files? That’s a lot of code for a very fringe format. The second NSF player is based on libgme and does not appear to have the vulnerabilities of the first.

The attack surface
This exploit abuses a vulnerability in the gstreamer-0.10 plug-in for playing NSF music files. These music files are not like most other music files that your desktop can play. Typical music files are based on compressed samples and are decoded with a bunch of math. NSF music files, on the other hand, are played by actually emulating the NES CPU and sound hardware in real time. Is that cool or what? The gstreamer plug-in creates a virtual 6502 CPU hardware environment and plays the music in a loop: it runs a bit of 6502 code for a little while, looks at the resulting values in the virtualized sound hardware registers, and renders some sound samples based on them.

If you’re curious to play a real NSF file, feel free to download the file: cv2.nsf. It’s the music from Castlevania 2, and you can find similar examples with a simple Google search for a term such as “nsf music files”. If your Linux desktop has support for NSF, you should be able to play it with something like: totem cv2.nsf. (Don’t be surprised if your distribution kindly offers to automagically install a suitable plug-in if one is missing.) It’s just 17264 bytes, which is tiny! Far too small to contain much in the way of samples, but large enough to contain a small program which sequences requests for the basic NES hardware to make some simple noises.

In order to actually exploit this vulnerability, or a vulnerability like it, there are various plausible and different avenues:

  • Send exploit via e-mail attachment. If the victim downloads and opens the file, they are compromised. Note -- for this to work, you likely need to rename exploit.nsf to exploit.mp3. Most Linux desktops don’t know what to do with an NSF file, but they’ll happily stuff any sequence of bytes in an MP3 file through a media player. Most gstreamer based media players will ignore a file’s suffix and use file format auto detection to load the file with the most appropriate decoder.
  • Partial drive-by download. By abusing Google Chrome’s somewhat risky file download UX, it’s possible to dump files to the victim’s Downloads folder when a booby trapped web page is visited. When the Downloads folder is later viewed in a file manager such as nautilus, an attempt is made to auto thumbnail files with known suffixes (so again, call the NSF exploit something.mp3). The exploit works against the thumbnailer.
  • Full drive-by download. Again, abusing Google Chrome download UX, there’s a path to a possible full drive-by download. This will be explored in a separate blog post.
  • USB drive based attack. Again, opening a USB drive opens up the thumbnailing attack described above.

6502 and NES ROM loading and paging crash course
The 6502 CPU is legendary, appearing in a diverse range of also legendary systems such as the Nintendo Entertainment System, Commodore 64, BBC Micro, etc. It is 8-bit, but with 16-bit addressing, giving a 64kB address space. In the NES application, the upper 32kB of address space (0x8000 - 0xffff) is reserved for ROM, i.e. the read-only data on the cartridge you have stuffed in.

Here lies an interesting problem: what if you want to make a game larger than 32kB? Perhaps you have a game of 16 levels, each level having 16kB of unique graphics and music. No way that’s going to fit into 32kB. To solve this problem, there’s the concept of “banks” and “bank switching”. A bank is simply an aligned 4kB contiguous area of ROM, and there are 8 of them packed between 6502 addresses 0x8000 - 0xffff. Each of these banks can be mapped to a contiguous, aligned 4kB region inside a cartridge ROM that is potentially much larger than 32kB. At runtime, the NES program can write to a special magic memory location (0x5ff8 - 0x5fff) that contains hardware registers that control what portion of the ROM is mapped to which bank.
Example: if the 6502 CPU writes the value 10 to 0x5ff9, then the 6502 memory locations 0x9000 - 0x9fff will be backed by the bytes at index (10 * 4096) into the cartridge ROM.
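To make the arithmetic in that example concrete, here is a minimal C sketch of the mapping. The function names are illustrative only and do not come from any emulator source; the sketch assumes a ROM whose load address is 4kB aligned.

```c
#include <stdint.h>
#include <stddef.h>

#define BANK_SIZE 4096u

/* Virtual base address of the 4kB window controlled by a bank register.
 * 0x5ff8 controls 0x8000, 0x5ff9 controls 0x9000, ... 0x5fff controls 0xf000. */
uint16_t bank_window_base(uint16_t reg_addr)
{
    return (uint16_t)((reg_addr & 0x0Fu) << 12);
}

/* Byte index into the cartridge ROM that backs the start of that window. */
size_t bank_rom_index(uint8_t value)
{
    return (size_t)value * BANK_SIZE;
}
```

Calling bank_window_base(0x5ff9) gives 0x9000 and bank_rom_index(10) gives 40960, matching the example above.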

The vulnerabilit(ies)
1: Lack of checking ROM size when mapping into 6502 memory and bank switching
(Absent a CVE, you can uniquely identify this as CESA-2016-0001.)
There is a near total lack of bounds checking on proposed ROM mappings. This applies to the initial ROM load, as well as subsequent ROM bank switching. All of the handling for ROM mapping is in gst-plugins-bad/gst/nsf.c, including:

nsf_bankswitch (uint32 address, uint8 value)
{
 ...
 cpu_page = address & 0x0F;
 roffset = -(cur_nsf->load_addr & 0x0FFF) + ((int) value << 12);
 offset = cur_nsf->data + roffset;
 ...
 cur_nsf->cpu->mem_page[cpu_page] = offset;
 ...

In the above code snippet, cur_nsf->data points to the actual ROM data content of the input music file. The format of the file is pretty simple: a 128 byte header (starting with the 4 character sequence “NESM”) and anything following is the ROM. So, for example, if you had a 200 byte input file, that would be 128 bytes of header and 72 bytes of ROM. The entire ROM is kept in the host emulator heap via a single malloc(). As can be seen, the pointer named offset is taken as an arbitrary offset into the ROM heap memory with no checking against any kind of length!
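The file layout is simple enough to summarize in a few lines of C. This is a sketch under stated assumptions, not the plug-in’s loader: the struct and function names are hypothetical, the magic check uses only the four characters mentioned above, and the address offsets (8, 10 and 12, little endian) are the ones given in the header description later in this post.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define NSF_HEADER_SIZE 128u

typedef struct {
    uint16_t load_addr;  /* header offset 8  */
    uint16_t init_addr;  /* header offset 10 */
    uint16_t play_addr;  /* header offset 12 */
    size_t   rom_len;    /* everything after the 128 byte header */
} nsf_summary;

static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns 0 on success, -1 if the buffer is too short or lacks the magic. */
int nsf_summarize(const uint8_t *file, size_t file_len, nsf_summary *out)
{
    if (file_len < NSF_HEADER_SIZE || memcmp(file, "NESM", 4) != 0)
        return -1;
    out->load_addr = read_le16(file + 8);
    out->init_addr = read_le16(file + 10);
    out->play_addr = read_le16(file + 12);
    out->rom_len   = file_len - NSF_HEADER_SIZE;
    return 0;
}
```

For the 200 byte example file, rom_len comes out as 72. The point of tracking rom_len at all is that it is exactly the value the vulnerable code never compares against.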

Let’s make this problem concrete: even in the simplest ROM load case of our 200 byte file, the virtual ROM load address will be 0x8000 and the loading code will call nsf_bankswitch() for addresses 0x5ff8 - 0x5fff, with ascending bank indexes 0 - 7. This will result in 6502 virtual address 0x8000 being backed by nsf->data + 0, 0x9000 backed by nsf->data + 4096, … all the way to 0xf000 backed by nsf->data + (7 * 4096). So even in this very simple example case, reading linearly in the 6502 emulator from 0x8000 to 0xffff will result in the read of 72 bytes of real ROM data followed by 32696 bytes of out of bounds heap data!
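The numbers above can be checked with a short C sketch, which also shows one possible shape for the missing length check. This is an illustration only, not the upstream code or any official fix; the function names are made up, and it assumes a 4kB aligned load address so that bank N starts at ROM offset N * 4096.

```c
#include <stdint.h>
#include <stddef.h>

#define BANK_SIZE 4096u

/* How many bytes of a bank-sized window starting at rom_offset are real ROM. */
size_t bank_bytes_in_bounds(size_t rom_len, size_t rom_offset)
{
    if (rom_offset >= rom_len)
        return 0;
    size_t avail = rom_len - rom_offset;
    return avail < BANK_SIZE ? avail : BANK_SIZE;
}

/* Out of bounds bytes exposed by the default 8 bank mapping at 0x8000 - 0xffff. */
size_t default_mapping_oob_bytes(size_t rom_len)
{
    size_t oob = 0;
    for (unsigned bank = 0; bank < 8; bank++)
        oob += BANK_SIZE - bank_bytes_in_bounds(rom_len, (size_t)bank * BANK_SIZE);
    return oob;
}

/* A strict check: accept a mapping only if the whole bank is backed by ROM. */
int bank_mapping_is_safe(size_t rom_len, uint8_t value)
{
    return bank_bytes_in_bounds(rom_len, (size_t)value * BANK_SIZE) == BANK_SIZE;
}
```

With rom_len set to 72, default_mapping_oob_bytes() returns 32696, and bank_mapping_is_safe() rejects every bank index. A real player would more likely pad a short final bank than reject it outright; the strict version is used here only because it is the simplest to reason about.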

But this is just an out of bounds read, because virtual addresses 0x8000 - 0xffff are read only in the emulator. An OOB read is not particularly serious in the context of the emulator. It could be used as a useful tool to bypass ASLR, but any sensitive data read in the host heap can only be used to play sound. The emulator doesn’t have any advanced functionality such as e.g. the ability to egress heap data via network connections. In the most serious case, if the emulator were running as part of an internet server for music conversion, the attacker could render parts of OOB heap content as sound waves in the output file, to try and steal interesting data from the server’s heap.

However, a second logic quirk of this particular emulator makes things more serious:

2: Ability to load or bank switch ROM to writable memory locations
(Probably not an actual vulnerability per se; no identifier assigned.)
Other NES music players I’ve looked at do not permit the loading or bank switching of ROM data at addresses below 0x8000. But this particular player does, either via a ROM load address in the file header that is below 0x8000, or via writes to the bank registers 0x5ff6 or 0x5ff7 (other emulators do not even have bank registers as low as 0x5ff6 or 0x5ff7):

static nes6502_memwrite default_writehandler[] = {
 {0x0800, 0x1FFF, write_mirrored_ram},
 {0x4000, 0x4017, apu_write},
 {0x5FF6, 0x5FFF, nsf_bankswitch},
 {(uint32) - 1, (uint32) - 1, NULL}
};

Writing e.g. 0x00 to 0x5ff6 will result in the first 4096 bytes of ROM being mapped read and write at 6502 virtual address 0x6000. In our 200 byte file example, this means that a subsequent write of 0x41 to virtual address 0x6048 will result in 0x41 being written out of bounds relative to the host emulator heap.
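For readers wondering how a table like default_writehandler gets used, here is a small C sketch of a range-table lookup. It assumes the emulator walks the table linearly until it hits the (uint32) -1 sentinel, which is my reading of the table’s shape rather than a quote from the source; the type, function and demo_* handler names are all stand-ins.

```c
#include <stdint.h>
#include <stddef.h>

typedef void (*write_func)(uint32_t address, uint8_t value);

typedef struct {
    uint32_t min_range, max_range;
    write_func fn;
} memwrite_entry;

/* Walk the table until the sentinel; return the first handler whose range
 * contains address, or NULL if the write is an ordinary memory access. */
write_func find_write_handler(const memwrite_entry *table, uint32_t address)
{
    for (; table->min_range != (uint32_t)-1; table++) {
        if (address >= table->min_range && address <= table->max_range)
            return table->fn;
    }
    return NULL;
}

/* Stand-in handlers so the demo table has the same ranges as the one above. */
void demo_ram_write(uint32_t a, uint8_t v)  { (void)a; (void)v; }
void demo_apu_write(uint32_t a, uint8_t v)  { (void)a; (void)v; }
void demo_bankswitch(uint32_t a, uint8_t v) { (void)a; (void)v; }

const memwrite_entry demo_table[] = {
    {0x0800, 0x1FFF, demo_ram_write},
    {0x4000, 0x4017, demo_apu_write},
    {0x5FF6, 0x5FFF, demo_bankswitch},
    {(uint32_t)-1, (uint32_t)-1, NULL}
};
```

A lookup for 0x5ff6 lands on the bank switch handler, whereas 0x5ff5 and 0x6048 fall through as plain memory writes. Starting the bank switch range at 0x5FF8 instead would be one way to express what the other players do.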

As can be appreciated, we now have a lot of read and write control over the host emulator heap and the more experienced exploit writers will realize that successful exploitation is already all but assured.

The exploit: overview
Here is an image of the exploit file inside okteta, a hex editor:

nsf_exploit_okteta.png

This is the full exploit and as you can see, it’s pleasingly compact at 416 bytes. The image has been decorated with three colored lines that show the different pieces of the exploit file:

  • Blue, for the 128 byte header. Much of the header is irrelevant and in fact, clever exploit techniques could conceivably compress some of the payload into the header. It would be a beautiful challenge to take this exploit and try and do the same in as few bytes as possible :) In terms of significant fields in the header:
      • 0x8000 (little endian, as are the subsequent values) at offset 8. This is the virtual ROM load address.
      • 0x8070 at offset 10. The virtual address of the initial 6502 routine that is called once.
      • 0x80a0 at offset 12. The virtual address of the per-frame 6502 routine.
      • 0x41a1 at offset 110. The frame timing. Needs to be left alone to ensure the sound engine works.
  • Orange, for metadata, at the start of the ROM image directly after the header. This metadata is loaded at virtual address 0x8000 and is available and used in the 6502 program. The metadata includes the string “xcalc”, the eventual payload, a constant to search for in the heap to make the exploit reliable, and a table of reads, additions and writes to perform on the main heap in order to progress the exploit.
  • Green, for real 6502 opcodes -- yay! The exploit proceeds via a program written entirely in 6502 assembly. The music playing emulator will emulate these opcodes, but break out of virtual 6502 address space into the host emulator main heap on account of the vulnerability details outlined above.

The exploit: details
    So how does the exploit work to reliably pop a calculator in so few bytes? The exploit is 416 bytes: 128 bytes of header and 288 bytes of ROM which are mapped at virtual 6502 address 0x8000. The ROM consists of a bunch of metadata followed by some 6502 opcodes.

    In order to explore 6502 and “compile” 6502 assembly, this web page is a nice resource: Easy 6502. The page also makes the claim “6502 is fun!” -- I concur. The reasonably commented 6502 assembly source for the opcodes in the exploit is here: asm_final_main.asm. With a couple of tiny extra routines: asm_final_init.asm, asm_final_adder.asm.

    The exploit proceeds as follows:
    1: Locate important metadata object of type nes6502_context in main heap
    Because of the vulnerability noted above, any read from 0x8120 - 0xffff will read out of bounds in the host heap, so we can therefore read out of bounds until we locate bytes in the host heap that we believe correspond to the nes6502_context object, which is defined like this:

    typedef struct
    {
      uint8 * mem_page[NES6502_NUMBANKS];  /* memory page pointers */
      ...
      nes6502_memread *read_handler;
      nes6502_memwrite *write_handler;
      int dma_cycles;
      uint32 pc_reg;
      uint8 a_reg, p_reg, x_reg, y_reg, s_reg;
      uint8 int_pending;
    } nes6502_context;

    Why are we keen to locate this particular object? For two reasons. Firstly, it controls how the 6502 virtual memory accesses map to the host heap. By taking control of this mapping, we’ll get read and write access anywhere in the virtual memory of the host process. Secondly, the object contains pointers into the BSS section. Locating the BSS is important later in the exploit.
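To see why mem_page is such a valuable target, here is a minimal C sketch of how a paged emulator typically turns a 6502 address into a host pointer. It assumes 16 pages of 4kB each and that each mem_page entry points at the start of its page; both are inferences from the layout described in this post, and the names are hypothetical.

```c
#include <stdint.h>

#define NUM_BANKS  16
#define BANK_SHIFT 12
#define BANK_MASK  0x0FFFu

typedef struct {
    uint8_t *mem_page[NUM_BANKS];  /* one host pointer per 4kB of 6502 address space */
} cpu_pages;

/* Host address that backs a 6502 virtual address. */
uint8_t *host_address(const cpu_pages *ctx, uint16_t vaddr)
{
    return ctx->mem_page[vaddr >> BANK_SHIFT] + (vaddr & BANK_MASK);
}
```

Under this model, whatever 8 byte value sits in mem_page[6] decides where every access to 0x6000 - 0x6fff goes, and likewise mem_page[7] for 0x7000 - 0x7fff.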

    We identify the object in memory via the byte sequence 00 00 00 00 00 50 00 00 00 00 00 00 ff 00 00 00 (which you can find in the exploit file at ROM offset 0), on a 16-byte alignment, which corresponds to the initial values in the fields from dma_cycles to s_reg. (Due to a bug, these initial values are never synced with the real current register values, so we can search for them reliably.)
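As an illustration of the matching rule, here is the same 16-byte-aligned comparison written as a plain C helper over an ordinary buffer. The exploit itself does this in 6502 code; the C names are hypothetical, and the per-field breakdown in the comments is my reading of the struct on a little endian 64-bit build.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Initial field values as they appear in little endian memory. */
const uint8_t CONTEXT_SIGNATURE[16] = {
    0x00, 0x00, 0x00, 0x00,   /* dma_cycles = 0           */
    0x00, 0x50, 0x00, 0x00,   /* pc_reg     = 0x5000      */
    0x00, 0x00, 0x00, 0x00,   /* a, p, x, y registers = 0 */
    0xff,                     /* s_reg      = 0xff        */
    0x00, 0x00, 0x00          /* int_pending and padding  */
};

/* Offset of the first 16-byte-aligned match in buf, or -1 if there is none. */
long find_aligned_signature(const uint8_t *buf, size_t len)
{
    for (size_t off = 0; off + sizeof CONTEXT_SIGNATURE <= len; off += 16) {
        if (memcmp(buf + off, CONTEXT_SIGNATURE, sizeof CONTEXT_SIGNATURE) == 0)
            return (long)off;
    }
    return -1;
}
```

Stepping by 16 rather than 1 keeps the scan short and cuts down on false positives, since heap allocations on this platform are 16-byte aligned.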

    2: Remap the 6502 virtual read write address 0x6000 to point to nes6502_context::mem_page[6]
    It would be remiss of me not to cite some beautiful 6502, so here’s the code that writes the magic hardware register to map 0x6000 to out of bounds ROM:
    ; Some fairly simple calculations and then memory bank remapping.
    ; Match address is stored at 0x02, e.g.: 0x91c0
    ; Subtract 0x60 from the match address.
    ; This indexes earlier into the real heap metadata object.
    ; It indexes to a real pointer to the backing heap for 6502 0x6xxx RAM.
    ; Subtract 0x8000 to get offset from ROM base.
    LDA $02
    SEC
    SBC #$60
    STA $02
    LDA $03
    SBC #$80
    STA $03
    ; Now, 0x02 contains e.g. 0x1160
    ; Shift the most significant byte to get the ROM bank id.
    ; Each bank is 0x1000 in size.
    LSR
    LSR
    LSR
    LSR
    ; In this case, our bank id is 1.
    ; Write 1 to magic hardware register 0x5ff6.
    ; Causes 0x6xxx RAM to map to 0x1000 into the ROM.
    ; Which will be out of bounds relative to main host heap :-)
    ; Note that 0x6xxx is writable whereas 0x8xxx+ is not, so we need this.
    ; Offset e.g. 0x160 into the 0x6xxx space is a main heap pointer.
    ; This main heap pointer can be read/written from 6502 at e.g. 0x6160.
    STA $5ff6
    If you’re not familiar with 6502, hopefully you got an idea of some of the simple elegance of the opcodes. Perhaps you also noted some headaches:
    • There’s no instruction to bitshift by a variable amount. Hence, 4 LSR’s in a row (logical shift right) to effectively do what is >> 4 in C.
    • This really is an 8-bit processor, no 16-bit registers, so a simple 16-bit calculation has to be broken into two halves with careful management of the carry flag! (SBC is SuBtract with Carry.)
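For comparison, here is roughly what that 6502 sequence computes, written as C that deliberately works one byte at a time. It is a sketch for readers who do not speak 6502, with a made-up function name; the borrow variable plays the role of the (inverted) carry flag.

```c
#include <stdint.h>

/* Mirror the 6502 sequence: subtract 0x8060 from a 16-bit value one byte at a
 * time, propagating the borrow, then shift the high byte right by four. */
uint8_t bank_id_for_match(uint16_t match, uint16_t *rom_offset_out)
{
    uint8_t lo = (uint8_t)(match & 0xFF);
    uint8_t hi = (uint8_t)(match >> 8);

    /* SEC; SBC #$60: low byte, no incoming borrow. */
    uint8_t borrow = (uint8_t)(lo < 0x60);
    lo = (uint8_t)(lo - 0x60);

    /* SBC #$80: high byte, minus the borrow from the low byte. */
    hi = (uint8_t)(hi - 0x80 - borrow);

    if (rom_offset_out)
        *rom_offset_out = (uint16_t)((hi << 8) | lo);

    /* LSR x4: the top nibble of the high byte is the 4kB bank index. */
    return (uint8_t)(hi >> 4);
}
```

For a match at 0x91c0 this yields ROM offset 0x1160 and bank 1. A match at 0xa020 exercises the borrow: the low byte wraps to 0xc0, the high byte becomes 0x1f, and the bank is still 1.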

    Once the correct ROM bank is mapped writable, there’s a further normalization calculation which actually adds the bank offset to the raw host heap pointer nes6502_context::mem_page[6]. This is a very precise memory corruption and it takes effect the next frame. It ensures that 0x6000 points exactly to nes6502_context::mem_page[6] for all possible bank offsets that we might have located it at.

    3: Start a series of read / add / write sequences, in a one-per-frame loop
    With a normalized 6502 virtual address 0x6000 that points exactly to nes6502_context::mem_page[6], we’re in good shape to start reading and writing the full host heap (and stack / BSS / whatever pointers we can find and follow or calculate!) with perfect accuracy, using 6502 opcodes. If we modify the mem_page array, the effect is not visible to 6502 memory accesses until the next frame, so we simply do one memory modification per frame.
    The table used to drive the read / add / write loop is located at ROM offset 0x20 and each entry is 8 bytes, e.g. the first one:
    50 60 08 60 60 6f ff ff
    This means: read 8 bytes from virtual address 0x6050, add 0xffff6f60 (sign extended, so effectively a subtraction) to that 8 byte value, and write the result to virtual address 0x6008.
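The entry format is easy to get wrong by a sign, so here is a small C sketch that decodes one 8 byte entry and applies its addend. The struct and function names are hypothetical; the field order (read address, write address, 32-bit addend, all little endian) is taken from the description above. The cast from uint32_t to int32_t relies on the usual two’s complement behavior.

```c
#include <stdint.h>

typedef struct {
    uint16_t read_addr;   /* 6502 address to read 8 bytes from */
    uint16_t write_addr;  /* 6502 address to write the result to */
    int32_t  addend;      /* signed 32-bit delta, sign extended before adding */
} fixup_entry;

fixup_entry decode_entry(const uint8_t e[8])
{
    fixup_entry out;
    out.read_addr  = (uint16_t)(e[0] | (e[1] << 8));
    out.write_addr = (uint16_t)(e[2] | (e[3] << 8));
    uint32_t raw = (uint32_t)e[4]
                 | ((uint32_t)e[5] << 8)
                 | ((uint32_t)e[6] << 16)
                 | ((uint32_t)e[7] << 24);
    out.addend = (int32_t)raw;
    return out;
}

/* Sign extend the 32-bit addend to 64 bits and add it to a 64-bit value. */
uint64_t apply_addend(uint64_t value, int32_t addend)
{
    return value + (uint64_t)(int64_t)addend;
}
```

Decoding 50 60 08 60 60 6f ff ff gives read address 0x6050, write address 0x6008 and an addend of -0x90a0, so the entry subtracts 0x90a0 from whatever pointer it reads.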

    4: Calculate the address of the libgstnsf.so BSS
    We’re in luck: the value of nes6502_context::read_handler, which now is readily available at virtual address 0x6050, is at a fixed offset from the start of the BSS because it points to an object in the BSS. We calculate the start of the BSS and write it to virtual address 0x6008, which is nes6502_context::mem_page[7]. In other words, we just mapped a readable and writable view of the BSS at virtual address 0x7000 in our little 6502 CPU.

    5: Edit the value of the memset() GOT entry
    At offset 0xf8 into the GOT exists the memset() function pointer. This is a pointer into glibc. It is now mapped at virtual address 0x70f8. Do you know what else is in glibc, at a fixed relative offset? system(). By adding a fixed value to the memset() GOT entry, we ensure future calls to memset() will in fact call system().

    6: Map the actual nes6502_context::read_handler object at 0x7000
    Here’s the read_handler definition; the read_handler pointer points to an array of these:

    typedef struct
    {
      uint32 min_range, max_range;
      uint8 (*read_func)(uint32 address);
    } nes6502_memread;

    And here are some of the entries that fill this array:

    static nes6502_memread default_readhandler[] = {
     {0x0800, 0x1FFF, read_mirrored_ram},
     {0x4000, 0x4017, apu_read},
     {(uint32) - 1, (uint32) - 1, NULL}
    };

    As you can see, this object contains function pointers. Useful. Also useful is that these function pointers are called in the normal course of 6502 memory accesses, whenever certain virtual addresses are accessed.

    7: Change the apu_read() function pointer
    By accessing virtual address 0x7018, we’re now (thanks to step 6 above) accessing index 0x18 into the read_handler BSS object. The apu_read() function pointer, called for reads of 0x4000 - 0x4017, is stored there. We add a little bit to this pointer (0x1d0) in order to in fact change the function pointer to apu_reset(). The reason will become apparent shortly!
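The 0x18 is not magic; it falls out of the struct layout. Here is a short C sketch that derives it, assuming a 64-bit build where pointers are 8 bytes. The type and function names are stand-ins for the ones in the source.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct {
    uint32_t min_range, max_range;
    uint8_t (*read_func)(uint32_t address);
} memread_entry;

/* Byte offset of table[index].read_func from the start of the table. */
size_t read_func_offset(size_t index)
{
    return index * sizeof(memread_entry) + offsetof(memread_entry, read_func);
}
```

On such a build each entry is 16 bytes with the function pointer at offset 8, so entry 0 (read_mirrored_ram) has its pointer at 0x08 and entry 1 (apu_read) has its pointer at 0x18.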

    8: Calculate the address of BSS variable apu, again using a fixed relative offset from nes6502_context::read_handler
    apu is defined thusly:

    /* pointer to active APU */
    static apu_t *apu;

    We write the calculated address such that virtual address 0x7000 points to the value of apu.

    9: Copy the value of the apu pointer into the memory bank mappings so we can dereference into the apu object at virtual address 0x7000
    Just chasing a level of pointer indirection here, because the BSS value is just a pointer to the actual object which is on the heap.

    10: Write the string “xcalc” into the apu object
    The apu object is quite large:

    typedef struct apu_s
    {
      rectangle_t rectangle[2];
      triangle_t triangle;
      noise_t noise;
      dmc_t dmc;
      uint8 enable_reg;

      apudata_t queue[APUQUEUE_SIZE];
      ...

    By writing at 0x70f0, we write to offset 0xf0 into this object, which is the queue buffer field. We write the string “xcalc” here.

    11: Read from the address 0x4000
    … and a calculator appears! Black magic? No, the previous steps set things up carefully to cause this to happen. Here’s the sequence:
    1. 6502 reads from 0x4000.
    2. This is a special memory address that is supposed to call the apu_read() function pointer to handle the access.
    3. Instead, apu_reset() is called because we corrupted the function pointer earlier.
    4. apu_reset() contains the code line: memset (&apu->queue, 0, APUQUEUE_SIZE * sizeof (apudata_t));
    5. But, we corrupted the memset() GOT entry to point to system(), and we wrote the string “xcalc” to apu->queue.
    6. Ergo, system(“xcalc”) is executed, and the calculator appears.

    Additional exploitation notes
    This exploit works equally well when run in the following binaries:
    • totem
    • rhythmbox (works so well that two calcs are popped ;-)
    • gst-launch-0.10
    • nautilus (may launch a subprocess -- totem-video-thumbnailer?)

    This is despite differing heap layouts. The code to scan the heap for the metadata object of interest, as opposed to relying on a fixed offset, is what provides most of the reliability. The astute reader will note that the heap scan runs only forwards and only for about 32kB. So what if heap jitter results in the all important metadata object getting allocated before the ROM data? This is a definite possibility but does not appear to be a big bother in this instance. The NSF decoder runs in a fresh new thread, which in turn generally gets a new heap arena, resulting in decent determinism in heap layout. The metadata object is allocated temporally after the ROM data, so it will typically get placed after. That said, if the ROM data is made bigger, it can (deterministically, due to heap holes of deterministic size) end up after the metadata object.

    Our closing note on heap layout is that, if we needed them, we do have opportunities for heap grooming. Aside from the attacker controlled ROM size, there are also some variable length header strings (song title etc.) that get heap allocated. Finally, we note that the gstreamer code for format detection is very non-trivial and may offer further opportunities to control heap state.

    What lessons can be learned?
    While investigating this exploit, a number of hardening ideas came up. Also, a comparison of Ubuntu vs. Fedora -- even when extended to the latest releases -- reveals that Ubuntu is slipping behind Fedora for some exploit mitigations. In no particular order:

    • The attack surface of the Linux desktop does not appear to be under control, or adequately monitored for regression. In the case of Ubuntu, adding MP3 support also appears to add support for a huge number of obscure and largely unnecessary audio and video decoders. These contribute little to the desktop experience but greatly to the security risk. The relevant gstreamer extra plug-in packages even identify the additional decoders as “bad” or “ugly” in the package name. An effort to further split the decoders into “useful” ones vs. “obscure / risky” ones is recommended.
    • Initial signs show that the security quality of the gstreamer code is behind the ffmpeg code. One of the reasons for this is likely to be j00ru’s / Google’s significant effort put into improving ffmpeg security: FFmpeg and a thousand fixes. Still, an important question for discussion is whether gstreamer’s decoders should be replaced with an ffmpeg based backend.
    • Ubuntu is not nearly as thorough as Fedora in using ASLR on binaries. Comparing Ubuntu 16.04 vs. Fedora 24, we see that Fedora has ASLR on the binaries for totem, rhythmbox and gst-launch-1.0. Ubuntu only has ASLR on totem.
    • Fedora appears to make good use of RELRO whereas Ubuntu does not. RELRO prevents messing with the function pointers in the GOT.
    • The level of sandboxing is disappointing across both Ubuntu and Fedora. Parsing media files in C has always been a hot spot for security and one mitigation is sandboxing. I don’t see much across Fedora or Ubuntu to use SELinux or AppArmor to meaningfully sandbox totem, rhythmbox, the thumbnailing processes, etc. by default. There does appear to be interest, but the urgency does not seem to be high. One useful source: AppArmor Confinement. There’s also this Ubuntu bug to sandbox all the thumbnailers, open since 2011: gnome thumbnailers should have an apparmor profile.
    • Changes in the Linux glibc heap management code have increased certain aspects of heap layout determinism, particularly for threaded programs. This is a topic for further exploration in a different blog post.
    • 6502 really is fun.

    Closing notes
    There’s a critical reason that decent, reliable exploitation was possible with this bug: the presence of some form of “scripting” language. In this case, that script happens to be 6502 opcodes. Having an exploit running in script enables important exploitation aspects, such as making decisions based on exploitation environment, and in particular, using code to observe the effects of a corruption (such as a memory leak) and make sensible follow-up decisions.

    One of the reasons that browsers and browser plug-ins (Flash, Java) are popular exploitation targets is precisely because they are fundamentally scripting environments.

    Another great example of this phenomenon is Windows font parsing and rendering. This has traditionally occurred in the kernel(!!) and rendering modern fonts involves... yes, running a little language to make rendering decisions. Well, many times, attackers have used that same language to cause Windows kernel corruptions and proceed to full ring 0 compromise by using a script-inside-font to make decisions about reliably proceeding with the exploit.


    So watch out for scripting in unexpected places!