Skip to main content

Sheets in Ghidra

Ghidra is a software reverse engineering platform with a robust Java-based extension system.

SheetJS is a JavaScript library for reading and writing data from spreadsheets.

The Complete Demo uses SheetJS to export data from a Ghidra script. We'll create an extension that loads the V8 JavaScript engine through the Ghidra.js1 integration and uses the SheetJS library to export a bitfield table from Apple Numbers to a XLSX workbook.

Tested Deployments

This demo was tested by SheetJS users in the following deployments:

ArchitectureGhidraDate
darwin-arm11.1.22024-10-13

Integration Details

Ghidra natively supports scripts that are run in Java. JS extension scripts require a JavaScript engine with Java bindings.

Ghidra.js1 is a Ghidra integration for RhinoJS, GraalJS and V8. The current version uses the Javet V8 binding.

Loading SheetJS Scripts

The SheetJS NodeJS module can be loaded in Ghidra.js scripts using require:

Loading SheetJS scripts in Ghidra.js
const XLSX = require("xlsx");

SheetJS NodeJS modules must be installed in a folder in the Ghidra script path!

Bitfields and Sheets

Binary file formats commonly use bitfields to compactly store a set of Boolean (true or false) flags. For example, in the XLSB file format, the BrtRowHdr record2 encodes row properties. Bit offsets 91-96 are interpreted as flags marking if a row is hidden or if it is collapsed.

Assembly Implementation

Functions that parse bitfields typically test each bit sequentially:

x86_64 sample assembly with mnemonics
            CASE_1c
41 0f ba e5 1c BT R13D,0x1c
73 69 JNC CASE_1d

;; .... Do some work here (bit offset 28)

CASE_1d
41 0f ba e5 1d BT R13D,0x1d
73 69 JNC CASE_1e

;; .... Do some work here (bit offset 29)

The assembly is approximated by the following TypeScript snippet:

Approximate TypeScript
/* R13 is a 64-bit register */
declare let R13: BigInt;
/* NOTE: asm R13D is technically a live binding */
let R13D: number = Number(R13 & 0xFFFFFFFFn);

if((R13D >> 28) & 1) {
// .... Do some work here (bit offset 28)
}

if((R13D >> 29) & 1) {
// .... Do some work here (bit offset 29)
}

Array of Objects

A bitmask or bit offset can be paired with a description in a JavaScript object.

For example, in the BrtRowHdr record, bit offset 92 indicates whether the row is hidden (if the bit is set) or visible (if the bit is not set). The offset and description can be stored as fields in an object:

Sample metadata for BrtRowHdr offset 92
const metadata_92 = { Offset: 92, Description: "Hidden flag" };

Each object can be stored in an array:

Array of sample metadata for BrtRowHdr
const metadata = [
{ Offset: 91, Description: "Collapsed flag" },
{ Offset: 92, Description: "Hidden flag" },
// ...
];

This is an "Array of Objects". The SheetJS json_to_sheet method3 can generate a SheetJS worksheet object from the array:

Generating a worksheet from the metadata
const ws = XLSX.utils.json_to_sheet(metadata);

The SheetJS book_new method4 generates a SheetJS workbook object that can be written to the filesystem using the writeFile method5:

Exporting the worksheet to file
const wb = XLSX.utils.book_new(ws, "Offsets");
XLSX.utils.writeFile(wb, "SheetJSGhidra.xlsx");

Java Binding

Ghidra.js exposes a number of globals for interacting with Ghidra, including:

  • currentProgram: information about the loaded program.
  • JavaHelper: Java helper to load classes.

Ghidra.js automatically bridges instance methods to Java method calls. It also handles the plugin and file extension details.

Launching the Decompiler

ghidra.app.decompiler.DecompInterface is the primary Java interface to the decompiler. In Ghidra.js, JavaHelper.getClass will load the class.

Java

Launch decompiler process in Java (snippet)
import ghidra.app.script.GhidraScript;
import ghidra.app.decompiler.DecompInterface;
import ghidra.program.model.listing.Program;

public class SheetZilla extends GhidraScript {
@Override public void run() throws Exception {
DecompInterface ifc = new DecompInterface();
boolean success = ifc.openProgram(currentProgram);
/* ... do work here ... */
}
}

Ghidra.js

Launch decompiler process in Ghidra.js
const DecompInterface = JavaHelper.getClass('ghidra.app.decompiler.DecompInterface');
const decompiler = new DecompInterface();
decompiler.openProgram(currentProgram);

Identifying a Function

The getGlobalSymbols method of a symbol table instance will return an array of symbols matching the given name:

/* name of function to find */
const fname = 'MyMethod';

/* find symbols matching the name */
const fsymbs = currentProgram.getSymbolTable().getGlobalSymbols(fname);

/* get first result */
const fsymb = fsymbs[0];

The getFunctionAt method of a function manager instance will take an address and return a reference to a function:

/* get address */
const faddr = fsymb.getAddress();

/* find function */
const fn = currentProgram.getFunctionManager().getFunctionAt(faddr);

Decompiling a Function

The decompileFunction method attempts to decompile the referenced function:

/* decompile function */
const decomp = decompiler.decompileFunction(fn, 10000, null);

Once decompiled, it is possible to retrieve the decompiled C code:

/* get generated C code */
const src = decomp.getDecompiledFunction().getC();

Complete Demo

In this demo, we will inspect the _TSTCellToCellStorage method within the TSTables framework of Apple Numbers 14.2. This particular method handles serialization of cells to the NUMBERS file format.

The implementation has a number of blocks which look like the following script:

if(flags >> 0x0d & 1) {
const field = "numberFormatID";
const current_value = cell[field];
// ... check if current_value is set, do other stuff
}

Based on the bit offset and the field name, we will generate the following row:

const mask = 1 << 0x0d; // = 8192 = 0x2000
const name = "number format ID";
const row = { Mask: "0x" + mask.toString(16), "Internal Name": name };

Rows will be generated for each block and the final dataset will be exported.

System Setup

  1. Install Ghidra, Xcode, and Apple Numbers.
Installation Notes (click to show)

On macOS, Ghidra was installed using Homebrew:

brew install --cask ghidra
  1. Add the base Ghidra folder to the PATH variable. The following shell command adds to the path for the current zsh or bash session:
export PATH="$PATH":$(dirname $(realpath `which ghidraRun`))
  1. Install ghidra.js globally:
npm install -g ghidra.js

If the install fails with a permissions issue, install with the root user:

sudo npm install -g ghidra.js

Program Preparation

  1. Create a temporary folder to hold the Ghidra project:
mkdir -p /tmp/sheetjs-ghidra
  1. Copy the TSTables framework to the current directory:
cp /Applications/Numbers.app/Contents/Frameworks/TSTables.framework/Versions/Current/TSTables .
  1. Create a "thin" binary by extracting the x86_64 part of the framework:
lipo TSTables -thin x86_64 -output TSTables.macho

When this demo was last tested, the headless analyzer did not support Mach-O fat binaries. lipo creates a new binary with support for one architecture.

  1. Analyze the program:
$(dirname $(realpath `which ghidraRun`))/support/analyzeHeadless /tmp/sheetjs-ghidra Numbers -import TSTables.macho

This process may take a while and print a number of Java stacktraces. The errors can be ignored.

SheetJS Integration

  1. Download sheetjs-ghidra.js:
curl -LO https://docs.sheetjs.com/ghidra/sheetjs-ghidra.js
  1. Install the SheetJS NodeJS module:
npm i --save https://cdn.sheetjs.com/xlsx-0.20.3/xlsx-0.20.3.tgz
  1. Run the script:
$(dirname $(realpath `which ghidraRun`))/support/analyzeHeadless /tmp/sheetjs-ghidra Numbers -process TSTables.macho -noanalysis -scriptPath `pwd` -postScript sheetjs-ghidra.js
  1. Open the generated SheetJSGhidraTSTCell.xlsx spreadsheet.

Footnotes

  1. The project does not have a website. The source repository is publicly available. 2

  2. BrtRowHdr is defined in the MS-XLSB specification

  3. See json_to_sheet in "Utilities"

  4. See book_new in "Utilities"

  5. See writeFile in "Writing Files"