Sheets in Ghidra
Ghidra is a software reverse engineering platform with a robust Java-based extension system.
SheetJS is a JavaScript library for reading and writing data from spreadsheets.
The Complete Demo uses SheetJS to export data from a Ghidra script. We'll create an extension that loads the V8 JavaScript engine through the Ghidra.js1 integration and uses the SheetJS library to export a bitfield table from Apple Numbers to a XLSX workbook.
This demo was tested by SheetJS users in the following deployments:
Architecture | Ghidra | Date |
---|---|---|
darwin-arm | 11.1.2 | 2024-10-13 |
Integration Details
Ghidra natively supports scripts that are run in Java. JS extension scripts require a JavaScript engine with Java bindings.
Ghidra.js1 is a Ghidra integration for RhinoJS, GraalJS and V8. The current version uses the Javet V8 binding.
Loading SheetJS Scripts
The SheetJS NodeJS module can be
loaded in Ghidra.js scripts using require
:
const XLSX = require("xlsx");
SheetJS NodeJS modules must be installed in a folder in the Ghidra script path!
Bitfields and Sheets
Binary file formats commonly use bitfields to compactly store a set of Boolean
(true or false) flags. For example, in the XLSB file format, the BrtRowHdr
record2 encodes row properties. Bit offsets
91-96 are interpreted as flags marking if a row is hidden or if it is collapsed.
Assembly Implementation
Functions that parse bitfields typically test each bit sequentially:
CASE_1c
41 0f ba e5 1c BT R13D,0x1c
73 69 JNC CASE_1d
;; .... Do some work here (bit offset 28)
CASE_1d
41 0f ba e5 1d BT R13D,0x1d
73 69 JNC CASE_1e
;; .... Do some work here (bit offset 29)
The assembly is approximated by the following TypeScript snippet:
/* R13 is a 64-bit register */
declare let R13: BigInt;
/* NOTE: asm R13D is technically a live binding */
let R13D: number = Number(R13 & 0xFFFFFFFFn);
if((R13D >> 28) & 1) {
// .... Do some work here (bit offset 28)
}
if((R13D >> 29) & 1) {
// .... Do some work here (bit offset 29)
}
Array of Objects
A bitmask or bit offset can be paired with a description in a JavaScript object.
For example, in the BrtRowHdr
record, bit offset 92 indicates whether the row
is hidden (if the bit is set) or visible (if the bit is not set). The offset and
description can be stored as fields in an object:
const metadata_92 = { Offset: 92, Description: "Hidden flag" };
Each object can be stored in an array:
const metadata = [
{ Offset: 91, Description: "Collapsed flag" },
{ Offset: 92, Description: "Hidden flag" },
// ...
];
This is an "Array of Objects".
The SheetJS json_to_sheet
method3 can generate a SheetJS worksheet object
from the array:
const ws = XLSX.utils.json_to_sheet(metadata);
The SheetJS book_new
method4 generates a SheetJS workbook object that can
be written to the filesystem using the writeFile
method5:
const wb = XLSX.utils.book_new(ws, "Offsets");
XLSX.utils.writeFile(wb, "SheetJSGhidra.xlsx");
Java Binding
Ghidra.js exposes a number of globals for interacting with Ghidra, including:
currentProgram
: information about the loaded program.JavaHelper
: Java helper to load classes.
Ghidra.js automatically bridges instance methods to Java method calls. It also handles the plugin and file extension details.
Launching the Decompiler
ghidra.app.decompiler.DecompInterface
is the primary Java interface to the
decompiler. In Ghidra.js, JavaHelper.getClass
will load the class.
Java
import ghidra.app.script.GhidraScript;
import ghidra.app.decompiler.DecompInterface;
import ghidra.program.model.listing.Program;
public class SheetZilla extends GhidraScript {
@Override public void run() throws Exception {
DecompInterface ifc = new DecompInterface();
boolean success = ifc.openProgram(currentProgram);
/* ... do work here ... */
}
}
Ghidra.js
const DecompInterface = JavaHelper.getClass('ghidra.app.decompiler.DecompInterface');
const decompiler = new DecompInterface();
decompiler.openProgram(currentProgram);
Identifying a Function
The getGlobalSymbols
method of a symbol table instance will return an array of
symbols matching the given name:
/* name of function to find */
const fname = 'MyMethod';
/* find symbols matching the name */
const fsymbs = currentProgram.getSymbolTable().getGlobalSymbols(fname);
/* get first result */
const fsymb = fsymbs[0];
The getFunctionAt
method of a function manager instance will take an address
and return a reference to a function:
/* get address */
const faddr = fsymb.getAddress();
/* find function */
const fn = currentProgram.getFunctionManager().getFunctionAt(faddr);
Decompiling a Function
The decompileFunction
method attempts to decompile the referenced function:
/* decompile function */
const decomp = decompiler.decompileFunction(fn, 10000, null);
Once decompiled, it is possible to retrieve the decompiled C code:
/* get generated C code */
const src = decomp.getDecompiledFunction().getC();
Complete Demo
In this demo, we will inspect the _TSTCellToCellStorage
method within the
TSTables
framework of Apple Numbers 14.2. This particular method handles
serialization of cells to the NUMBERS file format.
The implementation has a number of blocks which look like the following script:
if(flags >> 0x0d & 1) {
const field = "numberFormatID";
const current_value = cell[field];
// ... check if current_value is set, do other stuff
}
Based on the bit offset and the field name, we will generate the following row:
const mask = 1 << 0x0d; // = 8192 = 0x2000
const name = "number format ID";
const row = { Mask: "0x" + mask.toString(16), "Internal Name": name };
Rows will be generated for each block and the final dataset will be exported.
System Setup
- Install Ghidra, Xcode, and Apple Numbers.
Installation Notes (click to show)
On macOS, Ghidra was installed using Homebrew:
brew install --cask ghidra
- Add the base Ghidra folder to the PATH variable. The following shell command
adds to the path for the current
zsh
orbash
session:
export PATH="$PATH":$(dirname $(realpath `which ghidraRun`))
- Install
ghidra.js
globally:
npm install -g ghidra.js
If the install fails with a permissions issue, install with the root user:
sudo npm install -g ghidra.js
Program Preparation
- Create a temporary folder to hold the Ghidra project:
mkdir -p /tmp/sheetjs-ghidra
- Copy the
TSTables
framework to the current directory:
cp /Applications/Numbers.app/Contents/Frameworks/TSTables.framework/Versions/Current/TSTables .
- Create a "thin" binary by extracting the
x86_64
part of the framework:
lipo TSTables -thin x86_64 -output TSTables.macho
When this demo was last tested, the headless analyzer did not support Mach-O fat
binaries. lipo
creates a new binary with support for one architecture.
- Analyze the program:
$(dirname $(realpath `which ghidraRun`))/support/analyzeHeadless /tmp/sheetjs-ghidra Numbers -import TSTables.macho
This process may take a while and print a number of Java stacktraces. The errors can be ignored.
SheetJS Integration
- Download
sheetjs-ghidra.js
:
curl -LO https://docs.sheetjs.com/ghidra/sheetjs-ghidra.js
- Install the SheetJS NodeJS module:
npm i --save https://cdn.sheetjs.com/xlsx-0.20.3/xlsx-0.20.3.tgz
- Run the script:
$(dirname $(realpath `which ghidraRun`))/support/analyzeHeadless /tmp/sheetjs-ghidra Numbers -process TSTables.macho -noanalysis -scriptPath `pwd` -postScript sheetjs-ghidra.js
- Open the generated
SheetJSGhidraTSTCell.xlsx
spreadsheet.
Footnotes
-
The project does not have a website. The source repository is publicly available. ↩ ↩2
-
BrtRowHdr
is defined in theMS-XLSB
specification ↩