Blazing Fast Data Processing with V8

V8 is an embeddable JavaScript engine written in C++. It powers Chromium and Chrome, NodeJS and Deno, Adobe UXP and other platforms.

SheetJS is a JavaScript library for reading and writing data from spreadsheets.

This demo uses V8 and SheetJS to read and write spreadsheets. We'll explore how to load SheetJS in a V8 context and process spreadsheets and structured data from C++ and Rust programs.

The "Complete Example" creates a C++ command-line tool for reading spreadsheet files and generating new workbooks. "Bindings" covers V8 engine bindings for other programming languages.

Integration Details

The SheetJS Standalone scripts can be parsed and evaluated in a V8 context.

This section describes a flow where the script is parsed and evaluated each time the program is run.

Using V8 snapshots, SheetJS libraries can be parsed and evaluated at build time. This greatly improves program startup time.

The "Snapshots" section includes a complete example.

Initialize V8

The official V8 hello-world example covers initialization and cleanup. For the purposes of this demo, the key variables are noted below:

v8::Isolate* isolate = v8::Isolate::New(create_params);
v8::Local<v8::Context> context = v8::Context::New(isolate);

The following helper function evaluates C strings as JS code:

/* sz is the byte length of code; -1 means the string is NUL-terminated */
v8::Local<v8::Value> eval_code(v8::Isolate *isolate, v8::Local<v8::Context> context, const char *code, int sz = -1) {
  v8::Local<v8::String> source = v8::String::NewFromUtf8(isolate, code, v8::NewStringType::kNormal, sz).ToLocalChecked();
  v8::Local<v8::Script> script = v8::Script::Compile(context, source).ToLocalChecked();
  return script->Run(context).ToLocalChecked();
}

Load SheetJS Scripts

The main library can be loaded by reading the scripts from the file system and evaluating in the V8 context:

/* simple wrapper to read the entire script file */
static char *read_file(const char *filename, size_t *sz) {
  FILE *f = fopen(filename, "rb");
  if(!f) return NULL;
  long fsize; { fseek(f, 0, SEEK_END); fsize = ftell(f); fseek(f, 0, SEEK_SET); }
  if(fsize < 0) { fclose(f); return NULL; } /* ftell failed */
  char *buf = (char *)malloc(fsize > 0 ? (size_t)fsize : 1); /* avoid malloc(0) */
  if(!buf) { fclose(f); return NULL; } /* allocation failed */
  *sz = fread((void *) buf, 1, (size_t)fsize, f);
  fclose(f);
  return buf;
}

// ...
size_t sz; char *file = read_file("xlsx.full.min.js", &sz);
v8::Local<v8::Value> result = eval_code(isolate, context, file, sz);

To confirm the library is loaded, XLSX.version can be inspected:

/* get version string */
v8::Local<v8::Value> result = eval_code(isolate, context, "XLSX.version");
v8::String::Utf8Value vers(isolate, result);
printf("SheetJS library version %s\n", *vers);

Reading Files

V8 supports ArrayBuffer natively. Assuming buf is a C byte array, with length len, the following code stores the data in a global ArrayBuffer:

Loading data into an ArrayBuffer in the V8 engine
/* load C char array and save to an ArrayBuffer */
std::unique_ptr<v8::BackingStore> back = v8::ArrayBuffer::NewBackingStore(isolate, len);
memcpy(back->Data(), buf, len);
v8::Local<v8::ArrayBuffer> ab = v8::ArrayBuffer::New(isolate, std::move(back));
v8::Maybe<bool> res = context->Global()->Set(context, v8::String::NewFromUtf8Literal(isolate, "buf"), ab);
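The snippet above takes buf and len as given. As a minimal sketch of one way to obtain them, the following helper reads a whole file into a std::vector<char> using only the C++ standard library. The name load_bytes is hypothetical and is not part of V8 or SheetJS.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <vector>

// Hypothetical helper (an assumption for illustration, not a V8 or SheetJS API):
// read an entire file into `out`. Returns false if the file cannot be opened.
static bool load_bytes(const char *filename, std::vector<char> &out) {
  std::ifstream in(filename, std::ios::binary); // binary mode: no newline translation
  if (!in) return false;
  // Copy every byte from the stream, including embedded NUL bytes
  out.assign(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
  return true;
}
```

Assuming the helper succeeds, data.data() and data.size() from a std::vector<char> named data can stand in for buf and len when filling the backing store.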

Once the raw data is pulled into the engine, the SheetJS read method [1] can parse the data. It is recommended to attach the result to a global variable:

/* parse with SheetJS */
v8::Local<v8::Value> result = eval_code(isolate, context, "globalThis.wb = XLSX.read(buf)");

wb, a SheetJS workbook object [2], will be a variable in the JS environment that can be inspected using the various SheetJS API functions [3].

Writing Files

The SheetJS write method [4] generates file bytes from workbook objects. The array type [5] instructs the library to generate ArrayBuffer objects:

/* write with SheetJS using type: "array" */
v8::Local<v8::Value> result = eval_code(isolate, context, "XLSX.write(wb, {type:'array', bookType:'xlsb'})");

The underlying memory from an ArrayBuffer can be pulled from the engine:

Pulling raw bytes from an ArrayBuffer
/* pull result back to C++ */
v8::Local<v8::ArrayBuffer> ab = v8::Local<v8::ArrayBuffer>::Cast(result);
size_t sz = ab->ByteLength();
char *buf = (char *)ab->Data(); /* Data() returns void*, so C++ needs an explicit cast */

The resulting buf can be written to file with fwrite.

Complete Example

Tested Deployments

This demo was tested in the following deployments:

| V8 Version | Platform | OS Version | Compiler | Date |
|---|---|---|---|---|
| 12.4.253 | darwin-x64 | macOS 14.4 | clang 15.0.0 | 2024-03-15 |
| 12.7.130 | darwin-arm | macOS 14.5 | clang 15.0.0 | 2024-05-25 |
| 12.5.48 | win10-x64 | Windows 10 | CL 19.39.33523 | 2024-03-24 |
| 12.5.48 | linux-x64 | HoloOS 3.5.17 | gcc 13.1.1 | 2024-03-21 |
| 12.7.130 | linux-arm | Debian 12 | gcc 12.2.0 | 2024-05-25 |

This program parses a file and prints CSV data from the first worksheet. It also generates an XLSB file and writes to the filesystem.

When the demo was last tested, there were errors in the official V8 embed guide. Corrected instructions are included below.

The build process is long and will test your patience.

Preparation

  1. Prepare /usr/local/lib:
mkdir -p /usr/local/lib
cd /usr/local/lib

If this step throws a permission error, run the following commands:

sudo mkdir -p /usr/local/lib
sudo chmod 777 /usr/local/lib
  1. Download and install depot_tools:
git clone https://chromium.googlesource.com/chromium/tools/depot_tools.git

If this step throws a permission error, run the following commands and retry:

sudo mkdir -p /usr/local/lib
sudo chmod 777 /usr/local/lib
  1. Add the path to the PATH environment variable:
export PATH="/usr/local/lib/depot_tools:$PATH"

At this point, it is strongly recommended to add the line to a shell startup script such as .bashrc or .zshrc.

  1. Run gclient once to update depot_tools:
gclient

Clone V8

  1. Create a base directory:
mkdir -p ~/dev/v8
cd ~/dev/v8
fetch v8
cd v8

Note that the actual repo will be placed in ~/dev/v8/v8.

  1. Check out the desired version. The following command pulls 12.7.130:
git checkout tags/12.7.130 -b sample

The official documentation recommends:

git checkout refs/tags/12.7.130 -b sample -t

This command failed in local testing:

E:\v8\v8>git checkout refs/tags/12.7.130 -b sample -t
fatal: cannot set up tracking information; starting point 'refs/tags/12.7.130' is not a branch

Build V8

  1. Build the static library.
tools/dev/v8gen.py x64.release.sample
ninja -C out.gn/x64.release.sample v8_monolith
  1. Ensure the sample hello-world compiles and runs:
g++ -I. -Iinclude samples/hello-world.cc -o hello_world -fno-rtti -lv8_monolith \
-ldl -Lout.gn/x64.release.sample/obj/ -pthread \
-std=c++17 -DV8_COMPRESS_POINTERS=1 -DV8_ENABLE_SANDBOX
./hello_world

In older V8 versions, the flags -lv8_libbase -lv8_libplatform were required.

Linking against libv8_libbase or libv8_libplatform in V8 version 12.4.253 elicited linker errors:

ld: multiple errors: unknown file type in '/Users/test/dev/v8/v8/out.gn/x64.release.sample/obj/libv8_libplatform.a'; unknown file type in '/Users/test/dev/v8/v8/out.gn/x64.release.sample/obj/libv8_libbase.a'

Prepare Project

  1. Make a new project folder:
cd ~/dev
mkdir -p sheetjs-v8
cd sheetjs-v8
  1. Copy the sample source:
cp ~/dev/v8/v8/samples/hello-world.cc .
  1. Create symbolic links to the include headers and obj library folders:
ln -s ~/dev/v8/v8/include
ln -s ~/dev/v8/v8/out.gn/x64.release.sample/obj
  1. Build and run the hello-world example from this folder:
g++ -I. -Iinclude hello-world.cc -o hello_world -fno-rtti -lv8_monolith \
-lv8_libbase -lv8_libplatform -ldl -Lobj/ -pthread -std=c++17 \
-DV8_COMPRESS_POINTERS=1 -DV8_ENABLE_SANDBOX
./hello_world

In some V8 versions, the command failed in the linker stage:

ld: multiple errors: unknown file type in '/Users/test/dev/v8/v8/out.gn/x64.release.sample/obj/libv8_libplatform.a'; unknown file type in '/Users/test/dev/v8/v8/out.gn/x64.release.sample/obj/libv8_libbase.a'

The build succeeds after removing libv8_libbase and libv8_libplatform:

g++ -I. -Iinclude hello-world.cc -o hello_world -fno-rtti -lv8_monolith \
-ldl -Lobj/ -pthread -std=c++17 \
-DV8_COMPRESS_POINTERS=1 -DV8_ENABLE_SANDBOX
./hello_world

Add SheetJS

  1. Download the SheetJS Standalone script and test file. Save both files in the project directory:
curl -LO https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.full.min.js
curl -LO https://docs.sheetjs.com/pres.numbers
  1. Download sheetjs.v8.cc:
curl -LO https://docs.sheetjs.com/v8/sheetjs.v8.cc
  1. Compile the standalone sheetjs.v8 binary:
g++ -I. -Iinclude sheetjs.v8.cc -o sheetjs.v8 -fno-rtti -lv8_monolith \
-lv8_libbase -lv8_libplatform -ldl -Lobj/ -pthread -std=c++17 \
-DV8_COMPRESS_POINTERS=1 -DV8_ENABLE_SANDBOX

In some V8 versions, the command failed in the linker stage:

ld: multiple errors: unknown file type in '/Users/test/dev/v8/v8/out.gn/x64.release.sample/obj/libv8_libplatform.a'; unknown file type in '/Users/test/dev/v8/v8/out.gn/x64.release.sample/obj/libv8_libbase.a'

The build succeeds after removing libv8_libbase and libv8_libplatform:

g++ -I. -Iinclude sheetjs.v8.cc -o sheetjs.v8 -fno-rtti -lv8_monolith \
-ldl -Lobj/ -pthread -std=c++17 \
-DV8_COMPRESS_POINTERS=1 -DV8_ENABLE_SANDBOX
  1. Run the demo:
./sheetjs.v8 pres.numbers

If the program succeeds, the CSV contents will be printed to the console and the file sheetjsw.xlsb will be created. That file can be opened with Excel.

Bindings

Bindings exist for many languages. As these bindings require "native" code, they may not work on every platform.

Rust

The v8 crate [6] provides binary builds and straightforward bindings. The Rust code is similar to the C++ code.

Pulling data from an ArrayBuffer back into Rust involves an unsafe operation:

/* assuming JS code returns an ArrayBuffer, copy result to a Vec<u8> */
fn eval_code_ab(scope: &mut v8::HandleScope, code: &str) -> Vec<u8> {
  let source = v8::String::new(scope, code).unwrap();
  let script = v8::Script::compile(scope, source, None).unwrap();
  let result: v8::Local<v8::ArrayBuffer> = script.run(scope).unwrap().try_into().unwrap();
  /* In C++, `Data` returns a pointer. Collecting data into Vec<u8> is unsafe */
  unsafe { return std::slice::from_raw_parts_mut(
    result.data().unwrap().cast::<u8>().as_ptr(),
    result.byte_length()
  ).to_vec(); }
}
Tested Deployments

This demo was last tested in the following deployments:

| Architecture | V8 Crate | Date |
|---|---|---|
| darwin-x64 | 0.92.0 | 2024-05-28 |
| darwin-arm | 0.92.0 | 2024-05-25 |
| win10-x64 | 0.89.0 | 2024-03-24 |
| linux-x64 | 0.91.0 | 2024-04-25 |
| linux-arm | 0.92.0 | 2024-05-25 |
  1. Create a new project:
cargo new sheetjs-rustyv8
cd sheetjs-rustyv8
cargo run
  1. Add the v8 crate:
cargo add v8
cargo run
  1. Download the SheetJS Standalone script and test file. Save both files in the project directory:
curl -LO https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.full.min.js
curl -LO https://docs.sheetjs.com/pres.numbers
  1. Download main.rs and replace src/main.rs:
curl -L -o src/main.rs https://docs.sheetjs.com/v8/main.rs
  1. Build and run the app:
cargo run pres.numbers

If the program succeeds, the CSV contents will be printed to the console and the file sheetjsw.xlsb will be created. That file can be opened with Excel.

Java

Javet is a Java binding to the V8 engine. Javet simplifies conversions between Java data structures and V8 equivalents.

Java byte arrays (byte[]) are projected in V8 as Int8Array. The SheetJS read method expects a Uint8Array. The following script snippet performs a zero-copy conversion:

Zero-copy conversion from Int8Array to Uint8Array
// assuming `i8` is an Int8Array
const u8 = new Uint8Array(i8.buffer, i8.byteOffset, i8.length);
Tested Deployments

This demo was last tested in the following deployments:

| Architecture | V8 Version | Javet | Java | Date |
|---|---|---|---|---|
| darwin-x64 | 12.6.228.13 | 3.1.3 | 22 | 2024-06-19 |
| darwin-arm | 12.6.228.13 | 3.1.3 | 11.0.23 | 2024-06-19 |
| win10-x64 | 12.6.228.13 | 3.1.3 | 11.0.16 | 2024-06-21 |
| linux-x64 | 12.6.228.13 | 3.1.3 | 17.0.7 | 2024-06-20 |
| linux-arm | 12.6.228.13 | 3.1.3 | 17.0.11 | 2024-06-20 |
  1. Create a new project:
mkdir sheetjs-javet
cd sheetjs-javet
  1. Download the Javet JAR. There are different archives for different platforms.
curl -LO https://repo1.maven.org/maven2/com/caoccao/javet/javet-macos/3.1.3/javet-macos-3.1.3.jar
  1. Download the SheetJS Standalone script and test file. Save both files in the project directory:
curl -LO https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.full.min.js
curl -LO https://docs.sheetjs.com/pres.xlsx
  1. Download SheetJSJavet.java:
curl -LO https://docs.sheetjs.com/v8/SheetJSJavet.java
  1. Build and run the Java application:
javac -cp ".:javet-macos-3.1.3.jar" SheetJSJavet.java
java -cp ".:javet-macos-3.1.3.jar" SheetJSJavet pres.xlsx

If the program succeeds, the CSV contents will be printed to the console.

C#

ClearScript is a .NET interface to the V8 engine.

C# byte arrays (byte[]) must be explicitly converted to JavaScript arrays of bytes:

/* read data into a byte array */
byte[] filedata = File.ReadAllBytes("pres.numbers");

/* generate a JS Array (variable name `buf`) from the data */
engine.Script.buf = engine.Script.Array.from(filedata);

/* parse data */
engine.Evaluate("var wb = XLSX.read(buf, {type: 'array'});");
Tested Deployments

This demo was last tested in the following deployments:

| Architecture | V8 Version | Date |
|---|---|---|
| darwin-x64 | 12.3.219.12 | 2024-07-16 |
| darwin-arm | 12.3.219.12 | 2024-07-16 |
| win10-x64 | 12.3.219.12 | 2024-07-16 |
| win11-arm | 12.3.219.12 | 2024-07-16 |
| linux-x64 | 12.3.219.12 | 2024-07-16 |
| linux-arm | 12.3.219.12 | 2024-07-16 |
  1. Set the DOTNET_CLI_TELEMETRY_OPTOUT environment variable to 1.
How to disable telemetry

Add the following line to .profile, .bashrc and .zshrc:

export DOTNET_CLI_TELEMETRY_OPTOUT=1

Close and restart the Terminal to load the changes.

  1. Install .NET
Installation Notes

For macOS x64 and ARM64, install the dotnet-sdk Cask with Homebrew:

brew install --cask dotnet-sdk

For Steam Deck Holo and other Arch Linux x64 distributions, the dotnet-sdk and dotnet-runtime packages should be installed using pacman:

sudo pacman -Syu dotnet-sdk dotnet-runtime

https://dotnet.microsoft.com/en-us/download/dotnet/6.0 is the official source for Windows and ARM64 Linux versions.

  1. Open a new Terminal window in macOS or PowerShell window in Windows.

  2. Create a new project:

mkdir SheetJSClearScript
cd SheetJSClearScript
dotnet new console
dotnet run
  1. Add ClearScript to the project:
dotnet add package Microsoft.ClearScript.Complete --version 7.4.5
  1. Download the SheetJS standalone script and test file. Move both files to the project directory:
curl -LO https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.full.min.js
curl -LO https://docs.sheetjs.com/pres.xlsx
  1. Replace Program.cs with the following:
Program.cs
using Microsoft.ClearScript.JavaScript;
using Microsoft.ClearScript.V8;

/* initialize ClearScript */
var engine = new V8ScriptEngine();

/* Load SheetJS Scripts */
engine.Evaluate(File.ReadAllText("xlsx.full.min.js"));
Console.WriteLine("SheetJS version {0}", engine.Evaluate("XLSX.version"));

/* Read and Parse File */
byte[] filedata = File.ReadAllBytes(args[0]);
engine.Script.buf = engine.Script.Array.from(filedata);
engine.Evaluate("var wb = XLSX.read(buf, {type: 'array'});");

/* Print CSV of first worksheet */
engine.Evaluate("var ws = wb.Sheets[wb.SheetNames[0]];");
var csv = engine.Evaluate("XLSX.utils.sheet_to_csv(ws)");
Console.Write(csv);

/* Generate XLSB file and save to SheetJSClearScript.xlsb */
var xlsb = (ITypedArray<byte>)engine.Evaluate("XLSX.write(wb, {bookType: 'xlsb', type: 'buffer'})");
File.WriteAllBytes("SheetJSClearScript.xlsb", xlsb.ToArray());

After saving, run the program and pass the test file name as an argument:

dotnet run pres.xlsx

If successful, the program will print the contents of the first sheet as CSV rows. It will also create SheetJSClearScript.xlsb, a workbook that can be opened in a spreadsheet editor.

Python

pyv8 is a Python wrapper for V8.

The stpyv8 package [7] is an actively maintained fork with binary wheels.

When this demo was last tested, there was no direct conversion between Python bytes and JavaScript ArrayBuffer data.

This is a known issue [8]. The current recommendation is to pass data as Base64 strings.

Python Base64 Strings

The SheetJS read [1] and write [4] methods support Base64 strings through the base64 type [5].

Reading Files

It is recommended to create a global context with a special method that handles file reading from Python. The read_file helper in the following snippet will read bytes from sheetjs.xlsx and generate a Base64 string:

from base64 import b64encode
from STPyV8 import JSContext, JSClass

# Create context with methods for file i/o
class Base64Context(JSClass):
  def read_file(self, path):
    with open(path, "rb") as f:
      data = f.read()
    return b64encode(data).decode("ascii")
globals = Base64Context()

# The JSContext starts and cleans up the V8 engine
with JSContext(globals) as ctxt:
  print(ctxt.eval("read_file('sheetjs.xlsx')")) # read base64 data and print

Writing Files

When called with type: 'base64', the SheetJS write method returns a Base64 string. The result can be decoded and written to a file from Python:

from base64 import b64decode
from STPyV8 import JSContext

# The JSContext starts and cleans up the V8 engine
with JSContext() as ctxt:
  # ... initialization and workbook creation ...
  xlsb = ctxt.eval("XLSX.write(wb, {type: 'base64', bookType: 'xlsb'})")
with open("SheetJSSTPyV8.xlsb", "wb") as f:
  f.write(b64decode(xlsb))

Python Demo

Tested Deployments

This demo was last tested in the following deployments:

| Architecture | V8 Version | Python | Date |
|---|---|---|---|
| darwin-arm | 13.0.245.16 | 3.13.0 | 2024-10-20 |
  1. Make a new folder for the project:
mkdir sheetjs-stpyv8
cd sheetjs-stpyv8
  1. Install stpyv8:
pip install stpyv8

The install may fail with an externally-managed-environment error:

error: externally-managed-environment

× This environment is externally managed

The wheel can be downloaded and forcefully installed. The following commands download and install version 13.0.245.16 for Python 3.13 on darwin-arm:

curl -LO https://github.com/cloudflare/stpyv8/releases/download/v13.0.245.16/stpyv8-13.0.245.16-cp313-cp313-macosx_14_0_arm64.whl
sudo python -m pip install --upgrade stpyv8-13.0.245.16-cp313-cp313-macosx_14_0_arm64.whl --break-system-packages
  1. Download the SheetJS standalone script and test file. Move both files to the project directory:
curl -LO https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.full.min.js
curl -LO https://docs.sheetjs.com/pres.xlsx
  1. Download sheetjs-stpyv8.py:
curl -LO https://docs.sheetjs.com/v8/sheetjs-stpyv8.py
  1. Run the script and pass pres.xlsx as the first argument:
python sheetjs-stpyv8.py pres.xlsx

The script will display CSV rows from the first worksheet. It will also create SheetJSSTPyV8.xlsb, a workbook that can be opened with a spreadsheet editor.

Snapshots

At a high level, V8 snapshots are raw dumps of the V8 engine state. It is much more efficient for programs to load snapshots than to evaluate code.

Snapshot Demo

There are two parts to this demo:

A) The snapshot command creates a snapshot with the SheetJS standalone script and supplementary NUMBERS script. It will dump the snapshot to snapshot.bin.

B) The sheet2csv tool embeds snapshot.bin. The tool will parse a specified file, print CSV contents of a named worksheet, and export the workbook to NUMBERS.

Tested Deployments

This demo was last tested in the following deployments:

| Architecture | V8 Version | Crate | Date |
|---|---|---|---|
| darwin-x64 | 12.6.228.3 | 0.92.0 | 2024-05-28 |
| darwin-arm | 12.6.228.3 | 0.92.0 | 2024-05-23 |
| win10-x64 | 12.3.219.9 | 0.88.0 | 2024-03-24 |
| win11-x64 | 12.6.228.3 | 0.92.0 | 2024-05-23 |
| linux-x64 | 12.3.219.9 | 0.88.0 | 2024-03-18 |
| linux-arm | 12.6.228.3 | 0.92.0 | 2024-05-26 |
  1. Make a new folder for the project:
mkdir sheetjs2csv
cd sheetjs2csv
  1. Download the following scripts:
curl -o Cargo.toml https://docs.sheetjs.com/cli/Cargo.toml
curl -o snapshot.rs https://docs.sheetjs.com/cli/snapshot.rs
curl -o sheet2csv.rs https://docs.sheetjs.com/cli/sheet2csv.rs
  1. Download the SheetJS Standalone script and NUMBERS supplementary script. Move both scripts to the project directory:
curl -o xlsx.full.min.js https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.full.min.js
curl -o xlsx.zahl.js https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.zahl.js
  1. Build the V8 snapshot:
cargo build --bin snapshot
cargo run --bin snapshot

In some tests, the Linux AArch64 build failed with an error:

error[E0080]: evaluation of constant value failed

|
1715 | assert!(size_of::<TypeId>() == size_of::<u64>());
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the evaluated program panicked at 'assertion failed: size_of::<TypeId>() == size_of::<u64>()'

Versions 0.75.1, 0.82.0, and 0.92.0 are known to work.

  1. Build sheet2csv (sheet2csv.exe in Windows):
cargo build --release --bin sheet2csv
  1. Download the test file https://docs.sheetjs.com/pres.numbers:
curl -o pres.numbers https://docs.sheetjs.com/pres.numbers
  1. Test the application:
mv target/release/sheet2csv .
./sheet2csv pres.numbers

Footnotes

  1. See read in "Reading Files"

  2. See "SheetJS Data Model" for more details on the object representation.

  3. See "API Reference" for a list of functions that ship with the library. "Spreadsheet Features" covers workbook and worksheet features that can be modified directly.

  4. See write in "Writing Files"

  5. See "Supported Output Formats" type in "Writing Files"

  6. The project does not have an official website. The official Rust crate is hosted on crates.io.

  7. The project does not have a separate website. The source repository is hosted on GitHub.

  8. According to a maintainer, typed arrays were not supported in the original pyv8 project.