Blazing Fast Data Processing with V8
V8 is an embeddable JavaScript engine written in C++. It powers Chromium and Chrome, NodeJS and Deno, Adobe UXP and other platforms.
SheetJS is a JavaScript library for reading and writing data from spreadsheets.
This demo uses V8 and SheetJS to read and write spreadsheets. We'll explore how to load SheetJS in a V8 context and process spreadsheets and structured data from C++ and Rust programs.
The "Complete Example" creates a C++ command-line tool for reading spreadsheet files and generating new workbooks. "Bindings" covers V8 engine bindings for other programming languages.
Integration Details
The SheetJS Standalone scripts can be parsed and evaluated in a V8 context.
This section describes a flow where the script is parsed and evaluated each time the program is run.
Using V8 snapshots, SheetJS libraries can be parsed and evaluated at build time. This greatly improves program startup time.
The "Snapshots" section includes a complete example.
Initialize V8
The official V8 hello-world
example covers initialization and cleanup. For the
purposes of this demo, the key variables are noted below:
v8::Isolate* isolate = v8::Isolate::New(create_params);
v8::Local<v8::Context> context = v8::Context::New(isolate);
The following helper function evaluates C strings as JS code:
v8::Local<v8::Value> eval_code(v8::Isolate *isolate, v8::Local<v8::Context> context, char* code, size_t sz = -1) {
v8::Local<v8::String> source = v8::String::NewFromUtf8(isolate, code, v8::NewStringType::kNormal, sz).ToLocalChecked();
v8::Local<v8::Script> script = v8::Script::Compile(context, source).ToLocalChecked();
return script->Run(context).ToLocalChecked();
}
Load SheetJS Scripts
The main library can be loaded by reading the scripts from the file system and evaluating in the V8 context:
/* simple wrapper to read the entire script file */
static char *read_file(const char *filename, size_t *sz) {
FILE *f = fopen(filename, "rb");
if(!f) return NULL;
long fsize; { fseek(f, 0, SEEK_END); fsize = ftell(f); fseek(f, 0, SEEK_SET); }
char *buf = (char *)malloc(fsize * sizeof(char));
*sz = fread((void *) buf, 1, fsize, f);
fclose(f);
return buf;
}
// ...
size_t sz; char *file = read_file("xlsx.full.min.js", &sz);
v8::Local<v8::Value> result = eval_code(isolate, context, file, sz);
To confirm the library is loaded, XLSX.version
can be inspected:
/* get version string */
v8::Local<v8::Value> result = eval_code(isolate, context, "XLSX.version");
v8::String::Utf8Value vers(isolate, result);
printf("SheetJS library version %s\n", *vers);
Reading Files
V8 supports ArrayBuffer
natively. Assuming buf
is a C byte array, with
length len
, the following code stores the data in a global ArrayBuffer
:
/* load C char array and save to an ArrayBuffer */
std::unique_ptr<v8::BackingStore> back = v8::ArrayBuffer::NewBackingStore(isolate, len);
memcpy(back->Data(), buf, len);
v8::Local<v8::ArrayBuffer> ab = v8::ArrayBuffer::New(isolate, std::move(back));
v8::Maybe<bool> res = context->Global()->Set(context, v8::String::NewFromUtf8Literal(isolate, "buf"), ab);
Once the raw data is pulled into the engine, the SheetJS read
method1 can
parse the data. It is recommended to attach the result to a global variable:
/* parse with SheetJS */
v8::Local<v8::Value> result = eval_code(isolate, context, "globalThis.wb = XLSX.read(buf)");
wb
, a SheetJS workbook object2, will be a variable in the JS environment
that can be inspected using the various SheetJS API functions3.
Writing Files
The SheetJS write
method4 generates file bytes from workbook objects. The
array
type5 instructs the library to generate ArrayBuffer
objects:
/* write with SheetJS using type: "array" */
v8::Local<v8::Value> result = eval_code(isolate, context, "XLSX.write(wb, {type:'array', bookType:'xlsb'})");
The underlying memory from an ArrayBuffer
can be pulled from the engine:
/* pull result back to C++ */
v8::Local<v8::ArrayBuffer> ab = v8::Local<v8::ArrayBuffer>::Cast(result);
size_t sz = ab->ByteLength();
char *buf = ab->Data();
The resulting buf
can be written to file with fwrite
.
Complete Example
This demo was tested in the following deployments:
V8 Version | Platform | OS Version | Compiler | Date |
---|---|---|---|---|
12.4.253 | darwin-x64 | macOS 14.4 | clang 15.0.0 | 2024-03-15 |
12.7.130 | darwin-arm | macOS 14.5 | clang 15.0.0 | 2024-05-25 |
12.5.48 | win10-x64 | Windows 10 | CL 19.39.33523 | 2024-03-24 |
12.5.48 | linux-x64 | HoloOS 3.5.17 | gcc 13.1.1 | 2024-03-21 |
12.7.130 | linux-arm | Debian 12 | gcc 12.2.0 | 2024-05-25 |
This program parses a file and prints CSV data from the first worksheet. It also generates an XLSB file and writes to the filesystem.
When the demo was last tested, there were errors in the official V8 embed guide. Corrected instructions are included below.
The build process is long and will test your patience.
Preparation
- Linux/MacOS
- Windows
- Prepare
/usr/local/lib
:
mkdir -p /usr/local/lib
cd /usr/local/lib
If this step throws a permission error, run the following commands:
sudo mkdir -p /usr/local/lib
sudo chmod 777 /usr/local/lib
- Follow the official "Visual Studio" installation steps.
Using the installer tool, the "Desktop development with C++" workload must be installed. In the sidebar, verify the following components are checked:
- "C++ ATL for latest ... build tools" (
v143
when last tested) - "C++ MFC for latest ... build tools" (
v143
when last tested)
In the "Individual components" tab, search for "Windows 11 SDK" and verify that "Windows 11 SDK (10.0.22621.0)" is checked.
Click "Modify" and allow the installer to finish.
The SDK debugging tools must be installed after the SDK is installed.
-
Using the Search bar, search "Apps & features".
-
When the setting panel opens, scroll down to "Windows Software Development Kit - Windows 10.0.22621 and click "Modify".
-
In the new window, select "Change" and click "Next"
-
Check "Debugging Tools for Windows" and click "Change"
The following git
settings should be changed:
git config --global core.autocrlf false
git config --global core.filemode false
git config --global branch.autosetuprebase always
- Download and install
depot_tools
:
- Linux/MacOS
- Windows
git clone https://chromium.googlesource.com/chromium/tools/depot_tools.git
If this step throws a permission error, run the following commands and retry:
sudo mkdir -p /usr/local/lib
sudo chmod 777 /usr/local/lib
The bundle is a ZIP file that should be downloaded and extracted.
The demo was last tested on an exFAT-formatted drive (mounted at E:\
).
After extracting, verify that the depot_tools
folder is not read-only.
- Add the path to the
PATH
environment variable:
- Linux/MacOS
- Windows
export PATH="/usr/local/lib/depot_tools:$PATH"
At this point, it is strongly recommended to add the line to a shell startup
script such as .bashrc
or .zshrc
These instructions are for cmd
use. Do not run in PowerShell!
It is strongly recommended to use the "Developer Command Prompt" from Visual Studio as it prepares the console to run build tools.
set DEPOT_TOOLS_WIN_TOOLCHAIN=0
set PATH=E:\depot_tools;%PATH%
In addition, the vs2022_install
variable must be set to the Visual Studio
folder. For example, using the "Community Edition", the assignment should be
set vs2022_install="C:\Program Files\Microsoft Visual Studio\2022\Community"
These environment variables can be persisted in the Control Panel.
- Run
gclient
once to updatedepot_tools
:
- Linux/MacOS
- Windows
gclient
gclient
gclient
may throw errors related to git
and permissions issues:
fatal: detected dubious ownership in repository at 'E:/depot_tools'
'E:/depot_tools' is on a file system that doesnot record ownership
To add an exception for this directory, call:
git config --global --add safe.directory E:/depot_tools
These issues are related to the exFAT file system. They were resolved by running
the recommended commands and re-running gclient
.
There were errors pertaining to gitconfig
:
error: could not write config file E:/depot_tools/bootstrap-2@3_8_10_chromium_26_bin/git/etc/gitconfig: File exists
This can happen if the depot_tools
folder is read-only. The workaround is to
unset the read-only flag for the E:\depot_tools
folder.
Clone V8
- Create a base directory:
- Linux/MacOS
- Windows
mkdir -p ~/dev/v8
cd ~/dev/v8
fetch v8
cd v8
Note that the actual repo will be placed in ~/dev/v8/v8
.
cd E:\
mkdir v8
cd v8
fetch v8
cd v8
On exFAT, every cloned repo elicited the same git
permissions error. fetch
will fail with a clear remedy message such as
git config --global --add safe.directory E:/v8/v8
Run the command then run gclient sync
, repeating each time the command fails.
There were occasional git
conflict errors:
v8/tools/clang (ERROR)
----------------------------------------
[0:00:01] Started.
...
error: Your local changes to the following files would be overwritten by checkout:
plugins/FindBadRawPtrPatterns.cpp
...
Please commit your changes or stash them before you switch branches.
Aborting
error: could not detach HEAD
----------------------------------------
Error: 28> Unrecognized error, please merge or rebase manually.
28> cd E:\v8\v8\tools\clang && git rebase --onto 65ceb79efbc9d1dec9b1a0f4bc0b8d010b9d7a66 refs/remotes/origin/main
The recommended fix is to delete the referenced folder and re-run gclient sync
- Checkout the desired version. The following command pulls
12.7.130
:
git checkout tags/12.7.130 -b sample
The official documentation recommends:
git checkout refs/tags/12.7.130 -b sample -t
This command failed in local testing:
E:\v8\v8>git checkout refs/tags/12.7.130 -b sample -t
fatal: cannot set up tracking information; starting point 'refs/tags/12.7.130' is not a branch
Build V8
- Build the static library.
- Intel Mac
- ARM64 Mac
- Linux x64
- Linux ARM
- Windows
tools/dev/v8gen.py x64.release.sample
ninja -C out.gn/x64.release.sample v8_monolith
tools/dev/v8gen.py arm64.release.sample
ninja -C out.gn/arm64.release.sample v8_monolith
tools/dev/v8gen.py x64.release.sample
ninja -C out.gn/x64.release.sample v8_monolith
In some Linux x64 tests using GCC 12, there were build errors that stemmed from
warnings. The error messages included the tag -Werror
:
../../src/compiler/turboshaft/wasm-gc-type-reducer.cc:212:18: error: 'back_insert_iterator' may not intend to support class template argument deduction [-Werror,-Wctad-maybe-unsupported]
212 | std::back_insert_iterator(snapshots), [this](Block* pred) {
| ^
../../build/linux/debian_bullseye_amd64-sysroot/usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/stl_iterator.h:596:11: note: add a deduction guide to suppress this warning
596 | class back_insert_iterator
| ^
1 error generated.
This was resolved by manually editing out.gn/x64.release.sample/args.gn
. The
option treat_warnings_as_errors
should be set to false
:
treat_warnings_as_errors = false
tools/dev/v8gen.py arm64.release.sample
Append the following lines to out.gn/arm64.release.sample/args.gn
:
is_clang = false
treat_warnings_as_errors = false
Run the build:
ninja -C out.gn/arm64.release.sample v8_monolith
When this demo was last tested, an assertion failed:
../../src/base/small-vector.h: In instantiation of ‘class v8::base::SmallVector<std::pair<const v8::internal::compiler::turboshaft::PhiOp*, const v8::internal::compiler::turboshaft::OpIndex>, 16>’:
../../src/compiler/turboshaft/loop-unrolling-reducer.h:577:11: required from here
../../src/base/macros.h:215:55: error: static assertion failed: T should be trivially copyable
215 | static_assert(::v8::base::is_trivially_copyable<T>::value, \
| ^~~~~
../../src/base/small-vector.h:25:3: note: in expansion of macro ‘ASSERT_TRIVIALLY_COPYABLE’
25 | ASSERT_TRIVIALLY_COPYABLE(T);
| ^~~~~~~~~~~~~~~~~~~~~~~~~
The build passed after disabling the assertions:
class SmallVector {
// Currently only support trivially copyable and trivially destructible data
// types, as it uses memcpy to copy elements and never calls destructors.
//ASSERT_TRIVIALLY_COPYABLE(T);
//static_assert(std::is_trivially_destructible<T>::value);
public:
static constexpr size_t kInlineSize = kSize;
python3 tools\dev\v8gen.py -vv x64.release.sample
ninja -C out.gn\x64.release.sample v8_monolith
In local testing, the build sometimes failed with a dbghelp.dll
error:
Exception: dbghelp.dll not found in "C:\Program Files (x86)\Windows Kits\10\Debuggers\x64\dbghelp.dll"
This issue was fixed by removing and reinstalling "Debugging Tools for Windows" from the Control Panel as described in step 0.
In local testing, the ninja
build failed with C++ deprecation errors:
../..\src/wasm/wasm-code-manager.h(670,28): error: 'atomic_load<v8::base::OwnedVector<const unsigned char>>' is deprecated: warning STL4029: std::atomic_*() overloads for shared_ptr are deprecated in C++20. The shared_ptr specialization of std::atomic provides superior functionality. You can define _SILENCE_CXX20_OLD_SHARED_PTR_ATOMIC_SUPPORT_DEPRECATION_WARNING or _SILENCE_ALL_CXX20_DEPRECATION_WARNINGS to suppress this warning. [-Werror,-Wdeprecated-declarations]
670 | auto wire_bytes = std::atomic_load(&wire_bytes_);
| ^
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.37.32822\include\memory(3794,1): note: 'atomic_load<v8::base::OwnedVector<const unsigned char>>' has been explicitly marked deprecated here
3794 | _CXX20_DEPRECATE_OLD_SHARED_PTR_ATOMIC_SUPPORT _NODISCARD shared_ptr<_Ty> atomic_load(
| ^
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.37.32822\include\yvals_core.h(1317,7): note: expanded from macro '_CXX20_DEPRECATE_OLD_SHARED_PTR_ATOMIC_SUPPORT'
1317 | [[deprecated("warning STL4029: " \
| ^
2 errors generated.
The workaround is to append a line to out.gn\x64.release.sample\args.gn
:
treat_warnings_as_errors = false
After adding the line, run the ninja
command again:
ninja -C out.gn\x64.release.sample v8_monolith
- Ensure the sample
hello-world
compiles and runs:
- Intel Mac
- ARM64 Mac
- Linux x64
- Linux ARM
- Windows
g++ -I. -Iinclude samples/hello-world.cc -o hello_world -fno-rtti -lv8_monolith \
-ldl -Lout.gn/x64.release.sample/obj/ -pthread \
-std=c++17 -DV8_COMPRESS_POINTERS=1 -DV8_ENABLE_SANDBOX
./hello_world
In older V8 versions, the flags -lv8_libbase -lv8_libplatform
were required.
Linking against libv8_libbase
or libv8_libplatform
in V8 version 12.4.253
elicited linker errors:
ld: multiple errors: unknown file type in '/Users/test/dev/v8/v8/out.gn/x64.release.sample/obj/libv8_libplatform.a'; unknown file type in '/Users/test/dev/v8/v8/out.gn/x64.release.sample/obj/libv8_libbase.a'
g++ -I. -Iinclude samples/hello-world.cc -o hello_world -fno-rtti -lv8_monolith \
-ldl -Lout.gn/arm64.release.sample/obj/ -pthread \
-std=c++17 -DV8_COMPRESS_POINTERS=1 -DV8_ENABLE_SANDBOX
./hello_world
In older V8 versions, the flags -lv8_libbase -lv8_libplatform
were required.
Linking against libv8_libbase
or libv8_libplatform
in V8 version 12.4.253
elicited linker errors:
ld: multiple errors: unknown file type in '/Users/test/dev/v8/v8/out.gn/x64.release.sample/obj/libv8_libplatform.a'; unknown file type in '/Users/test/dev/v8/v8/out.gn/x64.release.sample/obj/libv8_libbase.a'
g++ -I. -Iinclude samples/hello-world.cc -o hello_world -fno-rtti -lv8_monolith \
-lv8_libbase -lv8_libplatform -ldl -Lout.gn/x64.release.sample/obj/ -pthread \
-std=c++17 -DV8_COMPRESS_POINTERS=1 -DV8_ENABLE_SANDBOX
./hello_world
g++ -I. -Iinclude samples/hello-world.cc -o hello_world -fno-rtti -lv8_monolith \
-lv8_libbase -lv8_libplatform -ldl -Lout.gn/arm64.release.sample/obj/ -pthread \
-std=c++17 -DV8_COMPRESS_POINTERS=1 -DV8_ENABLE_SANDBOX
./hello_world
cl /I. /Iinclude samples/hello-world.cc /GR- v8_monolith.lib Advapi32.lib Winmm.lib Dbghelp.lib /std:c++17 /DV8_COMPRESS_POINTERS=1 /DV8_ENABLE_SANDBOX /link /out:hello_world.exe /LIBPATH:out.gn\x64.release.sample\obj\
.\hello_world.exe
Prepare Project
- Make a new project folder:
- Linux/MacOS
- Windows
cd ~/dev
mkdir -p sheetjs-v8
cd sheetjs-v8
cd E:\
mkdir sheetjs-v8
cd sheetjs-v8
- Copy the sample source:
- Linux/MacOS
- Windows
cp ~/dev/v8/v8/samples/hello-world.cc .
- Create symbolic links to the
include
headers andobj
library folders:
- Intel Mac
- ARM64 Mac
- Linux x64
- Linux ARM
ln -s ~/dev/v8/v8/include
ln -s ~/dev/v8/v8/out.gn/x64.release.sample/obj
ln -s ~/dev/v8/v8/include
ln -s ~/dev/v8/v8/out.gn/arm64.release.sample/obj
ln -s ~/dev/v8/v8/include
ln -s ~/dev/v8/v8/out.gn/x64.release.sample/obj
ln -s ~/dev/v8/v8/include
ln -s ~/dev/v8/v8/out.gn/arm64.release.sample/obj
copy E:\v8\v8\samples\hello-world.cc .\
- Observe that exFAT does not support symbolic links and move on to step 11.
- Build and run the
hello-world
example from this folder:
- Linux/MacOS
- Windows
g++ -I. -Iinclude hello-world.cc -o hello_world -fno-rtti -lv8_monolith \
-lv8_libbase -lv8_libplatform -ldl -Lobj/ -pthread -std=c++17 \
-DV8_COMPRESS_POINTERS=1 -DV8_ENABLE_SANDBOX
./hello_world
In some V8 versions, the command failed in the linker stage:
ld: multiple errors: unknown file type in '/Users/test/dev/v8/v8/out.gn/x64.release.sample/obj/libv8_libplatform.a'; unknown file type in '/Users/test/dev/v8/v8/out.gn/x64.release.sample/obj/libv8_libbase.a'
The build succeeds after removing libv8_libbase
and libv8_libplatform
:
g++ -I. -Iinclude hello-world.cc -o hello_world -fno-rtti -lv8_monolith \
-ldl -Lobj/ -pthread -std=c++17 \
-DV8_COMPRESS_POINTERS=1 -DV8_ENABLE_SANDBOX
./hello_world
cl /MT /I..\v8\v8\ /I..\v8\v8\include hello-world.cc /GR- v8_monolith.lib Advapi32.lib Winmm.lib Dbghelp.lib /std:c++17 /DV8_COMPRESS_POINTERS=1 /DV8_ENABLE_SANDBOX /link /out:hello_world.exe /LIBPATH:..\v8\v8\out.gn\x64.release.sample\obj\
.\hello_world.exe
Add SheetJS
- Download the SheetJS Standalone script and test file. Save both files in the project directory:
curl -LO https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.full.min.js
curl -LO https://docs.sheetjs.com/pres.numbers
- Download
sheetjs.v8.cc
:
curl -LO https://docs.sheetjs.com/v8/sheetjs.v8.cc
- Compile standalone
sheetjs.v8
binary
- Linux/MacOS
- Windows
g++ -I. -Iinclude sheetjs.v8.cc -o sheetjs.v8 -fno-rtti -lv8_monolith \
-lv8_libbase -lv8_libplatform -ldl -Lobj/ -pthread -std=c++17 \
-DV8_COMPRESS_POINTERS=1 -DV8_ENABLE_SANDBOX
In some V8 versions, the command failed in the linker stage:
ld: multiple errors: unknown file type in '/Users/test/dev/v8/v8/out.gn/x64.release.sample/obj/libv8_libplatform.a'; unknown file type in '/Users/test/dev/v8/v8/out.gn/x64.release.sample/obj/libv8_libbase.a'
The build succeeds after removing libv8_libbase
and libv8_libplatform
:
g++ -I. -Iinclude sheetjs.v8.cc -o sheetjs.v8 -fno-rtti -lv8_monolith \
-ldl -Lobj/ -pthread -std=c++17 \
-DV8_COMPRESS_POINTERS=1 -DV8_ENABLE_SANDBOX
cl /MT /I..\v8\v8\ /I..\v8\v8\include sheetjs.v8.cc /GR- v8_monolith.lib Advapi32.lib Winmm.lib Dbghelp.lib /std:c++17 /DV8_COMPRESS_POINTERS=1 /DV8_ENABLE_SANDBOX /link /out:sheetjs.v8.exe /LIBPATH:..\v8\v8\out.gn\x64.release.sample\obj\
- Run the demo:
- Linux/MacOS
- Windows
./sheetjs.v8 pres.numbers
.\sheetjs.v8.exe pres.numbers
If the program succeeded, the CSV contents will be printed to console and the
file sheetjsw.xlsb
will be created. That file can be opened with Excel.
Bindings
Bindings exist for many languages. As these bindings require "native" code, they may not work on every platform.
Rust
The v8
crate6 provides binary builds and straightforward bindings. The Rust
code is similar to the C++ code.
Pulling data from an ArrayBuffer
back into Rust involves an unsafe operation:
/* assuming JS code returns an ArrayBuffer, copy result to a Vec<u8> */
fn eval_code_ab(scope: &mut v8::HandleScope, code: &str) -> Vec<u8> {
let source = v8::String::new(scope, code).unwrap();
let script = v8::Script::compile(scope, source, None).unwrap();
let result: v8::Local<v8::ArrayBuffer> = script.run(scope).unwrap().try_into().unwrap();
/* In C++, `Data` returns a pointer. Collecting data into Vec<u8> is unsafe */
unsafe { return std::slice::from_raw_parts_mut(
result.data().unwrap().cast::<u8>().as_ptr(),
result.byte_length()
).to_vec(); }
}
This demo was last tested in the following deployments:
Architecture | V8 Crate | Date |
---|---|---|
darwin-x64 | 0.92.0 | 2024-05-28 |
darwin-arm | 0.92.0 | 2024-05-25 |
win10-x64 | 0.89.0 | 2024-03-24 |
linux-x64 | 0.91.0 | 2024-04-25 |
linux-arm | 0.92.0 | 2024-05-25 |
- Create a new project:
cargo new sheetjs-rustyv8
cd sheetjs-rustyv8
cargo run
- Add the
v8
crate:
cargo add v8
cargo run
- Download the SheetJS Standalone script and test file. Save both files in the project directory:
curl -LO https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.full.min.js
curl -LO https://docs.sheetjs.com/pres.numbers
- Download
main.rs
and replacesrc/main.rs
:
curl -L -o src/main.rs https://docs.sheetjs.com/v8/main.rs
- Build and run the app:
cargo run pres.numbers
If the program succeeded, the CSV contents will be printed to console and the
file sheetjsw.xlsb
will be created. That file can be opened with Excel.
Java
Javet is a Java binding to the V8 engine. Javet simplifies conversions between Java data structures and V8 equivalents.
Java byte arrays (byte[]
) are projected in V8 as Int8Array
. The SheetJS
read
method expects a Uint8Array
. The following script snippet performs a
zero-copy conversion:
// assuming `i8` is an Int8Array
const u8 = new Uint8Array(i8.buffer, i8.byteOffset, i8.length);
This demo was last tested in the following deployments:
Architecture | V8 Version | Javet | Java | Date |
---|---|---|---|---|
darwin-x64 | 12.6.228.13 | 3.1.3 | 22 | 2024-06-19 |
darwin-arm | 12.6.228.13 | 3.1.3 | 11.0.23 | 2024-06-19 |
win10-x64 | 12.6.228.13 | 3.1.3 | 11.0.16 | 2024-06-21 |
linux-x64 | 12.6.228.13 | 3.1.3 | 17.0.7 | 2024-06-20 |
linux-arm | 12.6.228.13 | 3.1.3 | 17.0.11 | 2024-06-20 |
- Create a new project:
mkdir sheetjs-javet
cd sheetjs-javet
- Download the Javet JAR. There are different archives for different platforms.
- Linux/MacOS
- Windows
- Intel Mac
- ARM64 Mac
- Linux x64
- Linux ARM
curl -LO https://repo1.maven.org/maven2/com/caoccao/javet/javet-macos/3.1.3/javet-macos-3.1.3.jar
curl -LO https://repo1.maven.org/maven2/com/caoccao/javet/javet-macos/3.1.3/javet-macos-3.1.3.jar
curl -LO https://repo1.maven.org/maven2/com/caoccao/javet/javet/3.1.3/javet-3.1.3.jar
curl -LO https://repo1.maven.org/maven2/com/caoccao/javet/javet-linux-arm64/3.1.3/javet-linux-arm64-3.1.3.jar
curl -LO https://repo1.maven.org/maven2/com/caoccao/javet/javet/3.1.3/javet-3.1.3.jar
- Download the SheetJS Standalone script and test file. Save both files in the project directory:
curl -LO https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.full.min.js
curl -LO https://docs.sheetjs.com/pres.xlsx
- Download
SheetJSJavet.java
:
curl -LO https://docs.sheetjs.com/v8/SheetJSJavet.java
- Build and run the Java application:
- Linux/MacOS
- Windows
- Intel Mac
- ARM64 Mac
- Linux x64
- Linux ARM
javac -cp ".:javet-macos-3.1.3.jar" SheetJSJavet.java
java -cp ".:javet-macos-3.1.3.jar" SheetJSJavet pres.xlsx
javac -cp ".:javet-macos-3.1.3.jar" SheetJSJavet.java
java -cp ".:javet-macos-3.1.3.jar" SheetJSJavet pres.xlsx
javac -cp ".:javet-3.1.3.jar" SheetJSJavet.java
java -cp ".:javet-3.1.3.jar" SheetJSJavet pres.xlsx
javac -cp ".:javet-linux-arm64-3.1.3.jar" SheetJSJavet.java
java -cp ".:javet-linux-arm64-3.1.3.jar" SheetJSJavet pres.xlsx
javac -cp ".;javet-3.1.3.jar" SheetJSJavet.java
java -cp ".;javet-3.1.3.jar" SheetJSJavet pres.xlsx
If the program succeeded, the CSV contents will be printed to console.
C#
ClearScript is a .NET interface to the V8 engine.
C# byte arrays (byte[]
) must be explicitly converted to arrays of bytes:
/* read data into a byte array */
byte[] filedata = File.ReadAllBytes("pres.numbers");
/* generate a JS Array (variable name `buf`) from the data */
engine.Script.buf = engine.Script.Array.from(filedata);
/* parse data */
engine.Evaluate("var wb = XLSX.read(buf, {type: 'array'});");
This demo was last tested in the following deployments:
Architecture | V8 Version | Date |
---|---|---|
darwin-x64 | 12.3.219.12 | 2024-07-16 |
darwin-arm | 12.3.219.12 | 2024-07-16 |
win10-x64 | 12.3.219.12 | 2024-07-16 |
win11-arm | 12.3.219.12 | 2024-07-16 |
linux-x64 | 12.3.219.12 | 2024-07-16 |
linux-arm | 12.3.219.12 | 2024-07-16 |
- Set the
DOTNET_CLI_TELEMETRY_OPTOUT
environment variable to1
.
How to disable telemetry (click to hide)
- Linux/MacOS
- Windows
Add the following line to .profile
, .bashrc
and .zshrc
:
export DOTNET_CLI_TELEMETRY_OPTOUT=1
Close and restart the Terminal to load the changes.
Type env
in the search bar and select "Edit the system environment variables".
In the new window, click the "Environment Variables..." button.
In the new window, look for the "System variables" section and click "New..."
Set the "Variable name" to DOTNET_CLI_TELEMETRY_OPTOUT
and the value to 1
.
Click "OK" in each window (3 windows) and restart your computer.
- Install .NET
Installation Notes (click to show)
For macOS x64 and ARM64, install the dotnet-sdk
Cask with Homebrew:
brew install --cask dotnet-sdk
For Steam Deck Holo and other Arch Linux x64 distributions, the dotnet-sdk
and
dotnet-runtime
packages should be installed using pacman
:
sudo pacman -Syu dotnet-sdk dotnet-runtime
https://dotnet.microsoft.com/en-us/download/dotnet/6.0 is the official source for Windows and ARM64 Linux versions.
-
Open a new Terminal window in macOS or PowerShell window in Windows.
-
Create a new project:
mkdir SheetJSClearScript
cd SheetJSClearScript
dotnet new console
dotnet run
- Add ClearScript to the project:
dotnet add package Microsoft.ClearScript.Complete --version 7.4.5
- Download the SheetJS standalone script and test file. Move both files to the project directory:
curl -LO https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.full.min.js
curl -LO https://docs.sheetjs.com/pres.xlsx
- Replace
Program.cs
with the following:
using Microsoft.ClearScript.JavaScript;
using Microsoft.ClearScript.V8;
/* initialize ClearScript */
var engine = new V8ScriptEngine();
/* Load SheetJS Scripts */
engine.Evaluate(File.ReadAllText("xlsx.full.min.js"));
Console.WriteLine("SheetJS version {0}", engine.Evaluate("XLSX.version"));
/* Read and Parse File */
byte[] filedata = File.ReadAllBytes(args[0]);
engine.Script.buf = engine.Script.Array.from(filedata);
engine.Evaluate("var wb = XLSX.read(buf, {type: 'array'});");
/* Print CSV of first worksheet */
engine.Evaluate("var ws = wb.Sheets[wb.SheetNames[0]];");
var csv = engine.Evaluate("XLSX.utils.sheet_to_csv(ws)");
Console.Write(csv);
/* Generate XLSB file and save to SheetJSJint.xlsb */
var xlsb = (ITypedArray<byte>)engine.Evaluate("XLSX.write(wb, {bookType: 'xlsb', type: 'buffer'})");
File.WriteAllBytes("SheetJSClearScript.xlsb", xlsb.ToArray());
After saving, run the program and pass the test file name as an argument:
dotnet run pres.xlsx
If successful, the program will print the contents of the first sheet as CSV
rows. It will also create SheetJSClearScript.xlsb
, a workbook that can be
opened in a spreadsheet editor.
Python
pyv8
is a Python wrapper for V8.
The stpyv8
package7 is an actively-maintained fork with binary wheels.
When this demo was last tested, there was no direct conversion between Python
bytes
and JavaScript ArrayBuffer
data.
This is a known issue8. The current recommendation is Base64 strings.
Python Base64 Strings
The SheetJS read
1 and write
4 methods support Base64 strings through
the base64
type5.
Reading Files
It is recommended to create a global context with a special method that handles
file reading from Python. The read_file
helper in the following snippet will
read bytes from sheetjs.xlsx
and generate a Base64 string:
from base64 import b64encode;
from STPyV8 import JSContext, JSClass;
# Create context with methods for file i/o
class Base64Context(JSClass):
def read_file(self, path):
with open(path, "rb") as f:
data = f.read();
return b64encode(data).decode("ascii");
globals = Base64Context();
# The JSContext starts and cleans up the V8 engine
with JSContext(globals) as ctxt:
print(ctxt.eval("read_file('sheetjs.xlsx')")); # read base64 data and print
Writing Files
Since the SheetJS write
method returns a Base64 string, the result can be
decoded and written to file from Python:
from base64 import b64decode;
from STPyV8 import JSContext;
# The JSContext starts and cleans up the V8 engine
with JSContext() as ctxt:
# ... initialization and workbook creation ...
xlsb = ctxt.eval("XLSX.write(wb, {type: 'base64', bookType: 'xlsb'})");
with open("SheetJSSTPyV8.xlsb", "wb") as f:
f.write(b64decode(xlsb));
Python Demo
This demo was last tested in the following deployments:
Architecture | V8 Version | Python | Date |
---|---|---|---|
darwin-arm | 13.0.245.16 | 3.13.0 | 2024-10-20 |
- Make a new folder for the project:
mkdir sheetjs-stpyv8
cd sheetjs-stpyv8
- Install
stpyv8
:
pip install stpyv8
The install may fail with a externally-managed-environment
error:
error: externally-managed-environment
× This environment is externally managed
The wheel can be downloaded and forcefully installed. The following commands
download and install version 13.0.245.16
for Python 3.13
on darwin-arm
:
curl -LO https://github.com/cloudflare/stpyv8/releases/download/v13.0.245.16/stpyv8-13.0.245.16-cp313-cp313-macosx_14_0_arm64.whl
sudo python -m pip install --upgrade stpyv8-13.0.245.16-cp313-cp313-macosx_14_0_arm64.whl --break-system-packages
- Download the SheetJS standalone script and test file. Move both files to the project directory:
curl -LO https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.full.min.js
curl -LO https://docs.sheetjs.com/pres.xlsx
- Download
sheetjs-stpyv8.py
:
curl -LO https://docs.sheetjs.com/v8/sheetjs-stpyv8.py
- Run the script and pass
pres.xlsx
as the first argument:
python sheetjs-stpyv8.py pres.xlsx
The script will display CSV rows from the first worksheet. It will also create
SheetJSSTPyV8.xlsb
, a workbook that can be opened with a spreadsheet editor.
Snapshots
At a high level, V8 snapshots are raw dumps of the V8 engine state. It is much more efficient for programs to load snapshots than to evaluate code.
Snapshot Demo
There are two parts to this demo:
A) The snapshot
command creates a snapshot with
the SheetJS standalone script
and supplementary NUMBERS script. It
will dump the snapshot to snapshot.bin
B) The sheet2csv
tool embeds snapshot.bin
.
The tool will parse a specified file, print CSV contents of a named worksheet,
and export the workbook to NUMBERS.
This demo was last tested in the following deployments:
Architecture | V8 Version | Crate | Date |
---|---|---|---|
darwin-x64 | 12.6.228.3 | 0.92.0 | 2024-05-28 |
darwin-arm | 12.6.228.3 | 0.92.0 | 2024-05-23 |
win10-x64 | 12.3.219.9 | 0.88.0 | 2024-03-24 |
win11-x64 | 12.6.228.3 | 0.92.0 | 2024-05-23 |
linux-x64 | 12.3.219.9 | 0.88.0 | 2024-03-18 |
linux-arm | 12.6.228.3 | 0.92.0 | 2024-05-26 |
- Make a new folder for the project:
mkdir sheetjs2csv
cd sheetjs2csv
- Download the following scripts:
curl -o Cargo.toml https://docs.sheetjs.com/cli/Cargo.toml
curl -o snapshot.rs https://docs.sheetjs.com/cli/snapshot.rs
curl -o sheet2csv.rs https://docs.sheetjs.com/cli/sheet2csv.rs
- Download the SheetJS Standalone script and NUMBERS supplementary script. Move both scripts to the project directory:
curl -o xlsx.full.min.js https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.full.min.js
curl -o xlsx.zahl.js https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.zahl.js
- Build the V8 snapshot:
cargo build --bin snapshot
cargo run --bin snapshot
In some tests, the Linux AArch64 build failed with an error:
error[E0080]: evaluation of constant value failed
|
1715 | assert!(size_of::<TypeId>() == size_of::<u64>());
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the evaluated program panicked at 'assertion failed: size_of::<TypeId>() == size_of::<u64>()'
Versions 0.75.1
, 0.82.0
, and 0.92.0
are known to work.
- Build
sheet2csv
(sheet2csv.exe
in Windows):
cargo build --release --bin sheet2csv
- Download the test file https://docs.sheetjs.com/pres.numbers:
curl -o pres.numbers https://docs.sheetjs.com/pres.numbers
- Test the application:
- Linux/MacOS
- Windows
mv target/release/sheet2csv .
./sheet2csv pres.numbers
mv target/release/sheet2csv.exe .
.\sheet2csv.exe pres.numbers
Footnotes
-
See "SheetJS Data Model" for more details on the object representation. ↩
-
See "API Reference" for a list of functions that ship with the library. "Spreadsheet Features" covers workbook and worksheet features that can be modified directly. ↩
-
The project does not have an official website. The official Rust crate is hosted on
crates.io
. ↩ -
The project does not have a separate website. The source repository is hosted on GitHub ↩
-
According to a maintainer, typed arrays were not supported in the original
pyv8
project ↩