Spreadsheet Data in Python

Pandas is a Python library for data analysis.

SheetJS is a JavaScript library for reading and writing data from spreadsheets.

This demo uses SheetJS to process data from a spreadsheet and translate to the Pandas DataFrame format. We'll explore how to load SheetJS from Python scripts, generate DataFrames from workbooks, and write DataFrames back to workbooks.

The "Complete Example" includes a wrapper library that simplifies importing and exporting spreadsheets.

Pandas includes limited support for reading spreadsheets (pandas.from_excel) and writing XLSX spreadsheets (pandas.DataFrame.to_excel).

SheetJS supports many common spreadsheet formats that Pandas cannot process.

SheetJS operations also offer more flexibility in processing complex worksheets.

Tested Environments

This demo was tested in the following deployments:

Architecture	JS Engine	Pandas	Python	Date
`darwin-x64`	Duktape `2.7.0`	2.2.3	3.13.1	2025-03-31
`darwin-arm`	Duktape `2.7.0`	2.2.3	3.13.2	2025-03-30
`win11-x64`	Duktape `2.7.0`	2.2.3	3.11.9	2025-04-28
`win11-arm`	Duktape `2.7.0`	2.2.3	3.13.2	2025-02-23
`linux-x64`	Duktape `2.7.0`	2.1.4	3.12.3	2025-06-16
`linux-arm`	Duktape `2.7.0`	1.5.3	3.11.2	2025-02-16

Integration Details

sheetjs.py is a wrapper script that provides helper methods for reading and writing spreadsheets. Installation notes are included in the "Complete Example" section.

JS in Python

JS code cannot be directly evaluated in Python implementations.

To run JS code from Python, JavaScript engines¹ can be embedded in Python modules or dynamically loaded using the ctypes foreign function library². This demo uses ctypes with the Duktape engine.

Wrapper

The script exports a class named SheetJSWrapper. It is a context manager that initializes the Duktape engine and executes SheetJS scripts on entrance. All work should be performed in the context:

Complete Example
#!/usr/bin/env python3
from sheetjs import SheetJSWrapper

with SheetJSWrapper() as sheetjs:

  # Parse file
  wb = sheetjs.read_file("pres.numbers")
  print("Loaded file pres.numbers")

  # Get first worksheet name
  first_ws_name = wb.get_sheet_names()[0]
  print(f"Reading from sheet {first_ws_name}")

  # Generate DataFrame from first worksheet
  df = wb.get_df(first_ws_name)
  print(df.info())

  # Export DataFrame to XLSB
  sheetjs.write_df(df, "SheetJSPandas.xlsb", sheet_name="DataFrame")

Reading Files

sheetjs.read_file accepts a path to a spreadsheet file. It will parse the file and return an object representing the workbook.

The get_sheet_names method of the workbook returns a list of sheet names.

The get_df method of the workbook generates a DataFrame from the workbook. The specific sheet can be selected by passing the name.

For example, the following code reads pres.numbers and generates a DataFrame from the second worksheet:

Generating a DataFrame from the second worksheet
with SheetJSWrapper() as sheetjs:
  # Parse file
  wb = sheetjs.read_file(path)

  # Generate DataFrame from second worksheet
  ws_name = wb.get_sheet_names()[1]
  df = wb.get_df(ws_name)

  # Print metadata
  print(df.info())

Under the hood, sheetjs.py performs the following steps:

Pure Python operations read the spreadsheet file and generate a byte string.
SheetJS libraries parse the string and generate a clean CSV.

The read method³ parses file bytes into a SheetJS workbook object⁴
After selecting a worksheet, sheet_to_csv⁵ generates a CSV string

Python operations convert the CSV string to a stream object.⁶
The Pandas read_csv method⁷ ingests the stream and generate a DataFrame.

Writing Files

sheetjs.write_df accepts a DataFrame and a path. It will attempt to export the data to a spreadsheet file.

For example, the following code exports a DataFrame to SheetJSPandas.xlsb:

Exporting a DataFrame to XLSB
with SheetJSWrapper() as sheetjs:
  # Export DataFrame to XLSB
  sheetjs.write_df(df, "SheetJSPandas.xlsb", sheet_name="DataFrame")

Under the hood, sheetjs.py performs the following steps:

The Pandas DataFrame to_json method⁸ generates a JSON string.
JS engine operations translate the JSON string to an array of objects.
SheetJS libraries process the data array and generate file bytes.

The json_to_sheet method⁹ creates a SheetJS sheet object from the data.
The book_new method¹⁰ creates a SheetJS workbook that includes the sheet.
The write method¹¹ generates the spreadsheet file bytes.

Pure Python operations write the bytes to file.

Complete Example

This example will extract data from an Apple Numbers spreadsheet and generate a DataFrame. The DataFrame will be exported to the binary XLSB spreadsheet format.

The Windows build requires Visual Studio with "Desktop development with C++". Commands must be run in a "Native Tools Command Prompt" session.

Install Pandas:

python3 -m pip install pandas

On Windows, Python may be available as python.exe:

python.exe -m pip install pandas

On macOS and Linux, the install command may require root access:

sudo python3 -m pip install pandas

When pip is not installed, the command will fail:

/usr/bin/python3: No module named pip

pip must be installed.

In Debian Linux, pip can be installed through the package manager:

sudo apt-get install pip

On Arch Linux-based platforms including the Steam Deck, python-pip can be installed through the package manager:

sudo pacman -Syu python-pip

In some local tests, the install failed with the following error:

error: externally-managed-environment

Pandas must be installed through the package manager:

Debian and Ubuntu distributions:

sudo apt-get install python3-pandas

Arch Linux-based platforms including the Steam Deck:

sudo pacman -Syu python-pandas

macOS systems with a Python version from Homebrew:

sudo python3 -m pip install pandas --break-system-packages

Build the Duktape shared library:

MacOS
Linux
Windows

curl -LO https://duktape.org/duktape-2.7.0.tar.xz
tar -xJf duktape-2.7.0.tar.xz
cd duktape-2.7.0
make -f Makefile.sharedlibrary
cd ..

curl -LO https://duktape.org/duktape-2.7.0.tar.xz
tar -xJf duktape-2.7.0.tar.xz
cd duktape-2.7.0
make -f Makefile.sharedlibrary
cd ..

Download and extract the source tarball:

curl -LO https://duktape.org/duktape-2.7.0.tar.xz
tar -xJf duktape-2.7.0.tar.xz

Enter the source folder:

cd duktape-2.7.0

Edit src\duk_config.h and add the highlighted lines to the end of the file:

src\duk_config.h (add highlighted lines to end of file)
#endif  /* DUK_CONFIG_H_INCLUDED */

#define DUK_EXTERNAL_DECL  extern __declspec(dllexport)
#define DUK_EXTERNAL       __declspec(dllexport)

Build the Duktape DLL:

cl /O2 /W3 /Isrc /LD /DDUK_SINGLE_FILE /DDUK_F_DLL_BUILD /DDUK_F_WINDOWS /DDUK_COMPILING_DUKTAPE src\\duktape.c

Move up to the parent directory:

cd ..

Copy the shared library to the current folder. When the demo was last tested, the shared library file name differed by platform:

OS	name
macOS	`libduktape.207.20700.so`
Linux	`libduktape.so.207.20700`
Windows	`duktape.dll`

MacOS
Linux
Windows

cp duktape-*/libduktape.* .

cp duktape-*/libduktape.* .

copy duktape-2.7.0\duktape.dll .

Download the SheetJS Standalone script and move to the project directory:

curl -LO https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/shim.min.js
curl -LO https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.full.min.js

Download the following test scripts and files:

curl -LO https://docs.sheetjs.com/pres.numbers
curl -LO https://docs.sheetjs.com/pandas/sheetjs.py
curl -LO https://docs.sheetjs.com/pandas/SheetJSPandas.py

Edit the sheetjs.py script.

The lib variable declares the path to the library:

sheetjs.py (edit highlighted line)
lib = "libduktape.207.20700.so"

MacOS
Linux
Windows

The name of the library is libduktape.207.20700.so:

sheetjs.py (change highlighted line)
lib = "libduktape.207.20700.so"

The name of the library is libduktape.so.207.20700:

sheetjs.py (change highlighted line)
lib = "./libduktape.so.207.20700"

The name of the library is duktape.dll:

sheetjs.py (change highlighted line)
lib = ".\\duktape.dll"

In addition, the following changes must be made:

str_to_c must be defined as follows:

sheetjs.py (replace str_to_c function)
def str_to_c(s):
  b = s
  if type(b) == str: b = s.encode("latin1")
  return [c_char_p(b), len(b)]

eval_file must open with mode rb:

sheetjs.py (edit highlighted line)
def eval_file(ctx, path):
  with open(path, "rb") as f:
    code = f.read()

Run the script:

python3 SheetJSPandas.py pres.numbers

On Windows, Python may be available as python.exe:

python.exe SheetJSPandas.py pres.numbers

If successful, the script will display DataFrame metadata:

RangeIndex: 5 entries, 0 to 4
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Name    5 non-null      object
 1   Index   5 non-null      int64
dtypes: int64(1), object(1)

It will also export the DataFrame to SheetJSPandas.xlsb. The file can be inspected with a spreadsheet editor that supports XLSB files.

Other Libraries

Other Python DataFrame libraries mirror the Pandas DataFrame API.

Polars

Polars is a similar DataFrame library that offers many features from Pandas DataFrames.

Polars includes limited support for reading and writing spreadsheets by wrapping third-party libraries. In practice, Polars communicates with the third-party libraries using intermediate CSV files.¹²

SheetJS supports many common spreadsheet formats that Polars cannot process.

SheetJS operations also offer more flexibility in processing complex worksheets.

The Pandas example requires a few slight changes to work with Polars:

Polars DataFrames expose write_json instead of to_json:

-  json = df.to_json(orient="records")
+  json = df.write_json(row_oriented=True)

Polars DataFrames do not expose info

Polars Demo

Tested Environments

This demo was tested in the following deployments:

Architecture	JS Engine	Polars	Python	Date
`darwin-x64`	Duktape `2.7.0`	1.26.0	3.13.1	2025-03-31
`darwin-arm`	Duktape `2.7.0`	1.26.0	3.13.2	2025-03-30
`win11-x64`	Duktape `2.7.0`	1.28.1	3.11.9	2025-04-28
`win11-arm`	Duktape `2.7.0`	1.23.0	3.13.2	2025-02-23
`linux-x64`	Duktape `2.7.0`	1.30.0	3.12.3	2025-06-16
`linux-arm`	Duktape `2.7.0`	1.22.0	3.11.2	2025-02-16

Follow the Pandas "Complete Example" through the end.
Edit sheetjs.py.

Near the top of the script, change the import from pandas to polars:

sheetjs.py (edit highlighted line)
from io import StringIO
from polars import read_csv

duk = CDLL(lib)

Within the export_df_to_wb function, change the df.to_json line:

sheetjs.py (edit highlighted line)
def export_df_to_wb(ctx, df, path, sheet_name="Sheet1", book_type=None):
  json = df.write_json()

Polars made a breaking change in the 1.0 release.

For 0.20 and earlier, the row_oriented option must be passed:

sheetjs.py (changes for Polars 0.20)
def export_df_to_wb(ctx, df, path, sheet_name="Sheet1", book_type=None):
  json = df.write_json(row_oriented=True)

Edit SheetJSPandas.py.

In the process function, change df.info() to df:

SheetJSPandas.py (edit highlighted line)
    # Generate DataFrame from first worksheet
    df = wb.get_df()
    print(df)

Change the export filename from SheetJSPandas.xlsb to SheetJSPolars.xlsb:

SheetJSPandas.py (edit highlighted line)
    # Export DataFrame to XLSB
    sheetjs.write_df(df, "SheetJSPolars.xlsb", sheet_name="DataFrame")

Install Polars:

python3 -m pip install polars

On Windows, Python may be available as python.exe:

python.exe -m pip install polars

On macOS and Linux, the install command may require root access:

sudo python3 -m pip install polars

On Windows, the C++ Clang Compiler for Windows component must be installed through the Visual Studio installer.

In some local tests, the install failed with the following error:

error: externally-managed-environment

It is recommended to use a virtual environment.

venv must be installed through the system package manager:

Debian and Ubuntu distributions:

sudo apt-get install python3-venv

venv is included in the python package in Arch Linux-based platforms.
macOS systems with a Python version from Homebrew:

brew install pyenv-virtualenv

After installing venv, the following commands set up the virtual environment:

mkdir sheetjs-polars
cd sheetjs-polars
python3 -m venv .
./bin/pip install polars
cp ../libduktape.* ../SheetJSPandas.py ../sheetjs.py ../*.js ../*.numbers .

Run the script:

python3 SheetJSPandas.py pres.numbers

On Windows, Python may be available as python.exe:

python.exe SheetJSPandas.py pres.numbers

If the virtual environment was configured in the previous step, run:

./bin/python3 SheetJSPandas.py pres.numbers

If successful, the script will display DataFrame data:

shape: (5, 2)
┌──────────────┬───────┐
│ Name         ┆ Index │
│ ---          ┆ ---   │
│ str          ┆ i64   │
╞══════════════╪═══════╡
│ Bill Clinton ┆ 42    │
│ GeorgeW Bush ┆ 43    │
│ Barack Obama ┆ 44    │
│ Donald Trump ┆ 45    │
│ Joseph Biden ┆ 46    │
└──────────────┴───────┘

It will also export the DataFrame to SheetJSPolars.xlsb. The file can be inspected with a spreadsheet editor that supports XLSB files.

See "JavaScript Engines" for more examples. ↩
See ctypes in the Python documentation. ↩
See read in "Reading Files" ↩
See "Workbook Object" ↩
See sheet_to_csv in "Utilities" ↩
See the examples in "IO tools" in the Pandas documentation. ↩
See pandas.read_csv in the Pandas documentation. ↩
See pandas.DataFrame.to_json in the Pandas documentation. ↩
See json_to_sheet in "Utilities" ↩
See book_new in "Utilities" ↩
See write in "Writing Files" ↩
As explained in the Polars documentation, "... the target Excel sheet is first converted to CSV ... and then parsed with Polars’ read_csv() function." ↩

Integration Details​

JS in Python​

Wrapper​

Reading Files​

Writing Files​

Complete Example​

Other Libraries​

Polars​

Polars Demo​

Footnotes​

Integration Details

JS in Python

Wrapper

Reading Files

Writing Files

Complete Example

Other Libraries

Polars

Polars Demo

Footnotes