Skip to main content

Spreadsheet Processing in Mathematica

Mathematica is a software system for mathematics and scientific computing. It supports command-line tools and JavaScript extensions.

SheetJS is a JavaScript library for reading and writing data from spreadsheets.

This demo uses SheetJS to pull data from a spreadsheet for further analysis within Mathematica. We'll explore how to run an external tool to generate CSV data from opaque spreadsheets and parse the data from Mathematica.

Tested Deployments

This demo was tested by SheetJS users in the following deployments:

ArchitectureVersionDate
darwin-x6414.02024-06-05
win10-x6414.02024-06-05

Integration Details

The SheetJS NodeJS module can be loaded in NodeJS scripts, including scripts invoked using the "NodeJS" mode of the ExternalEvaluate1 Mathematica function.

However, the current cross-platform recommendation involves a dedicated command line tool that leverages SheetJS libraries to to perform spreadsheet processing.

External Engines

The following diagram depicts the workbook waltz:

Mathematica

NodeJS can be activated from Mathematica using RegisterExternalEvaluator2. Once activated, JavaScript code can be run using ExternalEvaluate3. If the NodeJS code returns CSV data, ImportString4 can generate a Dataset5.

SheetJS

For a file residing on the filesystem, the SheetJS readFile function6 can generate a workbook object. The exact location can be determined by printing require("process").cwd()7 in ExternalEvaluate:

In[1]:= ExternalEvaluate["NodeJS", "require('process').cwd()"]
Out[1]= "C:\Users\Me\Documents"

After pulling the first worksheet8, the SheetJS sheet_to_csv function9 generates a CSV string.

Complete Function

The following function reads a file, parses the first worksheet and returns a Dataset object assuming one header row.

SheetJSImportFileEE
(* Import file stored in the Documents folder (e.g. C:\Users\Me\Documents) *)
SheetJSImportFileEE[filename_]:=Module[{csv}, (
(* This was required in local testing *)
RegisterExternalEvaluator["NodeJS","/usr/local/bin/node"];

(* Generate CSV from first sheet *)
csv:=ExternalEvaluate["NodeJS", StringJoin[
(* module installed in home directory *)
"var XLSX = require('xlsx');",
(* read specified filename *)
"var wb = XLSX.readFile('",filename,"');",
(* grab first worksheet *)
"var ws = wb.Sheets[wb.SheetNames[0]];",
(* convert to CSV *)
"XLSX.utils.sheet_to_csv(ws)"
]];

(* Parse CSV into a dataset *)
Return[ImportString[csv, "Dataset", "HeaderLines"->1]];
)]

Command-Line Tools

The "Command-Line Tools" demo creates xlsx-cli, a command-line tool that reads a spreadsheet file and generates CSV rows from the first worksheet.

ExternalEvaluate10 can run command-line tools and capture standard output. The following snippet processes ~/Downloads/pres.numbers and pulls CSV data into a variable in Mathematica:

cmd = "/usr/local/bin/xlsx-cli ~/Downloads/pres.numbers"
csvdata = ExternalEvaluate["Shell" -> "StandardOutput", cmd];

ImportString11 can interpret the CSV data as a Dataset12. Typically the first row of the CSV output is the header row. The HeaderLines13 option controls how Mathematica parses the data:

data = ImportString[csvdata, "Dataset", "HeaderLines" -> 1]

The following diagram depicts the workbook waltz:

Complete Demo

This demo tests the NodeJS external engine and dedicated command line tools.

NodeJS Engine

  1. Install NodeJS. When the demo was tested, version 20.14.0 was installed.

  2. Install dependencies in the Home folder (~ or $HOME or %HOMEPATH%):

npm i --save https://cdn.sheetjs.com/xlsx-0.20.3/xlsx-0.20.3.tgz [email protected]
  1. Open a new Mathematica Notebook and register NodeJS. When the example was tested in Windows, the commands were:
RegisterExternalEvaluator["NodeJS","/usr/local/bin/node"]
FindExternalEvaluators["NodeJS"]

The second argument to RegisterExternalEvaluator should be the path to the node program, which can be found by running the following command in a new terminal window:

which node

If NodeJS is registered, the value in the "Registered" column will be "True".

  1. To determine the base folder, run require("process").cwd() from NodeJS:
ExternalEvaluate["NodeJS", "require('process').cwd()"]
  1. Download pres.numbers and move the file to the base folder as shown in the previous step.

  2. Copy, but do not run, the following snippet into the running notebook:

SheetJSImportFileEE
(* Import file stored in the Documents folder (e.g. C:\Users\Me\Documents) *)
SheetJSImportFileEE[filename_]:=Module[{csv}, (
(* This was required in local testing *)
RegisterExternalEvaluator["NodeJS","/usr/local/bin/node"];

(* Generate CSV from first sheet *)
csv:=ExternalEvaluate["NodeJS", StringJoin[
(* module installed in home directory *)
"var XLSX = require('xlsx');",
(* read specified filename *)
"var wb = XLSX.readFile('",filename,"');",
(* grab first worksheet *)
"var ws = wb.Sheets[wb.SheetNames[0]];",
(* convert to CSV *)
"XLSX.utils.sheet_to_csv(ws)"
]];

(* Parse CSV into a dataset *)
Return[ImportString[csv, "Dataset", "HeaderLines"->1]];
)]

After pasting, edit the highlighted line to reflect the path of the node or node.exe binary. This path was discovered in Step 2.

After editing the snippet, run the expression.

  1. Run the function and confirm the result is a proper Dataset:
SheetJSImportFileEE["pres.numbers"]

SheetJSImportFileEE result

Standalone Binary

  1. Create the standalone xlsx-cli binary14. The commands should be run in a Terminal or PowerShell window:
npm i --save https://cdn.sheetjs.com/xlsx-0.20.3/xlsx-0.20.3.tgz exit-on-epipe commander@2
curl -LO https://docs.sheetjs.com/cli/xlsx-cli.js
npx -y nexe -t 14.15.3 xlsx-cli.js
  1. Move the generated xlsx-cli to a fixed location in /usr/local/bin:
mkdir -p /usr/local/bin
mv xlsx-cli /usr/local/bin/

If there are permission errors, the command should be run with the root user:

sudo mv xlsx-cli /usr/local/bin/

Reading a Local File

  1. In a new Mathematica notebook, run the following snippet:
SheetJSImportFile
SheetJSImportFile[x_] := ImportString[Block[{Print}, ExternalEvaluate[
"Shell" -> "StandardOutput",
"/usr/local/bin/xlsx-cli " <> x
]], "Dataset", "HeaderLines" -> 1]
  1. Download https://docs.sheetjs.com/pres.numbers and save to Downloads folder:
cd ~/Downloads/
curl -LO https://docs.sheetjs.com/pres.numbers
  1. In the Mathematica notebook, run the new function. If the file was saved to the Downloads folder, the path will be "~/Downloads/pres.numbers" in macOS:
data = SheetJSImportFile["~/Downloads/pres.numbers"]

The result should be displayed in a concise table.

SheetJSImportFile result

Reading from a URL

FetchURL15 downloads a file from a specified URL and returns a path to the file. This function will be wrapped in a new function called SheetJSImportURL.

  1. In the same notebook, run the following:
SheetJSImportURL
Needs["Utilities`URLTools`"];
SheetJSImportURL[x_] := Module[{path},(
path = FetchURL[x];
SheetJSImportFile[path]
)];
  1. Test by downloading the test file in the notebook:
data = SheetJSImportURL["https://docs.sheetjs.com/pres.numbers"]

SheetJSImportURL result

Footnotes

  1. See the ExternalEvaluate Node.js example in the Mathematica documentation.

  2. See RegisterExternalEvaluator in the Mathematica documentation.

  3. See ExternalEvaluate in the Mathematica documentation.

  4. See ImportString in the Mathematica documentation.

  5. A Dataset will be created when using the "Dataset" element in ImportString

  6. See readFile in "Reading Files"

  7. See process.cwd() in the NodeJS documentation.

  8. The Sheets and SheetNames properties of workbook objects are described in "Workbook Object"

  9. See sheet_to_csv in "CSV and Text"

  10. See ExternalEvaluate in the Mathematica documentation.

  11. See ImportString in the Mathematica documentation.

  12. A Dataset will be created when using the "Dataset" element in ImportString

  13. See HeaderLines in the Mathematica documentation.

  14. See "Command-line Tools" for more details.

  15. Mathematica 11 introduced new methods including URLRead.