Synthetic DOM
SheetJS is a JavaScript library for reading and writing data from spreadsheets.
SheetJS offers three methods to directly process HTML DOM TABLE elements:
table_to_sheet
1 generates a SheetJS worksheet2 from a TABLE elementtable_to_book
3 generates a SheetJS workbook4 from a TABLE elementsheet_add_dom
5 adds data from a TABLE element to an existing worksheet
These methods work in the web browser. NodeJS and other server-side platforms traditionally lack a DOM implementation, but third-party modules fill the gap.
This demo covers synthetic DOM implementations for non-browser platforms. We'll explore how to use SheetJS DOM methods in server-side environments to parse tables and export data to spreadsheets.
The most robust approach for server-side processing is to automate a headless web browser. "Browser Automation" includes demos.
Integration Details
Synthetic DOM implementations typically provide a function that accept a HTML
string and return an object that represents document
. An API method such as
getElementsByTagName
or querySelector
can pull TABLE elements.
SheetJS methods use features that may be missing from some DOM implementations.
Table rows
The rows
property of TABLE elements is a list of TR row children. This list
automatically updates when rows are added and deleted.
SheetJS methods do not mutate rows
. Assuming there are no nested tables, the
rows
property can be created using getElementsByTagName
:
tbl.rows = Array.from(tbl.getElementsByTagName("tr"));
Row cells
The cells
property of TR elements is a list of TD cell children. This list
automatically updates when cells are added and deleted.
SheetJS methods do not mutate cells
. Assuming there are no nested tables, the
cells
property can be created using getElementsByTagName
:
tbl.rows.forEach(row => row.cells = Array.from(row.getElementsByTagName("td")));
NodeJS
JSDOM
JSDOM is a DOM implementation for NodeJS. The synthetic DOM elements are compatible with SheetJS methods.
The following example scrapes the first table from the file SheetJSTable.html
and generates a XLSX workbook:
const XLSX = require("xlsx");
const { readFileSync } = require("fs");
const { JSDOM } = require("jsdom");
/* obtain HTML string. This example reads from SheetJSTable.html */
const html_str = readFileSync("SheetJSTable.html", "utf8");
/* get first TABLE element */
const doc = new JSDOM(html_str).window.document.querySelector("table");
/* generate workbook */
const workbook = XLSX.utils.table_to_book(doc);
XLSX.writeFile(workbook, "SheetJSDOM.xlsx");
This demo was tested in the following deployments:
JSDOM | Date |
---|---|
25.0.1 | 2024-10-30 |
24.1.3 | 2024-10-30 |
23.2.0 | 2024-10-30 |
22.1.0 | 2024-10-30 |
21.1.2 | 2024-10-30 |
20.0.3 | 2024-10-30 |
19.0.0 | 2024-10-30 |
18.1.1 | 2024-10-30 |
17.0.0 | 2024-10-30 |
16.7.0 | 2024-10-30 |
15.2.1 | 2024-10-30 |
14.1.0 | 2024-10-30 |
13.2.0 | 2024-10-30 |
12.2.0 | 2024-10-30 |
11.12.0 | 2024-10-30 |
10.1.0 | 2024-10-30 |
Complete Demo (click to show)
- Install SheetJS and JSDOM libraries:
npm i --save https://cdn.sheetjs.com/xlsx-0.20.3/xlsx-0.20.3.tgz [email protected]
-
Save the previous codeblock to
SheetJSDOM.js
. -
Download the sample
SheetJSTable.html
:
curl -LO https://docs.sheetjs.com/dom/SheetJSTable.html
- Run the script:
node SheetJSDOM.js
The script will create a file SheetJSDOM.xlsx
that can be opened.
HappyDOM
HappyDOM provides a DOM framework for NodeJS. Older versions required the following patches:
- TABLE
rows
property (explained above) - TR
cells
property (explained above)
HappyDOM 15.7.4
did not require any workarounds.
This demo was tested in the following deployments:
HappyDOM | Date |
---|---|
15.7.4 | 2024-10-30 |
14.12.3 | 2024-10-30 |
13.10.1 | 2024-10-30 |
12.10.3 | 2024-10-30 |
11.2.0 | 2024-10-30 |
10.11.2 | 2024-10-30 |
9.20.3 | 2024-10-30 |
8.9.0 | 2024-10-30 |
7.8.1 | 2024-10-30 |
6.0.4 | 2024-10-30 |
5.4.0 | 2024-10-30 |
4.1.0 | 2024-10-30 |
3.2.2 | 2024-10-30 |
2.55.0 | 2024-10-30 |
Complete Demo (click to show)
- Install SheetJS and HappyDOM libraries:
npm i --save https://cdn.sheetjs.com/xlsx-0.20.3/xlsx-0.20.3.tgz [email protected]
- Download the sample script
SheetJSHappyDOM.js
:
curl -LO https://docs.sheetjs.com/dom/SheetJSHappyDOM.js
- Download the sample
SheetJSTable.html
:
curl -LO https://docs.sheetjs.com/dom/SheetJSTable.html
- Run the script:
node SheetJSHappyDOM.js
The script will create a file SheetJSHappyDOM.xlsx
that can be opened.
XMLDOM
XMLDOM provides a DOM framework for NodeJS. For the
tested version (0.8.10
), the following patches were needed:
- TABLE
rows
property (explained above) - TR
cells
property (explained above) - Element
innerHTML
property:
Object.defineProperty(tbl.__proto__, "innerHTML", { get: function() {
var outerHTML = new XMLSerializer().serializeToString(this);
if(outerHTML.match(/</g).length == 1) return "";
return outerHTML.slice(0, outerHTML.lastIndexOf("</")).replace(/<[^"'>]*(("[^"]*"|'[^']*')[^"'>]*)*>/, "");
}});
This demo was tested in the following deployments:
XMLDOM | Date |
---|---|
0.9.5 | 2024-10-30 |
0.8.10 | 2024-10-30 |
Complete Demo (click to show)
- Install SheetJS and XMLDOM libraries:
npm i --save https://cdn.sheetjs.com/xlsx-0.20.3/xlsx-0.20.3.tgz @xmldom/[email protected]
- Download the sample script
SheetJSXMLDOM.js
:
curl -LO https://docs.sheetjs.com/dom/SheetJSXMLDOM.js
- Run the script:
node SheetJSXMLDOM.js
The script will create a file SheetJSXMLDOM.xlsx
that can be opened.
CheerioJS
Cheerio does not support a number of fundamental properties out of the box. They can be shimmed, but it is strongly recommended to use a more compliant library.
CheerioJS provides a DOM-like framework for NodeJS.
SheetJSCheerio.js
implements the missing
features to ensure that SheetJS DOM methods can process TABLE elements.
This demo was tested in the following deployments:
CheerioJS | Date |
---|---|
1.0.0 | 2024-10-30 |
1.0.0-rc.12 | 2024-10-30 |
Complete Demo (click to show)
- Install SheetJS and CheerioJS libraries:
npm i --save https://cdn.sheetjs.com/xlsx-0.20.3/xlsx-0.20.3.tgz [email protected]
- Download the sample script
SheetJSCheerio.js
:
curl -LO https://docs.sheetjs.com/dom/SheetJSCheerio.js
- Download the sample
SheetJSTable.html
:
curl -LO https://docs.sheetjs.com/dom/SheetJSTable.html
- Run the script:
node SheetJSCheerio.js
The script will create a file SheetJSCheerio.xlsx
that can be opened.
Other Platforms
DenoDOM
DenoDOM provides a DOM framework for Deno. For
the tested version (0.1.48
), the following patches were needed:
- TABLE
rows
property (explained above) - TR
cells
property (explained above)
This example fetches a sample table:
// @deno-types="https://cdn.sheetjs.com/xlsx-0.20.3/package/types/index.d.ts"
import * as XLSX from 'https://cdn.sheetjs.com/xlsx-0.20.3/package/xlsx.mjs';
import { DOMParser } from 'https://deno.land/x/[email protected]/deno-dom-wasm.ts';
const doc = new DOMParser().parseFromString(
await (await fetch('https://docs.sheetjs.com/dom/SheetJSTable.html')).text(),
"text/html",
)!;
const tbl = doc.querySelector("table");
/* patch DenoDOM element */
tbl.rows = tbl.querySelectorAll("tr");
tbl.rows.forEach(row => row.cells = row.querySelectorAll("td, th"))
/* generate workbook */
const workbook = XLSX.utils.table_to_book(tbl);
XLSX.writeFile(workbook, "SheetJSDenoDOM.xlsx");
This demo was tested in the following deployments:
Architecture | DenoDOM | Deno | Date |
---|---|---|---|
darwin-x64 | 0.1.48 | 2.0.4 | 2024-10-30 |
darwin-arm | 0.1.48 | 2.0.4 | 2024-10-30 |
win11-x64 | 0.1.48 | 2.0.4 | 2024-10-30 |
win11-arm | 0.1.48 | 2.0.4 | 2024-10-30 |
linux-x64 | 0.1.48 | 2.0.4 | 2024-10-30 |
linux-arm | 0.1.48 | 2.0.4 | 2024-10-30 |
Complete Demo (click to show)
-
Save the previous codeblock to
SheetJSDenoDOM.ts
. -
Run the script with
--allow-net
and--allow-write
entitlements:
deno run --allow-net --allow-write SheetJSDenoDOM.ts
The script will create a file SheetJSDenoDOM.xlsx
that can be opened.
Deno 2 additionally requires --allow-import
:
deno run --allow-net --allow-write --allow-import SheetJSDenoDOM.ts
Footnotes
-
See "Worksheet Object" in "SheetJS Data Model" for more details. ↩
-
See "Workbook Object" in "SheetJS Data Model" for more details. ↩