Skip to main content

Synthetic DOM

table_to_book / table_to_sheet / sheet_add_dom act on HTML DOM elements. Traditionally there is no DOM in server-side environments.

note

The most robust approach for server-side processing is to automate a headless web browser. "Browser Automation" includes demos.

This demo covers synthetic DOM implementations for non-browser platforms.

NodeJS

JSDOM

JSDOM is a DOM implementation for NodeJS. Given an HTML string, a reference to the table element plays nice with the SheetJS DOM methods:

const XLSX = require("xlsx");
const { JSDOM } = require("jsdom");

/* parse HTML */
const dom = new JSDOM(html_string);
/* get first TABLE element */
const tbl = dom.window.document.querySelector("table");
/* generate workbook */
const workbook = XLSX.utils.table_to_book(tbl);
XLSX.writeFile(workbook, "SheetJSDOM.xlsx");
Complete Demo (click to show)
note

This demo was last tested on 2023 May 18 against JSDOM 22.0.0

1) Install SheetJS and JSDOM libraries:

npm i --save https://cdn.sheetjs.com/xlsx-0.19.3/xlsx-0.19.3.tgz [email protected]

2) Save the following script to SheetJSDOM.js:

SheetJSDOM.js
const XLSX = require("xlsx");
const { readFileSync } = require("fs");
const { JSDOM } = require("jsdom");

/* obtain HTML string. This example reads from SheetJSTable.html */
const html_str = readFileSync("SheetJSTable.html", "utf8");
/* get first TABLE element */
const doc = new JSDOM(html_str).window.document.querySelector("table");
/* generate workbook */
const workbook = XLSX.utils.table_to_book(doc);
XLSX.writeFile(workbook, "SheetJSDOM.xlsx");

3) Download the sample SheetJSTable.html:

curl -LO https://docs.sheetjs.com/dom/SheetJSTable.html

4) Run the script:

node SheetJSDOM.js

The script will create a file SheetJSDOM.xlsx that can be opened.

XMLDOM

XMLDOM provides a DOM framework for NodeJS. Given an HTML string, a reference to the table element works with the SheetJS DOM methods after patching the object.

Complete Demo (click to show)
note

This demo was last tested on 2023 May 18 against XMLDOM 0.8.7

1) Install SheetJS and XMLDOM libraries:

npm i --save https://cdn.sheetjs.com/xlsx-0.19.3/xlsx-0.19.3.tgz @xmldom/[email protected]

2) Save the following codeblock to SheetJSXMLDOM.js:

SheetJSXMLDOM.js
const XLSX = require("xlsx");
const { DOMParser, XMLSerializer } = require("@xmldom/xmldom");

(async() => {
const text = await (await fetch('https://docs.sheetjs.com/dom/SheetJSTable.html')).text();
const doc = new DOMParser().parseFromString( text, "text/html");
const tbl = doc.getElementsByTagName("table")[0];

/* patch XMLDOM */
tbl.rows = Array.from(tbl.getElementsByTagName("tr"));
tbl.rows.forEach(row => row.cells = Array.from(row.getElementsByTagName("td")))
Object.defineProperty(tbl.__proto__, "innerHTML", { get: function() {
var outerHTML = new XMLSerializer().serializeToString(this);
if(outerHTML.match(/</g).length == 1) return "";
return outerHTML.slice(0, outerHTML.lastIndexOf("</")).replace(/<[^"'>]*(("[^"]*"|'[^']*')[^"'>]*)*>/, "");
}});

const workbook = XLSX.utils.table_to_book(tbl);
XLSX.writeFile(workbook, "SheetJSXMLDOM.xlsx");
})();

3) Run the script:

node SheetJSXMLDOM.js

The script will create a file SheetJSXMLDOM.xlsx that can be opened.

CheerioJS

caution

Cheerio does not support a number of fundamental properties out of the box. They can be shimmed, but it is strongly recommended to use a more compliant library.

CheerioJS provides a DOM-like framework for NodeJS. Given an HTML string, a reference to the table element works with the SheetJS DOM methods with some prototype fixes. SheetJSCheerio.js is a complete script.

Complete Demo (click to show)
note

This demo was last tested on 2023 May 18 against Cheerio 1.0.0-rc.12

1) Install SheetJS and CheerioJS libraries:

npm i --save https://cdn.sheetjs.com/xlsx-0.19.3/xlsx-0.19.3.tgz [email protected]

2) Download the sample script SheetJSCheerio.js:

curl -LO https://docs.sheetjs.com/dom/SheetJSCheerio.js

3) Download the sample SheetJSTable.html:

curl -LO https://docs.sheetjs.com/dom/SheetJSTable.html

4) Run the script:

node SheetJSCheerio.js

The script will create a file SheetJSCheerio.xlsx that can be opened.

Other Platforms

DenoDOM

DenoDOM provides a DOM framework for Deno. Given an HTML string, a reference to the table element works with the SheetJS DOM methods after patching the object.

This example fetches a sample table:

SheetJSDenoDOM.ts
// @deno-types="https://cdn.sheetjs.com/xlsx-0.19.3/package/types/index.d.ts"
import * as XLSX from 'https://cdn.sheetjs.com/xlsx-0.19.3/package/xlsx.mjs';

import { DOMParser } from 'https://deno.land/x/[email protected]/deno-dom-wasm.ts';

const doc = new DOMParser().parseFromString(
await (await fetch('https://docs.sheetjs.com/dom/SheetJSTable.html')).text(),
"text/html",
)!;
const tbl = doc.querySelector("table");

/* patch DenoDOM element */
tbl.rows = tbl.querySelectorAll("tr");
tbl.rows.forEach(row => row.cells = row.querySelectorAll("td, th"))

/* generate workbook */
const workbook = XLSX.utils.table_to_book(tbl);
XLSX.writeFile(workbook, "SheetJSDenoDOM.xlsx");
Complete Demo (click to hide)
note

This demo was last tested on 2023 May 18 against DenoDOM 0.1.38

1) Save the previous codeblock to SheetJSDenoDOM.ts.

2) Run the script with --allow-net and --allow-write entitlements:

deno run --allow-net --allow-write SheetJSDenoDOM.ts

The script will create a file SheetJSDenoDOM.xlsx that can be opened.