Skip to main content

Import Tutorial

Many government agencies distribute official data and statistics in workbooks. SheetJS libraries help translate these files to useful information.

The goal of this example is to process Federal Student Aid Portfolio data from a XLS worksheet. We will download and parse a workbook from the US Department of Education. Once the raw data is parsed, we will extract the total outstanding dollar amount and display the data in a table.

The "Live Demo" section includes a working demo in this page! "Run the Demo Locally" shows how to run the workflow in iOS / Android apps, desktop apps, NodeJS scripts and other environments.

The following sequence diagram shows the process:

Download File

The raw data is available in a XLS workbook1. It has been mirrored at https://docs.sheetjs.com/PortfolioSummary.xls

This official dataset is distributed in XLS workbooks.

SheetJS supports a number of legacy and modern formats, ensuring that historical data is not lost in the sands of time.

Downloading the file is straightforward with fetch:

const url = "https://docs.sheetjs.com/PortfolioSummary.xls";
const file = await (await fetch(url)).arrayBuffer();
Code Explanation (click to show)

fetch is a low-level API for downloading data from an endpoint. It separates the network step from the response parsing step.

Network Step

fetch(url) returns a Promise representing the network request. The browser will attempt to download data from the URL. If the network request succeeded, the Promise will "return" with a Response object.

Using modern syntax, inside an async function, code should await the fetch:

const response = await fetch(url);

Checking Status Code

If the file is not available, the fetch will still succeed.

The status code, stored in the status property of the Response object, is a standard HTTP status code number. Code should check the result.

Typically servers will return status 404 "File not Found" if the file is not available. A successful request should have status 200 "OK".

Extracting Data

Response#arrayBuffer will pull the raw bytes into an ArrayBuffer, an object which can represent the file data. Like fetch, the arrayBuffer method returns a Promise that must be await-ed:

const file = await response.arrayBuffer();

The Response object has other useful methods. Response#json will parse the data with JSON.parse, suitable for data from an API endpoint.

Production Use

Functions can test each part independently and report different errors:

async function get_file_from_endpoint(url) {
/* perform network request */
let response;
try {
response = await fetch(url);
} catch(e) {
/* network error */
throw new Error(`Network Error: ${e.message}`);
}

/* check status code */
if(response.status == 404) {
/* server 404 error -- file not found */
throw new Error("File not found");
}
if(response.status != 200) {
/* for most servers, a successful response will have status 200 */
throw new Error(`Server status ${response.status}: ${response.statusText}`);
}

/* get data */
let ab;
try {
ab = await response.arrayBuffer();
} catch(e) {
/* data error */
throw new Error(`Data Error: ${e.message}`);
}

return ab;
}

The file data is stored in an ArrayBuffer.

Parse File

With the file data in hand, XLSX.read2 parses the workbook:

const workbook = XLSX.read(file);

The workbook object follows the "Common Spreadsheet Format"3, an in-memory format for representing workbooks, worksheets, cells, and spreadsheet features.

Explore Dataset

Spreadsheets in the wild use many different inconsistent conventions.

To determine how to process the data, it is best to inspect the file first.

List Sheet Names

As explained in the "Workbook Object"4 section, the SheetNames property is a ordered list of the sheet names in the workbook.

The following live code block displays an ordered list of the sheet names:

Result
Loading...
Live Editor
function SheetJSheetNames() {
  const [names, setNames] = React.useState([]);
  React.useEffect(() => { (async() =>{
    /* parse workbook */
    const url = "https://docs.sheetjs.com/PortfolioSummary.xls";
    const file = await (await fetch(url)).arrayBuffer();
    const workbook = XLSX.read(file);
    /* display sheet names */
    setNames(workbook.SheetNames);
  })(); }, []);
  return ( <>
    <b>Sheet Names</b><br/>
    <ol start={0}>{names.map(n => (<li>{n}</li>))}</ol>
  </> )
}

Inspect Worksheet Data

The Sheets property of the workbook object5 is an object whose keys are sheet names and whose values are sheet objects. For example, the first worksheet is pulled by indexing SheetNames and using the name to index Sheets:

var first_sheet = workbook.Sheets[workbook.SheetNames[0]];

The actual worksheet object can be inspected directly6, but it is strongly recommended to use utility functions to present JS-friendly data structures.

Preview HTML

The sheet_to_html utility function7 generates an HTML table from worksheet objects. The following live example shows the first 20 rows of data in a table:

Live example (click to show)

SheetJS CE primarily focuses on data processing.

SheetJS Pro supports reading cell styles from files and generating styled HTML tables with colors, fonts, alignment and rich text.

Result
Loading...
Live Editor
function SheetJSHTMLView() {
  const [__html, setHTML] = React.useState("");
  React.useEffect(() => { (async() =>{
    /* parse workbook, limiting to 20 rows */
    const url = "https://docs.sheetjs.com/PortfolioSummary.xls";
    const workbook = XLSX.read(await (await fetch(url)).arrayBuffer(), {sheetRows:20});
    /* get first worksheet */
    const worksheet = workbook.Sheets[workbook.SheetNames[0]];
    /* generate and display HTML */
    const table = XLSX.utils.sheet_to_html(worksheet);
    setHTML(table);
  })(); }, []);
  return ( <div dangerouslySetInnerHTML={{__html}}/> );
}

The key points from looking at the table are:

  • The data starts on row 7
  • Rows 5 and 6 are the header rows, with merged cells for common titles
  • For yearly data (2007-2012), columns A and B are merged
  • For quarterly data (2013Q1 and later), column A stores the year. Cells may be merged vertically to span 4 quarters

Extract Data

Extract Raw Data

XLSX.utils.sheet_to_json8 generates arrays of data from worksheet objects.

For a complex layout like this, it is easiest to generate an "array of arrays" where each row is an array of cell values. The screenshot shows rows 5-8:

Rows 5-8

In the array of arrays, row 5 has a number of gaps corresponding to empty cells and cells that are covered in the merge ranges:

// Row 5 -- the gaps correspond to cells with no content
[ , , "Direct Loans", , "Federal Family Education Loans (FFEL)", , "Perkins Loans", , "Total1" ]

Row 7 includes the data for FY2007:

// Row 7 -- column B is covered by the merge
[ 2007, , 106.8, 7, 401.9, 22.6, 8.2, 2.8, 516, 28.3 ]

XLSX.utils.sheet_to_json will generate an array of arrays if the option header: 1 is specified9:

const worksheet = workbook.Sheets[workbook.SheetNames[0]];
const raw_data = XLSX.utils.sheet_to_json(worksheet, {header: 1});

Fill Merged Blocks

Cells A13:A16 are merged:

Rows 13-16

The merged data only applies to the top-left cell (A13). The array of arrays will have holes in cells A14:A16 (written as null):

// Row 13
[2013, "Q1", 508.7, 23.4, 444.9, 22.1, 8.2, 3, 961.9, 38.7]

// Row 14
[null, "Q2", 553, 24.1, 437, 21.6, 8.3, 3, 998.6, 38.9]

// Row 15
[null, "Q3", 569.2, 24.3, 429.5, 21.2, 8.2, 2.9, 1006.8, 38.7]

// Row 16
[null, "Q4", 609.1, 25.6, 423, 20.9, 8.1, 2.9, 1040.2, 39.6]
Live example (click to show)
Result
Loading...
Live Editor
function SheetJSAoAHoles() {
  const [rows, setRows] = React.useState([]);
  React.useEffect(() => { (async() =>{
    /* parse workbook */
    const url = "https://docs.sheetjs.com/PortfolioSummary.xls";
    const workbook = XLSX.read(await (await fetch(url)).arrayBuffer());
    /* get first worksheet */
    const worksheet = workbook.Sheets[workbook.SheetNames[0]];
    const raw_data = XLSX.utils.sheet_to_json(worksheet, {header:1});

    /* pull Excel rows 13:16 (SheetJS 12:15) */
    const rows_13_16 = raw_data.slice(12,16);
    /* display data */
    setRows(rows_13_16);
  })(); }, []);
  return ( <pre>Rows 13:16{rows.map(r => "\n"+JSON.stringify(r))}</pre> );
}

The worksheet !merges property10 includes every merge range in the sheet. It is possible to loop through every merge block and fill cells, but in this case it is easier to post-process the raw data:

let last_year = 0;
raw_data.forEach(r => last_year = r[0] = (r[0] != null ? r[0] : last_year));

JavaScript code can be extremely concise. The "Code Explanation" blocks explain the code in more detail.

Code Explanation (click to show)

Analyzing every row in the dataset

Array#forEach takes a function and calls it for every element in the array. Any modifications to objects affect the objects in the original array.

For example, this loop will print out the first column in the arrays:

raw_data.forEach(r => {
console.log(r);
});

Tracking the last value seen in a column

When looping over the array, Array#forEach can modify variables outside of the function body. For example, the following loop keeps track of the last value:

let last_value = null;
raw_data.forEach(r => {
if(r[0] != null) last_value = r[0];
});

Filling in data

Array#forEach can mutate objects. The following code will assign the last value to the first column if it is not specified:

let last_value = null;
raw_data.forEach(r => {
if(r[0] != null) last_value = r[0];
else if(r[0] == null && last_value != null) r[0] = last_value;
});

Simplifying the code

When r[0] == null and last_value == null, assigning r[0] = last_value will not affect the result in the actual data rows:

let last_value = null;
raw_data.forEach(r => {
if(r[0] != null) last_value = r[0];
else if(r[0] == null) r[0] = last_value;
});

For simple data rows, either r[0] == null or r[0] != null, so the if block can be rewritten as a ternary expression:

let last_value = null;
raw_data.forEach(r => {
(r[0] != null) ? (last_value = r[0]) : (r[0] = last_value);
});

Observing that r[0] must equal last_value, the inner statement can be rewritten to compute the final value and assign to both variables:

let last_value = null;
raw_data.forEach(r => {
last_value = r[0] = (r[0] != null ? r[0] : last_value);
});

It is tempting to take advantage of implicit logical rules:

let last_value = null;
raw_data.forEach(r => {
last_value = r[0] = (r[0] || last_value);
});

This is strongly discouraged since the value 0 is false. The explicit null test distinguishes null and undefined from 0

After post-processing, the rows now have proper year fields:

// Row 13
[2013, "Q1", 508.7, 23.4, 444.9, 22.1, 8.2, 3, 961.9, 38.7]

// Row 14
[2013, "Q2", 553, 24.1, 437, 21.6, 8.3, 3, 998.6, 38.9]

// Row 15
[2013, "Q3", 569.2, 24.3, 429.5, 21.2, 8.2, 2.9, 1006.8, 38.7]

// Row 16
[2013, "Q4", 609.1, 25.6, 423, 20.9, 8.1, 2.9, 1040.2, 39.6]
Live example (click to show)
Result
Loading...
Live Editor
function SheetJSAoAFilled() {
  const [rows, setRows] = React.useState([]);
  React.useEffect(() => { (async() =>{
    /* parse workbook */
    const url = "https://docs.sheetjs.com/PortfolioSummary.xls";
    const workbook = XLSX.read(await (await fetch(url)).arrayBuffer());
    /* get first worksheet */
    const worksheet = workbook.Sheets[workbook.SheetNames[0]];
    const raw_data = XLSX.utils.sheet_to_json(worksheet, {header:1});
    /* fill years */
    var last_year = 0;
    raw_data.forEach(r => last_year = r[0] = (r[0] != null ? r[0] : last_year));

    /* pull Excel rows 13:16 (SheetJS 12:15) */
    const rows_13_16 = raw_data.slice(12,16);
    /* display data */
    setRows(rows_13_16);
  })(); }, []);
  return ( <pre>Rows 13:16{rows.map(r => "\n"+JSON.stringify(r))}</pre> );
}

Select Data Rows

At this point, each data row will have the year in column A and dollar value in column C. The year (first value in the row) will be between 2007 and 2024. The value (third value) will be positive. The following function tests a row against the requirements:

const is_valid_row = r =>
r[0] >= 2007 && r[0] <= 2024 // year (column A) is between 2007 and 2024
&& r[2] > 0; // dollar value (column C) is positive

Array#filter, using the previous test, can select the matching rows:

const rows = raw_data.filter(r => r[0] >= 2007 && r[0] <= 2024 && r[2] > 0);
Live example (click to show)
Result
Loading...
Live Editor
function SheetJSAoAFiltered() {
  const [rows, setRows] = React.useState([]);
  React.useEffect(() => { (async() =>{
    /* parse workbook */
    const url = "https://docs.sheetjs.com/PortfolioSummary.xls";
    const workbook = XLSX.read(await (await fetch(url)).arrayBuffer());
    /* get first worksheet */
    const worksheet = workbook.Sheets[workbook.SheetNames[0]];
    const raw_data = XLSX.utils.sheet_to_json(worksheet, {header:1});
    /* fill years */
    var last_year = 0;
    raw_data.forEach(r => last_year = r[0] = (r[0] != null ? r[0] : last_year));
    /* select data rows */
    const rows = raw_data.filter(r => r[0] >= 2007 && r[0] <= 2024 && r[2] > 0);
    /* display data */
    setRows(rows);
  })(); }, []);
  return ( <pre>{rows.map(r => JSON.stringify(r)+"\n")}</pre> );
}

Generate Row Objects

Looking at the headers:

Rows 5-8

The desired data is in column I. The column index can be calculated using XLSX.utils.decode_col11.

Column Index calculation (click to show)
Result
Loading...
Live Editor
function SheetJSDecodeCol() {
  const cols = ["A", "B", "I"];

  return ( <table><thead><tr><th>Label</th><th>Index</th></tr></thead>
    <tbody>{cols.map(col => ( <tr>
      <td>{col}</td>
      <td>{XLSX.utils.decode_col(col)}</td>
    </tr> ))}</tbody>
  </table> );
}

The desired columns are:

ColumnDescriptionProperty in Object
A / 0Fiscal YearFY
B / 1Fiscal Quarter (if applicable)FQ
I / 8Total Dollars Outstandingtotal

An Array#map over the data can generate the desired row objects:

const objects = rows.map(r => ({FY: r[0], FQ: r[1], total: r[8]}));

This will generate an array of row objects. Each row object will look like the following row:

// 2016 Q1 - $1220.3 (billion)
{ "FY": 2016, "FQ": "Q1", "total": 1220.3 }
Live example (click to show)
Result
Loading...
Live Editor
function SheetJSObjects() {
  const [rows, setRows] = React.useState([]);
  React.useEffect(() => { (async() =>{
    /* parse workbook */
    const url = "https://docs.sheetjs.com/PortfolioSummary.xls";
    const workbook = XLSX.read(await (await fetch(url)).arrayBuffer());
    /* get first worksheet */
    const worksheet = workbook.Sheets[workbook.SheetNames[0]];
    const raw_data = XLSX.utils.sheet_to_json(worksheet, {header:1});
    /* fill years */
    var last_year = 0;
    raw_data.forEach(r => last_year = r[0] = (r[0] != null ? r[0] : last_year));
    /* select data rows */
    const rows = raw_data.filter(r => r[0] >= 2007 && r[0] <= 2024 && r[2] > 0);
    /* generate row objects */
    const objects = rows.map(r => ({FY: r[0], FQ: r[1], total: r[8]}));
    /* display data */
    setRows(objects);
  })(); }, []);
  return ( <pre>{rows.map(r => JSON.stringify(r)+"\n")}</pre> );
}

Present Data

At this point, objects is an array of objects.

ReactJS

The live demos in this example use ReactJS. In ReactJS, arrays of objects are best presented in simple HTML tables12:

<table>
<thead><tr><th>Fiscal Year</th><th>Quarter</th><th>Total (in $B)</th></tr></thead>
<tbody>
{objects.map((o,R) => ( <tr key={R}>
<td>{o.FY}</td>
<td>{o.FQ}</td>
<td>{o.total}</td>
</tr>))}
</tbody>
</table>

Vanilla JS

https://sheetjs.com/sl.html is a hosted version of this demo.

Without a framework, HTML table row elements can be programmatically created with document.createElement and added to the table body element. For example, if the page has a stub table:

<table>
<thead><tr><th>Fiscal Year</th><th>Quarter</th><th>Total (in $B)</th></tr></thead>
<tbody id="tbody"></tbody>
</table>

TR elements can be added to the table body using appendChild:

/* add rows to table body */
objects.forEach(o => {
const row = document.createElement("TR");
row.innerHTML = `<td>${o.FY}</td><td>${o.FQ||""}</td><td>${o.total}</td>`;
tbody.appendChild(row);
});

Command-Line Tools

In the command line, there are ways to display data in a table:

 FY    FQ    Total
-- -- -----
2007 516
2013 Q1 961.9

For data pipelines, tab-separated rows are strongly recommended:

  /* print header row*/
console.log(`FY\tFQ\tTotal`);
/* print tab-separated values */
objects.forEach(o => {
console.log(`${o.FY}\t${o.FQ||""}\t${o.total}`);
});

Live Demo

This demo runs in the web browser! It should automatically fetch the data file and display a table.

This example includes a row count that can be increased or decreased

Result
Loading...
Live Editor
function StudentAidTotal() {
  const [rows, setRows] = React.useState([]);
  const [num, setNum] = React.useState(5);
  React.useEffect(() => { (async() =>{
    /* parse workbook */
    const url = "https://docs.sheetjs.com/PortfolioSummary.xls";
    const workbook = XLSX.read(await (await fetch(url)).arrayBuffer());

    /* get first worksheet */
    const worksheet = workbook.Sheets[workbook.SheetNames[0]];
    const raw_data = XLSX.utils.sheet_to_json(worksheet, {header:1});

    /* fill years */
    var last_year = 0;
    raw_data.forEach(r => last_year = r[0] = (r[0] != null ? r[0] : last_year));

    /* select data rows */
    const rows = raw_data.filter(r => r[0] >= 2007 && r[0] <= 2024 && r[2] > 0);

    /* generate row objects */
    const objects = rows.map(r => ({FY: r[0], FQ: r[1], total: r[8]}));

    /* display data */
    setRows(objects);
  })(); }, []);
  return ( <>
    <a onClick={()=>{setNum(Math.max(num-5,0))}}>Show Less </a>
    <b>Showing {num} rows </b>
    <a onClick={()=>{setNum(num+5)}}>Show More</a>
    <table>
      <thead><tr><th>Fiscal Year</th><th>Quarter</th><th>Total (in $B)</th></tr></thead>
      <tbody>
        {rows.slice(0, num).map((o,R) => ( <tr key={R}>
          <td>{o.FY}</td>
          <td>{o.FQ}</td>
          <td>{o.total}</td>
        </tr>))}
      </tbody>
    </table>
  </> );
}

Run the Demo Locally

Save the following script to SheetJSStandaloneDemo.html:

SheetJSStandaloneDemo.html
<body>
<table>
<thead><tr><th>Fiscal Year</th><th>Quarter</th><th>Total (in $B)</th></tr></thead>
<tbody id="tbody"></tbody>
</table>
<script src="https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.full.min.js"></script>
<script>
(async() => {
/* parse workbook */
const url = "https://docs.sheetjs.com/PortfolioSummary.xls";
const workbook = XLSX.read(await (await fetch(url)).arrayBuffer());

/* get first worksheet */
const worksheet = workbook.Sheets[workbook.SheetNames[0]];
const raw_data = XLSX.utils.sheet_to_json(worksheet, {header:1});

/* fill years */
var last_year = 0;
raw_data.forEach(r => last_year = r[0] = (r[0] != null ? r[0] : last_year));

/* select data rows */
const rows = raw_data.filter(r => r[0] >= 2007 && r[0] <= 2024 && r[2] > 0);

/* generate row objects */
const objects = rows.map(r => ({FY: r[0], FQ: r[1], total: r[8]}));

/* add rows to table body */
objects.forEach(o => {
const row = document.createElement("TR");
row.innerHTML = `<td>${o.FY}</td><td>${o.FQ||""}</td><td>${o.total}</td>`;
tbody.appendChild(row);
});
})();
</script>
</body>

After saving the file, run a local web server in the folder with the HTML file. For example, if NodeJS is installed:

npx http-server .

The server process will display a URL (typically http://127.0.0.1:8080). Open http://127.0.0.1:8080/SheetJSStandaloneDemo.html in your browser.

Footnotes

  1. The dataset URL has changed many times over the years. The current location for the CC0-licensed dataset can be found by searching for "National Student Loan Data System" on data.gov. PortfolioSummary.xls is the file name within the dataset.

  2. See read in "Reading Files"

  3. See "SheetJS Data Model"

  4. See "Workbook Object"

  5. See "Workbook Object"

  6. See "Sheet Objects"

  7. See sheet_to_html in "Utilities"

  8. See sheet_to_json in "Utilities"

  9. See sheet_to_json in "Utilities"

  10. See "Merged Cells" in "SheetJS Data Model"

  11. See "Column Names" in "Addresses and Ranges"

  12. See "Array of Objects" in "ReactJS"

  13. See "Running on Device" in the React Native documentation for more details.