Some Notes on CSV Parsing and Google Spreadsheets in Node.js

April 2nd, 2013 Permalink

I was recently working on a short Node.js project that involved CSV parsing and automation of access to Google Spreadsheets. This post consists of a few notes on what was learned - not all of the tools in the ecosystem are all that they claim to be, sadly.

Miso.Dataset Google Spreadsheet Importers Don't Work

Miso.Dataset is a toolkit for manipulating tabular data. Insofar as the actual manipulation goes, it's pretty useful and can save you a lot of time. The Node.js distribution is a port of a client distribution, however, which means that whether or not non-core functionality actually works in Node.js is a coin-toss. The Google Spreadsheet importer is one such non-functional item; it works by calling one of the JSONP/XML format views of a spreadsheet and parsing that, but the actual HTTP requests are performed by lower level functions that will only work in a browser.

Google Spreadsheet JSONP Views are Buggy

It's possible to bypass the Google Spreadsheet API to obtain JSONP for a single worksheet with one request wherein the URL looks something like this:

https://spreadsheets.google.com/tq?key=%id&sheet=%name&tqx=version:0.6;responseHandler:%fn;reqId:0;out:json

This might seem tempting and easy, since the data is much more compact than the XML feed version. Unfortunately it is very easy to create a spreadsheet that returns invalid, broken JSON via this method. In addition, the data provided here uses a very aggressive algorithm to try to ensure that all columns have a header name; it merges rows until it finds non-empty values in all columns, and then gets stuck that way. Once it has started down the path of merging rows to create a header row, you can't stop it from doing that - the spreadsheet data interpretation is stuck that way, no matter how you edit it further.

Use a Google Spreadsheet Package that Works With the XML Feed

There are a number of small, simple Google Spreadsheet packages available via NPM, such as this one that incorporates authentication for non-public spreadsheets. It reads from the XML feed for a spreadsheet, which is less prone to issues.

Miso.Dataset Parsers Are Fragile

Getting data into Miso.Dataset is a two step process - there is an importer and then a parser. Any data import other than just feeding it an array of objects is risky, however, as the parsers tend to throw errors very readily: if given duplicate header row names, or empty header rows because one row happens to run long, for example. Since obtaining a working dataset from imported data is an asynchronous operation, you can't catch these errors.

// If the CSV-format data has duplicate or empty header rows, then
// an error will be thrown here, and there isn't a great deal you can do
// about that.
var ds = new Miso.Dataset({
  parser: Miso.Dataset.Parsers.Delimited,
  data: csvString,
  delimiter: ","
});
ds.fetch({
  error: function (error) {},
  success: function () {}
});

So use some other, more robust package to transform your CSV data into arrays of objects and then give that to Miso.Dataset:

// Passing in tabular data as an array of objects is much safer and
// more robust.
var ds = new Miso.Dataset({
  data: [
    {x: "y"}, {a: "b"} // etc ...
  ]
});
ds.fetch({
  error: function (error) {},
  success: function () {}
});

Use the Node CSV Package

You will save yourself a great deal of pain and suffering by using Node CSV to parse in raw CSV and transform it into a format that can be fed to Miso.Dataset. For example:

var csv = require("csv");
var parser = csv();
var records = [];
parser.on("error", function (error) {
  // Do something.
});
parser.on("record", function (row, index) {
  records.push(row);
});
parser.on("end", function () {
  var ds = new Miso.Dataset({
    data: records
  });
  ds.fetch({
    error: function (error) {},
    success: function () {}
  });
});
// Tell the parser to return rows as objects keyed by column names.
parser.from.options({
  columns: true
});
// Start parsing. This package will not throw errors for duplicate header rows
// or other similar issues that will stop Miso.Dataset in its tracks.
parser.from(csvString);