table of contents

Getting started

TL; DR

Try this on the terminal on the right

bionode ncbi search nucleotide cancer | head -n 1 | json

bionode ncbi search nucleotide cancer | head -n 1 | json uid | bionode ncbi fetch nuccore - | json

Prerequisites

We assume that you have some familiarity with a Command Line Interface (e.g., BASH). If that is not the case, we recommend doing the command_line_bootcamp. At a minimum, you need to know how to use the commands ls, cd, mkdir and touch.

Knowledge of JavaScript and Node.JS is not required but can be very helpful for some sections. A good resource is NodeSchool, and we recommend the sections javascripting, learnyounode, how-to-npm, stream-adventure, async-you, browserify-adventure, and unctional-javascript-workshop (in this order). If you want a good and free JavaScript beginners book, check out JavaScript for Cats.

If you get interested in Bioinformatics and want to learn more, there are plenty of resources and MOOCs out there. However, Bioinformatics Data Skills is a good beginners book.

Do it online (this workshop)

You can test drive Bionode online without installing anything using the try.bionode.io website.

Install it on your machine (alternative)

If you want to run it locally on your machine, the fist step is to get Node.JS. There are several ways to do it, but we recommend the following:

Mac OS

Install Homebrew by copy pasting the following command in your terminal.

/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

Then install a Node.JS version manager

brew install n

Then install the latest stable version of Node or a specific one.

n stable
# Or
n 7.0.0

Ubuntu

Run the following commands

# Install the Node Package Manager
sudo apt-get install npm
# Install a Node version manager
npm install n -g
# Install Node
n stable
## Or for a specific version
n 7.0.0

Windows

Go to http://nodejs.org, and follow instructions.

Install Bionode and other useful tools

Bionode provides a meta-module named bionode that can install all the other modules as dependencies. If you only need a specific module, you just install that one, e.g., bionode-ncbi. Tip: In this tutorial, in the interest of speed, saving computational resources, and avoid issues with some versions of Node.JS, we use --production to skip installing development dependencies.

Installs bionode 'globally', i.e., as a Command Line tool (using -g).

npm install bionode -g --production

You can also install a specific module instead of all

npm install bionode-ncbi -g --production

Install some other useful tools

npm install json tool-stream -g --production

Available modules

After you're setup you can have a quick look at the available modules on GitHub and jump to the section about that module, or keep reading

How things work in general

Command Line Interface

Check the documentation and status for each module in the README.md file on their GitHub page (e.g., [bionode-ncbi] (https://github.com/bionode/bionode-ncbi)), but in general you can use the command line interface like this:

bionode ncbi urls assembly Acromyrmex | json -ga genomic.fna

That command queries the NCBI database and retrieves URLs of the genome assembly for the ant species Acromyrmex. This will return a JSON object that is then piped to the json command so that we can retrieve only the property genomic.fna (the url of the file with DNA sequences in fna/fasta format) and filter out the other properties.

JavaScript API

Now the same could be done using the JavaScript API, but first you need to create a folder for your project and then for each module your are going to require in your code, you need to do npm install module_name (without the -g flag) to install a copy of that module locally in your project folder. You only use the -g flag when you want to install a module as a command line tool.

#!/bin/bash
npm install bionode-ncbi
#!/usr/bin/env node
var bio = require('bionode')

Bionode code patterns

You can generally use bionode modules in 3 different ways:

The Callback pattern

A callback simply means, you ask for something and once you get all of it you process it

// Query NCBI
bio.ncbi.urls('assembly', 'Acromyrmex', function(urls) {
  # Got all the urls as an array, print just first genome
  console.log(urls[0].genomic.fna)
})

The Event pattern

Callbacks are fine for most cases, but if you're getting too much data your code will run out memory and crash. A solutions is to use Events to do something as you get one object or chunks of data.

bio.ncbi.urls('assembly', 'Acromyrmex').on('data', printGenomeURL)
function printGenomeURL(url) {
  console.log(url.genomic.fna)
}

The Pipe pattern

Node.js Streams are based on Events and allow you to get rid of a lot of boilerplate code by chaining functions together.

var tool = require('tool-stream')
bio.ncbi.urls('assembly', 'Acromyrmex')
.pipe(tool.extractProperty('genomic.fna'))
.pipe(process.stdout)

How is it done in other libraries?

Here's an example of how you would do the same in BioPython (other libs are similar):

# URL for the Acromyrmex assembly?
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000188075.1_Si_gnG
import xml.etree.ElementTree as ET
from Bio import Entrez
Entrez.email = "[email protected]"
esearch_handle = Entrez.esearch(db="assembly", term="Acromyrmex")
esearch_record = Entrez.read(esearch_handle)
for id in esearch_record['IdList']:
  esummary_handle = Entrez.esummary(db="assembly", id=id)
  esummary_record = Entrez.read(esummary_handle)
  documentSummarySet = esummary_record['DocumentSummarySet']
  document = documentSummarySet['DocumentSummary'][0]
  metadata_XML = document['Meta'].encode('utf-8')
  metadata = ET.fromstring('' + metadata_XML + '')
  for entry in Metadata[1]:
    print entry.text

More Node.js tips

If you git clone a Node.js folder, to install its dependencies you can just cd into it and type npm install. If you want to install that module that you just git cloned as a command line tool, you cd into the folder and do npm link (useful for development).