Bioinformatics and the Future of Biotechnology


“Bioinformatics” is both a set of tools and a field with blurred boundaries, making it difficult to list its chief accomplishments.  It is the practice of creating and/or using computational approaches to tackle biological problems.  Despite its nebulous nature—even the titles for practitioners fluctuate—and relative novelty, research using bioinformatics techniques has saved lives by identifying the causes of diseases, both infectious and genetic.


Fighting Infections with Bioinformatics

A mysterious outbreak at the US National Institutes of Health (NIH) in 2011 was in large part unraveled and subsequently contained using bioinformatics techniques [1, 2].  Klebsiella pneumoniae (K. pneumoniae), a deadly bacterium, had been wreaking havoc at the NIH’s Clinical Center (CC).  The infection was being passed from patient to patient, but nobody could figure out how, especially since strict sanitization protocols had already been enacted.  It created chaos in the ward until Dr. Julie Segre and Dr. Evan Snitkin got involved.

Dr. Snitkin is a Boston University-trained computational biologist working in Dr. Segre’s lab, where his research focuses on modeling networks: drawing meaningful connections between interacting nodes and making sense of complex systems of interactions.  He applied this technique to the outbreak at the Clinical Center, treating every patient in the ward as a potential interactor and modeling how the bacterium could have passed between them.  While K. pneumoniae had been studied before, this type of modeling revealed something new about its behaviour: the infection could lie dormant and undetected in otherwise healthy patients while still being transmitted.  Knowing this, the NIH CC tightened its sanitization practices and finally curbed the outbreak [3].
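To make the idea of a transmission network concrete, here is a minimal sketch in Python. It is not the model the Segre group used. The ward stays, the patient labels, and the rule that two patients are "in contact" when their stays in the same ward overlap are all assumptions invented for illustration. The sketch links patients who shared a ward at the same time and then asks which patients an index case could have reached through any chain of contacts.

```python
from collections import defaultdict, deque

# Hypothetical ward stays, invented for illustration:
# (patient, ward, first_day, last_day)
STAYS = [
    ("P1", "ICU", 1, 10),
    ("P2", "ICU", 8, 15),
    ("P3", "Ward-B", 12, 20),
    ("P2", "Ward-B", 16, 22),
    ("P4", "Ward-C", 1, 30),
]


def build_contact_graph(stays):
    """Link two patients if their stays in the same ward overlap in time."""
    graph = defaultdict(set)
    for i, (p1, ward1, start1, end1) in enumerate(stays):
        for p2, ward2, start2, end2 in stays[i + 1:]:
            same_ward = ward1 == ward2
            overlapping = start1 <= end2 and start2 <= end1
            if p1 != p2 and same_ward and overlapping:
                graph[p1].add(p2)
                graph[p2].add(p1)
    return graph


def reachable_from(graph, index_patient):
    """Breadth-first search: everyone connected to the index patient."""
    seen = {index_patient}
    queue = deque([index_patient])
    while queue:
        current = queue.popleft()
        for neighbor in graph[current]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


if __name__ == "__main__":
    contacts = build_contact_graph(STAYS)
    print(sorted(reachable_from(contacts, "P1")))
```

This toy version ignores the order in which contacts happened and treats every overlap as equally risky. A realistic model would weight each link and would have to allow for silent carriers, which is exactly the complication described above.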


Network modeling wasn’t the only bioinformatic technique the Segre group employed.  K. pneumoniae is most problematic when strains develop mutations that make them resistant to antibiotics. To identify the strains of the bacterium infecting each patient, the research team used metagenomic DNA sequencing.  Metagenomic studies apply DNA sequencing techniques to the mixture of bacteria in an environment to understand the makeup of that bacterial ecosystem.  These studies have shown that bacterial populations vary in their composition even between the two inner elbows of the same person [4].  In the case of the NIH CC, Segre and Snitkin showed that sequencing the bacteria present on different regions of multiple patients could be used to track an infection’s progress through a hospital population.  This tracking is applicable to multiple species of bacteria as well as to different strains of the same species.  Since mutations in the DNA of bacteria can occur spontaneously and persist in subsequent generations, these sequencing techniques can reveal which infection each patient has and when it was acquired, and can help decide which treatments to apply.  For their work on solving the mystery, Dr. Segre and Dr. Snitkin were named US Federal Employees of the Year along with their colleagues [5].
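Because mutations persist in later generations, isolates that differ by only a few bases are likely to be close relatives. The sketch below illustrates that reasoning and nothing more. The sequences are invented ten-base strings rather than real K. pneumoniae data, the function names are hypothetical, and the sequences are assumed to be already aligned to the same length. It counts the positions where two isolates differ and reports each patient's closest match.

```python
# Hypothetical, already-aligned sequences invented for illustration.
ISOLATES = {
    "patient_1": "ACGTACGTAC",
    "patient_2": "ACGTACGTAT",
    "patient_3": "ACGAACGTAT",
    "patient_4": "TCGTACCTAC",
}


def snp_distance(seq_a, seq_b):
    """Count the positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def closest_isolate(name, isolates):
    """Return the other isolate with the fewest differences from `name`."""
    others = [other for other in isolates if other != name]
    return min(others, key=lambda other: snp_distance(isolates[name], isolates[other]))


if __name__ == "__main__":
    for patient in ISOLATES:
        match = closest_isolate(patient, ISOLATES)
        distance = snp_distance(ISOLATES[patient], ISOLATES[match])
        print(f"{patient}: closest is {match} ({distance} differences)")
```

In this toy data set, patient_3 sits one mutation away from patient_2, which in turn sits one mutation away from patient_1, hinting at a chain of transmission. Real analyses work with whole genomes and combine the genetic distances with the kind of contact data described earlier.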


Understanding Genetic Diseases with Bioinformatics

Since the beginnings of the Human Genome Project, scientists have promised that knowledge of all of our genes would enable a deeper understanding of heritable diseases, a promise currently being fulfilled using bioinformatics approaches.  Millions of research dollars have been spent on finding protein-coding genes, our place in molecular evolution, and polymorphic variations (differences in DNA between members of a population) that are commonly found in patients with a disease but not in healthy individuals.  In such projects, the DNA of groups of patients and healthy participants is analysed for variations between their sequences.  After statistical analysis, changes in the DNA sequence that commonly appear in the patients but rarely in the healthy participants are associated with the disease of interest, and may be the causes of the disease.  These genome-wide association studies have discovered thousands of polymorphisms associated with dozens of diseases using sophisticated programming and statistics [6].  Note, though, that these are merely statistical associations with diseases, and that these research projects usually do not interrogate the whole genome. In addition, our understanding of the functions of these polymorphisms is still lacking.
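The statistical step can be illustrated for a single polymorphism. The sketch below is a simplified stand-in for what association studies do at a far larger scale, not the method of any study cited here. It assumes a plain 2x2 chi-square test on allele counts, every expected count being greater than zero, and allele counts that are invented for illustration. For one degree of freedom the p-value can be computed from the standard library's complementary error function.

```python
import math


def allele_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Returns (chi_square, p_value). Assumes every expected count is > 0.
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + control_alt, case_ref + control_ref]
    total = sum(row_totals)

    chi_square = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi_square += (table[i][j] - expected) ** 2 / expected

    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


if __name__ == "__main__":
    # Invented counts: the variant allele appears 60 times among 200 patient
    # alleles and 30 times among 200 control alleles.
    chi_square, p_value = allele_chi_square(60, 140, 30, 170)
    print(f"chi-square = {chi_square:.2f}, p = {p_value:.2g}")
```

With these invented counts the chi-square statistic is about 12.9, which would look convincing for a single test. A real study repeats the test at hundreds of thousands of positions, so a much stricter cutoff is needed; 5e-8 is a commonly used genome-wide threshold.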

While polymorphisms may be associated with diseases, high-throughput sequencing approaches coupled with bioinformatics tools are homing in on the exact genetic causes of such diseases, including atypical hemolytic-uremic syndrome [7], hereditary hypertension [8], and autism [9].  These studies usually begin by sequencing parts of or entire genomes and transcriptomes (the sum of all RNA transcripts) via whole-genome, RNA, or exome (the part of the genome formed by exons) sequencing. By comparing patient genomes with a reference genome, researchers can pinpoint base-pair differences between what is found in patients and what is expected. If the variations are in exons, their impact on the protein can be predicted. In other cases, the change in the genetic sequence may create binding sites for other proteins, causing unusual and harmful levels of transcription [10].  Using bioinformatics, researchers can create a list of the more likely “suspects” from the thousands of polymorphisms that differ between patients with a disease and healthy controls, making the problem of finding the functional genetic determinants of diseases more tractable.
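Comparing a patient sequence with a reference and predicting the effect on the protein can be sketched in a few lines. This is a toy illustration, not a real variant-calling pipeline. It assumes two already-aligned coding sequences of equal length that start at the first base of a codon, it handles single-base substitutions only, and the sequences and function name are invented. The codon table is the standard genetic code.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_substitutions(reference, patient):
    """List base differences and their predicted effect on the protein.

    Each result is (position, ref_base, patient_base, ref_aa, patient_aa, effect).
    Positions are 0-based. Simplification: any amino acid change that does not
    create a stop codon is labeled "missense".
    """
    if len(reference) != len(patient) or len(reference) % 3 != 0:
        raise ValueError("need aligned coding sequences of whole codons")

    results = []
    for pos, (ref_base, pat_base) in enumerate(zip(reference, patient)):
        if ref_base == pat_base:
            continue
        start = pos - pos % 3  # first base of the codon containing pos
        ref_aa = CODON_TABLE[reference[start:start + 3]]
        pat_aa = CODON_TABLE[patient[start:start + 3]]
        if ref_aa == pat_aa:
            effect = "synonymous"
        elif pat_aa == "*":
            effect = "nonsense"
        else:
            effect = "missense"
        results.append((pos, ref_base, pat_base, ref_aa, pat_aa, effect))
    return results


if __name__ == "__main__":
    reference = "ATGGAGCTGTAA"  # invented: Met-Glu-Leu-stop
    patient = "ATGGTGCTATAA"    # invented: two single-base changes
    for variant in classify_substitutions(reference, patient):
        print(variant)
```

Here the first change turns a glutamate codon into a valine codon, so it alters the protein, while the second leaves the leucine untouched. Sorting variants this way is one simple means of shortening a list of "suspects"; changes outside exons, such as newly created binding sites, need different tools.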


Bioinformatics and Brain Power

Because it rarely requires formal lab resources, bioinformatics can be practised anywhere by anyone with a computer.  One of the finest examples of citizen science, where the general public makes significant contributions to scientific understanding, is FoldIt.  This game attempts to improve understanding of the rules of protein folding.  At its core, it uses human brain power to figure out how to get from a series of amino acids to a three-dimensional protein.  Much formal research has gone into this problem, but the spatial rules these amino acid chains follow are still poorly understood.  By combining extant software solutions that predict these structures with crowdsourcing platforms to gain the statistical power to work through the problem, we can glean important new insights into the structures of pathogenic proteins.


The Tip of the Iceberg

Several applications of bioinformatics in research have been touched on here, but these are only a taste of its uses in many major research fields. Bioinformatics, or computational biology, has helped curb outbreaks, found genetic troublemakers, uncovered protein structures, and made many more contributions to the advancement of science.  As proverbial fire hydrants of data continue to gush from “Big Data” projects, bioinformatics may prove to be the most effective way to find meaningful droplets in this torrent.




[2] Snitkin et al., Science Translational Medicine, Aug 2012, Vol. 4, Issue 148, p. 148ra116.


[4] Grice et al., Science, May 2009, Vol. 324, no. 5931, pp. 1190-1192.



[7] Lemaire et al., Nature Genetics, May 2013, 45(5):531-6.

[8] Choi et al., Science, Feb 2011, Vol. 331, no. 6018, pp. 768-772.

[9] Sanders et al., Nature, Apr 2012, 485(7397):237-41.

[10] Huang et al., Science, Jan 2013, Vol. 339, no. 6122, pp. 957-959.


Brian is a post-doctoral associate in computational biology at the Whitehead Institute for Biomedical Research at MIT investigating transcriptional regulation and epigenomics. More of his work can be seen at or @snarkyscientist.

This article originally appeared here, republished under a Creative Commons license.

