Ben Haley: Sequence me Purple

A lecture about sequencing, non-coding DNA and potentials for bio-discoveries.

At UCSD on April 4th, Edward R. Rubin gave a very nice synopsis explaining why sequence-based discoveries are so promising in biology. It boils down to this: sequencing DNA is getting cheaper - exponentially cheaper - while our ability to carry out in vivo (live animal) studies is improving slowly - painfully slowly. In vivo work is slow and expensive, ultimately, because there is a limit to how fast a mouse is going to grow. So Eddy emphasized that there is a very real incentive to develop sequence-based techniques to make important discoveries, because these techniques will inherently scale into the future... and with that profundity expressed, I'll move on to what Eddy has done to practice what he preaches.

The first part of Eddy's lecture circled around funny little non-coding regions of the genome where scientists have found 200-base segments that are identical in humans, mice, and rats. These regions are known as 'ultraconserved' regions (or, more broadly, 'ultra-like' sequences), and the odds of observing them by chance alone are astronomically small. This conservation implies that even a slight modification to these non-coding regions will result in an evolutionary dead end for an organism.
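To make the idea concrete, here is a minimal sketch of how one might scan for stretches that are identical across several species. It assumes the sequences have already been aligned position by position (with '-' marking alignment gaps), and the function name, the toy sequences, and the `min_len` parameter are my own inventions for illustration, not anything from Eddy's talk.

```python
def conserved_runs(aligned, min_len=200):
    """Return (start, end) spans where every aligned sequence is identical.

    `aligned` is a list of equal-length strings; '-' marks an alignment gap.
    Spans are half-open: positions start .. end-1 are all identical.
    """
    length = min(len(s) for s in aligned)
    runs, start = [], None
    for i in range(length):
        column = {s[i] for s in aligned}
        same = len(column) == 1 and "-" not in column
        if same and start is None:
            start = i                      # a run of identical columns begins
        elif not same and start is not None:
            if i - start >= min_len:       # keep the run only if long enough
                runs.append((start, i))
            start = None
    if start is not None and length - start >= min_len:
        runs.append((start, length))       # run that reaches the end
    return runs


# Toy example with made-up sequences and a tiny threshold.
human = "ACGTACGTTTGACCA"
mouse = "ACGTACGTTTGTCCA"
rat = "ACCTACGTTTGACCA"
print(conserved_runs([human, mouse, rat], min_len=5))  # [(3, 11)]
```

With real genomes the threshold would be the 200 bases mentioned above, and the hard part (aligning the genomes in the first place) is exactly what this sketch skips.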

As Eddy rightly pointed out, there is little doubt that non-coding regions of the genome perform biologically critical functions, but it's very difficult to know what these functions might be. Coding sequences are a different story. We can write an algorithm that finds coding genes and compare the genes it predicts with real protein sequence data from proteomics. This verification gives researchers a training set, like a Rosetta stone, to develop better and better algorithms. But in the case of non-coding regions there is no training set... just a black hole of meaningless information.

So Eddy and his group have approached this problem by trying to produce a training set (a Rosetta stone) for non-coding regions. And surprise, surprise, he's decided to focus on the 'ultra-like' sequences discussed before.

Based on previous evidence, Eddy's group supposed that these ultra-like sequences affected the expression of nearby genes. So, they have gone through the labor-intensive process of labeling the genes that neighbor ultra-like sequences with a fluorescent marker (a little glow-in-the-dark tag that is copied with the gene) and then observing what happens when they knock out the nearby ultra-conserved or ultra-like regions. They observe these 'transgenic' mice about 2 weeks after conception and see that the labeled genes often (about 50% of the time) have a particular expression pattern that disappears or changes if the nearby ultra-like sequence is not present.

Further, they can show that if they move one of these ultra-like sequences near a different gene, the gene they moved it to shows the same expression pattern as the gene they moved it from. And finally, if they put two of these ultra-like sequences near a gene, the protein produced by the gene shows the expression patterns associated with both of the ultra-like sequences. Basically, their effects seem to add together.

So all of this is rather cool, but why oh why are they doing it? Well, it gets back to this training set or Rosetta stone thing that we don't have for non-coding regions of the genome. How can we sort out one part from another? Eddy supposes that the type of gene regulation that his group is observing from these ultra-conserved regions is only a subset of a much bigger class of non-coding regions that work to regulate gene expression in a similar way. By verifying which of the ultra-like sequences affect gene expression, his group is producing a training set that can be used to develop algorithms to find other non-coding regions that may not be so conserved but are involved in the same type of regulation.

Good show, old man.

And with that we will move on to part two of Eddy's lecture, which is a horse of a very different color. We are going to jump from eukaryotes (plants, animals and fungi) with all their non-coding regions to bacteria (most of life on the planet), which have genomes so tightly programmed that most parts of them code for genes in both directions.

Recently, there has been a whole lot of sequencing going on, and bacteria are no exception. Eddy said there have been about 1400 species of bacteria sequenced so far... which I think must be something of a vast underestimation, but it will suffice for now.

A strange thing prevents scientists from sequencing a genome in a totally straightforward way in these bacteria (and eukaryotes for that matter). But to understand its strangeness, you must have a basic knowledge of how sequencing is done.

In order to sequence a genome, it is cut into a bunch of little bits multiple times in multiple different ways. These little bits are fed into bacterial cells, which work day and night making millions of copies. Then scientists take these millions of copies and do a little chemistry (which is why we need all the copies...) to determine what these itty-bitty sequence fragments are (like AACTTGGCC and so on). When all is said and done, these sequencing biologists have a bunch of fragmented pieces of the genomic picture, cut at different points, that they can begin stitching together to produce the whole genome.
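The stitching step can be sketched in a few lines. What follows is a toy greedy approach of my own (repeatedly merge the two fragments with the longest overlap), not the method any real sequencing center uses; real assemblers also have to cope with read errors, repeats, and reverse-complemented reads. The function names and the example fragments are made up for illustration.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments, min_len=3):
    """Merge fragments pairwise, always taking the largest overlap first."""
    frags = list(dict.fromkeys(fragments))   # drop exact duplicates, keep order
    while len(frags) > 1:
        best = (0, None, None)               # (overlap length, i, j)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break                            # nothing overlaps: pieces stay separate
        merged = frags[i] + frags[j][n:]
        frags = [f for k, f in enumerate(frags) if k not in (i, j)] + [merged]
    return frags


# Three overlapping made-up fragments of one short sequence.
pieces = ["AACTTGG", "TTGGCCAT", "CCATGCA"]
print(greedy_assemble(pieces))  # ['AACTTGGCCATGCA']
```

If two groups of fragments never overlap, the function returns more than one piece, which is the code-level version of a hole in the genomic picture.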

But there is a catch. Invariably, there are holes in the genomic picture scientists produce by the aforementioned method, and worse, if they sequence a bacterium again, the same holes are there again. These sequencing holes are a pain for the scientists, because instead of using those trusty little bacteria to copy the itty-bitty fragments, the scientists have to go in and do the copying themselves using PCR, which is something of a pain in the ass. Of course, there are a myriad of explanations for why these sequencing holes exist, but the explanation that Eddy is interested in is that maybe the little sequence fragments we are asking the bacteria to copy are actually killing the bacteria. Eureka!
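Spotting holes that show up in every run is easy to sketch if you assume each sequenced fragment has already been placed at a known position along the genome. That assumption, the function names, and the coordinates below are all mine and purely illustrative; intervals are half-open (start included, end excluded).

```python
def coverage_gaps(genome_length, reads):
    """Return uncovered (start, end) spans given reads as (start, end) intervals."""
    gaps, covered_to = [], 0
    for start, end in sorted(reads):
        if start > covered_to:
            gaps.append((covered_to, start))   # nothing covers this stretch
        covered_to = max(covered_to, end)
    if covered_to < genome_length:
        gaps.append((covered_to, genome_length))
    return gaps


def shared_gaps(gaps_a, gaps_b):
    """Return the stretches that are uncovered in both runs."""
    shared = []
    for a_start, a_end in gaps_a:
        for b_start, b_end in gaps_b:
            lo, hi = max(a_start, b_start), min(a_end, b_end)
            if lo < hi:
                shared.append((lo, hi))
    return sorted(shared)


# Two made-up sequencing runs over a 120-base toy genome.
run1 = [(0, 40), (30, 70), (90, 120)]
run2 = [(0, 35), (20, 75), (88, 120)]
gaps1 = coverage_gaps(120, run1)   # [(70, 90)]
gaps2 = coverage_gaps(120, run2)   # [(75, 88)]
print(shared_gaps(gaps1, gaps2))   # [(75, 88)]
```

A hole that survives the intersection is the kind of stubborn, reproducible gap the lecture was about.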

Lo and behold, often the hole-producing sequence fragment is killing the bacteria. If Eddy's group takes one of these little DNA fragments and puts a blocker on it to prevent it from being expressed in the bacteria, then the bacteria live. If they then proceed to unblock the sequence, the bacteria die.

In case you haven't put it together yet, things that kill bacteria are known as antibiotics, and they are pretty handy in the area of keeping people alive. So finding them, as Eddy's group is doing, is really cool stuff. But we aren't at the Emerald City just yet.

Eddy's group has shown quite well that whatever is produced by these sequence fragments is able to kill bacterial cells from the inside out. But if we are going to get any really handy antibiotics, we need to know that it will kill a bacterium from the outside in. So the next step Eddy's group took was to produce a chip (just a piece of plastic with carefully carved channels, holes and such) on which they placed all the killer sequences and did some chemistry to produce the gene product (some call this a protein). Next, they flooded the chip with glow-in-the-dark bacteria, sealed off the chambers between each protein, and let it sit. A chamber that stops glowing means that the bacteria in it have died, and that the killer sequence product (protein) was deadly from the outside too. Finally, they have a handful of really useful antibiotics.
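Reading out such a chip amounts to asking which chambers went dark. Here is a small sketch under my own assumptions: each chamber has a glow reading before and after incubation, and a chamber counts as a hit when its glow falls to 10% of the starting value or less. The cutoff, the chamber names, and the numbers are invented for illustration and are not from the lecture.

```python
def killer_chambers(glow_before, glow_after, dark_fraction=0.1):
    """Return the names of chambers whose glow dropped to `dark_fraction` or less."""
    hits = []
    for name, before in glow_before.items():
        after = glow_after[name]
        if before > 0 and after / before <= dark_fraction:
            hits.append(name)              # bacteria in this chamber stopped glowing
    return sorted(hits)


# Made-up readings for three chambers, one per candidate protein.
before = {"frag_A": 1000.0, "frag_B": 980.0, "frag_C": 1010.0}
after = {"frag_A": 950.0, "frag_B": 40.0, "frag_C": 990.0}
print(killer_chambers(before, after))  # ['frag_B']
```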

And what's so very wonderful about this work comes back to the point that Eddy made first in his lecture: because sequencing keeps getting cheaper, techniques that rely solely on sequencing will keep getting cheaper too. So the work that Eddy has done with these bacteria will age quite well. And that's a good thing, because a lot of our old antibiotics are getting the worse for wear. We need some new ones, quick. It's good that we have Eddys around.

If you'd like to read a little more about this, go check out Eddy's website. Or check out this paper his group has written.

