[ale] bioknoppix (Dow, isn't this your game?)

Tue Feb 24 09:16:47 EST 2004

On Mon, 23 Feb 2004, Dow Hurst wrote:

> Hey thanks!  Bioinformatics is a new field of the 
> biosciences where you apply all kinds of computational 
> techniques to data to determine properties of a biological 
> structure, be it protein, DNA, or something else.  For
> instance, if you're interested in a particular sequence of
> human nucleotides from the human genetic sequence, then you 
> could look for similar sequences in other genomes that might 
> have additional information.  It is perfect for database 
> searching, statistical analysis, and scripting languages.  I 
> don't know much about it yet, but am looking at some books 
> on the subject.  Perl, Python, and Ruby are all languages
> being used in the field.  Linux is used a lot since the
> languages fit in there better and Linux is free too.  It is
> a fascinating field when you need to find out a lot about a
> sequence of amino acids or nucleic acids quickly.

There's lots of open source used in biology, particularly in genomics; the
public Human Genome Project, and other non-Celera / TIGR genomic projects,
are commonly done through a process where different people sequence
different parts of the genome, and then open source Perl scripts and Beowulf
clusters glue it all together. There's also some really interesting
object-oriented open source databases used with that. The fly and worm
genome projects especially use those. See Acedb <http://www.acedb.org/> for
them. There's lots of other open source tools there too, but most of the
rest are too specialized to be of interest outside the field. I know of two
companies using Acedb for non-genetics data, though....

Even a lot of the hardware used in large-scale biology is now open sourced.  
Pat Brown, one of the developers of microarray technology, has complete
schematics available, patent-free, for anyone who wants to build their own.  
See <http://cmgm.stanford.edu/pbrown/mguide/index.html>. Microarrays allow
you to track differential gene expression across hundreds of genes
simultaneously. The idea there is that you can compare, say, cancer cell
lines with normal cell lines, look at the 20,000 genes expressed in each,
measure their relative expression levels, and see that these 150 genes are
expressed more in cancerous cells and those 35 genes are expressed less in
cancerous cells than in normal. Based on that, you have 185 genes of
interest (150 + 35) as candidates for the pathway leading those normal cells
to turn cancerous.
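
As a rough sketch of that comparison in Python: the gene names and
expression values below are made up, and the two-fold cutoff is only an
assumption for illustration (real analyses also need normalization and
statistics across replicates, which this skips).

```python
import math

def classify_genes(normal, cancer, min_fold=2.0):
    """Split genes into up- and down-regulated lists by fold change.

    normal, cancer: dicts mapping gene name -> expression level (must be > 0).
    min_fold: hypothetical cutoff; 2.0 means "at least twice as much or half".
    """
    threshold = math.log2(min_fold)
    up, down = [], []
    # Only compare genes measured in both cell lines.
    for gene in sorted(normal.keys() & cancer.keys()):
        # log2 ratio: positive means more expression in the cancer line.
        log_ratio = math.log2(cancer[gene] / normal[gene])
        if log_ratio >= threshold:
            up.append(gene)
        elif log_ratio <= -threshold:
            down.append(gene)
    return up, down

# Invented example data, not real measurements.
normal = {"geneA": 100.0, "geneB": 250.0, "geneC": 80.0, "geneD": 40.0}
cancer = {"geneA": 450.0, "geneB": 240.0, "geneC": 20.0, "geneD": 85.0}

up, down = classify_genes(normal, cancer)
print("expressed more in cancer cells:", up)
print("expressed less in cancer cells:", down)
```

The genes of interest are then just the two lists taken together.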

For another example, my grad work was in human evolution, looking at genetic
changes in human populations over the past several hundred thousand years.  
Once the basic raw data used in those sorts of studies is collected, it's
just huge ASCII dumps of genetic sequences. Perl on Linux was what our lab,
and most others doing that sort of thing, used for analyzing the sequences.  
Any modelling was usually done in Fortran or sometimes C on Linux, Solaris,
or Plan 9.  That's why the *anthropology* department at the University of
Utah has a Beowulf cluster ;-). In general, the Perl / Fortran / C code
people use for those sorts of things is open source; most journals demand
that it be open source before they'll publish papers which use it for
analysis....
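
To give a flavor of what that sequence analysis looks like, here is a small
sketch in Python rather than Perl. It assumes the dumps are FASTA-style text
(a ">name" line followed by sequence lines) and that the sequences are
already aligned to the same length; both are assumptions for the example,
and the sample sequences are invented. It counts the sites at which each
pair of sequences differs.

```python
from itertools import combinations

def parse_fasta(text):
    """Parse FASTA-style text into a dict of name -> sequence string."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: use the first word after ">" as the record name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            # Sequence data may be wrapped over several lines.
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}

def pairwise_differences(seq_a, seq_b):
    """Count the positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)

# Invented sample data, not real sequences.
sample = """\
>sample1
ACGTACGTAC
GTTA
>sample2
ACGTACCTAC
GTTA
>sample3
ACGAACCTAC
GTTG
"""

seqs = parse_fasta(sample)
for (name_a, seq_a), (name_b, seq_b) in combinations(seqs.items(), 2):
    print(name_a, name_b, pairwise_differences(seq_a, seq_b))
```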

later,
chris