Creative Commons License

Bioinformatics publications make it into the citation Top Ten

Pedro Beltrao noted that a table of the 10 most highly cited papers includes the papers describing the bioinformatics applications BLAST, ClustalW and MFOLD. This is all the more surprising because the set is not limited to life science publications.

The GeneRank algorithm

The work from Julie L Morrison and colleagues from the University of Glasgow, recently published in BMC Bioinformatics, is interesting in several ways. "GeneRank: Using search engine technology for the analysis of microarray experiments" describes the application of the PageRank algorithm used by Google to "boost" the rank of genes in a list, e.g. a list of differentially expressed genes.

PageRank ranks a web page highly if it is linked to by other highly ranked pages. This idea naturally extends to analysing the results of a microarray experiment, where we would like a gene to be highly ranked if it is linked to other highly ranked genes, even if its own position is lower, e.g. due to measurement variability.

Algorithmically, the work is solid, and applying such algorithms seems a smart way of combining interaction networks with expression data; it also gave me a nice introduction to the PageRank algorithm. The algorithm includes a weighting parameter: you can rely almost entirely on the underlying network for detecting groups, or rely on it only marginally, which effectively switches the GeneRank part off.
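The effect of the weighting parameter is easiest to see in a small sketch. Below is a minimal PageRank-style iteration in the spirit of GeneRank, assuming the update r = (1 - d) * ex + d * W^T D^-1 r, where ex is the normalized absolute expression change, W the (here undirected) gene network and D the diagonal matrix of node degrees. The function name, the dict-based input format and the toy genes are hypothetical and only for illustration; this is not the authors' code.

```python
from typing import Dict, Set


def gene_rank(
    expression: Dict[str, float],
    network: Dict[str, Set[str]],
    d: float,
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> Dict[str, float]:
    """PageRank-style gene scores (hypothetical helper, sketch only).

    Iterates r = (1 - d) * ex + d * W^T D^-1 r until the scores stop changing.
    expression: expression change per gene; absolute values are normalized to sum to 1.
    network:    undirected adjacency, gene -> set of neighbouring genes.
    d:          weight on the network, 0 <= d < 1; d = 0 ignores the network.
    """
    genes = list(expression)
    total = sum(abs(v) for v in expression.values())
    ex = {g: abs(expression[g]) / total for g in genes}
    degree = {g: len(network.get(g, set())) for g in genes}
    r = dict(ex)  # start from the expression-only ranking
    for _ in range(max_iter):
        new_r = {}
        for g in genes:
            # each neighbour passes on its score, split evenly over its own links
            spread = sum(r[n] / degree[n] for n in network.get(g, set()))
            new_r[g] = (1 - d) * ex[g] + d * spread
        change = max(abs(new_r[g] - r[g]) for g in genes)
        r = new_r
        if change < tol:
            break
    return r


# Toy example (hypothetical data): hub A changes little but is linked to B, C and D.
expr = {"A": 0.1, "B": 1.0, "C": 1.0, "D": 1.0, "E": 1.5}
net = {"A": {"B", "C", "D"}, "B": {"A"}, "C": {"A"}, "D": {"A"}}
expression_only = gene_rank(expr, net, d=0.0)
with_network = gene_rank(expr, net, d=0.85)
print(max(expression_only, key=expression_only.get))  # E
print(max(with_network, key=with_network.get))        # A
```

With d = 0 the network is ignored and the isolated, strongly changed gene E comes out on top; with d = 0.85 the weakly changed hub A, linked to three changed genes, moves to first place. That is exactly the behaviour the caveat below is about.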

For me, there is one big caveat: are the highly connected genes really the important, pivotal ones? Morrison et al only use a network derived from Gene Ontology. For integration, it might be more useful to tap the rich resource we have in protein-protein interaction data, particularly when analyzing yeast data. However, given the many false positives and the fact that highly connected proteins generally display unspecific binding, I wonder whether we would get anything out of such analyses. Obviously, this applies to any interpretation of networks.

If you create an interaction map of the parts of a machine, or study the human body and count the interactions of its structures, are the "important" parts the ones that are highly connected?

The microbial pan-genome

Genomic sequencing of prokaryotes is now at a stage where it is no longer sufficient just to sequence the genome of some relevant prokaryote to get published in a good journal. You have to come up with novel insights and either pick an interesting bug (getting rarer...) or sequence a bunch of related genomes. The latter approach was taken by a group at TIGR on Group B Streptococci, who report that sequencing a few strains is not sufficient to represent all strains of the species.

From the abstract:
Analysis of these genomes and those available in databases showed that the S. agalactiae species can be described by a pan-genome consisting of a core genome shared by all isolates, accounting for approximately 80% of any single genome, plus a dispensable genome consisting of partially shared and strain-specific genes. Mathematical extrapolation of the data suggests that the gene reservoir available for inclusion in the S. agalactiae pan-genome is vast and that unique genes will continue to be identified even after sequencing hundreds of genomes.
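The core/dispensable split described in the abstract comes down to set operations on gene families. A minimal sketch, assuming each genome has already been reduced to a set of gene-family identifiers (the hard part, clustering genes into families, is skipped here): it tracks how the core shrinks and the pan-genome grows as strains are added. The function name and strain data are made up, and the sketch only counts; it does not fit or extrapolate anything the way the study does.

```python
from typing import List, Set, Tuple


def pan_genome_curve(genomes: List[Set[str]]) -> List[Tuple[int, int, int]]:
    """Return (k, core size, pan size) after adding the first k genomes.

    core: genes present in every genome seen so far.
    pan:  genes present in at least one genome seen so far.
    """
    curve = []
    core: Set[str] = set()
    pan: Set[str] = set()
    for k, genes in enumerate(genomes, start=1):
        core = set(genes) if k == 1 else core & genes
        pan |= genes
        curve.append((k, len(core), len(pan)))
    return curve


# Hypothetical gene families for three strains; x*, y* and z* are strain-specific.
strains = [
    {"a", "b", "c", "d", "x1"},
    {"a", "b", "c", "d", "y1", "y2"},
    {"a", "b", "c", "e", "z1"},
]
for k, core_size, pan_size in pan_genome_curve(strains):
    print(k, core_size, pan_size)
# 1 5 5
# 2 4 7
# 3 3 9
```

If every new strain keeps adding genes to the pan-genome, as in this toy case, the curve never flattens, which is the situation the abstract describes.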

Dr Ferfried Gutfind from the Barbara Cartland Center for Positivistic Genomics says: "The massive sequencing of six strains is beautifully analyzed, yielding the surprising conclusion that species diversity in some niches will never be covered by traditional sequencing approaches. Even 'classical' meta-genomics won't help to cover the sequence space in these organisms."
However, Noland Works, chairman of "Bioinformatics students against basically everything", concludes: "Yeah, roight, after sequencing six strains these guys say in a contributed paper to PNAS that they need to sequence way more genomes to ever reach their goal. Sounds like setting up publications for references if you ask me." (I won't.)

See also the note in The Scientist.

Yuk yuk...

Forget about the genome in today's Nature: the other articles of the special chimp issue seem a lot more interesting than the n-th partial sequencing of another mammal. The real question to me is whether we have improved gene finding in mammalian genomes. Not that I think the work is insignificant, but the findings in genome papers always disappoint. We already know the genome so well; what novelties are left to discover?

Btw, check out this large-scale clinical systems biology experiment from the Advance Online Publication pages.

"All Scientific Papers are Wrong."

... will be the headline when this essay from the editorial pages of PLoS Medicine winds up in the mainstream-mainstream media. It is already titled "Why Most Published Research Findings Are False" and was quoted by New Scientist as "Most scientific papers are probably wrong".

The work of John Ioannidis, an epidemiologist at the University of Ioannina in Greece, will stir some controversy and will be grist to the quacks' mill. The paper appears solid, but most readers with some statistical experience won't be surprised, and experimentalists who have to interpret the results of complex experiments even less so.
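The core of the argument is a short piece of arithmetic. A sketch of the basic positive predictive value from the essay, PPV = (1 - β)R / (R - βR + α), where R is the pre-study odds that a probed relationship is true, 1 - β the power and α the significance level. It leaves out the bias and multiple-team terms the essay adds on top; the function name and example numbers are chosen here for illustration.

```python
def ppv(prior_odds: float, power: float, alpha: float = 0.05) -> float:
    """Probability that a statistically significant finding is true (no bias term).

    Implements PPV = (1 - beta) * R / (R - beta * R + alpha), with
    prior_odds = R (true : false relationships probed in a field),
    power = 1 - beta and alpha = type I error rate.
    """
    # Per (R + 1) relationships tested, R are true and 1 is false:
    # expected true positives (1 - beta) * R, expected false positives alpha.
    true_pos = power * prior_odds
    false_pos = alpha
    return true_pos / (true_pos + false_pos)


# Well-powered study of a plausible hypothesis vs. an underpowered exploratory screen.
print(round(ppv(prior_odds=1.0, power=0.8), 3))   # 0.941
print(round(ppv(prior_odds=0.01, power=0.2), 3))  # 0.038
```

Low prior odds and low power, common in exploratory high-throughput work, are enough to push most "significant" findings below 50%, which is why the conclusion is no big surprise to anyone who has run such experiments.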

Does this editorial really say something novel that has to be pronounced so dramatically? Not that we should bar any peer-reviewed scientific work from publication (are the editorials peer-reviewed?), but does it really need a title like that?

On the other hand, a lot of research is sold as if the cure for cancer were just around the corner, and a dose of scientific reality certainly won't hurt.

[Thanks, gruggled]

The future (of medicine) according to Leroy Hood

Jason over at The Personal Genome found an ... ahem ... interesting prediction on the future of medicine by Leroy Hood in the Seattle Post. Instead of depleting my stock of Niels Bohr quotes, I'll simply point you to Jason's comments.

Effective cosmopolitans

The genome of the peculiar oceanic bacterium Pelagibacter ubique has been sequenced and is presented in the current issue of Science. The bug contributes massively to the biomass in the oceans, owing in part to its efficient genome make-up.
The genome contains no phages or transposons, few paralogs and no recent duplications or pseudogenes, yet it encodes the ability to synthesize most metabolites, including the 20 amino acids. This bug must be a treasure trove if you are studying metabolic networks. All of this is coded neatly into 1354 genes separated by a median spacer of only 3 nucleotides (!), giving P. ubique the smallest genome of any known free-living microorganism.
Somehow, I like this fellow.
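To put the median spacer of 3 nucleotides into perspective: the figure is the median gap between neighbouring genes. A minimal sketch of that computation, assuming gene coordinates are available as (start, end) pairs; the function name and coordinates are hypothetical, and a real annotation would also need strand and circular-chromosome handling.

```python
from statistics import median
from typing import List, Tuple


def intergenic_spacers(genes: List[Tuple[int, int]]) -> List[int]:
    """Gaps in nucleotides between neighbouring genes on one linear coordinate axis.

    genes: (start, end) pairs, 1-based and inclusive. Overlapping neighbours give
    negative values. Strand and the wrap-around of a circular chromosome are ignored.
    """
    ordered = sorted(genes)
    return [nxt[0] - cur[1] - 1 for cur, nxt in zip(ordered, ordered[1:])]


# Made-up coordinates, only to show the arithmetic.
toy_genes = [(1, 900), (904, 1500), (1504, 2100), (2101, 2700), (2711, 3300)]
spacers = intergenic_spacers(toy_genes)
print(spacers)          # [3, 3, 0, 10]
print(median(spacers))  # 3.0
```

A median of 3 means that half of all neighbouring gene pairs are separated by 3 nucleotides or less, i.e. there is hardly any non-coding DNA between genes.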

The rice genomes and world hunger

The publication of a map of the rice genome on August 11th in Nature was echoed in the mainstream media. Several outlets highlighted the global importance of genome-based research for human nutrition. According to the paper, rice production has to increase by 30% over the next 20 years on the same area of arable land. In addition, global warming and pollution (among other factors) call for new, improved rice strains.
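A quick back-of-the-envelope check of what the 30% figure implies: since the area stays the same, yield per hectare has to carry the whole increase, which works out to a compound growth of about 1.3% per year, sustained for two decades. The helper below is a hypothetical illustration of that arithmetic, nothing more.

```python
def annual_growth_rate(total_increase: float, years: int) -> float:
    """Constant yearly growth rate that compounds to (1 + total_increase) after `years` years."""
    return (1 + total_increase) ** (1 / years) - 1


# +30% rice production in 20 years on the same area means yield per hectare must
# grow by this rate every single year.
rate = annual_growth_rate(0.30, 20)
print(f"{rate:.2%}")  # 1.32%
```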

Apparently, these facts are more important to the media than all the genes and transposons in the genome. I looked up the references for the facts presented: the statements go back to a paper from 1999 published in Crop Science (impact factor 0.958, #17 out of 50 journals in Agronomy [ISI]) and a PNAS publication (contributed) highlighting the impact of global warming. Understandably, the authors had to restrict themselves to a few key publications given the constraints posed by the editors.

After reading these publications and some of those that cite them, I find it hard to evaluate the researchers' assumptions because I am not an expert in crop science in Asia. However, these publications were only brought to the attention of the general public in the context of the rice genome, and the number of citations (60 for the 1999 work) seems low given the potential impact of the work.

Well, read the references yourself. And read the rice genome paper if you are interested in what plant genomes look like. It's good, solid work and certainly does not need to be advertised with famine.

Selected pickings

Friday morning is my regular time to check the scientific RSS feeds, skim through Nature and Science and empty my mail folder of eTOCs. Being a blogger handicapped in screen-reading, I print more papers than I could ever read. Here's one that I started reading on the way back from the printer and read through to the end: Jan Ihmels et al describe the large-scale changes to gene expression after the genome duplication events in the yeasts, and the loss of a motif in Saccharomyces. Good stuff; I just hope that the differences they see are not primarily caused by differences in the size of the Candida and Saccharomyces sample sets and in the way they were generated in the first place.



Last update: 2006-07-16 13:11
