Recent Updates

Last post
Notes from the biomass will continue at nftb.net. My...
spitshine - 2006-07-16 13:11
Stubborn
OK, you got me. While technically not blogging at the...
spitshine - 2006-07-07 10:55
Greetings from another...
Greetings from another HBS-founder (media-ocean.de)....
freshjive - 2006-06-15 20:06
HBS manifesto will be...
Hi there! I am one of the hard blogging scientists. We...
020200 - 2006-06-15 18:13
Latter posts - comment...
Things to do when you're not blogging: Taking care...
spitshine - 2006-04-29 18:46

About this blog

About content and author

A few posts of interest

The internet is changing...
Powerpoint Karaoke
Quantifying the error...

Link target abbreviations

[de] - Target page is in German
[p] - Paywall - content might not be freely available
[s] - Subscription required
[w] - Wikipedia link

Predictions

Quantifying the margin of error in high-throughput data interpretation

In the interpretation of high-throughput data, we make statements about the number of biological entities. However, the number of genes in a higher eukaryote, of protein domains or folds, or of protein complexes is the product of many parameter choices. Because these entities have soft boundaries, we have to make assumptions about their nature that cannot be quantified by a confidence interval or other established measures. It would be very useful to label such results with a statement about their validity that expresses the confidence of the researchers, similar to the evidence codes attached to Gene Ontology annotations.
A statement in a publication could read:
The number of protein coding genes in Alosa fallax is 37,387[SEN]. We predict that these proteins form 235,345 splice forms[I50K], arranging themselves into 920 protein complexes[IA<]. 4 proteins are involved in fin formation[I5] and 3,454 in cell cycle regulation[I5].

Here are the abbreviations
[I5] Inferred from Perl script. 5 lines of Perl can't be wrong.
[I50K] Inferred from major calculation. 50,000 lines of C++ can't be wrong.
[SEN] Contribution of the senior author.
[REV] Stinking reviewers didn't like our numbers. Have to put them into the acknowledgments.
[IA>] As high as we could get it to meet your expectations
[IA<] As low as we could get it to meet your expectations
[2*] Could be twice as much but who am I
[DUD] Spot on, dude. Seriously!
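Tongue in cheek as it is, the scheme is regular enough to be machine-readable. Here is a playful sketch in Python. The regular expression, the function name and the rule that a tag qualifies the nearest number before it are my own assumptions, not part of any existing standard. Numbers are assumed to use commas (or nothing) as thousands separators.

```python
import re

# The evidence codes from the list above, keyed by tag.
EVIDENCE_CODES = {
    "I5": "Inferred from Perl script",
    "I50K": "Inferred from major calculation",
    "SEN": "Contribution of the senior author",
    "REV": "Stinking reviewers didn't like our numbers",
    "IA>": "As high as we could get it",
    "IA<": "As low as we could get it",
    "2*": "Could be twice as much",
    "DUD": "Spot on, dude",
}

# A number, then any text without digits or brackets, then a [TAG].
# Assumption: each tag qualifies the nearest number in front of it.
TAGGED_CLAIM = re.compile(r"(\d[\d,]*)([^\[\d]*)\[([^\]]+)\]")


def extract_claims(text):
    """Return (value, what, code, meaning) for every tagged number in text."""
    claims = []
    for m in TAGGED_CLAIM.finditer(text):
        value = int(m.group(1).replace(",", ""))  # drop thousands separators
        what = m.group(2).strip()                 # the phrase between number and tag
        code = m.group(3)
        claims.append((value, what, code, EVIDENCE_CODES.get(code, "unknown code")))
    return claims


statement = (
    "The number of protein coding genes in Alosa fallax is 37,387[SEN]. "
    "These form 920 protein complexes[IA<]."
)
for claim in extract_claims(statement):
    print(claim)
```

A reviewer could then sort a manuscript's numbers by evidence code, and start with everything tagged [SEN].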

Commentaries on Systems Biology in Cell

The current issue of Cell has a number of interesting comments on Systems Biology.
Edison T. Liu's comment sounds a little funny to me:
"The greatest challenges in establishing this systems approach are not biological but computational and organizational."
As long as high-throughput data are as messy as they are, assembling the systems will be a herculean task. The computational problem is laughable by comparison, if you ask me.

Blame the experimentalist

One of the puzzling, recurring utterances of bioinformaticians young and old is the complaint about experimental data sets. Remarks that there are errors in most of the biological data we deal with abound.
I recall the former physicist who complained about changes to a few genes in yeast, and the postdoc who complained about mRNA expression data and made cheap jokes about his collaborator. At any conference, at least three tables discuss such issues too loudly.

Actually, they should rejoice that there are errors: once the data gets reasonably clean, the failure of many bioinformatic predictions will come back into focus. Precise structural information on proteins has not solved the fold prediction problem, and the nice, precise information on eukaryotic genomes has not made de novo gene finding a routine task.
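A bit of arithmetic shows why noisy data makes such a comfortable excuse. Assume binary labels, and assume that the prediction errors are independent of the errors in the reference data. A predictor with true accuracy a, scored against a reference with error rate e, then agrees with it in a(1-e) + (1-a)e of all cases. Even a perfect predictor scores only 1-e, so any shortfall can be blamed on the data. Here is a minimal Python sketch with a Monte Carlo check. The function names are my own, for illustration only.

```python
import random


def observed_agreement(true_accuracy, reference_error):
    """Expected agreement between a binary predictor and a noisy reference.

    Assumes prediction errors and reference errors are independent: the two
    agree when both are right, or when both are wrong (binary labels only).
    """
    a, e = true_accuracy, reference_error
    return a * (1 - e) + (1 - a) * e


def simulate(true_accuracy, reference_error, n=100_000, seed=1):
    """Monte Carlo estimate of the same agreement on random binary labels."""
    rng = random.Random(seed)
    agree = 0
    for _ in range(n):
        truth = rng.random() < 0.5
        # The predictor is right with probability true_accuracy.
        prediction = truth if rng.random() < true_accuracy else not truth
        # The reference (experimental) label is wrong with probability reference_error.
        reference = truth if rng.random() >= reference_error else not truth
        agree += prediction == reference
    return agree / n


for e in (0.0, 0.1, 0.2, 0.3):
    print(e, round(observed_agreement(0.7, e), 3), round(simulate(0.7, e), 3))
```

With a = 0.7, the observed agreement drops from 0.70 on clean data to 0.62 at e = 0.2. The gap to a perfect predictor shrinks from 0.30 to 0.18. Clean data removes that cushion.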

What's left to do in bioinformatics

Recently, I stumbled across this paper that describes an approach to predict the localization of proteins in yeast.
I was surprised and giggled in my chair, because the whole yeast proteome has been screened for localization by Huh et al., so why would anybody agree to publish such a method? Of course, the study by Huh et al. does not get a signal for each and every protein, for experimental reasons. Some proteins are simply not accessible, and the authors of the computational study fill those gaps.
Still, the task feels trivial to me, and it makes me wonder what is left for bioinformatics, at least for the part that wants to predict biological properties. So far, no method has been really good at predicting localization from sequence, yet the task appears simple compared to fold prediction. Fold prediction, in turn, has been predicted (by David Baker, if I am not mistaken) to become good once all the folds have been discovered experimentally.
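The gap-filling step described above amounts to merging two sources while keeping track of which is which. Below is a hypothetical sketch in Python. The protein identifiers and compartments are made up for illustration. The "predictor" is a trivial placeholder that always answers "cytoplasm" and does not reflect the method of the paper.

```python
def fill_localization_gaps(experimental, proteome, predict):
    """Combine experimental localizations with predictions for the gaps.

    experimental: dict protein -> compartment (proteins with an experimental signal)
    proteome:     iterable of all protein identifiers
    predict:      callable protein -> compartment (any predictor)
    Returns a dict protein -> (compartment, source), where source is
    "experiment" or "prediction", so the provenance stays visible.
    """
    merged = {}
    for protein in proteome:
        if protein in experimental:
            merged[protein] = (experimental[protein], "experiment")
        else:
            merged[protein] = (predict(protein), "prediction")
    return merged


# Made-up toy data: two proteins were observed, one was not accessible.
observed = {"protA": "nucleus", "protB": "mitochondrion"}
all_proteins = ["protA", "protB", "protC"]


def baseline(protein):
    """Hypothetical placeholder predictor that always answers 'cytoplasm'."""
    return "cytoplasm"


localization = fill_localization_gaps(observed, all_proteins, baseline)
print(localization["protC"])
```

Keeping the source label next to each call makes it easy to report how much of a "proteome-wide" map was actually measured.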

I wonder whether there will be bioinformatic predictions of real impact. Can you name a few?
