Quantifying the margin of error in high-throughput data interpretation

In the interpretation of high-throughput data, we make statements about the number of biological entities. However, the number of genes in a higher eukaryote, of protein domains or folds, or of protein complexes is the product of many parameter choices. Because these entities have soft boundaries, we have to make assumptions about their nature that cannot be quantified by a confidence interval or other established measures. It would be very useful to label the results with a statement about their validity that expresses the confidence of the researchers, similar to the evidence codes used for annotations in Gene Ontology.
A statement in a publication could read:
The number of protein-coding genes in Alosa fallax is 37,387[SEN]. We predict that these proteins form 235,345 splice forms[I50K], arranging themselves into 920 protein complexes[IA<]. Four proteins are involved in fin formation[I5] and 3,454 in cell cycle regulation[I5].

Here are the abbreviations:
[I5] Inferred from Perl script. 5 lines of Perl can't be wrong.
[I50K] Inferred from major calculation. 50,000 lines of C++ can't be wrong.
[SEN] Contribution of the senior author.
[REV] Stinking reviewers didn't like our numbers. Have to put them into the acknowledgments.
[IA>] As high as we could get it to meet your expectations
[IA<] As low as we could get it to meet your expectations
[2*] Could be twice as much but who am I
[DUD] Spot on, dude. Seriously!
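
For anyone who wants to take the joke half seriously, here is a minimal sketch of how such tags could travel with the numbers they qualify. Everything in it is an assumption for illustration: the `EVIDENCE_TAGS` table (descriptions paraphrased from the list above), the `TaggedCount` class, and its `render` method are hypothetical names, not part of Gene Ontology or any existing tool. Numbers are printed with comma thousands separators.

```python
from dataclasses import dataclass

# Hypothetical lookup table; descriptions paraphrased from the abbreviation list.
EVIDENCE_TAGS = {
    "I5": "Inferred from Perl script",
    "I50K": "Inferred from major calculation",
    "SEN": "Contribution of the senior author",
    "REV": "Reviewers didn't like our numbers",
    "IA>": "As high as we could get it",
    "IA<": "As low as we could get it",
    "2*": "Could be twice as much",
    "DUD": "Spot on",
}


@dataclass(frozen=True)
class TaggedCount:
    """A reported count that carries its own evidence tag (illustrative only)."""

    label: str
    value: int
    tag: str

    def __post_init__(self):
        # Refuse tags that are not in the table, so no count goes out unlabeled.
        if self.tag not in EVIDENCE_TAGS:
            raise ValueError(f"unknown evidence tag: {self.tag!r}")

    def render(self) -> str:
        # Format as it would appear in the running text of a publication.
        return f"{self.value:,} {self.label}[{self.tag}]"


genes = TaggedCount("protein-coding genes", 37387, "SEN")
print(genes.render())  # 37,387 protein-coding genes[SEN]
```

The point of the design is simply that the tag is checked when the count is created, so a number cannot end up in a manuscript without some statement about where it came from.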
Grady (guest) - 2006-01-18 18:37

Precisely the point I've been trying to make.

You, of course, said it much better than I. I hope Alf Eaton reads this.

Automated data extraction is great, but it's only as good as its source material, and there's a lot of crap out there. Many statements in the text of a peer-reviewed journal article are only weakly supported by the data shown. Without some way of separating the authoritative statements from the more speculative ones, automated data extraction will be no better than manual slogging through the literature, as measured by the number of highly likely relationships you've learned at the end of the day.


spitshine - 2006-01-18 21:53

Should give that snippet a rather serious update at some point. When did I write this? August - feels like two years ago.
