Notes from the Biomass : Quantifying the margin of error in high-throughput data interpretation

Quantifying the margin of error in high-throughput data interpretation

In the interpretation of high-throughput data, we make statements about the number of biological entities. However, the number of genes in a higher eukaryote, of protein domains or folds, or of protein complexes is the product of many parameter choices. Because these entities have soft boundaries, we have to make assumptions about their nature that cannot be quantified by a confidence interval or other established quantifiers. It would be very useful to label the results with a statement about their validity, expressing the confidence of the researchers, similar to the evidence codes used for annotations in Gene Ontology.
A statement in a publication could read:
The number of protein-coding genes in Alosa fallax is 37,387[SEN]. We predict that these proteins form 235,345 splice forms[I50K], arranging themselves into 920 protein complexes[IA<]. 4 proteins are involved in fin formation[I5] and 3,454 in cell cycle regulation[I5].

Here are the abbreviations:
[I5] Inferred from Perl script. 5 lines of Perl can't be wrong.
[I50K] Inferred from major calculation. 50,000 lines of C++ can't be wrong.
[SEN] Contribution of the senior author.
[REV] Stinking reviewers didn't like our numbers. Have to put them into the acknowledgments.
[IA>] As high as we could get it to meet your expectations.
[IA<] As low as we could get it to meet your expectations.
[2*] Could be twice as much but who am I
[DUD] Spot on, dude. Seriously!

spitshine - 2005-08-15 08:09

Grady (guest) - 2006-01-18 18:37

Precisely the point I've been trying to make.

You, of course, said it much better than I. I hope Alf Eaton reads this.

Automated data extraction is great, but it's only as good as the source material, and there's a lot of crap out there. Many statements in the text of a peer-reviewed journal article are only weakly supported by the data shown. Without some way of separating the authoritative statements from the more speculative ones, automated data extraction will be no better than manual slogging through the literature, as measured by the number of highly likely relationships you've learned at the end of the day.

spitshine - 2006-01-18 21:53

Should give that snippet a rather serious update at some point. When did I write this? August - feels like two years ago.
