Identifying names of proteins in the biomedical literature

Text mining the biomedical literature has matured from a peculiar idea into a respectable niche in bioinformatics. In a letter to the editor of Bioinformatics, Blaschke, Valencia and others, themselves pioneers in the field, summarize recent advances. They note that the crucial task of relating gene and protein names to normalized identifiers is now producing reliable results, even if these are still somewhat lacking in precision compared to other fields of computational linguistics. I am positively surprised by the high numbers. The task of identifying the correct sequence of a gene from a paper is sometimes still complicated even if you do it "manually", despite working with fully sequenced bacterial genomes and an extensive set of synonyms. Hopefully, such parsers will make it to the desktop soon.
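To make the normalization task concrete, here is a minimal sketch of the simplest possible approach: a dictionary lookup that maps name mentions to identifiers. It is not the method used by the systems discussed in the letter. The synonym table, the `EX:` identifiers and the function name `normalize_mentions` are all made up for illustration.

```python
import re

# Hypothetical synonym table: lower-cased name -> set of made-up identifiers.
# A real table would be derived from a curated database.
SYNONYMS = {
    "reca": {"EX:0001"},
    "recombinase a": {"EX:0001"},
    "lacz": {"EX:0002"},
    "beta-galactosidase": {"EX:0002"},
    "hsp70": {"EX:0003", "EX:0004"},  # one name, two candidate identifiers
}


def normalize_mentions(text, synonyms=SYNONYMS):
    """Return (mention, sorted identifiers) pairs in order of appearance."""
    hits = []
    lowered = text.lower()  # assumes lower-casing keeps string length (true for ASCII)
    for name, ids in synonyms.items():
        # Require that the name is not embedded in a longer word or hyphenated term.
        pattern = r"(?<![\w-])" + re.escape(name) + r"(?![\w-])"
        for match in re.finditer(pattern, lowered):
            mention = text[match.start():match.end()]
            hits.append((match.start(), mention, sorted(ids)))
    hits.sort()
    return [(mention, ids) for _, mention, ids in hits]


if __name__ == "__main__":
    for mention, ids in normalize_mentions("RecA and LacZ were induced; Hsp70 was not."):
        print(mention, ids)
```

A mention that comes back with more than one identifier, like Hsp70 here, is exactly the ambiguous case that makes the task hard even for a human reader.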

If you think about this from a high-level perspective, the way we work with biomedical information seems absurd. It would be simpler and more useful if the names of proteins and genes were identified as such in the publication using some simple markup, like that used for references to other publications; one could extend reference managers like EndNote or BibTeX. Clearly, we need mnemonic names for spots on a gel, phenotypes or ORFs in a sequenced genome for readability, but why don't we disambiguate them in a consistent manner now that the majority of important genome sequences are at hand? Other "stable" constituents of a paper (strains, protocols, chemicals, antibodies) could also be referenced in such a manner, directly linking to resources such as the Registry of Standard Biological Parts at MIT or to catalogs (if stable identifiers are maintained). After all, the methods sections often name the supplier of reagents - would it not be a logical extension to submit all information?
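As a rough sketch of what such markup could look like, assume a hypothetical `\gene{key}` tag that works the way `\cite{key}` does for references and is resolved against a small registry. The tag syntax, the registry layout, the `EX:0001` identifier and the function name `resolve_gene_tags` are assumptions made for illustration, not an existing standard or tool.

```python
import re

# Hypothetical registry, playing the role a .bib file plays for citations.
GENE_REGISTRY = {
    "recA": {"id": "EX:0001", "organism": "Escherichia coli"},
}

# Matches the assumed tag syntax \gene{key}.
GENE_TAG = re.compile(r"\\gene\{([^}]+)\}")


def resolve_gene_tags(manuscript, registry=GENE_REGISTRY):
    r"""Replace each \gene{key} with 'key [identifier]'.

    Returns the rewritten text and the list of keys missing from the registry.
    """
    unresolved = []

    def substitute(match):
        key = match.group(1)
        entry = registry.get(key)
        if entry is None:
            unresolved.append(key)
            return key  # leave the bare name in the text
        return f"{key} [{entry['id']}]"

    return GENE_TAG.sub(substitute, manuscript), unresolved


if __name__ == "__main__":
    text, missing = resolve_gene_tags(r"Deletion of \gene{recA} and \gene{xyzQ}.")
    print(text)
    print("unresolved:", missing)
```

The mnemonic name stays in the text for readability, and unresolved keys are reported much like undefined citations, so the author can fix them before submission.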

However, even if the publishers support these advancements, text mining methods will still be required to identify such information in the tome of fundamental biological information that we have compiled over the last 100 years, now that it is slowly being digitized. This will stay an interesting field for years to come.
