The NAR database issue 2006

The 2006 edition of Nucleic Acids Research's annual database issue appeared online today. Given modern search engines and a home page for every researcher, one would think that there would be few new servers left to discover. But after browsing the issue and running some sample queries, I was surprised to learn that many servers are not easily found by simple web searches. From experience, I would not attribute this to poor quality of the resources (only a few external sites link to them), but rather to our habit of trusting the search engines too quickly.

One example is the new resource for bacterial insertion sequences (ISfinder), which was hard to find when searching for databases of insertion sequences, even though it is a comprehensive resource. (It would be even more useful if it were up to date with the current set of completely sequenced genomes.)

Readers of this blog who venture into chemical space might be interested in the articles describing DrugBank, SuperNatural and GLIDA. SuperNatural is a collection of natural products and their derivatives, aiming to provide easily accessible small molecules. DrugBank combines drug and drug target information for use in, e.g., virtual screening. Similarly, the GPCR-ligand database (GLIDA) explores small molecules and their protein targets within the G-protein-coupled receptors, arguably the most important class of therapeutically accessible proteins.

Many resources described in the current issue are geared towards individual genomes or comparative analysis within a limited group. Browsing the list, I wonder why none of the more general, genome-independent systems ever replaced the individual developments undertaken for every new genome, or the old resources that do not keep up with current trends. Does the plethora of annotation systems show that a one-size-fits-all approach has not yet been developed, or is it simply the desire for independence from outside resources? Many of the resources could easily be run on systems such as SRS at the EBI, but it would then be very difficult for their developers to be credited for their work.

Reading the issue gives a good overview of the state that bioinformatics is in: fragmented, with limited interoperability between individual resources, but hosting a lot of interesting data and showing the desire to create the required structures some time in the future.

Gene prediction in prokaryotes using EasyGene

Gene prediction in prokaryote genomes might not be the most taxing question in current bioinformatics research. However, after dealing with genome comparisons recently, I realized that many differences between genomes stem solely from the annotation procedure and not from their sequence. This seems to be most pronounced for genomes that were sequenced several years ago, when gene prediction was more difficult due to the lack of reference strains.

Nielsen and Krogh developed EasyGene 1.2 (described in Bioinformatics a month ago), which homogenizes the gene predictions for prokaryotic genomes using a fully automated procedure. They find discrepancies between their predictions and the deposited annotations that convincingly hint at errors in the original annotation of many genomes. The typical error is overannotation: too many small ORFs are considered real genes.
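To see why short ORFs are so easy to overannotate, here is a minimal sketch that counts ORFs in a random sequence. It is not EasyGene's method, just a toy illustration. The function names, the uniform base composition, the forward-strand-only scan and the 100-codon cutoff are all my own assumptions.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_lengths(seq, min_codons=1):
    """Lengths of ATG...stop ORFs in the three forward frames.

    A length counts codons from the first ATG up to, but not including,
    the stop codon. Only the forward strand is scanned (toy simplification).
    """
    lengths = []
    for frame in range(3):
        start = None  # position of the current open ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n = (i - start) // 3
                if n >= min_codons:
                    lengths.append(n)
                start = None
    return lengths


def random_genome(length, seed=0):
    """Uniform random DNA; real genomes are not uniform (assumption)."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))


genome = random_genome(200_000)
lengths = orf_lengths(genome)
cutoff = 100  # hypothetical threshold separating "short" from "long" ORFs
short = sum(1 for n in lengths if n < cutoff)
long_ = sum(1 for n in lengths if n >= cutoff)
print(f"ORFs found: {len(lengths)}, short: {short}, long: {long_}")
```

Even in a sequence without any genes, chance alone produces plenty of short ORFs, while long ones are rare. An annotation pipeline that accepts short ORFs too readily will therefore report genes that are not there.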

A web server with their results is provided, but as new genomes appear by the dozen each month, I wish they would provide the code for a stand-alone installation too. Anyway, it is good to know that they take care of such seemingly dull but important work.

Wikipedia vs. Britannica

Wikipedia has received a lot of criticism recently regarding its quality. Surprisingly, a peer-review analysis of Wikipedia and Britannica in Nature finds Wikipedia to be of higher quality than expected.

However, an expert-led investigation carried out by Nature — the first to use peer review to compare Wikipedia and Britannica's coverage of science — suggests that such high-profile examples are the exception rather than the rule.
The exercise revealed numerous errors in both encyclopaedias, but among 42 entries tested, the difference in accuracy was not particularly great: the average science entry in Wikipedia contained around four inaccuracies; Britannica, about three.

I have never tried Britannica myself, but after my recent, frustrating attempts to improve some Wikipedia articles, I am tempted to ask: is Britannica really that poor?

In praise of IPAM

The Institute for Pure and Applied Mathematics (IPAM) is an organization that aims to connect scientists and mathematicians, enhancing the possibilities for collaboration and general interaction. It is located on the campus of the University of California, Los Angeles (UCLA).
IPAM offers several types of programs. The typical semester programs span three months and include weeks for tutorials and workshops as well as lecture-free time to interact with other participants as one sees fit.

In Spring 2004, IPAM held a proteomics program which I participated in.
The program included four weeks of conferences (see the program), and IPAM managed to attract many top scientists in the field of proteomics research - most of them were even outstanding speakers. There were other activities such as longer workshops and tutorials, and ample time that we devoted to work groups. The collaboration has led to at least one successful publication on my side.
Currently, I am at the reunion workshop of the program, held at UCLA's conference center in Lake Arrowhead, up in the San Bernardino Mountains outside LA. The reunions are an integral part of the semester programs and usually overlap with other semester programs, so there is some exchange between the different workshops.

When I first got an email announcing the workshop, I discarded it, as I thought that it would be difficult to manage all the constraints. If it hadn't been for some personal contacts who encouraged me to go, I would never have taken the leap of faith. It turned out that participating in the workshop was a great opportunity. Attending a workshop for three months obviously requires a bit of self-motivation. After all, there is no control by the organizers, and you could (in principle) use the opportunity to explore Los Angeles for three months and not get anything done. But as there are office desks and workstations for every participant, there is little that keeps you from working as productively as back home.

If you are a postdoc or grad student, the IPAM workshops are a great opportunity, in particular if you come from the biology side and want to strengthen your math, which was my prime motivation to join the program. The participants of the other workshop, however, seem to be mostly established mathematicians, including several professors. Over lunch, I was asked whether all proteomics researchers were as young as our group, which consists mostly of postdocs.

The next course that might interest regular readers of these pages is the Cells and Materials course starting in March 2006.

Conference blogging

One fruitful application of blogs in the life sciences is to provide real-time coverage of conferences. Summaries and reviews of scientific meetings have been a traditional part of scientific journals such as the Trends series by Elsevier. However, they usually appear several months after the conference has closed and often do not provide insights beyond the abstract book, as they attempt to make everyone happy rather than focussing on the highlights.

Blogs can highlight outstanding presentations immediately and allow for a more independent view of the conference, even if it is biased by the personal preferences of their authors. Smaller workshops, which might produce interesting outcomes from panel discussions, usually lack media coverage and could use blogs (or wikis) to disseminate results. The matter is not completely free of complications - some meetings are "closed meetings" and place strong limitations on media coverage. Bloggers need to be aware of that, even though most would not identify themselves with The Media. However, most organizations running scientific conferences would probably appreciate "live" coverage.

Some conferences have already been covered by bloggers: nodalpoint's Greg Tyrelle covered ISMB 2005, for instance. Free Association, the blog of the editors of Nature Genetics, is currently providing coverage of the Third Seattle Symposium in Biostatistics: Statistical Genetics and Genomics. I will participate in a proteomics workshop next week and will cover the meeting to some extent here.

It would be helpful if we created a repository (some blog or wiki) with links to ongoing coverage of conferences by bloggers, possibly in a similar fashion to the TravelBlog.

Scientific bloggers featured in Nature (again...)

You can call me vain for linking to an overview of science blogs in Nature that mentions my views; Rolf Apweiler even considers bloggers exhibitionists, so you'd have company.

Nature's feature explains the pros and cons of blogging in the sciences thoroughly, which is welcome after I was somewhat dissatisfied with Nature's coverage of Google Base last week. The article also reflects the opinion many people have of blogs as coffee room chats. I appreciate most things I have picked up in lunch breaks and over coffee - and from blogs, much of it directly relevant to my work. And I don't want to think about the number of irrelevant, useless but peer-reviewed papers that I have worked through. The number of blogs that are valuable to any one individual is dwarfed by the amount of irrelevant chatter out there, but the same argument can be applied to books or the scientific literature - it's only a platform after all.



