Notes from the Biomass : Databases

Link target abbreviations

[de] - Target page is in German
[p] - Paywall - content might not be freely available
[s] - Subscription required
[w] - Wikipedia link

Databases

Funding for BIND (update)

The BIND database for protein-protein interactions ran out of funding in November 2005. An editorial[s] in the current issue of Nature Biotech provides additional insights. In particular, the funding problems of the Alliance for Cellular Signaling, one of the largest and earliest systems biology initiatives, seem noteworthy. Perhaps thinking big was simply not enough.

[via public rambling]

spitshine - 2006-02-14 13:47

Not a peer reviewed Wikipedia

Back in 2002, I would not have believed that Wikipedia could work out. You all know that it did become a major success, but recently concerns have been voiced, including quality issues, sometimes due to the intentional addition of misleading information or simple deletion of unwanted facts, and the lack of experts on particular subjects.

Now, the Digital Universe wants to create a resource for peer reviewed scientific information. It is backed by Larry Sanger[w], who worked on Nupedia, a predecessor of Wikipedia, and funded by Joseph Firmage, a victor of the New Economy who maintains an odd proximity to the UFO community. Digital Universe wants to secure quality through stewardship by experts in particular research fields and boasts that it will become the "largest reliable information resource".

The first item on their roadmap already smells quite odd.

1 - Browser Independence
First, we’ve built the first version of the Digital Universe to work with the Mozilla-based ManyOne browser. In a few months, you’ll be able to access the Digital Universe from any popular browser, and also use text-based navigation in addition to visual navigation, if you prefer.

Several months to achieve browser independence? In 2006?

Nature[s] covers the story in its current issue, but you might also want to check The Register[f], which voices concerns on both projects - and the reply and corrections from Larry Sanger.

Wikipedia is currently aiming to attract scientists, as the lack of experts, particularly in the natural sciences, is being discussed extensively in the forums. Sanger is definitely right in that Wikipedia is not a very appealing place for experts right now, but there are major efforts to change that. However, the mix of experts and lay people should result in articles that focus on readability and understanding for the majority of users rather than expert opinions, an interaction that I have enjoyed. I have my doubts whether Digital Universe can compete with the quality initiative in Wikipedia and finally work out - but let's wait for the first articles.

spitshine - 2006-02-02 10:22

The NAR data base issue 2006

The 2006 edition of Nucleic Acids Research's annual data base issue appeared online today. Given modern search engines and a home page for every researcher, one would think that there would be few new servers to be found, but from browsing the issue and performing some sample queries I was surprised to learn that many servers are not easily discovered by simple web searches. From experience, I would not attribute this to poor quality of the resources, as only few external people link to them, but rather to the fact that we usually trust the search engines too quickly.

One example is the new resource for bacterial insertion sequences (ISfinder), which was hard to find when searching for databases of insertion sequences, even though it provides a comprehensive resource. (It would be even more useful if it were up to date with the current state of completely sequenced genomes.)

Some of the readers of this blog who venture into chemical space might be interested in the articles describing DrugBank, SuperNatural or GLIDA. SuperNatural is a collection of natural products and their derivatives, aiming to provide easily accessible small molecules. DrugBank combines drug and drug target information for use with e.g. virtual screening. Similarly, the GPCR-ligand data base (GLIDA) explores small molecules and their protein targets within the G-protein coupled receptors, arguably the most important class of therapeutically accessible proteins.

Many resources described in the current issue are geared towards individual genomes or comparative analysis within a limited group. Browsing the list, I wonder why none of the genome-independent systems ever replaced the individual developments performed for every new genome, or the old resources that do not keep up with current trends. Does the plethora of annotation systems show that the one-size-fits-all approach has not been developed yet, or is it simply the desire for independence from outside resources? Many of the resources could easily be run by systems such as SRS at the EBI, but it would then be very difficult for the developers of the resources to be credited for their work.

Reading the data base issue gives a good overview of the state that bioinformatics is in: fragmented, with limited interoperability between individual resources, but hosting a lot of interesting data and displaying the desire to create the required structures some time in the future.

spitshine - 2005-12-29 13:49

W3C launches a Semantic Web Health Care and Life Sciences Interest Group

The W3C announced the launch of a Semantic Web Health Care and Life Sciences Interest Group today.

The Semantic Web Health Care and Life Sciences Interest Group is designed to improve collaboration, research and development, and innovation adoption in the health care and life science industries. Aiding decision-making in clinical research, Semantic Web technologies will bridge many forms of biological and medical information across institutions.

What looks like it came straight from a buzzword generator might substantially improve the current (Babylonian) state of biological databases. Greg over at Nodalpoint recently summarized how RDF, SPARQL et al. could help with the integration of bioinformatics resources, including practical problems and support through larger communities (or lack thereof).

There has been a substantial increase in the use of such techniques recently, and I would not be surprised if they finally deliver what bioinformaticians have been waiting for (and trying to achieve with other approaches) - integrating the vast amounts of biological data in a flexible way.

One of the challenges to such a system that the W3C cannot address is the availability of the information in a stable and usable form - the EBI can allocate the resources to offer Uniprot in XML, but a small lab performing a high throughput screen usually does not have the necessary skills in the lab and will not put its data online in a highly abstracted way unless failing to do so hinders publication. On the other hand, I don't expect that journals will raise their standards for publication soon - after all, the format of the information is a lesser part of the value of a research publication, and the abundance of data is matched only by the abundance of possible standards.

spitshine - 2005-11-22 12:46

Funding for the BIND data base running out

The home page of the BIND database now carries a grave statement from its PI, Chris Hogue, which explains that their last dollar was spent on November 16th and that the last thing the team will do is to maintain the status quo of the web servers. BIND is a database collecting protein-protein interactions, hosted by the Mount Sinai Hospital, Toronto, Canada.
Many smaller databases face the same problem - the postdoc or grad student on the project leaves, and funding is hardly ever available for the development of a database, let alone for maintenance. Competition from larger institutions such as NCBI or the EBI often spoils the efforts of many years of development.

However, BIND is not a small database: it lists more than 100 programmers and curators and received CDN 29$ in public funding in 2003. I have to admit that I never quite understood why one would need such a major effort for a database of protein-protein interactions, and I can imagine why a project of this size is challenged by other scientists.

BIND is not the first major database to run out of funding. The situation was similar for the GDB, which lost funding in 1998. It was transferred from Johns Hopkins University, Baltimore to The Hospital For Sick Children, Toronto and subsequently to RTI. The most important database running out of funding was probably Swissprot in 1999(?), which managed both to commercialize its data (thanks to the New Economy biotechs which floated at that time) and to attract support from the EBI.
Chris Hogue selected a few editorials and papers covering the situation in the media and promises to continue to report on the situation in his blog.

[Via public rambling]

spitshine - 2005-11-22 08:17

Store and share your references online

Connotea, a service provided by Nature, lets you store your references and share them with others for collaborations. Note to self: Should give it a try in the next external project.

spitshine - 2005-07-22 16:43

Pubcrawler vs. RSS

Pubcrawler used to be my alerting service; I followed important fields, collaborators and competitors using canned queries at Pubmed and received the results via email.

After I started blogging, I used RSS readers; it was not love at first sight, as many RSS readers struck me as bloated and annoying. However, the Sage extension for Firefox works well for me - no silly icons jumping in my Dock or Panel.
Still, I never got the hang of using HubMed, but after the recent announcement that Pubmed will offer RSS feeds, I decided to give RSS feeds another try, whacked a couple of my important queries into HubMed and Pubmed and compared the results to the Pubcrawler queries.

While it's a little early for comparative results, it doesn't appear as if the RSS feeds offer much benefit over the good old Pubcrawler - you can't track old results easily, HubMed often highlights old information as new, and Pubmed does not give you previous results, only the newest ones.
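
For the curious, tracking old results on the reader's side is not much work. The following is only a minimal sketch of the idea, not how Pubcrawler, HubMed or Pubmed do it: it assumes a plain RSS 2.0 feed with `<item>` elements, and the function name `new_items` and the two tiny sample feeds are made up for illustration.

```python
import xml.etree.ElementTree as ET


def new_items(rss_text, seen):
    """Return (title, link) pairs not reported before and record them in `seen`."""
    root = ET.fromstring(rss_text)
    fresh = []
    for item in root.iter("item"):
        # Prefer <guid>; fall back to <link> when a feed omits or empties it.
        ident = item.findtext("guid") or item.findtext("link")
        if not ident or ident in seen:
            continue
        seen.add(ident)
        fresh.append((item.findtext("title", default=""),
                      item.findtext("link", default="")))
    return fresh


# Made-up sample feeds standing in for two consecutive fetches of one query.
FEED_DAY1 = """<rss version="2.0"><channel><title>saved query</title>
<item><title>Paper A</title><link>http://example.org/a</link><guid>a</guid></item>
</channel></rss>"""

FEED_DAY2 = """<rss version="2.0"><channel><title>saved query</title>
<item><title>Paper B</title><link>http://example.org/b</link><guid>b</guid></item>
<item><title>Paper A</title><link>http://example.org/a</link><guid>a</guid></item>
</channel></rss>"""

seen = set()
first = new_items(FEED_DAY1, seen)
second = new_items(FEED_DAY2, seen)
print(first)
print(second)
```

The second call reports only the item that was not in the first fetch. Writing the `seen` set to a file between runs would keep the history across sessions.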

Seems like I won't change any time soon - and Pubcrawler has that personal touch to it that makes it fun to use. I should get to an Irish pub soon and drink to it.

spitshine - 2005-06-30 18:41

Nucleic Acids Research Web Server Issue

The NAR web server issue was published yesterday and describes 166 bioinformatics web servers.
The distribution of services runs against the recent trend - many servers are "classical" sequence driven services, whereas the majority of bioinformatics (methods) articles are concerned with mRNA expression data or protein-protein interactions.

An overview of the methods published can be found at the UBiC.

spitshine - 2005-06-28 08:12

YeastHub - the semantic web at work

The current issue of Bioinformatics describes the YeastHub database, created by the Gerstein lab.

Yeast is a well-sampled organism with many high throughput data sets to integrate and several established and curated resources such as MIPS, SGD and what used to be YPD.

The new YeastHub data base connects available data using semantic web technologies such as RSS, RDF and a relational to RDF mapping. The set of technologies has the potential to solve many of the small problems one has to deal with when integrating data across many sources.
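
To give an idea of what a table to RDF mapping amounts to, here is a minimal sketch that turns a tab separated table into N-Triples. It is not YeastHub's actual mapping: the base URI, the column names and the sample rows are assumptions made up for illustration, and column names are assumed to be safe to use inside a URI.

```python
import csv
import io

# Hypothetical base URI, not taken from YeastHub.
BASE = "http://example.org/yeast/"


def escape_literal(value):
    """Escape a string for use inside an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def table_to_ntriples(tsv_text, key_column):
    """Map each row to one subject and each other non-empty column to one triple."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    triples = []
    for row in reader:
        subject = f"<{BASE}gene/{row[key_column]}>"
        for column, value in row.items():
            if column == key_column or not value:
                continue  # the key becomes the subject; empty cells yield no triple
            predicate = f"<{BASE}property/{column}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


# Made-up sample table; the second row has an empty last cell.
SAMPLE = "orf\tname\tlocalization\nYAL001C\tTFC3\tnucleus\nYAL002W\tVPS8\t\n"

for line in table_to_ntriples(SAMPLE, "orf"):
    print(line)
```

Every cell becomes a self-describing statement, so tables with different columns can simply be concatenated and queried together, which is the appeal over passing around the tables themselves.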

Eagerly, I tested the data base but was a little let down: While I see the obvious benefits, many small problems appear which, taken together, make the system not really helpful to the average bioinformatics user, let alone a biologist in the yeast community.

The lack of descriptions of the formats of the data sources makes it tough to create queries, and I did not really get past trivial results. I also dearly missed capabilities to browse the data sets. However, it appears as if these technologies will become more and more important, and the old tab separated tables will hopefully disappear one fine day.

spitshine - 2005-06-21 10:57

PubChem

A more recent addition to NCBI's online databases is PubChem, which aims at providing a comprehensive database for low molecular weight chemical structures. It includes the results of bioassays.
After the Slashdot feature today, I played a little with it and was a little disappointed to obtain only 6 protein kinase inhibitors, 2 Caspase inhibitors and no assays for these activities. It will definitely take some time before the data base seriously competes with private efforts, but let's sit back and wait and review this in September, when the service will have been up for one year.

spitshine - 2005-06-20 12:59
