Bioinformatics group
1. The focus, topics of research and activities
Culture collections such as the Centraalbureau voor Schimmelcultures (CBS, The Netherlands) need to manage large numbers of strains (the records) and characters (the fields). Until a few years ago, the collected data were used essentially for taxonomic purposes and, most of the time, were not disclosed to users outside the institute. Only the species name of each strain was available in printed catalogues, together with a few additional data such as depositor, substrate, and strain origin. Several years ago, CBS decided to tackle this problem by creating databases that would contain all possible sources of information at the strain and species levels. This task was far from trivial, since it involved integrating many different sources of information (e.g., administrative data, bibliography, geography, pictures, nomenclature, morphology, physiology, biochemistry, and molecular data [electrophoresis results, sequences, …]), as well as hyperlinks to other web-based repositories. From the beginning, it was apparent that the conventional search tools in commercially available databases did not fit CBS needs. Therefore, new software called BioloMICS was developed, capable of searching, identifying, classifying, and analyzing all available data in a polyphasic way. Many new tools and algorithms were programmed, and the current version of the software is fully adaptable to end-users’ needs (e.g., the lists and weightings of characters can be changed dynamically).
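To illustrate what dynamically changing the weighting of characters can mean in a polyphasic identification, here is a minimal Python sketch. It is not the BioloMICS algorithm, which is not described here; the function names, the character names, and the simple weighted-agreement score are all assumptions made for illustration. Character values are assumed to be scaled to the range 0 to 1, and characters missing from either record are skipped.

```python
from typing import Dict, List, Optional, Tuple

# A record maps a character name to a value scaled to [0, 1];
# None marks a character that was not observed. (Illustrative model only.)
Record = Dict[str, Optional[float]]


def weighted_similarity(unknown: Record, reference: Record,
                        weights: Dict[str, float]) -> float:
    """Weighted mean agreement over the characters present in both records."""
    score = 0.0
    total_weight = 0.0
    for character, weight in weights.items():
        a = unknown.get(character)
        b = reference.get(character)
        if a is None or b is None or weight <= 0:
            continue  # skip missing data and characters switched off by the user
        score += weight * (1.0 - abs(a - b))
        total_weight += weight
    return score / total_weight if total_weight else 0.0


def rank_references(unknown: Record, references: Dict[str, Record],
                    weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return reference names sorted from most to least similar."""
    scored = [(name, weighted_similarity(unknown, record, weights))
              for name, record in references.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical data: three physiological characters, two reference records.
unknown = {"growth_37C": 1.0, "glucose_fermentation": 0.0, "urease": 1.0}
references = {
    "Reference A": {"growth_37C": 1.0, "glucose_fermentation": 0.0, "urease": 0.0},
    "Reference B": {"growth_37C": 0.0, "glucose_fermentation": 0.0, "urease": 1.0},
}

# Emphasising a different character changes which reference ranks first.
urease_first = rank_references(
    unknown, references,
    {"growth_37C": 1.0, "glucose_fermentation": 1.0, "urease": 3.0})
growth_first = rank_references(
    unknown, references,
    {"growth_37C": 3.0, "glucose_fermentation": 1.0, "urease": 1.0})
```

With equal weights the two hypothetical references tie; raising the weight of one character is enough to separate them, which is the practical point of letting end-users adjust weights.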
In 2004, we also introduced the MycoBank concept. MycoBank is an online database intended as a service to the mycological and scientific community. It documents mycological nomenclatural novelties (new names and combinations) and associated data, for example descriptions and illustrations. Each nomenclatural novelty receives a unique MycoBank number that can be cited in the publication where the novelty is introduced. These numbers are used by the nomenclatural database Index Fungorum, with which MycoBank is associated, and serve as Life Science Identifiers (LSIDs). The success of the MycoBank system is pushing us to develop it further.
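As a rough illustration of how a numeric identifier can be carried inside an LSID, the Python sketch below builds and parses strings of the general form `urn:lsid:<authority>:<namespace>:<object>[:<revision>]`. The authority `example.org`, the namespace `names`, and the number used are placeholders; the values actually used for MycoBank numbers are not given in this text.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    """Components of a Life Science Identifier."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def build_lsid(authority: str, namespace: str, object_id,
               revision: Optional[str] = None) -> str:
    """Assemble an LSID string from its components."""
    parts = ["urn", "lsid", authority, namespace, str(object_id)]
    if revision is not None:
        parts.append(revision)
    return ":".join(parts)


def parse_lsid(text: str) -> Lsid:
    """Split an LSID string into components, rejecting malformed input."""
    parts = text.split(":")
    well_formed = (
        len(parts) in (5, 6)
        and parts[0].lower() == "urn"
        and parts[1].lower() == "lsid"
        and all(parts[2:])  # no empty component
    )
    if not well_formed:
        raise ValueError(f"not a valid LSID: {text!r}")
    return Lsid(*parts[2:])


# Placeholder authority, namespace, and number (assumptions, not real values).
example = build_lsid("example.org", "names", 123456)
parsed = parse_lsid(example)
```

Keeping the number as the object component means the same number can be cited in print and resolved by software without any translation step.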
In 2009, owing to ever-increasing needs in terms of data storage, the development of specific algorithms, and research projects requiring specific programming, it was decided to create a Bioinformatics group. The group provides a number of services to the other CBS groups but also develops its own research projects, either alone or in collaboration with other research groups worldwide.
2. The bioinformatics group focuses on a number of objectives
2.1. BioloMICS developments
a) automated sequence editing, identification/labeling, and insertion into BioloMICS databases, in connection with high-throughput sequencing,
b) development of specific tools related to chromatographic data,
c) development of specific tools related to DNA chip data,
d) development of specific tools related to pyrosequencing data.
2.2. MycoBank developments
a) further development of the MycoBank system through the implementation of group-specific databases and tools allowing advanced data analyses, polyphasic identifications and classifications,
b) online deposit of group-specific data,
c) advanced curation of MycoBank databases via Terminal Server sessions using BioloMICS, allowing curation by external (i.e., non-CBS) researchers,
d) ensuring the acceptance and enforcement of deposits in the Botanical Code (for the next major Botanical Code meeting).
2.3. FES project
a) creation of integrated and interlinked databases of sequence and pathogen data for all the participants of the project,
b) management, hosting and curation (to some levels) of the associated websites on our servers.
2.4. QBOL project
a) creation of integrated and interlinked databases of sequence and pathogen data for all the participants of the project,
b) management, hosting and curation (to some levels) of the associated websites on our servers,
c) automated submission of barcodes (sequences and associated data) to BOLD, GenBank, EMBL and DDBJ.