The Catalogue of Life: indexing the world’s species
Volume 4 Number 1 - January 2007
Karen L. Wilson & Frank Bisby
A list of all organisms is a basic necessity for accessing and organizing information about them worldwide - and that list is available now: right? Actually - wrong. Many people are astonished to discover that the taxonomic profession has no comprehensive catalogue of all the world’s organisms, or even of all the world’s plants. Some ask: “Why can’t we just use the list of names in Index Kewensis?” Extremely valuable as that resource is, we argue here that IK is not what is needed by society and the biodiversity professions as a working list of plant species around the world.
The members of Species 2000 and the Integrated Taxonomic Information System (ITIS) are collaborating to complete the Species 2000 and ITIS Catalogue of Life by 2011 to meet this need.
What kind of catalogue do we need?
What is needed is a functioning and maintained species checklist that presents, as nearly as possible, a consensus view of all known species. It needs to combine two extremely important components that are not trivial to deliver. Firstly, it needs to reflect expert taxonomic opinion as to which distinct species exist, and how each is circumscribed. Where opinions are divided, a consensus may not be possible, but then one responsibly chosen view should be given, and access provided to alternatives. Secondly, the checklist needs to reflect accurately the accepted scientific name of each species together with its synonyms: the other scientific names by which the species, or prior species now included in the present concept, have been known in the past or in other catalogues.
Both these requirements are important for practical reasons of how the catalogue can function. The first is difficult to attain on a global scale, but without it a species may accidentally be in the list twice (under different names), or a broadly drawn species and its sub-components may accidentally be listed alongside each other. Meeting it means that there is just one entry for each biological species, and that species can be counted. The second requirement is important because one of the principal uses of the catalogue is for synonymic indexing. As an example, specimens identified under three different names in different herbaria or gardens may be samples of the same species. This can be detected by using the synonymy, which in this case may indicate that two of the names are synonyms of the species now known by the third name. Conversely, using the synonymy prior to an Internet search may allow a user to search for a species under all of the different names by which it is known.
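The synonymic indexing described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the data model actually used by the Catalogue of Life: the checklist entries are placeholder names rather than real taxa, and the helpers `build_synonym_index`, `resolve` and `search_terms` are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical checklist: each accepted name maps to its known synonyms.
# The names are placeholders, not real taxa.
CHECKLIST: Dict[str, List[str]] = {
    "Genus alpha": ["Genus beta", "Othergenus gamma"],
    "Genus delta": [],
}


def build_synonym_index(checklist: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every name (accepted or synonym) to its accepted name."""
    index: Dict[str, str] = {}
    for accepted, synonyms in checklist.items():
        index[accepted] = accepted
        for synonym in synonyms:
            index[synonym] = accepted
    return index


def resolve(name: str, index: Dict[str, str]) -> Optional[str]:
    """Return the accepted name for a label, or None if it is not listed."""
    return index.get(name)


def search_terms(name: str, checklist: Dict[str, List[str]],
                 index: Dict[str, str]) -> List[str]:
    """Return every name under which the species may have been filed."""
    accepted = index.get(name)
    if accepted is None:
        return []
    return [accepted] + checklist[accepted]


index = build_synonym_index(CHECKLIST)

# Three specimens labelled differently in three collections.
labels = ["Genus beta", "Othergenus gamma", "Genus alpha"]
print({resolve(label, index) for label in labels})

# All names to use when searching for information on the same species.
print(search_terms("Genus beta", CHECKLIST, index))
```

The first call shows the three labels collapsing to a single accepted name; the second expands one name into the full set of search terms.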
It is estimated that the world has about 1.75 million known and named living species of plants, animals, fungi and micro-organisms, and that the number of vascular plants (flowering plants, conifers, cycads and ferns) is between 223,000 and 420,000 (Scotland and Wortley 2003), while bryophytes number about 25,000 species. It is widely agreed that reliable, readily available, core knowledge of the individual species that inhabit this planet is central to understanding biodiversity, conserving it, and using it in a sustainable fashion. For example, a working list of all organisms has been adopted by the UN Convention on Biological Diversity as a target under the Global Taxonomy Initiative (see COP-8 VIII, decision 3), and a working list of all plant species was earlier adopted at COP-VI (decision 9) as Target 1 of the Global Strategy for Plant Conservation.
Access to all knowledge about a species, whether it is a description of its features, geographical distribution, ecological associations, genetic composition, or its usefulness to humans, can be provided by using the species checklist and the synonymic indexing of the scientific names that provide unique tags to access the data.
Why not use vernacular names? Well, there certainly are cases where vernacular names precisely refer to just one species, and where a species has just one vernacular name in one language, but these cases are rare. It is also true that vernacular names are very widely used. However, many species have no vernacular name at all. Where they do exist, the same vernacular name may be used for several different species, leading to confusion as to which is meant. Most organisations agree that vernacular names, and the languages and places in which they are used, do make a useful addition to the species list, but it is clear that scientific names provide a better basis for the list.
Catalogues of species versus catalogues of names
There are three main kinds of lists or catalogues of scientific names of organisms, which are often confused with one another.
Firstly, there are the nomenclators, which are alphabetical lists of all names ever published (see example Box 1). Several such indexes exist. For higher plants, the nomenclator is the impressive Index Kewensis, initially funded by a donation from Charles Darwin, and published in hard copy for more than a century. It is now also available electronically as part of the International Plant Names Index (IPNI, www.ipni.org), supplemented by the Australian Plant Names Index and the Gray Card Index, which provide more in-depth coverage of infraspecific names than Index Kewensis. For fungi, there is Index Fungorum, for bacteria the List of Prokaryotic Names with Standing in Nomenclature, and for animals the Index to Organism Names.
Secondly, there are global species lists, in the electronic world referred to as Global Species Databases (GSDs), a term coined by Species 2000. These are, or aim to be, taxonomically authoritative lists of all the known species in a group, with any synonyms listed under the relevant accepted names as a result of revision or scrutiny by one or more taxonomists (see example Box 1). In flowering plants, it is estimated that there are an average of three synonyms for each accepted name (Scotland and Wortley 2003). It is this expert input that differentiates these global lists from nomenclators. There are many global lists, often in hard copy or handwritten on index cards, but increasingly they are being put together electronically and being made available on websites scattered around the world. Commonly these lists include more than just the species’ names - protologue and type information and geographic distribution are the most common inclusions.
Thirdly, there are regional checklists, which aim to cover all species in a region. These are often variable in terms of taxonomic content and validation, but may be rich in extra information about the occurrence and variation of the species in that region.
All these sources of information (electronic and hard copy) about organisms are scattered and, until recently, it was difficult for people to find out the names of whatever species were of interest to them, or to find further information about those species. One had to know where to find the sources of information and then be knowledgeable enough to interpret what was found: for example, did the species named A in country X represent the same species as the one named B in country Y? And what about the species named C from another region of country Y: was it the same as A and/or B, or a different species?
The last decade has seen a revolution in this area of research, as in most others, with the explosive development of e-Science. It is now much easier for biologists in different regions to collaborate on research projects, thanks to innovations such as email and video-conferencing, and aided by the availability of analytical software and electronic images of specimens and publications, increasingly accessible on websites. Also, many biologists have been able to travel more readily to extend their research through study and fieldwork in relevant parts of the world. Many biological database projects have started around the world in this decade, making available electronically information about particular groups of plants, animals or micro-organisms (Bisby 2000). These Global Species Databases (GSDs) are key elements in the Catalogue of Life because they provide a comprehensive taxonomic snapshot of all the species in a particular group. Regional databases that include all the organisms in a particular region of the world are also important in adding details not covered in the GSDs.
The Species 2000 & ITIS Catalogue of Life
Many of these database projects, spread around the world, are collaborating to produce a unified, authoritative index of the world’s species: the Species 2000 & ITIS Catalogue of Life. This is a keystone knowledge set - the gateway to a digital library of biodiversity information on the Internet, using species names to link to other data systems on subjects as varied as specimen data, agriculture, pharmacognosy and conservation uses. The Catalogue of Life is available on the web (www.sp2000.org) as a Dynamic Checklist, with live access to contributing databases, and also as an Annual Checklist, available both on the website and on a CD-ROM (Bisby et al. 2006). The Species 2000 & ITIS Catalogue of Life is an example of:
- a successful approach to managing complex data in biology
- the computational challenges in managing complex data from multiple sources
- the sociology of international collaboration between database projects in biology.
The genetic diversity inherent in living organisms means that compiling the Catalogue of Life is far from a simple exercise of listing names (Bisby 2003, Wilson et al. 2005). This is a knowledge-gathering programme, involving taxonomic expertise in interpreting species and their relationships. The specialist knowledge needed to create and continuously enhance a global species database for a group is the “tip of an iceberg”, below which lies layer upon layer of taxonomic processes: from field observation and collections through to monographic revisions and phylogenetic analysis. Names are the mere tags by which this knowledge is accessed.
Indeed, the key component that marks the Catalogue of Life as being much more than a list of names is the expert input from taxonomic biologists in all parts of the world to validate the complex biological content. Compilation is further complicated by the fact that understanding of biodiversity is still far from adequate, resulting in many scientific names not yet being in a 1:1 relationship with species. Much further taxonomic research by experts is needed to sort out such problems.
The collaborative input has dictated a distributed model for the Catalogue of Life. Even though a centralised model is more efficient computationally, it is sociologically very important to keep the individual data-sets close to the taxonomists who provide the expertise to update the species information. Another advantage of the distributed approach is that the work of aggregating taxonomic knowledge is going ahead in a massively parallel way, rather than in a serial fashion as would happen with a centralised approach.
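The distributed model can be pictured with a small Python sketch in which each sector database stays with its own maintainers and a central index simply queries them and merges the results. Everything here is an assumption made for illustration: the sector functions, their placeholder contents and the `assemble_catalogue` helper are hypothetical, and a real sector would be queried over the network rather than called as a local function. The concurrency is only an analogy for the parallel work of the taxonomists themselves.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


# Each function stands in for one Global Species Database that is kept
# up to date by its own group of taxonomists (placeholder contents).
def legume_sector() -> List[str]:
    return ["Legume species 1", "Legume species 2"]


def conifer_sector() -> List[str]:
    return ["Conifer species 1"]


def moss_sector() -> List[str]:
    return ["Moss species 1", "Moss species 2", "Moss species 3"]


SECTORS: Dict[str, Callable[[], List[str]]] = {
    "legumes": legume_sector,
    "conifers": conifer_sector,
    "mosses": moss_sector,
}


def assemble_catalogue(
    sectors: Dict[str, Callable[[], List[str]]]
) -> Dict[str, List[str]]:
    """Query every sector concurrently and merge the results by sector."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fetch) for name, fetch in sectors.items()}
        return {name: future.result() for name, future in futures.items()}


catalogue = assemble_catalogue(SECTORS)
total = sum(len(species) for species in catalogue.values())
print(f"{total} species from {len(catalogue)} sectors")
```

The central index holds no taxonomic decisions of its own in this sketch; each sector remains the single place where its species list is revised.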
The success of this distributed approach is seen in the fact that, since 2001, more than 880,000 species have been added to the Catalogue of Life: about 50 per cent of the world’s known species (Figure 1). The aim is to add the other 50 per cent by 2011, but these species mostly belong to poorly studied groups, especially amongst the insects, and so it will be a major challenge to reach 100 per cent in that time frame.
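The "about 50 per cent" figure can be checked against the estimate of 1.75 million known species quoted earlier in the article. The short calculation below treats 880,000 as an exact count, although the text gives it only as a lower bound ("more than"), so the result is approximate.

```python
known_species = 1_750_000  # estimate of known, named living species
catalogued = 880_000       # lower bound on species in the Catalogue of Life

coverage = catalogued / known_species
remaining = known_species - catalogued

print(f"coverage: {coverage:.1%}")            # coverage: 50.3%
print(f"still to be added: {remaining:,}")    # still to be added: 870,000
```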
The Catalogue of Life is already proving useful as an index, even though it is not yet complete. For example, the Global Biodiversity Information Facility (GBIF) uses it as the taxonomic backbone for its web portal for biodiversity information (www.gbif.net), as do some members of GBIF for their local databases.
Interaction with other global programmes
Besides interaction with individual taxonomic experts and databases, the Catalogue of Life interacts strongly with a wide range of international and national bodies, as both supporters and users of this species index. Species 2000 began as a joint programme between the Committee on Data for Science and Technology (CODATA) of the International Council for Science (ICSU), the International Union of Biological Sciences (IUBS) and the International Union of Microbiological Societies (IUMS) in the early 1990s, which led to a workshop funded by UNEP and the Global Environment Facility in Manila, the Philippines, in 1996. Funding for the Catalogue of Life comes from many sources, both directly to Species 2000 and ITIS and indirectly through their contributing members, with recent notable contributions from the European Union and GBIF. Species 2000 and its regional groups actively support all the above groups, as well as the Global Taxonomy Initiative (GTI) and other programmes of the Convention on Biological Diversity, and the International Working Group on Taxonomic Databases (TDWG).
Progress with the World List of Plant Species within the Catalogue of Life
Within the Catalogue of Life programme, Species 2000 is cooperating with the Royal Botanic Gardens Kew and other stakeholders to bring together the taxonomic sectors that will complete the working list of plant species for the GSPC Target 1. A workshop organised jointly between them in June 2004 started the process of evaluating existing and potential coverage of all groups of flowering plants in a gap analysis. Botanists from around the world spent two days assembling both their knowledge of ongoing databases and projects, and of groups of experts who might be able to assist. Conclusions drawn from examining the coverage map created included:
- Coverage of families: global checklists were done for 15 per cent of species, in progress for 22 per cent, and in draft stages for 30 per cent.
- Families that were not started constitute approx. 33 per cent of species.
- Jointly planned activities of the Royal Botanic Gardens, Kew, Missouri Botanical Garden and New York Botanical Garden were likely to account for some 55 per cent of the total, leaving a “gap” of 45 per cent that needs both taxonomic expertise and co-ordination.
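As a quick consistency check on the workshop figures listed above, the percentages can be tallied in Python. The dictionary keys are shorthand labels chosen for this sketch; the numbers are those reported in the list.

```python
# Percentage of flowering plant species by global checklist status.
checklist_status = {
    "done": 15,
    "in progress": 22,
    "draft": 30,
    "not started": 33,
}

# Percentage of species by who was expected to cover them.
planned_effort = {
    "Kew, Missouri and New York programmes": 55,
    "gap needing other expertise and co-ordination": 45,
}

# Both breakdowns should account for every species exactly once.
assert sum(checklist_status.values()) == 100
assert sum(planned_effort.values()) == 100

started = 100 - checklist_status["not started"]
print(f"at least begun: {started} per cent")  # at least begun: 67 per cent
```

Both breakdowns sum to 100 per cent, and two thirds of species fall in families where some checklist work had at least begun.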
The following priorities were agreed:
- The larger missing sectors (thought to be Compositae (Asteraceae), Melastomataceae and Malvaceae) must be started urgently if there is to be any chance of even nearing completion by the 2010 target date.
- For the very many smaller and middle-sized families to be started or brought to completion, the issue is both one of focusing appropriate expertise on the task and one of providing leadership, co-ordination and funding to the programme of work. A vigorous co-ordinating process, possibly from Species 2000, the International Organization for Plant Information (IOPI) or the Integrated Taxonomic Information System (ITIS), is needed for the 45 per cent of sectors needed from outside the Royal Botanic Gardens, Kew, Missouri Botanical Garden and New York Botanical Garden programme.
- The coverage map created at the workshop should be publicized and developed, working with the Species 2000 metadatabase and GBIF to keep track of who is doing what.
Since the 2004 workshop, significant progress has been made. RBG Kew has made steady progress with extending its series, World Checklist of Seed Plants, covering monocots and selected other groups. The message about the big gap for Compositae has been picked up by GBIF and it is funding a major project, which started in early 2006 and is led by Ilse Breitwieser in New Zealand, with partners in The International Compositae Alliance (TICA) in Europe, including the Bot. Garten and Bot. Museum Berlin-Dahlem, and in the Americas, including the Missouri Botanical Garden and the Smithsonian Institution. In other groups, Species 2000 has recently extended the coverage provided within the Catalogue of Life: as well as the extensive coverage provided by the RBG Kew World Checklist for a range of families, the Catalogue of Life now covers the algae (AlgaeBase), mosses (Missouri BG), conifers (A. Farjon), Leguminosae (the International Legume Database and Information System - ILDIS), Annonaceae (AnnonBase), Lecythidaceae (New York BG), and cycads and six flowering plant families from the IOPI Global Plant Checklist and Species Plantarum Programme (www.iopi.org). Regional datasets coming into the Catalogue of Life include Euro+Med PlantBase (part of the Catalogue of Life Regional Checklist for Europe) and the North American plants from ITIS and the PLANTS databases (Catalogue of Life Regional Checklist for N. America).
Challenges for the future
There are continuing challenges facing this project and taxonomy in general.
One is the need to integrate activities to avoid duplication of effort and to make best use of available funding. Species 2000 is implementing an organisational architecture that is capable of both creating a complete Catalogue of Life and maintaining its taxonomic enhancement through time. At a superficial level, this programme is about creating databases and continuing to maintain them, but underlying this is a serious proposal for self-organization within the taxonomic community and for rationalising and structuring taxonomic effort on a global and regional scale. The result of current initiatives is an exciting opportunity to generate endorsement and further resources where all efforts have failed in the past.
Another challenge is how to allow users to choose alternative classifications of species (where they exist) within the Catalogue of Life (Bisby 2003). More systematically aware users want to be able to choose which classification they use. Others just want a single, generally accepted classification that will allow them to communicate information about their species of interest using a set of stable, accepted names. Our data structure and user interface aim to allow all users to choose whichever of the available classifications they prefer for their group of birds or legumes or whatever.
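One way to picture such a data structure is to store each species once and record its placement under every available classification, so that the user's choice only changes how the same species are grouped. The sketch below is an assumption made for illustration, not the actual Catalogue of Life design: the species and family names are placeholders, and the classification labels "broad" and "split" and the helper `group_by_family` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Placement of each species under two alternative classifications.
# All names and labels are placeholders chosen for this sketch.
PLACEMENTS: Dict[str, Dict[str, str]] = {
    "Genus alpha": {"broad": "Family One", "split": "Family Two"},
    "Genus delta": {"broad": "Family One", "split": "Family One"},
    "Genus epsilon": {"broad": "Family Three", "split": "Family Three"},
}

# Used when a user just wants a single, generally accepted classification.
DEFAULT_CLASSIFICATION = "broad"


def group_by_family(
    classification: str = DEFAULT_CLASSIFICATION,
) -> Dict[str, List[str]]:
    """Group the same species list under the chosen classification."""
    families: Dict[str, List[str]] = defaultdict(list)
    for species, placement in PLACEMENTS.items():
        families[placement[classification]].append(species)
    return dict(families)


print(group_by_family())         # default view
print(group_by_family("split"))  # alternative view of the same species
```

The species list and its names stay stable in both views; only the grouping above species level differs.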
The overarching challenge globally for taxonomy is to study and document the living species that are not yet known and named. The Global Taxonomy Initiative has emphasized the shortage of systematic/taxonomic resources (both people and natural history collections) to document the organisms of this world before extinction strikes. About 1.7 million species of organisms have been given scientific names, but anywhere from 2 to 50 million species or even more (DIVERSITAS 2000; Wilson 2003) are still not formally described, most of them micro-organisms or small invertebrates such as insects. Without names as unique tags for species, we are floundering to understand all the wonders of our biodiverse world, let alone to conserve and sustainably manage them.
References
- Bisby F. A. 2000. The quiet revolution: Biodiversity informatics and the Internet. Science 289: 2309-2312.
- Bisby F. A. 2003. Doing the impossible: Creating a stable species index and operating a common access system on the Internet. Preprints of the Metadiversity Conference Proceedings. (National Federation of Science Abstracting and Indexing Services (NFSAIS): Philadelphia). Also available as of 15 November 2006 at http://www.nfais.org/publications/metadiversity_preprints7.htm
- Bisby F. A., M. A. Ruggiero, Y. R. Roskov, M. Cachuela-Palacio, S. W. Kimani, P. M. Kirk, A. Soulier-Perkins and J. van Hertum (eds.) 2006. Species 2000 and ITIS Catalogue of Life 2006 Annual Checklist. CDROM and printed booklet (Species 2000: Reading). Also available as of 15 November 2006 at http://www.catalogueoflife.org/annual-checklist/2006/search.php
- DIVERSITAS. 2000. Implementing the GTI: Recommendations from DIVERSITAS core programme element 3, including an assessment of present knowledge of key species groups. Report UNEP/CBD/SBSTTA/4/INF6.
- Scotland R. W. and A. H. Wortley 2003. How many species of seed plants are there? Taxon 52: 101-104.
- Wilson E. O. 2003. The encyclopedia of life. Trends in Ecology and Evolution 18(2): 77-80.
- Wilson K. L., F. A. Bisby, M. A. Ruggiero et al. 2005. Progress with the Species 2000 and ITIS Catalogue of Life. Proceedings of 2005 International Workshop on Integrated Biodiversity and Natural Specimens Databases and Forum of Species 2000 Asia-Oceania. Taichung, Taiwan, 30 Sep-2 Oct 2005: 9-17.
Karen L. Wilson
Postal address: National Herbarium of New South Wales, Royal Botanic Gardens Sydney, NSW 2000 Australia
Tel: +61 2 9231 8111
Fax: +61 2 9251 4403
Frank Bisby
Postal address: Plant Diversity and Systematics, School of Plant Sciences, University of Reading, Whiteknights, Reading RG6 6AS, UK
Tel: +44 118 378 6437
Fax: +44 118 378 8106