What is DNA-Barcoding?

DNA-Barcoding is a genetic technique based on using a short sequence of DNA to identify the organism to which it belongs. Thus, the sequence acts as a label for a particular species and is deposited in public databases.

The markers used in barcoding should have enough variability to allow unambiguous identification of species, but they have to be flanked by conserved regions so that primers can be easily designed to amplify them.
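
The idea of a variable region flanked by conserved primer sites can be illustrated with a minimal sketch. It assumes exact primer matches and uses made-up toy sequences; the function names are hypothetical and real primers usually tolerate some mismatches.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def extract_barcode(template: str, forward_primer: str, reverse_primer: str):
    """Return the region between the two primer sites, or None if either is missing.

    Assumption: primers match the template exactly (no mismatches allowed).
    """
    start = template.find(forward_primer)
    if start == -1:
        return None
    start += len(forward_primer)
    # The reverse primer binds the opposite strand, so on this strand we
    # look for its reverse complement downstream of the forward primer.
    end = template.find(reverse_complement(reverse_primer), start)
    if end == -1:
        return None
    return template[start:end]


# Toy example: conserved flanks (primer sites) around a variable region.
template = "GGGACGTACTTTTCCCCTCGATCAAA"
print(extract_barcode(template, "ACGTAC", "GATCGA"))  # prints TTTTCCCC
```

Two species sharing the same primer sites but differing in the region between them would return different barcodes from the same pair of primers.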

As DNA-Barcoding is extended to a large number of organisms covering a broad taxonomic range, it becomes possible to build databases holding the sequence of each marker for many different species. This in turn makes it possible to extend the technique to so-called metabarcoding.

What is Metabarcoding? 

With the advances in sequencing technology, in particular the advent of high throughput techniques, it became possible to sequence heterogeneous DNA templates containing multiple different DNA fragments. This opened the field to metabarcoding studies.

In metabarcoding, the composition of an entire community can be taxonomically resolved from environmental samples: DNA is extracted from the sample, and the chosen marker is amplified and sequenced for the myriad species that left their DNA in the environment. These sequences are then compared with the databases generated by DNA-barcoding to identify the organisms present in the greatest taxonomic detail possible.

Metabarcoding techniques are revolutionizing the way we monitor biological communities. With this technique it is possible to identify and taxonomically assign the different DNA fragments in an environmental DNA sample, which includes community DNA (from living organisms) and also extra-organismic DNA (from dead organisms, tissues, exudates, reproductive material or faecal material, as well as extracellular DNA, including DNA dissolved in water).


Metabarcoding of hard bottom benthic communities

Metabarcoding methods have been applied to relatively homogeneous substrates, such as plankton, marine sediments or soil. In complex, heterogeneous substrates such as hard bottom benthic communities, these methods are not yet well developed, because quantitative and efficient sampling of these communities is not straightforward. Furthermore, the lack of universal primers that can be applied to the wide range of taxonomic groups in these communities, the need for complex bioinformatic algorithms to handle large numbers of sequences, and the information gaps in the molecular databases hinder the application of metabarcoding to complex sublittoral communities.


Hard bottom sciaphilic macroalgal community
(Cabrera Archipelago National Park) 


With the development of technology and sampling methods these obstacles are being overcome. Some of the advances that are helping the development of this technique are: a repeatable process for sampling and DNA extraction, new primers for the amplification of highly variable genetic markers (such as COI, Cytochrome Oxidase 1, from the mitochondria, or the ribosomal 18S rDNA), bioinformatic algorithms to work with the resulting Molecular Operational Taxonomic Units (MOTUs), and the existence of public databases for molecular taxonomy.

SAMPLING

Obtaining a sample that is representative of the structure and complexity of the community is a problem in the study of hard bottom benthic communities. In these complex and heterogeneous communities, replicate samples are needed to adequately cover the existing complexity. At the same time, taking too many replicates is to be avoided because the sampling techniques are destructive and the samples require complex laboratory treatment.

Another factor in complex benthic metabarcoding samples is the large size range of the organisms in marine communities. Metabarcoding techniques have usually been applied to organisms of roughly similar sizes, but in complex communities such as macroalgal forests organisms differ in size by orders of magnitude, ranging from macro-organisms such as corals or sponges to microorganisms. There is also a wide range of substrates and types of organisms, which makes the preparation and treatment of samples for DNA extraction difficult.

To obtain the samples, scuba divers scrape the communities down to bare rock with hammer and chisel, using a standard sampling unit (25 x 25 cm squares in algal communities and 20 cm diameter corers in Maerl bottoms). All the material obtained is collected underwater in zip-lock plastic bags.

These samples are stored in alcohol after sampling to correctly preserve the genetic material for downstream treatment in the laboratory.


 
Communities before sampling: sciaphilic macroalgal community and Maerl bottom (Cíes Islands, Atlantic Islands of Galicia National Park)

SAMPLE PROCESSING AND DNA EXTRACTION

The samples are size-fractionated through three sieves with mesh sizes of 10 mm, 1 mm and 63 µm, in which they are filtered and washed with water. The fractions are then homogenized with a blender and stored in ethanol until DNA extraction.
In the washing and filtering process, particles smaller than 63 µm are not retained. Thus, extra-organismic DNA and DNA from small organisms (including bacteria) will practically disappear from the sample.

        

Sample filtering using three sieves of different mesh sizes and final samples prior to sequencing.

GENETIC MARKERS

In order to perform metabarcoding, an appropriate genetic marker has to be chosen, in a genomic region with enough variability to distinguish between species and higher taxonomic levels. Mitochondrial, chloroplast, or ribosomal DNA fragments are ideal for this. When choosing a genetic marker, the length of the fragment and the design of the primers are two key factors for successful results. Short fragments carry less information, but they make the metabarcoding task easier and reduce error rates. The primer design for the chosen fragment depends on the objective and target of the investigation, which may call for more specific or more universal primers. Markers such as COI (Cytochrome Oxidase 1) and 18S (rDNA) are the most widely used in eukaryotic metabarcoding.

HIGH THROUGHPUT SEQUENCING AND BIOINFORMATIC ANALYSIS

Once the genetic marker has been chosen, it is amplified by PCR in each DNA sample separately. The amplified fragments also carry a short genetic tag specific to each sample, so that during the analysis each sequence can be traced back to the sample it came from.
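
As an illustration of how sample tags are read back, here is a minimal demultiplexing sketch. It assumes a single tag of fixed length at the start of each read and requires an exact match; the tag sequences, sample names and the function name are hypothetical.

```python
def demultiplex(reads, tag_to_sample, tag_length):
    """Assign each read to a sample using the tag at its 5' end.

    Returns (reads grouped by sample with the tag removed, unassigned reads).
    Assumption: tags all have the same length and must match exactly.
    """
    by_sample = {sample: [] for sample in tag_to_sample.values()}
    unassigned = []
    for read in reads:
        tag, insert = read[:tag_length], read[tag_length:]
        sample = tag_to_sample.get(tag)
        if sample is None:
            # Unknown tag: keep the read aside rather than guessing a sample.
            unassigned.append(read)
        else:
            by_sample[sample].append(insert)
    return by_sample, unassigned


# Toy example with made-up tags and reads.
tags = {"AACC": "sample_A", "GGTT": "sample_B"}
reads = ["AACCACGTACGT", "GGTTTTGACCAA", "CCCCACGTACGT"]
by_sample, unassigned = demultiplex(reads, tags, tag_length=4)
print(by_sample)
print(unassigned)
```

Reads whose tag is not recognized are set aside instead of being forced into a sample, which keeps tag errors from contaminating the per-sample results.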

There are different high-throughput sequencing technologies for metabarcoding, which vary in the number and length of the DNA sequence reads they produce. Paired-end sequencing reads a short DNA fragment from both ends, which provides two overlapping reads for each fragment that are paired during the bioinformatic analysis.
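
The pairing of two overlapping reads can be sketched as follows. This is a simplified illustration that assumes error-free reads and an exact overlap; real merging tools also use base quality scores and tolerate mismatches. The function names and the `min_overlap` default are assumptions.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def merge_pair(forward_read: str, reverse_read: str, min_overlap: int = 10):
    """Merge a read pair into one sequence, or return None if no overlap is found.

    The reverse read comes from the opposite strand, so it is first
    reverse-complemented; then the longest exact overlap between the end of
    the forward read and the start of that sequence is searched for.
    """
    reverse_rc = reverse_complement(reverse_read)
    max_overlap = min(len(forward_read), len(reverse_rc))
    for overlap in range(max_overlap, min_overlap - 1, -1):
        if forward_read[-overlap:] == reverse_rc[:overlap]:
            return forward_read + reverse_rc[overlap:]
    return None


# Toy example: a 25-base fragment read for 18 bases from each end.
fragment = "ACGTTGCAAGGCTTAACCGGTATCG"
forward_read = fragment[:18]
reverse_read = reverse_complement(fragment[-18:])
print(merge_pair(forward_read, reverse_read) == fragment)  # prints True
```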

The analysis of the data obtained from massive sequencing can be done using different bioinformatic methods, generally following these steps:
  • Quality control and reading of the genetic tags to assign each read to its original sample.
  • Grouping of the sequences that pass the quality control into MOTUs (Molecular Operational Taxonomic Units, the molecular equivalent of species).
  • Assignment of MOTUs to a taxonomic group at different levels (species, genus, family, etc).
These processes require a reliable database that integrates classical taxonomy and barcoding results. Depending on the genetic marker used, there are different databases: for the COI marker we usually use the BOLD project database (Barcode Of Life Data Systems). Errors arising at the different steps should be pruned from the final dataset. These include sequencing errors, errors caused by poor DNA template quantity or quality, and amplification of non-target organisms (bacteria, symbionts…).
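
The grouping step can be illustrated with a minimal greedy clustering sketch. The source does not specify the clustering algorithm, so this is only one simple possibility: sequences are processed from most to least abundant and each joins the first MOTU whose representative is similar enough. The 0.97 threshold, the position-by-position identity measure (which assumes equal-length sequences) and the function names are all assumptions.

```python
from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of identical positions; sequences of different length score 0.

    Assumption: reads are already trimmed to the same length. Real tools
    align the sequences instead of comparing position by position.
    """
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_motus(reads, threshold=0.97):
    """Greedily group reads into MOTUs around abundant representative sequences."""
    motus = []  # each MOTU: {"centroid": sequence, "abundance": read count}
    for seq, count in Counter(reads).most_common():
        for motu in motus:
            if identity(seq, motu["centroid"]) >= threshold:
                motu["abundance"] += count
                break
        else:
            # No existing MOTU is similar enough: start a new one.
            motus.append({"centroid": seq, "abundance": count})
    return motus


# Toy example with made-up 10-base reads and a relaxed threshold.
reads = ["ACGTACGTAC"] * 5 + ["ACGTACGTAA"] + ["TTTTGGGGCC"] * 2
for motu in cluster_motus(reads, threshold=0.9):
    print(motu["centroid"], motu["abundance"])
```

Processing sequences by decreasing abundance means that rare variants, which are more likely to be sequencing errors, tend to be absorbed into the MOTU of an abundant sequence rather than founding their own.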

Taxonomic assignment also entails some problems, usually linked to errors in the databases or to their incompleteness. This can cause some sequences to be assigned to the wrong taxon, and it usually results in identification at a coarser taxonomic level than would be ideal (for example genus or family instead of species). As the databases improve and grow, these assignments are becoming more accurate and reliable.
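
One common way in which an assignment ends up at a coarser level can be sketched as follows: when a sequence matches several reference species equally well, only the ranks on which all of them agree are reported. This is an illustrative approach, not necessarily the one used in the project; the function name and the rank list are assumptions.

```python
RANKS = ("phylum", "class", "order", "family", "genus", "species")


def lowest_common_taxon(lineages):
    """Return the ranks, from phylum downwards, shared by all given lineages.

    Each lineage is a dict mapping rank name to taxon name. The walk stops
    at the first rank where the lineages disagree or information is missing.
    """
    assigned = {}
    for rank in RANKS:
        names = {lineage.get(rank) for lineage in lineages}
        if len(names) != 1 or None in names:
            break
        assigned[rank] = names.pop()
    return assigned


# Toy example: a MOTU matching two Caulerpa species equally well.
best_hits = [
    {"phylum": "Chlorophyta", "class": "Ulvophyceae", "order": "Bryopsidales",
     "family": "Caulerpaceae", "genus": "Caulerpa", "species": "Caulerpa cylindracea"},
    {"phylum": "Chlorophyta", "class": "Ulvophyceae", "order": "Bryopsidales",
     "family": "Caulerpaceae", "genus": "Caulerpa", "species": "Caulerpa prolifera"},
]
print(lowest_common_taxon(best_hits))
```

Here the MOTU would be reported as Caulerpa at genus level, with no species name, because the two best matches disagree on the species.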

To complete the final data matrix, a series of corrections and calibrations is applied, based on control samples and normalization of the results, together with manual removal of inconsistent results (usually contaminant sequences of human origin or other false positives).


RESULTS

The basic result of metabarcoding is a table of all the MOTUs (molecular units equivalent to species) detected in each sample, with their abundance expressed as a number of sequences. Taxonomic groups are then assigned based on the degree of similarity with the closest sequences in the database. These tables indicate the presence and abundance of the taxonomic group assigned to each MOTU. The tables are then analysed using multivariate techniques common to all community studies.
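
As a small illustration of how such a table feeds into community analysis, the sketch below converts read counts to relative abundances and compares two samples with the Bray-Curtis dissimilarity, a measure widely used in community ecology. It is chosen here only as an example and is not necessarily the analysis used in the project; the MOTU names and counts are made up.

```python
def relative_abundance(counts):
    """Convert read counts per MOTU into proportions of the sample total.

    Assumption: the sample contains at least one read.
    """
    total = sum(counts.values())
    return {motu: n / total for motu, n in counts.items()}


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: 0 for identical samples, 1 for no shared MOTUs."""
    motus = set(a) | set(b)
    numerator = sum(abs(a.get(m, 0) - b.get(m, 0)) for m in motus)
    denominator = sum(a.get(m, 0) + b.get(m, 0) for m in motus)
    return numerator / denominator


# Toy MOTU table: reads per MOTU in two samples (made-up numbers).
sample_1 = relative_abundance({"MOTU_1": 50, "MOTU_2": 50})
sample_2 = relative_abundance({"MOTU_1": 50, "MOTU_3": 50})
print(bray_curtis(sample_1, sample_2))  # prints 0.5
```

Working with relative abundances rather than raw counts keeps samples with different sequencing depths comparable, which is one simple form of the normalization step.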

The genetic techniques are usually accompanied by ecological and environmental studies to get more comprehensive and integrated results.


FUTURE OF METABARCODING

Genetic monitoring of marine communities using metabarcoding is a fast-growing field, with an increasing number of published studies. These techniques complement traditional taxonomic and ecological studies with a molecular perspective, and are becoming a very useful and effective tool for biodiversity assessment.

Molecular monitoring usually identifies a greater number of species and taxa in the studied communities than morphological monitoring, in which many organisms would go unnoticed, such as small-sized species, endosymbionts or other types of meiofauna that could play key roles in the ecosystem. As no taxonomic expertise is required, these molecular identification methods add a degree of objectivity to the assignment of species, and they also provide information on inter- and intraspecific variability between organisms.

With the increase in barcoding efforts, the reference databases are growing and improving. In the near future this will allow the identification of species and sequences that cannot presently be assigned to any taxon because of a lack of data.

Improvements in sampling and sample processing methods, in DNA sequencing and amplification technology, and in bioinformatic processes, together with the increase in the number of studies in this field, will likely revolutionize biodiversity studies by making them faster, simpler and more effective.


BIGPARK PROJECT AND METABARCODING

In this project we continue the genetic monitoring of hard bottom marine communities using metabarcoding that was initiated in previous projects (Metabarpark 2014), focusing on communities affected by invasive algae in the Spanish Marine National Parks of Cabrera Archipelago and, in the Atlantic Islands of Galicia National Park, the Cíes Islands.

Until recently there were no studies of complex hard bottom marine communities using metabarcoding. In the previous project (METABARPARK), the technique and methodology to be applied to these communities were developed and tested, and monitoring of representative communities of the two National Parks was started. With the BigPark project, the monitoring and study of these communities is extended to obtain a time series of more than 6 years (2014-2021) documenting the evolution of the target communities. More recently, a genetic analysis of ichthyofauna from water samples has been added to the project.

In both parks we are studying two types of communities: shallow rocky bottoms occupied by photophilic algae and circalittoral bottoms occupied by sciaphilic algae. In the Cíes Islands, Maerl bottoms are also being monitored. In these different bottoms we are monitoring communities with and without invasive algae, including Caulerpa cylindracea, Lophocladia lallemandii and Asparagopsis armata, among others.

  
Communities invaded by Caulerpa cylindracea (PN Cabrera), Lophocladia lallemandii (PN Cabrera) and Asparagopsis armata (PN Islas Atlánticas)

Sampling campaigns have already been carried out in both Parks to collect community samples and preserve them for later treatment in the laboratory at CEAB (Center for Advanced Studies of Blanes), where the environmental DNA is extracted from the samples and amplified. It is then sent to the Tromso University in Norway, where the sequencing is done.

The results of these genetic analyses, combined with ecological and environmental studies, will enable a complete and integrated monitoring of these communities and ecosystems.