M.Sc. Bioinformatics

Typical workflow for finding mutations in patient DNA (Genomics)

In practical courses and in your later work you will very likely end up working with some kind of sequencing data. Whether in transcriptomics, sequence analysis, building networks of gene interactions, finding genes associated with diseases, or one of many other topics, everything starts with genomics. At the moment the most popular sequencing method for measuring RNA expression is RNA-seq; when looking for interactions of DNA with DNA or with proteins, the usual approach is immunoprecipitation followed by sequencing (e.g. ChIP-seq). In this exercise we want to find genomic variations in people with a certain disease compared to healthy people.

Sort the following steps of the workflow into the right order.

Whole Genome Mapping

Raw Data Generation

Raw Data Analysis

Variant Annotation

Variant Calling

Usually wet-lab biologists generate the raw data, but we should still know how it was done. In the raw data analysis we check the quality and compute some general statistics. Next we map our reads to the genome and compare the genomic regions of disease samples with a healthy reference to identify variants that are possible causes of the disease. Finally we annotate these variants: do they overlap an already known gene or regulatory region, and what is the function of that region? Thus, the right order of working steps is as follows:

  • Raw Data Generation
  • Raw Data Analysis
  • Whole Genome Mapping
  • Variant Calling
  • Variant Annotation
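The annotation question raised above (does a variant overlap an already known gene or regulatory region?) can be sketched in a few lines of Python. This is only a minimal illustration, not how a production annotation tool works: the `Region` type, the `annotate` function, and all coordinates and names are hypothetical, and the regions are assumed to use 1-based inclusive coordinates.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """A known annotated region (assumed 1-based, inclusive coordinates)."""
    chrom: str
    start: int
    end: int
    name: str


def annotate(chrom: str, pos: int, regions: list[Region]) -> list[str]:
    """Return the names of all regions that contain the variant position."""
    return [r.name for r in regions
            if r.chrom == chrom and r.start <= pos <= r.end]


# Hypothetical annotation: coordinates and names are made up for illustration.
regions = [
    Region("chr1", 100, 500, "geneA"),
    Region("chr1", 450, 480, "enhancer1"),
    Region("chr2", 100, 500, "geneB"),
]

print(annotate("chr1", 460, regions))  # ['geneA', 'enhancer1']
print(annotate("chr1", 900, regions))  # []
```

A linear scan like this is fine for a handful of regions; for genome-wide annotation files, sorted intervals or an interval tree would be the more realistic choice.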

When working with sequencing data we should know what it can be used for, which biological components are used in the experiments, how the experiments are conducted, and thus which mistakes, artefacts or biases can occur. Raw data should always be checked for quality, and an appropriate normalization should be performed. If possible, pool replicates or compare between them.
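As a small illustration of such a quality check, the following Python sketch computes the mean base quality of each read in a FASTQ file. It rests on two assumptions that should be verified for real data: every record spans exactly four lines, and the quality string uses the Phred+33 encoding. The function name `mean_read_qualities` and the example reads are made up for this sketch.

```python
import io


def mean_read_qualities(handle):
    """Yield (read_id, mean Phred score) for each 4-line FASTQ record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break                              # end of file
        handle.readline()                      # sequence line (not needed here)
        handle.readline()                      # '+' separator line
        qual = handle.readline().rstrip("\n")
        scores = [ord(c) - 33 for c in qual]   # assumes Phred+33 encoding
        yield header[1:], sum(scores) / len(scores)


# Two made-up reads: 'I' encodes Phred 40, '!' encodes Phred 0.
example = "@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n!!II\n"
results = list(mean_read_qualities(io.StringIO(example)))
print(results)  # [('read1', 40.0), ('read2', 20.0)]
```

Reads with a low mean quality could then be filtered out or trimmed before mapping; which threshold is appropriate depends on the experiment.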

Familiarize yourself with existing programs for processing and evaluating data, as well as the file formats that have been developed along with them. Two collections of programs worth knowing are samtools and bedtools.
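To get a feeling for one of these file formats, here is a minimal Python sketch that reads a single alignment line in SAM format. A SAM alignment line is tab-separated and starts with eleven mandatory fields (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL); the sketch keeps only the first six. The names `SamRecord`, `parse_sam_line` and `is_unmapped` and the example line are hypothetical; in practice you would rather rely on samtools or an established library than on a hand-written parser.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    """The first six mandatory fields of a SAM alignment line."""
    qname: str   # read name
    flag: int    # bitwise flag
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # alignment description


def parse_sam_line(line: str) -> SamRecord:
    """Parse one alignment line (not a header line starting with '@')."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs 11 mandatory fields")
    return SamRecord(fields[0], int(fields[1]), fields[2],
                     int(fields[3]), int(fields[4]), fields[5])


def is_unmapped(rec: SamRecord) -> bool:
    """Flag bit 0x4 marks a read that could not be mapped."""
    return bool(rec.flag & 0x4)


# Made-up alignment line for illustration.
line = "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII"
rec = parse_sam_line(line)
print(rec.rname, rec.pos, is_unmapped(rec))  # chr1 100 False
```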

Associate the following input types with each of the processes.

  • Raw Data Generation: samples with certain conditions to be compared
  • Raw Data Analysis: single nucleotide information
  • Whole Genome Mapping: files containing the sequences of the reads with quality scores
  • Variant Calling: Sequence Alignment Map, positions in the genome with quality information
  • Variant Annotation: Variant Call Format, contains positions in the genome with possible genotype information

Input types:

  • Prepared samples
  • Intensity files
  • SAM files
  • Fastq files
  • VCF files
