
Chapter 13 DNA Methylation Sequencing Analysis Methods
13.1 Learning Objectives
13.2 What are the goals of analyzing DNA methylation?
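To detect methylated cytosines (5mC), DNA samples are prepared using bisulfite (BS) conversion. Bisulfite converts unmethylated cytosines into uracils, which are read as thymines after PCR amplification, and leaves methylated cytosines untouched. On methylation arrays, probes are then designed to bind to either the uracil or the cytosine, representing the unmethylated and methylated cytosines respectively. In sequencing data, the same distinction appears directly in the reads: at a reference cytosine, a C in the read indicates a methylated cytosine and a T indicates an unmethylated one.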
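For a given sample, you will obtain a fraction at each cytosine, known as the Beta value, that indicates the relative abundance of the methylated and unmethylated versions of the sequence. For sequencing data, this is the number of reads showing a methylated cytosine divided by the total number of reads covering that position. Beta values therefore fall on a scale from 0 to 1, where 0 indicates that none of the copies of this particular cytosine in the sample are methylated and 1 indicates that all of them are.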
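As a minimal sketch, assuming you already have the count of reads reporting a methylated and an unmethylated cytosine at one position, the Beta value is a simple ratio. The function name and the choice to return `None` when no reads cover the site are illustrative assumptions, not the API of any particular tool.

```python
from typing import Optional


def beta_value(methylated_reads: int, unmethylated_reads: int) -> Optional[float]:
    """Fraction of reads at one cytosine that report it as methylated.

    Returns None when the site has no coverage, since a ratio of 0/0 is undefined.
    """
    total = methylated_reads + unmethylated_reads
    if total == 0:
        return None
    return methylated_reads / total


# 15 reads show C (methylated) and 5 show T (unmethylated) at this cytosine
print(beta_value(15, 5))  # 0.75
```

Because the denominator is the read coverage, two sites with the same Beta value can carry very different amounts of evidence (for example 3/4 versus 300/400), which is one reason the count-based models discussed below are preferred over treating Beta values as plain numbers.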
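Note that bisulfite conversion alone will not distinguish between 5mC and 5-hydroxymethylated cytosines (5hmC), even though these two marks may reflect different biological mechanisms.
5hmC can be separated from 5mC using oxidative bisulfite sequencing (oxBS) (Booth et al. 2013). Standard bisulfite conversion reads both 5mC and 5hmC as cytosine, so BS data measure the two marks combined. In oxBS, 5hmC is first oxidized so that it is subsequently converted by bisulfite and read as thymine, which means oxBS data measure 5mC only. If you want to identify 5hmC bases, you therefore either have to pair oxBS data with BS data from the same sample (5hmC is estimated as the BS signal minus the oxBS signal) OR use Tet-assisted bisulfite (TAB) sequencing, in which only 5hmC bases are read as cytosine (Yu et al. 2012).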
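The sketch below summarizes what each chemistry reports at a reference cytosine for each true modification state, and the simplest way to combine paired BS and oxBS Beta values. The dictionary layout and function name are illustrative assumptions. Clipping negative differences at 0 is a crude fix for sampling noise (the oxBS estimate can exceed the BS estimate at low coverage), not a recommended statistical method.

```python
# Base observed in the read at a reference cytosine, by true modification state.
READOUT = {
    "BS": {"C": "T", "5mC": "C", "5hmC": "C"},    # standard bisulfite: C = 5mC + 5hmC
    "oxBS": {"C": "T", "5mC": "C", "5hmC": "T"},  # 5hmC oxidized, then converted: C = 5mC
    "TAB": {"C": "T", "5mC": "T", "5hmC": "C"},   # only 5hmC protected: C = 5hmC
}


def states_read_as_c(protocol: str) -> list:
    """Modification states that a given protocol reports as cytosine."""
    return [state for state, base in READOUT[protocol].items() if base == "C"]


def naive_hmc_estimate(bs_beta: float, oxbs_beta: float) -> float:
    """5hmC ~ (5mC + 5hmC, from BS) - (5mC, from oxBS), clipped at zero."""
    return max(0.0, bs_beta - oxbs_beta)


print(states_read_as_c("BS"))    # ['5mC', '5hmC']
print(states_read_as_c("oxBS"))  # ['5mC']
print(states_read_as_c("TAB"))   # ['5hmC']
```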
13.3 Methylation data considerations
13.3.1 Beta values binomially distributed
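Because Beta values are proportions bounded between 0 and 1, they are not normally distributed and should be modeled accordingly. More precisely, at a given cytosine the number of methylated reads out of the total reads covering it follows a binomial distribution. This means data models that assume normally distributed data, like the linear models used by the limma package (built for gene expression data), should not be applied directly to Beta values.
Instead, the analysis generally involves applying a generalized linear model, such as a binomial (logistic) regression on the methylated and total read counts.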
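To make the generalized linear model idea concrete, here is a minimal sketch for a single cytosine and two groups. For a binomial GLM with a logit link and one two-level group covariate, the maximum-likelihood fitted proportions are simply each group's pooled methylated/total ratio, so the likelihood ratio test against an intercept-only model can be computed in closed form without a fitting library. The function names and input layout are assumptions for illustration. This test pools replicates and ignores extra-binomial (biological) variation between them, so its p-values will tend to be too small on real data; methods that model overdispersion, such as beta-binomial models, address this.

```python
import math


def binomial_loglik(methylated, total, p):
    """Binomial log-likelihood of counts at proportion p (constant log-choose term dropped)."""
    ll = 0.0
    for m, n in zip(methylated, total):
        if m > 0:  # skip 0 * log(0) terms when p is exactly 0 or 1
            ll += m * math.log(p)
        if n - m > 0:
            ll += (n - m) * math.log(1.0 - p)
    return ll


def two_group_binomial_lrt(meth_a, total_a, meth_b, total_b):
    """Likelihood ratio test of 'group A and B differ' for one cytosine.

    Full model: logit(p) = b0 + b1 * group  -> fitted p is each group's pooled ratio.
    Null model: logit(p) = b0               -> fitted p is the overall pooled ratio.
    Returns (methylation difference A - B, LRT statistic, p-value on 1 df).
    """
    p_a = sum(meth_a) / sum(total_a)
    p_b = sum(meth_b) / sum(total_b)
    all_meth = list(meth_a) + list(meth_b)
    all_total = list(total_a) + list(total_b)
    p_pooled = sum(all_meth) / sum(all_total)

    ll_full = binomial_loglik(meth_a, total_a, p_a) + binomial_loglik(meth_b, total_b, p_b)
    ll_null = binomial_loglik(all_meth, all_total, p_pooled)
    stat = max(0.0, 2.0 * (ll_full - ll_null))  # guard tiny negative rounding error

    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return p_a - p_b, stat, p_value


# Three replicates per group: methylated reads and total reads at one CpG
diff, stat, p = two_group_binomial_lrt([18, 20, 17], [20, 22, 20], [5, 6, 4], [20, 21, 19])
print(round(diff, 3), p < 0.001)
```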
13.3.2 Measuring 5mC and/or 5hmC
If your questions involve both 5mC and 5hmC, you will have separate sequencing datasets for each sample: one from BS-processed DNA and one from oxBS-processed DNA. Because 5hmC is produced by the oxidation of 5mC, the 5mC and 5hmC measurements are, by nature, not independent of each other. In theory, the proportions of 5mC, 5hmC, and unmethylated cytosine at a given site should add up to 1.
Because of this, it has been proposed that the most appropriate way to model these data is to combine them in a single model (Kochmanski, Savonen, and Bernstein 2019).
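As an illustration of why estimating the two marks jointly helps, the sketch below computes the three proportions at one cytosine by maximum likelihood from paired BS and oxBS read counts, subject to the constraint that all three are non-negative and sum to 1. This is a simple constrained estimator swapped in for illustration, not necessarily the model proposed by Kochmanski, Savonen, and Bernstein (2019). The function name and arguments are hypothetical. When the oxBS ratio exceeds the BS ratio (which would imply a negative 5hmC level), the constrained optimum sets 5hmC to 0 and pools both datasets to estimate 5mC.

```python
def joint_mc_hmc_estimate(bs_c, bs_t, ox_c, ox_t):
    """Constrained ML estimate of (5mC, 5hmC, unmethylated) at one cytosine.

    bs_c / bs_t: reads showing C / T in standard BS data (C = 5mC + 5hmC)
    ox_c / ox_t: reads showing C / T in oxBS data (C = 5mC only)
    """
    bs_n, ox_n = bs_c + bs_t, ox_c + ox_t
    if bs_n == 0 or ox_n == 0:
        raise ValueError("need coverage in both BS and oxBS data")

    s = bs_c / bs_n  # unconstrained estimate of 5mC + 5hmC
    m = ox_c / ox_n  # unconstrained estimate of 5mC
    if m > s:
        # 5hmC would be negative: the optimum lies on the boundary 5mC = 5mC + 5hmC,
        # where both datasets estimate the same proportion, so pool their counts.
        m = s = (bs_c + ox_c) / (bs_n + ox_n)
    return m, s - m, 1.0 - s


print(joint_mc_hmc_estimate(30, 10, 20, 20))  # (0.5, 0.25, 0.25)
print(joint_mc_hmc_estimate(10, 30, 20, 20))  # (0.375, 0.0, 0.625)
```

Unlike subtracting two independently estimated Beta values, this never produces a negative 5hmC level, and the three returned proportions always sum to 1.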
13.4 Methylation data workflow
As with other sequencing methods, you will first need to run quality control checks. Next, you will need to align your sequences to the genome. Then, using the base calls, you will need to make methylation calls: deciding which cytosines are methylated and which are not. The details of this step depend on whether you are measuring 5mC, 5hmC, or both. Lastly, you will likely want to use your methylation calls as a whole to identify differentially methylated regions of interest.
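The methylation-calling step can be sketched as counting, at each reference CpG, how many aligned reads show a C (methylated) versus a T (unmethylated). This is a toy version under strong simplifying assumptions: reads are given in a hypothetical `(start, sequence)` format, aligned to the forward strand only, with 0-based starts and no insertions or deletions, and base qualities, other sequence contexts (CHG, CHH), and the reverse strand are ignored. It also skips the alignment problem entirely; real bisulfite aligners have to tolerate the C-to-T mismatches that conversion introduces.

```python
def call_cpg_methylation(reference, aligned_reads):
    """Count methylated (C) and unmethylated (T) read bases at each forward-strand CpG.

    reference: reference sequence as an uppercase string
    aligned_reads: list of (start, read_sequence) tuples, 0-based and gap-free
    Returns {cpg_position: {"methylated": int, "unmethylated": int}}.
    """
    cpg_positions = [i for i in range(len(reference) - 1) if reference[i:i + 2] == "CG"]
    counts = {pos: {"methylated": 0, "unmethylated": 0} for pos in cpg_positions}

    for start, seq in aligned_reads:
        for pos in cpg_positions:
            offset = pos - start
            if 0 <= offset < len(seq):  # this read covers the CpG cytosine
                base = seq[offset]
                if base == "C":  # protected from conversion -> methylated
                    counts[pos]["methylated"] += 1
                elif base == "T":  # converted by bisulfite -> unmethylated
                    counts[pos]["unmethylated"] += 1
                # any other base (sequencing error, SNP) is ignored
    return counts


reference = "ACGTTCGA"  # CpGs start at positions 1 and 5
reads = [(0, "ACGTTTGA"), (0, "ATGTTCGA"), (3, "TTCGA")]
print(call_cpg_methylation(reference, reads))
```

Dividing each site's methylated count by its total count gives the per-site Beta value.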
13.5 Methylation tools!
13.5.1 Quality control
TODO: How should this be the same or different from general sequencing quality control checks? - FastQC
13.5.2 Genome Alignment
TODO: How should this be the same or different from general sequencing alignment?
13.5.3 Methylation calls
TODO: What other packages/tools should be mentioned here?
13.5.4 Find regions of interest!
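Once each CpG has a methylation difference and a p-value (for example from a per-site test like the binomial model described earlier), a simple way to turn single sites into regions is to merge nearby significant CpGs that change in the same direction. The sketch below is a generic heuristic chosen for illustration, not the algorithm of any specific DMR tool, and every threshold (`max_gap`, `min_cpgs`, `min_diff`, `alpha`) is a hypothetical default you would need to tune. Dedicated tools typically add smoothing across neighboring sites, multiple-testing correction, and region-level statistics.

```python
def merge_into_regions(sites, max_gap=100, min_cpgs=3, min_diff=0.2, alpha=0.05):
    """Merge neighboring significant CpGs into candidate differentially methylated regions.

    sites: list of (position, methylation_difference, p_value), sorted by position,
           all on one chromosome.
    Returns a list of (start, end, n_cpgs, mean_difference).
    """

    def summarize(run):
        diffs = [d for _, d, _ in run]
        return run[0][0], run[-1][0], len(run), sum(diffs) / len(diffs)

    regions, current = [], []
    for pos, diff, p in sites:
        significant = p < alpha and abs(diff) >= min_diff
        if significant and current:
            last_pos, last_diff, _ = current[-1]
            same_direction = (diff > 0) == (last_diff > 0)
            if pos - last_pos <= max_gap and same_direction:
                current.append((pos, diff, p))  # extend the current run
                continue
        # This site does not extend the run: close it and maybe start a new one.
        if len(current) >= min_cpgs:
            regions.append(summarize(current))
        current = [(pos, diff, p)] if significant else []

    if len(current) >= min_cpgs:
        regions.append(summarize(current))
    return regions


sites = [(100, 0.30, 0.01), (150, 0.35, 0.001), (180, 0.40, 0.02),
         (500, 0.30, 0.01), (520, -0.30, 0.01), (600, 0.05, 0.5)]
print(merge_into_regions(sites))  # one region spanning 100-180 with 3 CpGs, mean diff ~0.35
```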
13.5.5 Annotation of regions of interest
TODO: How does this differ from annotating genomic regions in general?
13.6 More resources
- DNA methylation analysis with Galaxy tutorial
- The mint pipeline for analyzing methylation and hydroxymethylation data.
- Book chapter about finding methylation regions of interest