Alignment module

This module contains rules for carrying out sequence alignment, calculating summary statistics of the results, and generating plots.

Rules

Rule align

Use MAFFT to align the combined sequence file against the project reference.

Input:
  • original – the combined sequence file generated by the combine rule

  • reference – the project reference sequence, provided during McCoy project creation

Config:
  • align.mafft – a list of command-line arguments passed directly to MAFFT

    default: ['--6merpair', '--keeplength', '--addfragments']

  • align.threads – the number of threads (cores) to use for a single MAFFT call

    default: 4

  • align.resources – the resources to request when submitting to a cluster

    default: {'runtime': 10, 'mem_mb': 8000}
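As an illustration of how the defaults listed above might combine with a project's own settings, here is a minimal Python sketch. The ALIGN_DEFAULTS dictionary mirrors the documented defaults; the align_setting helper and the project_config contents are hypothetical and are not part of McCoy, and whether McCoy resolves overrides in exactly this way is an assumption.

```python
# Documented defaults for the align rule (values taken from the list above).
ALIGN_DEFAULTS = {
    "mafft": ["--6merpair", "--keeplength", "--addfragments"],
    "threads": 4,
    "resources": {"runtime": 10, "mem_mb": 8000},
}


def align_setting(config, key):
    """Hypothetical helper: return config['align'][key] if set, else the default."""
    return config.get("align", {}).get(key, ALIGN_DEFAULTS[key])


# Hypothetical project config that overrides only the thread count.
project_config = {"align": {"threads": 8}}

print(align_setting(project_config, "threads"))
print(align_setting(project_config, "mafft"))
print(align_setting(project_config, "resources"))
```

Keys that the project config leaves unset fall back to the documented defaults in this sketch.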

Output:

the aligned version of the original input file

Params:

the command-line arguments passed to MAFFT (set in the align.mafft config entry)
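To make the role of these arguments concrete, the following Python sketch assembles a MAFFT command line from the default align.mafft list. It assumes the original file directly follows --addfragments and the reference comes last, which matches MAFFT's documented --addfragments usage; the exact command the rule runs is not shown here, and the build_mafft_command function and the file names are hypothetical.

```python
import shlex


def build_mafft_command(mafft_args, threads, original, reference):
    """Hypothetical helper: assemble a MAFFT argument list.

    With '--addfragments' as the last entry of mafft_args, the next
    positional argument (original) is the fragment file, and the final
    argument (reference) is the sequence to align against.
    """
    return ["mafft", "--thread", str(threads), *mafft_args, original, reference]


cmd = build_mafft_command(
    ["--6merpair", "--keeplength", "--addfragments"],  # documented default
    4,                                                 # documented default threads
    "combined.fasta",                                  # hypothetical file name
    "reference.fasta",                                 # hypothetical file name
)
print(shlex.join(cmd))
# prints: mafft --thread 4 --6merpair --keeplength --addfragments combined.fasta reference.fasta
```

MAFFT writes the alignment to standard output, so a real invocation would redirect that output to the aligned file.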

Threads:

set to align.threads from the config file if present, else set by the number of cores available to the workflow (up to threads_max)
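The fallback described above can be sketched in Python as follows. This is one reading of the rule, assuming that threads_max caps only the fallback value and not an explicit align.threads setting; the resolve_threads function and its parameter names are hypothetical.

```python
def resolve_threads(config, workflow_cores, threads_max):
    """Hypothetical helper: pick the thread count for the align rule.

    Uses align.threads when the config provides it; otherwise falls back
    to the cores available to the workflow, capped at threads_max.
    """
    configured = config.get("align", {}).get("threads")
    if configured is not None:
        return configured
    return min(workflow_cores, threads_max)


print(resolve_threads({"align": {"threads": 4}}, workflow_cores=16, threads_max=8))  # 4
print(resolve_threads({}, workflow_cores=16, threads_max=8))                         # 8
print(resolve_threads({}, workflow_cores=2, threads_max=8))                          # 2
```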

Resources:

set to align.resources in the project config, if present

Conda:
channels:
  - bioconda
  - conda-forge
dependencies:
  - mafft==7.471
  - seqkit==2.1.0

Rule alignment_stats
Conda:
channels:
  - jlsteenwyk
dependencies:
  - phykit
  - jlsteenwyk-biokit

Rule pairwise_identity_histogram
Conda:
channels:
  - conda-forge
dependencies:
  - python=3.9
  - numpy
  - typer
  - pandas
  - plotly
  - pip
  - pip:
    - kaleido