Quickstart

Warning

This page is under construction!

Install McCoy

If you haven’t done so yet, install McCoy and Mamba following the installation instructions.
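As a rough sketch only (the package name and exact commands below are assumptions on our part; the installation instructions page is authoritative), a Mamba-based setup might look like:

mamba create -n mccoy python
mamba activate mccoy
pip install mccoy    # assumed package name; see the installation instructions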

Create a project

A new project can be created using:

mccoy create <project_name> \
    --reference <reference_fasta_file> \
    --template <beast2_template_file>

Where <project_name>, <reference_fasta_file>, and <beast2_template_file> are replaced appropriately.
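For example, with a hypothetical project name and input files:

mccoy create h1n1_project \
    --reference h1n1_reference.fasta \
    --template h1n1_template.xml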

A new directory will be created with the following contents:

├── config.yaml            # <- A configuration file, used to tweak the
│                          #    parameters of each step in the pipeline
├── resources
│   ├── reference.fasta    # <- Copy of the reference genome
│   └── template.xml       # <- Copy of the Beast2 template
├── runs                   # <- An empty directory where runs will be stored
└── tests.py               # <- Bare-bones quality control test suite

Configure your project

Due to the large diversity in possible phylodynamic analyses, you will almost certainly need to customise the McCoy project to fit your needs. McCoy has been designed so that as much of that customisation as possible can be made using the project config.yaml or command line arguments.

The default config file looks something like this:

Config options applied to all workflow rules
all:
  threads_max: 64                 # If a rule does not have `threads` specified below then **never** use more than this for a single rule, even if more cores are available.
  update_default_resources: null  # Additional default-resources to extend those defined in the profile being used
 
align:                            # Options for the alignment rule
  mafft:                          # Command line options passed directly to mafft
    - "--6merpair"
    - "--keeplength"
    - "--addfragments"
  threads: 4                      # Use up to this many threads
  resources:                      # Only used for submission to an HPC queue
    runtime: 10                   # minutes
    mem_mb: 8000

tree:                             # Options for the tree rule
  iqtree2:                        # Command line options passed directly to iqtree2
    - "-nt AUTO"
    - "-m HKY+G"
    - "-bb 1000"
    - "-seed 847202098"
  threads: 10                     # Use up to this many threads
  resources:                      # Only used for submission to an HPC queue
    runtime: 20                   # minutes
    mem_mb: 16000

beast:                            # Options for the beast rule
  beast:                          # Command line options passed directly to beast
    - "-beagle_SSE"               # This will use the SSE extensions in BEAGLE if available
    - "-seed 98202873"
  dynamic:                        # Values for the dynamic template parameters
    - "ePopSize=0.3"
    - "Date.dateFormat=yyyy/MM/dd"
    - "Date.delimiter=_"
    - "mutationRate=1.0"
    - "mcmc.chainLength=10000"
    - "treelog.logEvery=1000"
    - "tracelog.logEvery=1000"
  threads: 6                      # Use up to this many threads
  resources:                      # Only used for submission to an HPC queue
    runtime: 1200                 # minutes
    mem_mb: 16000

Let’s break down some of the key sections…

The first block is called all and applies to all rules in the workflow. threads_max sets the absolute maximum number of threads/cores used by any single rule on a single machine, regardless of how many are available. If you are running on an HPC system and want to make use of nodes with more than 64 cores, you may want to raise this value.
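For example, to allow individual rules to use up to 128 cores on larger nodes (the value here is purely illustrative):

all:
  threads_max: 128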

Warning

When discussing “threads” and “cores”, we stay as close to the Snakemake definitions as possible. However, those definitions can be confusing, and the meaning of “thread” changes depending on the context (e.g. running on a cluster versus locally). If in doubt, we recommend treating a “thread” as a physical CPU core unless you have a specific reason not to and know what you are doing. This is the default definition in McCoy.

When running on an HPC cluster, you can also use update_default_resources to set defaults for the requested resources of all of your jobs. This is very useful to, for example, set the default account to which your resource usage will be charged:

update_default_resources:
  - account='proj00577'

After this, we have one block for each rule of the workflow with config options: align, tree, and beast. See the documentation for each of these rules for more information.

One key parameter which you will see repeated is threads. When running locally, this corresponds to the maximum number of CPU cores allocated (remembering that we will never exceed all.threads_max). If more cores are requested than the machine provides, the value is automatically reduced to the total number available. When running on an HPC cluster, McCoy uses threads to set the SLURM cpus-per-task value [1]. For more information about running McCoy on HPC systems, check out the HPC page.
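For example, to allow the tree rule to use more cores per job (the value shown is illustrative):

tree:
  threads: 32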

Run the McCoy workflow

Now that your project is configured, it’s time to run the McCoy workflow:

mccoy run <project_dir> --data <fasta_file>

All results will be stored in <project_dir>/runs/run_1.

To see all of the available options for the mccoy run command use:

mccoy run --help

A few notable options include:

  • --inherit & --inherit-last

    These will be discussed in the following section.

  • --config

    This allows overriding the values set in the config.yaml file. For example --config='all.update_default_resources=["account=proj00577"]'. A further example is given after this list.

  • --hpc

    Run McCoy by submitting jobs to an HPC SLURM scheduler. See the Running on HPC page for more info.
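Assuming the dotted-key syntax shown above also applies to per-rule scalar options (an assumption; check mccoy run --help to confirm), a --config override of the tree rule's thread count might look like:

mccoy run <project_dir> --data <fasta_file> --config='tree.threads=8'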

Important

Any options or arguments not listed in mccoy run --help are forwarded on to Snakemake. This provides power-users with the ability to fully tailor how the workflow runs.
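For instance, Snakemake's own --dry-run flag (a Snakemake option rather than a McCoy one, assuming it is forwarded unchanged) lets you preview the jobs that would run without executing them:

mccoy run <project_dir> --data <fasta_file> --dry-run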

Check the results

Upon successful completion of a run, McCoy will generate an HTML report with a number of results and diagnostics from each stage of the workflow.

[Screenshot: an example McCoy run report]

Updating a run with new data

One of the key features of McCoy is the ability to add new sequences and continue from the results of a previous run, without starting from scratch. This is achieved using online-BEAST.

To inherit the results from the last run of McCoy, simply use:

mccoy run <project_dir> --data <new_sequences_fasta> --inherit-last

To inherit from a different run:

mccoy run <project_dir> --data <new_sequences_fasta> --inherit <project_dir>/runs/run_<N>
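For example, to inherit from a hypothetical third run:

mccoy run <project_dir> --data <new_sequences_fasta> --inherit <project_dir>/runs/run_3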