Quick Start

The following sections describe how to create a schema from scratch and how to perform allele calling to determine the allelic profiles of a set of strains.

Create a schema

Option 1 - Genome assemblies

Place all genome assemblies (complete or draft, in FASTA format) in a single directory and adapt the following template command. Alternatively, you can provide a text file that lists the full paths to the input files, one path per line:

chewBBACA.py CreateSchema -i InputAssembliesFolder -o OutputSchemaFolder --ptf ProdigalTrainingFile
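The alternative input mentioned above, a text file with one full path per line, can be generated with a short script. The following is a minimal sketch and not part of chewBBACA: the function name write_input_list and the set of accepted file extensions are assumptions made for illustration.

```python
from pathlib import Path

# Assumed set of FASTA file extensions; adjust it to match your files.
FASTA_EXTENSIONS = {".fasta", ".fa", ".fna"}


def write_input_list(assemblies_dir, list_file):
    """Write the absolute path of every FASTA file in assemblies_dir, one per line."""
    directory = Path(assemblies_dir).resolve()
    paths = sorted(
        p for p in directory.iterdir()
        if p.is_file() and p.suffix.lower() in FASTA_EXTENSIONS
    )
    # One full path per line, as required for the list file.
    Path(list_file).write_text("".join(f"{p}\n" for p in paths))
    return paths
```

For example, write_input_list("InputAssembliesFolder", "input_files.txt") writes the list to input_files.txt (a hypothetical file name), which would then be used as the input in place of the directory.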

Option 2 - Coding DNA Sequences

You can provide FASTA files with Coding DNA Sequences (CDSs) and skip the gene prediction step by passing the --cds parameter:

chewBBACA.py CreateSchema -i InputFilesFolder -o OutputSchemaFolder --ptf ProdigalTrainingFile --cds

Important

We recommend that you provide a Prodigal training file even when you provide FASTA files with CDSs. This will ensure that a training file is included in the schema for future use if needed.

Note

The CreateSchema module creates a schema seed with one representative allele per locus in the schema. To include more allele variants in the schema, we recommend starting by performing allele calling with the set of genome assemblies/CDSs used for schema creation.

Option 3 - Adapt an external schema

Place all loci files (one FASTA file per locus, each containing all alleles for that locus) in a single directory and adapt the following template command:

chewBBACA.py PrepExternalSchema -g ExternalSchemaFastaFiles -o OutputSchemaFolder --ptf ProdigalTrainingFile

Important

External schemas must be processed to filter out sequences that do not meet the criteria applied when creating every chewBBACA schema. This process might remove alleles or entire loci from the schema. For more information, see the page about the PrepExternalSchema module.
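Because adaptation can remove alleles or whole loci, it can be useful to compare allele counts before and after processing. The sketch below is one possible way to do that and is not a chewBBACA feature. It assumes that locus files sit at the top level of each directory, use the .fasta extension, and keep the same file names after adaptation (verify this for your schema); the function names are hypothetical.

```python
from pathlib import Path


def count_alleles(schema_dir):
    """Return {locus identifier: number of FASTA records} for top-level .fasta files."""
    counts = {}
    for fasta in sorted(Path(schema_dir).glob("*.fasta")):
        with fasta.open() as handle:
            # Each FASTA record starts with a header line beginning with ">".
            counts[fasta.stem] = sum(1 for line in handle if line.startswith(">"))
    return counts


def summarize_changes(before, after):
    """Return loci absent from `after` and loci whose allele count decreased."""
    removed_loci = sorted(set(before) - set(after))
    reduced_loci = {
        locus: (before[locus], after[locus])
        for locus in sorted(before)
        if locus in after and after[locus] < before[locus]
    }
    return removed_loci, reduced_loci
```

Calling count_alleles on ExternalSchemaFastaFiles and on the adapted schema directory, then passing both results to summarize_changes, gives the loci that disappeared and the loci that lost alleles.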

Perform allele calling

Determine the allelic profiles for genome assemblies:

chewBBACA.py AlleleCall -i InputAssembliesFolder -g OutputSchemaFolder/SchemaName -o OutputFolderName

Perform allele calling with a subset of the schema loci:

chewBBACA.py AlleleCall -i InputAssembliesFolder -g OutputSchemaFolder/SchemaName -o OutputFolderName --gl LociList.txt

Provide FASTA files with CDSs (one file per genome/strain):

chewBBACA.py AlleleCall -i InputFilesFolder -g OutputSchemaFolder/SchemaName -o OutputFolderName --cds
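When allele calling is launched from a script, the three variants above can be assembled by a single helper. This is a minimal sketch that only combines the parameters shown in the commands above (-i, -g, -o, --gl, --cds); the function name build_allele_call_command and its argument names are hypothetical.

```python
def build_allele_call_command(input_path, schema_dir, output_dir,
                              loci_list=None, cds_input=False):
    """Return the AlleleCall command as a list of arguments."""
    command = [
        "chewBBACA.py", "AlleleCall",
        "-i", str(input_path),
        "-g", str(schema_dir),
        "-o", str(output_dir),
    ]
    if loci_list is not None:
        # Restrict allele calling to the loci listed in the file.
        command += ["--gl", str(loci_list)]
    if cds_input:
        # Inputs are FASTA files with CDSs, so gene prediction is skipped.
        command.append("--cds")
    return command
```

The returned list can be passed to subprocess.run(command, check=True) from the Python standard library. Building a list rather than a single string avoids shell quoting problems with paths that contain spaces.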

Important

  • The file passed to the --gl parameter must have one full path or one locus identifier, with or without the .fasta extension, per line (the locus identifier is the basename of the FASTA file that contains the locus alleles).

  • We strongly advise users to provide a Prodigal training file and to keep using the same training file to ensure consistent results. The training file used for schema creation is added to the schema’s directory and is detected automatically at the start of allele calling, so there is no need to pass it to the --ptf parameter.

  • Use the --cpu parameter to enable parallelization and considerably reduce execution time.
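The file passed to the --gl parameter can be generated from the schema directory rather than written by hand. The sketch below is an illustration, not part of chewBBACA. It assumes the loci FASTA files sit at the top level of the schema directory and use the .fasta extension; the function name write_loci_list and the optional keep filter are hypothetical.

```python
from pathlib import Path


def write_loci_list(schema_dir, list_file, keep=None, full_paths=False):
    """Write one locus identifier (or one full path) per line.

    keep: optional callable that receives a locus identifier and returns
    True if the locus should be included in the list.
    """
    entries = []
    for fasta in sorted(Path(schema_dir).resolve().glob("*.fasta")):
        # The locus identifier is the FASTA file basename without the extension.
        locus_id = fasta.stem
        if keep is None or keep(locus_id):
            entries.append(str(fasta) if full_paths else locus_id)
    Path(list_file).write_text("".join(f"{entry}\n" for entry in entries))
    return entries
```

For example, write_loci_list("OutputSchemaFolder/SchemaName", "LociList.txt", keep=lambda locus: locus.startswith("GCF")) would list only the loci whose identifiers start with a given prefix (the prefix here is purely illustrative).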