GetAlleles - Create FASTA files containing the alleles identified by the AlleleCall module
===========================================================================================

The *GetAlleles* module creates FASTA files containing the alleles identified by the
*AlleleCall* module. Given a TSV file with the allelic profiles, this module generates a
FASTA file for each gene/locus with the nucleotide sequences and, if the ``--translate``
option is provided, a second FASTA file with the translated sequences.

If the ``--distinct`` option is provided, the module only includes distinct alleles in the
output files (this also applies to the FASTA files created when using the ``--translate``
option).

If a TXT file with a list of gene/locus identifiers is provided through the ``--genes-list``
parameter, the module only creates FASTA files for the specified genes/loci. Otherwise, the
module creates FASTA files for all genes/loci present in the input TSV file.

Basic Usage
-----------

::

    chewBBACA.py GetAlleles -i /path/to/AlleleCallResultsFolder/results_alleles.tsv -g /path/to/SchemaFolder -o /path/to/OutputFolder

Parameters
----------

::

    -i, --input-file            (Required) Path to the TSV file containing the allelic
                                profiles (default: None).

    -g, --schema-directory      (Required) Path to the schema directory (default: None).

    --gl, --genes-list          (Optional) Path to a file with the list of genes/loci to
                                create FASTA files for. The file must include the
                                identifiers of the loci, one per line, without the
                                .fasta extension (default: None).

    -o, --output-directory      (Required) Path to the output directory (default: None).

    --cpu, --cpu-cores          (Optional) Number of CPU cores/threads that will be used
                                to run the process (chewie resets to a lower value if it
                                is equal to or exceeds the total number of available CPU
                                cores/threads) (default: 1).

    --distinct                  (Optional) Only get distinct alleles (default: False).

    --translate                 (Optional) Create FASTA files with the translated alleles
                                (default: False).

    --ta, --translation-table   (Optional) Genetic code used to translate coding DNA
                                sequences (CDSs). If no value is specified, the process
                                tries to get the value stored in the schema config file.
                                If the schema does not include a config file, the process
                                uses the default translation table (11) (default: None).

Outputs
-------

::

    OutputFolderName
    ├── fastas
    │   ├── locus1.fasta
    │   ├── ...
    │   └── locusN.fasta
    ├── translated_fastas
    │   ├── locus1_protein.fasta
    │   ├── ...
    │   └── locusN_protein.fasta
    └── summary_stats.tsv

- The ``fastas`` folder contains the FASTA files for each gene/locus, named as
  ``locusID.fasta``, where *locusID* is the identifier of the locus. Each FASTA file
  includes the nucleotide sequences of the alleles identified in all strains listed in the
  input file. The header of each sequence includes the strain, locus, and allele identifier
  (e.g. ``>strainID_locusID_alleleID``). If the ``--distinct`` option is provided, only
  distinct alleles are included in the FASTA files and the header only includes the locus
  and allele identifier (e.g. ``>locusID_alleleID``).

- The ``translated_fastas`` folder, created when the ``--translate`` option is provided,
  contains the FASTA files with the translated sequences for each gene/locus, named as
  ``locusID_protein.fasta``. Each FASTA file includes the protein sequences of the alleles
  identified in all strains listed in the input file. The sequence headers follow the same
  format as the one used for the nucleotide sequences.

- The ``summary_stats.tsv`` file contains, for each locus, the number of alleles in the
  schema, the number of samples in which the locus was identified, and the number of
  distinct alleles identified in the dataset.
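
The per-locus counts described for ``summary_stats.tsv`` can be illustrated with a short Python sketch. This is not the module's implementation, only a minimal example under stated assumptions: the allelic profiles table is assumed to hold the sample identifier in the first column and one locus per remaining column, and a cell is assumed to represent an identified allele when it is an integer, optionally prefixed with ``INF-``. Any other value (``LNF`` in the example) is treated as "locus not identified". The function name ``summarize_profiles`` and the example table are hypothetical.

```python
import csv
import io
import re

# Assumption: a cell holds an identified allele when it is an integer,
# optionally prefixed with "INF-"; any other value (e.g. "LNF") is treated
# as "locus not identified" in that sample.
ALLELE_PATTERN = re.compile(r"^(?:INF-)?(\d+)$")


def summarize_profiles(handle):
    """Return {locus: (samples_with_locus, distinct_alleles)} for a profile table."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    loci = header[1:]  # first column is assumed to hold the sample identifier
    distinct = {locus: set() for locus in loci}
    samples = {locus: 0 for locus in loci}
    for row in reader:
        if not row:
            continue  # skip blank lines
        for locus, cell in zip(loci, row[1:]):
            match = ALLELE_PATTERN.match(cell.strip())
            if match:
                samples[locus] += 1
                distinct[locus].add(int(match.group(1)))
    return {locus: (samples[locus], len(distinct[locus])) for locus in loci}


# Hypothetical three-sample, two-locus table used only for illustration.
example = io.StringIO(
    "FILE\tlocus1\tlocus2\n"
    "strainA\t1\tLNF\n"
    "strainB\tINF-2\t3\n"
    "strainC\t1\t3\n"
)
stats = summarize_profiles(example)
for locus, (n_samples, n_distinct) in stats.items():
    print(locus, n_samples, n_distinct)
```

In this example ``locus1`` is identified in three samples with two distinct alleles, and ``locus2`` in two samples with one distinct allele. To apply the sketch to a real file, the ``io.StringIO`` object could be replaced with an open file handle for the TSV passed to ``-i``, provided the file matches the assumptions above.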