SyncSchema - Synchronize a schema with its remote version in Chewie-NS
The SyncSchema module allows users to synchronize local schemas, previously downloaded from Chewie-NS, with their remote versions. All chewBBACA users can synchronize schemas to get the latest alleles added to Chewie-NS and to ensure that alleles shared between the local and remote schemas keep a common identifier nomenclature. Users can also submit novel alleles that were identified locally and are not present in Chewie-NS.
Important
Only authorized users can submit new local alleles to update remote schemas, although all users can download schemas and novel alleles from the Chewie-NS public server. Please send a request to imm-bioinfo@medicina.ulisboa.pt if you wish to submit novel alleles.
To synchronize a local schema with its remote version on the Chewie-NS public server, it is only necessary to provide the path to the schema directory. The process is kept simple by a configuration file, present in all schemas downloaded from Chewie-NS, that contains the last modification date of the schema and the identifier of the schema in Chewie-NS.
Configuration file content
['2020-06-30T19:10:37.466104', 'http://chewbbaca.online/NS/api/species/9/schemas/1']
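As an illustration of what these two values carry, the sketch below unpacks them in Python. It assumes the pair is already available as a list in memory (how the file is stored on disk is not covered here), and `describe_schema_config` is a hypothetical helper name, not part of chewBBACA.

```python
from datetime import datetime
from urllib.parse import urlparse


def describe_schema_config(config):
    """Split the [last_modified, schema_uri] pair into its parts."""
    last_modified, schema_uri = config
    # The timestamp is ISO 8601 formatted, so fromisoformat can parse it.
    modified = datetime.fromisoformat(last_modified)
    # The URI path looks like /NS/api/species/<species_id>/schemas/<schema_id>.
    parts = urlparse(schema_uri).path.strip("/").split("/")
    species_id = parts[parts.index("species") + 1]
    schema_id = parts[parts.index("schemas") + 1]
    return {"modified": modified, "species_id": species_id, "schema_id": schema_id}


config = ['2020-06-30T19:10:37.466104',
          'http://chewbbaca.online/NS/api/species/9/schemas/1']
info = describe_schema_config(config)
print(info["modified"].date(), info["species_id"], info["schema_id"])
```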
Novel alleles identified locally are added to the schema with a * preceding their integer identifier to indicate that these are temporary designations. In the example below, alleles 4 to 7 were detected locally but were not present in the database that had been retrieved from the Chewie-NS public server.
Local FASTA file example
>prefix-018550_1
ATGAGCAAGCCTAATGTTGTTCAGTTAAATAATCAATATATTAACGATGAGAATCTAAAAAAACGTTACGAAGCTGAGGAGTTACGCTAA
>prefix-018550_2
ATGAGCAAGCCTAATGTTGTTCAGTTAAATAATCAATATATTAACGATGAGAATCTAAAAAAACGTTACGAAGCTGAGGAGTTACGCTAA
>prefix-018550_3
ATGAGCAAGCCTAATGTTGTTCAGTTAAATAATCAATATATTAACGATGAGAATCTAAAAAAACGTTACGAAGCTGAGGAGTTACGCTAA
>prefix-018550_*4
ATGAGCAAGCCTAATGTTGTTCAGTTAAATAATCAATATATTAACGATGAGAATCTAAAAAAACGTTACGAAGCTGAGGAGTTACGCTAA
>prefix-018550_*5
ATGAGCAAGCCTAATGTTGTTCAGTTAAATAATCAATATATTAACGATGAGAATCTAAAAAAACGTTACGAAGCTGAGGAGTTACGCTAA
>prefix-018550_*6
ATGAGCAAGCCTAATGTTGTTCAGTTAAATAATCAATATATTAACGATGAGAATCTAAAAAAACGTTACGAAGCTGAGGAGTTACGCTAA
>prefix-018550_*7
ATGAGCAAGCCTAATGTTGTTCAGTTAAATAATCAATATATTAACGATGAGAATCTAAAAAAACGTTACGAAGCTGAGGAGTTACGCTAA
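To check which alleles in a locus file still carry temporary designations, the headers can be scanned for the * marker. The following is a minimal sketch, assuming the allele identifier is the text after the last underscore in each header, as in the example above. `split_allele_ids` is a hypothetical helper, and the sequences are shortened placeholders.

```python
def split_allele_ids(fasta_text):
    """Return (permanent, temporary) allele identifiers found in FASTA headers."""
    permanent, temporary = [], []
    for line in fasta_text.splitlines():
        if not line.startswith(">"):
            continue
        # The allele identifier follows the last underscore in the header.
        allele_id = line[1:].strip().rsplit("_", 1)[-1]
        if allele_id.startswith("*"):
            # Temporary designation: strip the marker and keep the integer.
            temporary.append(int(allele_id[1:]))
        else:
            permanent.append(int(allele_id))
    return permanent, temporary


example = """>prefix-018550_1
ATGAGC
>prefix-018550_2
ATGAGC
>prefix-018550_*4
ATGAGC
"""
permanent, temporary = split_allele_ids(example)
print("permanent:", permanent, "temporary:", temporary)
```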
During synchronization, novel alleles retrieved from Chewie-NS are compared to novel local alleles and the process reassigns allele identifiers to ensure that the alleles common to local and remote schemas have the same identifiers. Local alleles that are not in Chewie-NS are shifted to the last positions in the FASTA files and keep a * in the identifier. Users who want to submit those alleles (only available to authorized users) should add the --submit flag, and the necessary data will be collected and uploaded to Chewie-NS. Chewie-NS will return the identifiers assigned to the submitted alleles and the local process will remove the * from the submitted alleles and assign the permanent identifiers. If the process retrieves new alleles from Chewie-NS, it will redetermine representative sequences ONLY for the loci in the local schema that were altered by the synchronization process.
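The identifier reassignment described above can be pictured with a small sketch. This is an illustration of the idea under simplifying assumptions, not chewBBACA's implementation: a locus is modeled as a list of (identifier, sequence) pairs, remote alleles as a dictionary, and `sync_locus` and the toy sequences are hypothetical.

```python
def sync_locus(local, remote_new):
    """Merge remote alleles into a local locus.

    local: list of (identifier, sequence); temporary identifiers start with '*'.
    remote_new: dict mapping permanent integer identifiers to sequences.
    """
    # Keep alleles that already have permanent identifiers.
    merged = {int(i): s for i, s in local if not i.startswith("*")}
    # Add alleles retrieved from the remote schema under their remote identifiers.
    merged.update(remote_new)
    known = set(merged.values())
    # Local-only alleles are temporary alleles whose sequence is not in the remote set.
    local_only = [s for i, s in local if i.startswith("*") and s not in known]
    result = [(str(i), merged[i]) for i in sorted(merged)]
    # Shift local-only alleles to the last positions, keeping the '*' marker.
    next_id = max(merged, default=0) + 1
    for offset, seq in enumerate(local_only):
        result.append((f"*{next_id + offset}", seq))
    return result


local = [("1", "AAA"), ("2", "AAC"), ("*3", "AAG"), ("*4", "AAT")]
remote_new = {3: "AAT", 4: "ACA"}
synced = sync_locus(local, remote_new)
for identifier, sequence in synced:
    print(identifier, sequence)
```

In this toy case the local allele that was temporarily labeled *4 matches remote allele 3 and takes that identifier, while the allele that exists only locally moves to the end as *5.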
Important
It is strongly advised that users adjust the value of the --cpu argument in order to accelerate the determination of representative sequences.
Basic Usage
If we want to synchronize a local schema we only need to provide the path to the directory that contains the schema:
$ chewBBACA.py SyncSchema -sc path/to/SchemaFolder
The --submit argument allows users to submit novel alleles in their local schema to the Chewie-NS public server:
$ chewBBACA.py SyncSchema -sc path/to/SchemaFolder --submit
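When SyncSchema is called from a script, the command can be assembled programmatically. The sketch below only builds and prints the argument list, using the long option names listed under Parameters. `build_sync_command` is a hypothetical helper, and defaulting to one core fewer than the machine total is an assumption made here to stay below the limit at which the value gets reset.

```python
import os
import shlex


def build_sync_command(schema_dir, cpu=None, submit=False):
    """Assemble a SyncSchema command line as an argument list."""
    if cpu is None:
        # Assumption: leave one core free so the value stays below the machine total.
        cpu = max(1, (os.cpu_count() or 1) - 1)
    cmd = ["chewBBACA.py", "SyncSchema",
           "--schema-directory", schema_dir,
           "--cpu", str(cpu)]
    if submit:
        # Only authorized users can submit novel alleles.
        cmd.append("--submit")
    return cmd


cmd = build_sync_command("path/to/SchemaFolder", cpu=4, submit=True)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a system where chewBBACA is installed.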
Parameters
-sc, --schema-directory (Required) Path to the directory with the schema to be synced (default: None).
--cpu, --cpu-cores (Optional) Number of CPU cores/threads that will be used to run the process
(chewie resets to a lower value if it is equal to or exceeds the total
number of available CPU cores/threads). This value is only used if the
process retrieves novel alleles from the remote schema and needs to
redetermine the set of representative alleles for the local schema (default: 1).
--ns, --nomenclature-server (Optional) The base URL for the Chewie-NS instance. The default
option will get the base URL from the schema's URI. It is also
possible to specify other options that are available in chewBBACA's
configs, such as: "main" will establish a connection to "https://chewbbaca.online/",
"tutorial" to "https://tutorial.chewbbaca.online/" and "local" to
"http://127.0.0.1:5000/NS/api/" (localhost). Users may also provide
the IP address to other Chewie-NS instances (default: None).
--b, --blast-path (Optional) Path to the directory that contains the BLAST executables (default: None).
--submit (Optional) If provided, the process identifies new alleles in the local schema and
sends them to the Chewie-NS instance (only authorized users can submit new alleles)
(default: False).
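The values accepted by --ns can be summarized as a small lookup. This is a sketch of the behavior described above, not chewBBACA's own code: `resolve_ns` is a hypothetical helper, and taking everything before "species/" as the base URL of a schema URI is an assumption based on the example URI shown in the configuration file.

```python
NS_ALIASES = {
    "main": "https://chewbbaca.online/",
    "tutorial": "https://tutorial.chewbbaca.online/",
    "local": "http://127.0.0.1:5000/NS/api/",
}


def resolve_ns(value, schema_uri):
    """Return the Chewie-NS address implied by a --ns value."""
    if value is None:
        # Default: derive the base URL from the schema's URI
        # (assumption: the base is everything before 'species/').
        return schema_uri.split("species/")[0]
    # Known aliases map to fixed addresses; anything else is used as given.
    return NS_ALIASES.get(value, value)


schema_uri = "http://chewbbaca.online/NS/api/species/9/schemas/1"
print(resolve_ns(None, schema_uri))
print(resolve_ns("tutorial", schema_uri))
```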