We obtained a barcoded Zika virus generated by Greg Ebel at Colorado State University. The barcode consists of degenerate bases at 8 positions between nucleotides 4007 and 4030 of the genome (numbered relative to GenBank sequence KU501215), within the NS2A gene.
See: barcodeposition.png below
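This note does not say which degenerate (IUPAC) codes sit at the 8 barcode positions. The theoretical number of distinct barcodes is the product of the number of bases each position allows. If all 8 positions were N, that would be 4^8 = 65,536. If each allowed two bases (e.g. R or Y), it would be 2^8 = 256. The minimal sketch below does that arithmetic. The template strings in it are hypothetical and do not describe the actual barcode design.

```python
from math import prod

# Number of nucleotides each standard IUPAC nucleotide code can represent.
IUPAC_CHOICES = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def theoretical_diversity(template: str) -> int:
    """Return how many distinct sequences a degenerate template can encode.

    Raises KeyError for characters that are not IUPAC nucleotide codes.
    """
    return prod(IUPAC_CHOICES[base] for base in template.upper())


# Hypothetical templates, for illustration only.
print(theoretical_diversity("NNNNNNNN"))  # 8 fully degenerate positions
print(theoretical_diversity("RRRRYYYY"))  # 8 two-fold degenerate positions
```

The number of barcodes actually present in a stock will usually be lower than this theoretical ceiling. This is why the stock itself is sequenced.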
We sequenced the barcode stock using the protocol pioneered by Quick et al.
For each replicate, here is the number of input templates:
pjW232-WT (wild-type clone stock): 1 x 10^6 templates
pjW236-C1_p0 (barcode virus stock): 1 x 10^6 templates
514982 day 2: 146 templates (this animal had a low viral load)
715132 day 3: 2,865 templates
688387 day 3: 667 templates
776301 day 3: 13,153 templates
The sequence data were analyzed using the Zequencer pipeline written by Dave. The pipeline is attached below as a compressed zip file.
Specifically, this pipeline performs the following steps:
1. Trims and merges the paired reads from the FASTQ data.
2. Extracts up to 1000 reads spanning each of the 35 amplicons generated by the amplification protocol (fewer if the sequence data contain fewer).
3. Maps the extracted reads to a full reference genome.
4. Calls SNPs using SnpEff and generates a VCF file.
5. Generates a BAM file that can be viewed in a program like Geneious.
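Step 2 caps coverage at 1000 reads per amplicon, so deeply covered amplicons do not dominate the mapping. The minimal sketch below shows only that capping step. It assumes that reads have already been assigned to amplicons upstream, and it takes a random subsample. Both are assumptions: the attached Zequencer code may assign reads differently or keep, for example, the first 1000 reads instead. The function name `subsample_per_amplicon` is hypothetical.

```python
import random
from collections import defaultdict


def subsample_per_amplicon(read_assignments, max_reads=1000, seed=0):
    """Keep at most `max_reads` reads per amplicon (hypothetical helper).

    read_assignments: iterable of (read_id, amplicon_id) pairs. How reads get
    assigned to amplicons is assumed to happen upstream and is not shown here.
    Returns {amplicon_id: [read_id, ...]}.
    """
    by_amplicon = defaultdict(list)
    for read_id, amplicon_id in read_assignments:
        by_amplicon[amplicon_id].append(read_id)

    rng = random.Random(seed)  # fixed seed so reruns pick the same reads
    kept = {}
    for amplicon_id, reads in by_amplicon.items():
        if len(reads) <= max_reads:
            kept[amplicon_id] = list(reads)  # low coverage: keep every read
        else:
            kept[amplicon_id] = rng.sample(reads, max_reads)  # no repeats
    return kept
```

Amplicons with fewer than 1000 reads pass through unchanged. This matches the "if they are present" caveat in step 2.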
Once a BAM file is created, it is opened in Geneious, where the barcode region is extracted and duplicate sequences are identified. A FASTA file recording the count of each unique sequence is then generated and converted to TSV format with the geneiousFASTAtoTSV.py script. The TSV files from each sample can be merged and summarized in a pivot table to assess how frequently each barcode sequence appears in the stocks and then in samples isolated from animals.
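The FASTA-to-TSV conversion is described but not shown here. The minimal sketch below illustrates the idea. It is **not** the attached geneiousFASTAtoTSV.py. It assumes a hypothetical header convention in which each header carries the copy number as `count=N`. The real Geneious export may encode counts differently, so `COUNT_RE` would need to be adapted. It also raises an error, rather than guessing, when a header does not match.

```python
import csv
import io
import re

# Hypothetical header convention, e.g. ">barcode_1 count=42".
COUNT_RE = re.compile(r"count=(\d+)")


def _record(header, seq_parts):
    """Build one (sequence, count) pair, failing loudly on unexpected headers."""
    match = COUNT_RE.search(header)
    if match is None:
        raise ValueError(f"no count found in header: {header!r}")
    return "".join(seq_parts).upper(), int(match.group(1))


def parse_counted_fasta(text):
    """Yield (sequence, count) pairs from FASTA text with counts in headers."""
    header, seq_parts = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield _record(header, seq_parts)
            header, seq_parts = line[1:], []
        else:
            seq_parts.append(line)  # sequences may wrap over several lines
    if header is not None:
        yield _record(header, seq_parts)


def to_tsv(records, sample_name):
    """Return TSV text with one sample/barcode/count row per record."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["sample", "barcode", "count"])
    for seq, count in records:
        writer.writerow([sample_name, seq, count])
    return out.getvalue()
```

Including a `sample` column in every row means TSV files from different samples can be concatenated directly before building the pivot table.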
In this experiment, we sequenced the barcode stock virus pjW236_C1 from passage 0 and passage 1. We also sequenced the day 2 time point from animal 514982 and the day 3 time point from animals 715132 and 688387 in ZIKV-022. We tried to sequence the day 3 time point from animal 514982, but the viral load was too low to obtain consistent data across replicates. Lastly, we sequenced the day 3 time point from the pregnant animal 776301 in ZIKV-021. All samples were sequenced in duplicate where possible. For one of the replicates from animal 514982, we were able to generate data from only half the genome.
We generated a pivot table containing the frequency of each barcode in the stocks and in the animal samples. The Excel file is called Zikv-022&021_barcode+inocula.xlsx (see attachment). The tab labeled ‘ConditionalFormat’ shades each value in red on a scale from 0% to 100%. The tab labeled ‘0.1%orgreater’ highlights in red every value of 0.1% or greater. Note also that the barcodes are sorted, first by their frequency in the pjW232WT_p0_RepA clone and then by their frequency in the pjW236_C1_p0_RepA inoculum. In both cases they run from most frequent to least frequent.
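The minimal standard-library sketch below reproduces the logic of that spreadsheet. It converts per-sample counts to percentages, sorts barcodes by frequency in one sample and then another, and flags cells at or above 0.1%. The row format matches the hypothetical sample/barcode/count TSV layout. The sample names in the usage lines are placeholders. The real analysis was done as an Excel pivot table.

```python
from collections import defaultdict


def build_pivot(rows):
    """rows: iterable of (sample, barcode, count).

    Returns {barcode: {sample: percent of that sample's reads}}.
    """
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for sample, barcode, count in rows:
        counts[barcode][sample] += count
        totals[sample] += count
    return {
        barcode: {s: 100.0 * c / totals[s] for s, c in per_sample.items()}
        for barcode, per_sample in counts.items()
    }


def sort_barcodes(pivot, primary, secondary):
    """Order barcodes by frequency in `primary`, then `secondary`, both descending."""
    return sorted(
        pivot,
        key=lambda bc: (pivot[bc].get(primary, 0.0), pivot[bc].get(secondary, 0.0)),
        reverse=True,
    )


def flag_at_least(pivot, threshold_percent=0.1):
    """Return the (barcode, sample) cells at or above the threshold (default 0.1%)."""
    return {
        (bc, s)
        for bc, per_sample in pivot.items()
        for s, pct in per_sample.items()
        if pct >= threshold_percent
    }


# Placeholder usage: "WT" and "BC" stand in for the WT clone and barcode inoculum.
rows = [("WT", "AAA", 90), ("WT", "CCC", 10), ("BC", "CCC", 50), ("BC", "GGG", 49), ("BC", "TTT", 1)]
pivot = build_pivot(rows)
print(sort_barcodes(pivot, "WT", "BC"))
```

A barcode missing from a sample is treated as 0%. This is also how an empty pivot-table cell behaves when sorting.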
The key points are as follows:
1. The most common barcodes are broadly similar in the stock and in the animals at this early time point.
2. Some rarer barcodes are also detectable. We do not know whether or how these will persist.
3. The barcode virus replicated as a population and the barcodes were maintained, rather than reverting to wild type.
4. The barcodes that commonly appear in animals infected with the barcode virus do not arise naturally in animals infected with the WT clone.
5. In the animal with the fewest input virus templates in this experiment (Animal #514982), barcode frequencies were not particularly consistent between replicates. This is a consequence of starting with such a low viral load. Consistency between replicates improves as the number of input virus templates increases.
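Point 5 follows from sampling: each replicate captures only a finite draw of templates, so rare barcodes come and go by chance when that draw is small. The simplified simulation below illustrates this under stated assumptions. Each replicate samples `n_templates` barcodes independently from a fixed population, and PCR and sequencing error are ignored. The 50-barcode population with geometrically decreasing frequencies is hypothetical. The template counts come from the replicate list above.

```python
import random
from collections import Counter


def sample_frequencies(barcode_freqs, n_templates, rng):
    """Draw n_templates barcodes (with replacement) and return observed frequencies."""
    barcodes = list(barcode_freqs)
    weights = [barcode_freqs[b] for b in barcodes]
    draws = Counter(rng.choices(barcodes, weights=weights, k=n_templates))
    return {b: draws[b] / n_templates for b in barcodes}  # Counter gives 0 if unseen


def replicate_discordance(barcode_freqs, n_templates, trials=50, seed=1):
    """Mean summed absolute frequency difference between two simulated replicates."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        rep_a = sample_frequencies(barcode_freqs, n_templates, rng)
        rep_b = sample_frequencies(barcode_freqs, n_templates, rng)
        total += sum(abs(rep_a[b] - rep_b[b]) for b in barcode_freqs)
    return total / trials


if __name__ == "__main__":
    # Hypothetical population: 50 barcodes with geometrically decreasing frequency.
    raw = {f"bc{i}": 0.8 ** i for i in range(50)}
    scale = sum(raw.values())
    population = {b: w / scale for b, w in raw.items()}
    # Input template counts from the replicate list above.
    for n in (146, 667, 2865, 13153):
        print(n, round(replicate_discordance(population, n), 3))
```

Under this model, the discordance between replicates shrinks roughly in proportion to 1/sqrt(n). About 90 times more templates should therefore give replicates that agree roughly 9 to 10 times more closely.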