[Download a Geneious Pro 9.1.2 archive containing all the files used in this analysis]
After sequencing the ZIKV-002 challenge stock and noting differences from the Genbank MR766 sequence (including a 12nt in-frame deletion), we obtained two additional MR766 isolates. One comes from UTMB; the second is a seed stock from CDC, from which we expanded the stock we used in ZIKV-002. All three stocks were deep sequenced by isolating viral RNA, preparing double-stranded cDNA, and then fragmenting the ds-cDNA with Nextera reagents. Libraries were sequenced on an Illumina MiSeq.
1. Comparison of Genbank MR766 sequences
In our initial analysis of the ZIKV-002 challenge stock, we compared sequence reads with NCBI Genbank sequence LC002520 [Genbank]. This sequence was deposited in Genbank in 2014 and is the only full-length genome returned by the query 'Zika MR766' as of 10 April 2016. As I prepared this analysis, I noticed that there are five additional full-length polyprotein sequences of this virus stock in Genbank, but these are returned only by the query 'Zika "MR 766"':
To assess the similarity of these sequences to one another, I did a MUSCLE alignment (Geneious default parameters) of all 6 MR766 sequences alongside the Asian-lineage ZIKV PF/2013 challenge stock consensus sequence:
I then made a tree with the Geneious Tree Builder.
Looking at the alignment, a few observations are obvious:
- The sequence with accession DQ859059.1 looks nothing like the others at the nucleotide level. I would be cautious about using this as a reference for any MR766 studies unless its relationship with MR766 is clarified by the group that submitted it.
- Both the frameshift and the 12nt in-frame deletion observed in the ZIKV-002 challenge stock are present in a subset of the Genbank MR766 sequences.
- There are two pairs of identical sequences:
  - KU720415.1 and HQ234498.1
  - NC_012532.1 and AY632535.2 (NC_012532.1 is the NCBI RefSeq and is derived from AY632535.2)
These consensus-level changes undoubtedly mask variants present within each virus preparation.
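The identical pairs noted above can also be checked outside Geneious. The following is a minimal Python sketch, not the method used for this analysis: it assumes the MUSCLE alignment has already been loaded into a dict of equal-length gapped strings, and the sequence names and bases in `toy_alignment` are made up for illustration.

```python
from itertools import combinations


def pairwise_identity(a, b):
    """Fraction of aligned columns with the same character.

    Columns where both sequences have a gap are skipped; a gap
    opposite a base counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def identical_pairs(alignment):
    """Return name pairs whose aligned sequences are 100% identical."""
    return [
        (name1, name2)
        for (name1, seq1), (name2, seq2) in combinations(alignment.items(), 2)
        if pairwise_identity(seq1, seq2) == 1.0
    ]


# Hypothetical toy alignment; these are NOT the Genbank MR766 sequences.
toy_alignment = {
    "seq_A": "ATGGCC---TTA",
    "seq_B": "ATGGCC---TTA",
    "seq_C": "ATGGCCGGATTA",
    "seq_D": "TTGACCGGATCA",
}

for pair in identical_pairs(toy_alignment):
    print("identical:", pair)
```

With real data, the dict would be filled from the exported alignment (for example a FASTA export), one entry per accession.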
[Download Geneious Pro files used in this analysis]
2. Which Genbank reference is most similar to ZIKV-002 challenge stock?
I aligned the Genbank MR766 sequences with MUSCLE, this time including the ZIKV-002 challenge stock consensus sequence defined previously, and made a tree as described above. The closest Genbank sequence to the ZIKV-002 challenge stock is HQ234498.1, as shown below. Note that the biggest difference between these sequences is the 12nt deletion in the ZIKV-002 challenge stock, which is not represented in HQ234498.1.
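Separately from the tree, one simple way to rank references against a query consensus is to count difference events in the alignment, treating a contiguous gap run as a single event so that a 12nt deletion counts once rather than twelve times. The sketch below is an assumption about how this could be done, not what the Geneious Tree Builder computes; the sequences and the names `query`, `ref_X`, and `ref_Y` are invented for illustration.

```python
def difference_events(query, ref):
    """Compare two aligned sequences of equal length.

    Returns (n_substitutions, indels), where indels is a list of
    (start_column, length, gapped_side) with 0-based columns and
    gapped_side equal to "query" or "ref".
    """
    if len(query) != len(ref):
        raise ValueError("sequences must come from the same alignment")
    subs = 0
    indels = []
    run_start = None
    run_side = None
    for i, (q, r) in enumerate(zip(query.upper(), ref.upper())):
        if q == "-" and r != "-":
            side = "query"
        elif r == "-" and q != "-":
            side = "ref"
        else:
            side = None
        if side != run_side:
            # Close the gap run that just ended, if any.
            if run_side is not None:
                indels.append((run_start, i - run_start, run_side))
            run_start, run_side = i, side
        if side is None and q != r:
            subs += 1
    if run_side is not None:
        indels.append((run_start, len(query) - run_start, run_side))
    return subs, indels


def closest_reference(query, references):
    """Name of the reference with the fewest difference events."""
    def n_events(name):
        subs, indels = difference_events(query, references[name])
        return subs + len(indels)
    return min(references, key=n_events)


# Hypothetical toy data: the query carries a 12-column gap run.
query = "ATGGCC" + "-" * 12 + "TTAGGA"
refs = {
    "ref_X": "ATGGCC" + "GATCGATCGATC" + "TTAGGA",
    "ref_Y": "TTGGCA" + "GATCGATCGATC" + "TTAGCA",
}

print(closest_reference(query, refs))
print(difference_events(query, refs["ref_X"]))
```

Counting a gap run as one event is a design choice: it keeps a single deletion from dominating the ranking, at the cost of ignoring its length.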
3. Comparison of three MR766 isolates to HQ234498.1
I used the Zequencer workflow to map reads from three MR766 isolates using HQ234498.1 as the reference. Each of the virus isolates has variants relative to the reference, as shown below:
I plotted the frequency of these variants in plot.ly:
There are three high-frequency variants relative to the reference. The first is a synonymous SNP at nt 993. The second is the aforementioned in-frame deletion beginning at nt 1327, which is present in at least 73% of reads in each of the three sequenced isolates. The third is another synonymous SNP at nt 10237.
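To make the "in-frame" point concrete: a deletion whose length is a multiple of three removes whole codons' worth of sequence and leaves the downstream reading frame intact, whereas any other length shifts the frame. The sketch below illustrates this on a made-up 36-nt coding sequence; `toy_cds` and the deletion coordinates are hypothetical and unrelated to the actual ZIKV genome positions.

```python
def codons(seq):
    """Split a sequence into complete codons, dropping any trailing bases."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def delete(seq, start, length):
    """Remove `length` bases beginning at 0-based index `start`."""
    return seq[:start] + seq[start + length:]


def is_in_frame(length):
    """True if an indel of this length preserves the reading frame."""
    return length % 3 == 0


# Hypothetical toy coding sequence (12 codons); not ZIKV sequence.
toy_cds = "".join([
    "ATG", "GCT", "GAA", "ACC", "GGT", "TTA",
    "CAG", "CCC", "GAT", "TGG", "AAA", "TAA",
])

in_frame = delete(toy_cds, 6, 12)    # 12 nt removed: frame preserved
frameshift = delete(toy_cds, 6, 11)  # 11 nt removed: frame shifted

print(is_in_frame(12), codons(in_frame))
print(is_in_frame(11), codons(frameshift))
```

In the 12-nt case the codons downstream of the deletion are unchanged, which is why such a variant can persist in a viable virus stock.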
4. What happens to these variants in vivo?
Shelby generated sequence data from one of the ZIKV-002 animals 2 days post-challenge. I mapped these reads against the HQ234498.1 reference using the Zequencer workflow. Note that these reads were generated by fragmenting five overlapping RT-PCR amplicons instead of by fragmenting ds-cDNA.
Interestingly, the deletion present in the stock isolates is not detected in the day 2 reads (or is present in < 5% of them).
There are four variants detected in the day 2 sequence. The first is a synonymous SNP at nt 2544 that is present in 39% of reads. It is not detected at >5% in any of the three sequenced stock isolates. The second is a synonymous SNP at nt 4785 that is in 38% of reads and is also not in any of the three sequenced stock isolates. The third is a synonymous SNP at nt 5071 that is in 58% of reads. It is in the ZIKV-002 challenge stock, but only at 10%. The fourth is a variant at nt 9475 that is in 40% of reads, but is not present in any of the stock isolates.
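The stock versus day 2 comparison can be summarized by classifying each variant against the 5% detection threshold. The sketch below uses the frequencies quoted in the text for the ZIKV-002 challenge stock and the day 2 sample. Two assumptions are made for illustration: variants described as not detected (or below 5%) are entered as 0.0, and the deletion's stock frequency is entered as its reported lower bound of 73%.

```python
THRESHOLD = 0.05  # variants at or below 5% of reads are treated as not detected

# variant -> (frequency in challenge stock, frequency at day 2)
# 0.0 is a stand-in for "not detected above 5%"; 0.73 is a lower bound.
observations = {
    "del 1327 (12 nt)": (0.73, 0.0),
    "SNP 2544": (0.0, 0.39),
    "SNP 4785": (0.0, 0.38),
    "SNP 5071": (0.10, 0.58),
    "variant 9475": (0.0, 0.40),
}


def classify(stock, day2, threshold=THRESHOLD):
    """Label a variant by where its frequency exceeds the threshold."""
    in_stock = stock > threshold
    in_day2 = day2 > threshold
    if in_stock and in_day2:
        return "shared"
    if in_stock:
        return "stock only"
    if in_day2:
        return "day 2 only"
    return "below threshold"


for name, (stock, day2) in observations.items():
    print(f"{name}: stock={stock:.2f} day2={day2:.2f} -> {classify(stock, day2)}")
```

Under these assumptions the deletion is the only "stock only" variant, nt 5071 is the only one shared between the stock and day 2, and the remaining three appear only at day 2.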
In other words, within two days of infection, the circulating virus sequence is amino acid identical to the HQ234498 reference sequence.
Anyone who is using viruses termed ZIKV MR766 needs to carefully examine the sequence composition of their stocks. Multiple viruses all termed MR766 may have different sequences and biological properties. In the case of the MR766 we are using in our studies, there is a deletion in the challenge stock that is rapidly and strongly selected against in vivo. There is a remote possibility that the use of unbiased vs. amplicon sequencing is biasing the detection of this and other variants, but based on our experience using these approaches with other viruses, we consider this unlikely. The amino acid sequence of viruses in the ZIKV-002 macaque at day 2 post-infection is identical to the amino acid sequence of the HQ234498 reference sequence, which will be used as the reference sequence for subsequent comparisons.