ZIKV-002 Challenge Stock Summary
Virus stock info: Zika virus/R.macaque-tc/UGA/1947/MR766-3329
ZIKV strain MR766 (GenBank: LC002520) was originally isolated from a sentinel rhesus monkey on 20 April 1947 in the Zika Forest, Entebbe, Uganda. The isolate, which had undergone 149 suckling mouse brain passages and two rounds of amplification on Vero cells, was obtained from Brandy Russell (CDC, Ft. Collins, CO). Virus stocks were prepared by inoculation onto a confluent monolayer of C6/36 mosquito cells.
Harvest Date: 15 February 2016
Titer: 6.77 log10 PFU/ml
Unlike the ZIKV-001 challenge stock, whose consensus sequence was identical to the GenBank sequence of the parental virus, this challenge stock has one significant consensus-level change relative to the GenBank sequence for ZIKV MR766. There are also several synonymous sites where variation is fixed in the challenge stock relative to the GenBank sequence. I do not know whether these differences were already present in the source ZIKV MR766 virus that we received and amplified, or whether they accumulated during in vitro expansion of the virus to prepare the challenge stock.
To map and call variants, I used a modified version of the Zequencer workflow I developed to analyze ZIKV-001 data. This version of the workflow:
- removes duplicate reads using the bbmap dedupe.sh script
- trims low quality sequence and removes adapter sequences from the ends of reads
- filters out reads shorter than 150 bp after trimming
- maps reads to the reference sequence using the bbmap algorithm in local alignment mode, using the normal sensitivity preset
- calls variants supported by >5% of reads with a variant P value < 10e-60 and a minimum strand-bias P value of 10e-5 when exceeding 65% bias
This version of the workflow does not rely on any external plug-ins and can be run using only integrated plug-ins available in Geneious Pro 9.1.2.
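As a rough illustration of how the variant-calling thresholds in the last step combine, here is a minimal Python sketch. It is not the Geneious implementation. The `Candidate` record and `passes_filters` function are hypothetical names, the threshold literals are copied exactly as written above, and the reading of the strand-bias rule (reject only when bias exceeds 65% and the strand-bias P value is below the minimum) is an assumption.

```python
from dataclasses import dataclass

# Thresholds copied as written in the workflow description above.
MIN_VARIANT_FREQ = 0.05    # variant must be supported by >5% of reads
MAX_VARIANT_P = 10e-60     # literal from the text; Python evaluates it as 1e-59
MIN_STRAND_BIAS_P = 10e-5  # literal from the text; Python evaluates it as 1e-4
MAX_STRAND_BIAS = 0.65     # strand-bias P value is only checked above 65% bias


@dataclass
class Candidate:
    """Hypothetical record for one candidate variant (not a Geneious type)."""
    freq: float           # fraction of reads supporting the variant
    p_value: float        # variant P value
    strand_bias: float    # fraction of supporting reads on the dominant strand
    strand_bias_p: float  # P value for the observed strand bias


def passes_filters(v: Candidate) -> bool:
    """Return True if the candidate survives all three thresholds."""
    if v.freq <= MIN_VARIANT_FREQ:
        return False
    if v.p_value >= MAX_VARIANT_P:
        return False
    # Assumed reading: reject only when the bias exceeds 65% AND is significant.
    if v.strand_bias > MAX_STRAND_BIAS and v.strand_bias_p < MIN_STRAND_BIAS_P:
        return False
    return True


# Illustrative values only; the strand-bias numbers are invented for the example.
print(passes_filters(Candidate(freq=0.188, p_value=0.0, strand_bias=0.5, strand_bias_p=1.0)))
print(passes_filters(Candidate(freq=0.04, p_value=0.0, strand_bias=0.5, strand_bias_p=1.0)))
```

The first candidate passes and the second is rejected for falling below the 5% frequency cutoff.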
Assessment of challenge stock variants
The most interesting region of variability is at positions 1430-1441 (relative to GenBank LC002520), where the majority of sequences have a four-amino-acid in-frame deletion. Reads that do not have the deletion have two non-synonymous nucleotide changes in the same region.
There are also a number of putative variants in the 5' and 3' UTRs, at positions 15-37 and 10,611-10,655 in the table below.
Here is a table of all of the variants observed in the challenge stock at >5%, along with their predicted impact on protein function. Variants at sites within the region encoding the polyprotein are highlighted in yellow. Note that the only variants predicted to change the amino acid sequence fall in the same region as the deletion that is present in most reads. In other words, some reads have a deletion while others have two non-synonymous substitutions.
|Name||Type||Minimum||Maximum||Length||Amino Acid Change||CDS Position||Coverage||Protein Effect||Variant Frequency||Variant P-Value (approximate)|
|CGC||Polymorphism||15||17||3|| || ||55 -> 59|| ||69.5% -> 72.7%||1.20E-99|
|CGC||Polymorphism||35||37||3|| || ||392 -> 396|| ||19.4% -> 19.6%||4.00E-133|
| ||Polymorphism||1,430||1,441||12||TVND ->||1,324||17,679 -> 18,169||Deletion||79.8% -> 82.0%||0|
|T||Polymorphism||1,431||1,431||1||T -> I||1,325||18,049||Substitution||18.80%||0|
|C||Polymorphism||1,443||1,443||1||I -> T||1,337||18,126||Substitution||16.50%||0|
|CT||Polymorphism||10,611||10,612||2|| || ||5,489 -> 5,551|| ||34.7% -> 34.9%||0|
|TC||Polymorphism||10,620||10,621||2|| || ||5,066 -> 5,074|| ||37.6% -> 37.7%||0|
|CTG||Polymorphism||10,625||10,627||3|| || ||4,877 -> 4,905|| ||39.4% -> 39.6%||0|
|CA||Polymorphism||10,637||10,638||2|| || ||4,145 -> 4,148|| ||47.00%||0|
|CT||Polymorphism||10,642||10,643||2|| || ||3,972 -> 3,978|| ||49.0% -> 49.1%||0|
|TTC||Polymorphism||10,653||10,655||3|| || ||3,579 -> 3,632|| ||49.4% -> 50.0%||0|
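The coordinates in the table can be cross-checked with a little arithmetic. The sketch below assumes the CDS Position column counts nucleotides from the first base of the polyprotein ORF, so the genome-to-CDS offset can be inferred from the deletion row (genome position 1,430, CDS position 1,324). The helper `codon_and_phase` is a hypothetical name introduced for illustration.

```python
from typing import Tuple

# Offset between genome and CDS coordinates, inferred from the deletion row
# of the table (genome position 1,430 <-> CDS position 1,324).
CDS_OFFSET = 1430 - 1324  # 106


def codon_and_phase(genome_pos: int) -> Tuple[int, int]:
    """Return (1-based codon number, 1-based position within that codon)."""
    cds_pos = genome_pos - CDS_OFFSET
    return (cds_pos - 1) // 3 + 1, (cds_pos - 1) % 3 + 1


del_start, del_end = 1430, 1441
del_len = del_end - del_start + 1  # length of the deleted span in nucleotides

print(del_len, del_len % 3 == 0)   # a deletion is in frame if divisible by 3
print(codon_and_phase(del_start), codon_and_phase(del_end))
print(codon_and_phase(1431), codon_and_phase(1443))
```

Under that assumption, the 12-nt deletion starts at the first base of codon 442 and ends at the third base of codon 445, so it removes exactly four codons, consistent with the TVND deletion in the table. The substitution at 1,431 falls in the second base of codon 442, inside the deleted span, while the substitution at 1,443 falls in the second base of codon 446, immediately downstream of it.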