Cloud-based mNGS Analysis: Virus Sequence Comparison in 60 Seconds

Preparation

Upload the paired-end FASTQ files to your OSS bucket, for example:
ossutil cp ICU6G_S2_L001_R1_001.fastq.gz oss://my-test-shenzhen/cov2-samples/
ossutil cp ICU6G_S2_L001_R2_001.fastq.gz oss://my-test-shenzhen/cov2-samples/
Usage:
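ags remote run rna-mapping \
--region cn-shenzhen \
--bucket my-test-shenzhen \
--fastq1 cov2-samples/ICU6G_S2_L001_R1_001.fastq.gz \
--fastq2 cov2-samples/ICU6G_S2_L001_R2_001.fastq.gz \
--output-bam bam/ICU6G_S2.bam \
--reference [sars-cov-2 | betacov-ncbi-39 | <path of RNA library reference in specified bucket>]
Parameters:
rna-mapping: the RNA sequence alignment task.
--region <cn-shenzhen|cn-beijing|...>: the region ID. Shenzhen and Beijing are currently supported.
--bucket <bucket_name>: the name of the OSS bucket.
--fastq1: the relative path of the first paired-end sequencing file (fq1).
--fastq2: the relative path of the second paired-end sequencing file (fq2).
--output-bam: the output path for the resulting BAM file. The report is written to the same location, with a name ending in .txt.
--reference: the reference sequence. Two presets are available: sars-cov-2 (the novel coronavirus) and betacov-ncbi-39 (the 39 betacoronaviruses known so far). You can also specify a custom virus sequence library.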
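When the same job is submitted for many samples, it can help to assemble the command from variables instead of editing it by hand. The following is a minimal sketch that uses only the flags listed in the usage above. The helper name build_rna_mapping_command is hypothetical and is not part of the ags CLI.

```python
import shlex


def build_rna_mapping_command(region, bucket, fastq1, fastq2, output_bam, reference):
    """Assemble the argument list for `ags remote run rna-mapping`.

    Hypothetical helper: it only arranges the flags shown in the usage
    text and does not call ags itself.
    """
    return [
        "ags", "remote", "run", "rna-mapping",
        "--region", region,
        "--bucket", bucket,
        "--fastq1", fastq1,
        "--fastq2", fastq2,
        "--output-bam", output_bam,
        "--reference", reference,
    ]


cmd = build_rna_mapping_command(
    region="cn-shenzhen",
    bucket="my-test-shenzhen",
    fastq1="cov2-samples/ICU6G_S2_L001_R1_001.fastq.gz",
    fastq2="cov2-samples/ICU6G_S2_L001_R2_001.fastq.gz",
    output_bam="bam/ICU6G_S2.bam",
    reference="sars-cov-2",
)

# Print a shell-quoted command line that can be copied into a terminal.
print(shlex.join(cmd))
```

To submit the job from Python, the list could be passed to subprocess.run(cmd, check=True) on a machine where ags is installed and configured.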

COVID-19 Comparison

ags remote run rna-mapping \
--region cn-shenzhen \
--fastq1 cov2-samples/ICU6G_S2_L001_R1_001.fastq.gz \
--fastq2 cov2-samples/ICU6G_S2_L001_R2_001.fastq.gz \
--bucket my-test-shenzhen \
--output-bam bam/ICU6G_S2.bam \
--reference sars-cov-2
INFO[0002] {"JobName":"rna-mapping-gpu-2ms6w"}
INFO[0002] Job submit succeed
High Quality Mapped Reads is: 3629
Matched reads in orf1ab range is: 480
Matched reads in orf1ab range with alignment score (AS) is greater than 120: 404
feature sequence of ICU6G_S2_L001 is similar to SARS-CoV-2 with very high mappQ and AS reads: True
ags remote get rna-mapping-gpu-2ms6w --show
+-----------------------+------------------+-----------+-------------------------------+----------+-------------------------------+-------------+-------------+
| JOB NAME | JOB NAMESPACE | STATUS | CREATE TIME | DURATION | FINISH TIME | TOTAL READS | TOTAL BASES |
+-----------------------+------------------+-----------+-------------------------------+----------+-------------------------------+-------------+-------------+
| rna-mapping-gpu-2ms6w | XXXXXXXXXXXX | Succeeded | 2020-03-04 16:40:30 +0800 CST | 43s | 2020-03-04 16:41:13 +0800 CST | 10369818 | 1456539874 |
+-----------------------+------------------+-----------+-------------------------------+----------+-------------------------------+-------------+-------------+
+---------------------------------+--------------------------------------------+
| JOB DETAIL | |
+---------------------------------+--------------------------------------------+
| rna_matached_reads | 480 |
| rna_is_sars_cov2 | True |
| rna_mapping_oss_region | cn-shenzhen |
| rna_mapping_fastq_second_name | cov2-samples/ICU6G_S2_L001_R2_001.fastq.gz |
| rna_mapping_no_unmapped | |
| rna_mapping_service | s |
| rna_matached_reads_alignment | 404 |
| rna_high_quality_mapped | 3629 |
| rna_mapping_fastq_first_name | cov2-samples/ICU6G_S2_L001_R1_001.fastq.gz |
| rna_mapping_mark_dup | |
| rna_mapping_reference_file_name | sars-cov-2 |
| rna_cov_detail_file | bam/ICU6G_S2.bam.cov.txt |
| rna_mapping_bam_file_name | bam/ICU6G_S2.bam |
| rna_mapping_bucket_name | my-test-shenzhen |
+---------------------------------+--------------------------------------------+
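The JOB DETAIL table is plain text, so a script that needs these values has to parse it. The sketch below assumes the two-column layout shown above and turns it into a dictionary of strings. The function name parse_job_detail is hypothetical, and the sample is a shortened copy of the table.

```python
def parse_job_detail(table_text):
    """Parse a two-column JOB DETAIL table into a dict of strings."""
    details = {}
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines such as +-----+-----+
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 2 or cells[0] == "JOB DETAIL":
            continue  # skip the header row and anything malformed
        details[cells[0]] = cells[1]
    return details


sample = """
+---------------------------------+--------------------------------------------+
| JOB DETAIL                      |                                            |
+---------------------------------+--------------------------------------------+
| rna_matached_reads              | 480                                        |
| rna_is_sars_cov2                | True                                       |
| rna_mapping_no_unmapped         |                                            |
| rna_matached_reads_alignment    | 404                                        |
| rna_high_quality_mapped         | 3629                                       |
+---------------------------------+--------------------------------------------+
"""

details = parse_job_detail(sample)
for key, value in details.items():
    print(f"{key} = {value!r}")
```

The field names, including the spelling rna_matached_reads, are kept exactly as the tool prints them so that lookups match the real output.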
ossutil ls oss://my-test-shenzhen/bam/ICU6G_S2.bam
LastModifiedTime Size(B) StorageClass ETAG ObjectName
2020-03-04 16:41:11 +0800 CST 356320 Standard 9596D012A30438A0073A2A0B38F5D578 oss://my-test-shenzhen/bam/ICU6G_S2.bam
2020-03-04 16:41:11 +0800 CST 2889 Standard 63175E7180D110BA9D3BAB34F4313C59 oss://my-test-shenzhen/bam/ICU6G_S2.bam.cov.txt
2020-03-04 16:41:11 +0800 CST 396 Standard 940D51FF7ECFF60B5E5A41D1F635180D oss://my-test-shenzhen/bam/ICU6G_S2.bam.summary.json
ossutil cp oss://my-test-shenzhen/bam/ICU6G_S2.bam.summary.json .
ossutil cp -r oss://my-test-shenzhen/bam/ICU6G_S2.bam.cov.txt .
ossutil cp oss://my-test-shenzhen/bam/ICU6G_S2.bam .
cat bam/ICU6G_S2.bam.cov.txt
Summary:
High Quality Mapped Reads is: 3629
Matched reads in orf1ab range is: 480
Matched reads in orf1ab range with alignment score (AS) is greater than 120: 404
/data/cov2-samples_ICU6G_S2_L001_R1_001.fastq.gz-output/ICU6G_S2.bam is similar to SARS-CoV-2 with very high mappQ and AS reads: True
21571 21581 21591 21601 21611 21621 21631
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGAACTCAATTACCCCCTGC
ATGTT GTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGAACTCAATTACCCCCTGC
ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGT CAATTACCCCCTGC
ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGA CCCCCTGC
ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGAACTCAATTACCCCCTGC
ATGTTTGTTTTTCTTGTTTTATTGCCA agtctctagtcagtgtgttaatcttacaaccagaactcaattaccccctgc
atgtttgtttttcttgttttattgcca AGTCAGTGTGTTAATCTTACAACCAGAACTCAATTACCCCCTGC
ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGAACTCAATTACCCCCTGC
atgtttgtttttcttgttttattgccactagtctctagtcagtgtgttaatcttacaaccagaactcaattaccccctgc
atgtttgtttttcttgttttattgccactagtctctagtcagtgtgttaatcttacaaccagaactcaattaccccctgc
atgtttgtttttcttgttttattgccactagtctctagtcagtgtgttaatcttacaaccagaactcaattaccccctgc
atgtttgtttttcttgttttattgccactagtctctagtcagtgtgttaatcttacaaccagaactcaattaccccctgc
ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGAACTCAATTACCCCCTGC
ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGAACTCAATTACCCCCTGC
atgtttgtttttcttgttttattgccactagtctctagtcagtgtgttaatcttacaaccagaactcaattaccccctgc
atgtttgtttttcttgttttattgccactagtctctagtcagtgtgttaatcttacaaccagaactcaattaccccctgc
atgtttgtttttcttgttttattgccactagtctctagtcagtgtgttaatcttacaaccagaactcaattaccccctgc
atgtttgtttttcttgttttattgccactagtctctagtcagtgtgttaatcttacaaccagaactcaattaccccctgc
atgtttgtttttcttgttttattgccactagtctctagtcagtgtgttaatcttacaaccagaactcaattaccccctgc
atgtttgtttttcttgttttattgccactagtctctagtcagtgtgttaatcttacaaccagaactcaattaccccctgc
atgtttgtttttcttgttttattgccactagtctctagtcagtgtgttaatcttacaaccagaactcaattaccccctgc
ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGAACTCAATTACCCCCTGC
atgtttgtttttcttgttttattgccactagtctctagtcagtgtgttaatcttacaaccagaactcaattaccccctgc
ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGAACTCAATTACCCCCTGC
ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGAACTCAATTACCCCCTGC
atgtttgtttttcttgttttattgccactagtctctagtcagtgtgttaatcttacaaccagaactcaattaccccctgc
TGTTTGTTTTTCTTGTTTT CACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGAACTCAATTACCCCCTGC
tgtttgtttttcttgtttt
TTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGAACTCAATTACCCCCTGC
gtttttcttgttttattgccactagtctctagtcagtgtgttaatcttacaaccagaactcaattaccccctgc
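The summary lines at the top of the .cov.txt report can be read programmatically. The sketch below assumes the report keeps the wording shown above. The function name parse_cov_summary and the dictionary keys are hypothetical. It also derives two ratios from the reported counts.

```python
def parse_cov_summary(text):
    """Extract the summary counts and the SARS-CoV-2 flag from a .cov.txt report."""
    summary = {}
    for line in text.splitlines():
        # The value is whatever follows the last colon on the line.
        label, sep, value = line.rpartition(":")
        value = value.strip()
        if not sep or not value:
            continue  # alignment rows and the bare "Summary:" line
        if label.startswith("High Quality Mapped Reads"):
            summary["high_quality_mapped"] = int(value)
        elif label.startswith("Matched reads in orf1ab range with alignment score"):
            summary["orf1ab_matched_high_as"] = int(value)
        elif label.startswith("Matched reads in orf1ab range"):
            summary["orf1ab_matched"] = int(value)
        elif "is similar to SARS-CoV-2" in label:
            summary["similar_to_sars_cov2"] = value == "True"
    return summary


report = """Summary:
High Quality Mapped Reads is: 3629
Matched reads in orf1ab range is: 480
Matched reads in orf1ab range with alignment score (AS) is greater than 120: 404
/data/cov2-samples_ICU6G_S2_L001_R1_001.fastq.gz-output/ICU6G_S2.bam is similar to SARS-CoV-2 with very high mappQ and AS reads: True
"""

cov = parse_cov_summary(report)
# Share of high-quality mapped reads that fall in the orf1ab range.
orf1ab_share = cov["orf1ab_matched"] / cov["high_quality_mapped"]
# Share of orf1ab reads whose alignment score exceeds 120.
high_as_share = cov["orf1ab_matched_high_as"] / cov["orf1ab_matched"]
print(cov)
print(f"orf1ab share: {orf1ab_share:.1%}, high-AS share: {high_as_share:.1%}")
```

These ratios are only a convenience for comparing samples. The rule the service uses to set the final True or False flag is not documented here, so the sketch does not try to reproduce it.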

Further Analysis of the Comparison Data

Comparison with 39 Known Beta Coronaviruses

ags remote run rna-mapping \
--region cn-shenzhen \
--fastq1 cov2-samples/ICU6G_S2_L001_R1_001.fastq.gz \
--fastq2 cov2-samples/ICU6G_S2_L001_R2_001.fastq.gz \
--bucket my-test-shenzhen \
--output-bam bam/ICU6G_S2_virus.bam \
--reference betacov-ncbi-39
INFO[0011] {"JobName":"rna-mapping-gpu-6mpcc"}
INFO[0011] Job submit succeed
ags remote get rna-mapping-gpu-6mpcc --show
+-----------------------+------------------+-----------+-------------------------------+----------+-------------------------------+-------------+-------------+
| JOB NAME | JOB NAMESPACE | STATUS | CREATE TIME | DURATION | FINISH TIME | TOTAL READS | TOTAL BASES |
+-----------------------+------------------+-----------+-------------------------------+----------+-------------------------------+-------------+-------------+
| rna-mapping-gpu-6mpcc | XXXXXXXXX | Succeeded | 2020-03-04 17:36:21 +0800 CST | 40s | 2020-03-04 17:37:01 +0800 CST | 10369818 | 1456539874 |
+-----------------------+------------------+-----------+-------------------------------+----------+-------------------------------+-------------+-------------+
# 2014 high-quality mapped reads were detected, but no matched reads were found in the target range
+---------------------------------+--------------------------------------------+
| JOB DETAIL | |
+---------------------------------+--------------------------------------------+
| rna_mapping_reference_file_name | betacov-ncbi-39 |
| rna_matached_reads_alignment | 0 |
| rna_mapping_bam_file_name | bam/ICU6G_S2_virus.bam |
| rna_mapping_fastq_first_name | cov2-samples/ICU6G_S2_L001_R1_001.fastq.gz |
| rna_mapping_oss_region | cn-shenzhen |
| rna_cov_detail_file | bam/ICU6G_S2_virus.bam.cov.txt |
| rna_mapping_no_unmapped | |
| rna_matached_reads | 0 |
| rna_mapping_mark_dup | |
| rna_mapping_service | s |
| rna_high_quality_mapped | 2014 |
| rna_mapping_bucket_name | my-test-shenzhen |
| rna_mapping_fastq_second_name | cov2-samples/ICU6G_S2_L001_R2_001.fastq.gz |
| rna_is_sars_cov2 | False |
+---------------------------------+--------------------------------------------+
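To see at a glance how this run differs from the sars-cov-2 run, the result fields of the two jobs can be compared side by side. The sketch below copies the values by hand from the two JOB DETAIL tables. The helper name diff_runs is hypothetical.

```python
# Result fields copied from the JOB DETAIL tables of the two runs.
sars_cov_2_run = {
    "rna_high_quality_mapped": 3629,
    "rna_matached_reads": 480,
    "rna_matached_reads_alignment": 404,
    "rna_is_sars_cov2": True,
}
betacov_39_run = {
    "rna_high_quality_mapped": 2014,
    "rna_matached_reads": 0,
    "rna_matached_reads_alignment": 0,
    "rna_is_sars_cov2": False,
}


def diff_runs(first, second):
    """Return {field: (first_value, second_value)} for fields that differ."""
    return {
        key: (first[key], second[key])
        for key in first
        if key in second and first[key] != second[key]
    }


differences = diff_runs(sars_cov_2_run, betacov_39_run)
for field, (a, b) in differences.items():
    print(f"{field}: sars-cov-2={a}  betacov-ncbi-39={b}")
```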

Using Custom Virus Databases for Comparison

ossutil cp betacov-ncbi-test.fa oss://my-test-shenzhen/ref/
ags remote run rna-mapping \
--region cn-shenzhen \
--fastq1 cov2-samples/ICU6G_S2_L001_R1_001.fastq.gz \
--fastq2 cov2-samples/ICU6G_S2_L001_R2_001.fastq.gz \
--bucket my-test-shenzhen \
--output-bam bam/ICU6G_S2_virus.bam \
--reference ref/betacov-ncbi-test.fa
INFO[0002] {"JobName":"rna-mapping-gpu-69mwb"}
INFO[0002] Job submit succeed
ags remote get rna-mapping-gpu-69mwb --show
+-----------------------+------------------+-----------+-------------------------------+----------+-------------------------------+-------------+-------------+
| JOB NAME | JOB NAMESPACE | STATUS | CREATE TIME | DURATION | FINISH TIME | TOTAL READS | TOTAL BASES |
+-----------------------+------------------+-----------+-------------------------------+----------+-------------------------------+-------------+-------------+
| rna-mapping-gpu-69mwb | 1365606736606053 | Succeeded | 2020-03-04 17:47:00 +0800 CST | 40s | 2020-03-04 17:47:40 +0800 CST | 10369818 | 1456539874 |
+-----------------------+------------------+-----------+-------------------------------+----------+-------------------------------+-------------+-------------+
+---------------------------------+--------------------------------------------+
| JOB DETAIL | |
+---------------------------------+--------------------------------------------+
| rna_mapping_fastq_first_name | cov2-samples/ICU6G_S2_L001_R1_001.fastq.gz |
| rna_mapping_fastq_second_name | cov2-samples/ICU6G_S2_L001_R2_001.fastq.gz |
| rna_mapping_mark_dup | |
| rna_mapping_oss_region | cn-shenzhen |
| rna_cov_detail_file | bam/ICU6G_S2_virus.bam.cov.txt |
| rna_is_sars_cov2 | False |
| rna_mapping_bam_file_name | bam/ICU6G_S2_virus.bam |
| rna_mapping_service | s |
| rna_matached_reads_alignment | 0 |
| rna_high_quality_mapped | 2014 |
| rna_mapping_bucket_name | my-test-shenzhen |
| rna_mapping_no_unmapped | |
| rna_mapping_reference_file_name | ref/betacov-ncbi-test.fa |
| rna_matached_reads | 0 |
+---------------------------------+--------------------------------------------+
ossutil ls oss://my-test-shenzhen/bam/ICU6G_S2_virus.bam
LastModifiedTime Size(B) StorageClass ETAG ObjectName
2020-03-04 17:47:38 +0800 CST 753458 Standard DF7B1A6CA5AF5DE6BF4FFDBB6DEF71C3 oss://my-test-shenzhen/bam/ICU6G_S2_virus.bam
2020-03-04 17:47:38 +0800 CST 1474 Standard 9D7968A779A0DE7C1993CC2A8D0E5A56 oss://my-test-shenzhen/bam/ICU6G_S2_virus.bam.cov.txt
2020-03-04 17:47:38 +0800 CST 397 Standard 81170E30BAAFEB947A2238E015171A51 oss://my-test-shenzhen/bam/ICU6G_S2_virus.bam.summary.json
Object Number is: 3
ossutil cp oss://my-test-shenzhen/bam/ICU6G_S2_virus.bam.summary.json .
cat bam/ICU6G_S2_virus.bam.summary.json
{
"total_reads":10369818,
"total_bases":1456539874,
"pass_vendor_filter_reads":10369818,
"mapped_reads":6736,
"pair_reads":6680,
"properly_paired_reads":6520,
"mapq_40_to_inf_reads":2030,
"mapq_30_to_40_reads":0,
"mapq_20_to_30_reads":1,
"mapq_10_to_20_reads":3,
"mapq_0_to_10_reads":23,
"mapq_0_reads":10367761,
"GC":"46.499%",
"total_alignment":2057,
"supplementary_alignment":0
}
ossutil cp oss://my-test-shenzhen/bam/ICU6G_S2_virus.bam .
samtools view bam/ICU6G_S2_virus.bam
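The summary.json file is convenient for quick quality checks before opening the BAM file. The sketch below loads the JSON shown above from an inline string and derives the mapped fraction, the properly paired fraction, and the MAPQ distribution. The variable names are illustrative only.

```python
import json

summary_json = """
{
"total_reads":10369818,
"total_bases":1456539874,
"pass_vendor_filter_reads":10369818,
"mapped_reads":6736,
"pair_reads":6680,
"properly_paired_reads":6520,
"mapq_40_to_inf_reads":2030,
"mapq_30_to_40_reads":0,
"mapq_20_to_30_reads":1,
"mapq_10_to_20_reads":3,
"mapq_0_to_10_reads":23,
"mapq_0_reads":10367761,
"GC":"46.499%",
"total_alignment":2057,
"supplementary_alignment":0
}
"""

summary = json.loads(summary_json)

# Fraction of all reads that mapped to the custom reference.
mapped_fraction = summary["mapped_reads"] / summary["total_reads"]
# Fraction of paired reads that are properly paired.
properly_paired_fraction = summary["properly_paired_reads"] / summary["pair_reads"]
# Collect the per-bin MAPQ counts by key prefix.
mapq_bins = {key: value for key, value in summary.items() if key.startswith("mapq_")}

print(f"mapped fraction: {mapped_fraction:.5%}")
print(f"properly paired fraction: {properly_paired_fraction:.2%}")
for name, count in mapq_bins.items():
    print(f"{name}: {count}")
```

For this sample the six MAPQ bins add up to total_reads, which is a useful sanity check on the report. To read the downloaded file instead of an inline string, json.load can be used on the opened file.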

Alibaba Cloud