GT-Scan2: Bringing Bioinformatics to Alibaba Cloud

CRISPR-Cas9 is a genome editing tool that is creating a buzz in the science world. It is faster, cheaper and more accurate than previous techniques for editing the genome of living cells, and so has the potential to revolutionize a wide range of applications.

CRISPR-Cas9 holds particular promise in healthcare, where it could enable the treatment of medical conditions that have a genetic component, including cancer, hepatitis B and even high cholesterol. Clinical trials have already started for patients with specific blood and solid cancer types.

CRISPR-Cas9 is suitable for these applications because it can be programmed to recognize and edit specific locations in the genome by pattern-matching unique sequences of DNA. However, for robust application in the clinic, the efficiency of CRISPR-Cas9 needs to be increased as does the speed with which target sites can be designed.

Researchers in the eHealth program of the Commonwealth Scientific and Industrial Research Organization (CSIRO) in Australia developed GT-Scan2, a novel software tool that addresses both issues.

GT-Scan2 helps researchers find the most effective CRISPR-Cas9 targets in a genomic region by ranking them by predicted cutting efficiency. You can think of it as a “search engine for the genome”. GT-Scan2 also reports the number of potential off-targets for each target, where a potential off-target is any other region in the genome with 0–3 mismatches to the target.

  • Identifies optimal CRISPR-Cas9 targets in the human genome.
  • Combines information about the chromatin environment and sequence of the target site.
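The off-target count mentioned above can be illustrated with a deliberately naive sketch. It slides the target along a sequence and tallies windows by their number of mismatches (Hamming distance). This is an assumption-laden illustration, not GT-Scan2's actual search: the function names are hypothetical, only the forward strand is scanned, and a real genome-scale search would need an index rather than a linear scan.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def count_off_targets(target: str, genome: str, max_mismatches: int = 3) -> list:
    """Tally windows of `genome` by mismatch count against `target`.

    Returns a list where index d holds the number of windows with exactly
    d mismatches, for d in 0..max_mismatches. The on-target site itself,
    if present in `genome`, is counted under index 0.
    """
    k = len(target)
    counts = [0] * (max_mismatches + 1)
    for i in range(len(genome) - k + 1):
        d = hamming(target, genome[i:i + k])
        if d <= max_mismatches:
            counts[d] += 1
    return counts
```

Calling `count_off_targets(target, region)` returns four tallies, one each for 0, 1, 2 and 3 mismatches, which is the kind of per-target summary the text describes.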

Architecture

When a user submits a job, GT-Scan2 inserts the job parameters as an item into a TableStore table via an API call. This design lets the solution scale freely without the submission step becoming a bottleneck. The database entry triggers the first Function Compute function, which finds all putative CRISPR targets in the user-specified DNA sequence (fetched automatically upon user submission). Potential CRISPR target sites follow fixed rules, so they can be found with a regular expression that completes in seconds. The matches are then inserted into a second TableStore table.

GT-Scan2 is served directly from OSS, making it a static web app with no server-side processing. A JavaScript framework retrieves the dynamic content (such as job results and parameters) from a NoSQL database (TableStore) through API calls routed via API Gateway.
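As a rough sketch of the regular-expression step, the snippet below scans both strands of a sequence for candidate sites. The article does not state the exact rule GT-Scan2 uses, so the pattern here is an assumption: a 20-nucleotide protospacer followed by an NGG PAM, the rule commonly used for SpCas9. The function names are hypothetical.

```python
import re

# Assumption: a candidate site is 20 nt followed by an NGG PAM (23 nt total).
# The zero-width lookahead lets finditer report overlapping sites.
SITE_LENGTH = 23
TARGET_RE = re.compile(r"(?=([ACGT]{20}[ACGT]GG))")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_targets(sequence: str) -> list:
    """Return sorted (start, strand, site) tuples for both strands.

    `start` is always the 0-based forward-strand coordinate of the
    leftmost base covered by the site.
    """
    seq = sequence.upper()
    hits = [(m.start(), "+", m.group(1)) for m in TARGET_RE.finditer(seq)]
    for m in TARGET_RE.finditer(reverse_complement(seq)):
        # Map the reverse-strand offset back to forward-strand coordinates.
        start = len(seq) - (m.start() + SITE_LENGTH)
        hits.append((start, "-", m.group(1)))
    return sorted(hits)
```

Each tuple returned by `find_targets` could then become one row in the second table, keyed by job and position.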

Applying Serverless Computing

Alibaba Cloud Function Compute provides a framework for developing a future-ready software package that can support medical genome engineering applications. It scales instantaneously at run time, spawning the appropriate number of functions to cope with the varying complexity of different genes. Other benefits include paying only for storage when no compute is triggered; jobs that do not compete with web server resources, because the website is a static page whose dynamic content is updated through Angular 2 and API Gateway; and having no compute instances to maintain (for example, no OS security patches).

Improvements

  • Uses the asynchronous invoke method instead of queue-based triggers. This shortens invoke times and removes the dependency on a message queue.
  • Applies batch reads and writes when accessing data in the NoSQL database, making I/O more efficient.
  • Streams all GT-Scan2 deployment logs to Alibaba Cloud Log Service, which makes workflow issues easier to troubleshoot. With the logs in a single location, users can pinpoint issues without spending time logging in to servers or individual service consoles.
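The batching idea in the second bullet can be sketched without any SDK: group rows into fixed-size chunks and issue one request per chunk instead of one per row. Everything here is hypothetical, including the `write_batch` callable, which stands in for whatever batch-write call the database client provides, and the default of 200 rows per request, which is an assumption to check against the TableStore limits.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(rows: Iterable[T], batch_size: int = 200) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `batch_size` rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # Flush the final, possibly shorter, batch.
        yield batch


def write_in_batches(rows: Iterable[T],
                     write_batch: Callable[[List[T]], None],
                     batch_size: int = 200) -> int:
    """Send rows through `write_batch` in chunks; return the request count."""
    requests = 0
    for batch in batched(rows, batch_size):
        write_batch(batch)  # Hypothetical stand-in for a real batch-write call.
        requests += 1
    return requests
```

With this shape, 1,000 candidate targets would be written in 5 requests rather than 1,000 single-row writes.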

Automated Deployments

What’s Next?

Analytics

Log Analysis

Reference:

https://www.alibabacloud.com/blog/gt-scan2%3A-bringing-bioinformatics-to-alibaba-cloud_593841?spm=a2c41.11807779.0.0
