GT-Scan2: Bringing Bioinformatics to Alibaba Cloud

  • Identifies optimal CRISPR-Cas9 targets in the human genome.
  • Combines information about the chromatin environment and sequence of the target site.

Architecture

A web application front end is used to access the GT-Scan2 application and to submit jobs.

Applying Serverless Computing

Every potential target must be evaluated for its off-target risk using Bowtie, an efficient string-matching tool. Although Bowtie only requires a reduced representation of the 3-billion-letter genomic sequence, these index files still total 915 MB for the human genome. Alibaba Cloud Function Compute supports temporary storage of this size, but the implementation nevertheless divides the genome into smaller blocks so that they can be processed in parallel. An average GT-Scan2 run therefore triggers 200–500 individual Function Compute invocations, which simultaneously update the scores for the different putative targets in TableStore. While this is happening, the front end polls the table via API Gateway and updates the web page as results come in, so no dedicated server is needed.
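The block-wise fan-out described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a thread pool stands in for the parallel Function Compute invocations, exact substring counting stands in for Bowtie (which also handles mismatches), and the names `split_into_blocks`, `count_in_block`, and `scan` are hypothetical rather than taken from the GT-Scan2 code base.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(genome_length, block_size):
    """Return half-open (start, end) intervals that cover the whole genome."""
    return [(start, min(start + block_size, genome_length))
            for start in range(0, genome_length, block_size)]


def count_in_block(genome, block, targets, overlap):
    """Stand-in for one function invocation (assumption: exact matching only).

    Counts the matches whose first base lies inside the block. The window
    extends `overlap` bases past the block end so that matches spanning a
    block boundary are still found, and are counted exactly once.
    """
    start, end = block
    window = genome[start:min(end + overlap, len(genome))]
    counts = {}
    for target in targets:
        hits = 0
        pos = window.find(target)
        while pos != -1 and pos < end - start:
            hits += 1
            pos = window.find(target, pos + 1)
        counts[target] = hits
    return counts


def scan(genome, targets, block_size, max_workers=8):
    """Fan out one task per block and merge the per-block counts."""
    overlap = max(len(t) for t in targets) - 1
    blocks = split_into_blocks(len(genome), block_size)
    totals = {t: 0 for t in targets}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(
            lambda b: count_in_block(genome, b, targets, overlap), blocks)
        for counts in results:
            for target, hits in counts.items():
                totals[target] += hits
    return totals


if __name__ == "__main__":
    print(scan("ACGTACGTTACGACGT", ["ACG", "CGT"], block_size=5))
```

In the real deployment each block would be handled by a separate function invocation that writes its partial scores to TableStore, rather than returning them to a single caller as this sketch does.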

Improvements

The GT-Scan2 deployment benefited from architectural patterns and services specific to Alibaba Cloud. Some of them are listed below.

  • Uses the asynchronous invoke method instead of queue-based triggers. This shortens invoke times and removes the dependency on a message queue.
  • Applies batch reads and writes when accessing data in the NoSQL database, making I/O more efficient.
  • Streams all logs to Alibaba Cloud Log Service, which makes it easier to troubleshoot issues with workflow operations. Having the logs in a single location lets users pinpoint issues without spending time logging into servers or individual service consoles.
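As a rough illustration of the batching idea in the list above, the following Python sketch groups score rows into fixed-size batches and issues one write request per batch. It makes several assumptions: `write_batch` is a hypothetical callable standing in for whatever batch-write call the database client provides, and the default `batch_size` is illustrative only, since the actual per-request limit is set by the service.

```python
def batched(rows, batch_size):
    """Yield consecutive lists of at most `batch_size` rows."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Flush the final, possibly shorter, batch.
        yield batch


def write_scores(rows, write_batch, batch_size=100):
    """Send rows in batches and return the number of requests made.

    `write_batch` is a hypothetical stand-in for the client's batch-write
    call; `batch_size` is an assumed value, not a documented limit.
    """
    requests = 0
    for batch in batched(rows, batch_size):
        write_batch(batch)
        requests += 1
    return requests


if __name__ == "__main__":
    rows = [{"target_id": i, "score": i / 10} for i in range(450)]
    sent = []
    print(write_scores(rows, sent.append, batch_size=200))
```

Writing 450 rows with a batch size of 200 takes three requests instead of 450 single-row calls, which is where the I/O saving comes from.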

Automated Deployments

The open-source Fun tool (Fun with Serverless) will enable automated deployment of API Gateway and Function Compute resources, making it easy to roll out new GT-Scan2 versions. The tool deploys components that are defined in a simple YAML file.

What’s Next?

Analytics

Leveraging Alibaba Cloud’s award-winning big data platform to create a machine learning pipeline will enable sophisticated analyses to be integrated into the application. This is particularly relevant for personalized health applications, which identify editing strategies for individual patients.

Log Analysis

Alibaba Cloud Log Service allows log files to be exported for later analysis, using either Alibaba Cloud’s big data platform or the existing open-source analysis platforms at CSIRO’s disposal. The exported logs can then be plugged into an existing machine learning pipeline to learn from the usage patterns of the GT-Scan2 application.

Alibaba Cloud

Follow me to keep abreast of the latest technology news, industry insights, and developer trends. Alibaba Cloud website: https://www.alibabacloud.com