Data Analysis: FlashText or RegEx

Before you proceed with your analysis, you need to clean your source data, even for the simplest text. This often includes searching for and replacing keywords. For example, you might search the corpus for the keyword “Python,” or replace every “python” with “Python.”
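As a point of reference, here is a minimal sketch of that kind of search and replace using Python's standard `re` module. The sample sentence and the variable names are made up for illustration; they are not from any particular dataset.

```python
import re

# Hypothetical sample text, used only for illustration.
corpus = "I like python. python is popular, and Python is readable."

# Search: find every occurrence of the keyword as a whole word, ignoring case.
matches = re.findall(r"\bpython\b", corpus, flags=re.IGNORECASE)

# Replace: normalise the lowercase spelling to "Python".
cleaned = re.sub(r"\bpython\b", "Python", corpus)

print(matches)
print(cleaned)
```

The `\b` word boundaries keep the pattern from matching inside longer words. With a single keyword this is quick; the cost becomes noticeable when the same pass has to be repeated for a long list of keywords.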

The Smarter and Faster Way of Data Cleansing: FlashText

As the name suggests, FlashText is one of the fastest ways to search for and replace keywords. It is an open-source Python library available on GitHub.

Why is FlashText so Fast?

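To understand why FlashText is so fast, consider an example. Take a three-word sentence, “I like Python”, and a corpus of four keywords: {Python, Java, J2ee, Ruby}. One approach takes each keyword in the corpus and checks whether it appears in the sentence, so the work grows with the number of keywords. FlashText goes the other way: it takes each word in the sentence and checks whether it exists in the corpus, so the work depends on the length of the sentence rather than on the size of the keyword list.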
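The idea of checking each word of the sentence against the keyword set can be sketched with a small trie (a nested dictionary keyed by character). This is a simplified illustration, not FlashText's actual implementation: the function names `build_trie` and `find_keywords` and the `_end_` marker are made up here, and the sketch splits on whitespace only, so it ignores punctuation and multi-word keywords.

```python
def build_trie(keywords):
    """Store each keyword character by character in a nested dict."""
    trie = {}
    for keyword in keywords:
        node = trie
        for char in keyword.lower():
            node = node.setdefault(char, {})
        node["_end_"] = keyword  # hypothetical marker for a complete keyword
    return trie


def find_keywords(sentence, trie):
    """Walk the sentence once, following each word through the trie."""
    found = []
    for word in sentence.split():
        node = trie
        for char in word.lower():
            node = node.get(char)
            if node is None:
                break  # no keyword continues this way; move to the next word
        else:
            # The whole word was consumed; keep it only if a keyword ends here.
            if "_end_" in node:
                found.append(node["_end_"])
    return found


trie = build_trie(["Python", "Java", "J2ee", "Ruby"])
result = find_keywords("I like Python", trie)
print(result)  # ['Python']
```

Each word in the sentence is abandoned as soon as its characters leave the trie, so "I" and "like" are rejected after one character each. Adding more keywords makes the trie larger but does not add extra passes over the sentence.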

When do You Need to Use FlashText?

For search, FlashText performs better than RegEx once the number of keywords exceeds 500.

