Ideas and Methods for System Refactoring

Recently, I have participated in many refactoring projects. Some aimed to rebuild gateways, AMAPS, and other services to improve server resource utilization. Others aimed to split shared business logic into services to make the architecture more rational and improve R&D efficiency. In this article, I would like to share some lessons drawn from that experience, focusing on common misunderstandings and frequently overlooked aspects of refactoring.

In Chinese philosophy, Dao (道) and Shu (术) are two important concepts. Roughly translated, Dao means “the way” or “direction,” and Shu means “technique” or “method.” I believe these are central concepts of programming as well. When we have a general direction or idea, we need a technique for putting it into practice. If we do not even have an idea of what we’re doing, then all kinds of techniques and methods are useless.

In this article, I will discuss some basic directions and principles of refactoring, and how common refactoring solutions are applied.

Direction of System Refactoring

Analyze Problems Practically

Before performing any refactoring, we should first clarify the problems we want to solve. Do we want to improve performance and security, or to achieve quick and continuous integration and release?

Set Clear Goals

The specific, attainable, and time-bound principles are easy enough to understand. However, I want to describe the measurable principle in detail. Can we use the problems identified above directly as the goals of a refactoring project? The answer is no, because they are not measurable. For example, in a refactoring project aiming to improve performance, measurable performance indicators include the reduction in service response time, the increase in queries per second on a single core, and the reduction in the number of service resources required.

Then, how can we turn non-quantitative goals into measurable indicators? Take a refactoring project aiming to improve service availability as an example. This goal is hard to quantify directly, so we can use specific events as indicators instead, for example, on-premises fault recovery that is imperceptible to users, or automatic downgrade and recovery when underlying faults occur. We often count the number of 9s (for example, 99.99%) to evaluate system availability, but this indicator can only be calculated after a period of time, so it is not suitable as a short- or medium-term project goal.
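The limitation of counting 9s can be made concrete with a little arithmetic. The sketch below is only an illustration: the function name and the choice of a 365-day year are assumptions, not something taken from a particular monitoring tool. It converts a number of 9s into the downtime budget for a given observation window.

```python
# Downtime budget implied by an availability target expressed in "nines".
# Hypothetical helper for illustration; a 365-day year is assumed.
MINUTES_PER_YEAR = 365 * 24 * 60
MINUTES_PER_WEEK = 7 * 24 * 60


def downtime_budget_minutes(nines: int, period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the maximum downtime (in minutes) allowed within the period.

    nines=3 means 99.9% availability, nines=4 means 99.99%, and so on.
    """
    if nines < 1:
        raise ValueError("nines must be at least 1")
    unavailability = 10 ** -nines  # e.g. 0.001 for three nines
    return period_minutes * unavailability


if __name__ == "__main__":
    for n in (2, 3, 4, 5):
        per_year = downtime_budget_minutes(n)
        per_week = downtime_budget_minutes(n, MINUTES_PER_WEEK)
        print(f"{n} nines: {per_year:.2f} min/year, {per_week:.3f} min/week")
```

Over a single week, a four-nines target leaves a budget of roughly one minute, so one short incident (or its absence) decides the result. This is why event-based indicators are easier to verify within the time frame of a refactoring project.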

Alibaba engineers often mention starting points. Specific and measurable goals are the starting points of refactoring projects.

Appropriate Design

In addition to avoiding insufficient or excessive design, we must also take cost-effectiveness into consideration. Some designs may solve certain problems, but the solution might be overly complex and costly. So, we need to determine whether to apply the solution based on our return on investment.

There are no shortcuts to overcoming insufficient design. The only solution is continuous learning and experience. To combat excessive design, we need to think about the necessity and cost-effectiveness of the design when preparing a design solution.

Layered Design

Take the refactoring of an ordering service as an example. The number of orders may be small, but many different types of service are involved, including hotel, admission ticket, and train ticket services. In the final design scheme, the system is divided into four modules based on the order processing procedure: order module, CP order synchronization module, order processing module, and statistics module. You may wonder whether it is appropriate to split the system into multiple modules even if the order volume is small. Besides design-related factors, another important consideration is the ability to launch the system and verify the results phase by phase. This is the only way to ensure risks can be controlled. Here, we assume the system is not vertically partitioned by service type.

Therefore, as far as possible, we need to make an iteration plan for system refactoring from the very beginning.

Methods of System Refactoring

Service-Based Design

Service-Based Goals

  • Requirement level: Fast iteration is supported.
  • Development level: Code decoupling and independent development are required to reduce the maintenance costs.
  • O&M level: Independent deployment, scale-out, and downgrade control are supported.

The preceding are the values of service-based design. Interestingly, these are also the areas where poor service-based design causes problems. For example, people often complain that service-based design is more difficult than the initial development and launch work. Therefore, the first step in service-based design is to clarify its goals. If the goals are not achieved, or negative results are generated, we need to consider whether the design is reasonable.

Emphasis on Individual Services and Limited Communication

Granularity Selection

High Fault Tolerance

Cache Design

Caching is Not a Silver Bullet

Whether you are designing a new system or refactoring an old system, do not rely on data caching as the first choice for solving performance problems. Otherwise, you will not be able to see the problems that exist on other levels. When I worked in an enterprise-oriented software company, I did basic service design with the company’s chief architect. He did not allow me to consider any cache design during initial system design. This left a deep impression. Adding caches to improve performance is a tactical method. However, strategically, we need to perform a comprehensive evaluation from multiple perspectives to systematically solve the problems.

Avalanche and Penetration

Avalanche and penetration must be considered when data caching is introduced. First, we need to consider the downgrade policies and specific downgrade plans at the service level. At the technical level, we need to take some details into consideration, such as cache availability, persistence of cached data, necessity of a cache push mechanism, and the discrete design of the expiration time.
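Two of the technical details above can be sketched in a few lines. The following in-memory cache is a minimal illustration, not a production design: the class name, parameters, and default values are all assumptions. It spreads expiration times randomly around a base TTL so that entries loaded together do not all expire together (the avalanche case), and it caches "not found" results for a short time so that repeated lookups of nonexistent keys do not all reach the backing store (the penetration case).

```python
import random
import time

_MISSING = object()  # sentinel stored for keys that do not exist in the backing store


class JitteredCache:
    """Illustrative read-through cache with randomized TTLs and negative caching."""

    def __init__(self, loader, base_ttl=300.0, jitter=0.2, missing_ttl=30.0,
                 clock=time.monotonic):
        self._loader = loader            # callable: key -> value, or None if absent
        self._base_ttl = base_ttl        # nominal lifetime of a cached value, in seconds
        self._jitter = jitter            # 0.2 means each TTL varies by up to +/-20%
        self._missing_ttl = missing_ttl  # shorter lifetime for "not found" markers
        self._clock = clock
        self._store = {}                 # key -> (value, expires_at)

    def _randomized(self, ttl):
        # Spread expirations so entries written together do not expire together.
        return ttl * random.uniform(1 - self._jitter, 1 + self._jitter)

    def get(self, key):
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            value = entry[0]
            return None if value is _MISSING else value

        value = self._loader(key)
        if value is None:
            # Remember the miss briefly so repeated lookups skip the backing store.
            self._store[key] = (_MISSING, now + self._randomized(self._missing_ttl))
            return None
        self._store[key] = (value, now + self._randomized(self._base_ttl))
        return value
```

This sketch does not cover cache availability, persistence, or push mechanisms, and it does not replace the service-level downgrade plan mentioned above. If the cache itself is unavailable, the backing store still needs protection by other means.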

Internal System Optimization

Asynchronous Transformation

Database or Table Partitioning

If you are developing a new system, we do not recommend that you perform database or table partitioning at the initial stage of development, unless the service depends on a large amount of data, because database and table partitioning complicate system design and development to a certain extent. If database or table partitioning is performed at the very beginning and the service developers have little experience, this can turn simple challenges into complex problems.

For example, primary key generation becomes a design question of its own, because several mechanisms are available. The most commonly used and discussed mechanisms are Snowflake, which places requirements on the system time, and TDDL generation policies, which can fulfill core requirements concerning performance, global uniqueness, and single-database incremental scale-out.
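To show why Snowflake depends on the system time, here is a minimal sketch of a Snowflake-style generator. It assumes the commonly described 64-bit layout (41-bit millisecond timestamp, 10-bit worker ID, 12-bit sequence). The class name, the custom epoch of 2020-01-01 UTC, and the injectable clock are assumptions made for illustration, and the sketch does not reflect how TDDL generates keys.

```python
import threading
import time


class SnowflakeGenerator:
    """Illustrative Snowflake-style ID generator (not a reference implementation)."""

    WORKER_BITS = 10
    SEQUENCE_BITS = 12
    MAX_WORKER = (1 << WORKER_BITS) - 1
    MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1

    def __init__(self, worker_id, epoch_ms=1_577_836_800_000, clock_ms=None):
        # epoch_ms defaults to 2020-01-01T00:00:00Z, an arbitrary assumed epoch.
        if not 0 <= worker_id <= self.MAX_WORKER:
            raise ValueError("worker_id out of range")
        self._worker_id = worker_id
        self._epoch_ms = epoch_ms
        self._clock_ms = clock_ms or (lambda: time.time_ns() // 1_000_000)
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = self._clock_ms()
            if now < self._last_ms:
                # IDs embed the timestamp, so a clock that moves backwards
                # could produce duplicates. Fail loudly instead.
                raise RuntimeError("clock moved backwards; refusing to generate IDs")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & self.MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted for this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - self._epoch_ms) << (self.WORKER_BITS + self.SEQUENCE_BITS))
                | (self._worker_id << self.SEQUENCE_BITS)
                | self._sequence
            )
```

Within one generator, IDs are unique and increasing as long as the clock never goes backwards. Across generators, uniqueness relies on each instance receiving a distinct worker ID, which has to be coordinated separately.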

After database or table partitioning is performed, cross-database and cross-table queries must be considered and avoided as far as possible. However, for an ordering service, we need to split the data along different dimensions for users and for sellers. For example, we can use the user as the shard key of the primary database and the seller as the shard key of the secondary database. Complex multi-condition queries in the operation management background will run slowly in single databases even if database or table partitioning is performed. Elasticsearch can be used for queries over non-massive data, and Elasticsearch+HBase can be used for queries over massive data.
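The idea of sharding the same order along two dimensions can be sketched as a routing function. Everything named here is hypothetical: the table name prefixes, the shard counts, and the choice of CRC32 modulo as the routing rule are assumptions for illustration only. Real middleware would normally supply its own routing rules.

```python
import zlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a shard key to a shard index in a stable, process-independent way.

    zlib.crc32 is used instead of the built-in hash(), because hash() of a
    string is randomized per process by default and would route
    inconsistently across service instances.
    """
    return zlib.crc32(key.encode("utf-8")) % shard_count


def route_order(order: dict, user_shards: int = 8, seller_shards: int = 4) -> dict:
    """Return the hypothetical tables an order is written to.

    The primary copy is sharded by user, so "my orders" queries hit one shard.
    The secondary copy is sharded by seller, so seller-side queries do too.
    """
    return {
        "primary": f"orders_by_user_{shard_for(order['user_id'], user_shards)}",
        "secondary": f"orders_by_seller_{shard_for(order['seller_id'], seller_shards)}",
    }


if __name__ == "__main__":
    print(route_order({"user_id": "u-1001", "seller_id": "s-42"}))
```

With this layout, both the buyer-side and the seller-side query stay within a single shard. The cost is that every order is stored twice, and the two copies have to be kept consistent.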

System Evaluation

During system evaluation, we rate the degree to which the system satisfies each of the preceding six indicators as very unsatisfactory, unsatisfactory, or satisfactory, and combine these ratings with actual service conditions to analyze the system's shortcomings. In addition, it is important to note that these dimensions are not mutually independent. When the system is refactored based on one dimension, the impact on other dimensions must be considered.

It is not easy to refactor a system, but the difficulty does not lie in solving specific problems. It is a systematic project. The biggest challenge is how to select the optimal solution after considering a wide range of solutions and specific factors.
