Conway’s Law — A Theoretical Basis for the Microservice Architecture
Introduction
Microservice architecture has recently become both popular in industry and an active topic of research. However, what counts as a microservice implementation is still loosely defined, and there is no theoretical proof of its effectiveness.
You may be surprised to learn that the ideas behind microservices can be traced back to an article published over fifty years ago. Over the years, numerous studies have supported many of the points made in that article.
One of the fundamental concepts introduced in the article is Conway’s Laws. Although initially intended to point out the flaws of distributed teams, many organizations have applied Conway’s Laws to create efficient microservices architecture.
This article explores the ideas of Conway’s Laws with reference to the article titled “Conway’s Law under Remote Distance — Team Construction in a Distributed World”, written by Mike Amundsen (author of Design RESTful API).
The most famous line in Conway’s Law is:
“Organizations which design systems are constrained to produce designs which are copies of the communication structures of these organizations.” — Melvin Conway (1967).
In other words, the structure of a system tends to mirror the communication structure of the organization that built it. The following figure illustrates this concept.
The figure depicts the existing communication structure of the organizations, which coincides with their respective product development processes. Simply put, organization structure equals system design.
The systems the author had in mind are not restricted to software systems. The Harvard Business Review reportedly rejected the article, so Conway submitted it to a programming magazine, which led to the misconception that the article is only about software development. Conway himself did not present his ideas as laws; he only described his findings and conclusions. His ideas became the well-known Conway’s Law after The Mythical Man-Month, the famous book that introduced Brooks’ Law, cited some of his points.
Conway’s Law Demystified
In his article, Mike Amundsen summarizes Conway’s core viewpoints as the four laws below.
- Law 1
- Communication dictates design
o The mode of organizational communication is expressed through system design
- Law 2
- There is never enough time to do something right, but there is always enough time to do it over
o There is rarely enough time to get a task right the first time, but there is always time to redo it and improve it
- Law 3
- There is a homomorphism from the linear graph of a system to the linear graph of its design organization
o A structure-preserving mapping (a homomorphism) exists from the graph of a system to the graph of the organization that designed it
- Law 4
- The structures of large systems tend to disintegrate during development, qualitatively more so than small systems
o The structure of a large system is more likely to break down during development than that of a small one
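The homomorphism in Law 3 can be made concrete with a small check. The following is a minimal sketch, not Conway's own formalism: it assumes a system is described as a list of interfaces between components, an organization as a list of communication paths between teams, and a hypothetical `owner` mapping from each component to the team that designs it. All names here are illustrative. The mapping preserves structure if every interface between components owned by different teams corresponds to a communication path between those teams.

```python
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str]


def preserves_structure(
    system_edges: Iterable[Edge],
    org_edges: Iterable[Edge],
    owner: Dict[str, str],
) -> bool:
    """Return True if every component interface maps onto a team communication path.

    Interfaces between two components owned by the same team are always allowed,
    on the assumption that a team communicates freely with itself.
    """
    # Store communication paths as unordered pairs so direction does not matter.
    paths = {frozenset(edge) for edge in org_edges}
    for a, b in system_edges:
        team_a, team_b = owner[a], owner[b]
        if team_a == team_b:
            continue
        if frozenset((team_a, team_b)) not in paths:
            return False
    return True


# Hypothetical example: three components, three teams.
owner = {"web": "frontend", "orders": "backend", "db": "dba"}
org_edges = [("frontend", "backend"), ("backend", "dba")]

layered = [("web", "orders"), ("orders", "db")]
shortcut = layered + [("web", "db")]  # frontend and dba never talk to each other

print(preserves_structure(layered, org_edges, owner))
print(preserves_structure(shortcut, org_edges, owner))
```

In this sketch the layered design fits the organization, while the design with a direct web-to-database interface does not, because no communication path exists between the two owning teams.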
Conway’s First Law
“Human beings are complex social animals.”
Other fields also illustrate the tight relationship between organizational communication and system design. For a complex system, design always involves communication between human beings, and a good system design takes that communication into account. Many viewpoints in the 1975 classic The Mythical Man-Month resonate with this idea.
The most memorable line from The Mythical Man-Month is:
“Adding manpower to a late software project makes it later” — Fred Brooks, (1975)
Adding programmers to keep up with a tight schedule is a common pitfall for many organizations. While enlarging the workforce to increase output seems to make sense, it does not work that way in software development.
Why is this the case? The Mythical Man-Month provides a simple answer: communication cost grows quadratically as the number of people in a project or organization increases. A team of n people has n(n-1)/2 possible pairwise communication channels, so the communication cost is O(n²). The following example illustrates the idea of communication cost:
- For a project team with 5 members, the required number of communication channels is 5*(5-1)/2 = 10.
- For a project team with 15 members, the required number of communication channels is 15*(15-1)/2 = 105.
- For a project team with 50 members, the required number of communication channels is 50*(50-1)/2 = 1,225.
- For a project team with 150 members, the required number of communication channels is 150*(150-1)/2 = 11,175.
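The figures above can be reproduced with a few lines of Python. This is only a sketch of the n(n-1)/2 formula; the function name `communication_channels` is made up for illustration.

```python
def communication_channels(team_size: int) -> int:
    """Number of distinct pairs of people in a team: n(n-1)/2."""
    if team_size < 0:
        raise ValueError("team_size must be non-negative")
    # n(n-1) is always even, so integer division is exact.
    return team_size * (team_size - 1) // 2


for n in (5, 15, 50, 150):
    print(f"{n:>3} members -> {communication_channels(n):>6} channels")
```

A team that grows 30 times larger (from 5 to 150 members) ends up with more than 1,000 times as many channels, which is the quadratic growth the formula describes.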
This is one reason why internet startups tend to keep their teams small. If a startup has too many employees, it may exhaust its venture capital soon after the CEO has finished explaining his or her idea to everyone involved.
Another interesting and relevant theory is “Dunbar’s number”, put forward by anthropologist Robin Dunbar in 1992. Dunbar first found that the relative size of a primate’s neocortex correlates with the size of its social group. He then estimated the number of relationships that a human brain can maintain. For example, a typical person would have
- 5 intimate friends
- 15 trusted friends
- 35 close friends
- 150 casual friends
Don’t these numbers look related to the team sizes in the communication-cost example above? Our brains limit us to maintaining only that many relationships. (In a development team, the number may be even smaller.)
Communication issues lead to system design issues that affect the development efficiency of the entire system as well as the final results of product development.
Conway’s Second Law
“Rome was not built in a day. Address the issues that can be addressed first.”
Erik Hollnagel, a leading researcher in system safety and resilience engineering, makes similar points in his book The ETTO Principle: Efficiency-Thoroughness Trade-Off.
“Problem too complicated? Ignore details.
Not enough resources? Give up features.”
– Erik Hollnagel (2009)
System complexity, the number of functions, market competition, and investor expectations keep increasing, but human intelligence remains constant. No organization can be certain of finding enough talent, regardless of its capabilities and funds. In an extremely complex system, there will always be something that its operators overlook. Hollnagel believes that the best response is to just “let it be.”
We often encounter such issues during daily development tasks. Are the requirements raised by product managers too complex? If so, ignore some minor requirements and focus on the major ones first. Do the product managers have too many requirements? If yes, give up some functions.
Reportedly, Hollnagel was once invited by an airline to provide consulting services on the stability and safety of a flight system. He believes that safety can be pursued in two ways:
- To ensure ideal safety, people must detect and eliminate as many errors as possible.
- To ensure elastic safety, people must promptly handle errors that occur, for service recovery.
For a system as complex as a flight system, some vulnerabilities are likely to be overlooked, no matter how good the testers are. Therefore, Hollnagel recommended that the company drop the idea of building a perfect system. Instead, he recommended aiming for relative safety and correctness: the carrier carries out continuous flight tests to identify issues and ensures that the system can recover automatically from a fault. The following figure shows the different interpretations of safety.
Does this sound familiar? Doesn’t it mean continuous integration and agile development? Absolutely.
The same principle applies to the resilience of the distributed systems maintained by Internet companies. It is impossible to identify and fix all the bugs in a distributed system, even if unit tests cover the entire system. Distributed systems are prone to errors. The optimal solution is not to eliminate every issue, but to tolerate failures and recover from them automatically. In a system composed of microservices, any individual microservice may stop responding, and that is completely normal. We only need to ensure enough redundancy and backup, which is what resilience or high-availability design means.
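One simple way to tolerate a failed microservice instance is to fail over to a redundant replica. The sketch below assumes each replica is represented by a plain Python callable; `call_with_failover`, `AllReplicasFailed`, and the two stand-in replicas are hypothetical names for illustration and do not come from any particular framework. A production system would typically add timeouts, backoff, and health checks.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


class AllReplicasFailed(Exception):
    """Raised when every redundant replica has failed."""

    def __init__(self, errors: List[Exception]) -> None:
        super().__init__(f"all {len(errors)} replicas failed")
        self.errors = errors


def call_with_failover(replicas: Sequence[Callable[[], T]]) -> T:
    """Try each replica in order and return the first successful result."""
    errors: List[Exception] = []
    for replica in replicas:
        try:
            return replica()
        except Exception as exc:  # tolerate the failure and move on
            errors.append(exc)
    raise AllReplicasFailed(errors)


# Stand-ins for two instances of the same service.
def unhealthy_replica() -> str:
    raise ConnectionError("replica is not responding")


def healthy_replica() -> str:
    return "ok"


result = call_with_failover([unhealthy_replica, healthy_replica])
print(result)
```

The caller never sees the first replica's failure; it only fails if no replica at all can answer, which is the trade-off that redundancy buys.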
Conway’s Third Law
“Create independent subsystems to reduce the communication cost.”
The third law is a specific application of the relationship between an organization and its system design described by Conway’s first law. Simply put, set up a team suitable for the system that you want. If you have a front-end team, a Java back-end development team, a DBA team, and an O&M team, your system will look like the following:
Instead, if business boundaries create divisions in your system and all members turn their modules into small systems or products to address the same business goals, your larger system will look like a microservice architecture as shown in the following:
The idea of microservices among teams should be “inter-operate, not integrate.” To inter-operate means to define system boundaries and interfaces and to give each team ownership of its full stack so that it is completely autonomous. If teams are set up this way, most communication stays inside each subsystem, where it is cheap. The result is fewer inter-system dependencies and lower inter-system communication costs.
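As an illustration of “inter-operate, not integrate”, the sketch below has one team depend only on a published interface of another team's service, never on its internals. The inventory and order example, and every name in it, is hypothetical; in a real system the interface would more likely be an HTTP or messaging contract than a Python `Protocol`.

```python
from dataclasses import dataclass, field
from typing import Dict, Protocol


class InventoryService(Protocol):
    """The boundary published by the (hypothetical) inventory team."""

    def reserve(self, sku: str, quantity: int) -> bool:
        ...


@dataclass
class InMemoryInventory:
    """One possible implementation, fully owned by the inventory team."""

    stock: Dict[str, int] = field(default_factory=dict)

    def reserve(self, sku: str, quantity: int) -> bool:
        available = self.stock.get(sku, 0)
        if quantity <= 0 or quantity > available:
            return False
        self.stock[sku] = available - quantity
        return True


def place_order(inventory: InventoryService, sku: str, quantity: int) -> str:
    """Owned by the order team; it knows only the interface, not the storage."""
    return "confirmed" if inventory.reserve(sku, quantity) else "rejected"


inventory = InMemoryInventory(stock={"book": 3})
print(place_order(inventory, "book", 2))
print(place_order(inventory, "book", 2))
```

The inventory team can replace `InMemoryInventory` with any other implementation without coordinating with the order team, as long as `reserve` keeps its contract.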
Conway’s Fourth Law
“Divide and conquer.”
As mentioned above, human beings are complex social animals, and communication between people is complicated. When a system grows, we often add people to cope with its complexity. How, then, does an organization address the resulting communication issues? Divide and conquer. Look at your own company: isn’t it true that a first-line manager manages fewer than 15 people, a second-line manager manages fewer people than a first-line manager, a third-line manager manages fewer still, and so on? (I am not implying that it is more difficult to manage development managers than programmers.)
Therefore, a large organization is usually divided into small teams to reduce communication costs and management overhead. Here are some scenarios for you to consider.
- The idea to start a business is so great. Let us recruit more programmers. Anyway, the VC has offered us a large sum of money.
- There are too many people to manage. I need to find several managers to help and report to me.
Conway’s Law also tells us that an organization’s communication patterns can be read from its system design. Each manager is responsible for a small part of a larger system, which creates communication boundaries between the parts and the larger system. The larger system is thus made up of smaller systems, each owned by a small team (microservices suit this well).
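A rough calculation shows why dividing helps. The sketch below compares one flat team of 150 people with ten teams of 15, under the simplifying assumption (not stated in the original) that each team talks to every other team through a single point of contact or interface. The function names are illustrative.

```python
from typing import Sequence


def channels(n: int) -> int:
    """Pairwise communication channels among n parties: n(n-1)/2."""
    return n * (n - 1) // 2


def channels_when_divided(team_sizes: Sequence[int]) -> int:
    """Channels inside each team plus one channel per pair of teams.

    Assumes teams communicate with each other through a single contact
    or interface, which is a simplification.
    """
    within_teams = sum(channels(size) for size in team_sizes)
    between_teams = channels(len(team_sizes))
    return within_teams + between_teams


flat = channels(150)
divided = channels_when_divided([15] * 10)
print(f"one team of 150:  {flat} channels")
print(f"ten teams of 15:  {divided} channels")
```

Under these assumptions the divided organization needs 1,095 channels instead of 11,175, roughly a tenfold reduction.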
Conway’s Laws and Microservices
Let us have a look at how Conway’s Law provided the theoretical basis for microservices half a century ago.
- Communication between human beings is complex, and each person has a limited amount of energy for communication. Therefore, when a problem is complex and requires many people to work together, we need to divide the organization to improve communication efficiency.
- The design of the system that an organization builds depends on how its members communicate. Managers can change how the organization is divided to change the way teams communicate, which in turn influences the system’s design.
- If a subsystem is cohesive internally and has clear external communication boundaries, we can effectively reduce communication costs, and the corresponding design will be more appropriate and efficient.
- A complex system needs continuous optimization, supported by error tolerance and resilience. Do not expect a big, all-embracing design or architecture up front, because such systems are developed iteratively.
Here are some practical suggestions:
- Leverage all possible means to improve communication efficiency, such as Slack, GitHub, and wikis. Communicate with only the people involved. Each person and each system must have clear duties. You must know whom to turn to in case of an issue, to ensure accountability.
- Design the system as a minimum viable product (MVP), verify and optimize it iteratively, and ensure that it is resilient.
- Build teams that align with your system design, and streamline them if possible. Whenever possible, organize teams around business lines so that each team is autonomous and cohesive. Clarify team boundaries to reduce external communication costs. Each small team must be responsible for its module throughout the module’s entire life cycle. Prevent vague boundaries and the shifting of responsibility. Set up an “inter-operate, not integrate” relationship between teams.
- Keep teams small and efficient, as costs increase and efficiency decreases when the number of team members goes up. Jeff Bezos, the founder of Amazon, has a well-known rule of thumb: if two pizzas are not enough to feed a team, the team is too big. Typically, a small product team at an Internet company consists of 7 to 8 people. (These include people in charge of front-end development, back-end development, testing, interaction design, and user research. Some people may take on more than one role.)
When looking at the following microservice criteria, we can easily see the close relationship between microservices and Conway’s Law:
- Systems consisting of distributed services
- Organization division by business-line
- Development of excellent products, not projects
- Smart endpoints and dumb pipes (this refers to highly capable individuals and light communication efforts)
- Automatic O&M (DevOps)
- Error tolerance
- Rapid evolution
Conclusion
This article introduced Conway’s laws and explored whether they offer a theoretical explanation for the concept of microservices. It discussed the four laws and the application of each. The first law connects communication to system design. The second law holds that perfection is not attainable and should not be a reason to delay a task; instead, people should focus on delivering on time and improving regularly afterwards. The third law describes the homomorphism between the structure of a system and the structure of the organization that designs it. Finally, the fourth law motivates the “divide and conquer” approach to reducing the complexity and cost of communication within large enterprises.
References
- Conway’s Law under Remote Distance — Team Construction in a Distributed World (Images in this article are from the screenshots of the PPT)
- Conway’s Law (Wikipedia)
- Conway’s Law Homepage