Local vs. Cloud: Which Is Better for Big Data?
By Zhao Yuying
When should an enterprise migrate to the cloud? In the past, the answer might have been: when it wants an alternative to buying new hardware to meet new business needs (mainly a cost consideration), or when it needs elastic computing and finds that a cloud platform is a good way to scale IT resources during business peaks.
Those answers are about replacing existing technology. Today there is a new answer: an enterprise migrates to the cloud when it needs to "extend" its technical boundary. An enterprise may consider the cloud when it requires a specific capability (such as AI or big data) but lacks adequate technical support, or when technology is not its core competency. This is in fact a major reason why many enterprises choose cloud platforms: the cloud extends their technical boundary and provides technical support for expanding their business boundary.
Over the past few years, big data services on the cloud have matured considerably. Mainstream cloud service providers now offer dozens of big data services alone. Local big data services seem to be losing share of the big data market, especially since the merger of Cloudera and Hortonworks. Some analysts say that the integration of Hadoop and computing engines such as Spark and Flink into cloud platforms has left the basic products of Cloudera and Hortonworks behind. Databricks, the commercial company founded by the creators of Spark, follows a software development philosophy different from that of Cloudera and Hortonworks and has stuck to a cloud subscription model, which has made it one of the leading companies in the industry. Does this mean that local big data services will soon disappear? Will big data services on the cloud ultimately converge on multi-cloud, hybrid cloud, or a single public cloud? And as clusters grow, will the cost of using the cloud become too high?
InfoQ is running a series of discussions on these questions, interviewing several technical experts in cloud computing and big data to find answers. The guest for this interview is Guan Tao, head of the Alibaba Cloud intelligent and universal computing platform.
Big Data Services on the Cloud Platform vs. Local Big Data Services
Before comparing big data services on the cloud with local big data services, let's consider a basic question: are enterprises of every size and technical level suited to migrating to the cloud? Enterprises must weigh this carefully before deciding to migrate. We often hear that cloud computing will one day be a utility like water, gas, and electricity. Although that claim remains to be proven, cloud adoption in China does need to accelerate. If cloud computing is to become essential infrastructure, it must meet the requirements of enterprises of all sizes and technical levels.
Internet start-ups are usually characterized by uncertain business models, scales, and computing scenarios, small data volumes, limited capital, and a lack of in-house data technology. According to Guan Tao, cloud computing services lower the barrier for start-ups to access big data services and let them focus their limited people and resources on the business layer, so that they can quickly build business services and scale elastically to cope with an uncertain future. For these customers, the key requirements are flexibility and comprehensiveness.
Medium and large Internet enterprises usually have their own clusters and data, relatively stable businesses or mature data teams, and SLA requirements. Their technical capabilities may already meet their business needs. For them, migrating to the cloud can reduce or eliminate operations and maintenance, guarantee SLAs, and improve security while offering performance, elasticity, and other capabilities at lower cost. For these customers, the key requirements are stability and cost.
Traditional enterprises usually have large data centers. They tend to be more cautious about migrating to the cloud, weighing many factors or requiring a complete set of solutions before they migrate. Their main concerns are the cost, stability, and security of cloud computing. For these customers, the key requirement is a complete solution offering.
Cloud computing itself is a field that requires huge investment. Although there are many cloud computing companies, only a few with vast capital and advanced technology stand out. Compared with local big data services, the performance, stability, cost, and security of big data services on the cloud have always been the focus of discussion. Guan Tao says that cloud service providers spend tens or even hundreds of billions of RMB on data center site selection and construction (which must account for factors such as power supply and capacity), hardware, network bandwidth (for example, independent dual-link networks), storage, CDN distribution, and security hardening. Small and medium enterprises clearly cannot afford capital and technology investment on this scale to build their own services.
Migrating to the cloud is easy for small and medium enterprises. They have few data assets and can benefit from the technical advantages offered by cloud service providers, and because their business volume is relatively small, the overall cost is very low. However, many people also believe that the cost of using cloud platforms becomes very high once a cluster reaches a certain size. It is true that cloud services are billed on a pay-as-you-go basis while the upfront hardware cost of a small data center can be very low. But enterprises that build their own services must also account for labor costs, which are usually overlooked.
On the software side, leading cloud service providers invest heavily in technology, and not only in capital for research and development. Most services provided by Alibaba Cloud must also prove their stability through long periods of successful operation before they are made available to users. Ordinary companies, especially small and medium enterprises, cannot afford investment on this scale in their own infrastructure. For them, differentiated advantages at the business layer matter far more than infrastructure.
For enterprises that already have local (on-premises) clusters, migrating to the cloud involves a relatively large workload and cost, including interconnecting networks and migrating data, jobs, and applications. The larger the on-premises footprint, the larger the workload, which makes this a major obstacle to cloud migration. According to Guan Tao, the migration can be carried out step by step, and enterprises benefit directly or indirectly from advances in cloud technology. Cloud service providers also offer many migration aids, such as data transfer and migration tools, and data upload and hybrid cloud technologies based on dedicated lines.
Security is enterprises' first and biggest concern about cloud computing. Although migrating to the cloud cannot eliminate security risks, cloud platforms provide much stronger security than self-built data centers. For example, an enterprise that builds its own data center and deploys a particular Linux version may fail to install some patches, which creates security risks. Guan Tao says that Alibaba Cloud has made huge security efforts and investments, including kernel vulnerability fixes, Anti-DDoS, proactive vulnerability scanning, permission management, and privacy protection.
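The cost argument in the paragraph above comes down to fixed versus per-node costs. The following Python sketch makes that trade-off concrete under stated assumptions: a self-built cluster pays amortized hardware and facilities per node plus a fixed labor cost, while the cloud charges a flat per-node monthly price. The cost model, the names (`SelfBuiltCost`, `break_even_nodes`, and so on), and all numbers are hypothetical illustrations, not figures from Guan Tao or any provider's price list.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class SelfBuiltCost:
    """Hypothetical monthly cost inputs for a self-built cluster (illustrative only)."""
    hardware_per_node: float     # purchase price of one node
    amortization_months: int     # months over which the hardware is written off
    facilities_per_node: float   # monthly power, cooling, and rack space per node
    engineers: float             # staff needed to run the cluster (assumed fixed)
    salary_per_engineer: float   # fully loaded monthly cost of one engineer


def self_built_monthly(nodes: int, c: SelfBuiltCost) -> float:
    """Monthly cost of running `nodes` machines in-house, labor included."""
    hardware = nodes * c.hardware_per_node / c.amortization_months
    facilities = nodes * c.facilities_per_node
    labor = c.engineers * c.salary_per_engineer
    return hardware + facilities + labor


def cloud_monthly(nodes: int, price_per_node_month: float) -> float:
    """Pay-as-you-go cost for the same capacity, assuming a flat per-node price."""
    return nodes * price_per_node_month


def break_even_nodes(c: SelfBuiltCost, price_per_node_month: float) -> Optional[int]:
    """Smallest cluster size at which self-building is no more expensive than the cloud.

    Returns None if the per-node self-built cost is already at or above the cloud
    price, in which case self-building never pays off under this model.
    """
    per_node_self = c.hardware_per_node / c.amortization_months + c.facilities_per_node
    margin = price_per_node_month - per_node_self
    if margin <= 0:
        return None
    fixed_labor = c.engineers * c.salary_per_engineer
    return math.ceil(fixed_labor / margin)


if __name__ == "__main__":
    # Made-up inputs in an arbitrary currency unit; not real prices.
    costs = SelfBuiltCost(hardware_per_node=36_000, amortization_months=36,
                          facilities_per_node=200, engineers=2, salary_per_engineer=20_000)
    cloud_price = 2_000
    for n in (10, 50, 200):
        print(n, self_built_monthly(n, costs), cloud_monthly(n, cloud_price))
    print("break-even nodes:", break_even_nodes(costs, cloud_price))
```

With these made-up inputs, a 10-node cluster costs 52,000 per month self-built versus 20,000 on the cloud, the two break even at 50 nodes, and at 200 nodes self-building is cheaper. Setting `engineers` to 0 moves the break-even point to zero nodes, which is how ignoring labor makes self-building look cheaper than it is. A realistic model would also need elasticity (paying for peak capacity only when it is used), migration effort, and staffing that grows with the cluster.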
Big Data Services from Cloud Providers vs. Self-Built Big Data Services on the Cloud
Some Internet enterprises want to migrate to the cloud but are unsure whether to use big data services from cloud providers or to build their own big data services on a cloud platform. These enterprises usually have built up some technical capabilities beyond their core business. Guan Tao points out that most Internet companies in the United States have abandoned self-built data centers and adopted cloud technologies on a large scale. For example, Netflix has migrated all of its business to the public cloud, and Cloudera and Hortonworks, two companies whose businesses centered on Hadoop distributions that customers deploy and manage themselves, have merged.
Judging from the overall trend, this is a process in which cloud services gradually mature and customer recognition gradually evolves, just as any new technology must prove itself in the market over a long period. Guan Tao believes that customer recognition evolves along these lines: from doubt (about security, stability, and other aspects) to tentative migration and then to large-scale reliance on the cloud; from data centers and managed hosting to IaaS and then to widespread use of serverless computing, PaaS, and SaaS; and from private cloud deployment to hybrid cloud and then to public cloud.
In addition, as big data and AI technologies mature, the competitive advantage of cloud providers has shifted from being "capable" to being "fast and good." Their scale lets cloud providers raise the competitive bar to a level that most self-built data services cannot reach.
Hybrid Cloud and Multi-Cloud Deployments
Hybrid cloud and multi-cloud deployments are only an intermediate stage; a single cloud platform may become the mainstream. Once enterprises decide to use big data services from cloud providers, a new problem arises: how can they choose, from the many big data services on the cloud, a platform that suits their business development requirements? Should they choose hybrid cloud, multiple clouds, or a single public cloud platform?
Last year, AWS, which had previously shown little interest in hybrid cloud, launched many hybrid cloud services. This led many enterprises to speculate that the future of cloud computing lies in hybrid cloud or multi-cloud. Guan Tao, however, believes that hybrid cloud and multi-cloud deployments are only an intermediate stage and that the final trend is a single public cloud platform. He adds that both involve cross-cloud management: users must manage one or more cloud systems alongside other systems and coordinate data and services across them, which makes hybrid cloud and multi-cloud deployments more complex than either a local platform or a single cloud platform.
Currently, local deployment, hybrid cloud (as an intermediate stage), and cloud deployment are all viable options, depending on user requirements. In the long term, as cloud platforms develop further and customer recognition evolves, a single platform may become the mainstream choice for reasons of cost and efficiency.
For public cloud platforms, enterprises' main concern is lock-in to a single provider. Once locked in, they find it difficult to migrate data and business, and their options for future development may be limited. Enterprises also need to consider whether the platform supports heterogeneous disaster tolerance, and whether a platform failure could have an irreversible impact on their business.
Guan Tao believes that the layered decoupling of cloud services is becoming increasingly clear and that basic service interfaces, such as containers and Kubernetes (K8s), will become more standardized. Such standardized services will greatly ease users' concerns about lock-in to a single cloud platform. In addition, mainstream cloud providers already offer a degree of heterogeneous disaster tolerance. For example, Alibaba Cloud's 3AZ solution ensures reliability across data centers and technically meets heterogeneous disaster tolerance requirements. Users who need the ultimate in disaster tolerance may choose hybrid cloud or multiple cloud providers, but this requires an additional layer of data management and business synchronization logic across platforms, which increases architectural complexity and cost. The choice ultimately depends on user needs, and this option is currently rare. In the financial database field, for example, enterprises rarely run two database solutions at the same time.
In the long term, Guan Tao believes that self-built local big data services will gradually disappear. Judging from how cloud computing has developed outside China, many large enterprises may choose to migrate to the cloud because they value upper-layer capabilities, such as big data and AI, more than IaaS capabilities. This suggests that users may need both IaaS capabilities and upper-layer capabilities.
Accordingly, Alibaba Cloud will improve its big data services in the following aspects:
- Keep improving performance, cost, scalability, and stability, which remain the technical keys of big data engines amid explosive data growth and the wide application of computing.
- Improve the ability to process non-text data, for example, by recognizing and processing audio, video, and image formats in various scenarios (such as short video recommendation).
- Improve the ability to process non-relational data, such as graph computing and graph embedding.
- Further develop AI for Big Data, for example, AI-based intelligent data management, intelligent modeling, and data optimization for large data volumes.
What do you think about the relationship between big data services on cloud platforms and local big data services? How does your enterprise decide?
This article is reposted from InfoQ (original article in Chinese).
Learn more about Alibaba MaxCompute at https://www.alibabacloud.com/product/maxcompute