Economical Comparison of AWS CPUs for MySQL (ARM vs Intel vs AMD)

It is always hard to select a CPU for your own purposes. You could spend hours reviewing benchmarks, reviews, and blog posts, and in the end the requirements come down to performance and price. Performance already has well-defined metrics (from clock speed in MHz to the results of a specific benchmark tool), but economic comparison is much harder. Mostly we are limited by our budget; for personal purposes, only by the cash in our pockets. Comparing two or three CPUs is easy: take their prices and performance, draw a simple bar plot, and check the results. But what do you do with at least three types of CPU, different numbers of CPU cores on board, and seven different scenarios? It was a challenge to do this for performance, and for economic efficiency it became a nightmare. A one-time purchase should be easier to evaluate than a long-term rental (which is what we do when renting CPUs on AWS).

Since October 2021, there have been three performance reviews for CPUs for MySQL (mostly it was comparing ARM with others):

Comparing Graviton (ARM) Performance to Intel and AMD for MySQL (Part 1)

Comparing Graviton (ARM) Performance to Intel and AMD for MySQL (Part 2)

Comparing Graviton (ARM) Performance to Intel and AMD for MySQL (Part 3)

I thought it was hard to visualize multiple scenarios for multiple processor types and multiple CPU counts. The real challenge appeared when I needed to compare the economic efficiency of these CPUs. There were four attempts to write this article. As a professional, I first wanted to show all the numbers and details, because I didn’t want to be subjective and would rather let readers decide for themselves. That variant of the article became too long and unreadable, so I decided to simplify it and present it without all the details (readers can find all the detailed graphs and numbers on our GitHub: arm_cpu_comparison_m5, csv_file_with_all_data_m5, arm_cpu_comparison_c5, csv_file_with_all_data_c5, arm_cpu_comparison_m6, csv_file_with_all_data_m6).

The main goal of this post is to give a general picture of the economic efficiency of different CPUs for MySQL relative to each other (AWS only), alongside their performance. It should help readers see alternatives to their existing solution and check whether they could save some money by using a similar EC2 instance with a different CPU. It should also be useful for anyone planning a migration or long-term infrastructure.

This post contains a lot of technical information based on a large amount of data. It tries to show the efficiency of all instances from the previous test runs, so it compares the m5.* (Intel), m5a.* (AMD), m6g.* (Graviton), c5.* (Intel), c5a.* (AMD), c6g.* (Graviton), m6i.* (Intel), and m6a.* (AMD) instance types.

In general, there could be a lot of questions about methodology and comparison approach, and I would be glad to hear (read) other opinions and try to do it much better and more efficiently. 

The main idea was to find which EC2 instance (with which CPU type) is more economical to use and more effective from a performance point of view for MySQL. For this purpose, we show two main indicators:

How many queries we could run during one hour (because all of us pay hourly for EC2 instances and because AWS shows hourly prices).
How many queries could be run for one US dollar. Some universal economic value was needed, and performance per one USD seemed universal enough for economic comparison.

All other conclusions follow from these indicators.
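As a rough illustration of how the two indicators relate to the raw numbers, here is a minimal Python sketch. It assumes that the average queries per second (avg_qps in the appendix table) is sustained for a whole billing hour and that price_usd is the hourly price; the function names are hypothetical and are not taken from the scripts on GitHub.

```python
SECONDS_PER_HOUR = 3600


def queries_per_hour(avg_qps):
    """Queries executed in one billing hour at a sustained average QPS."""
    return avg_qps * SECONDS_PER_HOUR


def queries_per_usd(avg_qps, price_usd_per_hour):
    """Queries executed for one US dollar of instance time."""
    return queries_per_hour(avg_qps) / price_usd_per_hour


# m6a.4xlarge row from the appendix: 211681 avg_qps at 0.6912 USD per hour.
per_hour = queries_per_hour(211681)
per_usd = queries_per_usd(211681, 0.6912)
print(f"{per_hour:,.0f} queries per hour, {per_usd:,.0f} queries per USD")
```

For this row the sketch gives roughly 762 million queries per hour and about 1.1 billion queries per USD.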

The next few points describe the testing approach and our tests:

DISCLAIMER
Tests were run on M5.*, M6i.*, C5.* (Intel), M5a.*, C5a.*, M6a.* (AMD), and M6g.*, C6g.* (Graviton) EC2 instances in the US-EAST-1 region (see the list of EC2 instances in the appendix). We selected only instances of the same class without any additional upgrades like M5n (network) or M5d (with fast SSD). The main goal was to calculate price efficiency with only one variable: CPU type.
Monitoring was done with Percona Monitoring and Management.
OS: Ubuntu 20.04 LTS
The load tool (sysbench) and the target DB (MySQL) were installed on the same EC2 instance to exclude any network impact. This could have a minimal impact on performance results, but all instances were in the same conditions, so the results are comparable.
Oracle MySQL Community Server 8.0.26-0, installed from official packages (from the Ubuntu repositories).
Load tool: sysbench 1.0.18
innodb_buffer_pool_size=80% of available RAM
Test duration was five minutes for each thread count, followed by a 90-second cool-down before the next iteration.
Each test was run at least three times (to smooth outliers and get more reproducible results). The results were then averaged for the graphs.
We use the term “high-concurrency” for scenarios where the number of threads is greater than the number of vCPUs, and “low-concurrency” for scenarios where the number of threads is less than or equal to the number of vCPUs on the EC2 instance.
We were comparing MySQL behavior on the same class of EC2 instances, not CPU performance. This time we just want to know how many queries could be run for one US dollar and during one billing hour.
The post is not sponsored by any external company. It was produced using only Percona resources. We do not control which CPUs AWS uses in its instances; we only work with what it offers.
Some graphs are simplified to make them easier to understand, because there are too many conditions to visualize in one graph. Each simplified graph is noted as such.
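The high-concurrency and low-concurrency definition from the list above can be written down in a few lines. This is only a sketch: the function name is hypothetical, and the 16 vCPU instance is just an example.

```python
def concurrency_scenario(threads, vcpus):
    """Classify a run by the definition used in this post."""
    return "high-concurrency" if threads > vcpus else "low-concurrency"


# Thread counts used in the test runs, applied to a 16 vCPU instance.
for threads in (2, 4, 8, 16, 32, 64, 128):
    print(threads, concurrency_scenario(threads, vcpus=16))
```

With 16 vCPUs, runs with 2 to 16 threads count as low-concurrency, and runs with 32, 64, and 128 threads count as high-concurrency.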

 

TEST case description

Prerequisite:
To use only the CPU (without disk and network), we decided to run only read queries served from memory. To do this, we took the following steps.

Create a DB with 10 tables of 10,000,000 rows each, using the sysbench tool:
sysbench oltp_read_only --threads=10 --mysql-user=sbtest --mysql-password=sbtest --table-size=10000000 --tables=10 --db-driver=mysql --mysql-db=sbtest prepare
Load all data into memory (the buffer pool) with read queries, using sysbench:
sysbench oltp_read_only --time=300 --threads=10 --table-size=1000000 --mysql-user=sbtest --mysql-password=sbtest --db-driver=mysql --mysql-db=sbtest run

Test:
Run the same scenario in a loop with different numbers of concurrent threads, THREAD (2, 4, 8, 16, 32, 64, 128), on each EC2 instance (again using the sysbench tool):
sysbench oltp_read_only --time=300 --threads=${THREAD} --table-size=100000 --mysql-user=sbtest --mysql-password=sbtest --db-driver=mysql --mysql-db=sbtest run
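The loop itself is not shown in the post, so the following Python sketch only illustrates one way it could be organized. It builds the sysbench command line for each thread count using exactly the options from the command above, and adds the 90-second pause mentioned in the disclaimer. The function names and the use of sleep are assumptions; the real scripts on GitHub may be organized differently.

```python
import shlex

THREADS = [2, 4, 8, 16, 32, 64, 128]
RUN_SECONDS = 300    # five minutes per thread count
PAUSE_SECONDS = 90   # pause before the next iteration


def sysbench_run_command(threads):
    """Argument list for one read-only run with the given thread count."""
    return [
        "sysbench", "oltp_read_only",
        f"--time={RUN_SECONDS}",
        f"--threads={threads}",
        "--table-size=100000",
        "--mysql-user=sbtest",
        "--mysql-password=sbtest",
        "--db-driver=mysql",
        "--mysql-db=sbtest",
        "run",
    ]


def plan():
    """Shell lines for the whole loop: one run plus one pause per thread count."""
    lines = []
    for threads in THREADS:
        lines.append(shlex.join(sysbench_run_command(threads)))
        lines.append(f"sleep {PAUSE_SECONDS}")
    return lines


print("\n".join(plan()))
```

The sketch only prints the commands; to execute them, each argument list could be passed to subprocess.run on a host where sysbench and MySQL are installed.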

 

Result Overview

We decided to visualize the results in lollipop graphs. These graphs show both variables: performance per hour and performance per dollar.

There are also simple point plots with several dimensions. They show not only point values but also the type of CPU (by color) and the number of vCPUs (by shape).

All CPU colors will be the same throughout this article. 

Graviton – orange
Intel – blue
AMD – red

To simplify the visualization, we kept only the results where the load (the number of active threads) was equal to the number of vCPUs on the EC2 instance. In most cases this gives the best results, due to minimal 95th percentile latency. However, there were a few exceptions, which we will discuss later.

Request Per Hour vs. Price For Equal Load

First, let’s review the simple dependency between price and performance in plot 1.1.

Plot 1.1. Number of requests per hour compared to EC2 instance price

Plot 1.2. Number of requests per hour compared to EC2 instance price with EC2 labels

Plot 1.2. illustrates the same information as plot 1.1. with EC2 labels.

Request Per Dollar vs. Price

Plot 2.1. Number of requests per one USD compared with instance price for equal load

Plot 2.1 shows the approximate number of transactions that could be generated for one USD, that is, how many queries MySQL could execute for one USD. It gives a much more interesting picture than plot 1.1. The best economic performance comes from 16-core EC2 instances, followed by 32-core Intel and AMD, and then eight-core AMD. What is interesting here is that, for one USD, a two-core AMD instance could execute slightly more queries than a 64-core Intel instance. A 64-core Intel instance can definitely execute more queries in an hour, but it is not always necessary to do it all within one hour.

Plot 2.2. Number of requests per one USD compared with instance price for equal load, with EC2 instance labels

RATING

Plot 3.1. All EC2 instances sorted by the approximate number of transactions they could generate during one hour

Plot 3.1 illustrates two variables: the number of transactions each EC2 instance could generate in one hour (purple circle) and the number of transactions it could execute for one USD. This rating is sorted by performance. Unsurprisingly, the more virtual cores an EC2 instance has, the more transactions it can generate. At the top are m6i.16xlarge (Intel) and m6a.16xlarge (AMD), the latest “general-purpose” instances. What is interesting here is that third and fourth place go to the same Graviton vCPU in two different instance types: c6g.16xlarge (third place) and m6g.16xlarge. It looks like a “compute-optimized” instance really has some internal optimization, because on average it showed better performance than the general-purpose Graviton instance.

Plot 3.2. All EC2 instances sorted by the approximate number of transactions they could generate for one USD

Plot 3.2 illustrates the same two variables: the number of transactions each EC2 instance could generate in one hour (purple circle) and the number of transactions it could execute for one USD. This rating is sorted by economic efficiency. And here we got a surprise: the best economic efficiency belongs to EC2 instances with 16 and 32 vCPUs on board. At the top is m6a.4xlarge (AMD, 16 vCPU), which is a little more economically efficient than m6i.4xlarge (Intel, 16 vCPU), although Intel delivered slightly higher performance. In third place, m6i.8xlarge (Intel, 32 vCPU) was a little less economically efficient than the instance in second place. Graviton is only in fourth place. However, these results are valid only when the load was equal to the number of vCPUs. This is important because Intel and AMD behave very differently from Graviton as the load changes: in most cases, Intel and AMD reached maximum performance when the load was equal to the number of vCPUs (additional visualization is provided in plot 5.1 and plot 5.2).
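Both ratings can be reproduced approximately from the appendix table. The Python sketch below uses only a handful of rows copied from that table (equal load) and assumes queries per USD is computed as avg_qps * 3600 / price_usd; the variable names are hypothetical.

```python
SECONDS_PER_HOUR = 3600

# (instance, cpu_type, avg_qps, price_usd_per_hour), copied from the appendix.
ROWS = [
    ("m6a.large", "AMD", 23280, 0.0864),
    ("m6a.4xlarge", "AMD", 211681, 0.6912),
    ("c6g.4xlarge", "Graviton", 156914, 0.544),
    ("m6i.4xlarge", "Intel", 225371, 0.768),
    ("m6i.8xlarge", "Intel", 443327, 1.536),
    ("m6i.16xlarge", "Intel", 803180, 3.072),
]


def queries_per_usd(avg_qps, price_usd_per_hour):
    """Queries executed for one USD, assuming avg_qps is sustained for an hour."""
    return avg_qps * SECONDS_PER_HOUR / price_usd_per_hour


# Rating sorted by performance (as in plot 3.1).
by_performance = sorted(ROWS, key=lambda r: r[2], reverse=True)
# Rating sorted by economic efficiency (as in plot 3.2).
by_efficiency = sorted(ROWS, key=lambda r: queries_per_usd(r[2], r[3]), reverse=True)

for name, cpu, qps, price in by_efficiency:
    millions = queries_per_usd(qps, price) / 1e6
    print(f"{name:<13} {cpu:<9} {millions:8.1f} M queries per USD")
```

On these rows the efficiency order is m6a.4xlarge, m6i.4xlarge, m6i.8xlarge, c6g.4xlarge, which matches the rating described above, and the two-core m6a.large lands slightly ahead of the 64-core m6i.16xlarge per USD.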

How I Would Select a CPU For The Next Project

What follows is not an official recommendation, just the opinion of a person who got stuck in the performance data of these tests and spent a few months there.

To select a vCPU for MySQL, I would rely on my previous research. First, I would focus on the load: how many transactions per second (or per hour) my DB should handle. After that, I would select the cheapest EC2 instance for that load.

For example, say my DB should handle 500 million transactions per hour. In this case, I would build a graph of the cheapest instances from the different CPU vendors and then simply select the cheapest one.

Plot 4.1. Cheapest EC2 instances that can handle 500 million transactions per hour

Plot 4.1 shows the cheapest instances that could handle 500 million transactions per hour. These results were reached by overloading the system eight times (eight times more threads than vCPUs). EC2 instances with 16 vCPUs could handle this load, and they could handle it easily even with 128 active threads. We are talking only about read transactions right now. And we are talking about an hour because most of us are oriented toward the hourly price on AWS, so the load should be expressed per hour too, even if it is not constant during the hour.

However, let’s review the same example for load in transactions per second. The approach is the same: take your load and find the cheapest instance that could handle it. For example, let’s take a load of 10,000 transactions per second (a kind reminder that we are talking about read transactions).

Plot 4.2. Cheapest EC2 instances that can handle 10,000 transactions per second

Plot 4.2 shows that 10k transactions per second could be handled by two-vCPU compute-optimized instances: c5.large (Intel), c5a.large (AMD), and c6g.large (Graviton). Again, Graviton is the cheapest: by the hourly prices in the appendix (0.068 USD against 0.085 USD and 0.077 USD), it is about 20 percent cheaper than Intel and about 12 percent cheaper than AMD.
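The selection approach can be sketched in a few lines of Python. The rows are the two-vCPU instances from the appendix table (equal load), and the sketch treats avg_qps as the load an instance can handle, which is a simplifying assumption. The function name is hypothetical.

```python
# (instance, cpu_type, avg_qps, price_usd_per_hour), copied from the appendix.
ROWS = [
    ("c5a.large", "AMD", 19287, 0.077),
    ("m5a.large", "AMD", 12581, 0.086),
    ("m6a.large", "AMD", 23280, 0.0864),
    ("c6g.large", "Graviton", 17523, 0.068),
    ("m6g.large", "Graviton", 17782, 0.077),
    ("c5.large", "Intel", 19751, 0.085),
    ("m5.large", "Intel", 17836, 0.096),
    ("m6i.large", "Intel", 23012, 0.096),
]


def cheapest_per_cpu_type(rows, required_qps):
    """For each CPU type, pick the cheapest instance that reaches required_qps."""
    best = {}
    for name, cpu, qps, price in rows:
        if qps < required_qps:
            continue  # this instance cannot handle the load
        if cpu not in best or price < best[cpu][1]:
            best[cpu] = (name, price)
    return best


result = cheapest_per_cpu_type(ROWS, 10_000)
for cpu, (name, price) in sorted(result.items(), key=lambda item: item[1][1]):
    print(f"{cpu:<9} {name:<10} {price} USD per hour")
```

For a load of 10,000 per second this picks c6g.large, c5a.large, and c5.large, the same three instances named for plot 4.2.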

A short table can be found in the appendix, and the full one (with all scenarios) on GitHub.

But if someone doesn’t want to build a graph or analyze the table, I’ve built heatmap-like graphs: plot 4.3 (transactions per hour) and plot 4.4 (transactions per second).

The next plots show the cheapest EC2 instance that could handle a given load (on the y-axis) within each class of instances, depending on the number of vCPUs (on the x-axis). Color identifies the type of CPU, and each cell is labeled with the cheapest EC2 instance. Of course, other instances could also handle that load, but the cell is labeled with the cheapest one.

A short summary of plot 4.3 and plot 4.4: Graviton is the cheapest solution in most cases; however, it can’t handle the maximum load that Intel or AMD can.

Plot 4.3. The cheapest EC2 instances for a particular load in transactions per hour, depending on the number of vCPUs

Plot 4.4. The cheapest EC2 instances for a particular load in transactions per second, depending on the number of vCPUs

In case someone wants to identify the cheapest instance for a particular load and doesn’t care about the number of vCPUs on board, welcome to plot 4.4.1, which shows the cheapest EC2 instance for a given load when the number of active threads was equal to the number of vCPUs on board.

Plot 4.4.1. Cheapest EC2 instance for required load with a load that was equal to the number of vCPU on an instance

Plot 4.4.2. Cheapest EC2 instance for required load with a load that was maximal during research

There are not many differences between plot 4.4.1 and plot 4.4.2, although in some cases Intel overtook AMD. In the overall picture, Graviton is still cheaper in most cases.

Important Exceptions

Next, there are a few exceptions that need to be discussed. The plots and scenarios were taken from the particular research runs they belong to, so they may not match the pictures above exactly. All details will be provided.

Plot 5.1. Graviton behavior on higher load

Plot 5.1 illustrates that Graviton (m6g.16xlarge with 64 vCPUs) showed better performance at higher loads. All previous results were shown for loads equal to the number of vCPUs, and most CPUs did not show impressive performance with loads greater than the number of vCPUs. Graviton, on the other hand, showed (most of the time) better performance under higher load than under equal load. An example can be seen in the first two lines of plot 5.1. This is a very interesting feature of Graviton, and it is reproducible. Plot 5.1.1 shows that Graviton EC2 instances with 16, 32, and 64 vCPUs on board produce more transactions under double load than under equal load. In percentage terms, overloading Graviton instances gives an additional boost of about 10 percent, while for other CPUs the difference is small enough to be statistical error.

Plot 5.1.1. Performance comparison of high-performance EC2 instances with an equal and double load

Plot 5.1.2. Advantage of high-concurrency instances with double load over equal load, in percent
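The percentage in plot 5.1.2 is presumably the relative gain of the double-load result over the equal-load result. A minimal Python sketch of that calculation follows. The two input numbers are purely hypothetical placeholders, not measured values; the real double-load data is in the CSV files on GitHub.

```python
def double_load_gain_percent(qps_equal_load, qps_double_load):
    """Relative gain (in percent) of the double-load run over the equal-load run."""
    return (qps_double_load - qps_equal_load) * 100 / qps_equal_load


# Hypothetical numbers for illustration only (not measured values).
example_equal = 500_000
example_double = 550_000
print(f"{double_load_gain_percent(example_equal, example_double):.1f}% gain")
```

A positive value means the instance produced more transactions when overloaded; a value near zero means double load made no real difference.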

Plot 5.2. Economical efficiency of 8 and 16 cores EC2

The next interesting exception is shown in plot 5.2. Across different loads (not only when the load was maximal or equal to the number of vCPUs), Graviton also showed the best economic efficiency compared with all other vCPUs. In plot 5.2 I left only the results with maximal load, and we can see that Graviton had the best economic potential. What is more interesting is that all EC2 instances with 8 and 16 vCPUs on board were at the top of this rating. It looks like it is more economically effective to use 8- or 16-core instances than others. If a load of around 200k read transactions per second is enough for you, instances with no more than 16 cores are the best economic value (see plot 4.4.2).

Plot 5.4. Economical efficiency of 12 core Intel vs 16 core Graviton and AMD

Sometimes an instance with fewer vCPUs can handle a particular load, but even then it could be more expensive than an EC2 instance with more vCPUs. Plot 5.4 shows such an example. The load here is the maximum that all three CPU types could handle: over 2.1 billion transactions per hour. The EC2 instances that could handle it are c6g.16xlarge (Graviton, 64 vCPU), m6i.12xlarge (Intel, 48 vCPU), and m6a.16xlarge (AMD, 64 vCPU). Here the AMD instance turned out to be the most expensive of the three. Next is Intel, with fewer vCPUs on board (48 compared with AMD’s 64) and a lower price. However, the Graviton instance with 64 cores on board could handle the same load while being cheaper than the Intel instance with fewer vCPUs. A few conclusions can be drawn:

The number of CPUs does not always correlate with higher performance
It is possible to find a better price and better conditions  

Final Thoughts

I’ve spent a few weeks preparing a script to run benchmark tests. It took a week to run and re-run all the benchmark tests. And it took months to write this article. Multiple attempts to describe everything led me to this short article with a lot of limitations and ranges. It is really difficult to speak about difficult things in simple terms. It is easy to compare one dimension, but it is harder to compare multiple dimensions, like performance for different CPU types that depends on different numbers of vCPUs and different test cases. And even that is easier, because previously we compared performance to performance. This time it was necessary to compare multi-dimensional performance with economic efficiency and prices. It is like comparing the calories of different fruits with their prices and identifying the best one, without thinking about personal tastes.

This task became quite difficult for me personally. However, I think it is a good starting point for a discussion. I started by thinking about a single comparable measurement, the number of transactions per one USD. Based on this measurement, EC2 instances with Graviton CPUs are the most effective in most cases. Graviton did not match the performance of the latest Intel and AMD, but if you combine economy and performance, it is definitely a good choice to try in future DB projects.

PS: On our GitHub, there are scripts to reproduce this research and more interesting graphs that couldn’t be inserted here.

APPENDIX

Simplified table with results and list of EC2 that were used in research

 

VM_type        Number_of_threads  cpu_amount  avg_qps  price_usd  cpu_type
c5a.large      2                  2           19287    0.077      AMD
m5a.large      2                  2           12581    0.086      AMD
m6a.large      2                  2           23280    0.0864     AMD
c5a.xlarge     4                  4           29305    0.154      AMD
m5a.xlarge     4                  4           21315    0.172      AMD
m6a.xlarge     4                  4           37681    0.1728     AMD
c5a.2xlarge    8                  8           81575    0.308      AMD
m5a.2xlarge    8                  8           58396    0.344      AMD
m6a.2xlarge    8                  8           98622    0.3456     AMD
c5a.4xlarge    16                 16          158539   0.616      AMD
m5a.4xlarge    16                 16          113172   0.688      AMD
m6a.4xlarge    16                 16          211681   0.6912     AMD
m5a.8xlarge    32                 32          189879   1.376      AMD
m6a.8xlarge    32                 32          376935   1.3824     AMD
c5a.16xlarge   64                 64          482989   2.464      AMD
m5a.16xlarge   64                 64          312920   2.752      AMD
m6a.16xlarge   64                 64          612503   2.7648     AMD
c6g.large      2                  2           17523    0.068      Graviton
m6g.large      2                  2           17782    0.077      Graviton
c6g.xlarge     4                  4           30836    0.136      Graviton
m6g.xlarge     4                  4           31415    0.154      Graviton
c6g.2xlarge    8                  8           61517    0.272      Graviton
m6g.2xlarge    8                  8           65521    0.308      Graviton
c6g.4xlarge    16                 16          156914   0.544      Graviton
m6g.4xlarge    16                 16          155558   0.616      Graviton
m6g.8xlarge    32                 32          298258   1.232      Graviton
c6g.16xlarge   64                 64          542983   2.176      Graviton
m6g.16xlarge   64                 64          534836   2.464      Graviton
c5.large       2                  2           19751    0.085      Intel
m5.large       2                  2           17836    0.096      Intel
m6i.large      2                  2           23012    0.096      Intel
c5.xlarge      4                  4           33891    0.17       Intel
m5.xlarge      4                  4           33937    0.192      Intel
m6i.xlarge     4                  4           40156    0.192      Intel
c5.2xlarge     8                  8           81039    0.34       Intel
m5.2xlarge     8                  8           68327    0.384      Intel
m6i.2xlarge    8                  8           86793    0.384      Intel
c5.4xlarge     16                 16          178295   0.68       Intel
m5.4xlarge     16                 16          162387   0.768      Intel
m6i.4xlarge    16                 16          225371   0.768      Intel
m5.8xlarge     32                 32          313932   1.536      Intel
m6i.8xlarge    32                 32          443327   1.536      Intel
m5.16xlarge    64                 64          483716   3.072      Intel
m6i.16xlarge   64                 64          803180   3.072      Intel

 

My.cnf

 

[mysqld]
ssl=0
performance_schema=OFF
skip_log_bin
server_id=7

# general
table_open_cache=200000
table_open_cache_instances=64
back_log=3500
max_connections=4000
join_buffer_size=256K
sort_buffer_size=256K

# files
innodb_file_per_table
innodb_log_file_size=2G
innodb_log_files_in_group=2
innodb_open_files=4000

# buffers
innodb_buffer_pool_size=${80%_OF_RAM}
innodb_buffer_pool_instances=8
innodb_page_cleaners=8
innodb_log_buffer_size=64M

default_storage_engine=InnoDB
innodb_flush_log_at_trx_commit=1
innodb_doublewrite=1
innodb_flush_method=O_DIRECT
innodb_file_per_table=1
innodb_io_capacity=2000
innodb_io_capacity_max=4000
innodb_flush_neighbors=0
max_prepared_stmt_count=1000000
bind_address=0.0.0.0

[client]
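The ${80%_OF_RAM} value in the config is a placeholder, and the post does not show how it was filled in. One possible way to compute it is sketched below in Python. The function names are hypothetical, the sysconf-based RAM lookup works on Linux only, and the 64 GiB figure is just an example input.

```python
import os


def total_ram_bytes():
    """Physical RAM of the current host (Linux), via sysconf."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def buffer_pool_size_mb(ram_bytes, fraction=0.8):
    """Whole megabytes corresponding to the given fraction of RAM."""
    return int(ram_bytes * fraction) // (1024 * 1024)


# Example for a host with 64 GiB of RAM; on a real host, pass total_ram_bytes().
example_ram = 64 * 1024 ** 3
print(f"innodb_buffer_pool_size={buffer_pool_size_mb(example_ram)}M")
```

The printed line can replace the placeholder in my.cnf.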