Demonstrating distributed set processing performance

Shard-Query + ICE scales very well up to at least 20 nodes

This post is a detailed performance analysis of what I’ve coined “distributed set processing”.

Please also read this post's "sister post," which describes the distributed set processing technique itself.

Also, remember that Percona can help you get up and running with these tools in no time flat. We are the only ones with experience using them.

Twenty is the maximum number of nodes that I am allowed to create in EC2. I can test further on our 32-core system, but I wanted to run a real-world "cloud" test to show that this works over the network, in a real-world environment. There is a slight performance oddity at 16 nodes; I suspect the EC2 environment is the reason, but it requires further investigation.

Next, I compared the performance of 20 m1.large machines with a cold cache versus a warm cache. I did not record cold results on the c1.medium machines, so only the warm results are provided for reference. Remember that the raw input data set was 55GB before being converted to a star schema (21GB) and compressed by ICE to 2.5GB. Many of these queries examine the entire data set, computing origin/COUNT(DISTINCT dest) combinations across two dimensions (origin and destination), each with 400 unique values.
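To make the shape of these queries concrete, here is a minimal sketch of one, submitted through the standard mysql client; the proxy hostname, schema, table, and column names are illustrative assumptions rather than the exact ones used in the benchmark, and Shard-Query takes care of fanning the GROUP BY and COUNT(DISTINCT) work out across the storage nodes.

```bash
# A query of the general shape described above. "shard-query-proxy", "ontime",
# "ontime_fact", "origin", and "dest" are placeholder names for illustration.
mysql -h shard-query-proxy ontime -e "
  SELECT origin,
         COUNT(DISTINCT dest) AS destinations
    FROM ontime_fact
GROUP BY origin;"
```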

In the following chart, single-node performance is shown as the tall blue bars and 20-node performance as the short cyan bars. To avoid putting too many bars on the chart, response times for 2 through 16 nodes (inclusive) are shown as lines.

Concurrency testing is important too. I tested a 20-node m1.large system at 1, 2, 4, 8, 16, and 32 threads of concurrency.

Simple bash scripts were used to drive the concurrency test.
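The original scripts are not reproduced here, but a minimal sketch of such a driver might look like the following, assuming the queries are replayed through the standard mysql client; the hostname, schema name, and query file name are placeholders.

```bash
#!/bin/bash
# Minimal sketch of a concurrency driver, not the original benchmark script.
# Usage: ./concurrency_test.sh <threads> <query-file>
# Launches <threads> mysql clients in parallel, each replaying the same query
# file, and reports the total wall-clock time for the batch.

THREADS=${1:-1}               # 1, 2, 4, 8, 16 or 32 in the tests above
QUERY_FILE=${2:-queries.sql}  # placeholder name for the benchmark query file

start=$(date +%s)
for ((i = 1; i <= THREADS; i++)); do
  # "shard-query-proxy" and "ontime" are placeholder host and schema names
  mysql -h shard-query-proxy ontime < "$QUERY_FILE" > /dev/null &
done
wait                          # block until every client has finished
end=$(date +%s)

echo "$THREADS threads completed in $((end - start)) seconds"
```

Running it once for each thread count (1, 2, 4, 8, 16, 32) reproduces the shape of the test described above.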

Query processing is handled by a Gearman queue, which limits the maximum number of concurrent storage node queries. This prevents the system from being overloaded and keeps the average response time scaling predictably as concurrency increases. Another queue in front of the storage nodes is probably advisable.
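For illustration only (this is not Shard-Query's actual worker code): the throttling effect comes from running a fixed-size pool of Gearman workers, so at most that many jobs execute at once while the rest wait in the queue. A conceptual sketch using the stock gearman command-line tool, with placeholder host and function names, might look like this:

```bash
# Start a Gearman job server (normally already running on the coordinator).
gearmand -d

# Start a fixed pool of workers for a hypothetical "run_query" function.
# Because only $WORKERS workers exist, at most $WORKERS jobs run concurrently;
# additional jobs wait in the Gearman queue until a worker becomes free.
WORKERS=4
for ((i = 1; i <= WORKERS; i++)); do
  # each worker pipes the job payload (a SQL statement) into a storage node;
  # "storage-node-1" and "ontime" are placeholder names
  gearman -w -f run_query -- mysql -h storage-node-1 ontime &
done
```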