April 19, 2014

T2000 CPU Performance – Watch out

Sun is aggressively pushing T2000 as Scalable MySQL Platforms, and it is indeed scalable for high-concurrency workloads: it can execute many concurrent threads, so the speed gain from 1 thread to, say, 32 threads will be significant.

But the thing a lot of people miss is that being scalable is not enough: you need to scale from a reasonable base to claim good performance, and this is where the T2000 performs subpar in many cases.

I often hear people complain that queries take much longer on the T2000 than on recent Intel or AMD CPUs when there is no concurrent load. The T2000 is reported to be as much as 5-15 times slower in this case, depending on the workload.

Here is an example run of the purely CPU-consuming “Benchmark” function on a 2.6GHz Intel Xeon vs. the T2000:

As you can see, this is a hell of a lot of difference!

Depending on your application, single-thread performance may or may not be important to you. It surely matters for the slave if you use active replication, if you run time-sensitive, long-running CPU-bound queries, or if queries contribute significant time to generating a web page.

For example, if queries take 50ms to generate the page on a Xeon, the MySQL latency you see on the T2000 may be as high as 500ms, which would be well above performance guidelines for many web applications.
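
As a rough sketch of that arithmetic, assume that the latency of a CPU-bound, single-threaded query simply scales with the reported 5-15x slowdown. The linear model and the function name are illustrative assumptions, not measurements:

```python
# Back-of-the-envelope model: single-thread latency scales linearly with the
# slowdown factor. This is an assumption for illustration, not a benchmark.

def projected_latency_ms(baseline_ms, slowdown):
    """Estimate latency on the slower CPU from a baseline latency."""
    return baseline_ms * slowdown

# 50 ms of query time per page on the Xeon, as in the example above.
XEON_QUERY_MS = 50

# The reported range is 5-15x; 10x is the case that yields 500 ms.
for slowdown in (5, 10, 15):
    estimate = projected_latency_ms(XEON_QUERY_MS, slowdown)
    print(f"{slowdown}x slower: ~{estimate} ms of query time per page")
```

Running more of these queries concurrently raises total throughput, but it does not bring any single page back under its latency budget.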

I’m hearing that Sun is working on new CPUs which would offer significantly higher single-thread performance, but at this time I have to be very careful about recommending this platform to customers.

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. #1, “Sun” is not, to my knowledge, “aggressively pushing T2000 as Scalable MySQL Platforms”. Could you provide a reference?

    That’s not to say there aren’t individuals who are talking about T2000 for MySQL, and there are scenarios where any of the CMT systems are good for this, but there’s not any kind of aggressive pushing. I would know. :)

    #2, the T2000 is the first generation CoolThread CMT CPU, and the second generation has been out for over a year. The respin of the second generation is out now. I recognize you may not have had one of these available for testing, but there are certainly differences.

    #3, Sun has always been clear that the CMT systems are not the right thing for all jobs. Just have a look at the profiling tools and discussions to help find the right apps at http://cooltools.sunsource.net

    If you have one user doing selects as fast as possible, you have a weird reason for using a database at all. If you have a million users connecting, selecting, disconnecting, or you have a large number of databases or applications, that’s something else entirely.

    There are certainly clear cases where the CMT systems are perfectly reasonable, and even the best performer, for a workload. Just have a look here:
    http://www.sun.com/servers/coolthreads/benchmarks/index.jsp

    What happens when you have a lot of concurrency with a Xeon? The OS has to context switch. This means spending a bunch of those 2.6GHz cycles moving data out of registers and onto the stack. As you add more concurrent workload, the response time per request will start degrading, and the number of responses within an acceptable timeframe (i.e. under 1sec, the way many benchmarks test) will drop.

    What happens when you have a lot of concurrency with a CMT system? It just switches between strands of execution. As you add more concurrent workload, the response time stays pretty constant, and the number of requests just goes up.

    It’s called efficiency. It’s why you hire a moving van and workers to help you load your furniture, rather than rent a Corvette for the day. :)

  2. Denis Sheahan says:

    Hi

    You are showing a 9x difference in performance here, which is way higher than expected.
    Even with frequency differences I would only expect a 3-4x difference.

    Is it possible to get the data from your benchmark table so we can run the workload
    in-house and determine why it is so slow?

    Also, on the T2000, where does the database reside? Local disk?

    What OS is running on both boxes?

  3. peter says:

    Mark,

    1) This is feedback I’m getting from the clients – if they deal with Sun Sales – T2000 is what is frequently recommended for MySQL workloads with good discounts offered etc.

    2) Right, I know newer CPUs are out, though it looks like they are more expensive, so I’m not hearing much about people considering them. But anyway, I’m not writing about the new generation of CPUs. I would like to hear if they are much better and write about it.

    3) I think I’m clear in my post: I’m just saying the T2000 performs very poorly for single-client workloads, because I think this is not communicated enough by the Sun Sales and Marketing teams. Or can you give me a link to Sun benchmarks which show how much slower the T2000 is than Xeons for some workloads?

    Regarding “if you have single user you should not use database”: this is a very strange remark about MySQL users. There are a lot of cases where concurrency is limited and many queries or complex queries are being run. I mentioned replication as the most obvious example :)

    Plus, see my note about page generation: performance with a single client is the best latency you can expect. If that is not good enough already, the fact that you can run 1000 of them at the same time with zero performance regression does not help.

    Regarding your examples: first, Xeon systems with 16 cores (4×4) are becoming a commodity, and this is a lot of native concurrency already. But again, comparing T2000 performance under high concurrency is not the topic here. I know some high-concurrency users were quite happy, while others were not so happy.

  4. peter says:

    Denis,

    Well, you’re free to run this little “Benchmark” test on your T2000 or new-generation CPU and tell me what you get. It is not very scientific, but it gives a good ballpark figure for raw “in cache” calculation speed.

    Frequency is not everything. For example, “NetBurst” Xeons had about 2 times worse performance per GHz compared to the newer “Core”-based ones. It would be interesting to know what you expect in this case.

    Regarding database – it was CPU bound “cached” workload which we’re speaking about – if things are IO bound it is strange to compare CPUs :)

  5. Luke Monahan says:

    I have been testing MySQL 5.0.45 as distributed by Sun on a T5120 over the last few days. The T5120 is essentially a next-gen T2000. Twice as many threads per core leads to 64 logical processors being seen by the OS. Another big change is the addition of an FPU per core rather than a single FPU for the whole chip as in the T2000. For comparison I’ve been up against a Sun X4100 with 2 AMD dual-cores. Both machines have 16GB of RAM, but I’ve been testing with and without a large cache enabled to see the difference. My tests are all using InnoDB and Sysbench (latest versions). I’ve been using mainly the MySQL config to tune, and haven’t delved into filesystem and OS configuration or source code changes (eek!) yet.

    Essentially I am getting very similar results from each machine. The main difference is the resource utilization on the T5120 is much lower: 20% CPU versus 80-85% on the X4100. I have a while to go to see if I can do any better on both machines, but I am sure the Sun Niagara chips — especially in their latest incarnation — are very capable.

  6. peter says:

    Luke,

    How much do you get from Sysbench on the Opteron vs. the Niagara-based system for a _single_ thread? If you can share results for multiple threads, that is also interesting.

  7. Luke Monahan says:

    Peter:

    The Niagara was running best at 32 thread concurrency — showing I believe a limit of MySQL to scaling out to more threads than this. Disk IO at this stage was fine (expected: separate disk IO tests showed the Niagara to excel here), so I am continuing to search for other bottlenecks. With low numbers of threads the Niagara was (predictably) slow, I only have results here from 4 threads upwards, but I can do some more tests for you if you like on Monday. I’ve still got a few days to finish up before we send the box back, so any suggestions on the most worthwhile benchmarks would help.

    The Opteron ran its best at 8-12 threads.

    Using Sysbench on 1M rows:

    sysbench --test=oltp --num-threads=X --max-time=60 --max-requests=0 --oltp-read-only=on run

    T5120:
    X=4: 400 TPS
    X=8: 716 TPS
    X=16: 1178 TPS
    X=32: 1935 TPS
    X=48: 1869 TPS
    X=64: 1674 TPS

    I do have some more R/W benchmarks and benchmarks with different configs, but I’m not at work to dig them out.

    As far as getting your T2000 to work a bit better: http://hell.jedicoder.net/?p=88 contains some tuning resources at the bottom. I find the latest Coolstack release of MySQL has many of these applied already, so make sure you use that to test.

  8. Mikael Ronstrom says:

    Peter,
    Your blog is usually very interesting, but this type of benchmark is about as
    informative as a benchmark of SELECT COUNT(*) from t comparing InnoDB vs
    MyISAM, where MyISAM will beat InnoDB by a large factor.

    A blog gives you the ability to quickly report findings but this blog certainly
    lacked the normal proper technical research that I would expect from you.

    Rgrds Mikael

  9. peter says:

    Mikael,

    We’ve got to pick our fights. There is only so much in-depth research we can do compared to the amount of information we come across.

    I constantly run into people having problems with the T2000 for single-thread applications, and this is information I want to share.

    I do not have constant access to a T2000, which would allow me to perform good, elaborate benchmarks. Before the Users Conference I asked one of my Sun contacts to get me access to one so I could include benchmarks on it in my presentation. Unfortunately, he could not arrange for one.

  10. peter says:

    Luke,

    Thanks for posting. Though it would be great to get numbers for Opteron and for Single Thread workload.

  11. Luke Monahan says:

    Hi Peter,

    Sorry I haven’t got back to this, but I’ll hopefully post some more benchmarks tomorrow. However, the single-threaded benchmarks aren’t going to do anything other than confirm what is already known: The Niagara isn’t aimed at single threaded workloads, and has never been advertised as such AFAIK. We are finding it to be well positioned for most web-based workloads (short queries, lots of them), but to do any data mining or long reporting processes we replicate to a more suitable server.

  12. peter says:

    Luke,

    No problem. If you get a chance to post results, please do it anyway.

    Single-threaded vs. multi-threaded is really an oversimplification.

    The T2000 should be advertised for multi-threaded workloads where latency is not critical; otherwise you may be in for surprises. Tons of short queries may or may not work: if you’re executing 100 1ms queries (sequentially) on a Xeon to generate the page, you may well be surprised by T2000 performance.

    The same goes for web servers in general: if you had CPU consumption of 0.05 sec to generate a web page in PHP, for example, going with the T2000 may push you outside of acceptable response time.
