July 24, 2014

What time 18446744073709550.000 means

Sometimes when you do profiling you can see numbers like this in timestamps. Periodically, in our patches, Query_time: 18446744073709550.000 showed up in the slow.log file (it was fixed recently, but the value still appears in other places).

I first faced this problem several years ago, when the first 2-core AMD Opteron systems appeared and I noticed sysbench going crazy, reporting query execution times like 18446744073709550.000.

Obviously this is an unsigned integer, produced by subtracting a bigger number from a smaller one.
But how can that happen if we use something like:

struct timespec start_tp, end_tp;

clock_gettime(CLOCK_THREAD_CPUTIME_ID, &start_tp);

/* query execution */

clock_gettime(CLOCK_THREAD_CPUTIME_ID, &end_tp);

total_time = (end_tp.tv_sec - start_tp.tv_sec) * 1000000000 + (end_tp.tv_nsec - start_tp.tv_nsec);

How can we get a query executed in negative time? It would be too good to be real :).

After some research I found the reason, and it is interesting enough that I want to share it even a few years later. Different CPUs on the same motherboard use different frequency generators (each CPU has its own, though the exact implementation depends on the CPU vendor), and there is no way to keep these generators synchronized. That is, if your server has 4 CPUs, each of them lives on its own time. Usually the difference between generators is very small (we are talking microseconds), but it turns out this difference is enough to produce negative time.
It happens when your query starts on one CPU and is switched to another during execution (one whose generator lags behind the first's), and the time between the two measurements is small enough to expose the difference between the generators. The result is negative time. Actually, it is more serious than that: it means we cannot get the exact time between two points in an application.
So be aware of this when designing profiling in your application. I wonder how real-time sensor systems, where microseconds are important, deal with this.

About Vadim Tkachenko

Vadim leads Percona's development group, which produces Percona Cloud Tools, Percona Server, Percona XtraDB Cluster and Percona XtraBackup. He is an expert in solid-state storage, and has helped many hardware and software providers succeed in the MySQL market.

Comments

  1. PH says:

    Easy, just make sure the execution between the checkpoints is on the same cpu (using affinity) or use hardware that has synchronized TSC’s or use counters that are not CPU-specific (slower).
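    PH's affinity suggestion can be sketched on Linux like this (a minimal sketch; the helper name pin_to_cpu is invented, and sched_setaffinity is Linux-specific):

    ```c
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <time.h>

    /* Pin the calling thread to one CPU so that both clock_gettime()
     * calls read the same CPU's counter. Returns 0 on success. */
    static int pin_to_cpu(int cpu) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        return sched_setaffinity(0, sizeof(set), &set); /* pid 0 = calling thread */
    }

    int main(void) {
        if (pin_to_cpu(0) != 0) {
            perror("sched_setaffinity");
            return 1;
        }

        struct timespec start, end;
        clock_gettime(CLOCK_THREAD_CPUTIME_ID, &start);
        /* ... work to measure ... */
        clock_gettime(CLOCK_THREAD_CPUTIME_ID, &end);

        long long diff_ns = (long long)(end.tv_sec - start.tv_sec) * 1000000000LL
                          + (end.tv_nsec - start.tv_nsec);
        printf("elapsed: %lld ns\n", diff_ns);
        return 0;
    }
    ```

    The trade-off PH mentions applies: pinning prevents the scheduler from balancing load across cores for the pinned thread.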

  2. Vadim says:

    PH,

    Yeah, affinity makes sense. Thanks!

  3. Seth says:

    Good post. Does this also apply to multi-core machines with just one physical CPU?

  4. Vadim says:

    Seth,

    At that time it was a dual-core, single-CPU AMD system. I am not fully sure how it is currently; I have heard Intel uses a single generator per physical CPU, but I may be wrong about that.

  5. I have encountered the same phenomenon, but for a different reason:
    A machine whose clock was really bad was constantly synchronized (in crond) using ntpdate.
    Constantly means about every 30 minutes.
    So, once every 30 minutes, there was a time “glitch”. Once every 30 minutes, we would see unreasonably long queries in the slow query log – that’s because the clock was set back during the execution of some queries.
    I just built a script which ignores such long queries altogether…

  6. Marki says:

    Using ntpdate on a production server is a really bad idea. More precisely, stepping the clock is a bad idea. Use only "slewing", so that the time is not changed in a jump (possibly a negative one); instead the kernel makes the clock run slightly slower or faster, which adjusts the time without creating any gap. Or, the best solution: run an NTP daemon so the time stays constantly synchronized.

  7. @Marki,
    Thanks – will look into it.

  8. Joey Parsons says:

    I think I’ve seen this in other places, like Seconds_Behind_Master in “show slave status”, where it’ll consistently be at 0 or 1, then suddenly jump to some ungodly number above ten million, and then jump right back to 0 or 1 a tick later.

    Anyone else seen that?

  9. Tim Bunce says:

    I’ve noted similar issues in the Devel::NYTProf docs at http://search.cpan.org/~timb/Devel-NYTProf/lib/Devel/NYTProf.pm#CAVEATS

  10. peter says:

    Vadim,

    I think there is a need for TIME_DIFFERENCE macro consistently used in MySQL or any other code which will handle the case of negative times and will replace them with zeroes.
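    Peter's suggested macro might look something like this (a sketch; the name TIME_DIFFERENCE and its exact semantics are hypothetical, not taken from MySQL source):

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Clamp a possibly-wrapped interval to zero instead of letting it
     * blow up to ~2^64 when end < start. */
    #define TIME_DIFFERENCE(start, end) \
        ((end) > (start) ? (end) - (start) : 0)

    int main(void) {
        uint64_t start = 1000000200, end = 1000000100; /* end read on a lagging CPU */
        assert(TIME_DIFFERENCE(start, end) == 0);      /* clamped, not 2^64 - 100 */
        assert(TIME_DIFFERENCE(end, start) == 100);    /* normal case */
        printf("ok\n");
        return 0;
    }
    ```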

  11. foo says:

    or use a high resolution monotonic clock…

  12. peter says:

    foo,

    What exactly do you suggest? Note we use this for profiling, which means potentially hundreds of thousands of time reads per second, so it must be as fast as possible. Because it is profiling, we also do not care if the numbers are a bit inaccurate.

  13. In general you should allow negative times to be accumulated naturally as they’ll tend to balance out over the longer term. Treating them as zero at the time they’re measured would be wrong. Treating them as zero in reports, after accumulation, is less wrong but it’s still better to educate the users to the harsh realities of clocks on SMP/virtual systems.

    I’d strongly suggest you allow the clock id passed to clock_gettime() to be configurable. Then the user can make their own choice about the accuracy vs performance trade-off that suits their needs best.

    See also:
    http://groups.google.com/group/comp.os.linux.development.apps/tree/browse_frm/thread/dc29071f2417f75f/ac44671fdb35f6db
    http://sean.chittenden.org/news/2008/06/01/

    It seems to be difficult to find good detailed information about the semantics of the various CLOCK_*’s that clock_gettime() supports on various systems. If anyone has better links I’d be grateful for them.
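    Tim's configurable-clock-id idea could look something like this (a sketch; the function parse_clock_id and the accepted string names are invented for illustration):

    ```c
    #include <string.h>
    #include <time.h>

    /* Map a configuration string to a clockid_t, falling back to
     * CLOCK_MONOTONIC for anything unrecognized. */
    static clockid_t parse_clock_id(const char *name) {
        if (strcmp(name, "realtime") == 0)        return CLOCK_REALTIME;
        if (strcmp(name, "thread_cputime") == 0)  return CLOCK_THREAD_CPUTIME_ID;
        if (strcmp(name, "process_cputime") == 0) return CLOCK_PROCESS_CPUTIME_ID;
        return CLOCK_MONOTONIC;
    }

    int main(void) {
        struct timespec tp;
        clockid_t id = parse_clock_id("realtime"); /* e.g. read from a config file */
        clock_gettime(id, &tp);
        return 0;
    }
    ```

    With that in place the user picks the accuracy vs. performance trade-off themselves, as Tim suggests.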

  14. If we allow negative times to accumulate naturally, it will skew things horribly. These times go negative because they’re very small to begin with; it would take many of them to add up enough to cancel out a single negative-gone-huge-positive.

    Maatkit has handled these for a long time in mk-query-digest by setting them to zero during log aggregation. I think this is reasonable because if it goes negative, it means it wasn’t measurable anyway.

  15. Guillaume Lefranc says:

    @Shlomi, regarding negative times in slow query logs, I filed this bug: http://bugs.mysql.com/bug.php?id=35396. A patch was committed and should solve the problem. Nevertheless, the problem was more about the NTP source not synchronizing correctly (ask your sysadmin not to run NTP in a virtual machine, quite a bad idea), which had my servers making -10/+10 sec. time readjustments.

  16. Baron, I was referring to the underlying negative values that could arise from the clock_gettime() calls in the original article. The fact that they’re presented as a huge positive by mysqld is obviously a bug.

  17. @Guillaume – thanks!

  18. Vadim says:

    TimBunce,

    These are two different timers, and they are not synchronized. If we get a negative time from two different clocks, I do not see why we should add it to the total time; it carries no useful information.
    It's like timing a runner where you record the start time on your watch and your friend records the finish time on his, and the difference between the two watches is unknown.
    It may happen that your friend's watch is slow by some amount: by your watch the runner started at 10:54, and by your friend's watch he finished at 10:37.
    By subtracting you get -0:17, but does that really make any sense?

  19. Vadim. I believe the clocks are synchronized, just not as frequently as we’d like so they drift slightly. I had a link that, as I recall, explained that, but it doesn’t work now (http://webnews.giga.net.tw/article/mailing.freebsd.performance/710).

    Re your analogy, imagine you have two watches (2 cpus) and each time you measure an interval you pick a watch at random (thread switched to different cpu). Over thousands of measurements you’ll get a more accurate total if you don’t discard negative durations.

  20. foo says:

    peter,

    a monotonic clock won’t go backwards and it probably provides equal or better performance when compared to the other clocks supported by clock_gettime.
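    A sketch of what foo is suggesting (the helper name monotonic_ns is invented): CLOCK_MONOTONIC reads one system-wide clock that is guaranteed never to step backwards, so intervals can't go negative.

    ```c
    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    /* Read CLOCK_MONOTONIC as a single nanosecond counter. Unlike the
     * per-CPU TSC-backed clocks, it is guaranteed not to go backwards. */
    static uint64_t monotonic_ns(void) {
        struct timespec tp;
        clock_gettime(CLOCK_MONOTONIC, &tp);
        return (uint64_t)tp.tv_sec * 1000000000ULL + (uint64_t)tp.tv_nsec;
    }

    int main(void) {
        uint64_t start = monotonic_ns();
        /* ... query execution ... */
        uint64_t end = monotonic_ns();
        printf("elapsed: %llu ns\n", (unsigned long long)(end - start));
        return 0;
    }
    ```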

  21. peter says:

    Tim,

    I think math-wise you're right. However, in practice, dealing with the huge values (rather than small negative ones) is quite inconvenient. We also mainly use this for things like query execution time, where the gain in accuracy from keeping negative values is not worth the loss of convenience.

  22. peter says:

    Vadim, do you know why the Google patches do not use a monotonic clock for wall-clock timing?

  23. Mark Callaghan says:

    Why would CLOCK_THREAD_CPUTIME_ID go backwards? It isn't a wall-clock timer; CLOCK_REALTIME and CLOCK_MONOTONIC are. Ticks/nanoseconds/microseconds of CPU time consumed by a process or thread should never go backwards. I think this is a bug in glibc or the Linux kernel.

  24. Mark Callaghan says:

    And this describes the broken behavior in CLOCK_THREAD_CPUTIME_ID and CLOCK_PROCESS_CPUTIME_ID — http://linux.die.net/man/3/clock_gettime. I have no idea what values are returned by them, but CPU time can’t go backwards.

  25. Tim Bunce says:

    That clock_gettime man page says “The CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID clocks are realized on many platforms using timers from the CPUs (TSC on i386, AR.ITC on Itanium). These registers may differ between CPUs and as a consequence these clocks may return bogus results if a process is migrated to another CPU”

    I think “may return bogus results” means that CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID can appear to ‘go backwards’ if a process is migrated to a different cpu.
