reader: don't use Decimal class to calculate backoff #155

virtuald · 2016-06-02T19:36:28Z

It's not immediately clear why the Decimal class is being used here, when it seems like pure floats are just as good.

Profiling indicates lots of time spent here, reduced 50000 messages
from 23s to 18s (not raw NSQ timing, with custom layering on top)

Haven't ran the tests yet, going to let Travis do that for me :)

virtuald · 2016-06-02T19:44:06Z

Ok, I'll fix the tests... give me a few minutes.

- Profiling indicates lots of time spent here, reduced 50000 messages from 23s to 18s (not raw NSQ, with custom layering on top)

virtuald · 2016-06-02T20:39:19Z

Ok, tests should pass this time. Because of the way backoff is used, I changed it around to use integers instead of floating point numbers, but the public API (get_interval()) still returns a floating point number. Slightly less speedup because of the conversion in get_interval, but still a huge win (20%).

ploxiln · 2016-06-02T22:01:11Z

You could probably get it to work with plain floats, with just a tweak: in the success case, instead of using max() to stop it at 0, use something like if interval < 1.0/1024: interval = 0

virtuald · 2016-06-03T04:44:25Z

As I said, this is still a 20% speedup over the current release, so I'm not too concerned. Knock yourself out if you want to get that extra 0.5%. 👍

mreiferson · 2016-06-03T15:58:01Z

I'm still really confused as to how it spent 20% of its time here - do you have any more logs or results you can paste from your profiling?

Backoff shouldn't be calculated so frequently?

virtuald · 2016-06-03T16:38:21Z

I don't have anything I can release publicly, but if you want I can put together a quick raw NSQ test that should duplicate the results.

First thing of course, is the Decimal class is slow compared to raw integers or floats. This is python after all, and this seems to be called in a critical section (or at least as shown below -- multiple times per message).

Looking at my profiling data, it shows that BackoffTimer.success is called 50001 times (which is equal to the number of messages received), and BackoffTimer.get_interval is called 100,040 times.

success seems to be called by on_backoff_resume, which is called by trigger which is called by lots of people..
get_interval seems to be called
- 50001 times by maybe_update_rdy, called by _handle_message
- 50001 times by enter_continue_or_exit_backoff, called by _on_backoff_resume

If that's unexpected, perhaps the backoff code needs some restructuring and we'd see even more performance boost. :)

virtuald · 2016-06-03T20:33:04Z

Alright, I put together a test using just NSQ that demonstrates the performance difference. It's not totally realistic, but good enough. :) See this gist.

Using this raw nsq reader/writer, and a local nsqd (no lookupd) on my macbook pro without power, I get:

Python 2.7.10: 9.6s with my patch, and 13.8s without it
Python 3.5.1: 10.5 with my patch, and 11.108 without it

Tornado 4.3 for both.

mreiferson · 2016-06-04T02:14:57Z

Thanks, I'm very surprised at this, but then again I've unfortunately never actually paid attention to some of these details.

What this tells me is that the problem isn't Decimal, it's the fact that we're interacting with this in critical code paths. We should fix that instead.

virtuald · 2016-06-04T03:02:12Z

Well. I'm open to someone refactoring the internals so that it's even faster. 👍

However, I don't feel comfortable enough with the way NSQ works to do the refactoring correctly, and this provides a sizable speed boost for me now. I'll leave it to you.

mreiferson · 2016-06-06T05:52:49Z

see #156, thanks for digging into this!

virtuald force-pushed the perf-remove-decimal branch from f992f78 to 0da3ece Compare June 2, 2016 19:55

Change backup timer to use integers instead of Decimal

0afef30

- Profiling indicates lots of time spent here, reduced 50000 messages from 23s to 18s (not raw NSQ, with custom layering on top)

virtuald force-pushed the perf-remove-decimal branch from 0da3ece to 0afef30 Compare June 2, 2016 20:35

virtuald changed the title ~~Use floats instead of Decimal class~~ Don't use Decimal class to calculate backoff Jun 2, 2016

mreiferson changed the title ~~Don't use Decimal class to calculate backoff~~ reader: don't use Decimal class to calculate backoff Jun 3, 2016

mreiferson added the feature label Jun 3, 2016

ploxiln mentioned this pull request Jun 4, 2016

reader: optimize BackoffTimer using float #156

Merged

jehiah added performance and removed feature labels Jun 5, 2016

mreiferson closed this Jun 6, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

reader: don't use Decimal class to calculate backoff #155

reader: don't use Decimal class to calculate backoff #155

virtuald commented Jun 2, 2016

virtuald commented Jun 2, 2016

virtuald commented Jun 2, 2016

ploxiln commented Jun 2, 2016

virtuald commented Jun 3, 2016

mreiferson commented Jun 3, 2016

virtuald commented Jun 3, 2016

virtuald commented Jun 3, 2016

mreiferson commented Jun 4, 2016

virtuald commented Jun 4, 2016

mreiferson commented Jun 6, 2016

reader: don't use Decimal class to calculate backoff #155

reader: don't use Decimal class to calculate backoff #155

Conversation

virtuald commented Jun 2, 2016

virtuald commented Jun 2, 2016

virtuald commented Jun 2, 2016

ploxiln commented Jun 2, 2016

virtuald commented Jun 3, 2016

mreiferson commented Jun 3, 2016

virtuald commented Jun 3, 2016

virtuald commented Jun 3, 2016

mreiferson commented Jun 4, 2016

virtuald commented Jun 4, 2016

mreiferson commented Jun 6, 2016