Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace memory-leak-inciting Writer.reserve with more intuitive Writer.reserve_current method. #2260

Merged
merged 1 commit into from
Oct 11, 2017

Conversation

dipinhora
Copy link
Contributor

Prior to this commit, multiple calls to Writer.reserve would
allocate more and more memory in _current even if a sufficient
amount of unused memory was already available and allocated in
_current.

This commit changes the logic in reserve to call _check as
that has similar logic to confirm/reserve memory required but
without the memory leak.


For context:

We had a piece of code with a memory leak related to Writer.reserve calls. Removing the calls removed the leak. We were calling Writer.reserve right after create and Writer.done and then filling the Writer using Writer.write in between. This resulted in a memory leak because Writer.write doesn't use _current and Writer.done doesn't change/reallocate _current if there was no data written to it. As a result, over time, multiple Writer.reserve/Writer.write/Writer.done cycles resulted in the memory leak because Writer.reserve would continually add more memory to _current even though it already had enough unused space already allocated. This PR corrects the memory leak.


Note: I wasn't able to think of a good test to add to _test.pony to prevent a regression because _current is hidden behind the Writer interface and there is no way to inspect it directly. I'd appreciate any ideas regarding adding a test to prevent a regression.

@jemc
Copy link
Member

jemc commented Oct 5, 2017

Just wanted to note that this could be seen as a change in semantics rather than a fix.

The docstring says "Reserve space for size additional bytes" - I think this is a bit ambiguous as to whether it implies the old behaviour (reserve N additional bytes on every call), or the new behaviour that you've introduced here (ensure that at least N total bytes are reserved). Honestly, I'd lean more toward thinking that the docstring implies the old behaviour, particularly because of the word additional.

I think both meanings are actually valid, so how would you feel about having both methods available, with clarified docstrings and appropriate names on each one?

@jemc
Copy link
Member

jemc commented Oct 5, 2017

I've added the "needs discussion during sync" label just to mark that it needs discussion - if we come to a good answer before next week's sync, we can just remove that label.

@dipinhora
Copy link
Contributor Author

I initially considered it to be "not a bug" before rethinking it so I understand your suggestion. I'm not sure what the names should be to make the distinction clear and not confusing so I'm open to suggestions (or if someone wants to create another PR with the appropriate changes I'll close this one).

@dipinhora
Copy link
Contributor Author

Also, as additional context, the reserve_chunks function behaves the same as reserve after the change in this PR is applied. Currently they behave differently even though their names are similar which has the potential to lead to confusion.

@jemc
Copy link
Member

jemc commented Oct 5, 2017

I'm not sure what the names should be to make the distinction clear and not confusing so I'm open to suggestions

It seems like the new behaviour you're introducing here is much more similar semantics to Array.reserve, so we can use that as a benchmark for the expected meaning of reserve.

With that in mind, I don't think either the old behaviour or the new behaviour of this method are the same as the Array.reserve meaning, since Array.reserve refers to the total size of the Array, including elements that are already in it, and both the old and new behaviour of this method ignore the current number of bytes already in the buffer.

So maybe we need two new names to thoroughly avoid confusion?

I'll think on it a little bit, and anyone else can feel free to give suggestions in the meantime.

@mfelsche
Copy link
Contributor

mfelsche commented Oct 6, 2017

What about one of

  • extend
  • extend_by
  • extend_capacity

as new names for the old reserve method?

And one of:

  • ensure
  • ensure_capacity

for the new implementation which just calls _check ?

Preferably both names should match (e.g. ensure_capacity and extend_capacity).

@jemc
Copy link
Member

jemc commented Oct 6, 2017

I think ensure alone is too vague, so something like ensure_capacity / extend_capacity or ensure_space / extend_space may be the way to go.

Do we also need to consider a new name for reserve_chunks? ensure_chunks?

@SeanTAllen
Copy link
Member

SeanTAllen commented Oct 9, 2017

I've been going over the minimal case with @dipinhora and I think this is

  1. really nasty.
  2. a principle of least surprise bug.

I don't think this is a change in semantics. I think this is a straight up bug.

I'm shocked that something like this, which I would think is completely safe, can leak memory:

use "buffered"

actor Main
  let _wb: Writer = Writer
  var _num: USize = 10000000
  var _data: Array[ByteSeq] = Array[ByteSeq]

  new create(env: Env) =>
    f()

  be f() =>
    _wb.reserve(8)
    _wb.write("random stuff")
    _data = _wb.done()
    _num = _num - 1
    if _num > 0 then
      f()
    end

whether its a semantics change or not, i think we need to fix that under principal of least surprise bug.

@jemc
Copy link
Member

jemc commented Oct 9, 2017

@SeanTAllen - no one in this thread is saying that this API is okay, or that it should stay how it is - the issue at hand is finding the right way to fix it, and most importantly, avoiding having confusion like this in the future.

I understand that the behaviour was surprising to you and @dipinhora, but I was just pointing out that this is most likely behaving as the author intended it to. That is, I see it not as a "memory leak in Writer.reserve", but as a misleading API which has a tendency to misdirect the programmer into writing programs with memory leaks. It's a bit of a pedantic difference though, because it's still an unacceptable status quo (principle-of-least-surprise, as you mentioned).

From my perspective, the main issues with the "old" code (before this PR) are:

  1. the docstring was ambiguous and could be interpreted in multiple ways, so that intention was not communicated clearly
  2. the behaviour contradicts what's likely to be intuitive for a method named "reserve" (the principle-of-least-surprise issue that you mentioned)

So my hesitation to merge this PR as is are that I think it might still suffer from those same two problems above.

  1. The docstring hasn't been clarified, so it's still ambiguous, and could still be interpreted as having the old behaviour. As I mentioned in an earlier comment, if the same docstring could be interpreted as describing two wildly different behaviours (the old behaviour and the new behaviour), we definitely need to more explicitly explain what this method does.
  2. The way this method behaves (both the old behaviour and the new behaviour) is very different from the behaviour of Array.reserve, and so I worry that keeping the name of this method as Writer.reserve risks creating further "surprises" for folks down the road. I'd like to see if we can find a different name for reserve and reserve_chunks that is less surprising.

@aturley
Copy link
Member

aturley commented Oct 9, 2017

I wrote this together with @sylvanc. I don't remember why we did it this way rather than some other way, but maybe he does.

I dug through some old Wallaroo slack logs and found a conversation about this method from June, 2016. In the conversation I suggested that maybe what reserve does not do what people would expect it to do, but it doesn't look like we ever acted on that thought.

@jemc
Copy link
Member

jemc commented Oct 11, 2017

After discussing this on the sync call a bit, we didn't come to a conclusion, but my personal preference for this result would be to rename the method as reserve_current, to reflect that it acts like Array.reserve within the current chunk, and then improve the docstring and merge.

Prior to this commit, multiple calls to `Writer.reserve` would
allocate more and more memory in `_current` even if a sufficient
amount of unused memory was already available and allocated in
`_current`.

This commit changes the logic in `reserve` to call `_check` as
that has similar logic to confirm/reserve memory required but
without the memory leak.

This commit also renames `reserve` to `reserve_current`.
@dipinhora dipinhora force-pushed the buffered_writer_fix branch from ee840af to c920a0f Compare October 11, 2017 21:45
@dipinhora
Copy link
Contributor Author

@jemc I've renamed the method and changed the docstring.

Please let me know if this is what you had in mind.

@jemc jemc changed the title Fix memory leak in Writer.reserve Replace memory-leak-inciting Writer.reserve with more intuitive Writer.reserve_current method. Oct 11, 2017
@jemc jemc added changelog - changed Automatically add "Changed" CHANGELOG entry on merge and removed needs discussion during sync labels Oct 11, 2017
@jemc jemc merged commit 738a340 into ponylang:master Oct 11, 2017
ponylang-main added a commit that referenced this pull request Oct 11, 2017
@jemc
Copy link
Member

jemc commented Oct 11, 2017

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog - changed Automatically add "Changed" CHANGELOG entry on merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants