
Create cluster simulator #11

Closed
wants to merge 3 commits into from

Conversation

jkozlowski
Contributor

  • The simulator can simulate random things happening to the cluster, e.g.
    nodes dying/coming back up/timeouts/nodes processing events and processing
    resultant commands.
  • Added a first property that runs the cluster until a stable leader is elected
    by the quorum.
  • May need to be extended further to add actual events arriving that are then replicated,
    but I'd rather have that in a separate commit when I try to fix Handle RequestVote correctly if the Candidate has stale log #10:
    it would be interesting to see if I can write a test case proving that the bug is there.

Fixed jkozlowski/kontiki#2

@jkozlowski
Contributor Author

Hey,

There is still the issue of modelling the events slightly better (sometimes the test might run longer than a minute, in which case it times out), but I think it's a good first cut. I was hoping you guys could take a preliminary look to see whether you like the approach or not; it's also really my first piece of non-trivial Haskell, so any feedback is welcome.

For the failures I can think of two things:

  1. Pragmatic: define the whole test as passed if, say, 90 of the cases (out of the usual 100) passed and didn't fail with a timeout.
  2. Try to model the cluster a bit better; a few improvements:
  • For each node, keep a queue of messages per remaining peer and draw from them randomly, to simulate messages arriving in arbitrary order (see the sketch after this list).
  • Model timeouts by counting simulation steps: instead of firing them at random, send them once a leader node goes down (I'm a bit fuzzy here, but I think this can be improved).
  • Model node failure probability by somehow tying it to uptime (so nodes that have been up longer have a higher probability of failure).
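Here's a minimal sketch of the per-peer queue idea; everything in it (NodeId, Message, the Seq-based Inbox, deliverOne) is a hypothetical stand-in rather than kontiki's actual types:

```haskell
-- Hypothetical sketch: per-node inboxes keyed by sending peer, with the
-- delivery order randomised via QuickCheck's Gen monad.
import           Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import           Data.Sequence (Seq, ViewL (..), viewl)
import qualified Data.Sequence as Seq
import           Test.QuickCheck (Gen, elements)

type NodeId  = Int
type Message = String  -- stand-in for the real message type

-- Each node keeps one FIFO queue per remaining peer.
type Inbox = Map NodeId (Seq Message)

-- Record an incoming message in the queue of the peer it came from.
enqueue :: NodeId -> Message -> Inbox -> Inbox
enqueue peer msg = Map.insertWith (flip (<>)) peer (Seq.singleton msg)

-- Pick a random non-empty peer queue and deliver its oldest message,
-- so messages from different peers arrive in arbitrary order.
deliverOne :: Inbox -> Gen (Maybe (NodeId, Message, Inbox))
deliverOne inbox
  | Map.null nonEmpty = pure Nothing
  | otherwise = do
      peer <- elements (Map.keys nonEmpty)
      case viewl (nonEmpty Map.! peer) of
        msg :< rest -> pure (Just (peer, msg, Map.insert peer rest inbox))
        EmptyL      -> pure Nothing  -- unreachable: queue is non-empty
  where
    nonEmpty = Map.filter (not . Seq.null) inbox
```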

I don't think this approach is 100% deterministic, and I'm not sure it can be made so. Either way it is possible to get a sporadic failure, but we're making the probability smaller.

Let me know what you think

@qrilka
Contributor

qrilka commented Dec 19, 2013

I'm eager to take a closer look; I hope I'll have enough time this weekend.
As a side question: do you plan any real use of kontiki or just "playing around"?

@jkozlowski
Contributor Author

Thanks, I'd really appreciate a review :)

I treat it as a great way to learn Haskell and I'd love to see it finished, with persistence etc. I code Java at work, so chances of me using anything like this are slim atm. I obviously have an ulterior motive as well: I'd like to become more active in open source.

What about you?

@qrilka
Contributor

qrilka commented Dec 20, 2013

I worked on a couple of projects earlier this year and one of them required a distributed storage mechanism. Actually it was discontinued at the beginning of the summer. And (even) after that I noticed etcd and Raft and then came to this library. At the moment I don't have any active Haskell projects at work and do this (though not very actively) just as a hobby.

@jkozlowski
Contributor Author

Nice. I read the Raft paper at about the same time as I seriously started reading Haskell, so it's nice there was this project to hack on.

I saw etcd; it's getting quite a bit of press now due to Docker and projects like Flynn and CoreOS, so it's quite a hot topic. We're a bit late to the party though, got beaten by Go, damn :) It's a pity, as it could bring good press to Haskell, more than anything FP Complete could do, not that they aren't awesome.

Not sure what the value proposition would have to be to replace etcd with something built on top of kontiki, but who knows... :)


@jkozlowski
Contributor Author

I am now thinking my testing should go along the lines of simulating the cluster for N events and testing that the properties of the Raft protocol hold at all times, as per Section 5 of the paper:

  • Election Safety: at most one leader can be elected in a given term. §5.2
  • Leader Append-Only: a leader never overwrites or deletes entries in its log; it only appends new entries. §5.3
  • Log Matching: if two logs contain an entry with the same index and term, then the logs are identical in all entries up through the given index. §5.3
  • Leader Completeness: if a log entry is committed in a given term, then that entry will be present in the logs of the leaders for all higher-numbered terms. §5.4
  • State Machine Safety: if a server has applied a log entry at a given index to its state machine, no other server will ever apply a different log entry for the same index. §5.4.3

These look like they could be codeable: also, with this approach it doesn't matter if the events happening are not realistic, as long as the cluster adheres to these. The bug in NicolasT/kontiki#10, however, would potentially not be discoverable, as other pieces of code guard against it and, if a leader was elected, it would be forced to step down. But this type of test should be a good complement to unit tests. I shall give that a go tomorrow.
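For illustration, a rough sketch of how the first of these (Election Safety) might be coded, assuming the simulator records a trace of events; the TraceEvent type and the helpers named in the trailing comment are hypothetical, not kontiki's:

```haskell
-- Hypothetical sketch of the Election Safety invariant (§5.2): in any
-- simulated trace, at most one distinct node becomes leader per term.
import           Data.List (nub)
import qualified Data.Map.Strict as Map

type Term   = Int
type NodeId = Int

-- Minimal stand-in for what the simulator would record.
data TraceEvent = BecameLeader NodeId Term | OtherEvent
  deriving (Show)

electionSafety :: [TraceEvent] -> Bool
electionSafety trace =
    all ((<= 1) . length . nub) (Map.elems leadersPerTerm)
  where
    -- All nodes that became leader, grouped by term.
    leadersPerTerm =
      Map.fromListWith (++) [ (t, [n]) | BecameLeader n t <- trace ]

-- The real QuickCheck property would run the simulator to produce the
-- trace, e.g.: prop_electionSafety = forAll clusterRun (electionSafety . traceOf)
```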

@NicolasT
Owner

I think what @jkozlowski wrote in a comment above is the way to go: given a set of clusters, let all clusters behave 'correctly' (no Byzantine behavior), i.e. let them only use the state-machine as-is, not making up random events/messages, then let them act by communicating on lossy/reordering/high-latency/... channels, and at all times ensure the Raft invariants are kept. I.e. some sort of model-checking (think Spin/Promela and others).

I'll go through the code now.

-- | Generates a cluster between 3 and 7 nodes
manageableCluster :: Gen ClusterState
manageableCluster = suchThat arbitraryClusterState between3And7
  where between3And7 cs = Map.size cs > 3 && Map.size cs < 8
Owner

3-node is a fairly common setup (at least in our Paxos-based product), so might want to include that.

Contributor Author

Yep, fair point, I wanted to do this, just made an error in the declaration of between3And7.

@NicolasT
Owner

I really like this. One request though: could you split out the 'plumbing' from the actual test code (into a different module)? I feel like this should make adding more tests/checks easier.

@jkozlowski
Contributor Author

Thanks for the review, I'll go through the individual comments inline. As mentioned, this pull request was to elicit initial discussion on whether what I'm thinking makes sense; I've started working on the second iteration already.

The idea is, on each tick of the simulation, to simulate a single event for every node (but not append the resulting events to queues until all are simulated, so that arbitrary ordering is allowed), where an event can be a timer going off, processing an event from the queue, or killing the node. Once all are simulated, shuffle the events and distribute them among the queues (this is how we get latency and reordering).
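Something along these lines, as a hypothetical sketch (the Node/Message types and the toy stepNode behaviour are stand-ins, not the real simulator):

```haskell
-- Hypothetical sketch of one simulation tick: every node handles one
-- event, outgoing messages are collected first, then shuffled and only
-- afterwards delivered, so cross-node ordering (latency/reordering) is
-- arbitrary.
import           Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import           Test.QuickCheck (Gen, shuffle)

type NodeId = Int

-- Toy node state and message, just enough to make the plumbing concrete.
data Node    = Node    { nodeId :: NodeId, inbox :: [String] }
data Message = Message { to :: NodeId, body :: String }

-- Process one event for a node; the toy "event" here is draining one
-- inbox entry and broadcasting an acknowledgement to every peer.
stepNode :: [NodeId] -> Node -> (Node, [Message])
stepNode peers n = case inbox n of
  []       -> (n, [])
  (m:rest) -> ( n { inbox = rest }
              , [ Message p ("ack:" ++ m) | p <- peers, p /= nodeId n ] )

-- One tick: step every node, shuffle the emitted messages, then deliver.
tick :: Map NodeId Node -> Gen (Map NodeId Node)
tick cluster = do
    let peers = Map.keys cluster
        step acc _ node =
          let (n', msgs) = stepNode peers node in (acc ++ msgs, n')
        (outs, cluster') = Map.mapAccumWithKey step [] cluster
    shuffled <- shuffle outs
    pure (foldl deliver cluster' shuffled)
  where
    deliver cl msg =
      Map.adjust (\n -> n { inbox = inbox n ++ [body msg] }) (to msg) cl
```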

That means I'll want to actually simulate timeouts properly, so I'm thinking of somehow modelling the average broadcastTime (as per the paper, so I can get broadcastTime < electionTimeout < MTBF). I'm looking at various queueing models in papers; it's a lot of fun. I probably won't have this for this iteration, but it's on my mind.

Should be fun.

@qrilka
Contributor

qrilka commented Dec 22, 2013

@jkozlowski, the thing I don't understand in your current implementation: is there anything like time in this model? From the code I see, I'm not sure that heartbeats will be close to periodic.
Also, why is there no simulation event appending new entries?
It looks like the frequency function you use will result in proportional probabilities of a node being killed and node responses being sent.
I'm not proficient in simulations, but I'm not sure that such correlation between different model parameters will give proper results.

@jkozlowski
Contributor Author

Yep, this first iteration was not very realistic in that all events were random. I want to simulate timeouts properly in the second iteration (and model nodes going down more properly, by drawing restart times from some distribution and restarting nodes according to those).

As a side note, even though this current model is not realistic, the properties mentioned in the paper should still hold (and yes, I missed the appending of events, as the first test I wanted to have was leader election, which turns out to be not really what we want).

Thanks to both for all your feedback, publishing this meant I had to review it myself and I think we now should have a better model, I just need to implement it :)

@qrilka
Contributor

qrilka commented Dec 22, 2013

BTW, if you find good papers on how to do such simulations properly, please share :)

@jkozlowski
Contributor Author

Sure :) I think the point here is to decouple it from time and think about it in terms of ticks as defined above (which means you can run it regardless of wall time). Once I have that, I'll just have to derive a formula for the average size of the incoming queue, which will give me the broadcastTime in ticks; then I can bound the timeouts with it and we're home.
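Just to make the bound concrete, a tiny hypothetical sketch; the 10x/20x factors are purely illustrative, not derived from anything:

```haskell
-- Hypothetical sketch: once broadcastTime has been estimated in ticks,
-- draw election timeouts well above it, mirroring the paper's
-- broadcastTime << electionTimeout << MTBF requirement.
import Test.QuickCheck (Gen, choose)

electionTimeoutTicks :: Int -> Gen Int
electionTimeoutTicks broadcastTicks =
  choose (10 * broadcastTicks, 20 * broadcastTicks)
```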

At least that's my theory :)

@NicolasT
Owner

NicolasT commented Jan 9, 2014

I've been thinking about this a little more, and I think there might be an alternative testing strategy.
What if, instead of testing the FSM as-is by feeding quasi-random input & determining the output etc, a different approach is taken: make a full map of the FSM states, and feed this into a model checker.

Obviously it's impossible to generate/dump all states, but I think it should be feasible to generate, e.g. in a 3-'node' system, all possible states on every node (modulo actual log content and maybe some other variables which don't really matter, obviously) and turn this into a Promela script, after which all invariants can be added, and checked using Spin.

This might be a bit overkill, but anyway... Heck, it could be a really cool research project. Anyone looking for a master thesis subject? "Validating a Haskell Raft FSM using Spin" 😉

@jkozlowski
Contributor Author

Yep, I thought about that too, but I don't have much experience with this approach. I think that's how they've validated the algorithm itself, from what I remember.

A bit busy at work this week, I'm hoping to come back to this over the weekend.

@NicolasT
Owner

Indeed, there's a TLA+ proof (http://raftuserstudy.s3-website-us-west-1.amazonaws.com/proof.pdf). Note that's the other way around though: this proof provides assurance about the protocol. What we're looking for is assurance about the correctness of the implementation.

@jkozlowski
Contributor Author

Yep. I started reading about this today, after seeing a master's thesis about testing CERN's state machines (they used a different language, but it's the same thing). I'll try to finish up what I have and then I'll see if we can attack it this way, I'm intrigued :)


@NicolasT
Owner

Cool! 👍

@jkozlowski
Contributor Author

Just found this, thought it might be of interest: http://javapathfinder.sourceforge.net/. Effectively Java -> Promela, at least it appears so from the first paper: http://havelund.com/Publications/jpf1-report-99.pdf

@abailly

abailly commented Jan 5, 2016

How about using/replicating what Jepsen does for testing?

@NicolasT NicolasT closed this Jul 29, 2017