
Create cluster simulator #11

Closed
wants to merge 3 commits into from

Conversation

jkozlowski
Contributor

  • The simulator can simulate random things happening to the cluster, e.g.
    nodes dying/coming back up/timeouts/nodes processing events and processing
    resultant commands.
  • Added a first property that runs the cluster until a stable leader is elected
    by the quorum.
  • May need to be extended further to add actual events arriving that are then replicated,
    but I'd rather have that in a separate commit when I try to fix Handle RequestVote correctly if the Candidate has stale log #10:
    it would be interesting to see if I can write a test case proving that the bug is there.

Fixed jkozlowski/kontiki#2

@jkozlowski
Contributor Author

Hey,

There is still the issue of modelling the events slightly better (sometimes the test might run longer than a minute, in which case it times out), but I think it's a good first cut. I was hoping you guys could take a preliminary look to see whether you like the approach or not; it's also really my first piece of non-trivial Haskell, so any feedback is welcome.

For the failures I can think of two things:

  1. Pragmatic: define the whole test as passed if, say, 90 of the cases (out of the usual 100) passed and didn't fail with a timeout.
  2. Try to model the cluster a bit better; a few improvements:
  • For each node, keep a queue of messages per remaining peer and draw from them randomly, to simulate messages arriving in arbitrary order (see the sketch after this list).
  • Model timeouts by counting simulation steps: instead of firing them at random, send them once a leader node goes down (I'm a bit fuzzy here, but I think this can be improved).
  • Model node failure probability by somehow tying it to uptime (so nodes that have been up longer have a higher probability of failure).
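Here's a minimal sketch of the per-peer queue idea; everything in it (NodeId, Message, the Seq-based Inbox, deliverOne) is a hypothetical stand-in rather than kontiki's actual types:

```haskell
-- Hypothetical sketch: per-node inboxes keyed by sending peer, with the
-- delivery order randomised via QuickCheck's Gen monad.
import           Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import           Data.Sequence (Seq, ViewL (..), viewl)
import qualified Data.Sequence as Seq
import           Test.QuickCheck (Gen, elements)

type NodeId  = Int
type Message = String  -- stand-in for the real message type

-- Each node keeps one FIFO queue per remaining peer.
type Inbox = Map NodeId (Seq Message)

-- Record an incoming message in the queue of the peer it came from.
enqueue :: NodeId -> Message -> Inbox -> Inbox
enqueue peer msg = Map.insertWith (flip (<>)) peer (Seq.singleton msg)

-- Pick a random non-empty peer queue and deliver its oldest message,
-- so messages from different peers arrive in arbitrary order.
deliverOne :: Inbox -> Gen (Maybe (NodeId, Message, Inbox))
deliverOne inbox
  | Map.null nonEmpty = pure Nothing
  | otherwise = do
      peer <- elements (Map.keys nonEmpty)
      case viewl (nonEmpty Map.! peer) of
        msg :< rest -> pure (Just (peer, msg, Map.insert peer rest inbox))
        EmptyL      -> pure Nothing  -- unreachable: queue is non-empty
  where
    nonEmpty = Map.filter (not . Seq.null) inbox
```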

I don't think this approach is 100% deterministic, and I'm not sure it can be made so. Either way it is possible to get a sporadic failure, but we're making the probability smaller.

Let me know what you think

@qrilka
Contributor

qrilka commented Dec 19, 2013

I'm eager to take a closer look; I hope I'll have enough time this weekend.
As a side question: do you plan any real use of kontiki or just "playing around"?

@jkozlowski
Contributor Author

Thanks, I'd really appreciate a review :)

I treat it as a great way to learn Haskell and I'd love to see it finished, with persistence etc. I code Java at work, so chances of me using anything like this are slim atm. I obviously have an ulterior motive as well: I'd like to become more active in open source.

What about you?

@qrilka
Contributor

qrilka commented Dec 20, 2013

I worked on a couple of projects earlier this year and one of them required a distributed storage mechanism. Actually it was discontinued at the beginning of the summer. And (even) after that I noticed etcd and Raft and then came to this library. At the moment I don't have any active Haskell projects at work and do this (though not very actively) just as a hobby.

@jkozlowski
Contributor Author

Nice. I read the Raft paper at about the same time as I seriously started reading Haskell, so it's nice there was this project to hack on.

I saw etcd; it's getting quite a bit of press now due to Docker and projects like Flynn and CoreOS, so it's quite a hot topic. We're a bit late to the party though, got beaten by Go, damn :) It's a pity, as it could bring good press to Haskell, more than anything FP Complete could do, not that they aren't awesome.

Not sure what the value proposition would have to be to replace etcd with something built on top of kontiki, but who knows... :)


@jkozlowski
Contributor Author

I am now thinking my testing should go along the lines of simulating the cluster for N events and testing that the properties of the Raft protocol hold at all times, as per Section 5 of the paper:

  • Election Safety: at most one leader can be elected in a given term. §5.2
  • Leader Append-Only: a leader never overwrites or deletes entries in its log; it only appends new entries. §5.3
  • Log Matching: if two logs contain an entry with the same index and term, then the logs are identical in all entries up through the given index. §5.3
  • Leader Completeness: if a log entry is committed in a given term, then that entry will be present in the logs of the leaders for all higher-numbered terms. §5.4
  • State Machine Safety: if a server has applied a log entry at a given index to its state machine, no other server will ever apply a different log entry for the same index. §5.4.3

These look like they could be codeable: also, with this approach it doesn't matter if the events happening are not realistic, as long as the cluster adheres to these. The bug in NicolasT/kontiki#10, however, would potentially not be discoverable, as other pieces of code guard against it and, if a leader was elected, it would be forced to step down. But this type of test should be a good complement to unit tests. I shall give that a go tomorrow.
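For illustration, a rough sketch of how the first of these (Election Safety) might be coded, assuming the simulator records a trace of events; the TraceEvent type and the helpers named in the trailing comment are hypothetical, not kontiki's:

```haskell
-- Hypothetical sketch of the Election Safety invariant (§5.2): in any
-- simulated trace, at most one distinct node becomes leader per term.
import           Data.List (nub)
import qualified Data.Map.Strict as Map

type Term   = Int
type NodeId = Int

-- Minimal stand-in for what the simulator would record.
data TraceEvent = BecameLeader NodeId Term | OtherEvent
  deriving (Show)

electionSafety :: [TraceEvent] -> Bool
electionSafety trace =
    all ((<= 1) . length . nub) (Map.elems leadersPerTerm)
  where
    -- All nodes that became leader, grouped by term.
    leadersPerTerm =
      Map.fromListWith (++) [ (t, [n]) | BecameLeader n t <- trace ]

-- The real QuickCheck property would run the simulator to produce the
-- trace, e.g.: prop_electionSafety = forAll clusterRun (electionSafety . traceOf)
```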

@NicolasT
Owner

I think what @jkozlowski wrote in a comment above is the way to go: given a set of clusters, let all clusters behave 'correctly' (no Byzantine behavior), i.e. let them only use the state-machine as-is, not making up random events/messages, then let them act by communicating on lossy/reordering/high-latency/... channels, and at all times ensure the Raft invariants are kept. I.e. some sort of model-checking (think Spin/Promela and others).

I'll go through the code now.

-- | Generates a cluster between 3 and 7 nodes
manageableCluster :: Gen ClusterState
manageableCluster = suchThat arbitraryClusterState between3And7
  where between3And7 cs = Map.size cs > 3 && Map.size cs < 8
Owner

3-node is a fairly common setup (at least in our Paxos-based product), so might want to include that.

Contributor Author

Yep, fair point, I wanted to do this, just made an error in the declaration of between3And7.

@NicolasT
Owner

I really like this. One request though: could you split out the 'plumbing' from the actual test code (into a different module)? I feel like this should make adding more tests/checks easier.

@jkozlowski
Contributor Author

Thanks for the review, I'll go through the individual comments inline. As mentioned, this pull request was to elicit initial discussion on whether what I'm thinking makes sense; I've started working on the second iteration already.

The idea is, on each tick of the simulation, to simulate a single event for every node (but not append the resulting events to queues until all are simulated, so that arbitrary ordering is allowed), where an event can be a timer going off, processing an event from the queue, or killing the node. Once all are simulated, shuffle the events and distribute them among the queues (this is how we get latency and reordering).
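Something along these lines, as a hypothetical sketch (the Node/Message types and the toy stepNode behaviour are stand-ins, not the real simulator):

```haskell
-- Hypothetical sketch of one simulation tick: every node handles one
-- event, outgoing messages are collected first, then shuffled and only
-- afterwards delivered, so cross-node ordering (latency/reordering) is
-- arbitrary.
import           Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import           Test.QuickCheck (Gen, shuffle)

type NodeId = Int

-- Toy node state and message, just enough to make the plumbing concrete.
data Node    = Node    { nodeId :: NodeId, inbox :: [String] }
data Message = Message { to :: NodeId, body :: String }

-- Process one event for a node; the toy "event" here is draining one
-- inbox entry and broadcasting an acknowledgement to every peer.
stepNode :: [NodeId] -> Node -> (Node, [Message])
stepNode peers n = case inbox n of
  []       -> (n, [])
  (m:rest) -> ( n { inbox = rest }
              , [ Message p ("ack:" ++ m) | p <- peers, p /= nodeId n ] )

-- One tick: step every node, shuffle the emitted messages, then deliver.
tick :: Map NodeId Node -> Gen (Map NodeId Node)
tick cluster = do
    let peers = Map.keys cluster
        step acc _ node =
          let (n', msgs) = stepNode peers node in (acc ++ msgs, n')
        (outs, cluster') = Map.mapAccumWithKey step [] cluster
    shuffled <- shuffle outs
    pure (foldl deliver cluster' shuffled)
  where
    deliver cl msg =
      Map.adjust (\n -> n { inbox = inbox n ++ [body msg] }) (to msg) cl
```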

That means I'll want to actually simulate timeouts properly, so I'm thinking of somehow modelling the average broadcastTime (as per the paper, so I can get broadcastTime < electionTimeout < MTBF). I'm looking at various queueing models in papers; it's a lot of fun. I probably won't have this for this iteration, but it's on my mind.

Should be fun.

@qrilka
Contributor

qrilka commented Dec 22, 2013

@jkozlowski, the thing I don't understand in your current implementation: is there anything like time in this model? From the code I see, I'm not sure that heartbeats will be close to periodic.
Also, why is there no simulation event appending new entries?
It looks like the frequency function you use will result in proportional probabilities of a node being killed and node responses being sent.
I'm not proficient in simulations, but I'm not sure that such correlation between different model parameters will give proper results.

@jkozlowski
Contributor Author

Yep, this first iteration was not very realistic in that all events were random. I want to simulate timeouts properly in the second iteration (and model nodes going down more properly, by drawing restart times from some distribution and restarting nodes according to those).

As a side note, even though this current model is not realistic, the properties mentioned in the paper should still hold (and yes, I missed the appending of events, as the first test I wanted to have was leader election, which turns out to be not really what we want).

Thanks to both for all your feedback, publishing this meant I had to review it myself and I think we now should have a better model, I just need to implement it :)

@qrilka
Contributor

qrilka commented Dec 22, 2013

BTW, if you find good papers on how to do such simulations properly, please share :)

@jkozlowski
Contributor Author

Sure :) I think the point here is to decouple it from time and think about it in terms of ticks as defined above (which means you can run it regardless of wall time). Once I have that, I'll just have to derive a formula for the average size of the incoming queue, which will give me the broadcastTime in ticks; then I can bound the timeouts with it and we're home.
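Just to make the bound concrete, a tiny hypothetical sketch; the 10x/20x factors are purely illustrative, not derived from anything:

```haskell
-- Hypothetical sketch: once broadcastTime has been estimated in ticks,
-- draw election timeouts well above it, mirroring the paper's
-- broadcastTime << electionTimeout << MTBF requirement.
import Test.QuickCheck (Gen, choose)

electionTimeoutTicks :: Int -> Gen Int
electionTimeoutTicks broadcastTicks =
  choose (10 * broadcastTicks, 20 * broadcastTicks)
```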

At least that's my theory :)

@NicolasT
Owner

NicolasT commented Jan 9, 2014

I've been thinking about this a little more, and I think there might be an alternative testing strategy.
What if, instead of testing the FSM as-is by feeding quasi-random input & determining the output etc, a different approach is taken: make a full map of the FSM states, and feed this into a model checker.

Obviously it's impossible to generate/dump all states, but I think it should be feasible to generate, e.g. in a 3-'node' system, all possible states on every node (modulo actual log content and maybe some other variables which don't really matter, obviously) and turn this into a Promela script, after which all invariants can be added, and checked using Spin.

This might be a bit overkill, but anyway... Heck, it could be a really cool research project. Anyone looking for a master thesis subject? "Validating a Haskell Raft FSM using Spin" 😉

@jkozlowski
Contributor Author

Yep, I thought about that too, but I don't have much experience with this approach. I think that's how they've validated the algorithm itself, from what I remember.

A bit busy at work this week, I'm hoping to come back to this over the weekend.

@NicolasT
Owner

Indeed, there's a TLA+ proof (http://raftuserstudy.s3-website-us-west-1.amazonaws.com/proof.pdf). Note that's the other way around though: this proof provides assurance about the protocol. What we're looking for is assurance about the correctness of the implementation.

@jkozlowski
Contributor Author

Yep. I started reading about this today, after seeing a master's thesis about testing CERN's state machines (they used a different language, but it's the same thing). I'll try to finish up what I have and then I'll see if we can attack it this way, I'm intrigued :)


@NicolasT
Owner

Cool! 👍

@jkozlowski
Contributor Author

Just found this, thought it might be of interest: http://javapathfinder.sourceforge.net/. Effectively Java -> Promela, at least it appears so from the first paper: http://havelund.com/Publications/jpf1-report-99.pdf

@abailly

abailly commented Jan 5, 2016

How about using/replicating what Jepsen does for testing?

@NicolasT NicolasT closed this Jul 29, 2017