Create cluster simulator #11
Conversation
- The simulator can simulate random things happening to the cluster, e.g. nodes dying/coming back up/timeouts/nodes processing events and processing resultant commands.
- Added a first property that runs the cluster until there is a stable leader elected by the quorum.
- May need to be extended further to add actual events arriving that are then replicated, but I'd rather have that in a separate commit when I try to fix #10 (Handle RequestVote correctly if the Candidate has stale log): it would be interesting to see if I can write a test case to prove that the bug is there.

Fixed jkozlowski/kontiki#2
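(Illustration only, not part of the pull request: a minimal sketch of the shape such a "stable leader" property could take with QuickCheck. The cluster type, step function and quorum-leader query are passed in as parameters, since the simulator's actual API isn't shown here.)

```haskell
-- Minimal sketch only, assuming nothing about kontiki's actual simulator
-- types: the cluster type, step function and quorum-leader query are all
-- supplied by the caller.
import Data.Maybe (isJust)
import Test.QuickCheck (Gen, Property, forAll)

prop_stableLeaderElected
  :: Show cluster
  => Gen cluster                -- ^ generator for an initial cluster
  -> (cluster -> cluster)       -- ^ one simulation step
  -> (cluster -> Maybe leader)  -- ^ leader agreed on by a quorum, if any
  -> Property
prop_stableLeaderElected genCluster step quorumLeader =
  forAll genCluster $ \cluster ->
    isJust (quorumLeader (runFor (10000 :: Int) cluster))
  where
    -- Step the simulation a bounded number of times, stopping early once a
    -- quorum agrees on a leader.
    runFor 0 cs = cs
    runFor n cs
      | isJust (quorumLeader cs) = cs
      | otherwise                = runFor (n - 1) (step cs)
```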
Hey, there is still the issue of modelling the events slightly better (sometimes the test might run for longer than a minute, in which case it times out), but I think it's a good first cut. I was hoping you guys could take a preliminary look to see whether you like the approach; it's really my first piece of non-trivial Haskell, so any feedback is welcome. For the failures I can think of two things:
I think this approach is not 100% deterministic, and I'm not sure it can be made so. In either case it is possible to get a sporadic failure, but we're making the probability smaller. Let me know what you think.
I'm eager to take a closer look; I hope I'll have enough time this weekend.
Thanks, I'd really appreciate a review :) I treat it as a great way to learn Haskell and I'd love to see it finished with persistence etc. I code Java at work, so the chances of me using anything like this are slim at the moment. I obviously have an ulterior motive as well: I'd like to become more active in open source. What about you?
I worked on a couple of projects earlier this year and one of them required a distributed storage mechanism. Actually it was discontinued at the beginning of the summer. And (even) after that I've noticed…
Nice. I read the Raft paper at about the same time as I seriously started reading Haskell, so it's nice there was this project to hack on. I saw etcd; it's getting quite a bit of press now due to Docker and then projects like Flynn and CoreOS, so it's quite a hot topic. We're a bit late to the party though, got beaten by Go, damn :) It's a pity, as it could bring good press to Haskell, more than anything FP Complete could do, not that they aren't awesome. Not sure what the value proposition would have to be to replace etcd with something built on top of kontiki, but who knows... :)
I am now thinking my testing should go along the lines of simulating the cluster for N events and testing that the properties of the Raft protocol hold at all times, as per the paper's Section 5 (Figure 3):

- Election Safety: at most one leader can be elected in a given term.
- Leader Append-Only: a leader never overwrites or deletes entries in its log; it only appends new entries.
- Log Matching: if two logs contain an entry with the same index and term, then the logs are identical in all entries up through that index.
- Leader Completeness: if a log entry is committed in a given term, then that entry will be present in the logs of the leaders for all higher-numbered terms.
- State Machine Safety: if a server has applied a log entry at a given index to its state machine, no other server will ever apply a different log entry for the same index.
These look like they could be codable. Also, with this approach it doesn't matter if the events happening are not realistic, as long as the cluster adheres to these properties. The bug in NicolasT/kontiki#10, however, would potentially not be discoverable, as other pieces of code guard against this: if such a leader was elected, it would be forced to step down. But these types of tests should be a good complement to unit tests. I shall give that a go tomorrow.
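(For a concrete flavour, a minimal sketch of how one of these invariants, Election Safety, might be checked against a recorded history of per-tick snapshots; the NodeId/Term types and the snapshot shape are assumptions, not kontiki's actual types.)

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Assumed placeholder types; the real simulator would use its own.
type NodeId = String
type Term   = Int

-- Election Safety: at most one leader can be elected in a given term.
-- The input is one snapshot per tick; each snapshot records, for every
-- node, its current term and whether it currently believes it is leader.
electionSafety :: [[(NodeId, Term, Bool)]] -> Bool
electionSafety history =
  all ((<= 1) . Set.size) (Map.elems leadersByTerm)
  where
    -- For every term, every node that ever claimed leadership in it.
    leadersByTerm :: Map.Map Term (Set.Set NodeId)
    leadersByTerm = Map.fromListWith Set.union
      [ (term, Set.singleton node)
      | snapshot           <- history
      , (node, term, True) <- snapshot
      ]
```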
I think what @jkozlowski wrote in a comment above is the way to go: given a set of clusters, let all clusters behave 'correctly' (no Byzantine behavior), i.e. let them only use the state machine as-is, not making up random events/messages, then let them act by communicating on lossy/reordering/high-latency/... channels, and at all times ensure the Raft invariants are kept. I.e. some sort of model-checking (think Spin/Promela and others). I'll go through the code now.
```haskell
-- | Generates a cluster between 3 and 7 nodes
manageableCluster :: Gen ClusterState
manageableCluster = suchThat arbitraryClusterState between3And7
  where between3And7 cs = Map.size cs > 3 && Map.size cs < 8
```
A 3-node cluster is a fairly common setup (at least in our Paxos-based product), so you might want to include that.
Yep, fair point, I wanted to do this, just made an error in the declaration of between3And7.
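(A minimal sketch of the presumably intended predicate, with inclusive bounds so that 3-node clusters are generated too; ClusterState and arbitraryClusterState are the simulator's own definitions from the diff above, not shown here.)

```haskell
-- Sketch of the presumably intended bounds: inclusive at both ends, so
-- that 3-node clusters are generated as well. ClusterState and
-- arbitraryClusterState come from the simulator module under review.
import qualified Data.Map as Map
import Test.QuickCheck (Gen, suchThat)

manageableCluster :: Gen ClusterState
manageableCluster = suchThat arbitraryClusterState between3And7
  where between3And7 cs = Map.size cs >= 3 && Map.size cs <= 7
```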
I really like this. One request though: could you split out the 'plumbing' from the actual test code (into a different module)? I feel like this should make adding more tests/checks easier.
Thanks for the review, I'll go through the individual comments inline. As mentioned, this pull request was meant to elicit some initial discussion about whether what I'm thinking makes sense; I've started working on the second iteration already. The idea is, on each tick of the simulation, to simulate a single event for each node (but not append the resulting events to the queues until all nodes have been simulated, so that arbitrary ordering is allowed), where an event can be a timer going off, an event being processed from the queue, or the node being killed. Once all are simulated, shuffle the events and distribute them among the queues (this is how we get latency and reordering). That means I'll want to actually simulate timeouts properly, so I'm thinking of somehow modelling the average broadcastTime (as per the paper, so that I can get broadcastTime < electionTimeout < MTBF). I'm looking at various queuing models in papers; it's a lot of fun. I probably won't have this for this iteration, but it's on my mind. Should be fun.
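(A minimal sketch of what such a tick could look like, assuming placeholder node/event types and a caller-supplied per-node step function; none of this is kontiki's actual API.)

```haskell
import qualified Data.Map.Strict as Map
import Test.QuickCheck (Gen, shuffle)

-- Assumed placeholder: node ids, plus per-node state and incoming queues.
type NodeId = Int

data Sim node event = Sim
  { simNodes  :: Map.Map NodeId node     -- per-node Raft state
  , simQueues :: Map.Map NodeId [event]  -- per-node incoming event queues
  }

-- One tick: every node handles at most one event (the per-node step decides
-- whether that means a timeout firing, processing the queued event, or the
-- node dying); outgoing messages are collected first, then shuffled, and
-- only afterwards appended to the destination queues, which is what models
-- latency and reordering.
tick
  :: (NodeId -> node -> Maybe event -> Gen (node, [(NodeId, event)]))
  -> Sim node event
  -> Gen (Sim node event)
tick stepNode (Sim nodes queues) = do
  results <- sequence
    [ do let (input, rest) = pop (Map.findWithDefault [] nid queues)
         (node', sent) <- stepNode nid node input
         pure ((nid, node'), (nid, rest), sent)
    | (nid, node) <- Map.toList nodes
    ]
  let nodes'   = Map.fromList [ n | (n, _, _) <- results ]
      queues'  = Map.fromList [ q | (_, q, _) <- results ]
      outgoing = concat      [ s | (_, _, s) <- results ]
  delayed <- shuffle outgoing
  pure (Sim nodes' (foldl deliver queues' delayed))
  where
    pop []       = (Nothing, [])
    pop (e : es) = (Just e, es)
    -- Append a delayed message to the end of the destination node's queue.
    deliver qs (dest, e) = Map.insertWith (\new old -> old ++ new) dest [e] qs
```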
@jkozlowski, the thing I don't understand in your current implementation: is there anything like time in this model? From the code, I'm not sure that heartbeats will be close to periodic.
Yep, this first iteration was not very realistic in that all events were random. I want to simulate timeouts properly in the second iteration (and handle nodes going down more properly, by drawing the restart times from some distribution and restarting nodes accordingly). As a side note, even though this current model is not realistic, the properties mentioned in the paper should still hold (and yes, I missed the appending of events, as the first test I wanted to have was leader election, which turns out to be not really what we want). Thanks to both of you for all your feedback; publishing this meant I had to review it myself, and I think we should now have a better model, I just need to implement it :)
BTW, if you find good papers on how to do such simulations properly, please share :)
Sure :) I think the point here is to decouple it from time and think about it in terms of ticks as defined above (which means you can run it regardless of wall-clock time). Once I have that, I'll just have to derive a formula to calculate the average size of the incoming queue, which will give me the broadcastTime in ticks, which means I can bound the timeouts with it and we're home. At least that's my theory :)
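(A minimal sketch of what that bound could look like once broadcastTime is expressed in ticks; the multipliers are made-up illustration values.)

```haskell
import Test.QuickCheck (Gen, choose)

-- Sketch only: draw election timeouts (in ticks) from a range safely above
-- the estimated broadcast time, per the paper's
-- broadcastTime << electionTimeout << MTBF requirement. The 10x/20x
-- multipliers are purely illustrative.
electionTimeoutTicks :: Int -> Gen Int
electionTimeoutTicks broadcastTicks =
  choose (10 * broadcastTicks, 20 * broadcastTicks)
```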
I've been thinking about this a little more, and I think there might be an alternative testing strategy. Obviously it's impossible to generate/dump all states, but I think it should be feasible to generate, e.g. in a 3-'node' system, all possible states on every node (modulo actual log content and maybe some other variables which don't really matter, obviously) and turn this into a Promela script, after which all invariants can be added and checked using Spin. This might be a bit overkill, but anyway... Heck, it could be a really cool research project. Anyone looking for a master's thesis subject? "Validating a Haskell Raft FSM using Spin" 😉
Yep, I thought about that too, but I don't have much experience with this approach. I think that's how they validated the algorithm itself, from what I remember. I'm a bit busy at work this week; I'm hoping to come back to this over the weekend.
Indeed, there's a TLA+ proof (http://raftuserstudy.s3-website-us-west-1.amazonaws.com/proof.pdf). Note that's the other way around though: this proof provides assurance about the protocol. What we're looking for is assurance about the correctness of the implementation. |
Yep. I started reading about this today, after seeing a master's thesis about testing CERN's state machines (they used a different language, but it's the same thing). I'll try to finish up what I have and then I'll see if we can attack it this way; I'm intrigued :)
Cool! 👍
Just found this, thought it might be of interest: http://javapathfinder.sourceforge.net/. Effectively Java -> Promela, at least it appears so from the first paper: http://havelund.com/Publications/jpf1-report-99.pdf
How about using/replicating what Jepsen does for testing?