Cassandra split-brain partition


We are running a six-node Cassandra cluster across two AWS regions (ap-southeast-1 and ap-southeast-2).

After running happily for several months, the cluster was given a rolling restart to fix a hung repair, and now each group of nodes thinks the other is down.

Cluster Information:
    Name: megaportglobal
    Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
    Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
    Schema versions:
        220727fa-88d2-366f-9473-777e32744c37: [10.5.13.117, 10.5.12.245, 10.5.13.93]

        UNREACHABLE: [10.4.0.112, 10.4.0.169, 10.4.2.186]

Cluster Information:
    Name: megaportglobal
    Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
    Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
    Schema versions:
        3932d237-b907-3ef8-95bc-4276dc7f32e6: [10.4.0.112, 10.4.0.169, 10.4.2.186]

        UNREACHABLE: [10.5.13.117, 10.5.12.245, 10.5.13.93]
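(For reference, the two conflicting views above are 'nodetool describecluster' output, captured once from each region. Something like the following, assuming JMX is reachable from where you run it; otherwise run it locally on one node in each DC:)

    nodetool -h 10.4.0.112 describecluster    # view from a Sydney node
    nodetool -h 10.5.13.117 describecluster   # view from a Singapore node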

From Sydney, 'nodetool status' reports two of the three Singapore nodes as down:

Datacenter: ap-southeast-2
==========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens  Owns    Host ID                               Rack
UN  10.4.0.112   9.04 GB    256     ?       b9c19de4-4939-4112-bf07-d136d8a57b57  2a
UN  10.4.0.169   9.34 GB    256     ?       2d7c3ac4-ae94-43d6-9afe-7d421c06b951  2a
UN  10.4.2.186   10.72 GB   256     ?       4dc8b155-8f9a-4532-86ec-d958ac207f40  2b
Datacenter: ap-southeast-1
==========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens  Owns    Host ID                               Rack
UN  10.5.13.117  9.45 GB    256     ?       324ee189-3e72-465f-987f-cbc9f7bf740b  1a
DN  10.5.12.245  10.25 GB   256     ?       bee281c9-715b-4134-a033-00479a390f1e  1b
DN  10.5.13.93   12.29 GB   256     ?       a8262244-91bb-458f-9603-f8c8fe455924  1a

But from Singapore, all of the Sydney nodes are reported as down:

Datacenter: ap-southeast-2
==========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens  Owns    Host ID                               Rack
DN  10.4.0.112   8.91 GB    256     ?       b9c19de4-4939-4112-bf07-d136d8a57b57  2a
DN  10.4.0.169   ?          256     ?       2d7c3ac4-ae94-43d6-9afe-7d421c06b951  2a
DN  10.4.2.186   ?          256     ?       4dc8b155-8f9a-4532-86ec-d958ac207f40  2b
Datacenter: ap-southeast-1
==========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens  Owns    Host ID                               Rack
UN  10.5.13.117  9.45 GB    256     ?       324ee189-3e72-465f-987f-cbc9f7bf740b  1a
UN  10.5.12.245  10.25 GB   256     ?       bee281c9-715b-4134-a033-00479a390f1e  1b
UN  10.5.13.93   12.29 GB   256     ?       a8262244-91bb-458f-9603-f8c8fe455924  1a

Even more confusingly, 'nodetool gossipinfo' executed in Sydney reports a status of NORMAL for every node:

/10.5.13.117   generation:1440735653   heartbeat:724504   severity:0.0   dc:ap-southeast-1   load:1.0149565738e10   schema:7bf335ee-61ae-36c6-a902-c70d785ec7a3   rack:1a   status:normal,-1059943672916788858   release_version:2.1.6   net_version:8   rpc_address:10.5.13.117   internal_ip:10.5.13.117   host_id:324ee189-3e72-465f-987f-cbc9f7bf740b
/10.5.12.245   generation:1440734497   heartbeat:728014   severity:0.0   dc:ap-southeast-1   load:1.100647505e10   schema:7bf335ee-61ae-36c6-a902-c70d785ec7a3   rack:1b   status:normal,-1029869455226513030   release_version:2.1.6   net_version:8   rpc_address:10.5.12.245   internal_ip:10.5.12.245   host_id:bee281c9-715b-4134-a033-00479a390f1e
/10.4.0.112   generation:1440973751   heartbeat:4135   severity:0.0   dc:ap-southeast-2   load:9.70297176e9   schema:7bf335ee-61ae-36c6-a902-c70d785ec7a3   rack:2a   release_version:2.1.6   status:normal,-1016623069114845926   net_version:8   rpc_address:10.4.0.112   internal_ip:10.4.0.112   host_id:b9c19de4-4939-4112-bf07-d136d8a57b57
/10.5.13.93   generation:1440734532   heartbeat:727909   severity:0.0   dc:ap-southeast-1   load:1.3197536002e10   schema:7bf335ee-61ae-36c6-a902-c70d785ec7a3   rack:1a   status:normal,-1021689296016263011   release_version:2.1.6   net_version:8   rpc_address:10.5.13.93   internal_ip:10.5.13.93   host_id:a8262244-91bb-458f-9603-f8c8fe455924
/10.4.0.169   generation:1440974511   heartbeat:1832   severity:0.0   dc:ap-southeast-2   load:1.0023502338e10   schema:7bf335ee-61ae-36c6-a902-c70d785ec7a3   rack:2a   release_version:2.1.6   status:normal,-1004223692762353764   net_version:8   rpc_address:10.4.0.169   internal_ip:10.4.0.169   host_id:2d7c3ac4-ae94-43d6-9afe-7d421c06b951
/10.4.2.186   generation:1440734382   heartbeat:730171   severity:0.0   dc:ap-southeast-2   load:1.1507595081e10   schema:7bf335ee-61ae-36c6-a902-c70d785ec7a3   rack:2b   status:normal,-10099894685483463   release_version:2.1.6   net_version:8   rpc_address:10.4.2.186   internal_ip:10.4.2.186   host_id:4dc8b155-8f9a-4532-86ec-d958ac207f40

The same command executed in Singapore does not include a status field for the nodes in Sydney:

/10.5.12.245   generation:1440974710   heartbeat:1372   severity:0.0   load:1.100835806e10   rpc_address:10.5.12.245   net_version:8   schema:7bf335ee-61ae-36c6-a902-c70d785ec7a3   release_version:2.1.6   status:normal,-1029869455226513030   dc:ap-southeast-1   rack:1b   internal_ip:10.5.12.245   host_id:bee281c9-715b-4134-a033-00479a390f1e
/10.5.13.117   generation:1440974648   heartbeat:1561   severity:0.0   load:1.0149992022e10   rpc_address:10.5.13.117   net_version:8   schema:7bf335ee-61ae-36c6-a902-c70d785ec7a3   release_version:2.1.6   status:normal,-1059943672916788858   dc:ap-southeast-1   rack:1a   host_id:324ee189-3e72-465f-987f-cbc9f7bf740b   internal_ip:10.5.13.117
/10.4.0.112   generation:1440735420   heartbeat:23   severity:0.0   load:9.570546197e9   rpc_address:10.4.0.112   net_version:8   schema:7bf335ee-61ae-36c6-a902-c70d785ec7a3   release_version:2.1.6   dc:ap-southeast-2   rack:2a   internal_ip:10.4.0.112   host_id:b9c19de4-4939-4112-bf07-d136d8a57b57
/10.5.13.93   generation:1440734532   heartbeat:729862   severity:0.0   load:1.3197536002e10   rpc_address:10.5.13.93   net_version:8   schema:7bf335ee-61ae-36c6-a902-c70d785ec7a3   release_version:2.1.6   status:normal,-1021689296016263011   dc:ap-southeast-1   rack:1a   internal_ip:10.5.13.93   host_id:a8262244-91bb-458f-9603-f8c8fe455924
/10.4.0.169   generation:1440974511   heartbeat:15   severity:0.5076141953468323   rpc_address:10.4.0.169   net_version:8   schema:7bf335ee-61ae-36c6-a902-c70d785ec7a3   release_version:2.1.6   dc:ap-southeast-2   rack:2a   internal_ip:10.4.0.169   host_id:2d7c3ac4-ae94-43d6-9afe-7d421c06b951
/10.4.2.186   generation:1440734382   heartbeat:15   severity:0.0   rpc_address:10.4.2.186   net_version:8   schema:7bf335ee-61ae-36c6-a902-c70d785ec7a3   release_version:2.1.6   dc:ap-southeast-2   rack:2b   internal_ip:10.4.2.186   host_id:4dc8b155-8f9a-4532-86ec-d958ac207f40

During a restart, each node can see the remote DC for a little while:

INFO  [GossipStage:1] 2015-08-31 10:53:07,638 OutboundTcpConnection.java:97 - OutboundTcpConnection using coalescing strategy DISABLED
INFO  [HANDSHAKE-/10.4.2.186] 2015-08-31 10:53:08,267 OutboundTcpConnection.java:485 - Handshaking version with /10.4.2.186
INFO  [HANDSHAKE-/10.4.0.169] 2015-08-31 10:53:08,287 OutboundTcpConnection.java:485 - Handshaking version with /10.4.0.169
INFO  [HANDSHAKE-/10.5.12.245] 2015-08-31 10:53:08,391 OutboundTcpConnection.java:485 - Handshaking version with /10.5.12.245
INFO  [HANDSHAKE-/10.5.13.93] 2015-08-31 10:53:08,498 OutboundTcpConnection.java:485 - Handshaking version with /10.5.13.93
INFO  [GossipStage:1] 2015-08-31 10:53:08,537 Gossiper.java:987 - Node /10.5.12.245 has restarted, now UP
INFO  [HANDSHAKE-/10.5.13.117] 2015-08-31 10:53:08,537 OutboundTcpConnection.java:485 - Handshaking version with /10.5.13.117
INFO  [GossipStage:1] 2015-08-31 10:53:08,656 StorageService.java:1642 - Node /10.5.12.245 state jump to normal
INFO  [GossipStage:1] 2015-08-31 10:53:08,820 Gossiper.java:987 - Node /10.5.13.117 has restarted, now UP
INFO  [GossipStage:1] 2015-08-31 10:53:08,852 Gossiper.java:987 - Node /10.5.13.93 has restarted, now UP
INFO  [SharedPool-Worker-33] 2015-08-31 10:53:08,907 Gossiper.java:954 - InetAddress /10.5.12.245 is now UP
INFO  [GossipStage:1] 2015-08-31 10:53:08,947 StorageService.java:1642 - Node /10.5.13.93 state jump to normal
INFO  [GossipStage:1] 2015-08-31 10:53:09,007 Gossiper.java:987 - Node /10.4.0.169 has restarted, now UP
WARN  [GossipTasks:1] 2015-08-31 10:53:09,123 FailureDetector.java:251 - Not marking nodes down due to local pause of 7948322997 > 5000000000
INFO  [GossipStage:1] 2015-08-31 10:53:09,192 StorageService.java:1642 - Node /10.4.0.169 state jump to normal
INFO  [HANDSHAKE-/10.5.12.245] 2015-08-31 10:53:09,199 OutboundTcpConnection.java:485 - Handshaking version with /10.5.12.245
INFO  [GossipStage:1] 2015-08-31 10:53:09,203 Gossiper.java:987 - Node /10.4.2.186 has restarted, now UP
INFO  [GossipStage:1] 2015-08-31 10:53:09,206 StorageService.java:1642 - Node /10.4.2.186 state jump to normal
INFO  [SharedPool-Worker-34] 2015-08-31 10:53:09,215 Gossiper.java:954 - InetAddress /10.5.13.93 is now UP
INFO  [SharedPool-Worker-33] 2015-08-31 10:53:09,259 Gossiper.java:954 - InetAddress /10.5.13.117 is now UP
INFO  [SharedPool-Worker-33] 2015-08-31 10:53:09,259 Gossiper.java:954 - InetAddress /10.4.0.169 is now UP
INFO  [SharedPool-Worker-33] 2015-08-31 10:53:09,259 Gossiper.java:954 - InetAddress /10.4.2.186 is now UP
INFO  [GossipStage:1] 2015-08-31 10:53:09,296 StorageService.java:1642 - Node /10.4.0.169 state jump to normal
INFO  [GossipStage:1] 2015-08-31 10:53:09,491 StorageService.java:1642 - Node /10.5.12.245 state jump to normal
INFO  [HANDSHAKE-/10.5.13.117] 2015-08-31 10:53:09,509 OutboundTcpConnection.java:485 - Handshaking version with /10.5.13.117
INFO  [GossipStage:1] 2015-08-31 10:53:09,511 StorageService.java:1642 - Node /10.5.13.93 state jump to normal
INFO  [HANDSHAKE-/10.5.13.93] 2015-08-31 10:53:09,538 OutboundTcpConnection.java:485 - Handshaking version with /10.5.13.93
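The "local pause of 7948322997 > 5000000000" warning above means the failure detector saw a roughly 7.9-second stall on the local node (GC pause, VM freeze, or clock jump). A rough way to check for such stalls, assuming the default package log location:

    # look for long GC pauses and local-pause warnings around the restart window
    grep -E "GCInspector|local pause" /var/log/cassandra/system.log | tail -n 50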

Then, without any errors being logged, the nodes are marked down:

INFO  [GossipTasks:1] 2015-08-31 10:53:34,410 Gossiper.java:968 - InetAddress /10.5.13.117 is now DOWN
INFO  [GossipTasks:1] 2015-08-31 10:53:34,411 Gossiper.java:968 - InetAddress /10.5.12.245 is now DOWN
INFO  [GossipTasks:1] 2015-08-31 10:53:34,411 Gossiper.java:968 - InetAddress /10.5.13.93 is now DOWN
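Since the nodes are marked down without any exceptions, plain connectivity between the regions on the inter-node ports is also worth ruling out (this sketch assumes the default storage_port 7000 and ssl_storage_port 7001):

    # from a Sydney node, probe a Singapore node's inter-node ports
    nc -zv 10.5.13.117 7000
    nc -zv 10.5.13.117 7001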

We have tried multiple restarts; the behaviour remains the same.

*EDIT*

It looks related to the gossip protocol / failure detector. Turning on debug (TRACE) logging shows the phi values steadily increasing:

TRACE [GossipTasks:1] 2015-08-31 16:46:44,706 FailureDetector.java:262 - PHI for /10.4.0.112 : 2.9395029255
TRACE [GossipTasks:1] 2015-08-31 16:46:45,727 FailureDetector.java:262 - PHI for /10.4.0.112 : 3.449690761
TRACE [GossipTasks:1] 2015-08-31 16:46:46,728 FailureDetector.java:262 - PHI for /10.4.0.112 : 3.95049114
TRACE [GossipTasks:1] 2015-08-31 16:46:47,730 FailureDetector.java:262 - PHI for /10.4.0.112 : 4.451317456
TRACE [GossipTasks:1] 2015-08-31 16:46:48,732 FailureDetector.java:262 - PHI for /10.4.0.112 : 4.952114357
TRACE [GossipTasks:1] 2015-08-31 16:46:49,733 FailureDetector.java:262 - PHI for /10.4.0.112 : 5.4529339645
TRACE [GossipTasks:1] 2015-08-31 16:46:50,735 FailureDetector.java:262 - PHI for /10.4.0.112 : 5.953951289
TRACE [GossipTasks:1] 2015-08-31 16:46:51,737 FailureDetector.java:262 - PHI for /10.4.0.112 : 6.4547808165
TRACE [GossipTasks:1] 2015-08-31 16:46:52,738 FailureDetector.java:262 - PHI for /10.4.0.112 : 6.955600038
TRACE [GossipTasks:1] 2015-08-31 16:46:53,740 FailureDetector.java:262 - PHI for /10.4.0.112 : 7.456422601
TRACE [GossipTasks:1] 2015-08-31 16:46:54,742 FailureDetector.java:262 - PHI for /10.4.0.112 : 7.957303284
TRACE [GossipTasks:1] 2015-08-31 16:46:55,751 FailureDetector.java:262 - PHI for /10.4.0.112 : 8.461658576
TRACE [GossipTasks:1] 2015-08-31 16:46:56,755 FailureDetector.java:262 - PHI for /10.4.0.112 : 8.9636610545
TRACE [GossipTasks:1] 2015-08-31 16:46:57,763 FailureDetector.java:262 - PHI for /10.4.0.112 : 9.4676926445

The phi values steadily increase after a restart until they exceed the failure detection threshold, at which point the remote nodes are marked down.
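(For anyone reproducing this: the TRACE output above can be switched on at runtime, without a restart. The logger name below assumes the standard Cassandra 2.1 package layout:)

    # enable TRACE for the failure detector on the node being inspected
    nodetool setlogginglevel org.apache.cassandra.gms.FailureDetector TRACE
    # revert when finished
    nodetool setlogginglevel org.apache.cassandra.gms.FailureDetector INFO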

Any suggestions on how to proceed?

For a laggy network, raise the phi failure detection threshold (phi_convict_threshold in cassandra.yaml) to 12 or 15; the default of 8 is often too low for cross-region links. This is commonly required in AWS when a cluster spans regions.
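A minimal sketch of the change, assuming a package install with cassandra.yaml under /etc/cassandra; apply it one node at a time with a rolling restart:

    # 1. on each node, set the failure detector threshold in cassandra.yaml, e.g.:
    #        phi_convict_threshold: 12
    grep -n "phi_convict_threshold" /etc/cassandra/cassandra.yaml

    # 2. restart the node cleanly
    nodetool drain
    sudo service cassandra restart

    # 3. after it rejoins, confirm both DCs converge on a single schema version
    nodetool describecluster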

