jm + zeromq   5

good example of Application-Level Keepalive beating SO_KEEPALIVE
We now have about 100 salt-minions installed in remote areas over 3G and satellite connections.

We lose connectivity with all of those minions within about 1-2 days of installation, with test.ping reporting "minion did not return". Each time, the state was that the minions saw an ESTABLISHED TCP connection, while on the salt-master there were no connections listed at all. (Yes, that is correct.) Tighter keepalive settings were tried with no result. (OS is Linux.) Each time, restarting the salt-minion fixes the problem immediately.

Obviously the connections are being transparently proxied somewhere (who knows what happens on those SAT networks), so the whole TCP-keepalive mechanism of 0mq fails.
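A fix along the lines the bookmark title suggests: since libzmq 4.2 (newer than the setup quoted above), ZeroMQ can send application-level heartbeats at the ZMTP layer, which are real traffic inside the proxied connection rather than TCP-level probes a middlebox can absorb. A minimal pyzmq sketch, with illustrative intervals and a hypothetical endpoint:

    import zmq

    ctx = zmq.Context()
    sock = ctx.socket(zmq.DEALER)

    # ZMTP-level heartbeats (libzmq >= 4.2): send a PING every 5 s
    sock.setsockopt(zmq.HEARTBEAT_IVL, 5000)
    # treat the peer as dead (and close/reconnect) if no reply within 15 s
    sock.setsockopt(zmq.HEARTBEAT_TIMEOUT, 15000)
    # ask the peer to drop *us* after 30 s of silence
    sock.setsockopt(zmq.HEARTBEAT_TTL, 30000)

    sock.connect("tcp://master.example.com:4506")  # hypothetical endpoint

Because the heartbeats travel end-to-end over the ZeroMQ connection itself, a half-dead link shows up as a heartbeat timeout and triggers a reconnect, instead of the minion sitting forever on a phantom ESTABLISHED socket.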


Also noted in the thread: the default idle TCP timeout for Azure Load Balancer is 4 minutes: https://azure.microsoft.com/en-us/blog/new-configurable-idle-timeout-for-azure-load-balancer/ . The default Linux TCP keepalive doesn't send its first probe until the connection has been idle for 2 hours, and that interval is a system-wide sysctl (/proc/sys/net/ipv4/tcp_keepalive_time).
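The 2-hour default can at least be overridden per socket rather than system-wide; a minimal sketch in plain Python (Linux-only socket constants, illustrative values, hypothetical endpoint):

    import socket

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

    # enable keepalive on this socket only
    s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # first probe after 60 s idle, instead of the 7200 s system default
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)
    # then probe every 10 s...
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)
    # ...and declare the peer dead after 5 unanswered probes (~110 s total)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)

    s.connect(("master.example.com", 4506))  # hypothetical endpoint

As the quoted report shows, though, even tighter per-socket keepalives don't help when a transparent proxy is answering (or filtering) the probes on the minion's behalf.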

Further, http://networkengineering.stackexchange.com/questions/7207/why-bgp-implements-its-own-keepalive-instead-of-using-tcp-keepalive notes "some firewalls filter TCP keepalives".
tcp  keep-alive  keepalive  protocol  timeouts  zeromq  salt  firewalls  nat 
april 2016 by jm
Making Storm fly with Netty | Yahoo Engineering
Y! engineer doubles the speed of Storm's messaging layer by replacing the zeromq implementation with Netty
netty  async  zeromq  storm  messaging  tcp  benchmarks  yahoo  clusters 
october 2013 by jm
DataSift Architecture: Realtime Datamining at 120,000 Tweets Per Second
250 million tweets per day, 30-node HBase cluster, 400TB of storage, Kafka and 0mq.

This is from 2011, hence this dated line: 'for a distributed application they thought AWS was too limited, especially in the network. AWS doesn’t do well when nodes are connected together and they need to talk to each other. Not low enough latency network. Their customers care about latency.' (Nowadays, it would be damn hard to build a lower-latency network than that attached to a cc2.8xlarge instance.)
datasift  architecture  scalability  data  twitter  firehose  hbase  kafka  zeromq 
april 2013 by jm
creators of AMQP ditching it for ZeroMQ
'While iMatix was the original designer of AMQP and has invested hugely in that protocol, we believe it is fundamentally flawed, and unfixable. It is too complex and the barriers to participation are massive. We do not believe that it's in the best interest of our customers and users to invest further in AMQP. Specifically, iMatix will be stepping out of the AMQP workgroup and will not be supporting AMQP/1.0 when that emerges, if it ever emerges.' wow, massive downvote there
queueing  amqp  zeromq  imatix  mq  protocols  openamq  via:janl  from delicious
march 2010 by jm
