Disce aut Discede
Learn or Leave

Given a Linux box acting as a router, it is possible to keep currently established sessions going to their original destination while redirecting new sessions to another host. This way, a service can be switched over without any downtime.

This trick is particularly useful in active-standby clusters, where one node is processing traffic and you want to gracefully fail the service over to the other node. New flows are routed to the new destination, while established flows keep being processed by the old node. Once the active node has "drained" all established flows, the switchover to the new node can be completed safely without any service impact.

For example, suppose our active node's IP is 123.1.1.1, requests arrive through eth1, and we want to redirect new sessions to the standby node 123.1.1.2:

iptables -t mangle -A PREROUTING -i eth1 -d 123.1.1.1/32 -m conntrack --ctstate NEW -j CONNMARK --set-mark 1
iptables -t mangle -A PREROUTING -j CONNMARK --restore-mark
iptables -t mangle -A PREROUTING -i eth1 -p tcp --tcp-flags FIN FIN -j MARK --set-mark 2
iptables -t mangle -A PREROUTING -i eth1 -p tcp --tcp-flags RST RST -j MARK --set-mark 2
iptables -t mangle -A PREROUTING -i eth1 -m mark ! --mark 0 -j TEE --gateway 123.1.1.2
iptables -t mangle -A PREROUTING -i eth1 -m mark --mark 1 -j DROP

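The first two rules mark the packets belonging to new flows: the first sets connection mark 1 on every new flow, and the second copies the connection mark onto each packet. The third and fourth rules set mark 2 on TCP packets with the FIN or RST flag.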

The fifth rule sends a copy of every marked packet (any non-zero mark) to the other host, 123.1.1.2.

The sixth rule drops the original packets of new flows (mark 1) so they never reach their original destination. FIN/RST packets carry mark 2, so they are not dropped.
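To follow how the marks interact, the six rules can be modelled as a small shell function. This is only a sketch: `model_packet` is a hypothetical helper that mirrors the rule logic for a non-VRRP packet arriving on eth1 and addressed to 123.1.1.1; it does not touch netfilter.

```shell
# Hypothetical model of the six mangle/PREROUTING rules (does not touch netfilter).
# Assumes the packet arrives on eth1, is addressed to 123.1.1.1 and is not VRRP.
#   $1: conntrack state seen by rule 1 (NEW or anything else)
#   $2: connection mark already stored on the flow (0 or 1)
#   $3: TCP flags, comma-separated (e.g. "SYN" or "FIN,ACK"); may be empty
model_packet() {
    ctstate=$1 connmark=$2 flags=$3

    # Rule 1: a NEW flow gets connection mark 1.
    if [ "$ctstate" = NEW ]; then
        connmark=1
    fi
    # Rule 2: the packet mark is restored from the connection mark.
    mark=$connmark
    # Rules 3 and 4: packets with FIN or RST set get packet mark 2.
    case ",$flags," in
        *,FIN,*|*,RST,*) mark=2 ;;
    esac

    actions=""
    # Rule 5: any non-zero mark sends a copy to the standby node (TEE).
    if [ "$mark" != 0 ]; then
        actions="tee "
    fi
    # Rule 6: mark 1 drops the original packet; otherwise it continues locally.
    if [ "$mark" = 1 ]; then
        actions="${actions}drop"
    else
        actions="${actions}pass"
    fi
    echo "$actions"
}
```

So the first packet of a new flow yields `tee drop` (only the standby node sees it), data packets of an established flow yield `pass`, and an established flow's FIN/RST packets yield `tee pass`: a copy goes to the standby while the original is still delivered locally.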

First I tried to do it with an iproute2 ruleset, but I ran into issues when the destination IP (123.1.1.1) is a local IP. That could be solved by changing the priority of the local routing table, but it would then require a selector/rule to avoid sending all packets to the "diverted" routing table. In the end, I preferred this TEE&DROP solution. If anyone has a cleaner solution, feel free to leave a comment!
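For comparison, the policy-routing variant mentioned above might look roughly like the following untested sketch. It assumes new flows still get packet mark 1 from the first two mangle rules (with the TEE and DROP rules left out), that routing table 100 and rule preferences 100/200 are free (arbitrary choices), and it wraps each command in a hypothetical `run` helper so the commands can be printed with `DRY_RUN=1` instead of executed.

```shell
# Hypothetical helper: print the command when DRY_RUN is set, otherwise run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Sketch of a policy-routing alternative (table 100 and prefs 100/200 are
# illustrative). $1 is the standby node, e.g. 123.1.1.2.
divert_with_policy_routing() {
    standby=$1
    # Selector: only packets carrying firewall mark 1 use table 100.
    run ip rule add fwmark 1 lookup 100 pref 100
    run ip route add default via "$standby" table 100
    # The local table is normally consulted first (pref 0), so packets for a
    # local IP such as 123.1.1.1 would never reach the fwmark rule.
    # Re-add the local lookup behind it, then remove the original pref 0 rule.
    run ip rule add lookup local pref 200
    run ip rule del pref 0 lookup local
}
```

Moving the local lookup from preference 0 to 200 corresponds to the priority change described above, and the `fwmark 1` rule is the selector that keeps unmarked traffic out of the diverted table.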

These rules can be easily integrated into a start/stop script:

#!/bin/sh
# Redirect new sessions arriving on $IFACE to $STANDBY; established sessions stay local.
IFACE=eth1
STANDBY=123.1.1.2

case "$1" in
start)
        echo "Redirection for new sessions is enabled"

        # Disable reverse-path filtering on all interfaces (not restored by "stop").
        for f in /proc/sys/net/ipv4/conf/*/rp_filter; do echo 0 > "$f"; done
        # Mark new flows (VRRP excluded) and copy the connection mark onto each packet.
        iptables -t mangle -A PREROUTING -i "$IFACE" ! -p vrrp -m conntrack --ctstate NEW -j CONNMARK --set-mark 1
        iptables -t mangle -A PREROUTING -j CONNMARK --restore-mark
        # Give TCP FIN/RST packets mark 2.
        iptables -t mangle -A PREROUTING -i "$IFACE" -p tcp --tcp-flags FIN FIN -j MARK --set-mark 2
        iptables -t mangle -A PREROUTING -i "$IFACE" -p tcp --tcp-flags RST RST -j MARK --set-mark 2
        # Copy every marked packet to the standby node, then drop new-flow packets locally.
        iptables -t mangle -A PREROUTING -i "$IFACE" -m mark ! --mark 0 -j TEE --gateway "$STANDBY"
        iptables -t mangle -A PREROUTING -i "$IFACE" -m mark --mark 1 -j DROP
        ;;
stop)
        # Delete the same rules in reverse order.
        iptables -t mangle -D PREROUTING -i "$IFACE" -m mark --mark 1 -j DROP
        iptables -t mangle -D PREROUTING -i "$IFACE" -m mark ! --mark 0 -j TEE --gateway "$STANDBY"
        iptables -t mangle -D PREROUTING -i "$IFACE" -p tcp --tcp-flags RST RST -j MARK --set-mark 2
        iptables -t mangle -D PREROUTING -i "$IFACE" -p tcp --tcp-flags FIN FIN -j MARK --set-mark 2
        iptables -t mangle -D PREROUTING -j CONNMARK --restore-mark
        iptables -t mangle -D PREROUTING -i "$IFACE" ! -p vrrp -m conntrack --ctstate NEW -j CONNMARK --set-mark 1

        echo "Redirection for new sessions is disabled"
        ;;
*)
        echo "Usage: $0 {start|stop}" >&2
        exit 1
        ;;
esac
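To tell when the active node has drained, one option is to count the conntrack entries for 123.1.1.1 that still carry connection mark 0, i.e. flows established before redirection was enabled. A minimal sketch, assuming conntrack-tools is installed, that nothing else sets connection marks, and that `conntrack -L` prints entries in its usual `src=... dst=... mark=N` layout; `count_old_flows` is a hypothetical helper.

```shell
# Hypothetical helper: read `conntrack -L` output on stdin and print how many
# entries have original destination $1 and connection mark 0.
count_old_flows() {
    awk -v vip="$1" '
        {
            dst = ""; mark = ""
            for (i = 1; i <= NF; i++) {
                # The first dst= field is the original-direction destination.
                if (dst == "" && $i ~ /^dst=/) dst = substr($i, 5)
                if ($i ~ /^mark=/) mark = substr($i, 6)
            }
            if (dst == vip && mark == "0") n++
        }
        END { print n + 0 }'
}

# Example (as root on the active node, after "start"):
#   while [ "$(conntrack -L 2>/dev/null | count_old_flows 123.1.1.1)" -gt 0 ]; do
#       sleep 5
#   done
```

Entries in TIME_WAIT are counted too, so the number may stay above zero for a short while after the last old session closes.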

Warning: since the traffic is being redirected to another host, return routes must be checked carefully; asymmetric routing may cause issues.