Swarm + Traefik: dreadful pitfalls

Swarm is an exquisitely simple container orchestrator. Paired with Traefik for reverse proxying, load balancing and certificate generation, it gives you a powerful solution in a matter of hours, or even minutes.

Yet the structure of the docker-compose file used by Swarm, and the need for each web-exposed service (one reachable through a domain name) to join a Traefik network declared in the stack definition, have often left me perplexed, even suspicious. And suspicious one should be, as potentially dreadful pitfalls await!

Non-isolation

If you want your services to be reachable via domain names, they have to join the public Swarm network you created for Traefik, so that Traefik can proxy requests to them when the time comes.
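As a minimal sketch of what joining that network looks like in a stack file: the network name traefik-public, the domain api.example.com, the router name api and the image are all assumptions made for illustration, and the labels follow the Traefik v2-style syntax, so adapt them to your own setup.

```yaml
# Hypothetical stack file, e.g. deployed with:
#   docker stack deploy -c api-server.yml api-server
version: "3.8"

services:
  web:
    image: php:8.3-apache
    networks:
      - traefik-public        # shared with Traefik and with every other exposed service
    deploy:
      labels:                 # in Swarm mode, Traefik reads labels on the service (deploy.labels)
        - traefik.enable=true
        - traefik.http.routers.api.rule=Host(`api.example.com`)
        - traefik.http.services.api.loadbalancer.server.port=80

networks:
  traefik-public:
    external: true            # assumed to be created beforehand, outside any stack
```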

In other words, they all have to share the same network. This means that every exposed service can reach every other one by its service name (the one defined in the docker-compose file), whereas one might have expected the services of a given stack to be isolated from those of other stacks unless specified otherwise.

That could be bearable, but what happens if two stacks each have a service with the same name?

Using the same service name in different stacks

It is perfectly legal to use the same service name in different stacks, such as a ‘db’ service in a PHP+MySQL+phpMyAdmin stack named ‘api-server’, and another ‘db’ service in a NestJS+MariaDB stack named ‘ws-calendar’. No warning is raised when the stacks are deployed, and the command docker service ls even shows distinct full service names, here api-server_db and ws-calendar_db. Great!

Except it’s not. If the two services share a network (as is the case when both are exposed to the web through domain names and Traefik), both are reachable on that network under the short name ‘db’. Swarm then behaves as if they were replicas of a single service, going by the short name of each service rather than its full name. As a result, requests addressed to ‘db’ get load-balanced between the two, which can be disastrous, or at the very least not what you were expecting.
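The collision can be sketched with two stack files, shown here as two YAML documents. The network name traefik-public, the image tags and the placeholder passwords are assumptions for illustration; the stack and service names are the ones from the example above.

```yaml
# api-server.yml (hypothetical file), deployed as stack "api-server"
version: "3.8"
services:
  db:                          # full name: api-server_db, short name: db
    image: mysql:8.0
    environment:
      MYSQL_ROOT_PASSWORD: example     # placeholder, for illustration only
    networks:
      - traefik-public         # shared network: the short name "db" is visible here
networks:
  traefik-public:
    external: true
---
# ws-calendar.yml (hypothetical file), deployed as stack "ws-calendar"
version: "3.8"
services:
  db:                          # full name: ws-calendar_db, short name: db (again)
    image: mariadb:11
    environment:
      MARIADB_ROOT_PASSWORD: example   # placeholder, for illustration only
    networks:
      - traefik-public         # same shared network, same short name
networks:
  traefik-public:
    external: true
```

With both stacks deployed, any container on traefik-public that connects to the host name db may land on either database. Connecting to the full names (api-server_db, ws-calendar_db) avoids the ambiguity, but only if every client is configured that way.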

One should therefore be very careful when naming the services of each stack: their short names should be unique across your whole setup.
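One possible way to apply this, sketched under assumptions: prefix each service name with something unique to its stack, and attach to the shared network only the services Traefik actually has to reach. The service names api-server-web and api-server-db, the network name traefik-public, the domain and the images are hypothetical.

```yaml
# api-server.yml (hypothetical file), deployed as stack "api-server"
version: "3.8"

services:
  api-server-web:              # unique short name, proxied by Traefik
    image: php:8.3-apache
    networks:
      - default                # stack-private network, used to reach the database
      - traefik-public         # shared network, used only for Traefik to reach this service
    deploy:
      labels:
        - traefik.enable=true
        - traefik.http.routers.api-server.rule=Host(`api.example.com`)
        - traefik.http.services.api-server.loadbalancer.server.port=80

  api-server-db:               # unique short name, never attached to the shared network
    image: mysql:8.0
    environment:
      MYSQL_ROOT_PASSWORD: example     # placeholder, for illustration only
    # no "networks:" key, so the service joins only the stack's default network

networks:
  traefik-public:
    external: true             # assumed to be created beforehand, outside any stack
```

This does not restore isolation between the services that do sit on the shared network; it only reduces how many names can collide there.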

This applies especially if one had (naively) hoped to manage the services of several environments (production, staging, CI) with a single Traefik, relying only on separate stacks with different service configurations. If the service names are not carefully kept distinct, staging processes might very well end up modifying production databases, just like that. It would be more prudent to create separate Traefik networks with separate Traefik instances (necessarily on different servers, to avoid port conflicts) or, better still, to run an entirely separate Swarm cluster per environment.