# Integration of a Go service with systemd: socket activation

Vincent Bernat
In a previous post, I highlighted some useful features of systemd when writing a service in Go, notably to signal readiness and prove liveness. Another interesting bit is socket activation: systemd listens on behalf of the application and, on incoming traffic, starts the service with a copy of the listening socket. Lennart Poettering details the concept in a blog post:

> If a service dies, its listening socket stays around, not losing a single message. After a restart of the crashed service it can continue right where it left off. If a service is upgraded we can restart the service while keeping around its sockets, thus ensuring the service is continuously responsive. Not a single connection is lost during the upgrade.
This is one solution to get zero-downtime deployment for your application. Another upside is that you can run your daemon with fewer privileges (losing rights is a difficult task in Go).[^1]
## The basics
Let’s return to our nifty 404-only web server:
```go
package main

import (
	"log"
	"net"
	"net/http"
)

func main() {
	listener, err := net.Listen("tcp", ":8081")
	if err != nil {
		log.Panicf("cannot listen: %s", err)
	}
	http.Serve(listener, nil)
}
```
Here is the socket-activated version, using go-systemd:
```go
package main

import (
	"log"
	"net/http"

	"github.com/coreos/go-systemd/activation"
)

func main() {
	listeners, err := activation.Listeners(true) // ❶
	if err != nil {
		log.Panicf("cannot retrieve listeners: %s", err)
	}
	if len(listeners) != 1 {
		log.Panicf("unexpected number of socket activation (%d != 1)",
			len(listeners))
	}
	http.Serve(listeners[0], nil) // ❷
}
```
In ❶, we retrieve the listening sockets provided by systemd. In ❷, we use the first one to serve HTTP requests. Let’s test the result with `systemd-socket-activate`:[^2]
```console
$ go build 404.go
$ systemd-socket-activate -l 8000 ./404
Listening on [::]:8000 as 3.
```
In another terminal, we can make some requests to the service:
```console
$ curl '[::1]':8000
404 page not found
$ curl '[::1]':8000
404 page not found
```
For a proper integration with systemd, you need two files:
- a socket unit for the listening socket; and
- a service unit for the associated service.
We can use the following socket unit, `404.socket`:

```ini
[Socket]
ListenStream = 8000
BindIPv6Only = both

[Install]
WantedBy = sockets.target
```
The systemd.socket(5) manual page describes the available options. `BindIPv6Only = both` is explicitly specified because the default value is distribution-dependent. As for the service unit, we can use the following one, `404.service`:

```ini
[Unit]
Description = 404 micro-service

[Service]
ExecStart = /usr/bin/404
```
systemd knows the two files work together because they share the same prefix. Once the files are in `/etc/systemd/system`, execute `systemctl daemon-reload` and `systemctl start 404.socket`. Your service is ready to accept connections!
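Assuming the binary was built as above and copied to `/usr/bin/404` (the path used in the service unit), a possible session looks like this:

```console
$ sudo cp 404 /usr/bin/404
$ sudo cp 404.socket 404.service /etc/systemd/system/
$ sudo systemctl daemon-reload
$ sudo systemctl start 404.socket
$ curl '[::1]':8000
404 page not found
```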
## Handling of existing connections
Our 404 service has a major shortcoming: existing connections are abruptly killed when the daemon is stopped or restarted. Let’s fix that!
### Waiting a few seconds for existing connections
We can include a short grace period for connections to terminate, then kill remaining ones:
```go
// On signal, gracefully shut down the server and wait 5
// seconds for current connections to stop.
done := make(chan struct{})
quit := make(chan os.Signal, 1)
server := &http.Server{}
signal.Notify(quit, syscall.SIGINT, syscall.SIGTERM)
go func() {
	<-quit
	log.Println("server is shutting down")
	ctx, cancel := context.WithTimeout(context.Background(),
		5*time.Second)
	defer cancel()
	server.SetKeepAlivesEnabled(false)
	if err := server.Shutdown(ctx); err != nil {
		log.Panicf("cannot gracefully shut down the server: %s", err)
	}
	close(done)
}()

// Start accepting connections.
server.Serve(listeners[0])

// Wait for existing connections before exiting.
<-done
```
Upon reception of a termination signal, the goroutine resumes and schedules a shutdown of the service:

> `Shutdown()` gracefully shuts down the server without interrupting any active connections. `Shutdown()` works by first closing all open listeners, then closing all idle connections, and then waiting indefinitely for connections to return to idle and then shut down.
While restarting, new connections are not accepted: they sit in the listen queue associated with the socket. This queue is bounded and its size can be configured with the `Backlog` directive in the socket unit. Its default value is 128. You may keep this value, even when your service is expecting to receive many connections per second. When this value is exceeded, incoming connections are silently dropped. The client should automatically retry to connect. On Linux, by default, it will retry 5 times (`tcp_syn_retries`) in about 3 minutes. This is a nice way to avoid the herd effect you would experience on restart if you increased the listen queue to some high value.
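Should you still want to change it, the `Backlog` directive goes in the `[Socket]` section. Here is an illustrative variant of `404.socket` with the default value made explicit:

```ini
[Socket]
ListenStream = 8000
BindIPv6Only = both
# Maximum number of pending connections (128 is the default).
Backlog = 128

[Install]
WantedBy = sockets.target
```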
### Waiting longer for existing connections
If you want to wait for a very long time for existing connections to stop, you do not want to ignore new connections for several minutes. There is a very simple trick: ask systemd not to kill any process on stop. With `KillMode = none`, only the stop command is executed and all existing processes are left undisturbed:

```ini
[Unit]
Description = slow 404 micro-service

[Service]
ExecStart = /usr/bin/404
ExecStop = /bin/kill $MAINPID
KillMode = none
```
If you restart the service, the current process gracefully shuts down for as long as needed and systemd immediately spawns a new instance ready to serve incoming requests with its own copy of the listening socket. On the other hand, we lose the ability to wait for the service to come to a full stop, either by itself or forcefully after a timeout with `SIGKILL`.
**Update (2021-01)**

`KillMode=none` is now deprecated. In addition to the alternative below, it is possible to pass the currently active connections to systemd with the `sd_pid_notify_with_fds()` function. Then, the new process needs some logic to handle them. Unfortunately, this is not an easy task as you also need to serialize the associated state. Moreover, the function is not implemented in go-systemd.
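As a rough idea of what this would entail, here is a minimal sketch of the underlying notification protocol in Go: a file descriptor is sent to `$NOTIFY_SOCKET` as `SCM_RIGHTS` ancillary data along with an `FDSTORE=1` state. The `notifyWithFD()` helper is hypothetical, and the service unit would also need `FileDescriptorStoreMax` set:

```go
package main

import (
	"fmt"
	"log"
	"net"
	"os"

	"golang.org/x/sys/unix"
)

// notifyWithFD mimics sd_pid_notify_with_fds(): it sends a notification
// state to $NOTIFY_SOCKET with a file descriptor attached as SCM_RIGHTS
// ancillary data. This helper is hypothetical, not part of go-systemd.
func notifyWithFD(state string, fd int) error {
	addr := &net.UnixAddr{
		Name: os.Getenv("NOTIFY_SOCKET"),
		Net:  "unixgram",
	}
	conn, err := net.DialUnix(addr.Net, nil, addr)
	if err != nil {
		return err
	}
	defer conn.Close()
	// Encode the file descriptor as ancillary data.
	oob := unix.UnixRights(fd)
	_, _, err = conn.WriteMsgUnix([]byte(state), oob, nil)
	return err
}

func main() {
	// Store stdin in systemd's file descriptor store as a demo. A real
	// service would push its active connections instead, along with
	// enough serialized state for the next instance to resume them.
	if err := notifyWithFD("FDSTORE=1\nFDNAME=demo", 0); err != nil {
		log.Panicf("cannot store file descriptor: %s", err)
	}
	fmt.Println("file descriptor stored")
}
```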
### Waiting longer for existing connections (alternative)
An alternative to the previous solution is to make systemd believe your service died during reload.
```go
done := make(chan struct{})
quit := make(chan os.Signal, 1)
server := &http.Server{}
signal.Notify(quit,
	// for reload:
	syscall.SIGHUP,
	// for stop or full restart:
	syscall.SIGINT, syscall.SIGTERM)

go func() {
	sig := <-quit
	switch sig {
	case syscall.SIGINT, syscall.SIGTERM:
		// Shutdown with a time limit.
		log.Println("server is shutting down")
		ctx, cancel := context.WithTimeout(context.Background(),
			15*time.Second)
		defer cancel()
		server.SetKeepAlivesEnabled(false)
		if err := server.Shutdown(ctx); err != nil {
			log.Panicf("cannot gracefully shut down the server: %s", err)
		}
	case syscall.SIGHUP: // ❶
		// Execute a short-lived process and ask systemd to
		// track it instead of us.
		log.Println("server is reloading")
		pid := detachedSleep()
		daemon.SdNotify(false, fmt.Sprintf("MAINPID=%d", pid))
		time.Sleep(time.Second) // Wait a bit for systemd to check the PID.

		// Wait without a limit for current connections to stop.
		server.SetKeepAlivesEnabled(false)
		if err := server.Shutdown(context.Background()); err != nil {
			log.Panicf("cannot gracefully shut down the server: %s", err)
		}
	}
	close(done)
}()

// Serve requests with a slow handler.
server.Handler = http.HandlerFunc(
	func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(10 * time.Second)
		http.Error(w, "404 not found", http.StatusNotFound)
	})
server.Serve(listeners[0])

// Wait for all connections to terminate.
<-done
log.Println("server terminated")
```
The main difference is the handling of the `SIGHUP` signal in ❶: a short-lived decoy process is spawned and systemd is told to track it. When it dies, systemd will start a new instance. This method is a bit hacky: systemd needs the decoy process to be a child of PID 1 but Go cannot easily detach on its own. Therefore, we leverage a short Python helper, wrapped in a `detachedSleep()` function:[^3]
```go
// detachedSleep spawns a detached process sleeping
// one second and returns its PID.
func detachedSleep() uint64 {
	py := `
import os
import time
pid = os.fork()
if pid == 0:
    for fd in {0, 1, 2}:
        os.close(fd)
    time.sleep(1)
else:
    print(pid)
`
	cmd := exec.Command("/usr/bin/python3", "-c", py)
	out, err := cmd.Output()
	if err != nil {
		log.Panicf("cannot execute sleep command: %s", err)
	}
	pid, err := strconv.ParseUint(strings.TrimSpace(string(out)), 10, 64)
	if err != nil {
		log.Panicf("cannot parse PID of sleep command: %s", err)
	}
	return pid
}
```
During reload, there may be a small period during which both the new and the old processes accept incoming requests. If you don’t want that, you can move the creation of the short-lived process outside the goroutine, after `server.Serve()` (see the sketch below), or implement some synchronization mechanism. There is also a possible race condition when we tell systemd to track another PID; see PR #7816.
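As a sketch of the first option, the signal handler could merely record that a reload was requested (the `reloading` flag below is hypothetical) and the decoy would be spawned only once `server.Serve()` has returned, that is once the listener is closed and the old process cannot accept new connections anymore:

```go
// Serve until Shutdown() is called by the signal handler; Serve()
// returns as soon as the listener is closed.
server.Serve(listeners[0])
// The old process no longer accepts connections: it is now safe
// to hand over to a new instance.
if reloading.Load() { // hypothetical atomic.Bool set on SIGHUP
	pid := detachedSleep()
	daemon.SdNotify(false, fmt.Sprintf("MAINPID=%d", pid))
}
// Wait for the graceful shutdown to complete.
<-done
```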
The `404.service` unit needs an update:

```ini
[Unit]
Description = slow 404 micro-service

[Service]
ExecStart = /usr/bin/404
ExecReload = /bin/kill -HUP $MAINPID
Restart = always
NotifyAccess = main
KillMode = process
```
Each additional directive is significant:

- `ExecReload` tells how to reload the process: by sending `SIGHUP`.
- `Restart` tells to restart the process if it stops “unexpectedly,” notably on reload.[^4]
- `NotifyAccess` specifies which process can send notifications, like a PID change.
- `KillMode` tells to only kill the main identified process; others are left untouched.
## Zero-downtime deployment?
Zero-downtime deployment is a difficult endeavor on Linux. For example, HAProxy had a long list of hacks until a proper (and complex) solution was implemented in HAProxy 1.8. How do we fare with our simple implementation?
From the kernel point of view, there is only one socket with a unique listen queue. This socket is associated with several file descriptors: one in systemd and one in the current process. The socket stays alive as long as there is at least one file descriptor. An incoming connection is put by the kernel in the listen queue and can be dequeued from any file descriptor with the `accept()` syscall. Therefore, this approach actually achieves zero-downtime deployment: no incoming connection is rejected.
By contrast, HAProxy was using several different sockets listening on the same address, thanks to the `SO_REUSEPORT` option.[^5] Each socket gets its own listen queue and the kernel balances incoming connections between the queues. When a socket gets closed, the content of its queue is lost. If an incoming connection was sitting there, it would receive a reset. An elegant Linux patch to signal that a socket should not receive new connections was rejected. HAProxy 1.8 now recycles existing sockets by sending them to the new processes through a Unix socket.
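For comparison, here is a minimal sketch of how such a `SO_REUSEPORT` socket can be created in Go, using the `Control` hook of `net.ListenConfig` and the `golang.org/x/sys/unix` package. It mirrors HAProxy’s former approach, not the one advocated in this post:

```go
package main

import (
	"context"
	"log"
	"net"
	"net/http"
	"syscall"

	"golang.org/x/sys/unix"
)

func main() {
	lc := net.ListenConfig{
		// Set SO_REUSEPORT before bind(): several processes can then
		// listen on the same address, each with its own listen queue.
		Control: func(network, address string, c syscall.RawConn) error {
			var soErr error
			if err := c.Control(func(fd uintptr) {
				soErr = unix.SetsockoptInt(int(fd),
					unix.SOL_SOCKET, unix.SO_REUSEPORT, 1)
			}); err != nil {
				return err
			}
			return soErr
		},
	}
	listener, err := lc.Listen(context.Background(), "tcp", ":8000")
	if err != nil {
		log.Panicf("cannot listen: %s", err)
	}
	http.Serve(listener, nil)
}
```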
I hope this post and the previous one show how systemd is a good sidekick for a Go service: readiness, liveness and socket activation are some of the useful features you can get to build a more reliable application.
## Addendum: decoy process using Go
**Update (2018-03)**

On /r/golang, it was pointed out to me that, in the version where systemd is tracking a decoy, the helper can be replaced by invoking the main executable again. By relying on a change of environment, it assumes the role of the decoy. Here is such an implementation replacing the `detachedSleep()` function:
```go
func init() {
	// As early as possible, check if we should be the decoy.
	state := os.Getenv("__SLEEPY")
	os.Unsetenv("__SLEEPY")
	switch state {
	case "1":
		// First step: fork again.
		execPath := self()
		child, err := os.StartProcess(
			execPath,
			[]string{execPath},
			&os.ProcAttr{
				Env: append(os.Environ(), "__SLEEPY=2"),
			})
		if err != nil {
			log.Panicf("cannot execute sleep command: %s", err)
		}

		// Advertise child's PID and exit. Child will be
		// orphaned and adopted by PID 1.
		fmt.Printf("%d", child.Pid)
		os.Exit(0)
	case "2":
		// Sleep and exit.
		time.Sleep(time.Second)
		os.Exit(0)
	}
	// Not the sleepy helper. Business as usual.
}

// self returns the absolute path to ourselves. This relies on
// /proc/self/exe which may be a symlink to a deleted path (for
// example, during an upgrade).
func self() string {
	execPath, err := os.Readlink("/proc/self/exe")
	if err != nil {
		log.Panicf("cannot get self path: %s", err)
	}
	execPath = strings.TrimSuffix(execPath, " (deleted)")
	return execPath
}

// detachedSleep spawns a detached process sleeping one second and
// returns its PID. A full daemonization is not needed as the process
// is short-lived.
func detachedSleep() uint64 {
	cmd := exec.Command(self())
	cmd.Env = append(os.Environ(), "__SLEEPY=1")
	out, err := cmd.Output()
	if err != nil {
		log.Panicf("cannot execute sleep command: %s", err)
	}
	pid, err := strconv.ParseUint(strings.TrimSpace(string(out)), 10, 64)
	if err != nil {
		log.Panicf("cannot parse PID of sleep command: %s", err)
	}
	return pid
}
```
## Addendum: identifying sockets by name
For a given service, systemd can provide several sockets. To identify them, it is possible to name them. Let’s suppose we also want to return 403 error codes from the same service but on a different port. We add an additional socket unit definition, `403.socket`, linked to the same `404.service`:

```ini
[Socket]
ListenStream = 8001
BindIPv6Only = both
Service = 404.service

[Install]
WantedBy = sockets.target
```
Unless overridden with `FileDescriptorName`, the name of the socket is the name of the unit: `403.socket`. go-systemd provides the `ListenersWithNames()` function to fetch a map from names to listening sockets:
```go
package main

import (
	"log"
	"net/http"
	"sync"

	"github.com/coreos/go-systemd/activation"
)

func main() {
	var wg sync.WaitGroup

	// Map socket names to handlers.
	handlers := map[string]http.HandlerFunc{
		"404.socket": http.NotFound,
		"403.socket": func(w http.ResponseWriter, r *http.Request) {
			http.Error(w, "403 forbidden", http.StatusForbidden)
		},
	}

	// Get listening sockets.
	listeners, err := activation.ListenersWithNames(true)
	if err != nil {
		log.Panicf("cannot retrieve listeners: %s", err)
	}

	// For each listening socket, spawn a goroutine
	// with the appropriate handler.
	for name := range listeners {
		for idx := range listeners[name] {
			wg.Add(1)
			go func(name string, idx int) {
				defer wg.Done()
				http.Serve(
					listeners[name][idx],
					handlers[name])
			}(name, idx)
		}
	}

	// Wait for all goroutines to terminate.
	wg.Wait()
}
```
Let’s build the service and run it with `systemd-socket-activate`:

```console
$ go build 404.go
$ systemd-socket-activate -l 8000 -l 8001 \
>     --fdname=404.socket:403.socket \
>     ./404
Listening on [::]:8000 as 3.
Listening on [::]:8001 as 4.
```
In another console, we can make a request for each endpoint:

```console
$ curl '[::1]':8000
404 page not found
$ curl '[::1]':8001
403 forbidden
```
[^1]: Many process characteristics in Linux are attached to threads. The Go runtime transparently manages them without much user control. Until recently, this made some features, like `setuid()` or `setns()`, unusable.

[^2]: With an older version of systemd (before 230), look for `/lib/systemd/systemd-activate` instead.

[^3]: Python is a good candidate: it’s likely to be available on the system, it is low-level enough to easily implement the functionality and, as an interpreted language, it doesn’t require a specific build step. There is no need to fork twice as we only need to detach the decoy from the current process. This simplifies the Python code a bit.

[^4]: This is not an essential directive as the process is also restarted through socket activation.

[^5]: This approach is more convenient when reloading since you don’t have to figure out which sockets to reuse and which ones to create from scratch. Moreover, when several processes need to accept connections, using multiple sockets is more scalable as the different processes won’t fight over a shared lock to accept connections.