Gracefully Shutdown your Go Application

A Common Problem

[Image: a common deployment issue, seen every time you deploy your code changes]

Looks familiar? I hope it doesn’t. But if it does, you may need to diagnose and debug your application further. The image above describes a problem that most web server applications have run into. Every time we deploy new code changes, the running service is terminated first so that the new version can come up. The problem arises when the service is still serving requests at that moment: if the application doesn’t shut down gracefully, those in-flight requests are never completed and the clients receive errors.

Beyond that specific problem, this article aims to cover the broader use case of graceful shutdown for any Go application.

What should we do?

Before we dive into how to address this problem, let’s define what a graceful shutdown of a process is.

A graceful shutdown is when a process that is being turned off by the operating system (OS) is allowed to safely finish its shutdown tasks, such as stopping its work and closing connections.

This means that any cleanup task should be done before the application exits, whether that’s a server completing its ongoing requests, removing temporary files, and so on.

Graceful shutdown is also part of The Twelve-Factor App methodology, under the Disposability factor.

To achieve that, we have to listen for the termination signals that the process manager sends to the application, and act accordingly. This means that your app should not terminate immediately when the process manager asks it to stop; it should finish its clean-up first.

So, with all the information we have right now, what we want to do is basically:

  1. Listen for the termination signal(s) from the process manager, such as SIGTERM.
  2. Block the main function until the signal is received.
  3. After receiving the signal, run the clean-ups for our app and wait until those clean-up operations are done.
  4. Apply a timeout to ensure that the clean-up won’t hang the system.

The Code

Talk is cheap. Show me the code. — Linus Torvalds

Enough theory, let’s see how the previous steps could be turned into a working sample code (gist URL).

package main

import (
	"context"
	"log"
	"os"
	"os/signal"
	"sync"
	"syscall"
	"time"
)

// operation is a clean up function on shutting down
type operation func(ctx context.Context) error

// gracefulShutdown waits for a termination signal and then runs the clean-up operations
func gracefulShutdown(ctx context.Context, timeout time.Duration, ops map[string]operation) <-chan struct{} {
	wait := make(chan struct{})
	go func() {
		s := make(chan os.Signal, 1)

		// add any other syscalls that you want to be notified with
		signal.Notify(s, syscall.SIGINT, syscall.SIGTERM, syscall.SIGHUP)
		<-s

		log.Println("shutting down")

		// set a timeout for the ops to finish, to prevent the system from hanging
		timeoutFunc := time.AfterFunc(timeout, func() {
			log.Printf("timeout of %d ms has elapsed, forcing exit", timeout.Milliseconds())
			// exit with a non-zero status so the process manager can tell the shutdown was forced
			os.Exit(1)
		})

		defer timeoutFunc.Stop()

		var wg sync.WaitGroup

		// Do the operations asynchronously to save time
		for key, op := range ops {
			wg.Add(1)
			innerOp := op
			innerKey := key
			go func() {
				defer wg.Done()

				log.Printf("cleaning up: %s", innerKey)
				if err := innerOp(ctx); err != nil {
					log.Printf("%s: clean up failed: %v", innerKey, err)
					return
				}

				log.Printf("%s was shut down gracefully", innerKey)
			}()
		}

		wg.Wait()

		close(wait)
	}()

	return wait
}

package main

func main() {
	// initialize some resources
	// e.g:
	// db, err := database.Initialize()
	// server, err := http.Initialize()

	// wait for termination signal and register database & http server clean-up operations
	wait := gracefulShutdown(context.Background(), 2*time.Second, map[string]operation{
		"database": func(ctx context.Context) error {
			return db.Shutdown()
		},
		"http-server": func(ctx context.Context) error {
			return server.Shutdown()
		},
		// Add other cleanup operations here
	})

	<-wait
}

The code itself is pretty straightforward. The main benefit of this approach is that it’s reusable and easy to maintain as our application grows: adding a new clean-up step only means adding another entry to the map.

The code covers every item on the list above. Let’s break it down:

  • Listen for the termination signal(s) from the process manager, such as SIGTERM.
s := make(chan os.Signal, 1)
// add any other syscalls that you want to be notified with
signal.Notify(s, syscall.SIGINT, syscall.SIGTERM, syscall.SIGHUP)
<-s

It could be one termination signal or several; it really depends on your app’s behavior. Most apps need to listen for at least SIGTERM.

  • Block the main function until the signal is received.
// wait for termination signal and register database & http server clean-up operations
wait := gracefulShutdown(context.Background(), 2*time.Second, map[string]operation{
	"database": func(ctx context.Context) error {
		return db.Shutdown()
	},
	"http-server": func(ctx context.Context) error {
		return server.Shutdown()
	},
	// Add other cleanup operations here
})

<-wait

This code could also be extended with other clean-up or utility operations, e.g. closing Redis connections, sending post-serving metrics, or releasing any resources that were used for profiling and diagnostics. Please refer to the resource links in the last section of this article for some common use cases.

  • After receiving the signal, run the clean-ups for our app and wait until those clean-up operations are done.
var wg sync.WaitGroup

// Do the operations asynchronously to save time
for key, op := range ops {
	wg.Add(1)
	innerOp := op
	innerKey := key
	go func() {
		defer wg.Done()

		log.Printf("cleaning up: %s", innerKey)
		if err := innerOp(ctx); err != nil {
			log.Printf("%s: clean up failed: %v", innerKey, err)
			return
		}

		log.Printf("%s was shut down gracefully", innerKey)
	}()
}

wg.Wait()

We want the clean-up to be as fast as possible at shutdown time, which is why we spawn a goroutine for every operation.

One thing to remember here is that the clean-up operations must not share resources with each other; otherwise the concurrency may lead to race conditions.

  • Apply a timeout to ensure that the clean-up won’t hang the system.
// set a timeout for the ops to finish, to prevent the system from hanging
timeoutFunc := time.AfterFunc(timeout, func() {
	log.Printf("timeout of %d ms has elapsed, forcing exit", timeout.Milliseconds())
	// exit with a non-zero status so the process manager can tell the shutdown was forced
	os.Exit(1)
})

defer timeoutFunc.Stop()

So that’s a graceful shutdown implementation for general Go applications. The sample code is written in Go, but the core idea can be applied to other languages as well.

Conclusion

Graceful shutdown is only one of many things that you need to implement to have a resilient & robust application.

Besides, we may still need to figure out how to route incoming requests and new tasks to the instances running the latest version of our code during a deployment. That may need its own article, which I might write in the future, so stay tuned!

These are the related external resources that you might find important to learn further on how to handle specific resource cleanup:

I hope you found this article useful. If you did, please share it with others who may need it.

Thanks for reading! Looking forward to hearing your feedback & suggestions.