Compressing embedded files in Go

Vincent Bernat

Go’s embed feature lets you bundle static assets into an executable, but it stores them uncompressed. This wastes space: a web interface with documentation can bloat your binary by dozens of megabytes. A proposition to optionally enable compression was declined because it is difficult to handle all use cases. One solution? Put all the assets into a ZIP archive! 🗜️

Code#

The Go standard library includes a module to read and write ZIP archives. It contains a function that turns a ZIP archive into an io/fs.FS structure that can replace embed.FS in most contexts.1

package embed

import (
  "archive/zip"
  "bytes"
  _ "embed"
  "fmt"
  "io/fs"
  "sync"
)

//go:embed data/embed.zip
var embeddedZip []byte

var dataOnce = sync.OnceValue(func() *zip.Reader {
  r, err := zip.NewReader(bytes.NewReader(embeddedZip), int64(len(embeddedZip)))
  if err != nil {
    panic(fmt.Sprintf("cannot read embedded archive: %s", err))
  }
  return r
})

func Data() fs.FS {
  return dataOnce()
}

We can build the embed.zip archive with a rule in a Makefile. We specify the files to embed as dependencies to ensure changes are detected.

common/embed/data/embed.zip: console/data/frontend console/data/docs
common/embed/data/embed.zip: orchestrator/clickhouse/data/protocols.csv 
common/embed/data/embed.zip: orchestrator/clickhouse/data/icmp.csv
common/embed/data/embed.zip: orchestrator/clickhouse/data/asns.csv
common/embed/data/embed.zip:
    mkdir -p common/embed/data && zip --quiet --recurse-paths --filesync $@ $^

The automatic variable $@ is the rule target, while $^ expands to all the dependencies, modified or not.

Space gain#

Akvorado, a flow collector written in Go, embeds several static assets:

  • CSV files to translate port numbers, protocols or AS numbers, and
  • HTML, CSS, JS, and image files for the web interface, and
  • the documentation.
Breakdown of space used by each package before and after introducing
embed.zip. It is displayed as a treemap and we can see many embedded files
replaced by a bigger one.
Breakdown of the space used by each component before (left) and after (right) the introduction of embed.zip.

Embedding these assets into a ZIP archive reduced the size of the Akvorado executable by more than 4 MiB:

$ unzip -p common/embed/data/embed.zip | wc -c | numfmt --to=iec
7.3M
$ ll common/embed/data/embed.zip
-rw-r--r-- 1 bernat users 2.9M Dec  7 17:17 common/embed/data/embed.zip

Performance loss#

Reading from a compressed archive is not as fast as reading a flat file. A simple benchmark shows it is more than 4× slower. It also allocates some memory.2

goos: linux
goarch: amd64
pkg: akvorado/common/embed
cpu: AMD Ryzen 5 5600X 6-Core Processor
BenchmarkData/compressed-12     2262   526553 ns/op   610 B/op   10 allocs/op
BenchmarkData/uncompressed-12   9482   123175 ns/op     0 B/op    0 allocs/op

Each access to an asset requires a decompression step, as seen in this flame graph:

🖼 Flame graph when reading data from embed.zip compared to reading data directly
CPU flame graph comparing the time spent on CPU when reading data from embed.zip (left) versus reading data directly (right). Because the Go testing framework executes the benchmark for uncompressed data 4 times more often, it uses the same horizontal space as the benchmark for compressed data. The graph is interactive.

While a ZIP archive has an index to quickly find the requested file, seeking inside a compressed file is currently not possible.3 Therefore, the files from a compressed archive do not implement the io.ReaderAt or io.Seeker interfaces, unlike directly embedded files. This prevents some features, like serving partial files or detecting MIME types when serving files over HTTP.


For Akvorado, this is an acceptable compromise to save a few mebibytes from an executable of almost 100 MiB. Next week, I will continue this futile adventure by explaining how I prevented Go from disabling dead code elimination! 🦥


  1. You can safely read multiple files concurrently. However, it does not implement ReadDir() and ReadFile() methods. ↩︎

  2. You could keep frequently accessed assets in memory. This reduces CPU usage and trades cached memory for resident memory. ↩︎

  3. SOZip is a profile that enables fast random access in a compressed file. However, Go’s archive/zip module does not support it. ↩︎