Testing Go+S3 with Gnomock and Localstack

A few months ago I built gompress: a simple utility that takes a location in AWS S3, compresses all the files in it with GZIP, and puts them into another location, also in S3, optionally keeping or removing the original files. I wrote it to use once on a large S3 bucket full of uncompressed CSV files, and published it for anyone else who might need it. Although it all worked well, there was something that made me uncomfortable about it: it didn’t have any tests.

There were two reasons for not adding tests. One, the obvious one, was that it wasn’t worth the time: I needed a quick solution to a specific problem I faced. The other was that it is not easy to write a test for a program that operates entirely on a third-party service, such as AWS S3. Even though there were ways to mock the service, I felt that such tests would never be good enough.

The original version of the code can be found here.

Recently, after I created Gnomock, an integration and end-to-end testing toolkit that uses external services without mocking them, I decided to use it to test gompress as well.

Localstack preset

Localstack is a very popular project that lets you spin up many AWS services, including S3, locally, using a single docker container. To use it in Go tests, it needs to be wrapped with some code that pulls the image, starts the container, sets up port bindings, waits for the services to become available, and sets up some initial state, all before the test actually starts to verify whatever the program does. These actions repeat in every package that uses S3, so I implemented a new preset for Gnomock. This preset can be reused anywhere very easily, and here is how.

Implementation

Getting the dependencies

First of all, go get the package:

$ go get github.com/orlangure/gnomock
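The localstack preset used below is a separate module; depending on your gnomock version, it may need to be fetched on its own (in newer versions the preset ships inside the main module as preset/localstack):

$ go get github.com/orlangure/gnomock-localstack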

Preparing the code

You can skip this part and go directly to the actual testing.

Since I originally wrote the code without thinking about testability, it wasn’t really possible to test anything easily. There were a few problems to solve first:

Moving execution to a sub-package

The actual execution code originally lived inside the main function:

func main() {
	conf, err := newConfig()
	// ...
	src, err := newClient(conf.srcRegion, conf.srcBucket, conf.srcPrefix)
	// ...
	dst, err := newClient(conf.dstRegion, conf.dstBucket, conf.dstPrefix)
	// ...
	files, errors := src.listFiles()
	// ...
	w := &worker{src, dst, conf.keepOriginal}
	// ...
	w.start(files, wg)
	// ...
}

Even though it worked, this code was hard to test: I needed to actually run the program to trigger whatever actions it took. Such tests wouldn’t benefit from Go’s coverage reports, race detector, or debugger. That’s why all the application logic moved into the gompress package inside the gompress repository. From that point forward, main contained only the configuration code (using the flag package) and a call to gompress.Run:

func main() {
	conf, err := newConfig()
	// ...
	err = gompress.Run(conf)
	// ...
}
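With that split in place, the whole program can be driven from a regular Go test. Here is a minimal sketch of such a test, assuming the repository’s import path (the full Config setup appears later in this post):

package gompress_test

import (
	"testing"

	"github.com/orlangure/gompress/gompress"
)

func TestRun(t *testing.T) {
	// Config is filled in exactly as shown later in this post
	conf := &gompress.Config{ /* ... */ }

	// Run is now a plain function call, so coverage reports,
	// the race detector and the debugger all work as usual.
	if err := gompress.Run(conf); err != nil {
		t.Fatal(err)
	}
}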

S3 endpoint configuration

The next step was to instruct the AWS SDK to use a custom endpoint whenever it needed to access S3. The Config type was made public, and it got a new field, Endpoint:

// Config defines how gompress will process the files
type Config struct {
	// ...

	// Endpoint is used for tests to override default s3 endpoint
	Endpoint string
}

This new field was used every time the code created a new AWS session:

config := &aws.Config{Region: aws.String("us-east-1")}

if endpoint != "" {
	config.Endpoint = aws.String(endpoint)
	config.S3ForcePathStyle = aws.Bool(true)
}

sess, err := session.NewSession(config)
// ...

Here, two AWS SDK for Go parameters are used: Endpoint, which is the address of the S3 service, and S3ForcePathStyle, which makes sure the SDK does not use a custom domain name for every bucket and puts bucket names into the URL path instead.
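To make the difference concrete, here is the same object addressed in both styles (the local host and port are illustrative):

// virtual-hosted-style, the SDK default: the bucket becomes a subdomain
//   http://input-bucket.localhost:4566/a-1.txt
// path-style, enabled by S3ForcePathStyle: the bucket goes into the path
//   http://localhost:4566/input-bucket/a-1.txt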

So, in order to properly test the code, I needed to change both the structure of the code and the actual “production” code (S3 client configuration).

Preparing test data

gompress is meant to be used on buckets with lots of uncompressed files, so before writing the actual test code, I needed to create the files for it. All of them can be found in the testdata folder: 200 files with random base64 contents, created with the following commands:

$ for i in `seq 100`; do openssl rand -base64 -out testdata/input-bucket/a-$i.txt 1000; done
$ for i in `seq 100`; do openssl rand -base64 -out testdata/input-bucket/b-$i.txt 1000; done
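If you prefer to keep everything in Go, a rough equivalent of these commands could look like this (it skips the line wrapping that openssl applies, which does not matter for the test):

package main

import (
	"crypto/rand"
	"encoding/base64"
	"fmt"
	"log"
	"os"
)

func main() {
	if err := os.MkdirAll("testdata/input-bucket", 0755); err != nil {
		log.Fatal(err)
	}

	for _, prefix := range []string{"a", "b"} {
		for i := 1; i <= 100; i++ {
			// 1000 random bytes, base64-encoded, like openssl rand -base64
			buf := make([]byte, 1000)
			if _, err := rand.Read(buf); err != nil {
				log.Fatal(err)
			}

			name := fmt.Sprintf("testdata/input-bucket/%s-%d.txt", prefix, i)
			content := base64.StdEncoding.EncodeToString(buf)

			if err := os.WriteFile(name, []byte(content), 0644); err != nil {
				log.Fatal(err)
			}
		}
	}
}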

All the files were put into the input-bucket folder, which would later be translated into a new S3 bucket.

Setting up localstack

Finally, I was able to start writing the test itself. First, I needed a running instance of localstack, with all my test files already inside it. It was an easy task with gnomock:

// create empty folder for output bucket
err := os.MkdirAll("./testdata/output-bucket", 0755)
// ...

// use gnomock-localstack preset to spin up S3
p := localstack.Preset(
	localstack.WithServices(localstack.S3),
	localstack.WithS3Files("./testdata"),
)
c, err := gnomock.Start(p)

// clean up after we are done
defer func() { _ = gnomock.Stop(c) }()

// ...

// local s3 service is now accessible:
s3Endpoint = fmt.Sprintf("http://%s/", c.Address(localstack.S3Port))

The testdata folder already included the input-bucket directory, and I needed to create an empty output-bucket folder so that gnomock would recreate this structure in S3: one bucket for the input, with all the files in it, and another for the output, empty.

The Gnomock Localstack preset can recreate a local folder structure in S3 running locally in localstack. All it needs is the localstack.WithS3Files option pointed at the root directory of the required S3 state. Every direct child folder becomes a bucket, and all the files in it are uploaded into that bucket, keeping their relative paths.
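In this case, the mapping between the local folders and the resulting S3 state looks like this:

testdata/
├── input-bucket/      →  S3 bucket "input-bucket", 200 files
│   ├── a-1.txt
│   ├── ...
│   └── b-100.txt
└── output-bucket/     →  S3 bucket "output-bucket", empty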

The gnomock.Start(p) call blocks until the container is up, running, and populated with all the files I wanted it to have.

With that, I created a new S3 client using AWS SDK and the local container:

config := &aws.Config{
	Region:           aws.String(region),
	Endpoint:         aws.String(s3Endpoint),
	S3ForcePathStyle: aws.Bool(true),
	Credentials:      credentials.NewStaticCredentials("a", "b", "c"), // any values work with localstack
}

sess, err := session.NewSession(config)
// ...

svc = s3.New(sess)

Actual testing

With the S3 service running locally in a container, and all the test files already inside, I started writing the actual test. First, I needed to confirm that the initial state was as I expected:

// start with 200 files
listInput := &s3.ListObjectsV2Input{Bucket: aws.String(inputBucket)}
files, err := svc.ListObjectsV2(listInput)
require.NoError(t, err)
require.Len(t, files.Contents, 200)

Then, I needed to actually run the code against this S3 service:

conf := &gompress.Config{
	Src: &gompress.S3Location{
		Region: region,
		Bucket: inputBucket,
		Prefix: "a-",
	},
	Dst: &gompress.S3Location{
		Region: region,
		Bucket: outputBucket,
		Prefix: "new-dir/",
	},
	KeepOriginal: false, // remove original files from s3
	Endpoint:     s3Endpoint,
}

require.NoError(t, os.Setenv("AWS_ACCESS_KEY_ID", "foo"))
require.NoError(t, os.Setenv("AWS_SECRET_ACCESS_KEY", "bar"))
require.NoError(t, gompress.Run(conf))

Note the AWS credential environment variables: the original code uses them to connect, but in tests they don’t have to be anywhere close to real credentials, since localstack accepts any values.

You can see the rest of the code in the test file, but there is nothing new there: I use the AWS SDK for Go to list and read files from S3 (running locally) to verify that the code did what I expected.
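For reference, those final checks boil down to something like this sketch (the exact object names are in the real test file; the counts follow from the config above, with 100 “a-” files compressed and removed, and 100 “b-” files left in place):

// only the 100 "b-" files should remain in the input bucket
files, err := svc.ListObjectsV2(&s3.ListObjectsV2Input{
	Bucket: aws.String(inputBucket),
})
require.NoError(t, err)
require.Len(t, files.Contents, 100)

// the output bucket should hold 100 compressed files under "new-dir/"
files, err = svc.ListObjectsV2(&s3.ListObjectsV2Input{
	Bucket: aws.String(outputBucket),
	Prefix: aws.String("new-dir/"),
})
require.NoError(t, err)
require.Len(t, files.Contents, 100)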

Summary

I didn’t add any tests to a program I wrote for one-time use, because there was no way to do so that I found good enough. Later, gnomock made it possible: it was very easy to spin up an S3 service locally, directly from my test code in Go, set up its initial state using one line of code, and run the tests against a real S3 service without any mocks.