Testing Go+S3 with Gnomock and Localstack
A few months ago I built gompress: a simple utility that takes a location in AWS S3, compresses all the files in it with GZIP, and puts them in another location, also in S3, optionally keeping or removing the original files. I wrote it to use once on a large S3 bucket full of uncompressed CSV files, and published it for anyone else who might need it. Although it all worked well, there was something that made me uncomfortable about it: it didn't have any tests.
There were two reasons for not adding tests. One, the obvious one, was that it wasn't worth the time: I needed a quick solution to a specific problem I faced. The other was that it is not easy to write a test for a program that operates entirely on a third-party service, such as AWS S3. Even though there were ways to mock the service, I felt that such tests would never be good enough.
The original version of the code can be found here.
Recently, after I created Gnomock, an integration and end-to-end testing toolkit that uses external services without mocking them, I decided to use it to test gompress as well.
Localstack preset
Localstack is a very popular project that lets you spin up many AWS services, including S3, locally, in a single docker container. To use it in Go tests, it needs to be wrapped with some code that pulls the image, starts the container, sets up port bindings, waits for the services to become available, and sets up some initial state, all before the test starts to actually verify whatever the program does. These actions are repeated for every package that uses S3, so I implemented a new preset for Gnomock. This preset can easily be reused anywhere, and here is how.
Implementation
Getting the dependencies
First of all, go get the package:
$ go get github.com/orlangure/gnomock
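Depending on the Gnomock version you use, the Localstack preset may live in its own Go module; if so, fetch it as well:
$ go get github.com/orlangure/gnomock/preset/localstack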
Preparing the code
You can skip this part and go directly to the actual testing.
Since I originally wrote the code without thinking about its testability, it wasn't really possible to test anything easily. There were a couple of problems:
Moving execution to a sub-package
The actual execution code originally lived inside the main function:
func main() {
	conf, err := newConfig()
	// ...
	src, err := newClient(conf.srcRegion, conf.srcBucket, conf.srcPrefix)
	// ...
	dst, err := newClient(conf.dstRegion, conf.dstBucket, conf.dstPrefix)
	// ...
	files, errors := src.listFiles()
	// ...
	w := &worker{src, dst, conf.keepOriginal}
	// ...
	w.start(files, wg)
	// ...
}
Even though it worked, this code was hard to test: I needed to actually run the program to trigger whatever actions it took. Such tests won't benefit from Go's coverage reports, race detector, or debugging. That's why all the application logic moved to the gompress package inside the gompress repository.
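The resulting layout looks roughly like this (file names here are illustrative, the exact ones live in the repository):
gompress/              repository root
├── main.go            flag parsing and a call to gompress.Run
└── gompress/          application logic, importable from tests
    ├── gompress.go
    └── gompress_test.go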
From that point forward, main included only configuration using the flag package, and a call to gompress.Run:
func main() {
	conf, err := newConfig()
	// ...
	err = gompress.Run(conf)
	// ...
}
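This gave the package a single entry point that tests can call directly. Judging by how it is used throughout this post, its signature looks roughly like this (a sketch inferred from the calls, not copied from the repository):
// Run executes the whole compression flow described by conf:
// list the source files, compress them, upload the results,
// and optionally delete the originals.
func Run(conf *Config) error {
	// ...
}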
S3 Endpoint configuration
The next step was to instruct AWS SDK to use a custom endpoint whenever it
needed to access S3. Config
type was made public, and it got a new field:
Endpoint
:
// Config defines how gompress will process the files
type Config struct {
	// ...
	// Endpoint is used for tests to override default s3 endpoint
	Endpoint string
}
This new field was used every time the code created a new AWS session:
config := &aws.Config{Region: aws.String("us-east-1")}
if endpoint != "" {
	config.Endpoint = aws.String(endpoint)
	config.S3ForcePathStyle = aws.Bool(true)
}
sess, err := session.NewSession(config)
// ...
Here, two AWS SDK for Go parameters are used: Endpoint, which is the address of the S3 service, and S3ForcePathStyle, which makes sure the AWS SDK does not use a custom domain name for every bucket, and appends bucket names to the URL path instead.
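To illustrate the difference (the local address and port below are just examples):
// virtual-hosted style (the default): the bucket is part of the host name,
// which a local container cannot serve without DNS tricks:
//   https://my-bucket.s3.amazonaws.com/file.txt
//
// path style (S3ForcePathStyle = true): the bucket is part of the URL path,
// which works fine against a single local endpoint:
//   http://localhost:4566/my-bucket/file.txt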
So, in order to test the program properly, I needed to change both its structure and the actual “production” code (the S3 client configuration).
Preparing test data
gompress is used on buckets with lots of uncompressed files, so before writing the actual test code, I needed to create files for it to work on. All of them can be found in the testdata folder. There are 200 files, each holding 1000 random bytes encoded as base64, created with the following commands:
$ for i in `seq 100`; do openssl rand -base64 -out testdata/input-bucket/a-$i.txt 1000; done
$ for i in `seq 100`; do openssl rand -base64 -out testdata/input-bucket/b-$i.txt 1000; done
All the files were put into the input-bucket folder, which will later be translated into a new S3 bucket.
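The resulting test data layout (the empty output-bucket folder is created by the test itself, as shown in the next section):
testdata/
├── input-bucket/
│   ├── a-1.txt … a-100.txt
│   └── b-1.txt … b-100.txt
└── output-bucket/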
Setting up localstack
Finally, I was able to start writing the test itself. First, I needed a running instance of localstack, with all my test files already inside it. It was an easy task with gnomock:
// create empty folder for output bucket
err := os.MkdirAll("./testdata/output-bucket", 0755)
// ...
// use gnomock-localstack preset to spin up S3
p := localstack.Preset(
	localstack.WithServices(localstack.S3),
	localstack.WithS3Files("./testdata"),
)
c, err := gnomock.Start(p)
// clean up after we are done
defer func() { _ = gnomock.Stop(c) }()
// ...
// local s3 service is now accessible:
s3Endpoint = fmt.Sprintf("http://%s/", c.Address(localstack.S3Port))
The testdata folder already included the input-bucket directory, and I needed to create an empty output-bucket folder so that gnomock would recreate this structure in S3: one bucket for the input, with all the files, and another one for the output, empty.
The Gnomock Localstack preset can recreate a local folder structure in S3 running locally in localstack. All it needs is a localstack.WithS3Files option pointing at the root directory of the required S3 state. Every direct child folder becomes a bucket, and all its files are uploaded into it, keeping the relative paths.
The gnomock.Start(p) call blocks until the container is up, running, and populated with all the files I wanted it to have.
With that, I created a new S3 client using AWS SDK and the local container:
config := &aws.Config{
	Region:           aws.String(region),
	Endpoint:         aws.String(s3Endpoint),
	S3ForcePathStyle: aws.Bool(true),
	Credentials:      credentials.NewStaticCredentials("a", "b", "c"),
}
sess, err := session.NewSession(config)
// ...
svc = s3.New(sess)
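For completeness, these snippets use AWS SDK for Go v1, with the following imports:
import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/credentials"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)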
Actual testing
With the S3 service running locally in a container, and all the files already inside, I started writing the actual test. First, I needed to confirm that the initial state was as I expected:
// start with 200 files
listInput := &s3.ListObjectsV2Input{Bucket: aws.String(inputBucket)}
files, err := svc.ListObjectsV2(listInput)
require.NoError(t, err)
require.Len(t, files.Contents, 200)
Then, I needed to actually run the code against this S3 service:
conf := &gompress.Config{
	Src: &gompress.S3Locaction{
		Region: region,
		Bucket: inputBucket,
		Prefix: "a-",
	},
	Dst: &gompress.S3Locaction{
		Region: region,
		Bucket: outputBucket,
		Prefix: "new-dir/",
	},
	KeepOriginal: false, // remove original files from s3
	Endpoint:     s3Endpoint,
}
require.NoError(t, os.Setenv("AWS_ACCESS_KEY_ID", "foo"))
require.NoError(t, os.Setenv("AWS_SECRET_ACCESS_KEY", "bar"))
require.NoError(t, gompress.Run(conf))
Note the AWS credentials environment variables: the original code uses these credentials to connect, but in tests they don't have to be anywhere close to real credentials.
You can see the rest of the code in the test file, but there is nothing new there: I use the AWS SDK for Go to list and read files from S3 (running locally) to verify that the code did what I expected.
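As a rough sketch, the final assertions might look like this (the exact checks, including whether gompress appends a .gz suffix, live in the real test file; the counts follow from the config above):
// the 100 "a-" files were compressed into the output bucket
// under the "new-dir/" prefix
listOutput := &s3.ListObjectsV2Input{Bucket: aws.String(outputBucket)}
outFiles, err := svc.ListObjectsV2(listOutput)
require.NoError(t, err)
require.Len(t, outFiles.Contents, 100)

// with KeepOriginal set to false, the originals were removed,
// so only the 100 "b-" files remain in the input bucket
inFiles, err := svc.ListObjectsV2(listInput)
require.NoError(t, err)
require.Len(t, inFiles.Contents, 100)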
Summary
I didn't add any tests to a program I wrote for one-time use because I couldn't find a way to do so that felt good enough. Later, gnomock made it possible: it was very easy to spin up an S3 service locally, directly from my test code in Go, set up its initial state with a single line of code, and run the tests against a real S3 service without any mocks.