We use automated CI to ensure that each release of Stardog is as reliable as possible. In this post we look at some of the issues.
Automated test and release pipelines are a crucial part of software development. At Stardog we follow best practices for software development and testing. We release a new version of Stardog every month so it is important to have a seamless continuous integration (CI) environment that supports a fully-automated test and release pipeline.
Running our fastest tests with each pull request, slower tests nightly, and additional safety checks during a release helps us find bugs quickly and fix them before they’re released. Our CI environment is also designed to improve developer productivity by sparing individual developers the time it takes to set up and monitor long-running or difficult-to-configure tests.
However, as any developer can tell you, there are a number of complications with automated test and release pipelines.
Flakiness: The Death Knell of CI
Teams set out with the best intentions when setting up CI. Goals that seem reasonable at first, such as running through all tests for each commit or creating and publishing a build artifact nightly, quickly turn into headaches as the project evolves and tests become more complex.
The deterioration of CI isn’t always evident at first. One or two tests that seem fine when they are merged can turn out to be flaky later. Tests may change the environment in subtle ways that impact other tests in the same run or, worse yet, in future runs.
Most of the time a few flaky tests or quirks in the test environment won’t cause problems. Developers can still make progress with minimal interruptions and releases still ship on time. However, as the size of the test suite grows, the time required to run the tests also increases, impacting the development cycle. Flaky tests and broken test environments can slowly overtake the time and focus of a development team. Developer productivity drops and releases start slipping because one failure in the pipeline derails the whole run. These challenges can turn the dream of a fully automated test and release pipeline into an ongoing maintenance nightmare and time sink for development teams.
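To make the idea of a flaky test concrete, here is a minimal Python sketch of one way to spot them: rerun a test several times and compare the outcomes. It is an illustration only, not a description of Stardog's tooling. The names `classify` and `make_leaky_test` are hypothetical, and the "leaky" test simulates a test that changes shared state in a way that affects later runs.

```python
from typing import Callable


def classify(test: Callable[[], None], runs: int = 5) -> str:
    """Run `test` several times and report 'pass', 'fail', or 'flaky'."""
    outcomes = set()
    for _ in range(runs):
        try:
            test()
            outcomes.add(True)
        except AssertionError:
            outcomes.add(False)
    if outcomes == {True}:
        return "pass"
    if outcomes == {False}:
        return "fail"
    # A mix of passes and failures for the same code means the test is flaky.
    return "flaky"


def make_leaky_test() -> Callable[[], None]:
    """Build a hypothetical test that mutates state shared across runs."""
    state = {"runs": 0}

    def leaky_test() -> None:
        # Each run leaves the environment slightly different for the next one.
        state["runs"] += 1
        assert state["runs"] < 3, "environment polluted by earlier runs"

    return leaky_test


if __name__ == "__main__":
    print(classify(make_leaky_test()))
```

The leaky test passes on its first two runs and fails afterwards, which is the pattern described above: it looks fine when merged and only shows up as a problem later.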
Testing at Stardog
At Stardog we have tens of thousands of tests that we run against our development and release branches many times per day. Automatically running as many tests as possible as soon as we can against new commits increases the likelihood of uncovering bugs, flaky tests, or broken test environments. Developers can quickly see the impact of their changes and work to fix any bugs that are introduced.
The requirements for our test groups vary widely, from unit tests that can be run locally to complex Docker setups that inject network and system failures into a Stardog Cluster.
Our tests are broadly grouped into the following categories:
- Unit and integration tests
- Cluster-specific tests
- Virtual graph-specific tests
- Benchmarking and performance tests
- Chaos tests
- Release tests
- Final release checks and smoke tests
The unit and integration tests are run against every pull request (PR) while most of the rest of the tests are run nightly. Unit, integration, and cluster tests can be run locally by developers or in a small Docker container in CI. The virtual graph (VG) tests have external dependencies; in particular, they rely on third-party databases such as MySQL. The benchmark and performance tests require substantial resources that closely match a production deployment of Stardog, capable of storing hundreds of millions of nodes and edges. The chaos tests run in a handful of Docker containers with each Stardog cluster member and ZooKeeper server in separate containers. Blockade, an open source tool for chaos testing, then injects network and system failures (network partitions, high latency, etc.) between the containers as the tests perform various Stardog actions. Release tests consist of common use cases that serve as a final check before a release artifact is made public.
The sheer number of tests, combined with complex test environments and regular releases, underscores our need to keep flakiness and complexity at bay in order to keep CI running smoothly.
We’ve opted for a two-pronged approach to CI, relying on both a hosted CI and our own Jenkins deployment, to handle our diverse test requirements, leveraging the benefits of each where it makes sense.
Unit and integration tests are run against every PR (and change to a PR) using CircleCI. We use Circle for tests on PRs primarily because of its simplicity. Circle has seamless integration with GitHub and provides a simple interface for debugging test failures. This works well for most developer workflows, allowing developers to stay focused on their task and receive quick feedback on their changes.
Unfortunately, hosted CI services come with their own set of limitations, in particular:
- Lack of customizability: tests on Circle run in Docker containers, which are customizable up to a point. However, our chaos tests run in multiple containers and inject system failures between them (which requires access to host networking, something Circle shouldn’t allow). The performance and benchmark tests need standalone hosts with substantial resources.
- Time limits: Circle imposes a two-hour time limit on jobs. This is sufficient for our unit and integration tests, but both the chaos and benchmark tests run for 12 hours.
- Security: our release pipeline pushes to Artifactory, our download server, and makes Stardog artifacts publicly available. It also commits back to GitHub. This level of access is something we prefer to keep locked down more than hosted CI services allow.
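The three limitations above amount to a routing rule, which can be sketched in a few lines of Python. This is a hedged illustration rather than our actual configuration. `JobSpec`, `choose_runner`, and the field names are hypothetical, and the one-hour durations are placeholders. Only the two-hour limit and the 12-hour chaos and benchmark runs come from the text. The sketch models just these three limitations, not every reason a job might run on Jenkins.

```python
from dataclasses import dataclass

# From the text: the hosted CI imposes a two-hour limit on jobs.
HOSTED_TIME_LIMIT_HOURS = 2.0


@dataclass(frozen=True)
class JobSpec:
    """Hypothetical description of what a test group needs from CI."""
    name: str
    expected_hours: float
    needs_host_networking: bool = False
    needs_dedicated_host: bool = False
    needs_release_credentials: bool = False


def choose_runner(job: JobSpec) -> str:
    """Send a job to the hosted CI unless it hits one of its limitations."""
    needs_self_hosted = (
        job.expected_hours > HOSTED_TIME_LIMIT_HOURS
        or job.needs_host_networking
        or job.needs_dedicated_host
        or job.needs_release_credentials
    )
    return "jenkins" if needs_self_hosted else "circleci"


JOBS = [
    # Durations of 1.0 are placeholders, not measured values.
    JobSpec("unit-and-integration", expected_hours=1.0),
    JobSpec("chaos", expected_hours=12.0, needs_host_networking=True),
    JobSpec("benchmark", expected_hours=12.0, needs_dedicated_host=True),
    JobSpec("release", expected_hours=1.0, needs_release_credentials=True),
]

if __name__ == "__main__":
    for job in JOBS:
        print(f"{job.name}: {choose_runner(job)}")
```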
The remainder of our test groups run on Jenkins. We have a single Jenkins master (with a significant amount of RAM) and run all jobs in Docker containers on that master. Tests that need additional resources provision nodes on EC2 using Terraform for single-node Stardog deploys or Graviton for clusters. This way, Jenkins can run our chaos and benchmark tests nightly, provisioning the appropriate resources on EC2 and cleaning up when the jobs complete.
Developers can also run any of the test groups on Jenkins by pushing a branch up to GitHub and selecting the groups they want to run. We’ve found that keeping jobs contained to Docker on Jenkins and Terraform or Graviton deployments on EC2 minimizes the operational overhead of Jenkins. Each job is self-contained and can easily be cleaned after success or failure by simply destroying the Terraform deployment and shutting down the Docker container.
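The "clean up after success or failure" property can be sketched with a Python context manager that always destroys what it provisioned. This is a minimal sketch under assumptions, not our Jenkins job definition. The `provisioned` helper, the injected `run` callable, and the `deploy/single-node` directory are hypothetical. The Terraform invocations use the standard `-chdir` and `-auto-approve` options.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List

# A runner executes one command; injecting it keeps the sketch testable.
Runner = Callable[[List[str]], None]


@contextmanager
def provisioned(run: Runner, workdir: str) -> Iterator[None]:
    """Provision a Terraform deployment and destroy it no matter what happens."""
    base = ["terraform", f"-chdir={workdir}"]
    try:
        # Apply inside the try block so a partial apply is also torn down.
        run(base + ["apply", "-auto-approve"])
        yield
    finally:
        run(base + ["destroy", "-auto-approve"])


if __name__ == "__main__":
    recorded: List[List[str]] = []
    try:
        # "deploy/single-node" is a hypothetical directory name.
        with provisioned(recorded.append, "deploy/single-node"):
            raise RuntimeError("simulated test failure")
    except RuntimeError:
        pass
    for command in recorded:
        print(" ".join(command))
```

In a real job the runner could be something like `lambda cmd: subprocess.run(cmd, check=True)`. Here it only records the commands, which shows that the destroy step still runs when the tests fail.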
The release pipeline is also configured on Jenkins and allows us to run through all tests whenever we need to cut a snapshot or a release. Our release process consists of two phases:
- Build and test a release artifact, pushing to an internal location
- Promote an internal artifact either to a custom location for private snapshots or a publicly available location for GA releases
Splitting the release into two phases allows us to manually verify the artifact (if desired) before making it available to users. It also prevents a developer from accidentally building and releasing an artifact that was meant to be internal since the promotion phase is completely separate.
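The separation between the two phases can be illustrated with a small Python sketch in which building only ever produces an internal artifact and promotion is a distinct, explicit step. All names here (`Artifact`, `Target`, `build_and_test`, `promote`) and the location strings are hypothetical stand-ins, not our real pipeline or repository layout.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Callable


class Target(Enum):
    """Hypothetical promotion destinations."""
    PRIVATE_SNAPSHOT = "private-snapshots"
    PUBLIC_GA = "public-releases"


@dataclass(frozen=True)
class Artifact:
    version: str
    location: str
    tests_passed: bool


def build_and_test(version: str, run_tests: Callable[[], bool]) -> Artifact:
    """Phase 1: build, test, and push to an internal location only."""
    return Artifact(version, f"internal/{version}", run_tests())


def promote(artifact: Artifact, target: Target) -> Artifact:
    """Phase 2: copy an internal, tested artifact to its final location."""
    if not artifact.location.startswith("internal/"):
        raise ValueError("only internal artifacts can be promoted")
    if not artifact.tests_passed:
        raise ValueError("refusing to promote an artifact whose tests failed")
    return replace(artifact, location=f"{target.value}/{artifact.version}")


if __name__ == "__main__":
    candidate = build_and_test("1.2.3", run_tests=lambda: True)
    print(candidate.location)
    print(promote(candidate, Target.PUBLIC_GA).location)
```

Because `build_and_test` has no way to write to a public location, nothing becomes visible to users until someone deliberately calls `promote`.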
At Stardog our entire development team has a test-focused mentality to ensure that CI keeps running smoothly and that we ship quality code every month.
Every new feature is thoroughly tested before it’s released; bugs are found, fixed, and stay fixed. Our extensive test suite and CI environment help keep our focus on delivering the world’s leading Knowledge Graph platform for our users.