The Case for Manual Testing in an Automated World

Software is full of ideas that contradict one another. In one thread, people may insist that the DRY principle – “Don’t Repeat Yourself” – should be adhered to without compromise. In others, people draw a line around the contexts where that principle applies, and push strongly for duplication in tests, and for not sharing code between services or even between bounded contexts of a modular application. Testing culture, similarly, has many people believing that everything should be automated and that manual testing is dead. Here we’ll discuss the opposing idea: that manual testing still has a place in engineering culture, and look at how and why it does, and where it fits into automated testing practices.

Types of Testing

Let’s highlight a few different layers and types of testing – while there are many types of tests, we’ll focus on only a few:

  • Unit Testing: Automated testing of small bits of code. Can also include automated tests within a module or service that test the integration between modules and components in the code (e.g. a slightly higher-level test, but still isolated from other services). True unit tests have no external database or reliance on external services. As such, if built well, they are extremely fast to run and have the benefit of acting as up-to-date documentation of the code, since they are coupled to its behaviour.
  • Integration Testing: Combines modules or services to test subsets of a system. It doesn’t need to run against a full environment, so it can still be inexpensive to run, driven from a module built to combine the modules/services under test.
  • System Testing: Automated testing that spans multiple modules or services, checking that they behave as expected when hooked up. Generally, system testing is run from “outside” any particular service against a production-like environment. The tests make requests against the public or internal APIs of the application to ensure the system as a whole, or subsets of its components, behaves as expected. They are slow to run and expensive to maintain relative to unit tests.
  • Manual Testing: Similar to System Testing, manual testing exercises the entire system, but differs in that it is executed by people using the UI (or APIs as needed) to validate the system and its behaviour.
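To make the cost difference concrete, here is a minimal sketch of the cheapest layer – a unit test in ExUnit (the Elixir test framework used later in this article). The Price module is hypothetical, invented for the example:

```elixir
# A hypothetical pure function under test: no database, no external
# services, so the test runs in milliseconds and doubles as
# documentation of the behaviour.
defmodule Price do
  def with_tax(cents, rate), do: round(cents * (1 + rate))
end

defmodule PriceTest do
  use ExUnit.Case, async: true

  test "applies tax to a price in cents" do
    assert Price.with_tax(1000, 0.13) == 1130
  end
end
```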

Manual Versus Automated Testing

Manual testing is often executed by dedicated QA teams in older organizations. Many modern technology organizations do not have explicit QA teams, but may have engineering roles that aid in automated testing (the Software Engineer in Test role, for example, which joins teams to improve tooling and processes around automated testing). The trend is to replace manual testing with automated approaches, and to build teams that support those goals. The major heuristic is that manual execution is wasteful because it provides no lasting value, so automating all tests is logically better: testing then produces an artifact that continues to verify the behaviour forevermore.

There is some danger with automated testing, however. Software changes, so tests carry a maintenance overhead; if a test doesn’t cover important behaviour, is hard to maintain, and breaks frequently, its cost may outweigh the value it provides over its life. It’s worse still if the tests take a very long time to run.

I would suggest, then, that manual testing is still required to discover issues with a change; that unit testing should be preferred over system testing; and that system tests should be reserved for ensuring the behaviour of the system as a whole is correct, with additions to the system test suite chosen with clear intention.

Manual testing can, in principle, be entirely replaced by automated testing, but if that is done indiscriminately, the cost of running and maintaining the automated suite may become too high. Likewise, if every test script ever created must be executed by humans, the cost will be far too high as a system grows in complexity and scope. I would argue that manual testing should be considered a part of the development cycle distinct from system testing. A human is better at “experimenting” than a script and can find blind spots that automated system testing cannot. As bugs are discovered by a human and then fixed, the fixes can generally include the faster, cheaper, more maintainable unit tests, rather than building system tests for each particular condition.


System testing isn’t intended to cover a particular module or service’s behaviour, but only the behaviour where components touch. System and integration tests should be concerned only with behaviours that a unit test cannot cover, while unit tests should cover the specifics of a module/service that would be extraneous in system tests. Manual testing should be part of the development lifecycle, with identified and fixed issues turned into unit tests wherever possible, and system testing reserved for what unit tests cannot cover – especially confirming that components of the system are talking to one another correctly.


How to Test

I was talking to my very good friend about testing today and wanted to throw together a list of the heuristics that I’ve “discovered” – nuggets that were, personally, very difficult to uncover over my career. Dan North, in his article “Introducing BDD,” describes a story arc in his career that sounds very much like my own:

The deeper I got into TDD, the more I felt that my own journey had been less of a wax-on, wax-off process of gradual mastery than a series of blind alleys. I remember thinking “If only someone had told me that!” far more often than I thought “Wow, a door has opened.” I decided it must be possible to present TDD in a way that gets straight to the good stuff and avoids all the pitfalls.

Example Code

Before we begin, I’m going to use a mutable Stack implementation in Scala as an example of code that should be easy enough to read and understand, regardless of the languages you’re familiar with.

Please note that I did not use a pure functional implementation with the intention of the code being more easily read by people coming from an imperative background!

A stack is a “last-in-first-out” data structure that lets you place an item on the top of the stack (“push”) or retrieve and remove the item on the top of the stack (“pop”). There is generally a “peek” operation as well, which retrieves the item on the top of the stack without removing it.


You can find an example of a Java Stack here if you’d like further implementation details but it’s not necessary for comprehending the content in this article.

Naming Tests is Describing Behaviour

I’ve seen a lot of code in different environments in my career. One of the patterns I’ve seen in less experienced development teams is code like the Java JUnit example below.

public class StackTests {
    public void testPeek() { ... }
}

The test name seems reasonable at first glance – it’s clear from the suite name that the Stack is the class under test, and we’re trying to test the Stack’s “peek” method in the testPeek method.

Well, the problem is that we know what noun we’re testing, but we don’t know which specific behaviour is being exercised. Is it when it’s empty? Is it when it’s loaded with data? Should it do something different in these two cases?

One of the qualities of good tests is that they act as documentation to someone trying to read and understand the code. By describing the behaviour we are trying to test, we are also creating documentation that is forced to stay up to date with the code.

A good heuristic is to start a test name with “it should.” This forces the writer of the tests to focus on the behaviour instead of the noun. And this is the foundation of Behaviour Driven Development (BDD) – a focus on behaviour!

Let’s have another look at that JUnit test applying this practice.

public class StackTests {

    public void itShouldReturnNothingWhenPeekingEmptyStack() { ... }
}

Most modern testing tools have been influenced by BDD so they will generally try to guide you toward describing behaviour. You can see in our JUnit example above that the test name is very long. Modern tools will help you organize your code by allowing you to describe different scenarios, and then describe the behaviour. Below is an example from ScalaTest which gives several different semantic options for describing your tests.

class StackSpec extends FlatSpec with Matchers {
  "An empty Stack" should "return None when peeking" in { ... }
  it should "return None when popping" in { ... }
}
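ExUnit, which appears later in this article, encourages the same organization through describe blocks. A sketch of the equivalent spec, using a hypothetical list-backed stack (in Elixir we return nil rather than None):

```elixir
defmodule Stack do
  # Minimal list-backed stack, just for the example.
  def peek([]), do: nil
  def peek([top | _rest]), do: top
  def pop([]), do: nil
  def pop([top | rest]), do: {top, rest}
end

defmodule StackTest do
  use ExUnit.Case, async: true

  describe "an empty Stack" do
    test "returns nil when peeking" do
      assert Stack.peek([]) == nil
    end

    test "returns nil when popping" do
      assert Stack.pop([]) == nil
    end
  end
end
```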

How to TDD

Knowing the word TDD and understanding it in practice are two different things. TDD does not simply mean writing tests with your code, nor does it mean writing all of the tests before the code is written, rather, it is a practice of allowing the writing of code to be guided by the addition of tests, one at a time. The best way to highlight this is to demonstrate the addition of tests and code for our stack implementation.

The heuristic is: write the most general test that you can, and do the minimal work to implement the behaviours described so far in a test. This might mean hard coding a return value or stubbing features that aren’t yet under test as you go.

For example, let’s say we start with the test described above again:

class StackSpec extends FlatSpec with Matchers {
  "An empty Stack" should "return None when peeking" in { ... }
}

The simplest thing to do here is to just hardcode the return value.

class Stack {
  def peek = { None }
}

Now, this obviously isn’t correct, but we chose the most general test, and then added the behaviour, and the test now passes. We can now add another test, and write the code to make it pass.

class StackSpec extends FlatSpec with Matchers {
  "An empty Stack" should "return None when peeking" in {
    val stack = new Stack(List())
    assert(stack.peek == None)
  }

  "A non-empty Stack" should "peek last value" in {
    val stack = new Stack(List(1, 2, 3))
    stack.peek should equal(Some(3))
  }
}

class Stack(initialData: List[Int] = List.empty[Int]) {
  private var data: List[Int] = initialData.reverse
  def peek = {
    if (data.nonEmpty) Some(data.head) else None
  }
}

You can continue to add behaviour in this manner, and just continue to change the code to make the tests pass. Adding our first push behaviour, for example:

class StackSpec extends FlatSpec with Matchers {
  "An empty Stack" should "return None when peeking" in { ... }
  "A non-empty Stack" should "peek last value" in { ... }

  it should "push a value to the top of the stack" in {
    val stack = new Stack(List(1, 2, 3))
    stack.push(4)
    stack.peek should equal(Some(4))
  }
}

class Stack(initialData: List[Int] = List.empty[Int]) {
  var data: List[Int] = initialData.reverse
  def peek(): Option[Int] = {
    if (data.nonEmpty) Some(data.head) else None
  }
  def push(newValue: Int): Unit = {
    data = newValue :: data
  }
}

And so on, like this. There are two scenarios where TDD is very good:

  • finding and fixing bugs: add tests for the behaviour to show it’s broken and to cover the expected behaviour (since there is obviously no test for it yet!). Then you have a repeatable mechanism for demonstrating the bug, so you can easily do analysis to find the root cause.
  • building new features: this is really where TDD shines – adding new features is a joy with TDD.

Ping Pong – TDD With Other People

This is also a very fun thing to do with another person! The game of “Ping Pong” translates to TDD very well. With yourself and another person, you separate roles of test writer and implementor and you both take turns in a round of writing the behaviour and then writing the next test. So the activities above would be done like so:

  1. Person A:
    1. Writes the test: “An empty Stack” should “return None when peeking”
  2. Person B:
    1. Implements the minimum code to make the test suite pass.
    2. Writes the test: “A non-empty Stack” should “peek last value”
  3. Person A:
    1. Implements the minimum code to make the test suite pass.
    2. Writes the test: “A non-empty Stack” should “push a value to the top of the stack”

You continue like this until the feature is done. It’s really a lot of fun if you both understand the “rules.”

  • When writing tests: Write the most general test to the most specific.
  • When writing code: Implement the smallest amount of code to get the tests passing.

It’s not screw your neighbour per se… But if you can find a “creative” way to make the tests pass by writing less code on your turn… the option is there!

At some point a test that is introduced may force the hand of the pair to write a lot of code before the next test – this is okay and you can still pair as you normally would through that period of pairing, trading off the driver/navigator roles as desired.

Test Bottom Up: From Unit to Integration

I’m starting a role at Elastic, and their Developer Constitution sounds like it could come from my own mouth. I deeply resonated with this article when I bumped into it; the heuristics laid out there come from sometimes painful experiences, so avoid those errors and follow the guiding principles. You’ll still get to learn from the mistakes – you just don’t have to make them yourself.

This section on testing is a good heuristic.

Test bottom up. If you write code, write unit tests first. Write many of them. Write code so you can write many of them. Integration testing is the last step. Focus on adding more tests that execute fast and are easy to debug, like unit tests. This is crucial for developer velocity.

So the idea is to unit test classes/modules in isolation, and then to test them integrated together, and integrated with whatever dependencies they have.




ExUnit: Testing Named Processes

Named processes can be a bit of a pain to test if you need to set up data before they start.

If you try to start a named process in a test while your application’s supervision tree has already started it, you will get an :already_started error.

There is a great pattern here that @jowensboggs from the Elixir slack channel tipped me off on.

The example here: when trying to merge commands from two timelines into a unified view, we want to test whether any commands are due from one of the queues.

You can define the start_link function in your GenServer so that it inspects the args passed and tries to extract the name, falling back to __MODULE__.

def start_link(opts) do
  {name, opts} = Keyword.pop(opts, :name, __MODULE__)
  init_args = opts
  GenServer.start_link(__MODULE__, init_args, name: name)
end

Then, in your test, you pass the name as a GenServer argument:

 {:ok, pid} = start_supervised({ShiftSchedule.Worker, name: :my_test})

Finally, in your public API, you would need to optionally accept a pid:

def commands_in_range(%DateTime{} = from, %DateTime{} = to, pid \\ ShiftSchedule.Worker) do, {:get_commands_in_range, from, to})
end

This lets you use __MODULE__ in the rest of your code base, while setting up any data needed for the process before starting it in your tests. Before this I was killing the processes so that the supervisor would restart it which is not good!
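Putting the three pieces together, a test might look like the sketch below. To keep it self-contained, ShiftSchedule.Worker is stubbed here with empty state; your real worker would hold real schedule data seeded before start:

```elixir
defmodule ShiftSchedule.Worker do
  # Stub mirroring the pattern above, so the sketch is runnable;
  # the real worker holds actual schedule state.
  use GenServer

  def start_link(opts) do
    {name, opts} = Keyword.pop(opts, :name, __MODULE__)
    GenServer.start_link(__MODULE__, opts, name: name)
  end

  def commands_in_range(%DateTime{} = from, %DateTime{} = to, pid \\ __MODULE__) do, {:get_commands_in_range, from, to})
  end

  @impl true
  def init(_opts), do: {:ok, []}

  @impl true
  def handle_call({:get_commands_in_range, _from_t, _to_t}, _from, state) do
    {:reply, state, state}
  end
end

defmodule ShiftSchedule.WorkerTest do
  use ExUnit.Case

  test "queries a uniquely named instance" do
    # A unique name avoids the :already_started clash with the
    # globally named process in the supervision tree.
    {:ok, pid} = start_supervised({ShiftSchedule.Worker, name: :my_test})

    from = ~U[2019-02-06 00:00:00Z]
    to = ~U[2019-02-07 00:00:00Z]
    assert ShiftSchedule.Worker.commands_in_range(from, to, pid) == []
  end
end
```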

Real World Elixir Umbrella Projects



This article is a quick collection of my notes and references for working with Elixir umbrella projects day to day. It first covers my usual workflow and tools, then moves into a more thorough discussion of the concepts and heuristics I use as guiding principles around what to place in an umbrella app, where to break things down, and how to interact with umbrella apps. I briefly discuss querying data for presentation, although that topic deserves its own post.

Umbrella Projects

Elixir umbrella projects are an excellent option for building and growing applications over time. They are a good middle ground between monolith and micro-services approaches, with less overhead than micro-services but with the benefit of the boundaries that micro-services enforce. An umbrella project is a single project containing multiple isolated apps (what might be called modules in other technology stacks). It can be difficult to understand exactly where the boundaries should sit around apps in an umbrella project, and exactly how big or small they should be.

Creating a new umbrella project can be done with mix like so:

mix new my_project --umbrella

Inside that project, you can then create a new app with a supervised process. Simply move into the apps folder and ask mix to do it for you like so:

cd apps; mix new folder_name --module ModuleName --sup

That will give you a basic supervised process in a folder called folder_name in a module called ModuleName. Obviously change those to suit your needs!

There is a sketch of a hello-world module that can be discarded or added to, but further inside is an application.ex file that holds the supervision setup and process description:

defmodule ModuleName.Application do
  # See
  # for more information on OTP Applications
  @moduledoc false

  use Application

  def start(_type, _args) do
    # List all child processes to be supervised
    children = [
      # Starts a worker by calling: ModuleName.Worker.start_link(arg)
      # {ModuleName.Worker, arg},

    # See
    # for other strategies and supported options
    opts = [strategy: :one_for_one, name: ModuleName.Supervisor]
    Supervisor.start_link(children, opts)

Now you can create a module with GenServer behaviour and add it to the children list to get up and running. When you run the project, the application.ex file for each app in the umbrella is invoked, and the processes are started and managed. They exist independently, yet can find and communicate with each other as needed through message passing. OTP and discussion of GenServer and actor-model implementations are outside the scope of this article; if you’re here, I assume you have at least a basic understanding of the technology.
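As a sketch of that step, here is a minimal GenServer child (the module name and initial state are illustrative) and the line that would register it in application.ex:

```elixir
defmodule ModuleName.Worker do
  use GenServer

  # Client API
  def start_link(initial) do
    GenServer.start_link(__MODULE__, initial, name: __MODULE__)
  end

  def current(pid \\ __MODULE__), do:, :current)

  # Server callbacks
  @impl true
  def init(initial), do: {:ok, initial}

  @impl true
  def handle_call(:current, _from, state), do: {:reply, state, state}
end

# In ModuleName.Application.start/2, add the worker to the
# supervised children:
#
#   children = [
#     {ModuleName.Worker, 0}
#   ]
```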

Working with Umbrella Apps

I’ll discuss this a little more shortly, but when I’m working in an umbrella app, I typically treat it entirely independently, running the tests inside that folder and opening it as a separate project in my editor (emacs). For me, opening the project means running a little snippet to produce a .projectile file in the root of each app’s folder:

cd apps; find . -maxdepth 1 -type d -exec touch {}/.projectile \;

From there, I can open an app as a stand-alone project so I only see its contents. Alchemist will run the tests for only that app. I can then switch between the apps and their files quickly and easily using a blend of buffer management and projectile.

Granularity: When to Break Things into Separate Apps

When designing an application, you’ll commonly encounter the question of how big or how small to make each app in an umbrella project. If you organized a system on post-it notes describing all the state changes (events) that can occur, you would likely find that they can be grouped around a few distinct topics. For example, an e-commerce site may have events such as ProductCreated, PriceChanged, and ProductDescriptionChanged, as well as CustomerAccountCreated and CustomerInformationUpdated, and maybe ProductAddedToCart and CartEmptied. If we logically grouped these, we’d see that there are a few distinct entities that the events relate to. In Domain Driven Design terminology, these entities are generally referred to as aggregate roots. A little box can be drawn around each of them, and they can stand alone, independently, without any coupling between them. In Elixir/OTP, it’s easy to see how we might send instances of commands and have them respond with state changes (EmptyCart -> CartEmptied). In general, these entities that our events exist around are the perfect place to draw a line and give each its own umbrella app. The little box we draw around these pieces is what DDD terminology calls a bounded context.

If you’re into Domain Driven Design, then you’ll have an intuition around what boundaries to draw around an aggregate root (a bounded context). If you have this level of thinking, then you likely know already exactly how big or small an umbrella app should be. I would suggest that, occasionally, it may make sense to have multiple bounded contexts within the same umbrella application if they are closely related, although it’s perfectly fine to make a rule that a bounded context always has its own umbrella application as well. In general, the actor model is an excellent fit for Domain Driven Design. If you’re struggling with where to break things down, I would strongly recommend investing time in reading Eric Evans’ or Vaughn Vernon’s works on the subject of DDD.

How to Work with Umbrella Apps and Bounded Contexts

Once you have some bounded contexts represented by different umbrella apps, you’ll likely not need to interact with many of them at the same time. Messages may be passed between them, so implementing a protocol between the contexts may require wiring from one to the other, but they will otherwise exist quite independently of one another. If they don’t, you may have drawn the boundary in the wrong place. If you have the right granularity, then you should only have to hold one bounded context in your head at a time.

Because these contexts exist so independently of one another, my preference when working on any one of them is to treat it as an entirely separate project in my editor. For me, this means only having one project open at a time using projectile in emacs. For you, that might mean opening each app folder as a separate project in Sublime or VSCode, but you can use the same approach.


What to Share Between Projects

Generally, we don’t share anything between projects, though I do like having a couple of foundational shared projects. No grab-bag utilities should be shared if at all possible – we have no common util project or anything like that. And if you want to be quite extreme about this, you may decide not to share any data between projects either, requiring that the aggregate root handle any requests for data.

We have chosen to share a db app between the apps so that different apps can query a table through that module. We are not deploying micro-services in our use case at FunnelCloud, but if we were, we would insist on no shared data.

So we have a db project that’s used to interface with Postgres. We keep basic Ecto models there, and then only very general queries. For example, to de-duplicate Kafka messages through restarts and deployments, we store the offset of the last processed message for the consumers within a bounded context. This offsets table is shared, so its model and queries live in this common Elixir module.

defmodule Db.Offset do
  use Ecto.Schema
  import Ecto.Query

  @primary_key {:id, :string, []}
  schema "offsets" do
    field(:value, :integer)

  def upsert(offset = %Db.Offset{}) do
    Db.Repo.insert(offset, on_conflict: :replace_all, conflict_target: :id)

  def get(id) do
    qry =
        o in Db.Offset,
        where: == ^id

    Db.Repo.one(qry)

We also have a protocol project that’s used to describe all commands and events. Because commands and events are shared between projects, it’s simpler to have the contract described and shared between them all, rather than sharing specific projects that might indirectly lead to inappropriate coupling.
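As a sketch of what such a protocol app might contain – the module and field names here are illustrative, not from our actual codebase:

```elixir
# Commands and events are plain structs, so a bounded context can
# depend on the shared protocol app without depending on any other
# context's internals.
defmodule Protocol.Commands.EmptyCart do
  defstruct [:cart_id]
end

defmodule Protocol.Events.CartEmptied do
  defstruct [:cart_id, :emptied_at]
end
```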

We try to limit any other sharing between apps as much as possible. Otherwise, if you need to interact with a bounded context, you do so only through message passing.

What Kind of Messages Do We Send?

Working with the actor model in a purely functional language is a bit of a different paradigm, and, from a high level design perspective, it tends to look more like Object Oriented design than it does Functional Programming. Of course, in the details of the implementation, it is functional.

In Object-Oriented programming, the heuristic of good design is to bring data and behaviour together so that you tell objects what to do. Interacting with objects causes them to interact with other objects, and to change their state.

val car = new Honda.Prelude()
val person = new Person.Programmer()
person.lookAtOdometer(car) // 0 km/h
person.lookAtOdometer(car) // 10 km/h

We never said “car.speed = 10.” We only told the objects what to do by issuing commands (getInCar, pushGasPedal). The objects do the rest of the work by responding to those commands, which can cause effects (state change). And the state changes that occurred could be described with events. The events, if they were emitted somewhere, might look like this:

class PersonGotInCar(person, car) extends Event
class CarAccelerated(car) extends Event

This approach of telling objects what to do by message passing is what Object Oriented programming was supposed to look like. Alan Kay, who coined the term Object Oriented, purportedly said:

“I invented the term object-oriented, and I can tell you that C++ wasn’t what I had in mind”. – Alan Kay, OOPSLA ’98

The underlying principle here is that we should tell objects what to do, not ask them about their state and change it from outside the object. Objects are not just data; they are the marriage of data and behaviour. The heuristic to remember is “Tell, Don’t Ask.”


Now, functional programming looks different. In functional programming, there are no objects (save for multi-paradigm languages like Scala). Data and behaviour exist separately, such that functions act on data. State changes don’t occur; instead, new instances of data are created by passing data into a function and having new data come out the other side:

def push_gas(car), do: %{car | speed: car.speed + 10}

old_car = %Car{speed: 0}
new_car = push_gas(old_car)
assert new_car.speed == 10

Purely functional applications are built by composing functions that accept and transform data without side effects or state changes. But real applications have state, and state changes over time. Enter processes/actors in Erlang/Elixir/OTP. Here we can marry the two paradigms: a process holds onto some data, waits for a message, passes the data and the message to a function, then holds the function’s output and waits for the next message. Messages can be passed to other processes that are likewise holding data and waiting to receive messages.

That “Tell, Don’t Ask” heuristic we discussed a few moments ago? It turns out to be the heuristic we want for our processes/actors in Elixir/OTP too. By sending commands to processes, we allow a process to encapsulate state and behaviour, and we can build loosely coupled modules by adhering to this principle. It takes a little getting used to, but it’s a great way to build systems.
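Revisiting the car example as a process, a minimal sketch of “Tell, Don’t Ask” in a GenServer might look like this (names are illustrative):

```elixir
defmodule Car do
  use GenServer

  def start_link(speed \\ 0), do: GenServer.start_link(__MODULE__, speed)

  # Commands: tell the process what to do...
  def push_gas(pid), do: GenServer.cast(pid, :push_gas)

  # ...and a query for the read side, rather than reaching in and
  # mutating the speed from outside the process.
  def read_speedometer(pid), do:, :speed)

  @impl true
  def init(speed), do: {:ok, speed}

  @impl true
  def handle_cast(:push_gas, speed), do: {:noreply, speed + 10}

  @impl true
  def handle_call(:speed, _from, speed), do: {:reply, speed, speed}
end

# {:ok, car} = Car.start_link()
# Car.read_speedometer(car)  # 0
# Car.push_gas(car)
# Car.read_speedometer(car)  # 10
```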

How to Read and Display Data?

(This is a difficult topic and there are many ways to handle this. I’ll quickly discuss my thinking, but there is no universal truth, only what works for you and your team with your knowledge and experience.)

If we need to be aware of the data inside of other processes for any reason, such as presentation, we can either choose to listen to events emitted from those processes (eg using something like Kafka to produce a queue) or we can otherwise have a read model somewhere that we can read from. For example, we could write the current state somewhere on every state change. Or we could directly query the process if we absolutely must. But separating the read concerns allows us to have a very succinct expression of the domain in the bounded context.

For a little more information on our approaches at FunnelCloud: we have event listeners set up in Rails to update the read model for pieces of the application (your usual event sourcing + CQRS architecture), and then the read model is displayed to the user. In other places, we’ve opted for a simpler approach where Rails treats the data written by Elixir as read-only data for presentation, while that same data is used as a recovery mechanism for the bounded context in Elixir. This approach is simpler than a pure event-sourcing implementation and works well for our use case without the overhead of maintaining a journal of events. Both approaches are fine – while we do use event sourcing in some areas, real-world experience has made me a bit cautious in choosing where to use it, as the journal needs to be maintained and migrated over time.