My past few posts have been about things I learned while running services at AWS. These next few will be a bit different. I wanted to provide a kind of blueprint for building scalable agile software organizations. My experience at AWS was still impactful in this journey, but more as a proof point for its efficacy at massive scale.
In this first post we focus on organizational structure. Most teams start out not having to worry about organizational design or scaling. Early on in a new product, service, or company there is a single engineering team. This team is naturally aligned to a vision and purpose. They don’t really have to think about ownership since that one team owns the entirety of the vision. The team and its members are naturally incentivized to do whatever it takes to be successful.
That model works up to a point. Amazon and others often talk about the ‘2-pizza’ rule, meaning that a team should be no larger than can be fed with two pizzas. Different people will have different opinions on what that size threshold is. In my experience, the optimal size is probably in the range of 7–8, and teams definitely start to become unwieldy in the 10–12 person range. Startups can probably push this even a bit further, but at some point an org design discussion is unavoidable.
It’s this initial transition from one team to two or more teams that is often the most difficult to get right. Done well, the principles that guide team structure can be replicated ad infinitum as the organization continues to grow ever larger. Hence the fractal image I started this post with. We want to design teams to be this fractal unit of organizational scaling — infinitely composable or decomposable.
Arguably, ownership is the most important aspect of this blueprint for scalable organizations. In fact, in my post about Amazon culture I described ownership as what “may be the most defining aspect of how organizations at Amazon are structured and operate”.
Consider that ownership leads to several desirable outcomes. First, ownership leads to team identity. Teams will naturally tend to associate with their ownership scope. They create a vision for a desired future state and how it will help meet customer needs. Second, it drives accountability. With clear ownership scope, inputs and outputs can be clearly defined. This can be used as a management mechanism to measure and improve performance. Finally, implemented correctly, it drives this fractal unit of scaling for the organization. Each team operates in relative isolation within their ownership scope but is still feeding into the broader product/organization.
It’s too simple to stop at this point in the definition of ownership, though, because how ownership scope is defined matters a great deal in a software organization. Specifically, I have found that the most effective ownership scope for a software team is full life cycle ownership of software components and/or services. This means that a team owns every aspect of the following, with the caveat that many of these items are often jointly owned with Product Management:
- Planning — the team is an active advocate during planning for how their area should evolve over time. The engineering team is especially responsible for prioritizing architectural improvements, scaling work, and operational improvements. Product management is especially responsible for prioritizing customer and business needs.
- Architecture/design — the team owns their own architecture/designs. Artifacts are produced to a quality standard and used to communicate and justify decisions that were made.
- Implementation — the team owns their own code. Coding is performed to the quality standards for the organization and always peer reviewed. Test-driven development is practiced (see next).
- Quality — the team owns their own quality. All types of testing (unit, integration, canary, performance, etc.) are produced by the team.
- Operations — the team owns their own operations. The team deploys their own software, monitors their software in production, fixes defects in their software, and is paged when something breaks.
Let’s explore why this notion of ownership is so effective.
First, this structure incentivizes a full value stream capable of customer value delivery. Customers can’t practically use software unless it is actually in production and operating with some reasonably high level of quality. If there is a break in ownership at any point prior to the final step listed above, the organization is relying on one or more handoffs in the value stream. Any handoff outside of the team adds significant friction. It demands additional rigor on the interface between steps to make the handoff effective, and it requires perfect alignment of resources/capacity to ensure that the downstream team is always ready to take work from the upstream one.
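To make the handoff idea concrete, here is a minimal sketch that counts ownership changes across the five life cycle phases listed earlier. The phase keys, the function name `count_handoffs`, and the team names are all hypothetical and exist only for illustration.

```python
from typing import Dict, List

# Life cycle phases, in value-stream order, as listed in this post.
PHASES: List[str] = [
    "planning",
    "architecture",
    "implementation",
    "quality",
    "operations",
]


def count_handoffs(owners: Dict[str, str]) -> int:
    """Count how often ownership changes between consecutive phases.

    `owners` maps each phase name to the team that owns it. A result
    of 0 means one team owns the full life cycle.
    """
    missing = [phase for phase in PHASES if phase not in owners]
    if missing:
        raise ValueError("no owner recorded for: " + ", ".join(missing))
    sequence = [owners[phase] for phase in PHASES]
    # Each adjacent pair with different owners is one handoff.
    return sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)


# Hypothetical examples: one team end to end vs. centralized QA and ops.
full_ownership = {phase: "payments" for phase in PHASES}
split_ownership = dict(full_ownership, quality="central-qa", operations="sre")

print(count_handoffs(full_ownership))   # 0
print(count_handoffs(split_ownership))  # 2
```

A count above zero does not say how costly each handoff is; it only flags where the value stream leaves the team.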
Note that saying the structure incentivizes customer value delivery does not imply that every team must be a full stack team capable of front-end and back-end changes. It is quite common to have full life cycle ownership of a purely backend (or purely frontend) component; the customer value is simply realized in conjunction with other components that deliver independently in a service-oriented architecture.
Second, full life cycle ownership tends to be self-correcting due to the inherent feedback loop. For example, consider what happens if a team doesn’t put proper diligence into the quality of their software. Higher defect arrivals later mean the team will have less time to build new features. This turns into a natural incentive for engineers to fix the underlying conditions (because we all know that engineers really want to build new features :)!).
Finally, this ownership structure encourages fungibility on the team. Because each team is directly incentivized for customer value delivery, they will naturally adjust at each phase to realize this. If there is more quality work or operations work in a particular sprint or sprints, they will simply do that work without regard to specific titles and roles on the team. Contrast this with an organization design that relies more on centralized teams for functions such as quality engineering. While not always the case, it is only natural that teams would accept a bottleneck at a specific phase if there is a designated centralized team responsible for that work.
Other Essential Elements
Other than ownership, there are a few other factors that are important elements of the org design blueprint.
Most teams and engineers perform best when there is a compelling purpose and vision for what they build and own. Engineers want to be able to project themselves into that future state and see how they relate to achieving that vision. This can be a particular challenge when there are software components that are on a deprecation path but still need ownership. In these situations, it’s often best if the same team owns building some portion of the future architecture while they maintain the existing legacy components.
We already discussed team size and how there is a threshold somewhere around 10 people where it is definitely time to decide how to factor out an area of ownership. Similarly, though, there is such a thing as teams that are too small. Even if there is a full life cycle ownership scope that makes sense for a small team, having small teams leads to fragility in numerous areas. Four-person teams are reasonable as a minimum and 5–6 is better in practice. If you initially scope ownership with multiple small teams, simply combine a few of them into a team that owns multiple different services/components.
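As a rough sketch, the size guidance in this post can be written as a small check. The cut-offs come from the numbers above (four as a minimum, 7–8 as roughly optimal, around 10 as the point to split); the function name and the wording of each label are my own assumptions, and real decisions will depend on more than headcount.

```python
def assess_team_size(headcount: int) -> str:
    """Classify a team's headcount against the rough thresholds in this post."""
    if headcount < 1:
        raise ValueError("headcount must be positive")
    if headcount < 4:
        # Below the suggested minimum of four: fragile.
        return "too small: consider merging with another small team"
    if headcount <= 8:
        # Four is workable, 5-6 is better, 7-8 is roughly optimal.
        return "healthy"
    if headcount < 10:
        return "large: start planning how to factor out ownership"
    # Around 10 or more, the team is becoming unwieldy.
    return "unwieldy: split along an ownership boundary"


for size in (3, 6, 9, 12):
    print(size, "->", assess_team_size(size))
```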
Coupling refers to the degree to which features tend to require effort from multiple teams. No matter what the team structure is, there will always be some features that require effort from multiple teams. Consideration should be given to how frequently this occurs. If it’s too often, consider refactoring team ownership to support more independent delivery of customer value. Another form of coupling can occur when the underlying software causes dependencies between teams. For example, when multiple teams rely on a single shared deployment mechanism, it’s important to invest in the underlying software architecture and toolchain to enable each team to independently ship software into production.
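One hedged way to put a number on how often features span teams is to compute the share of recent features that needed more than one team. The function name `cross_team_ratio` and the sample data below are hypothetical, and the post does not prescribe a threshold for "too often", so none is built in.

```python
from typing import Iterable, Set


def cross_team_ratio(features: Iterable[Set[str]]) -> float:
    """Return the fraction of features that required more than one team.

    Each element of `features` is the set of team names that
    contributed work to one shipped feature.
    """
    feature_list = list(features)
    if not feature_list:
        return 0.0
    multi_team = sum(1 for teams in feature_list if len(teams) > 1)
    return multi_team / len(feature_list)


# Hypothetical record of four recent features and the teams involved.
recent_features = [
    {"payments"},
    {"payments", "platform"},
    {"quotes"},
    {"quotes", "payments", "platform"},
]

print(cross_team_ratio(recent_features))  # 0.5
```

Tracking this ratio over a few planning cycles gives a trend to discuss, rather than relying on a single snapshot.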
Having laid out the essential elements for our org design blueprint, there are a few aspects related to implementation that are worth discussion.
First, there is at least one potential downside to this model of ownership. Because it biases for team ownership over centralized ownership, each team is expected to adopt and uphold the organizational standards at each phase. This can be a particular challenge for newer and/or more junior teams. This is particularly relevant in areas like quality and operations as gaps in these areas are not always immediately observable. The downside risk is best mitigated with review processes at important gates, especially a robust operational readiness review before new software goes into production.
Next, consider the question of centralized teams and when they are appropriate in this design. The full life cycle model of ownership rules out some of the more common scenarios with centralized quality or operations/SRE teams. The better question to consider is whether there is a centralized ownership structure that makes sense. For example, we decided at Coalition that we wanted a team to own our AWS cloud platform along with other development tools so that infrastructure best practices could be leveraged by all other teams. This platform team adopts the same ownership mindset as any other dev team (delivering the services, ensuring quality, monitoring in production, etc.).
Finally, it’s important to approach team structure as something that is flexible but slowly changing. Ultimately, team structure and resourcing need to match the needs of the business. Knowing these needs are ever-changing, it may be tempting to try to shuffle resources during each planning cycle. When we change team structure or composition, though, we affect each team’s affinity to its identity, vision, and purpose. There is no right answer on this topic; leaders just need to strive for balance between keeping teams and team structure intact and adjusting to meet the needs of the business at any given moment.
This post is particularly timely for me in the context of my new role as Head of Engineering at Coalition. The team at Coalition has been through a period of hyper-growth (for reference, this post covers the tailwinds around Coalition and our mission to solve cyber risk). Over the past 9–12 months they rocketed all the way from a single team to an organization of close to 30 engineers. Needless to say, this journey was challenging. Almost universally in my early talks with members across the team, they expressed a need for better ‘structure’. Much of the above is based on a document that I wrote together with senior leaders as an articulation of how we think about team structure at Coalition Engineering. We are now in the process of rolling out a new structure aligned with this blueprint that will enable us to scale well into the future.
I’m certain that Coalition Engineering is not alone in this journey and challenge with scaling. My recommendation above all is to recognize the challenge of one-team to many-team scaling in software engineering and approach it purposefully. The model I presented here is one that has worked for me. Regardless of how much of this approach you adopt, write down the philosophy and tenets that will shape your own org design now and into the future!
Note: This post was originally published at Medium.