
Monorepos: The Architectural Swiss Army Knife You Might Need

The software development world is no stranger to trends, and one that's been steadily gaining traction is the monorepo. You've probably heard the term, perhaps whispered in hushed tones of awe or frustration. But what exactly is a monorepo, and why should you, as a forward-thinking organization, care?

[Image: monorepos are for the empire]

At its core, a monorepo is a single version control repository that holds the code for many different projects, libraries, and applications. Think of it as one giant codebase instead of many smaller, disparate ones. Companies like Google, Meta, and Microsoft have famously employed monorepos for years, and the tooling around them has matured significantly, making them accessible to teams of all sizes. You can adopt a single monorepo for the entire organization, or one monorepo per division or team.

When done well, a monorepo can solve some interesting and compelling problems in software development. A monorepo also costs more to operate than the alternative. Does the cost justify the benefits? That is for you to decide. The rest of this blog describes monorepos, their best practices, what you can expect from them, their pain points, and how to work around those pain points. Once you have read it, I hope you will feel informed enough to answer the question: is a monorepo right for you?

How is a monorepo different from a monolith? Both can build multiple artifacts (such as shared libraries), but a monolith produces a single deployable service that handles all of the functionality for an entire application, whereas a monorepo can build many independently deployable services.

Let's break down the key considerations when evaluating or implementing a monorepo.

The All-Important Pipeline

A monorepo lives or dies by its pipeline. With potentially vast amounts of code, efficient and intelligent automation is paramount.

Code Generation: Monorepos excel here. Your engineers define schemas (like Protobufs or OpenAPI specs). The build pipeline includes tools that take those schemas as input and generate client libraries, server stubs, and documentation across multiple projects, ensuring consistency.
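
As a rough illustration, here is a minimal Python sketch of a codegen step that walks a shared schema directory and invokes protoc for each Protobuf file. The schemas/ and generated/ locations, and the choice of Python and Java outputs, are assumptions about your setup rather than a prescription.

  import subprocess
  from pathlib import Path

  SCHEMA_DIR = Path("schemas")    # assumed location of shared .proto files
  GEN_ROOT = Path("generated")    # assumed output root for generated code

  def generate(proto: Path) -> None:
      """Generate Python and Java stubs for a single schema file."""
      out_py = GEN_ROOT / "python"
      out_java = GEN_ROOT / "java"
      out_py.mkdir(parents=True, exist_ok=True)
      out_java.mkdir(parents=True, exist_ok=True)
      subprocess.run(
          ["protoc", f"--proto_path={SCHEMA_DIR}",
           f"--python_out={out_py}", f"--java_out={out_java}", str(proto)],
          check=True,
      )

  if __name__ == "__main__":
      for proto in sorted(SCHEMA_DIR.rglob("*.proto")):
          generate(proto)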

Package Management: This is the heart of the pipeline's tooling. Beyond managing the code's transitive dependencies, the package management tooling drives code generation, lints the source code, compiles it, builds publishable or deployable artifacts from the compiled code, runs and profiles test automation, publishes artifacts, deploys services, and runs code quality checks and other quality gates. It also reports on all of the above in a way that is actionable by both build pipelines and human engineers. Let's break down the package management tools by execution environment.

  • JVM: Tools like SBT, Maven, Gradle, and Bazel can manage dependencies and build artifacts within a unified structure. Bazel, in particular, is built for large-scale monorepos. While JVM-focused initially, Bazel can now build artifacts that run in other environments.
  • Python: Pip and Poetry can be used, often in conjunction with monorepo-aware tools, to manage Python packages and their virtual environments.
  • Node.js: The JavaScript/TypeScript ecosystem has robust monorepo tooling like Nx and NPM/Yarn/PNPM workspaces. These tools understand the inter-dependencies between packages within the repo.

Incremental Builds: This is crucial! You don't want to rebuild everything on every commit. Modern monorepo tools (Bazel, Gradle, Nx, Turborepo, Lerna) can determine which projects are affected by each change and rebuild and retest only those projects, dramatically speeding up CI/CD. Be advised that a code generation step can undermine these gains: if generated sources are rewritten on every run, downstream projects look changed even when the underlying schemas are not. You may need to figure out how to compensate for this.
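
To make the idea concrete, here is a simplified Python sketch of affected-project detection: given a project dependency graph and the files touched by a change, it walks reverse dependencies to find everything that needs to be rebuilt and retested. The graph and the path-to-project mapping are invented for illustration; real tools such as Bazel or Nx do this with far more sophistication.

  from collections import defaultdict

  # Illustrative project dependency graph: project -> projects it depends on.
  DEPS = {
      "billing-service": ["shared-models", "http-client"],
      "orders-service": ["shared-models"],
      "shared-models": [],
      "http-client": [],
  }

  def affected_projects(changed_files: list[str]) -> set[str]:
      """Return every project that must be rebuilt for this change set."""
      # Invert the graph: project -> projects that depend on it.
      reverse = defaultdict(set)
      for project, deps in DEPS.items():
          for dep in deps:
              reverse[dep].add(project)

      # Assume each project lives under a folder named after it.
      directly_changed = {p for p in DEPS if any(f.startswith(p + "/") for f in changed_files)}

      # Walk reverse dependencies transitively.
      affected, stack = set(directly_changed), list(directly_changed)
      while stack:
          current = stack.pop()
          for dependent in reverse[current]:
              if dependent not in affected:
                  affected.add(dependent)
                  stack.append(dependent)
      return affected

  print(affected_projects(["shared-models/src/order.py"]))
  # -> {'shared-models', 'billing-service', 'orders-service'}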

Granularity of Dependency Management: Some build systems, such as Bazel, require explicit dependency configuration at the package level. Others require explicit dependency configuration, with implicit transitive closure of dependencies, at the project level. Why do you care? Build systems do not permit circular dependencies. If dependencies are declared at the package level, then no two or more packages may depend on each other in a cycle; if at the project level, the same restriction applies to projects. Enforcing the prohibition on circular dependencies takes more upfront effort but is an essential step toward managing the code complexity in your software. Those who value the software organizational principles of Robert Martin should consider breaking up services into multiple projects to enforce the outside-in dependency rule for hexagonal architectures, as documented in his book Clean Architecture.
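
If your build system does not enforce acyclicity for you, a lightweight guard in CI can. The following Python sketch finds a dependency cycle with a depth-first search over the same kind of graph as above; the example graph is made up.

  def find_cycle(deps: dict[str, list[str]]) -> list[str] | None:
      """Return one dependency cycle as a list of nodes, or None if the graph is acyclic."""
      visiting, visited = set(), set()

      def dfs(node: str, path: list[str]) -> list[str] | None:
          visiting.add(node)
          path.append(node)
          for dep in deps.get(node, []):
              if dep in visiting:                      # back edge -> cycle
                  return path[path.index(dep):] + [dep]
              if dep not in visited:
                  cycle = dfs(dep, path)
                  if cycle:
                      return cycle
          visiting.discard(node)
          visited.add(node)
          path.pop()
          return None

      for node in deps:
          if node not in visited:
              cycle = dfs(node, [])
              if cycle:
                  return cycle
      return None

  # A cycle at the package (or project) level fails the build.
  print(find_cycle({"a": ["b"], "b": ["c"], "c": ["a"]}))  # -> ['a', 'b', 'c', 'a']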

Unit, Integration, Load Tests: A monorepo enables a holistic testing strategy. A PR for a refactor can also include the corresponding changes to the unit tests. You can efficiently run integration tests that span multiple services because they're all in the same place. Affected-project detection applies here as well: only run the tests relevant to the changes. Your test automation must be high quality, meaning good coverage, repeatable results, and no false positives or negatives.
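
As a sketch, test selection falls straight out of affected-project detection. Here, affected is whatever set your affected-project detection (like the earlier sketch) produced, and the <project>/tests layout and the use of pytest are assumptions about your repo.

  import subprocess

  def run_affected_tests(affected: set[str]) -> None:
      """Run only the test suites for the projects touched by a change set."""
      for project in sorted(affected):
          # Assumes each project keeps its tests under <project>/tests.
          subprocess.run(["pytest", f"{project}/tests"], check=True)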

Automatic Versioning: One of the significant advantages of a monorepo is avoiding the complexity of internal components depending on many different versions of other internal components. If done correctly, there is no need for semantic versioning of internal components, because everything always depends on the latest version of the components within the monorepo. You get to realize this benefit only if you adopt continuous deployment. External components (i.e., those outside the monorepo) are different, and the usual versioning issues are still at play.
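
One way to hold that line is a small CI guard that rejects pinned versions of internal packages. The package names and the convention that internal dependencies are declared as "latest" are assumptions for illustration; your package manager will have its own mechanism (workspace references, source dependencies, and so on).

  INTERNAL_PACKAGES = {"shared-models", "http-client"}   # assumed internal package names

  def check_internal_pins(dependencies: dict[str, str]) -> list[str]:
      """Flag internal dependencies that are pinned instead of tracking the latest in-repo code."""
      return [
          name for name, version in dependencies.items()
          if name in INTERNAL_PACKAGES and version != "latest"
      ]

  # External packages keep normal semantic versions; internal ones should not be pinned.
  print(check_internal_pins({"shared-models": "1.4.2", "requests": "2.32.0"}))
  # -> ['shared-models']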

Continuous Deployment: This is a significant topic that deserves a separate blog. The net of it is this: you click the merge button on your PR, and the changes are deployed to production automatically. You will need to configure your monorepo pipeline to support that functionality. Again, tooling can identify which specific deployable units are affected by changes. You will need both automatic versioning and high-quality test automation to gain the confidence required to turn on continuous deployment. Be advised that merging a change to a common, shared library can have the scary effect of overwhelming your GitHub Actions (GHA) runners and your artifact repository. It will also light up your deployment dashboard and trigger some APM alerts. Continuous deployment is another reason your code generation tasks should be selective and generate code only if there has been a change to the underlying schema.
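
Tying the pieces together, a merge-triggered deployment step might look something like the sketch below: take the affected projects, keep only the ones that are deployable, and hand each to your delivery tooling. The deploy_service function is a stand-in for whatever your platform provides (a Helm upgrade, an Argo CD sync, a workflow dispatch), not a real API.

  DEPLOYABLE = {"billing-service", "orders-service"}    # assumed deployable units

  def deploy_service(name: str) -> None:
      # Placeholder: call your real delivery tooling here.
      print(f"deploying {name} to production")

  def on_merge(affected: set[str]) -> None:
      """Deploy only the deployable units affected by the merged change."""
      for service in sorted(affected & DEPLOYABLE):
          deploy_service(service)

  on_merge({"shared-models", "billing-service", "orders-service"})
  # deploys billing-service and orders-service; shared-models is a library, not a deployable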

Ownership: Who Owns What?

Putting all your code in one repo doesn't mean a free-for-all. Clear ownership is vital.

Open Inner Source: A monorepo naturally promotes an "inner source" model. Teams can easily discover, use, and contribute to other teams' code, fostering collaboration and reducing duplication.

Team-Based Ownership: Despite shared visibility, clear boundaries are essential. Specific teams typically own specific directories or packages.

Approval Process: Mechanisms like CODEOWNERS files (popularized by GitHub/GitLab) ensure that changes to specific parts of the codebase require approval from the designated owners. It is easier to maintain quality and accountability if the owning teams must review and approve each change to their area.

Structure: Organizing the Behemoth

A well-defined structure is key to navigating and managing a monorepo effectively, especially when using the CODEOWNERS approach to project ownership.

Folder Hierarchy: The folder hierarchy might look like this, starting at the top level (the child folders under the repo's root folder).

capability domain
+ application or platform
  + bounded context
    + project that creates a deployable artifact

This hierarchy assumes that your organization groups applications by capability domain rather than grouping capability domains by application. Your organization may not use that precise term; in that case, think of capability domains as whatever classification of applications does not split teams. Try to be consistent about whatever organizing principles you use here.
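
To make ownership concrete, a CODEOWNERS file can mirror that hierarchy. The syntax below is the standard GitHub/GitLab pattern-followed-by-owners format, but the domains, paths, and team handles are invented for illustration.

  # CODEOWNERS (illustrative paths and teams)
  /payments/checkout/cart/cart-service/          @acme/checkout-team
  /payments/checkout/pricing/pricing-service/    @acme/pricing-team
  /payments/schemas/                             @acme/checkout-team @acme/pricing-team
  /tooling/ci/                                   @acme/platform-ops

Requiring multiple teams on shared schema paths, as in the third line, is one way to implement the producer-plus-consumer approval question discussed below.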

I don't like folder names like util or misc because those are the go-to terms when you are too lazy to figure out a better name. Having said that, some projects will span bounded contexts and even capability domains. Do we really need to split that code up into a half dozen different, yet well-named, paths? What about code ownership? If the same team maintains all that code, does it make sense for the team members to commit PRs across the folder hierarchy? I don't have a clear-cut answer here. Park the util project under the capability domain and bounded context associated with the owning team and just be okay with the inconsistency. Or have a util folder after all, and be diligent about not letting everything that is hard to categorize collect there.

I mentioned code generation earlier in this blog. Where should the shared schemas be stored? Who should have to approve PRs containing changes to those schemas: the consumers, the producers, or both? Schema changes might take longer because they need more approvals, but maybe that is the way it should be. Consider using automation that automatically rejects any PR containing a backward-incompatible schema change.
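
As a sketch of what such a gate could check, here is an over-simplified Python comparison of two schema versions, modeled as field-name-to-type mappings. Real compatibility rules (for Protobuf, Avro, OpenAPI, and so on) are richer than this, and dedicated tools exist; the point is only that the check can run automatically in the PR pipeline.

  def breaking_changes(old: dict[str, str], new: dict[str, str]) -> list[str]:
      """Compare two field-name -> type mappings and report backward-incompatible changes."""
      problems = []
      for field, old_type in old.items():
          if field not in new:
              problems.append(f"field removed: {field}")
          elif new[field] != old_type:
              problems.append(f"type changed: {field} ({old_type} -> {new[field]})")
      return problems  # adding new optional fields is not breaking, so it is not checked

  print(breaking_changes({"id": "string", "total": "int"}, {"id": "string", "total": "float"}))
  # -> ['type changed: total (int -> float)']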

At the project level, follow the best practices for the idiomatic folder hierarchy organization of the tech stack under which the code operates.

You will likely end up writing custom programs that provide CI/CD tooling for the monorepo, and you will likely commit those assets in the monorepo itself. For those using GitHub Actions, this is where you will store the custom actions that everyone depends on. That kind of tooling tends to lack proper test automation, and if a bug gets approved and merged, the impact could shut down all of engineering until the bug gets fixed. Put all of that tooling in one place and have a dedicated ops team responsible for it. Don't forget to set up robust observability for the monorepo pipeline so you can identify and resolve these issues quickly.
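
Observability here can be as simple as emitting structured timing and status records from every pipeline step so your log aggregation and alerting can watch them. A minimal Python sketch, with the step names and the print-to-stdout transport as assumptions:

  import json, time
  from functools import wraps

  def observed(step_name: str):
      """Emit a structured timing record for a pipeline step so dashboards and alerts can consume it."""
      def decorator(fn):
          @wraps(fn)
          def wrapper(*args, **kwargs):
              start = time.monotonic()
              status = "ok"
              try:
                  return fn(*args, **kwargs)
              except Exception:
                  status = "failed"
                  raise
              finally:
                  print(json.dumps({
                      "step": step_name,
                      "status": status,
                      "duration_seconds": round(time.monotonic() - start, 3),
                  }))
          return wrapper
      return decorator

  @observed("generate-code")
  def generate_all_schemas():
      ...  # call the codegen step sketched earlier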

Benefits vs. Challenges

We have already covered a lot of the work required to maintain a resilient monorepo. Why go to all that trouble?

What is the payoff?

Atomic Commits/Changes: Project-spanning changes can be reviewed and merged in a single pull request.

Simplified Dependency Management: Internal dependencies are straightforward because the version is always the latest.

Code Sharing & Reuse: Easy to discover and use shared libraries or standard patterns. Has someone already coded something similar to what you need to do today? With a monorepo, you're just a grep call away from answering that question. Be advised that GitHub code search can also solve that problem across multiple repos under the same organization.

Improved Collaboration: Greater visibility across teams.

Standardized Tooling: Enforce consistent linters, formatters, and build tools.

What are the hurdles?

Tooling Complexity: Setting up and maintaining a monorepo requires specialized tools and expertise.

Build & Test Times (if not optimized): CI can become slow without smart, incremental systems.

Repository Size: The repository can become very large, though modern Git features such as sparse checkout and partial clone handle this better than Git used to.

Learning Curve: Teams need to adapt to new workflows.

Access Control: Granular permissions can be more complex to achieve than in multi-repo setups.

Is a Monorepo Right for You?

A monorepo isn't a silver bullet. It's a powerful architectural choice that significantly benefits teams that manage multiple interdependent projects, aim for a high degree of code sharing, and are willing to invest in the necessary tooling and practices.

Startups: Can benefit from simplicity and ease of refactoring early on. Small teams can defer some of the costs of a monorepo until they grow.

Growing Companies: As inter-project dependencies increase, a monorepo can tame complexity and help reduce release anxiety.

Large Enterprises: Proven to scale with the right investment (see Google, Meta). Not all large enterprises stand to benefit significantly from a monorepo. If your main motivation for adopting a monorepo is to be like Google, you should probably not adopt one.

Reasons a monorepo might not be a great fit for you:

  • Your projects are mainly independent.
  • Your engineering culture is such that your engineers would bristle at embracing conformity.
  • Your engineering organization lacks the resources to manage sophisticated build systems.
  • Your engineering organization is unwilling to treat your build pipeline as a critical system with robust observability and a dedicated support team.

The Takeaway

Monorepos represent a mature approach to managing complex codebases, fostering collaboration, and streamlining development pipelines. By carefully considering the aspects of pipeline, ownership, and structure, you can decide if this architectural pattern is the right fit to accelerate your development and innovation.

The decision doesn't have to be all or nothing. Don't get too caught up in the literal meaning of the word. You can have fewer than a million services and still call it a microservice architecture. You can have more than one repo and still call it a monorepo architecture.

Need help evaluating if a monorepo strategy is right for your organization or assistance in designing and implementing one? I can be of service. I have worked with several organizations whose engineering groups have embraced monorepos to varying degrees. I've seen where they shine, and I've also seen their warts. I bring an impartial and balanced perspective to this topic.

Book Your Free Consultation Today!