Spotify’s Backstage: A Strategic Guide
A look at Backstage’s features, what is needed to run it and how you figure out the best option for you.
Backstage is an internal developer portal originally built by Spotify and contributed to the open source community a few years ago. Spotify’s marketing prowess has helped it gain attention throughout the community, but is it the right internal developer platform for you? What are the pros and cons? What do you need to know as you evaluate it versus specialist solutions?
Let’s look at Backstage’s features, what it takes to run it and how you can decide if it’s your best option for an engineering home base.
Backstage is an internal developer portal built around a catalog. It helps you organize all your services, data pipelines, etc. in one place and offers a scaffolder to spin up new projects using architectural blueprints and a docs-as-code solution.
With Backstage you can:
Spotify describes Backstage as being “born out of necessity.” Its infrastructure grew quickly, with development teams moving fast and distributed across several regions. As a result, their tools and processes were fragmented and hard to manage.
So, their teams were spending more time figuring out how to get started than they were writing, building and testing code. They needed a single source of truth for their infrastructure.
In the 2010s, Spotify built a layer that sat on top of its tools and infrastructure, making it easier to find, use and manage its services and tooling. A fully featured internal developer portal requires a sizable engineering investment, however. In 2020, Spotify elected to donate core components of its internal developer portal to the community, and has sought to build a community around it to advance the engineering effort.
Backstage consists of four main features: a catalog, analytics, a scaffolder and docs-as-code.
Backstage has a software catalog that acts as a centralized location for tracking internal applications, libraries, pipelines and websites. Engineering teams register entities in the catalog via YAML files that make up an explorable index of components, APIs and resources. Backstage expects the YAML files to be stored in your git repositories, so the catalog is backed up and can be recovered with its complete history. Dependencies among entities in the catalog can be declared manually.
Metadata about catalog entries is ingested from your tools via plugins that are configured in the service’s YAML.
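To make the catalog model concrete, here is a minimal sketch of one entity descriptor, written as a Python dict that mirrors the YAML a team would commit to its repository. The field names follow Backstage's documented descriptor format as we understand it. The service, team and database names are hypothetical, and the validation helper is our own illustration, not part of Backstage.

```python
from __future__ import annotations

# Sketch of a catalog entity descriptor, expressed as a dict that mirrors the
# YAML file a team would commit (conventionally named catalog-info.yaml).
# All service, team and resource names below are hypothetical.

REQUIRED_SPEC_FIELDS = ("type", "lifecycle", "owner")

payments_service = {
    "apiVersion": "backstage.io/v1alpha1",
    "kind": "Component",
    "metadata": {
        "name": "payments-service",
        # Annotations tell plugins where to find the data they ingest.
        "annotations": {"github.com/project-slug": "example-org/payments-service"},
    },
    "spec": {
        "type": "service",
        "lifecycle": "production",
        "owner": "team-payments",
        # Dependencies are declared by hand as entity references.
        "dependsOn": ["resource:default/payments-db"],
    },
}


def validate_component(entity: dict) -> list[str]:
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    if entity.get("kind") != "Component":
        problems.append("kind must be Component")
    if not entity.get("metadata", {}).get("name"):
        problems.append("metadata.name is required")
    spec = entity.get("spec", {})
    for field in REQUIRED_SPEC_FIELDS:
        if not spec.get(field):
            problems.append(f"spec.{field} is required")
    return problems


def declared_dependencies(entity: dict) -> list[str]:
    """Return the manually declared dependencies of an entity."""
    return list(entity.get("spec", {}).get("dependsOn", []))
```

Note that nothing in the descriptor is discovered automatically: the owner, the lifecycle stage and every dependency are whatever the team typed into the file.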
The Software Catalog solves two big problems:
Another challenge in large engineering organizations is making it easy for teams to spin up new services while maintaining consistency across projects. Backstage’s Golden Path Scaffolder helps address portions of that problem.
You can create templates inside Backstage that, with the click of a button, create new projects in places like GitLab and GitHub. So, rather than legislating conventions and standards, you can make it easy for teams to start new projects on the right foot.
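The idea behind a scaffolder can be sketched in a few lines: a blueprint is a set of files with placeholders, and a form submission supplies the values. The example below is a generic illustration using Python's standard `string.Template`. It is not Backstage's scaffolder API or template syntax, and the blueprint contents and parameter names are hypothetical.

```python
from string import Template

# Hypothetical blueprint: file paths mapped to file contents with $placeholders.
SERVICE_BLUEPRINT = {
    "README.md": "# $name\n\nOwned by $owner.\n",
    "catalog-info.yaml": (
        "apiVersion: backstage.io/v1alpha1\n"
        "kind: Component\n"
        "metadata:\n"
        "  name: $name\n"
        "spec:\n"
        "  type: service\n"
        "  lifecycle: experimental\n"
        "  owner: $owner\n"
    ),
}


def render_blueprint(blueprint: dict, params: dict) -> dict:
    """Fill every file in the blueprint with the given parameters.

    Template.substitute raises KeyError when a placeholder has no value,
    so an incomplete submission fails loudly instead of producing a
    half-rendered project.
    """
    return {
        path: Template(content).substitute(params)
        for path, content in blueprint.items()
    }
```

Calling `render_blueprint(SERVICE_BLUEPRINT, {"name": "orders-api", "owner": "team-orders"})` returns the rendered files, which a real scaffolder would then push to a new repository. Because the catalog descriptor is part of the blueprint, every new project starts out registered and owned.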
TechDocs is a tool for storing your documentation with your code while still making it easy for engineers to view.
Your engineers write their documentation in Markdown and store it in source control, right next to their code. An MkDocs build step creates a documentation website when your CI pipeline builds the code. Backstage pulls in the site and displays it in your portal.
Since you download, install and build Backstage yourself, you can run it anywhere you wish. This means it can run on-premises or in your cloud infrastructure, so it’s easy to tie it directly to your systems without having to worry about opening it up to the outside world.
That makes Backstage’s promise to “unify all your infrastructure tooling, services and documentation” possible. Regardless of what you’re running and where you’re running it, you can make it available to Backstage’s abstraction layer.
Backstage’s plugin architecture, and the 130 plugins already available, means it’s remarkably powerful and extensible “out of the box.” Backstage can talk to AWS Lambda, Azure Pipelines, Datadog, GitHub Actions and many more third-party tools.
So, Backstage has the power to fulfill its promises, but at what cost?
Cost? But didn’t we say that Backstage is free as in beer? Yes, we did. But, as we also said, you download, install and build Backstage yourself. Building a Backstage portal requires a lot of work, plus ongoing maintenance that includes editing code to set up and configure plugins.
In short, Backstage is a project, not a solution. Before you adopt Backstage, ask yourself this question: Does your company have resources that can take on extra projects, or is your engineering capacity scarce, and does it need to stay focused on delivering value to your customers? If your answer is the latter, you need a solution and not Backstage.
Another option is to buy Backstage services from a third-party hosting provider. While this outsources the time and effort to maintain your IDP, it limits your options to the services your service provider will provide, and Backstage is a project for them too.
Despite being a CNCF offering, Backstage doesn’t ship pre-built Docker containers. The installation requires Node.js and Yarn and, depending on your existing tech stack, may require additional personnel to build the IDP and keep it running.
Backstage is a code-based product that requires a lot of customization via editing both source and YAML files. There’s no single-, or even multiple-click install procedure. You’ll almost assuredly need a sizable dedicated team to maintain the systems and to keep the application up to date.
Further, configuring plugins to bring data into Backstage adds another layer of complexity. Each plugin requires its own configuration rather than a single-click integration. Where Backstage does have plugins for key cloud resources, the number is extremely limited, and you must configure a plugin per cloud service rather than enable one integration for all of your cloud resources. (AWS Lambda and AWS Proton each require their own plugins with separate configurations, versus a one-click AWS integration covering dozens of cloud resource types.)
Difficult to set up means difficult to maintain too. Each plugin has a bug fix and release schedule, and every update may come with a breaking change.
Backstage hosting providers have limited resources, so they are often not willing to apply updates right away, and they ration their support resources via pre-arranged commitments. So instead of keeping your costs as low as possible, you may be presented with a minimum. Beware of non-transparent pricing!
To unlock value from a catalog like Backstage, you have to get the right information into it. This happens via plugins, which are written by third parties. As a result, the quality varies materially among plugins, and it’s entirely up to you to review each plugin’s GitHub repository and decide whether it’s worth installing.
For instance, the Backstage Kubernetes plugin supports native K8s and provides helpful details around current status, errors, proximity to autoscaling limits and container restarts; however, there is no plugin for Azure’s AKS or AWS’ EKS, and the Google GKE plugin is limited to usage and cost monitoring. Additionally, plugins have different depths of implementation. For example, some will enable search and others won’t, and unless you read the code, it’s not clear exactly what you’re getting in terms of the plugin’s capabilities. The documentation is scant, if it exists at all.
Backstage is missing important categories of information, which limits the scenarios it can support and the value it can deliver. It does not understand concepts like applications and environments. Its coverage of the major cloud providers is extremely limited, with only three plugins for AWS, two for Azure and one for GCP, and it has no knowledge of how all these items relate to each other. These concepts and data sets help developers and ops teams self-serve essential information, which deflects some tickets and helps resolve the rest more quickly. The gap also limits the Backstage IDP’s ability to provide Scorecards that measure compliance with important internal standards around reliability, security and cost, as well as other reporting and analytics scenarios.
It also lacks uniform search. Each plugin gathers its own data, and there’s no standardization on what it should collect or how it should format it. Backstage has documentation for implementing or adding a search engine. So you can try to improve your ability to search the system, but only if you have the time, staff and money to put in the effort. The project focuses all the use cases on searching for entities, and none covers the relationships between them, which is a hint about its limitations.
Many of these limitations come from Backstage’s architecture. It acts as an abstraction layer over your existing services and tools. It reads YAML files and feeds describing your infrastructure, but it has few querying and reporting capabilities. It displays a syndicated feed that describes your assets.
This severely limits users’ ability to answer questions and to build useful analytics, and in some cases it even limits the utility of the core catalog (imagine troubleshooting without a time series database that lets you easily observe change history).
Backstage also lacks a solution for data integrity. As such, when data in an enterprise’s tool chain changes, catalog entries in Backstage become stale until a service owner independently recognizes drift has occurred and manually updates their YAML files.
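To illustrate what catching drift involves, here is a minimal sketch that compares a catalog entry against values observed in the toolchain and reports the fields that disagree. The function, field names and sample data are all hypothetical. This is the kind of check a team would have to build and schedule for itself, not something the source describes Backstage as providing.

```python
def find_drift(catalog_entry: dict, observed: dict) -> dict:
    """Return {field: (catalog_value, observed_value)} for fields that disagree.

    Only fields present in `observed` are compared, because the catalog may
    legitimately hold fields that no tool reports.
    """
    return {
        field: (catalog_entry.get(field), value)
        for field, value in observed.items()
        if catalog_entry.get(field) != value
    }


# Hypothetical data: what the YAML says versus what the toolchain reports.
catalog_entry = {"owner": "team-payments", "lifecycle": "production", "tier": "1"}
observed = {"owner": "team-billing", "lifecycle": "production"}

stale_fields = find_drift(catalog_entry, observed)
```

Here `stale_fields` flags the `owner` field, which changed in the toolchain after the YAML was written. Without a check like this, the mismatch persists until the service owner notices it.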
For some, Backstage’s unified approach to TechDocs is a plus. For others, though, it’s a least common denominator approach that locks their development teams into one tool. Why not let development teams pick the best tool for their situation?
Further, Backstage only offers a scaffolder, which might work well for creating a new service but may not be the best tool for the job when it comes to common day 2 actions, third-party runner orchestration and using your IDP’s API to trigger or gate actions.
Spotify Backstage’s syndicated approach to gathering and displaying data also harms more than just search capabilities. It limits the platform’s ability to display useful analytics because they can’t be integrated across devices. Backstage lacks the flexible analytics tools that its paid competitors have.
Backstage does offer insights, but you’re locked into its pre-canned Tech Health and Cost tools, and the data they report is limited to what plugin developers decide to offer.
If you’re going to pay for the internal administrative costs associated with Backstage or the hosting costs associated with a service provider, then why not consider a closed-source alternative? If you don’t, you may miss out on important features and a more favorable total cost of ownership (TCO).
Configure8 provides a truly universal catalog — all your infrastructure, environments, services (and serverless functions, data pipelines, etc) and applications along with key data about them sourced from the team and the enterprise’s toolchain — in an always-fresh socio-technical knowledge graph with a standardized schema.
It also provides advanced analytics on data in the graph, which can be used, for example, to answer questions about architectural standards, compliance, security, reliability and even value stream metrics. Users can also organize initiatives to help teams improve around similar measures, all of which can be custom-defined by management (versus being locked into a limited set of rigid templates). Configure8 offers a liberal free tier, and users can access the platform without talking to sales.
Eric Goebelbecker has worked in the financial markets in New York City for 25 years, developing infrastructure for market data and financial information exchange (FIX) protocol networks. He loves to talk about what makes teams effective (or not so effective).