Category Archives: Idea

The occasional insight presented in a mini-article.

Opportunity Cost in the Technical Debt Business Case

A few years back, I discussed the business case for reducing technical debt, and the importance of accounting for the risk exposure in that business case. However, there is another item in that business case that deserves some attention: the opportunity cost, defined by The New Oxford American Dictionary as “the loss of potential gain from other alternatives when one alternative is chosen.”

When a Dev team spends resources and time on reducing technical debt (upgrading, refactoring, repairing), the team will produce fewer end-user stories during that time. Opportunity cost represents the business value that those end-user stories would have yielded, as a way of accounting for the scarcity of the team’s resources.

The literal term ‘opportunity cost’ is seldom heard during technical debt discussions, but it is often a major factor in deciding when to reduce the debt. Whenever a stakeholder (e.g. a product manager) says something like “Yes, we should do something about this debt, but we cannot afford to do it now”, she is probably referring to the business features that end-users are waiting for, or that have been promised before a certain deadline. In other words, the opportunity cost of reducing the technical debt – the potential gain from the alternative of delivering the business features on time – is higher than the interest on the technical debt incurred during that period.

The figure is an attempt to illustrate the opportunity cost by comparing two scenarios: in scenario 1, the technical debt is not paid back, and in scenario 2, the debt is paid back in release 1.2. The value curve at the top of the figure makes a little dip in scenario 2 (dashed line), compared to the continued growth of scenario 1. Using Philippe Kruchten’s backlog color coding, the figure shows that in scenario 1, release 1.2 introduces five new (green) user stories, while in scenario 2, there is only time for one user story because we have spent the rest of the resources on reducing the (black) technical debt. The gap between the dashed and the solid line represents the opportunity cost of reducing the technical debt. (In case you are wondering why the dashed line goes down in release 1.2, even though we are adding a user story: I always feel that existing business features in a solution are subject to some kind of value decay, due to growing expectations and demands from end-users – debatable I know, but beside the point of this blog post.)

Example

A good example of opportunity cost in architectural technical debt reduction was presented to me by attendees of an RCDA Practitioner Course a few months back. In their organization, a team had been developing business process automation features for 4 years. The organization had kept track of the labor cost savings attributed to that automation effort, which amounted to 9 FTE (full-time equivalent positions) per year on average. The platform the software was running on was due for a major overhaul, because it could not easily be made compliant with new European Commission regulations (most notably GDPR). During the overhaul, they would not be able to develop new features – meaning an opportunity cost equivalent to 9 FTE per year, or 0.75 FTE for every month spent exclusively on the overhaul. A significant opportunity cost, but it was determined that the risk of non-compliance outweighed the opportunity cost, in favor of the overhaul.
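
The arithmetic in this example can be sketched in a few lines of Python. The 9 FTE per year figure comes from the example above; the function name and the six-month overhaul duration are assumptions made purely for illustration, since the duration of the overhaul is not given.

```python
def opportunity_cost_fte(yearly_savings_fte: float, overhaul_months: float) -> float:
    """Labor savings (in FTE) forgone while the team works only on the overhaul."""
    monthly_savings_fte = yearly_savings_fte / 12
    return monthly_savings_fte * overhaul_months


# 9 FTE per year is taken from the example; one month gives the per-month cost.
per_month = opportunity_cost_fte(9, 1)

# Assumed duration of six months, for illustration only.
six_months = opportunity_cost_fte(9, 6)

print(per_month, six_months)
```

In a complete business case, this figure would go on the costs side, next to the effort of the overhaul itself.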

In conclusion: if you need to draw up a complete business case for taking care of a piece of technical debt, make sure you include the opportunity cost on the costs side. This will help to facilitate a rational discussion about the impact of delaying features, putting this (architectural) choice in its business context. And while you’re at it, don’t forget to include the reduced risk exposure on the benefits side!

Architecture is Context

For architects designing complex solutions, a well-documented set of requirements can never be the sole basis of the architecture. Architects have to understand why these specific requirements are relevant to their stakeholders, and most likely will have to ask “why?” again and again to understand the stakeholders’ real needs, so as to design the architecture that best fulfills those needs and the goals behind them. When talking to architects about their work, I often summarize this principle as “Architecture is Context”.

The importance of context

What makes it so important for architects to understand the wider context of their solution? When we view architecture as a set of design decisions, a good measure of a decision’s architectural significance is the economic impact of that decision on the collective stakeholders affected by it (see RCDA: Architecting as a Risk- and Cost Management Discipline). So in order for architects to understand the significance of their decisions, they need to have a firm grasp of the economic context of the solution they are architecting.

A second reason architects need to understand context and background is that requirements (especially non-functional or quality attribute requirements) often turn out to cause conflicts in design decisions: for example, making certain company data available in a mobile app may be good for productivity, but bad for security. Architects can only make these types of trade-offs if they not only know the requirements, but also fully understand the business and technology drivers behind them (asking your stakeholders to prioritize NFRs doesn’t really help, since this quickly becomes a meaningless exercise if it is done separately from the design – see Issues Dealing with Non-Functional Requirements across the Contractual Divide).

Flavors of context

A complex solution has many types of stakeholders, each of which has its own goals and needs, contributing to the integral solution context. When focusing on economic impact, the first thing that comes to mind is the business context defined by the business stakeholders, who pay for or benefit from the solution. Architectural significance, however, is determined by a much wider range of stakeholders. Examples are operational and delivery stakeholders, for whom we need to understand the technology and project/release context, and citizens affected by the solution’s safety, security and privacy context.

Understanding context

What can architects do to improve their understanding of a solution’s context, and design architectures that better fit their stakeholders’ underlying needs?

  • Talk to stakeholders. Architects who talk to their stakeholders get a better understanding of their context – well duh, talk about kicking in an open door. And yet… I still run into architects who are so focused on documentation, or on a particular subset of stakeholders, that they forget to interact with the rest. This goes both ways: being too exclusively focused on your technical stakeholders (“architects must code!”) is as bad as only being interested in “the business” side of things. Make sure you speak your stakeholders’ language – watch Jochem Schulenklopper’s SATURN 2015 talk “Why they just don’t get it” if you want to know how. Without actually talking to stakeholders, my other two tips (modeling context and tracing decisions to context) lose most of their value.
  • Modeling context. Modeling system context has been part of software design methodologies since the early days. In the 70s, the Yourdon Structured Design method involved a context diagram, which showed which external systems and users interacted with a system – a technique many (including me) still consider essential for clarifying the solution boundary and external dependencies. In fact, the first C in Simon Brown’s C4 model stands for Context, represented by just such a context diagram. UML’s use case diagrams fulfill a similar role (but much less clearly). Michael Jackson’s problem frames go a step further by modeling not just physical and logical context of the system itself, but also the wider context needed to understand the design problem to be solved.
    (Figure: Example of a system context diagram)

  • Trace design decisions to context. Once we have gone to all the trouble of understanding the context of the solution we are architecting, it is paramount that we record the impact of that understanding on our architecture. The perfect place to do that is in the ‘rationale’ section of our architectural decision record. Here is where we show that our decision is based on something in the real world, real concerns, goals or other drivers from our stakeholders. This not only helps us get buy-in for the decision, but also brings huge benefits if the context changes (or should I say “when the context changes”, because it usually does). If we have to revisit an architectural decision, knowing the full context of the original decision rationale makes re-appraising the trade-offs much easier. Putting it in economic terms, tracing your architectural decisions to their context lowers the cost of change – making your architecture more agile.
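
As a minimal sketch of the third tip, the snippet below models decision records whose rationale names the context drivers it rests on, so that a change in context points directly at the decisions to revisit. The `Decision` class, its fields and the sample records are hypothetical and not part of any particular decision record template.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    """An architectural decision record whose rationale names its context drivers."""
    title: str
    rationale: str
    # Stakeholder concerns, goals or constraints the rationale depends on.
    drivers: set[str] = field(default_factory=set)


def decisions_to_revisit(decisions: list[Decision], changed_driver: str) -> list[str]:
    """Return the titles of decisions whose rationale depends on a changed driver."""
    return [d.title for d in decisions if changed_driver in d.drivers]


# Hypothetical records, loosely based on the mobile app trade-off mentioned earlier.
records = [
    Decision(
        "Expose company data in mobile app",
        "Productivity gain outweighs residual security risk",
        {"field productivity", "data security policy"},
    ),
    Decision(
        "Host on private cloud",
        "Company data must stay on premises",
        {"data security policy"},
    ),
    Decision(
        "Use relational database",
        "Matches team skills and reporting needs",
        {"team skills"},
    ),
]

print(decisions_to_revisit(records, "data security policy"))
```

Keeping the drivers as explicit, searchable values is what lowers the cost of change here: when the security policy changes, the two affected decisions are found without rereading every record.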

In conclusion, context is key in architecture, and there are several ways in which architects can improve their understanding of context and its impact. The most important tip is to talk to your stakeholders, and keep asking them “why?”. What is your take on this? Do you always draw a context diagram? How much context do you put in your architectural decision records?

Shortening the architectural feedback loop

One of the things architects can learn from the Agile mindset is the importance of short feedback loops. The quicker an architect receives feedback on their output, the faster they learn about its effect in their specific solution context – and better informed architects make better decisions. Architecture is a matter of reducing uncertainty by gathering knowledge and making decisions, and a shorter architecture feedback loop speeds up that uncertainty reduction, leading to better architectures. On top of this, shorter loops lead to shorter reaction times when things change, which increases agility. In this blog post, I will share some tips for shortening the architectural feedback loop.

Architectural decisions are your primary deliverable

What does your organization consider the primary architectural deliverable? Chances are it is a document with a name like Project (Start) Architecture or Software Architecture Description – let’s call it The Architecture Document. It takes weeks or even months to produce, after which time all architectural models and decisions are approved and distributed in one fell swoop – The Architecture Document version 1.0. If you are in this situation, the first step to shortening your feedback loop is to start viewing individual architectural decisions as your primary deliverable. The finer granularity of these decisions, compared to The Architecture Document, will make it much easier to speed up the feedback loop.

Continuously share concerns and decisions

Do not be afraid to share your unfinished architecture output. The sooner you share, the faster you learn. I like to let my stakeholders (both business and technical) know what I think are the most critical architectural concerns, and share decisions as soon as I am aware that they need to be taken. (One word of caution: always make sure the status of each decision-in-progress is absolutely clear, to prevent people from acting on it prematurely.) This gives stakeholders the opportunity to contribute to the architectural process from the start. Make this information as easy as possible to access: don’t put it in text documents, but find a low-threshold platform like a Wiki or issue tracking system.

Invite immediate feedback from stakeholders

We are often hesitant to ask for feedback on something that we ourselves do not consider to be perfect already. Architects, however, cannot afford to wait that long. Most architectures emerge from a dynamic process of frequently identifying new concerns, repeatedly finding out new facts and continuously adjusting partial decisions that interact with each other and our context. The perfect architecture is only known after the solution has been delivered (if then).

So tell your stakeholders to give you feedback when they have it, and not to wait for ‘official’ review moments like a ‘version 0.9’. Make it as easy as possible for them: enable the comments box on your Wiki, invite people to email you. If big print-outs of your architectural models adorn the team’s war-room, make sure to put a pencil on a string next to them. Make sure everyone knows you welcome feedback by asking for it at the water cooler, or on your team chat platform if you are not co-located.

Simplify your architecture documentation template

Template bloat is one of the causes of long architectural feedback loops. Organizations often lack a decent repository for architectural knowledge, and abuse their architecture documentation templates as a dumping place for all lessons learned. Diverse stakeholder concerns for all types of solutions end up getting their own sections in the template. On top of that, architects for whom The Architecture Document is the only place to store knowledge about a solution cause even more bloat. There are two things you can do to fight this document obesity:

  • Create a template with sections only for the concerns that are most common at the start of a solution’s lifecycle. Get rid of all ‘placeholder’ sections. Add views for other concerns only as they become significant later in the lifecycle, and only insofar as they are relevant to your context – in short, create living, minimal architecture documentation.
  • Make sure you have another repository for knowledge that is not immediately relevant at the current point in the solution lifecycle. Create a wiki or library for the organization’s lessons learned and documentation plug-ins for views to be added later in the lifecycle, and find a place outside of The Architecture Document for solution-specific insights that you have gathered.

Get involved in delivery

The final tip for an effective, short feedback loop is to get involved in the delivery of your solution. If you are a software architect, coding key parts of the solution yourself is a great way to get involved. If you do not have that opportunity, get involved in integration, quality attribute testing or other architecturally significant delivery activities. You will not only become your own feedback channel, but also stimulate other delivery team members to tell you about their concerns and help improve your architecture.

In my work of coaching organizations to approach architecting in an agile way, shortening the architectural feedback loop has proven to be one of the most effective ways to improve architects’ effectiveness and business value. Especially its positive effect on the scalability of architecture work and the start-up time of smaller projects is quickly noticed and appreciated by business stakeholders. Let me know if the five tips in this post prove to be useful to you as well.

Architecting transient solutions

When I teach architecture classes, I increasingly run into architects who never architect a “system”. They do all the things architects do: analyse stakeholder needs, identify ways to address these needs, (help) decide on the best way by making trade-offs, communicate these decisions and oversee the solution’s implementation. However, after all the work is done, the stakeholder needs are fulfilled, but there is nothing in the real world that they can point to and say: “that is the thing I architected”. At the end of the road, the implementation of their solution to the stakeholder needs consisted of a series of changes to existing systems, but did not require the creation of a new system. Their job is to architect transient solutions: changes to a domain or product they are responsible for.

One example was brought to me by the architect responsible for a transportation ticket vending machine. There was a stakeholder need to reduce the time and cost required to deploy new products and user interaction schemes, and the solution was to move most of the logic and data from the vending machine to a central place. Projects like this require significant (and careful) design effort, and substantial implementation time. A documented solution architecture is a must. However, at the end of the road, no new systems have been created (with the possible exception of minor components like hubs between previously unconnected systems).

Depending on the prevailing flavor of delivery governance, such transient solutions are defined in architecture epics, project documents or change requests. The architects designing them call themselves “domain architect”, “project architect”, “tech lead” or “master builder”.

What’s in a name

Is this really architecture? The ISO 42010 definitions of architecting and architecture both refer to a “system” of interest, e.g. “fundamental concepts or properties of a system in its environment embodied in its elements, relationships, and in the principles of its design and evolution”. All the other words seem to apply, so can we ignore the fact that there is no one “system” to point to after implementation of the architecture? Is this question really relevant, as long as the person designing these solutions knows how to apply architecting principles and practices?

If this is architecture, what type of architecture is it? Of all the different genres of architecture (software, infrastructure, system, enterprise architecture, etc.) the term solution architecture seems to best cover these transient solutions. This phenomenon actually helps us to create an appropriate definition for the term solution in solution architecture. Where previously we had to make do with something like “a solution is a way to solve a problem”, I would suggest the following:

Solution: a coherent set of changes delivered to address a defined set of stakeholder needs.

The changes referred to in this definition pertain to elements which can be newly created, modified or removed as part of the solution. The term coherent implies that the (changes to the) elements cooperate to achieve the fulfillment of the stakeholder needs (using Fred Brooks’ term, the solution has conceptual integrity). Coordination of a solution’s delivery depends on the governance model applied: this can be using agile or traditional delivery models, in a value stream, in a program or project, governed by a contract or otherwise. How and where the stakeholder needs are defined also depends on the governance model applied: this can be in an epic or set of (user) stories, but also in a program or project definition, contract or change request. From the perspective of this definition of solution, architecting a new system or service is just a special case where the “set of changes” consists of the creation of one new system or service.

Applying architecture practices

How does this way of looking at solution architecture affect the way we apply architecture practices? To explore this question, let’s take Risk- and Cost-Driven Architecture (RCDA) as our reference practices. As stated in the introduction, most core practices apply to both transient solutions and to system (or system of systems) solutions:

| Practice | Application to system and transient solutions |
|---|---|
| Architectural Requirements Prioritization | No change. |
| Architectural Decision Making | No change. |
| Applying Architectural Strategies | System solutions: focus on decomposition. Transient solutions: also focus on transformation of elements. |
| Architecture Documentation | System solutions: architecture documentation maintained with system. Transient solutions: documentation dissipates into the documentation of the systems affected by the solution. |
| Solution Costing | System solutions: solution breakdown structure decomposes into new elements to be created. Transient solutions: solution breakdown structure decomposes into transformation of affected elements. |
| Architecture Evaluation | No change. |
| Architecture Implementation | System solutions: focus on build and integration. Transient solutions: focus on dependencies between changes to affected elements (and integration). |
| Technical Debt Control | System solutions: technical debt controlled at system level (and higher). Transient solutions: technical debt controlled at level of changed systems and higher. |
| Architecture Evolution | Not applicable to transient solutions after completion, because the solution has then dissipated into the affected elements. |

As you can see from this brief analysis, the main differences when applying architecture practices occur after the solution has been delivered. The most visible difference is what happens to the architecture documentation, for which it makes no sense to remain “as a unit” after implementation of a transient solution, except for historic reference. Extra care must be taken that the relevant information (such as rationale of architectural decisions) remains available afterwards, either in the documentation of the systems affected, or at a higher level (e.g. in domain or enterprise architecture documentation).

Conclusion

As fewer and fewer greenfield systems are created, an increasing number of architects find themselves mainly designing transient solutions – solutions that dissipate into the landscape and cannot afterwards be identified as a product or system. Good architecture practices are still applicable in this situation. This realization leads to the idea for a definition of Solution Architecture as the “architecture of a coherent set of changes delivered to address a defined set of stakeholder needs”.

Time as a First Class Architectural Concept

Architects in the digital world often compare themselves to architects in the construction world. The metaphor works on many levels: architects in both worlds are responsible for conceptual (and structural) integrity, are the content leaders with overview over design and construction, making key design decisions and drawing blueprints. There is at least one aspect, however, in which the construction world is very different from the digital world: IT-based solutions such as software, infrastructure and services are subject to change far more frequently than buildings.

The role of change in the digital world is recognized in the ISO 42010 definition of architecture, which explicitly mentions evolution as part of architecture. Change is also the central theme in modern software development methodologies like Agile and DevOps. The digital architecture disciplines seem to be lagging behind a bit in this development, perhaps hindered by the metaphor that gave them their name. It is time to give change and evolution their proper place in the digital architecture world, and make Time a first class concept for architects of software, infrastructure, services, enterprises etcetera.

Issues with time-agnostic architectures

When construction of a building is finished, its architecture documentation is final: it will not change for decades. The architecture documentation was probably considered final before construction began, since in the vast majority of cases only non-architectural changes occur during the construction of buildings. It is perhaps this idea of “final and unchanging architectural documentation” that prompts managers to ask architects “When will your architecture be finished?” In view of the prominence of change in the digital world, however, solution architects can rarely give a straightforward answer to this question.

Nowadays, practically all software-intensive systems are part of a complex application landscape, forming systems-of-systems with myriads of interdependencies between commercial and bespoke software systems, hardware platforms and organizational entities, all with their own evolution cycles. In such a landscape, a time-agnostic architecture is a very perishable good: its best-before date is at most weeks in the future. This leads to the following observed issues:

  • architecture documents that are perpetually “almost finished” (causing delays in projects dependent on them) or already obsolete when they’re issued
  • development based on obsolete architectural assumptions
  • difficulty planning ahead

Popular architectural styles like SOA and microservices attempt to reduce the pain caused by these ever-changing dependencies at the technical level, but can provide only limited relief when it comes to logical dependencies. In modern application landscapes, the only other way for architects to address the issues observed above seems to be to design the solution’s evolution into the architecture. We have to create architecture documentation that not only describes the as-is and to-be situations, but explicitly identifies and deals with architecturally significant events on the way from as-is to to-be.

Architecting Time: the Evolution viewpoint

According to the ISO 42010 standard, architecture documentation consists of views that represent the architecture from certain viewpoints. Viewpoints are designed specifically to demonstrate to stakeholders how the architecture addresses a particular set of their concerns. Philippe Kruchten’s seminal “4+1 View model” paper gives five good examples of viewpoints that do this for common stakeholder concerns in software development. What if we added an “evolution viewpoint” that is specifically designed to show how the architecture addresses the impact of changes in the solution’s environment? An Evolution view would identify future events that will have architectural impact on the solution. It would specify that impact in terms of business value, cost and risk, and analyze dependencies between the solution and the events. The view would also document how the solution (and/or its delivery/operations team) will anticipate and react to the events when they occur. Typical examples of events with architectural impact are:

| Event | When expected | Impact type | Impact |
|---|---|---|---|
| Competitor releases next generation product | Q2/2016 | Business value | Our own product will be harder to sell if we do not match their new features |
| XP support discontinued | 4/2014 | Risk | Vulnerabilities no longer patched |
| Corilla license contract expires | 5/2017 | Cost | Opportunity for cost reduction by switching to open source alternative |
| New version of WebSphere | 11/2015 | Cost | Opportunity for maintenance cost reduction by using new features announced for next version |
| Project to build System Y finishes | Q3/2016 | Business value | System Y (which is interdependent with ours) requires interface features that are currently not supported by our solution |
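
To make the idea of an Evolution view a little more concrete, here is a rough Python sketch that keeps such events as structured data and lists the ones falling within a planning horizon. The `EvolutionEvent` class and the `upcoming` function are hypothetical names, the impact texts are shortened, and mapping a quarter or month to its first day is a simplifying assumption.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class EvolutionEvent:
    """One row of an Evolution view: a future event with architectural impact."""
    description: str
    expected: date  # assumption: first day of the expected month or quarter
    impact_type: str  # "Business value", "Cost" or "Risk"
    impact: str


def upcoming(events, today, horizon_days):
    """Return events expected within horizon_days from today, soonest first."""
    selected = [e for e in events if 0 <= (e.expected - today).days <= horizon_days]
    return sorted(selected, key=lambda e: e.expected)


# A subset of the example events from the table, with shortened impact texts.
events = [
    EvolutionEvent("Competitor releases next generation product", date(2016, 4, 1),
                   "Business value", "Own product harder to sell"),
    EvolutionEvent("Corilla license contract expires", date(2017, 5, 1),
                   "Cost", "Opportunity to switch to open source alternative"),
    EvolutionEvent("New version of WebSphere", date(2015, 11, 1),
                   "Cost", "Opportunity for maintenance cost reduction"),
    EvolutionEvent("Project to build System Y finishes", date(2016, 7, 1),
                   "Business value", "System Y requires unsupported interface features"),
]

# Assumed planning date and a one-year horizon, for illustration only.
for event in upcoming(events, today=date(2015, 9, 1), horizon_days=365):
    print(event.expected, event.impact_type, event.description)
```

A list like this could feed release planning directly, which is the kind of use that managers and product owners would have for the view.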

Stakeholders interested in the Evolution viewpoint would be anyone who has concerns related to change and planning: specifically project/program/product managers, product owners and architects/designers/developers working on other solutions in the same interdependent system of systems. It would help the managers and product owners plan ahead, and by acknowledging future events in the time dimension it would help fellow workers that depend on your architecture by telling them what aspects of the architecture will change, and when.

In short

Making the Time dimension part of your architecture documentation may look like more work, but anticipation should be a large part of your job as architect anyway. By writing it down (for example using an Evolution viewpoint), your architecture description will stay valid longer, and you will have a ready answer when stakeholders ask you how their change and planning concerns are addressed.

Sources:

Software Systems Architecture – Working With Stakeholders Using Viewpoints and Perspectives by Rozanski and Woods, specifically the Evolution perspective.

Enabling Agility Through Architecture by Brown, Nord and Ozkaya gives us the tools for “Informed Anticipation” in the architecture: dependency analysis, real option analysis and technical debt management.

How DevOps impacts architecture

DevOps is an approach to IT that radically alters the traditional relationship between development, maintenance and operation of software. The traditional separation between Dev and Ops brings with it a substantial amount of waste in terms of elapsed time and budget. This waste is caused by differences in way of working, attitude and hand-over inefficiencies. The waste is exacerbated by accountability mechanisms that incentivize Dev and Ops departments to erect barriers, to protect themselves in case the other side fails to fulfill its responsibilities (“cover your behind”).

In DevOps, this waste is largely eliminated by making one team responsible for both DEVelopment and OPerationS of applications. Supported by extensive automation (mostly by open source tooling) of testing and deployment, DevOps allows organizations like Facebook, Netflix and Google to deploy new releases into the Cloud on a daily basis. This is quite an improvement over the usual “once or twice a year” releases seen in traditional software products, and it gives organizations tremendous agility in reacting to market developments.

DevOps appears to be most successful in end-user facing (front-end) applications that run in public or private clouds and have a web-based user interface. The tooling supporting the approach is fascinating, especially in the way it ensures quality control: see Netflix’s Dianne Marsh’s take on this tooling at this year’s Saturn conference. But how does DevOps impact the architecture of the software that is developed with it? So far, I have found three ways in which DevOps influences architecture:

  • Architecting for deployability. Of all the quality attributes that a system may have, the one that is most directly related to DevOps is the ability to quickly deploy new releases into operation, with as much automation as possible. As I already mentioned in a previous blog, Stephany Bellomo and Rick Kazman of the SEI have made a study of this, and so has Len Bass of NICTA. An important consequence for architects appears to be the necessity to simplify and streamline the design. This includes standardization of infrastructure components, but also removing architectural elements which were originally intended to ensure other quality attributes, now overruled by the higher priority of deployability. One remarkable example of streamlining the SEI researchers found was the removal of an Enterprise Service Bus in one organization: originally implemented to enhance connectivity and modifiability, the ESB was found to be an obstacle to the speed of (automated) deployment and removed in the transition to DevOps (just an example – not a tip, ESBs generally have no quarrel with DevOps).
  • Architecting for systemic resilience. As the speed and scale of deployment increases, chances are that from time to time stuff (services, apps, APIs) that your software depends on may not be available for a few minutes or hours. Your software needs to be able to cope with this: the system as a whole needs to still be as available as possible. Your particular piece of the solution needs to stay alive, even if the whole world turns against it. Organizations like Netflix harden their system (and I guess their developers!) by actually making things worse: their simian army of chaos monkeys, latency monkeys and so on wreak havoc in test and production(!) environments. As Netflix says, “the best defense against major unexpected failures is to fail often”.
  • Consolidated development architecture. In software development the target architecture for a solution has always put constraints on the set-up of the development environment, but with DevOps, it becomes almost impossible to see the development and operational environment as separate entities. From an architectural point of view, the development and target environments form one system with three main categories of users: developers, operators and end-users. One could even argue that the first two of those categories are the same in the DevOps philosophy. Many quality attributes affect all these user categories, and many architectural decisions have impact across the board as well. In DevOps, an architect (or “master builder”) does not design a target architecture after which an appropriate development toolset is selected: the whole ensemble forms one consolidated solution, with one solution architecture that addresses the combined architectural concerns for development and operation.
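
As an illustration of the second point, architecting for systemic resilience, the sketch below shows one common way of coping with a dependency that is temporarily unavailable: retry a few times, then degrade gracefully to a fallback. It is a minimal example under stated assumptions, not a description of how Netflix or any other organization implements this; the function and service names are hypothetical.

```python
import time


def call_with_fallback(primary, fallback, retries=2, delay_seconds=0.0):
    """Try the primary dependency a few times, then degrade to the fallback."""
    for attempt in range(retries + 1):
        try:
            return primary()
        except ConnectionError:
            # Wait before the next attempt, but not after the last one.
            if attempt < retries:
                time.sleep(delay_seconds)
    return fallback()


# Hypothetical dependency that is currently unavailable.
def recommendations_service():
    raise ConnectionError("service unavailable")


# Hypothetical degraded answer that keeps this part of the solution alive.
def cached_recommendations():
    return ["most popular titles"]


result = call_with_fallback(recommendations_service, cached_recommendations)
print(result)
```

The design choice worth noting is that failure of the dependency is treated as a normal code path with a defined outcome, rather than as an exception that brings down the caller.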

So as an architect, you may need to learn three new things: 1) to remove stuff rather than add stuff, 2) to design failure into your solution and 3) to design a consolidated development environment into your architecture. And you had better learn fast: Gartner calls this architectural approach “Web-scale IT”, and predicts that by 2017, 50% of global enterprises will be using it.

What other aspects of architecture do you think are impacted by DevOps and Web-scale IT? Please leave your comments and let me know what I missed.

The business case for technical debt reduction

When I teach solution architecture classes, technical debt is always a very popular topic among practicing architects. Technical debt is a metaphor that transposes the concepts of loan and interest to IT-based solutions. It represents work that should be done in order to deliver a consistent, maintainable solution. As long as the work has not been done, the solution is in debt, which means that some stakeholders pay interest in the form of e.g. extra effort needed for simple changes, or higher support fees. Repaying the loan’s principal means doing the work needed to remove the debt: this could mean e.g. refactoring software or upgrading hardware. As soon as that work has been done, the stakeholders stop paying the interest, just like when a loan has been repaid.

Architectural debt

The technical debt metaphor has been very popular in the software development world, where it refers mainly to low code quality or unnecessary complexity. Tools like SONAR now have functionality that analyses source code to measure such “implementation debt”. For architects, however, other types of technical debt may be more interesting. Aside from implementation debt, there is “architectural debt”: this is typically structural in nature, or represents a technology gap.

An example of structural architectural debt is when an architectural principle like “all applications should use the Enterprise Service Bus (ESB) to exchange data” is temporarily violated. An architect could decide to allow direct access from application A to application B’s database if A needs data of B that have not yet been exposed through the ESB, and doing it properly would mean missing an important deadline. The interest in this case is caused by reduction of control of the information flow through the application landscape, and potential errors being introduced by teams that are not aware of the shortcut. The principal is the refactoring that needs to be done later on: changing applications A and B to route the data through the ESB, and configuring the ESB. This type of debt cannot be measured in the software code of either application: it is structural in nature. The same is true of technology debt, when a solution uses obsolete hardware or software products that cause potential failures and risks (interest) and needs to be upgraded (principal).

Business case

One of the problems many architects face is convincing their stakeholders to reduce technical debt, mainly because the debt is invisible to the end-user (see Philippe Kruchten’s categorization). Making the case for technical debt reduction in technical terms will usually not convince the business stakeholders. That requires translation into economic terms – in other words, a business case.

If one has unlimited resources (time, budget, staff), the business case for repaying technical debt is quite simple: the longer you wait, the more interest you pay, so the economic optimum is immediate repayment. The only exception is when the solution is planned to be decommissioned, and the total interest to be paid over the remaining lifespan is lower than the cost of repayment. Usually, however, resources are limited and the business case for technical debt reduction needs to compete with other solution improvements, such as new features.
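The decommissioning exception can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are mine, not part of the argument above: the interest is constant per month, there is no discounting, and the function names and figures are hypothetical.

```python
def total_interest(interest_per_month: float, months_remaining: int) -> float:
    """Interest paid if the debt stays in place until decommissioning.

    Assumes (hypothetically) a constant monthly interest and no discounting.
    """
    return interest_per_month * months_remaining


def should_repay(principal: float, interest_per_month: float,
                 months_remaining: int) -> bool:
    """Repaying is worthwhile only if the remaining interest exceeds the principal."""
    return total_interest(interest_per_month, months_remaining) > principal


# Hypothetical figures: repaying the debt costs 60,000 and the debt
# costs 4,000 per month in extra effort.
retire_in_one_year = should_repay(60_000, 4_000, months_remaining=12)
retire_in_three_years = should_repay(60_000, 4_000, months_remaining=36)
```

With these made-up numbers, a solution that is retired in a year is better left alone (48,000 of interest against 60,000 of repayment), while one that lives for three more years should be fixed (144,000 of interest).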

In all cases, proper representation of the interest is crucial to making a compelling case. In the case of structural or technology debt, it is often hard to quantify the extra costs caused by the debt. The difficulty is compounded by the inherent uncertainty: things might go wrong, but they might also go smoothly, even with the technical debt present. One often hears arguments like “we’ve run this application on this platform for 15 years, and it has never caused us any problems – so why invest in an upgrade?”.

The risk factor

The key to making the business case for technical debt reduction is to account for the risk caused by the debt. The proper way to calculate the total expected cost of uncertain failure is the well-known risk exposure formula: E(S) = p(S) × C(S), where p(S) is the probability of failure scenario S occurring, and C(S) is the cost incurred when S occurs. By summing up the risk exposure E over all possible failure scenarios S caused by the technical debt, you come as close as statistically possible to an accurate prediction of the expected cost of failure.
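As a minimal sketch, the summation of risk exposure over failure scenarios could look like this in Python. The scenario names, probabilities and costs are invented for illustration; in practice, estimating p(S) and C(S) is the hard part, and the probabilities must all refer to the same period (here assumed to be one year).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureScenario:
    name: str
    probability: float  # p(S): chance that S occurs in the period (assumed: one year)
    cost: float         # C(S): cost incurred if S occurs


def risk_exposure(scenario: FailureScenario) -> float:
    """E(S) = p(S) x C(S) for a single failure scenario."""
    return scenario.probability * scenario.cost


def total_risk_exposure(scenarios: list[FailureScenario]) -> float:
    """Sum of E(S) over all failure scenarios caused by the debt."""
    return sum(risk_exposure(s) for s in scenarios)


# Hypothetical scenarios and figures, for illustration only.
scenarios = [
    FailureScenario("core system down for days", probability=0.05, cost=2_000_000),
    FailureScenario("minor outage, fixed within hours", probability=0.30, cost=20_000),
]

yearly_exposure = total_risk_exposure(scenarios)
```

With these made-up inputs the yearly exposure comes to roughly 106,000, almost all of it from the unlikely but very expensive scenario. That amount is part of the interest on the debt, even in years when nothing actually goes wrong.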

I recently encountered a situation where a large transportation company was running some of their core business systems on ancient mini-computers. Spare parts were very hard to come by, and the manufacturer had put severe limitations on their maintenance contract. The organization in question had a hard time making the business case for migrating the system to a modern, virtualized, blade-based solution: the cost of the old platforms was so low that the ROI for the migration looked negative. The risk of failure, however, was substantial: a single missing spare part could potentially break the company by disabling their core system for a few days. Including that risk exposure in the technical debt interest leads to a completely different business case.

Forgetting the risk factor in the business case for technical debt reduction is a common mistake, which can lead to very wrong decisions. The organization in the example above instinctively knew they had to make the upgrade anyway, but the omission of the risk in the business case did lead to unnecessary delays in the decision making process.

In short: architects arguing for technical debt reduction should make sure that they articulate the risk component in the interest. This will help them convince stakeholders to give proper priority to things they might otherwise find less interesting for being invisible to end-users.