Architects in the digital world often compare themselves to architects in the construction world. The metaphor works on many levels: architects in both worlds are responsible for conceptual (and structural) integrity, are the content leaders with overview over design and construction, making key design decisions and drawing blueprints. There is at least one aspect, however, in which the construction world is very different from the digital world: IT-based solutions such as software, infrastructure and services are subject to change far more frequently than buildings.
The role of change in the digital world is recognized in the ISO 42010 definition of architecture, which explicitly mentions evolution as part of architecture. Change is also the central theme in modern software development methodologies like Agile and DevOps. The digital architecture disciplines seem to be lagging behind a bit in this development, perhaps hindered by the metaphor that gave them their name. It is time to give change and evolution their proper place in the digital architecture world, and make Time a first class concept for architects of software, infrastructure, services, enterprises etcetera.
Issues with time-agnostic architectures
When construction of a building is finished, its architecture documentation is final: it will not change for decades. The architecture documentation was probably considered final before construction began, since in the vast majority of cases only non-architectural changes occur during the construction of buildings. It is perhaps this idea of “final and unchanging architectural documentation” that prompts managers to ask architects “when will your architecture be finished”? In view of the prominence of change in the digital world, however, solution architects can rarely give a straightforward answer to this question.
Nowadays, practically all software-intensive systems are part of a complex application landscape, forming systems-of-systems with myriads of interdependencies between commercial and bespoke software systems, hardware platforms and organizational entities, all with their own evolution cycles. In such a landscape, a time-agnostic architecture is a very perishable good: its best-before date is at most weeks in the future. This leads to the following observed issues:
- architecture documents that are perpetually “almost finished” (causing delays in projects dependent on them) or already obsolete when they’re issued
- development based on obsolete architectural assumptions
- difficulty planning ahead
Popular architectural styles like SOA and microservices attempt to reduce the pain caused by these ever-changing dependencies at the technical level, but can provide only limited relief when it comes to logical dependencies. In modern application landscapes, the only other way for architects to address the issues observed above seems to be to design the solution’s evolution into the architecture. We have to create architecture documentation that not only describes the as-is and to-be situations, but explicitly identifies and deals with architecturally significant events on the way from as-is to to-be.
Architecting Time: the Evolution viewpoint
According to the ISO 42010 standard, architecture documentation consists of views that represent the architecture from certain viewpoints. Viewpoints are designed specifically to demonstrate to stakeholders how the architecture addresses a particular set of their concerns. Philippe Kruchten’s seminal “4+1 View model” paper gives five good examples of viewpoints that do this for common stakeholder concerns in software development. What if we added an “evolution viewpoint” that is specifically designed to show how the architecture addresses the impact of changes in the solution’s environment? An Evolution view would identify future events that will have architectural impact on the solution. It would specify that impact in terms of business value, cost and risk, and analyze dependencies between the solution and the events. The view would also document how the solution (and/or its delivery/operations team) will anticipate and react to the events when they occur. Typical examples of events with architectural impact are:
|Event||When expected||Impact type||Impact|
|Competitor releases next generation product||Q2/2016||Business value||Our own product will be harder to sell if we do not match their new features|
|XP support discontinued||4/2014||Risk||Vulnerabilities no longer patched|
|Corilla license contract expires||5/2017||Cost||Opportunity for cost reduction by switching to open source alternative|
|New version of WebSphere||11/2015||Cost||Opportunity for maintenance cost reduction by using new features announced for next version|
|Project to build System Y finishes||Q3 2016||Business value||System Y (which is interdependent with ours) requires interface features that are currently not supported by our solution|
Stakeholders interested in the Evolution viewpoint would be anyone who has concerns related to change and planning: specifically project/program/product managers, product owners and architects/designers/developers working on other solutions in the same interdependent system of systems. It would help the managers and product owners plan ahead, and by acknowledging future events in the time dimension it would help fellow workers that depend on your architecture by telling them what aspects of the architecture will change, and when.
Making the Time dimension part of your architecture documentation may look like more work, but anticipation should be a large part of your job as architect anyway. By writing it down (for example using an Evolution viewpoint), your architecture description will stay valid longer, and you will have a ready answer when stakeholders ask you how their change and planning concerns are addressed.
Software Systems Architecture – Working With Stakeholders Using Viewpoints and Perspectives by Rozanski and Woods, specifically the Evolution perspective.
Enabling Agility Through Architecture by Brown, Nord and Ozkaya gives us the tools for “Informed Anticipation” in the architecture: dependency analysis, real option analysis and technical debt management.
Another great SATURN (the Software Engineering Institute (SEI) Architecture Technology User Network) conference in Baltimore this year. The setting was a bit surreal – due to social unrest and a curfew(!) we couldn’t really explore the city during the conference, and were pretty much confined to the conference hotel. After three days of intense knowledge exchange, it was a pale, squinting collection of software architects that emerged from the Lord Baltimore Hotel – tired but satisfied. As predicted by George Fairbanks at the start, it was also the “most annoying SATURN ever” – in the sense that there were four parallel tracks, and whichever one I chose to attend, I always knew that there was at least one other very interesting thing going on in one of the other “salons”. Fortunately, many sessions were videotaped and can be viewed on-line.
— Eltjo Poort (@eltjopoort) May 1, 2015
Last year‘s most prominent topic was DevOps, and it was still prominent on this year’s event. Another topic that keeps attracting attention is the interplay between architecture and agile delivery, the topic of my own two contributions. The rest of the sessions could roughly be divided in coding-oriented (in which MicroServices was the most discussed topic) and business-oriented subjects. I mostly attended the latter.
The conference opened with a great keynote address by Mary Shaw about the progress of Software Engineering as an engineering discipline. Mary had studied various engineering disciplines and abstracted a simple and elegant lifecycle, highlighting the growth of a discipline from art to production, commerce, science and eventually engineering. According to Shaw’s model, something is a real engineering discipline if the combination of science, art, production and commerce allows “ordinary” trained people to be systematically successful (as opposed to an art that can only be masted by virtuosi). Mary’s conclusion was “we aren’t quite there yet”.
Right after the keynote it was my turn to lead a participatory session on “Maximizing your business value as an architect“, and I felt priviliged that the room was packed with 50 practicing architects from all kinds of organizations. The discussions were lively, most participated vigorously in the exercise (“prioritize architectural concerns by risk and cost”), and almost all participants indicated they would be able to use the approach immediately in their daily work. I’d like to thank all participants for making this such a gratifying session.
On Wednesday morning, InspearIT’s Jochem Schulenklopper gave the presentation that would win the Architecture in Practice award: “Why they just don’t get it: communicating architecture to business stakeholders.” Jochem made some excellent points about the need for architects to learn and communicate in business stakeholders’ language. Learning a language is not trivial and requires exposure. And using a modeling language like UML or ArchiMate to communicate with business stakeholders is useless: you will quickly find yourself explaining the language in stead of explaining the architecture. After the session, even Len Bass indicated that he had learned new insights from Jochem’s talk – a true compliment! Wednesday afternoon started with another great keynote, by Gregor Hohpe: “It’s good to be architect”. Gregor had some very interesting insights into the value of architects to organizations, such as “If you make architecture a second class citizen, you will attract second-class architects who build a third-class architecture”. His metaphor of the “architecture elevator” immediately struck a chord: the value of architects can be measured by how many levels of hierarchy and abstraction they can traverse quickly. Good architects talk to technical engineers and translate their concerns in such a way that they get the attention they deserve at C-level. Conversely, they make the translation of policy and business goals all the way through to the levels below. Gregor posted his own SATURN impressions here.
Thursday morning we discussed Fast and Slow Thinking for architects with Rebecca Wirfs-Brock: apart from “be aware of all your biases”, Rebecca had a valuable tip about using pre-mortems at the start of projects to fight some of the more dangerous biases. A pre-mortem is a group exercise in which attendants imagine they are in the future: the project has failed, and everyone describes the scenario that led to failure. A great remedy against WYSIATI (What You See Is All There Is), availability bias etc. I also gave my presentation “Agilizing the Architecture Department”, about our experiences implementing RCDA at ProRail.
In the closing keynote, Mark Schwartz of the US Citizenship and Immigration services gave a fascinating and very frank account of a large governement organization’s struggle to find the right architecture organization form to balance agile delivery with cross-system architectural reasoning. Is it Architect as Teacher? Architecture as a Service? The final verdict is still out…
Thanks to all participants who were cooped up together and made this another great SATURN. And it’s not over yet: we still have to watch the recorded trackes that we didn’t attend, so we’ll be busy for a while yet. Next year: San Diego.
Software Architect, once again rated the best job in America by CNN/Money. I can tell you from experience, outside of America it’s pretty great too.
Watch my Saturn 2014 talk on the Costing View or Architecture on YouTube:
Eltjo R. Poort, IEEE Software, vol.31, no. 5, pp. 20-23, Sept.-Oct. 2014
Five pieces of advice can help architects become more effective in an agile world without having to implement new methods or frameworks. They describe changes in attitude or behavior rather than complete practices or principles so they’re easy to digest and apply.
Free version (Computing Now)
DevOps is an approach to IT that radically alters the traditional relationship between development, maintenance and operation of software. The traditional separation between Dev and Opsbrings with it a substantial amount of waste in terms of elapsed time and budget. This waste is caused by differences in way of working, attitude and hand-over inefficiencies. The waste is exacerbated by accountability mechanisms that incentivize Dev and Ops departments to erect barriers, to protect themselves in case the other side fails to fulfill its responsibilities (“cover your behind”).
In DevOps, this waste is largely eliminated by making one team responsible for both DEVelopment and OPerationS of applications. Supported by extensive automation (mostly by open source tooling) of testing and deployment, DevOps allows organizations like Facebook, Netflix and Google to deploy new releases into the Cloud on a daily basis. This is quite an improvement over the usual “once or twice a year” releases seen in traditional software products, and it gives organizations tremendous agility in reacting to market developments.
DevOps appears to be most successful in end-user facing (front-end) applications that run in public or private clouds and have a web-based user interface. The tooling supporting the approach is fascinating, especially in the way it ensures quality control: see Netflix’s Dianne Marsh take on this tooling at this year’s Saturn conference. But how does DevOps impact the architecture of the software that is developed with it? So far, I have found three ways in which DevOps influences architecture:
- Architecting for deployability. Of all the quality attributes that a system may have, the one that is most directly related to DevOps is the ability to quickly deploy new releases into operation, with as much automation as possible. As I already mentioned in a previous blog, Stephany Bellomo and Rick Kazman of the SEI have made a study of this, and so has Len Bass of NICTA. An important consequence for architects appears to be the necessity to simplify and streamline the design. This includes standardization of infrastructure components, but also removing architectural elements which were originally intended to ensure other quality attributes, now overruled by the higher priority of deployability. One remarkable example of streamlining the SEI researchers found was the removal of an Enterprise Service Bus in one organization: originally implemented to enhance connectivity and modifiability, the ESB was found to be an obstacle to the speed of (automated) deployment and removed in the transition to DevOps (just an example – not a tip, ESBs generally have no quarrel with DevOps).
- Architecting for systemic resilience. As the speed and scale of deployment increases, chances are that from time to time stuff (services, app, APIs) that your software depends on may not be available for a few minutes or hours. Your software needs to be able to cope with this: the system as a whole needs to still be as available as possible. Your particular piece of the solution needs to stay alive, even if the whole world turns against it. Organizations like Netflix harden their system (and I guess their developers!) by actually making things worse: their simian army of chaos monkeys, latency monkeys and so on wreak havoc in test and production(!) environments. As Netflix says, “the best defense against major unexpected failures is to fail often”.
- Consolidated development architecture. In software development the target architecture for a solution has always put constraints on the set-up of the development environment, but with DevOps, it becomes almost impossible to see the development and operational environment as separate entities. From an architectural point of view, the development and target environments form one system with three main categories of users: developers, operators and end-users. One could even argue that that first two of those categories are the same in the DevOps philosophy. Many quality attributes affect all these user categories, and many architectural decisions have impact across the board as well. In DevOps, an architect (or “master builder”) does not design a target architecture after which an appropriate development toolset is selected: the whole ensemble forms one consolidated solution, with one solution architecture that addresses the combined architectural concerns for development and operation.
So as an architect, you may need to learn three new things: 1) to remove stuff rather than add stuff, 2) to design failure into your solution and 3) to design a consolidated development environment into your architecture. And you had better learn fast: Gartner calls this architectural approach “Web-scale IT”, and predicts that by 2017 50% of global enterprises will be using it.
What other aspects of architecture do you think is impacted by DevOps and Web-scale IT? Please leave your comments and let me know what I missed.
SATURN (the Software Engineering Institute (SEI) Architecture Technology User Network (SATURN) Conference) has become my favorite annual professional event. What a great mix of architects, developers and researchers, and what a nice way to learn from each other.
The pervasive theme at this year’s SATURN was DevOps. Throughout the conference it became clear that DevOps is not just a hype-movement “taking agile to the next level”. DevOps will quickly become the standard way to produce a significant part of the world’s software. It is the only way to achieve real web-scale IT, supported by a high degree of automation – and by a strong architecture foundation to ensure qualities like reliability, availability and security.
The goodness started with Stephany Bellomo and Rick Kazman’s tutorial on the architectural foundations for achieving DevOps goals – most notably “deployability”. Stephany and Rick have looked into a number of cases to identify architectural strategies and tactics to decrease deployment time. They did not find any new techniques, but found that existing strategies often needed to be revisited to achieve DevOps goals. According to Stephany, DevOps forces architects to think about streamlining: removing stuff rather than adding. In some cases, this meant removing SOA components… which was ironic, since Stephany is one of the SEI’s leading SOA experts.
On Tuesday afternoon, I gave my own tutorial about Agile Solution Architecting. We had a nice crowd, which included OO design guru Rebecca Wirfs-Brock. Great discussions and feedback.
Wednesday had a nice keynote on refactoring by Bill Opdyke of JPMorgan Chase – some interesting points about making the business case for refactoring to your stakeholders reminded me of a previous blog post. In a session on Customer Collaboration, Samsung told us how they analyzed Twitter data to gather quality attribute requirements for their Galaxy model S5 – quite revealing. My presentation on the Costing View of Architecture was part of the afternoon’s Leadership and Business session.
The final session on Wednesday triggered some great discussion on fostering architecture communities in companies, which led to the creation of a LinkedIn group to allow the discussion to continue after the conference.
Thursday’s highlights were the two keynotes. Netflix’s Dianne Marsh told the story of the open source based tool stack supporting continuous delivery at Netflix. Very inspiring to see DevOps at work, although Dianne’s remark that she “never had to argue a business case or think much about cost” caused some of us to wonder whether successful DevOps implemenations were dependent on unlimited budget availability…
In the closing keynote, IBM’s Jerome Presenti gave us an awesome glimpse into the future. Jerome’s team develops cognitive systems, based on the IBM Watson system that won the Jeopardy quiz. It looks like the “Minds” in Iain M. Banks’s Culture novels are closer than we think… 🙂
On Friday, Rebecca Wirfs-Brock gave a tutorial on “Being Agile about System Qualities”. In terms of concepts, Rebecca’s had a lot of overlap with my own tutorial on Tuesday – but she explained them from the point of view of the agile mindset, which was quite enlightening. Architects can gain more traction with agile teams by being aware of their jargon (e.g. say “Design Spike” in stead of “Architectural Prototype”) and allergies (e.g. don’t use Powerpoint…). Perhaps we should stop using the A-word alltogether, and rather call ourselves “Master Builder” as suggested to me yesterday by a chief architect in a client organization…
All in all, a great week, many new ideas and new friends. Many thanks to all participants and organizers!
Edit (October 2014): you can now watch my Saturn talk (and most others as well) on YouTube:
We all know that solution architecture adds value to business and IT solutions, but as Philippe Kruchten writes on his blog, we sometimes have trouble quantifying the benefits towards (project) managers or product owners. I recently ran into Raymond Slot’s PhD thesis titled “A method for valuing architecture-based businesstransformation and measuring the value of solutions architecture”, and was amazed by the resounding results he presents in it. Raymond did a thorough analysis of a survey of 49 software development projects between €50,000 and €2,500,000. He concludes that use of solution architecture correlates with the following effects:
- 19% decrease in project budget overrun
- Increased predictability of project budget planning, which decreases the percentage of projects with large budget overruns from 38% to 13%
- 40% decrease in project time overrun
- Increased customer satisfaction, with 0.5 to 1 point – on a scale of 1 to 5
- 10% increase of results delivered
- Increased technical fit of the project results
At first sight, these correlations already look pretty good, but if you read Raymond’s thesis, you see that the underlying numbers are even better. For example, the decrease in project budget overrun is presented as a percentage of the total budget. The expected overrun decreases from 22% to 3% – a factor 7 improvement!
The other results have equally interesting background data:
- Better predictability of solution costing by a factor 3, specifically associated with the presence of an architect during calculation of the technical price, and with the quality of the solution (project) architecture.
- The 40% decrease in project time overrun (c) is also much better than it looks, since it is stated as a percentage of project duration. Factors 6x-10x improvement of overrun are associated with the quality of the architecture and informal architecture compliance testing.
- Increased customer satisfaction (d) is specifically associated with the quality of the architecture, and with matching the seniority level of the architect to the complexity of the project.
- The increased technical fit (f) refers to better fulfillment of non-functional requirements, and is specifically associated with the quality of the project architecture.
In short, this research shows there are tremendous benefits to using solution architecture. The (b) effect in Raymond’s table represents a reduction of troubled projects by a factor 3. Together with the (a) effect, this result confirms the effectiveness of architecture as a risk- and cost-management discipline.
Thank you Raymond for helping us quantify the benefits side of the business case for solution architecture!
- Slot, R. (2010). A method for valuing architecture-based business transformation and measuring the value of solutions architecture (PhD Thesis). University of Amsterdam.
- Poort, E. R., & van Vliet, H. (2012). RCDA: Architecting as a Risk- and Cost Management Discipline. Journal of Systems and Software, 1995-2013.
When I teach solution architecture classes, technical debt is always a very popular topic among practicing architects. Technical debt is a metaphor that transposes the concepts of loan and interest to IT based solutions. It respresents work that should be done in order to deliver a consistent, maintainable solution. As long as the work has not been done, the solution is in debt, which means that some stakeholders pay interest in the form of e.g. extra effort needed for simple changes, or higher support fees. Repaying the loan’s principal means doing the work needed to remove the debt: this could mean e.g refactoring software or upgrading hardware. As soon as that work has been done, the stakeholders stop paying the interest, just like when a loan has been repayed.
The technical debt metaphor has been very popular in the software development world, where it refers mainly to low code quality or unnecesary complexity. Tools like SONAR now have functionality that analyses source code to measure such “implementation debt”. For architects, however, other types of technical debt may be more interesting. Aside from implementation debt, there is “architectural debt”: this is typically structural in nature, or represents a technology gap.
An example of structural architectural debt is when an architectural principle like “all applications should use the Enterprise Service Bus (ESB) to exchange data” is temporarily violated. An architect could decide to allow direct access from application A to application B’s database if A needs data of B that have not yet been exposed through the ESB, and doing it properly would mean missing an important deadline. The interest in this case is caused by reduction of control of the information flow through the application landscape, and potential errors being introduced by teams that are not aware of the shortcut. The principal is the refactoring that needs to be done later on: changing applications A and B to route the data through the ESB, and configuring the ESB. This type of debt cannot be measured in the software code of either application: it is structural in nature. The same is true of technology debt, when a solution uses obsolete hard- or software products that cause potential failures and risks (interest) and needs to be upgraded (principal).
One of the problems many architects face is convincing their stakeholders to reduce technical debt, mainly because the debt is invisible to the end-user (see Philippe Kruchten’s categorization). Making the case for technical debt reduction in technical terms will usually not convince the business stakeholders. That requires translation into economic terms – in other words, a business case.
If one has unlimited resources (time, budget, staff), the business case for repaying technical debt is quite simple: the longer you wait, the more interest you pay, so the economic optimum is immediate repayment. The only exception is when the solution is planned to be decommissioned, and the total interest to be paid over the remaining lifespan is lower than the cost of repayment. Usually, however, resources are limited and the business case for technical debt reduction needs to compete with other solution improvements, such as new features.
In all cases, proper representation of the interest is crucial to making a compelling case. In case of structural or technology debt, it is often hard to quantify the extra costs caused by the debt. The difficulty is compounded by the inherent uncertainty: things might go wrong, but they might also go smoothly, even with the technical debt present. One often hears arguments like “we’ve run this application on this platform for 15 years, and it has never caused us any problems – so why invest in an upgrade?”.
The risk factor
The key to making the business case for technical debt reduction is to account for the risk caused by the debt. The proper way to calculate the total expected cost of uncertain failure is the well-known risk exposure formula: E(S) = p(S) x C(S), where p(S) is the probability of failure scenario S occurring, and C(S) is the cost incurred when S occurs. By summing up the risk exposure E over all possible failure scenarios S caused by the technical debt, you come as close as statistically possible to an accurate prediction of the expected cost of failure.
I recently encountered a situation where a large transportation company was running some of their core business systems on ancient mini-computers. Spare parts were very hard to get by, and the manufacturer had put severe limitations on their maintenance contract. The organization in question had a hard time making the business case for migrating the system to a modern, virtualized, blade-based solution: the cost of the old platforms was so low that the ROI for the migration looked negative. The risk of failure, however, was substantial: a single missing spare part could potentially break the company by disabling their core system for a few days. Including that risk exposure in the technical debt interest leads to a completely different business case.
Forgetting the risk factor in the business case for technical debt reduction is a common mistake, which can lead to very wrong decisions. The organization in the example above instinctively knew they had to make the upgrade anyway, but the omission of the risk in the business case did lead to unnecesary delays in the decision making process.
In short: architects arguing for technical debt reduction should make sure that they articulate the risk component in the interest. This will help them convince stakeholders to give proper priority to things they might otherwise find less interesting for being invisible to end-users.