What the Scaled Agile Framework Actually Demands From Your Organization

The two videos in this series made the case for SAFe and walked through its mechanics. Part 1 established why Agile breaks down at scale and introduced the Agile Release Train. Part 2 went inside the machinery: ART roles, PI Planning’s two-day agenda, the three-tier backlog, Inspect and Adapt, and a set of practical starting points.

This post picks up where the videos ended. It covers the things video format doesn’t have time for: the organizational change realities that determine whether a SAFe implementation lives or dies, the concept most organizations get wrong during setup, an honest comparison with competing frameworks, the metrics that actually reveal what’s happening, and a clear-eyed look at what higher education’s unique structure means for SAFe adoption.

If the videos were the map, this is closer to the terrain.

The Organizational Change Problem SAFe Cannot Solve

SAFe gives you a structure. It does not give you a culture. This is the most important thing experienced practitioners will tell you after living through multiple implementations. You can run the ceremonies, stand up the ARTs, train the Release Train Engineers and still fail. The failure is almost always cultural before it is structural.

Three cultural shifts are consistently underestimated:

1.  From project funding to value stream funding

Most organizations fund projects. A project has a budget, a scope, a start date, and an end date. Once the end date arrives, the team disbands. Value stream funding works on a different logic entirely: you fund the capability to deliver value continuously, adjusting scope as you learn rather than committing scope upfront against a deadline. Teams are stable and long-lived; the work that flows through them changes as priorities shift.

This shift touches annual budget processes, finance teams, and executive mental models about accountability and predictability. Finance departments built around project cost codes will need new accounting structures. PMOs that report on budget-versus-actuals will need to evolve their reporting. None of this is a SAFe ceremony – it is a fundamental change in how the organization decides to allocate resources.

2.  From output accountability to outcome accountability

Teams in traditional PMOs are typically measured on delivery: did you ship on time, on budget, to scope? SAFe, done well, shifts accountability to outcomes: did the thing we shipped actually move the needle for the customer? PI Objectives are written as outcome commitments (measurable results), not as task lists. The question at a System Demo is not “did you finish the feature?” but “does it do what we expected for the customer?”

This is harder than it sounds. It requires leadership to tolerate more short-term planning ambiguity. It requires product managers to develop genuine hypotheses about user behavior rather than detailed requirements. And it requires executives to resist the impulse to ask for detailed task plans when outcomes feel insufficiently concrete.

3.  From sequential approval to concurrent collaboration

Traditional governance runs sequentially: architecture is approved before development starts; testing happens after development ends; compliance review happens before release. SAFe runs concurrently: Product Management, architecture, and development are all active at the same time, with continuous integration ensuring the system is always in a deployable state. This breaks the traditional audit trails and stage-gate approval processes that large organizations – especially regulated ones and those with strong legal and compliance functions – have built over decades.

The resistance to this shift is most concentrated in exactly the places with the most organizational power: Finance, Legal, Internal Audit, and senior leadership. A SAFe implementation that does not address this resistance explicitly will eventually accommodate it — and the accommodation almost always looks like Scrumfall.

Value Streams: The Concept Most Organizations Get Wrong

Value streams were introduced in the videos as the path work takes from customer request to delivered product. That definition is accurate. The practical challenge of value stream identification is where most implementations make their first serious mistake.

The mistake: mapping value streams to org chart boundaries.

This is understandable. It is easier, faster, and avoids political conflict. But it produces ARTs that reflect how the organization is structured rather than how value actually flows. The result is an ART that runs PI Planning well on paper while the real bottlenecks continue to produce the same delays they always have because they cut across organizational boundaries.

HOW TO IDENTIFY VALUE STREAMS CORRECTLY

Start from a customer outcome, not an organizational unit. ‘Student registers for classes.’ ‘Invoice is processed.’ ‘New employee is onboarded.’

Walk the process backwards. Who touches it? What systems are involved? Where does work sit waiting? That waiting time is your first measure of value stream health.

The team groupings that emerge from this exercise often look nothing like your existing org chart. That is not a problem to be solved – it is the exercise working correctly.

Expect political resistance. The people whose departments get reorganized around value streams are the same people whose power derives from the current structure.

The distinction between operational and development value streams also deserves more attention than it typically receives. Operational value streams are the processes through which you deliver value to customers today – the systems and workflows already in production. Development value streams are how you build the capabilities that will deliver value tomorrow. SAFe’s development value streams exist to support and improve operational ones; they are not the same thing.

Confusing the two leads to ARTs that mix operational support work with new development – different urgency profiles, different stakeholder sets, different definitions of done. The resulting ART is perpetually torn between keeping the lights on and building new capability, and it does neither particularly well.

SAFe vs. the Alternatives: An Honest Comparison

SAFe is the most widely adopted Agile scaling framework, but it is not the only serious option. If you’re considering implementing SAFe, you should understand the landscape well enough to argue for and against it intelligently.

| Framework | Philosophy | Best fit | Watch out for |
| --- | --- | --- | --- |
| SAFe | Add structure to coordinate at scale. More process enables more alignment. | Large orgs (200+ in delivery roles), executive visibility required, multi-level planning needed. | Overhead creep. More process than the org can sustain. Expensive to implement well. |
| LeSS | Scale by removing process. Keep Scrum pure at team level, eliminate coordination layers. | Strong Agile maturity already present. Culture that will genuinely use less process, not fake it. | Requires real discipline. Executives often reject the reduced visibility. Hard to sell up. |
| Scrum@Scale | Fractal Scrum: the pattern that works for one team scales by repetition. | Orgs wanting flexibility over prescription. Existing strong Scrum culture. | Less prescriptive guidance means higher coaching requirement. Easy to implement loosely. |
| Nexus | Thin coordination layer on top of 3–9 Scrum teams. Stays close to the Scrum Guide. | 3–9 teams working on one product. Minimal overhead preference. Scrum-fluent organization. | Doesn’t address portfolio or strategy layer. Outgrown quickly by larger programs. |

One name that gets referenced constantly and should be addressed explicitly: the Spotify Model. It is not a framework. It is a description of how Spotify organized its engineering teams at a specific point in time, documented in a 2012 white paper. Spotify itself has evolved significantly away from it. Treating it as a prescribable model for your organization is a common and expensive mistake that typically produces the vocabulary (squads, tribes, guilds, chapters) without the underlying conditions that made those structures work at Spotify.

The question is not which framework is best. The question is which framework your organization will actually implement with discipline — and that depends more on your culture and coaching capacity than on the merits of the framework itself.

The Metrics That Actually Reveal Whether SAFe Is Working

PI Predictability, the percentage of PI Objectives delivered as committed, is a legitimate metric. It measures the accuracy of planning and reveals patterns over time. An ART that consistently scores below 80% has a planning problem, a dependency problem, or a scope management problem, and the number is intended to draw your attention to the issue.

But PI Predictability is a lagging indicator. It measures what already happened. And it measures the accuracy of plans, not the health of value delivery. An ART can score 95% on PI Predictability while delivering features that users find low-value, carrying enormous technical debt, and burning out its teams.
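To make the arithmetic concrete, here is a minimal sketch of two ways the score can be computed. The first follows the definition used in this post: objectives delivered divided by objectives committed. The second assumes each objective carries planned and actual business-value scores, which is how SAFe's own predictability measure is usually described. The `PIObjective` record, its field names, and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PIObjective:
    """Hypothetical record for one committed PI Objective."""
    name: str
    planned_value: int   # business value assigned at PI Planning (assumed 1-10 scale)
    actual_value: int    # business value scored at Inspect and Adapt
    delivered: bool      # True if the objective was delivered as committed


def predictability_by_count(objectives):
    """Percentage of committed objectives delivered (the definition used in this post)."""
    if not objectives:
        raise ValueError("no committed objectives")
    delivered = sum(1 for o in objectives if o.delivered)
    return 100.0 * delivered / len(objectives)


def predictability_by_value(objectives):
    """Actual business value as a percentage of planned business value."""
    planned = sum(o.planned_value for o in objectives)
    if planned == 0:
        raise ValueError("planned business value is zero")
    actual = sum(o.actual_value for o in objectives)
    return 100.0 * actual / planned


# Illustrative data only.
objectives = [
    PIObjective("Online registration redesign", 8, 8, True),
    PIObjective("Degree audit integration", 10, 5, False),
    PIObjective("Advisor dashboard", 5, 5, True),
    PIObjective("Single sign-on migration", 7, 7, True),
]

by_count = predictability_by_count(objectives)   # 3 of 4 objectives delivered
by_value = predictability_by_value(objectives)   # 25 of 30 value points achieved
needs_attention = by_count < 80.0                # threshold cited in the text
```

With this sample the two views disagree: 75% by count, roughly 83% by value. Whichever definition an ART reports, it should say which one it is using.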

The metrics that give a more complete picture come from flow theory – specifically the framework Mik Kersten developed in “Project to Product”, which grounds SAFe in a broader model of how technology organizations create value:

◆  Flow Velocity: How many Features are being completed per unit time? The absolute number matters less than the trend. Velocity that is flat or declining despite capacity staying the same signals a systemic problem – usually accumulating technical debt or increasing dependency overhead.

◆  Flow Efficiency: What percentage of a Feature’s total time is spent in active work versus waiting? Most organizations are genuinely surprised to find this number is below 20%. A Feature spends 80% or more of its life sitting in a queue somewhere. Flow Efficiency is the most direct measure of the waste SAFe is supposed to eliminate.

◆  Flow Time: How long does it take for a Feature to move from commitment to deployment? This is your true cycle time – not the time a team actively works on it, but the wall-clock time from when it enters the Program Backlog to when it is in production. Reducing Flow Time is the primary lever for competitive responsiveness.

◆  Flow Load: How many Features are in progress simultaneously across the ART? High load correlates with high wait time and low flow efficiency. The solution is counterintuitive: doing less in parallel makes the whole system faster.

◆  Flow Distribution: What types of work make up your flow? Kersten’s Flow Framework identifies four: new features, defects, risk reduction, and technical debt. An ART that is 80% new features and 0% debt is building tomorrow’s crisis. Flow Distribution shows whether the ART is managing the full cost of its work or just the visible part.
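Three of the metrics above (Flow Velocity, Flow Load, and Flow Distribution) reduce to simple counts once each Feature has a work type, a commitment date, and a deployment date. The sketch below assumes a hypothetical `Feature` record with those fields; the names and sample data are illustrative and do not come from any particular tool's export format.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Feature:
    """Hypothetical Feature record exported from a tracking tool."""
    key: str
    work_type: str              # "feature", "defect", "risk", or "debt"
    committed: date             # date the ART committed to the work
    deployed: Optional[date]    # None while the Feature is still in progress


def flow_velocity(features, start, end):
    """Number of Features deployed within the inclusive window [start, end]."""
    return sum(1 for f in features if f.deployed and start <= f.deployed <= end)


def flow_load(features, on_day):
    """Number of Features committed but not yet deployed on a given day."""
    return sum(
        1 for f in features
        if f.committed <= on_day and (f.deployed is None or f.deployed > on_day)
    )


def flow_distribution(features):
    """Share of each work type among the given Features, as percentages."""
    counts = Counter(f.work_type for f in features)
    total = sum(counts.values())
    return {kind: 100.0 * n / total for kind, n in counts.items()}


# Illustrative data only.
features = [
    Feature("F-1", "feature", date(2024, 1, 8), date(2024, 2, 19)),
    Feature("F-2", "feature", date(2024, 1, 8), date(2024, 3, 4)),
    Feature("F-3", "defect", date(2024, 1, 22), date(2024, 2, 5)),
    Feature("F-4", "debt", date(2024, 2, 5), None),
    Feature("F-5", "feature", date(2024, 2, 12), None),
]

velocity = flow_velocity(features, date(2024, 1, 1), date(2024, 3, 31))
load = flow_load(features, date(2024, 2, 15))
distribution = flow_distribution(features)
```

Computing velocity per PI and load per week, then plotting both over time, gives the trend view the bullets above describe; a single snapshot says much less.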

COLLECTING FLOW METRICS REQUIRES TOOLING INVESTMENT

Most organizations use Jira, Azure DevOps, or Rally/Targetprocess. The data is usually present – it just isn’t being surfaced. SAFe’s own tooling guidance recommends capturing ‘entry date’ and ‘exit date’ for Features at each stage of the Program Kanban. This is the raw data for Flow Time and Flow Efficiency calculations.

Even imperfect data surfaced consistently is more valuable than perfect data surfaced quarterly. Make the metrics visible in I&A from the beginning, even if the methodology isn’t yet precise.
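As a rough sketch of how entry and exit dates turn into Flow Time and Flow Efficiency, the example below walks one Feature's stage history. The `StageVisit` record, the stage names, and the choice of which stages count as active work are assumptions made for illustration. The history is assumed to be sorted by date with no gaps between stages.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class StageVisit:
    """One stage in a hypothetical Kanban history for a single Feature."""
    stage: str
    entered: date
    exited: date
    active: bool   # True if work was being done, False if the Feature sat in a queue


def flow_time_days(history):
    """Wall-clock days from entering the first stage to leaving the last."""
    return (history[-1].exited - history[0].entered).days


def flow_efficiency(history):
    """Days in active stages as a percentage of total flow time."""
    total = flow_time_days(history)
    if total == 0:
        raise ValueError("flow time is zero days")
    active = sum((v.exited - v.entered).days for v in history if v.active)
    return 100.0 * active / total


# Illustrative history for one Feature.
history = [
    StageVisit("Backlog (ready)", date(2024, 1, 8), date(2024, 1, 29), active=False),
    StageVisit("Implementing", date(2024, 1, 29), date(2024, 2, 5), active=True),
    StageVisit("Awaiting review", date(2024, 2, 5), date(2024, 2, 26), active=False),
    StageVisit("Validating", date(2024, 2, 26), date(2024, 2, 29), active=True),
    StageVisit("Awaiting release", date(2024, 2, 29), date(2024, 3, 18), active=False),
]

total_days = flow_time_days(history)    # 70 days from commitment to production
efficiency = flow_efficiency(history)   # 10 active days out of 70
```

In this example the Feature is worked on for 10 of its 70 days, a Flow Efficiency of about 14%. The hard part in practice is agreeing on which stages count as active, not the arithmetic.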

Higher Education’s Specific SAFe Challenges

The videos used a university context throughout, and it’s worth going deeper on that context – because higher education has structural characteristics that create implementation challenges unique to the sector.

Fiscal year versus PI cadence mismatch

Most universities operate on annual or biennial budget cycles tied to the academic calendar. A PI typically runs eight to twelve weeks; at the common ten-week length, that is roughly five PIs per year. Lean Portfolio Management, which connects strategic investment to ART funding, works best when budget decisions can be made at the PI cadence. When the annual budget is locked before PI 1 begins, LPM becomes advisory rather than directive – the Portfolio Kanban tells you what should be prioritized, but the budget tells you what actually gets funded, and they are often different.

Universities that have implemented SAFe most successfully have typically negotiated a form of rolling budget authority for IT and major initiative funding – not full participatory budgeting, but enough flex to redirect investment at PI boundaries when priorities shift.

Shared governance

Initiatives that touch academic functions (curriculum management systems, learning management platforms, student advising tools, degree audit systems) require approval through governance bodies that operate on their own cadence. Academic Senate, Faculty Senate, and curriculum committees meet monthly at best, often less frequently, and their timelines are not negotiable in the way a Product Owner might negotiate a Feature’s scope.

SAFe implementations at universities need explicit protocols for how shared governance requirements feed into PI Planning. The governance review is a dependency – it should appear on the ART’s dependency board with an owner and a date, treated with the same seriousness as a technical dependency. Universities that treat governance as a formality discover late that a governance body has the authority to block a deployment regardless of whether the PI objective was achieved.
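One way to picture "an owner and a date" is as a record that a simple check can flag. The sketch below is illustrative only: the `Dependency` fields, the sample entries, and the `at_risk` rule (no owner, no expected date, or a decision expected after the ART needs it) are assumptions, not a SAFe-defined format.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Dependency:
    """Hypothetical entry on an ART dependency board."""
    description: str
    kind: str                  # "technical" or "governance"
    owner: Optional[str]       # person accountable for resolving it
    needed_by: date            # date the ART needs the decision or deliverable
    expected: Optional[date]   # date the provider expects to deliver or decide


def at_risk(dependencies):
    """Dependencies with no owner, no expected date, or an expected date after the need date."""
    return [
        d for d in dependencies
        if d.owner is None or d.expected is None or d.expected > d.needed_by
    ]


# Illustrative board entries.
board = [
    Dependency("API from identity team", "technical", "J. Rivera",
               date(2024, 10, 14), date(2024, 10, 7)),
    Dependency("Curriculum committee approval of degree audit rules", "governance",
               "Registrar liaison", date(2024, 10, 28), date(2024, 11, 12)),
    Dependency("Faculty Senate review of advising tool", "governance",
               None, date(2024, 11, 4), None),
]

risky = at_risk(board)
```

Here both governance items are flagged: one because the committee meets after the date the ART needs its decision, the other because nobody owns it yet.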

Decentralized IT and shadow IT

Many universities have a central IT function and distributed IT resources within individual schools and colleges. A central IT ART may include the infrastructure and enterprise systems teams but not the departmental IT staff who are critical to local deployment and adoption. If those departmental staff are not part of the ART – and they often cannot be due to HR and budget structures – they need to be treated as external dependencies with explicit interface agreements.

Semester rhythm

The academic calendar creates hard blackout periods at the beginning and end of each semester when change freezes are common, staff capacity drops due to peak operational demands, and stakeholder availability for PI ceremonies is limited. PI boundaries should ideally align to semester transitions. A university running five PIs per year should map them to: pre-fall, mid-fall, winter break, pre-spring, and summer – adjusting sprint lengths slightly to align with the academic rhythm rather than forcing a rigid ten-week cadence that repeatedly runs PI Planning during the most operationally demanding weeks of the year.
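A quick way to test a proposed PI calendar is to check each PI Planning event against the blackout windows. The sketch below assumes PI Planning occupies the first two days of each PI. The blackout dates and the one-week adjustment are hypothetical; a real calendar would come from the registrar.

```python
from datetime import date, timedelta


def overlaps(start_a, end_a, start_b, end_b):
    """True if two inclusive date ranges share at least one day."""
    return start_a <= end_b and start_b <= end_a


def planning_conflicts(pi_starts, blackouts, planning_days=2):
    """Map each PI start date to the blackout windows its planning event would hit.

    Assumes PI Planning occupies the first `planning_days` days of each PI.
    """
    conflicts = {}
    for start in pi_starts:
        end = start + timedelta(days=planning_days - 1)
        hits = [
            name for name, (b_start, b_end) in blackouts.items()
            if overlaps(start, end, b_start, b_end)
        ]
        if hits:
            conflicts[start] = hits
    return conflicts


# Hypothetical blackout windows for one academic year.
blackouts = {
    "fall start": (date(2024, 8, 19), date(2024, 9, 6)),
    "fall finals": (date(2024, 12, 9), date(2024, 12, 20)),
    "spring start": (date(2025, 1, 13), date(2025, 1, 31)),
    "spring finals": (date(2025, 5, 5), date(2025, 5, 16)),
}

# A rigid ten-week cadence starting Monday 1 July 2024.
rigid = [date(2024, 7, 1) + timedelta(weeks=10 * i) for i in range(5)]
rigid_conflicts = planning_conflicts(rigid, blackouts)

# The same calendar with the fourth PI pushed back one week.
adjusted = rigid[:3] + [date(2025, 2, 3)] + rigid[4:]
adjusted_conflicts = planning_conflicts(adjusted, blackouts)
```

With these sample dates, the rigid cadence puts one PI Planning event inside the spring-start blackout, and moving that PI by a week clears it. The same check can be extended to cover the final days of a PI, when release freezes matter most.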

The Anti-Patterns: What Goes Wrong Even When Organizations Try

Experienced SAFe practitioners recognize a consistent set of failure patterns that appear across organizations regardless of industry. Knowing them is the difference between diagnosing problems early and being surprised by them late.

▸  Scrumfall

Team-level ceremonies are Agile. PI-level planning is waterfall. The product roadmap is fixed twelve months out. Features arrive to Product Management fully specified. PI Planning becomes a scheduling exercise in which teams slot pre-determined features into sprints rather than a collaborative commitment process. The form of SAFe is present; the adaptive value is not.

▸  PI Planning theater

The two-day event runs well. Teams present plans, dependencies are mapped, confidence votes are taken. Then everyone returns to their desks and works from a different set of priorities – executive Slack messages, urgent requests that bypass the backlog, leadership decisions made in hallways that supersede PI Objectives. The Program Board is maintained for appearances. The real work is driven by informal power structures that PI Planning was supposed to replace.

▸  ART inflation

The organization launches multiple ARTs simultaneously before any one is running well. Each ART is understaffed in RTEs and Product Management. Ceremonies run but the coaching capacity to make them meaningful is absent. Launching multiple ARTs simultaneously looks like organizational progress; it is usually organizational overconfidence. One ART running well is worth more than five ARTs running badly.

▸  Metric gaming

Teams learn that PI Predictability is measured and begin writing PI Objectives designed to be easily achievable. Stretch goals disappear. Business Owners stop challenging plans because they also want predictability scores to look good to their executives. The metric that was supposed to drive improvement becomes a measure of sandbagging sophistication. The fix requires separating the predictability metric from any form of team performance evaluation.

▸  RTE as scheduler

The Release Train Engineer role is staffed with someone whose primary orientation is coordination and meeting facilitation rather than coaching and impediment removal. The RTE runs PI Planning well, keeps the Program Board updated, and produces clean reports. But teams are not getting better, systemic impediments are not being escalated, and the cultural change SAFe requires is not happening. The RTE is the most important role in the ART; staffing it with a project coordinator instead of a coach is the single most common root cause of stalled transformations.

What the Independent Research Actually Says

The research base on SAFe outcomes is thinner than practitioners often acknowledge, and practitioners should be honest about this. Most published case studies originate from Scaled Agile Inc. itself or from consulting firms with commercial interests in promoting SAFe adoption. Independent academic research is limited, though growing.

What we do have, from the most credible available sources:

◆  Market adoption: The State of Agile reports, published by Digital.ai and now in their seventeenth year, consistently show SAFe as the most widely adopted scaling framework by a significant margin, typically cited by 35–45% of organizations using any scaling approach. The next closest frameworks are typically at 5–10%.

◆  Self-reported outcomes: Organizations that report successful SAFe implementations most commonly cite improved cross-team visibility, faster time-to-market, and better alignment between business and technology teams. They least commonly cite cost reduction — a point worth noting for executives who approach SAFe as a cost-cutting measure.

◆  Self-reported failure factors: The most common factors cited in unsuccessful implementations: insufficient leadership engagement and behavior change, inadequate coaching and training investment, and attempting to scale Agile before establishing Agile fundamentals at the team level. These three factors appear consistently across every serious analysis of SAFe implementation failure.

◆  Duration matters: Organizations that report the strongest outcomes have typically been running SAFe for three or more years. The first year is almost universally described as difficult. The second year is where the patterns stabilize. The third year is where genuine organizational capability develops. Assessments made at the twelve-month mark are almost always premature.

A NOTE ON CONFIRMATION BIAS

Organizations that implement SAFe are invested in its success. Failed implementations are underreported because failure reflects on leadership decisions and is rarely published. The most honest assessments of SAFe outcomes come from practitioners who have implemented it multiple times across multiple organizations – because they’ve seen both the successes and the failures without institutional incentive to hide either.

Before committing to SAFe, find two or three practitioners who have implemented it and ask specifically about what went wrong in implementations they’ve been part of. The quality of their answer tells you more than any case study.

Where to Go From Here

If you’ve watched both videos and read this far, you have a substantive foundation in SAFe – enough to have an informed opinion, evaluate an implementation proposal, or begin building the internal capability to launch an ART. The natural next step depends on where you are.

If you’re evaluating whether to adopt SAFe

Read the official SAFe website (scaledagileframework.com) critically – it is well-documented but optimistic. Balance it with practitioner accounts from the SAFe Community Platform and LinkedIn groups, where honest discussion of implementation difficulties is more common. Ask your network specifically: what did you wish you had known before you started? The answers are more consistent than you might expect.

If you’re in early implementation

Invest in your RTE before anything else. One trained, experienced RTE who has run PI Planning before is worth more than twenty teams going through SAFe training simultaneously. The training builds vocabulary; the RTE builds culture.

Measure from the start. Establish your baseline Flow Time, Flow Efficiency, and PI Predictability before you run your first PI — not because the numbers will be good, but because you need a baseline to measure against. Improvements that can’t be measured don’t become organizational learning.

If you’re looking for deeper reading

Mik Kersten’s “Project to Product” (2018) grounds SAFe in a broader theory of how technology organizations create value and is essential reading for understanding why SAFe’s metrics matter. SAFe 5.0 Distilled, by Richard Knaster and Dean Leffingwell, remains the primary text; SAFe 6.0 updates are available on the framework website. For the organizational change side, Kotter’s Leading Change and Prosci’s ADKAR model both address what SAFe is implicitly asking organizations to become – and why it is hard.

If you’re specifically in higher education

The EDUCAUSE community has growing documentation of Agile and SAFe adoption in higher education. It’s worth reviewing for sector-specific case studies and the particular governance challenges described in this post. Several institutions, including the University of Minnesota and Brigham Young University, have published accounts of their Agile transformations. These are useful not as blueprints but as evidence that the translation from enterprise SAFe to a higher-ed context is possible, and as a record of what it typically requires.

Photo by Daniel Abadia on Unsplash