What Every CEO Should Know About Software Planning

How business leaders can use planning processes and metrics to prevent project failures

Sep 23, 2024

For many CEOs, software engineering is a black box. A roadmap comes out, money goes in, and then software comes out… maybe.

As a result, software project failures are rampant at companies both big and small.

Large companies might be able to absorb such failures, but even one could mean the end for a smaller bootstrapped business.

When I sold Collage.com and first took over as CTO at the parent company, the CEO told me: “Project X was supposed to take three months. We’ve been working on it for a year and I have no idea why.”

I hear different versions of this story every day, and they share one thing in common: bad planning.

It doesn’t have to be this way.

By instituting a few key practices and metrics, business leaders can dramatically decrease the risk of software project failures without micromanaging or getting into the weeds on technical details.

Why do software projects fail?

The simple answer is: poor planning.

There can be other reasons like unforeseeable technical risk or a key person leaving, but nine times out of ten, the project would have succeeded (or not started in the first place) with a better planning process.

Why do software plans fail?

The answer here is also simple: because they’re too big.

The larger a plan is, the more ways there are for it to go wrong, and the greater the impact if it goes off the rails.

Some projects are inherently big, like building a car. However, if you look at failed software projects, most of them are a lot larger than they need to be.

Also, even if the overall plan isn’t too big, large code changes or tasks within that plan can still derail a project.

Myth: Agile/scrum will save you

If you plan work in two-week sprints, then each one will be small and therefore more likely to succeed.

Or so the thinking goes.

The first problem is that some projects take longer than one sprint from business planning to customer value. You can plan sprints all you want, but if the value delivery cycle is longer than a sprint, then sprints won’t prevent project failures.

Going back to my experience as CTO, the team working on the project that was 12 months into a 3-month estimate had been diligently planning sprints the whole time!

The second problem with sprints is that individual code changes should be much smaller than the sprint duration. Two weeks may be short for solving a customer problem, but if you put all the team’s work for two weeks into a single pull request, you’re asking for trouble.

Finally, sprints are blind to bugs. When you create bugs, those bugs are really part of the original feature cost. With sprints, however, they just show up as new tickets in a future sprint.

Sprint metrics on their own don’t drive quality or hold people accountable for it. Because bug fixes show up as additional tickets and points, sprint metrics can in fact do the opposite: it’s easier to meet your sprint commitment if you cut corners on software quality.

Levels of planning

To truly reduce the risk of project failures, it’s important to understand the different levels of planning. A whole project or sprint may be small – let’s say a few weeks – but if the individual tasks are bundled into large chunks, it can still blow up.

Good planning should seek to minimize work batch sizes at each of the following levels.

Project level: Value live

The project itself should represent work that delivers value to the customer by solving a problem.

Typically this is represented by an “epic” parent ticket in Jira, but any equivalent field that groups tickets will suffice.

The important thing is that once the project is complete, you can validate whether it delivered customer value.

Ticket level: Feature live

Below the project level, you have individual features that work together to deliver value.

Each feature is typically represented by a ticket in your project management system. You may have subtasks or a checklist, but the ticket should represent working functionality.

For a ticket to be complete, the customer (or a representative of the customer if it’s in a staging environment) must be able to use the functionality so you can validate that it works as intended.

Pull request level: Code live

Below each feature is an individual code change, typically consisting of a pull request in your version control system.

For a unit of work to be complete at this level, it must be deployed – ideally to production but possibly in a staging environment.

With work units at this level, you want to validate that the code itself doesn’t break once introduced into the full environment.

Planning pitfall #1: Projects are too big

Each project or epic should be the minimal size to solve a problem for the customer.

At the same time, an epic should not be split up just for the sake of making it smaller, because that only hides a larger value-delivery batch. Once an epic is “done,” it should be possible to validate the customer solution.

An epic isn’t truly done until the customer is able to realize value.

People run into trouble when a lot of solutions are bundled together. If there is a larger initiative, you should use a distinct epic for each milestone that creates value. This way, you can incrementally verify that value and still have something to show for your effort if the project stops before completing all the milestones.

Planning pitfall #2: Tickets are too big

It’s not enough for projects to be small. Each ticket should be small as well.

Each ticket should be code-live – that is, you should not bundle multiple tickets into a single pull request. Otherwise, you may encounter integration problems and have to do more work after tickets have been marked done.

Each ticket should also be feature-live, meaning that someone can use the functionality of the ticket before marking it as done.

A ticket/feature is not done until users have had the opportunity to break it.

Large tickets that encompass multiple pieces of functionality are a lot more likely to go over estimate. The reason is that a large ticket indicates the developer hasn’t carefully thought through each step and risk associated with the task.

Planning pitfall #3: Code deployments are too big

Ideally, each code change should be done in a small pull request and deployed to production when that pull request is merged using a continuous integration/continuous deployment (CI/CD) system. This means multiple deployments per day.

Sometimes this is not practical or possible, such as if you have to submit to an app store that only allows weekly updates.

If you can’t deploy to production daily, you should at least deploy each day to a staging environment that’s as realistic as possible.

A code change is not done until it is deployed in a realistic environment.

Until that point, you can never be sure what will happen when it’s integrated with existing code and exposed to production workloads that can trigger subtle performance issues and other problems.

Furthermore, as the size of a deployment grows, the risk of it causing problems and the cost of fixing those problems increase multiplicatively.

Really big deployments can require weeks of post-launch hotfixes, delaying the true project completion time and tarnishing the company’s reputation in the process.

How to avoid failures with technical planning

For the 3-month project I mentioned earlier that I took over 12 months in, I eventually learned that the 3-month estimate came from a product leader who arrived at the number without consulting engineers. He just wanted it to take that long.

The number one reason this project blew up was the lack of a planning step between roadmap commitment and sprint implementation. I call this step technical planning.

When you’re first discussing projects during roadmap planning, you may not have small tasks because the scope may not be defined.

The technical planning stage involves identifying technical risks and defining the individual tickets you will need to complete to finish the epic and deliver value for the customer.

It should happen after roadmap planning but before implementation so that there is an opportunity to change scope or cancel the project. Why? Because until technical planning is complete, you don’t know the true cost of the project.

But planning tickets ahead of time isn’t agile!

Too bad. Make your epic smaller, but if you can’t fit it into one sprint, then you need to plan more than a sprint’s worth of tickets up front.

If you’re not able to deliver and validate a customer solution (the purpose of an epic) in one sprint, then it’s not really “agile” anyway.

Incidentally, when I introduced the concept of technical planning to some people on the team that was nine months behind schedule, I was met with fierce resistance. I was told that it was a waste of time, and while my team was busy with technical planning, their team was shipping software. (While it was tempting to point out that they hadn’t actually done so for a year, it didn’t seem like it would help my argument, so I kept quiet.)

Instead of holistic technical planning, this team was effectively doing technical planning for the next component of the project each sprint. This meant that the true project cost was being discovered in two-week increments. It was always “almost done,” but nobody could say how much longer it would take.

If there’s one thing you should take away from this article, it is to never start on an open-ended project. This is like agreeing to buy something without knowing the price.

A project can be open-ended if the epic does not have all of its tickets, or if those tickets are big and vaguely defined, which we’ll discuss next.

What makes a good technical plan

A good technical plan does two things. First, it mitigates technical risks. Second, it provides a precise estimate of the overall cost by breaking down work into small tasks.

A technical plan should be completed by the team who will do implementation and reviewed by leadership before implementation begins.

One pitfall people encounter is thinking that all code is implementation. As a result, they fail to account for technical risks like system interoperability or performance during technical planning and end up having to redo large amounts of work.

During technical planning, engineers should be encouraged to build small prototypes to validate the architectural design and better anticipate the scope of implementation work.

Conversely, not all specification of functionality constitutes planning. Figuring out the exact button color is probably an implementation activity because it is not likely to have dependencies or impact the overall project estimate.

Doing functional specification in too much detail during the technical planning phase can bloat the process and waste time if the project does not move forward.

Identifying risks in a technical plan

My favorite way to explain risk mitigation during planning is with a peanut butter and jelly sandwich analogy.

A bad technical plan will have one task that says “Make PB&J sandwich by putting peanut butter and jelly between two pieces of bread.”

This task could easily blow up for many reasons. A good technical plan will explore all of the risks and details, such as:

What if you are out of an ingredient, or don’t have a knife or plate?
Who’s going to be eating the sandwich, and do they have any allergies or gluten intolerance?
Does the consumer prefer more or less of any ingredients, or have any quality preferences? When and how will they communicate those preferences?
When will you have to make the sandwich? How long does it take to get to the store that time of day and replace an ingredient? Will you have transportation available?
How soon is the sandwich expected to be ready? If you can’t produce the full sandwich, is an on-time peanut-butter-only sandwich better than a late PB&J sandwich?

As you can see, even a simple task becomes complicated if you want it to succeed predictably under a variety of circumstances.

When it comes to software, a good technical plan should contain a checklist of risks. Many of these may be business-specific, but you should consider things like performance, security, backward compatibility, localization, accessibility, legal issues, etc.

At my former company, Collage.com, we had a list of about ten items like this and it saved us on many occasions.

Breaking down work into small tasks

A good technical plan should also break down work into small tasks. The main benefit of this is that it forces you to think through each step. If you don’t do it, then you’re liable to overlook things and underestimate the tickets.

If you’re using story points and one point is approximately a day, then small means 1-2 points, medium is 3, and large is 5 or more.

Any tasks estimated at more than a few days are a red flag.

I can’t tell you how many times I’ve seen a 5-day estimate (e.g., implement user profile editing) turn into four weeks after I asked a developer to list each specific one-day task.
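As a rough illustration, the size labels above can be turned into a quick check over a list of ticket estimates. This is a minimal sketch, not a prescribed tool: the ticket names are made up, the 3-point cutoff for “more than a few days” is an assumption, and a 4-point estimate (which the scale above does not cover) is treated as large to stay conservative.

```python
def size_bucket(points: int) -> str:
    """Map a story point estimate (1 point is roughly 1 day) to a size label."""
    if points <= 2:
        return "small"
    if points == 3:
        return "medium"
    return "large"  # assumption: 4 is grouped with "5 or more"


def red_flags(estimates: dict[str, int], max_points: int = 3) -> list[str]:
    """Return the tickets whose estimate exceeds max_points (assumed cutoff)."""
    return [name for name, points in estimates.items() if points > max_points]


# Hypothetical tickets, for illustration only.
estimates = {
    "Add profile form": 2,
    "Validate email change": 1,
    "Implement user profile editing": 5,
}

flagged = red_flags(estimates)
for name in flagged:
    print(f"{name}: {estimates[name]} points ({size_bucket(estimates[name])})")
```

Any ticket this prints is a candidate for being broken down into one-day tasks before the estimate is trusted.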

How CEOs should review plans to prevent project failures

With a solid technical planning process in place, CEOs have a good way of reviewing plans to prevent failures without micromanaging.

Note that this applies to CEOs of small companies. At larger companies, a lower-level leader might fill this role, but whoever is in charge should do the following:

Roadmap Planning - CEOs should review and approve the roadmap plan including rough initial estimates.
Technical Planning - CEOs should review and approve the technical plan prior to implementation. The executive’s role here is to verify that the project still makes sense given the more precise cost estimate, and that the plan is not open-ended by failing to address risks or break down work into small enough tasks.
Sprint Planning - At the end of each sprint, the CEO should review progress on the project to decide if it should continue by looking at two things: (1) how much work has been completed, and (2) how much new work has been added to the epic (with an understanding that small amounts of discovered work during implementation are normal). This helps identify external impediments to velocity or problems with the technical plan before they derail the project timeline.

The nice thing about this structure is that it gives the CEO necessary visibility to prevent project failures without making the team feel micromanaged or not trusted. With this, you can avoid randomly asking the team “Why isn’t the project done yet?” which can be frustrating and disruptive.
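To make the end-of-sprint review concrete, here is a minimal sketch of the two numbers involved: work completed and work added to the epic. The field names and the 10% growth tolerance are assumptions for illustration, and the right tolerance will depend on your team.

```python
from dataclasses import dataclass


@dataclass
class SprintSnapshot:
    planned_points: float    # epic total from the approved technical plan
    completed_points: float  # points finished so far
    added_points: float      # points of work discovered after approval


def review(snapshot: SprintSnapshot, max_growth: float = 0.10) -> dict:
    """Summarize progress and scope growth for one epic.

    max_growth is an assumed tolerance for discovered work.
    """
    total = snapshot.planned_points + snapshot.added_points
    growth = snapshot.added_points / snapshot.planned_points
    return {
        "percent_complete": round(100 * snapshot.completed_points / total, 1),
        "scope_growth": round(growth, 2),
        "needs_discussion": growth > max_growth,
    }


# Hypothetical numbers: a 40-point plan, 18 points done, 8 points discovered.
result = review(SprintSnapshot(planned_points=40, completed_points=18, added_points=8))
print(result)
```

In this made-up case the epic has grown by 20%, which is well past “small amounts of discovered work” and worth a conversation about the technical plan.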

Core project planning metrics

In addition to reviewing individual project plans, CEOs should look at core project planning metrics to make sure the plans are accurate and reliable.

Non-negotiable bookkeeping for traceability

For leaders to have any visibility, individuals need to record their work in a way that keeps it from being hidden.

This generally means the following:

Nearly all coding work should use version control and pull requests
Nearly all work done by developers should be documented in a ticket
Nearly all pull requests should be linked to tickets
All project tickets should be in an epic (or have an equivalent project field)
All tickets should have estimates

The overhead of these things is negligible and they are essential for visibility. You may have the occasional slip-up, but there’s no excuse for not doing them more than 95% of the time.

To track these things, you can manually export lists of main branch commits, pull requests, tickets, and epics to verify that they have the appropriate fields set. When looking at tickets, you can filter by those that have started work.
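If you go the manual export route, a short script can do the checking. The sketch below assumes the exports have already been loaded as lists of records. The field names (`ticket`, `epic`, `estimate`) and the sample data are hypothetical and will differ depending on your tools.

```python
TARGET = 0.95  # the ">95% of the time" bar from the text above


def share(items: list[dict], field: str) -> float:
    """Fraction of items where the given field is filled in."""
    if not items:
        return 1.0
    filled = sum(1 for item in items if item.get(field) not in (None, ""))
    return filled / len(items)


# Hypothetical exports, for illustration only.
pull_requests = [
    {"id": 101, "ticket": "SHOP-1"},
    {"id": 102, "ticket": "SHOP-2"},
    {"id": 103, "ticket": None},  # a pull request with no linked ticket
    {"id": 104, "ticket": "SHOP-2"},
]
tickets = [
    {"key": "SHOP-1", "epic": "SHOP-100", "estimate": 2},
    {"key": "SHOP-2", "epic": "SHOP-100", "estimate": 1},
]

report = {
    "prs_linked_to_tickets": share(pull_requests, "ticket"),
    "tickets_in_epic": share(tickets, "epic"),
    "tickets_with_estimate": share(tickets, "estimate"),
}
below_target = [name for name, value in report.items() if value <= TARGET]
print(report)
print("Below target:", below_target)
```

This counts items rather than work time, so a single large untracked change weighs the same as a trivial one. Treat it as a first pass.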

minware provides a Code/Ticket Traceability report that automatically tracks all of these things and rolls them up into a target percentage based on work time so you can spot large amounts of untraceable work.

Work batch sizes

In addition to reviewing individual project plans, it is also helpful to review work batch sizes in aggregate at each level – pull request, ticket, and epic.

When looking at these metrics, you should assess the number of days spent on each pull request, ticket, and epic. Pull requests and tickets with more than five days of work are a red flag.

With these metrics, it’s important to not just look at the average, but also dig into the largest outliers, because those will have the biggest impact on productivity.
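A small sketch shows why the average alone can mislead. The pull request names and day counts below are invented, and the five-day threshold is the red-flag level mentioned above.

```python
from statistics import mean

RED_FLAG_DAYS = 5  # red-flag threshold for pull requests and tickets


def summarize(days_by_item: dict[str, float]) -> tuple[float, list[tuple[str, float]]]:
    """Return the average days per item and the items over the threshold, largest first."""
    average = mean(list(days_by_item.values()))
    outliers = sorted(
        ((name, days) for name, days in days_by_item.items() if days > RED_FLAG_DAYS),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return average, outliers


# Hypothetical days of work per pull request.
pr_days = {"PR-1": 1, "PR-2": 2, "PR-3": 1, "PR-4": 12, "PR-5": 0.5, "PR-6": 7}

average, outliers = summarize(pr_days)
print(f"Average: {average:.1f} days")
for name, days in outliers:
    print(f"{name}: {days} days")
```

Here the average comes out under four days, which looks healthy, while two of the six pull requests account for most of the total time.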

There are different ways to gather these numbers yourself. Some companies use time logs, which are precise but impose overhead on each developer.

You can also look at story point estimates, which might suffice if ticket estimates are reliable, or total duration that pull requests are open.

minware has a Work Batch Sizes report that shows you all this information in a single dashboard so that you can spot areas where the work batch sizes are larger than you desire.

Rate of new and resolved bugs

Because one way to make work batches smaller is to cut corners on testing, you should also look at the number of newly created bugs. This gives you an indication of whether project work has issues with quality and whether underlying technical debt may be slowing teams down.

The rate of completed bugs is also important to make sure that teams are fixing the bugs they create. I’m a strong advocate of fixing all your bugs as a way to improve your development velocity, which you can read about in this article.

You can easily export a list of bugs from your project management system. minware also offers an automated Bug Management report that tracks fix vs. find rate and bug load by team.
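For the manual export, one simple way to compare bugs found with bugs fixed is to count each by week. This is only a sketch: the dates are invented, and grouping by ISO week is an assumption rather than anything the reports above prescribe.

```python
from collections import Counter
from datetime import date


def weekly_counts(days: list[date]) -> Counter:
    """Count dates per ISO (year, week) pair."""
    return Counter(d.isocalendar()[:2] for d in days)


# Hypothetical bug creation and resolution dates.
created = [date(2024, 9, 2), date(2024, 9, 3), date(2024, 9, 5), date(2024, 9, 10)]
resolved = [date(2024, 9, 4), date(2024, 9, 11), date(2024, 9, 12)]

found = weekly_counts(created)
fixed = weekly_counts(resolved)

# Positive numbers mean the bug backlog grew that week.
net = {week: found[week] - fixed[week] for week in sorted(set(found) | set(fixed))}
for (year, week), change in net.items():
    print(f"{year}-W{week}: found {found[(year, week)]}, "
          f"fixed {fixed[(year, week)]}, net {change:+d}")
```

A net number that stays positive week after week suggests teams are creating bugs faster than they fix them.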

Pull request, ticket, and epic scope

Earlier, we talked about how it was important that each pull request, ticket, and epic correspond to live code, live functionality, and live value.

To ensure the trustworthiness of your other metrics, you should audit items at each level to verify they actually correspond to work batches at the right level.

Because the meaning of live code, functionality, and value will depend on your environment and business, it’s hard to automate these metrics.

So, I recommend randomly auditing a sample of items at each level and checking whether each one represents more or less than the appropriate unit of work. One way to do this with tickets is to look at their acceptance criteria and see if they involve using the functionality. Similarly, with epics, you can look for validation steps that check whether the epic provides value for the customer.
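Picking the sample is the one part of this audit that is easy to script. Here is a minimal sketch using Python’s standard `random` module. The ticket keys and the sample size of five are assumptions for illustration.

```python
import random
from typing import Optional


def audit_sample(items: list[str], size: int, seed: Optional[int] = None) -> list[str]:
    """Pick up to `size` distinct items at random.

    Passing a seed makes the selection repeatable, so someone else
    can later confirm which items were chosen for the audit.
    """
    rng = random.Random(seed)
    return rng.sample(items, min(size, len(items)))


# Hypothetical ticket keys exported from a project management system.
ticket_keys = [f"SHOP-{n}" for n in range(1, 41)]

for key in audit_sample(ticket_keys, size=5, seed=2024):
    print("Review acceptance criteria for", key)
```

The same function works for pull requests and epics. The judgment about whether each item is the right unit of work still has to be made by a person.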

Unlaunched code size

Ideally, you want to know how much undeployed code you’re sitting on for “done” tickets. This gives you a pulse on the risk of unexpected failures and rework later in a project.
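If your pull requests are linked to tickets and you can tell which ones have been deployed, a rough version of this number is straightforward to compute. In the sketch below, the `deployed` flag, the `lines_changed` field, and the sample data are all hypothetical. Lines changed is only one possible measure of code size.

```python
def unlaunched_lines(tickets: dict[str, str], pull_requests: list[dict]) -> dict[str, int]:
    """Total lines changed in undeployed pull requests, per ticket marked done."""
    totals: dict[str, int] = {}
    for pr in pull_requests:
        if tickets.get(pr["ticket"]) == "done" and not pr["deployed"]:
            totals[pr["ticket"]] = totals.get(pr["ticket"], 0) + pr["lines_changed"]
    return totals


# Hypothetical ticket statuses and pull requests.
tickets = {"SHOP-1": "done", "SHOP-2": "done", "SHOP-3": "in progress"}
pull_requests = [
    {"ticket": "SHOP-1", "lines_changed": 120, "deployed": True},
    {"ticket": "SHOP-2", "lines_changed": 450, "deployed": False},
    {"ticket": "SHOP-2", "lines_changed": 80, "deployed": False},
    {"ticket": "SHOP-3", "lines_changed": 300, "deployed": False},
]

backlog = unlaunched_lines(tickets, pull_requests)
print(backlog)
print("Total undeployed lines on done tickets:", sum(backlog.values()))
```

Work on tickets that are still in progress is left out on purpose, since the concern here is code that has been called done but has never run in a realistic environment.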

Deployment frequency may be an okay proxy for this. It is one of the DORA metrics, which you can find in minware’s Dora Metrics report.

The problem with deployment frequency is it can mask big change sets that are sitting around for a long time while many smaller change sets are going out the door, which poses a big risk of merge conflicts.

If you don’t track deployment frequency, it may suffice to simply review your CI/CD practices and pull request sizes, especially if you merge every pull request to the main branch and launch it automatically (a.k.a. trunk-based development).

Another approach is to put tickets in an “awaiting deployment” status in your project management system. This takes extra work to track, but may be necessary if you use a feature flagging system where code can be technically deployed but turned off until you enable it with a feature flag.

Conclusion

Many CEOs don’t get involved in software development planning. This can work out fine if you have a strong engineering leader, but this often isn’t the case in small companies, which leads to major cost overruns.

The processes and metrics outlined here provide a simple way to gain visibility into project planning and greatly reduce the risk of project failures.

As CEO, you’re ultimately responsible for software development. It’s important to delegate work and not micromanage, but accountability is essential.

As a CEO myself, I follow all of these practices, and they have helped me avoid a lot of mistakes. Before I learned these things, I wasted orders of magnitude more time than the overhead of planning and collecting metrics, even with a small team.

My hope is that by sharing these previous failures, I can help others succeed and do more with less.

minimal engineering
