When things go wrong, it’s human instinct to jump to the nearest possible solution and execute. The goal is to rectify the problem as soon as possible, to ensure that it either never happens again, or the chances of it happening again are slim. This hasty response, however, more often than not leads to suboptimal solutions that merely put a band-aid on the problem and do not address the underlying root causes.
Take, for example, a scenario that I once encountered myself - resolving content issues in what we publish on docs.microsoft.com. I noticed that one particular repository had a lot of issues that kept being merged into the live branch, making broken content available to external customers. Nobody wants to deal with issues in production, especially issues that can be prevented, so my first hunch was to put the two-key rule in place: if content needs to go live, two people need to sign off on its quality before the pull request (PR) can be merged. In theory, this should have guaranteed that the people signing off on the PR would also check the content and ensure that nothing is wrong. That did not work. All it did was create overhead, with people now emailing other people daily to get PRs approved. Instead of solving one problem, I created another one that someone else now needed to deal with.
So, what could I have done differently? For starters, the problem here was the lack of tools that could help prevent an error. It’s not the human who’s at fault - nobody maliciously tried to merge a bad PR, there just wasn’t enough testing in place to ensure that issues were caught early. Besides, a lot of the PRs in question had thousands of lines of content changed - I am not aware of a single human being who can validate that in a reasonable time and with good outcomes.
Impulse decisions on problem mitigation might help you in the short term but will not scale. When it comes to decisions around a product and its success or velocity, hard controls cannot replace training and tooling. The goal is to empower people to focus on their main jobs, with minimal risk of the kind of failure that can be reliably prevented automatically. What this means is that to properly assess a problem, you need to understand at what stage the failure occurred, and how you can help prevent it in a way that does not cause more hassle than the problem itself.
That can be done in two ways - training and tooling.
By far, the most important part of making failures occur less often is training your staff. Help them understand how to spot when things are off and need review. Yes, the learning oftentimes happens after the fact, but it’s nonetheless very effective. When you train your staff, you have the opportunity to showcase both best practices and potential pitfalls that are sometimes hard to catch for someone who does not yet have the experience. If someone on your team did not sync with the design crew before throwing their feature into the planning pool, it’s likely much better to simply hold a training session for your PMs that walks through the process, rather than instituting a mandatory formal pre-review with the design team.
Training is also relatively inexpensive to do - your investment is time. It’s an investment with a high return.
On the tooling side, the question you can ask yourself is “How can I leverage various tools to enable others to do their jobs in the best way possible?” Notice that I am not putting an emphasis on any particular part of the process - the tools can span areas and responsibilities. This can be anything around automation, validation, authoring and more. Nearly anything that the product team is involved in can be formalized in some shape, so that the process does not slow the team down but rather increases velocity, because you now have a much more robust pipeline for getting things done. This goes hand-in-hand with good training - you still need to explain how the different pieces of the puzzle work together, and why they are in place, but once again - it’s relatively cheap compared to dealing with constant failures that are the result of preventable causes.
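To make the validation idea a bit more concrete, here is a minimal sketch of the kind of automated check that could run against a content PR before it is merged. It is not the tooling we actually used; it assumes the content is Markdown, that links between pages are relative file paths, and the names `find_broken_links` and `main` are made up for illustration.

```python
import re
from pathlib import Path

# Matches inline Markdown links and images: [text](target) or [text](target "title")
LINK_PATTERN = re.compile(r"\[[^\]]*\]\(([^)\s]+)[^)]*\)")

# Targets this sketch does not try to resolve: external URLs, mail links,
# in-page anchors, and site-absolute paths (assumption: only relative
# file links are checked).
SKIPPED_PREFIXES = ("http://", "https://", "mailto:", "#", "/")


def find_broken_links(root):
    """Return (file, line number, target) for every relative link
    whose target file does not exist under the content folder."""
    root = Path(root)
    problems = []
    for md_file in sorted(root.rglob("*.md")):
        text = md_file.read_text(encoding="utf-8")
        for line_no, line in enumerate(text.splitlines(), start=1):
            for target in LINK_PATTERN.findall(line):
                if target.startswith(SKIPPED_PREFIXES):
                    continue
                # Drop any "#section" suffix before checking the file itself
                path_part = target.split("#", 1)[0]
                if not (md_file.parent / path_part).exists():
                    problems.append(
                        (str(md_file.relative_to(root)), line_no, target)
                    )
    return problems


def main(argv):
    """Print each broken link and return a non-zero code if any were found."""
    issues = find_broken_links(argv[1] if len(argv) > 1 else ".")
    for file_name, line_no, target in issues:
        print(f"{file_name}:{line_no}: broken link -> {target}")
    return 1 if issues else 0
```

A build step could call `sys.exit(main(sys.argv))` and block the merge on a non-zero exit code. The point is not this particular check, but that a machine reads thousands of changed lines without getting tired, which is exactly what a second human approver could not do.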
Rather than creating artificial barriers to solve a problem, reverse the frame of thinking - what barriers standing in the way of shipping a quality product can you remove, while simultaneously empowering your team to make decisions in a way that reduces the probability of error? That is an easy fix compared to fixing a broken product.