Many “emergency” situations in our modern world would’ve been easy to fix had they been foreseen in advance. If only we’d known what was going to happen, the badness could’ve easily been prevented.
Unfortunately, when problems are discovered only “as they happen” in production, everyone must race to minimize the damage and put out the fires, dramatically increasing the cost and fallout of otherwise-trivial problems.
Yes, you, dear reader. You travel through time. A second has passed since you started reading this paragraph. You are never getting that second back (sorry!).
For all of its complexities, time (or at least our perception of it) is the most reliable thing in the universe. With every passing second, the universe ticks forward in time.
So, now we’ve established that time travel is not only possible, but guaranteed. Unfortunately, this sort of time travel isn’t very useful for our purposes. It’s not useful because everyone around us is also traveling through time at the same rate. If we’re doing a great job of paying attention, perhaps we’ll notice a problem before everyone else and maybe we’ll be able to fix it before anyone else notices, but this is still the definition of an emergency– we’re in a race against the clock. It’s exhausting, and we’ll usually lose.
If we had a way to send messages back to our past, perhaps disasters could be averted or great fortunes could be won. Alas, time is an arrow, and that arrow only points to the future. Unfortunately, changing the past is out.
So, we’ll need to change the present based on what happens in the future.
If only we had some sort of magical time machine to let us see what was going to go wrong before it actually does. Seems like an overused trope for TV shows and so many of my favorite movies, right?
What would you say if I told you that you’re surrounded by working time machines, you just have to use them?
Before I tell you how to use or build time machines that let you glimpse the future, let’s restate our goal– we want to prevent future emergencies by discovering and fixing them in advance.
A key reason we’re so often surprised by emergency problems is that we expect the tools we use successfully today to continue to work tomorrow. This is a fundamental mistake.
It’s probably not true that Einstein said “The definition of insanity is doing the same thing over and over again, but expecting different results,” but it’s absolutely true that I have said “The definition of insanity is doing the same thing over and over again, and expecting the same results.“
That’s because we can’t ever truly “do the same thing” over and over again: before we were doing it then, but when we do it again, we’re doing it now. We’re in a changed universe.
The only constant is change, and the present is simply different than than the past; similarly, the future will be different than the present in innumerable ways. Many of those differences will be unpredictable, but fortunately some are predictable at high confidence.
For example, our software and services in the future will:
Given this knowledge, many of our software emergencies are entirely predictable:
Computers have a rather primitive understanding of the real Universe; they largely believe whatever inputs we give them. By controlling the inputs, we control their Universe.
Your computer can travel to the future even more easily than a DeLorean. The UI isn’t quite as cool, but it doesn’t require plutonium. Simply open your system control panel, turn off the “Set time automatically” option, and click the button to adjust the system date/time to your target.
Almost immediately, all hell will break loose. Microsoft Teams and most other applications will go “offline.”
Webpages will stop loading:
Signed programs might start getting reported as “Unknown publisher”:
…and on and on.
Now, as a time machine, this one’s a bit buggy. It shows what is likely to happen in the future if nothing else changes. But some things will change. Most web servers are now using the ACME protocol to automatically renew certificates shortly in advance of their expiration, so setting your clock five years into the future is likely to turn up a lot of false positives. However, setting your clock five days into the future is probably reasonable – you’ll get enough advance warning to figure out why a certificate is expiring, what the implications are, and deploy a new certificate.
Automated renewal doesn’t solve all problems for certificates though– for instance, it doesn’t fix up certificates embedded in client software, and it typically only applies to the end-entity certificate, not intermediates. Back in 2019, Mozilla accidentally let a certificate expire, breaking all extensions. This week, an intermediate certificate used by LetsEncrypt expired, leading to a trail of broken apps, sites, and devices.
In the final screenshot above, Windows shows the “Unknown publisher” warning because the software’s signature lacks a timestamp.
Without an Authenticode timestamp, when the signer’s certificate expires, all of the signatures from that certificate are deemed invalid. Good luck replacing every binary on every user’s system.
Beyond certificates expiring, you’re likely to notice other issues as well– trial software will expire, Strict-Transport-Security preloads will stop working, etc. For each of these issues, you will need to investigate and decide to take action.
Note: Depending on what you’re trying to do, you might not need to change the entire system clock; you might be able to change the clock for just one app.
One of the most common predictable emergencies is when a scenario begins to fail because a software’s version number gets “too high.”
This happened so commonly in Windows client software that Microsoft changed the GetVersionEx function to start lying to applications unless they declare (via manifest) that they can handle the truth. Similarly, Windows 11 will advertise itself as version 10.0 to websites.
The Chromium team, anticipating that Chrome’s upcoming release of 100
is likely to break websites, built-in a simple flag to allow folks to test with the new 100
version number in the browser’s User-Agent string. Here’s an example of a problem (actually two) found when using the new “Force Version 100” flag:
As you can see, the website incorrectly parses the User-Agent string and concludes that Chrome is outdated, pushing the user to install another browser using icons circa 2009. And closer to home, my Show Chrome Version extension truncates the string 100
to 10
when rendering the information in the toolbar button. Oops.
Even if your browser vendor doesn’t offer a simple “Fake the future” flag, you can typically override the User-Agent string via a command-line argument (--user-agent
), DevTools Command (Network Conditions
), extension, or simple FiddlerScript rule.
Most state-of-the-art software these days is available in multiple channels— a current version (commonly called “Release” or “Stable”) and future versions (often called “Pre-Release”, “Beta”, “Alpha”, “Dev”, “Canary”, “Preview”, or “Nightly”).
All major web browsers, in particular, follow this paradigm.
Using Pre-Release versions of browsers is, by far, the best way to learn what changes are coming in the future that might break your sites and services. There’s no higher-fidelity way (short of that plutonium-powered DeLorean) to find out what bugfixes, new features, and regressions (gulp) are headed for your user-base than to simply try the code that everyone will be running in a few weeks or months.
This option is convenient, low-friction, and is highly recommended for developers, IT, and a small “ring” of every Enterprise’s everyday users.
I cannot count how many times we (Chromium or Microsoft) got feedback from a Beta/Dev/Canary user of the browser that we used to fix a problem before it broke anyone in the Stable channel. By way of example– two days before Chrome 88 branched for Stable release, an Enterprise reported that their app was broken in Beta. We quickly investigated and isolated the regression in handling of XSLTs. We fixed it within a day. Unfortunately, a different department in the same Enterprise did not deploy a Beta ring, and a regression in their workflow wasn’t discovered until 88 Stable released, impacting everyone in the department.
Google Chrome and Microsoft Edge recently introduced support for an Extended Stable version, which is a bit like a time-machine that keeps your users “stuck in the past”, allowing them to run an (even numbered) “Extended Stable” release for up to four weeks beyond the release date of the subsequent (odd numbered) “Stable” channel. This is an interesting variant of the time machine approach– you’re hoping that the extra four weeks will allow any problems in the “Stable” release to be found and fixed before upgrading to the next release. But it’s not a panacea. Say you’ve got your users on Edge 94 Extended Stable. They skip Edge 95 entirely, and four weeks later are upgraded to Edge 96 Extended Stable. While any problems in Edge 95 may have been mitigated before your users update to Edge 96, only problems found in the pre-release channels of Edge 96 will get fixed before everyone moves up to 96.
Beyond running pre-release channels yourself, you can also take advantage of published documentation about what changes are coming in the future. Ranging from Edge’s Site Impacting Changes list to our Roadmap to the Chrome Platform Status page, you can see lists of top changes that are heading to Stable in the future. While documentation is useful, I still strongly recommend running Beta/Dev channels in your environment– our documentation covers the top changes we made on purpose— it doesn’t cover accidental bugs or the thousands of other changes made to each release that we didn’t expect to break anyone.
Some Enterprises have asked “Can’t you just give us the full changelist for each release?!?” and I’ve been put in the unfortunate and dangerous position of saying “I know enough about you and your business to know what you’re asking for isn’t what you really want.”
The list changes in each new Stable release would run to hundreds of pages. A change like “Enable TableNG“, which completely replaced the table-layout engine, would be both extremely difficult to describe, and you as an IT Manager would be poorly equipped to understand whether it would impact your sites — the only practical approach is to test your sites and apps in a pre-release change.
I promise.
There are other ways to discover what will happen in the future. Run your browser with the F12 Developer Tools open and watch the Console and Issues tabs for upcoming changes and deprecation notices:
Before deploying new features like Content-Security-Policy on your website, use the report_only mode to collect issue reports from real-world users, before enabling enforcement of those policies.
In many cases, sites quietly broadcast when something will break– for instance, a server’s certificate contains its expiration date. You can write rules in FiddlerScript to warn you on every HTTPS response whose certificate is soon to expire.
You’re a time traveler. Act like one!
-Eric