I woke up in the morning, got to the desk in my home office, checked my email, Discord, and the news. Then I switched from my desktop to my laptop and... there's no internet.
That's weird. I had just browsed the net on my PC, so what's up with the laptop? Both are connected to the same network, so the network as a whole hadn't lost connectivity. As such, the problem had to lie somewhere between my ISP's modem and the laptop (inclusive).
I started by disconnecting and reconnecting the Ethernet cable (it's a pretty stationary laptop, so I keep it wired). That didn't fix anything; all it did was show a short spinning animation indicating the laptop was trying to get an IP address assigned (a DHCP issue then?). Just to be sure it was nothing on the laptop side I did a reboot, and then power-cycled the nearest network switch for good measure as well. No luck.
Following up on the DHCP lead I logged into my home server, which runs the DHCP daemon... wait... what is this?
ssh: connect to host home server port 22: No route to host
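The error above already hints that the machine itself is unreachable, rather than just the SSH daemon being down. A minimal sketch of how that distinction could be checked from another machine, assuming plain TCP and Python's standard `socket` module (the function name `port_status` and the host name in the usage note are made up for illustration):

```python
import socket


def port_status(host: str, port: int = 22, timeout: float = 3.0) -> str:
    """Classify a TCP connection attempt as 'open', 'refused' or 'unreachable'."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except OSError:
        # Timeout, "No route to host", failed name resolution, and so on.
        return "unreachable"
```

Calling `port_status("homeserver.example")` (a placeholder name) against a machine that is powered off would be expected to land in the last branch, which matches the "No route to host" case: the problem is the box, not the service.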
So I moved the chair a bit to check my server rack, and found the home server dark. That's unusual. On closer inspection, though, the LEDs on the motherboard next to the power/reboot buttons were actually lit. A minor explanation here: I use customized Open Benchtable mounts, so the mobo is easily accessible; at the same time it means there are no power/reboot buttons on the case – as there is no case – so I rely on mobos having power/reboot buttons directly on them (or, failing that, small buttons-on-PCBs that you hook into the normal case button connector on the mobo).
I pressed the power button, and... even those last two LEDs went dark. Not great. They did light back up a few seconds later though, so I retried a couple of times, with the same result. The closest I got to a "fully functional and running server" was the CPU fan spinning up for 0.5 seconds.
At this point I had good news and bad news:
The next step was to turn on some DHCP server in the network so that the Internet actually works in the household, and to let everyone using the server know that there are problems.
Of course it's rare for the whole computer to die – usually it's just one component. As such, the next step was to figure out which component(s) were defective.
The usual algorithm for this is:
In my case I basically ran out of options at point 5, which translates to: it's what's left, i.e. the problem lies either in the CPU, the motherboard, the PSU, or ?all the RAM dice? (unlikely, but at the end of the day anything can break). And to figure out which one it is, you have to start taking each of these components and testing them on a different setup AND/OR replacing the component in the debugged setup with a working one (the worst case scenario is that doing this causes the good component(s) to break as well). This requires one to have at least one other (ideally similar) computer – thankfully I have some old hardware lying around.
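The "replace one component with a working one and see if it boots" part can be written down as a tiny loop. This is only a sketch of that swap-testing idea, not the full algorithm; `find_faulty` and the `boots_with_replaced` callback are hypothetical names, and the callback stands in for the manual work of swapping in a known-good part and pressing the power button:

```python
from typing import Callable, Iterable, Optional


def find_faulty(components: Iterable[str],
                boots_with_replaced: Callable[[str], bool]) -> Optional[str]:
    """Return the first component whose replacement makes the machine boot.

    components: the suspects, ideally ordered so that parts which are easy
        to swap and likely to fail come first.
    boots_with_replaced: hypothetical callback; given a component name it
        reports whether the machine boots with only that part replaced.
    """
    for part in components:
        if boots_with_replaced(part):
            return part
    # No single swap helped: several parts are broken, or the fault is elsewhere.
    return None


# Simulated run in which the PSU is the broken part.
suspects = ["PSU", "RAM", "CPU", "motherboard"]
print(find_faulty(suspects, lambda part: part == "PSU"))  # prints "PSU"
```

Ordering the suspects by ease of swapping keeps the number of hardware changes low, and fewer swaps also means fewer chances for a bad part to damage a good one.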
I started by hooking up a different PSU, since that's obviously the easiest part to swap out, and also the most probable culprit. And the "minimized" server actually started normally, with no issues whatsoever! So at that point I was pretty sure it was the PSU, but to double-check I added all the PCIe peripherals back, and... it booted again with no issue. Cool.
Unfortunately it turned out I didn't have a PSU I could use as a permanent replacement. While I have some modular PSUs lying around, they either were from a different manufacturer (which would require me to order new modular power cables to hook up all the HDDs), or were from the same company but didn't have all the connectors I needed (to be more exact: the PSU I had was missing one custom "SATA/Molex" PSU connector). So I had to order a new PSU from the same company.
Thankfully I did this debugging in the early morning, so the replacement PSU arrived by post by early evening. After connecting it all back together the home server booted without an issue. So, problem solved. All that was left was to disable the temporary DHCP server and... write a blog post about it I guess?
While things breaking can be frustrating at times, I do have to say I did enjoy this bit of relatively simple technical work – it was a nice distraction from the paperwork that awaited me for the rest of that day ;)