There is no perfect IT infrastructure in the real world. Downtime can happen to anyone at any time, but there are ways to minimise its impact or avoid the nightmare altogether. The best way to avoid such catastrophes is to adopt a proactive rather than reactive approach when designing, deploying and maintaining your systems and network. Being prepared for the worst can save you when the worst eventually happens.
Step 1: Proper documentation
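Document your network layout, subnets and IP addresses (with Visio diagrams), passwords (kept in a secure store), configuration files, operations manuals, software licences, service tags, and vendor and supplier contacts. File this documentation in a location you can reach quickly, including when the systems it describes are down.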
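Documentation is only useful if it is complete and correct. As a minimal sketch, assuming you keep one record per asset, the Python below flags empty fields and checks that each documented IP address actually falls inside its documented subnet. The `AssetRecord` class and its field names are hypothetical; adapt them to whatever your inventory actually tracks.

```python
import ipaddress
from dataclasses import dataclass, fields


@dataclass
class AssetRecord:
    """One documented asset. Field names are illustrative, not a standard."""
    hostname: str
    ip_address: str = ""
    subnet: str = ""           # CIDR notation, e.g. "10.0.1.0/24"
    service_tag: str = ""
    vendor_contact: str = ""
    licence_ref: str = ""      # where the licence is stored, not the key itself


def missing_fields(record: AssetRecord) -> list[str]:
    """Return the names of fields that are still empty."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


def ip_matches_subnet(record: AssetRecord) -> bool:
    """True if the documented IP address lies inside the documented subnet."""
    try:
        ip = ipaddress.ip_address(record.ip_address)
        net = ipaddress.ip_network(record.subnet, strict=False)
    except ValueError:
        return False  # blank or malformed entries count as a mismatch
    return ip in net


if __name__ == "__main__":
    fs01 = AssetRecord(hostname="fs01", ip_address="10.0.1.10", subnet="10.0.1.0/24")
    print(missing_fields(fs01))     # ['service_tag', 'vendor_contact', 'licence_ref']
    print(ip_matches_subnet(fs01))  # True
```

Running a check like this over the whole inventory before an audit shows which records still need attention.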
Step 2: Be organised
Keep your data centre and cabling tidy, and label servers, switches, and installation CDs and DVDs so that anyone can identify them quickly.
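Labels help most when they follow one predictable convention. The sketch below assumes a hypothetical scheme, `<site>-R<rack>-U<unit>-<role><number>` (for example `SYD1-R04-U12-SW01`); the site codes, role codes and field widths are illustrative only, so substitute your own before printing labels.

```python
import re

# Hypothetical convention: 3-letter site code + digit, 2-digit rack, 2-digit rack unit,
# device role and 2-digit index, e.g. "SYD1-R04-U12-SW01".
LABEL_PATTERN = re.compile(r"[A-Z]{3}\d-R\d{2}-U\d{2}-(SRV|SW|FW|PDU)\d{2}")


def is_valid_label(label: str) -> bool:
    """True if the whole string follows the convention."""
    return LABEL_PATTERN.fullmatch(label) is not None


def make_label(site: str, rack: int, unit: int, role: str, index: int) -> str:
    """Build a label and refuse to return one that breaks the convention."""
    label = f"{site.upper()}-R{rack:02d}-U{unit:02d}-{role.upper()}{index:02d}"
    if not is_valid_label(label):
        raise ValueError(f"label does not follow the convention: {label}")
    return label


if __name__ == "__main__":
    print(make_label("syd1", 4, 12, "sw", 1))   # SYD1-R04-U12-SW01
    print(is_valid_label("SYD1-R4-U12-SW01"))   # False: rack needs two digits
```

Generating labels from one function, rather than typing them by hand, keeps the data centre labels consistent with your documentation.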
Step 3: Maintain standards
Standardise on a small set of hardware models, a standard operating environment (SOE) and a standard set of applications. Stick to a standard list of vendors and suppliers, and designate a specific contact point for each vendor.
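Standardisation also makes drift easy to detect. A minimal sketch, assuming you can already export each machine's installed software as a name-to-version mapping (how you collect it is out of scope here): compare it against the SOE baseline and report what is missing, what is on the wrong version, and what should not be there. The application names and versions are made up, and exact string comparison of versions is a deliberate simplification.

```python
# Hypothetical SOE baseline: application name -> approved version
SOE_BASELINE = {"office-suite": "2021", "browser": "115.0", "av-agent": "4.2"}


def soe_drift(installed: dict[str, str],
              baseline: dict[str, str] = SOE_BASELINE) -> dict[str, list[str]]:
    """Compare one machine's software inventory with the SOE baseline."""
    missing = sorted(n for n in baseline if n not in installed)
    wrong_version = sorted(n for n, v in baseline.items()
                           if n in installed and installed[n] != v)
    unapproved = sorted(n for n in installed if n not in baseline)
    return {"missing": missing, "wrong_version": wrong_version, "unapproved": unapproved}


if __name__ == "__main__":
    pc42 = {"browser": "115.0", "av-agent": "4.1", "games": "1.0"}
    print(soe_drift(pc42))
    # {'missing': ['office-suite'], 'wrong_version': ['av-agent'], 'unapproved': ['games']}
```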
Step 4: Upgrade and patch
Deploy patches and hotfixes using WSUS or another deployment tool. Keep antivirus definitions up to date and schedule regular virus scans.
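WSUS and antivirus consoles have their own reporting, but the underlying check is simple enough to sketch. Assuming you can export the date each host was last patched (or last received antivirus definitions), the Python below lists hosts older than a threshold. The host names, dates and the 30-day and 1-day thresholds are illustrative assumptions, not recommendations from WSUS or any antivirus vendor.

```python
from datetime import date, timedelta


def overdue_hosts(last_updated: dict[str, date], today: date,
                  max_age_days: int) -> list[str]:
    """Return hosts whose last update is older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(host for host, when in last_updated.items() if when < cutoff)


if __name__ == "__main__":
    today = date(2024, 5, 31)
    last_patched = {"dc01": date(2024, 5, 1), "fs01": date(2024, 5, 20),
                    "sql01": date(2024, 3, 2)}
    av_definitions = {"dc01": date(2024, 5, 31), "fs01": date(2024, 5, 28)}

    print(overdue_hosts(last_patched, today, max_age_days=30))   # ['sql01']
    print(overdue_hosts(av_definitions, today, max_age_days=1))  # ['fs01']
```

The same function serves both checks; only the threshold changes, since antivirus definitions typically go stale much faster than operating system patches.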
Step 5: Diagnose and troubleshoot
Run diagnostic tools such as addiag, netdiag and dcdiag regularly, not only when something breaks. Third-party tools such as SolarWinds Engineer's Toolset, Desktop Central, ManageEngine AD Plus, Wireshark and PRTG Traffic Grapher can also help you spot problems early.
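Alongside dcdiag and the tools above, a quick reachability check can tell you whether a server is answering on the ports its services depend on. A minimal sketch in Python, using only the standard `socket` module: it attempts a TCP connection to each port and reports success or failure. The port list (DNS 53, Kerberos 88, LDAP 389, SMB 445) covers common domain controller TCP services but is not exhaustive, and DNS and Kerberos also use UDP, which this does not test.

```python
import socket

# Common TCP ports a Windows domain controller listens on (not an exhaustive list)
DC_PORTS = {"DNS": 53, "Kerberos": 88, "LDAP": 389, "SMB": 445}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name not resolved
        return False


def check_host(host: str, ports: dict[str, int] = DC_PORTS) -> dict[str, bool]:
    """Check every named port on one host."""
    return {name: port_open(host, port) for name, port in ports.items()}
```

For example, `check_host("dc01.example.local")` (a placeholder host name) returns a dictionary such as `{"DNS": True, "Kerberos": True, ...}`; any `False` entry is a cue to dig deeper with dcdiag or Wireshark.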
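Step 6: Back up and test your restores
Back up your file servers, Active Directory, databases and ESX hosts using Backup Exec, Commvault or Veeam. From time to time, restore backed-up data to a different location to confirm that your backups actually work.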
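A test restore is only convincing if you check what came back. As a sketch, assuming the restore goes to a separate folder and the source files have not changed since the backup was taken, the Python below compares SHA-256 checksums of every file and lists anything missing or different. The function names are hypothetical, and this covers file data only; AD, database and ESX restores need to be verified with their own tools.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir: Path, restore_dir: Path) -> list[str]:
    """Return relative paths that are missing from, or differ in, the restored copy."""
    problems = []
    for src in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(source_dir)
        restored = restore_dir / rel
        if not restored.is_file() or sha256_of(restored) != sha256_of(src):
            problems.append(rel.as_posix())
    return problems
```

An empty list means every file matched. If the source has changed since the backup ran, new or edited files will show up as problems, so compare against a snapshot or a folder you know was unchanged.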
Bad things might happen, but if you are prepared you can avoid catastrophe, enjoy peace of mind and feel confident in your work. What else would you suggest to avoid a nightmare?