Why DNS configuration errors can take your entire site offline
When DNS goes wrong, everything can go wrong. A small misconfiguration can cause browsers worldwide to fail to find your domain, showing “This site can’t be reached” or “Server IP address could not be found.” The good news: most DNS outages are diagnosable and fixable in minutes—if you know where to look and what the errors mean.
This guide shows you how to quickly diagnose and resolve DNS configuration errors that cause complete site inaccessibility, with a focus on domain routing and nameserver issues. You’ll get a fast triage flow, command examples, common scenarios and their fixes, and prevention tips to avoid repeat incidents.
Quick DNS refresher: the path from domain to server
Before diving into fixes, align on the moving parts:
- Registrar: Where your domain is registered (e.g., Namecheap, GoDaddy). This is where you set which nameservers your domain uses.
- Authoritative DNS provider: The service hosting your zone (e.g., Cloudflare, Route 53, DNS Made Easy). This holds your records (A, AAAA, CNAME, etc.).
- Recursive resolver: The DNS server your ISP or device uses (e.g., 1.1.1.1, 8.8.8.8). It caches answers to speed things up.
- Nameservers (NS): The authoritative DNS servers that own your zone’s records. Your registrar delegates your domain to these.
- Records: A/AAAA link your domain to IP addresses; CNAME points to another hostname; NS delegates; SOA defines zone metadata; DNSKEY holds the zone’s DNSSEC public keys and DS is the digest of a key, published in the parent zone; TXT often holds verification.
If your registrar points to the wrong nameservers, or those nameservers don’t have the right records, your domain won’t resolve. Add DNSSEC mismatches, missing glue records, or apex CNAME errors, and you can get immediate outages.
Recognize the failures: what the error means
Knowing the error type helps you aim your fix:
- NXDOMAIN: The queried name does not exist. Often caused by wrong nameservers at the registrar, a missing zone, or a missing record. (A name that exists but lacks the requested record type returns NOERROR with an empty answer instead.)
- SERVFAIL: The resolver could not obtain a valid answer. Frequently DNSSEC-related (mismatched DS/DNSKEY), but can also be a broken authoritative server, EDNS issues, or lame delegation.
- Timeout: No response from authoritative servers. Firewall blocking DNS, dead nameserver, or broken network path.
- REFUSED: The server refuses to answer (e.g., misconfigured ACLs on self-hosted DNS).
- Wrong IP/host: Record exists but points to the wrong place (mis-typed A/AAAA, wrong CNAME target, stale CDN records).
- Intermittent resolution: Mixed nameservers or some servers updated and others not; or geo/anycast issues.
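As a rough aid for telling these failure types apart, the sketch below pulls the response code out of dig's header. The function name dig_status is made up for this example, and the timeout wording it matches varies between dig versions, so treat the pattern as an assumption.

```shell
#!/bin/sh
# dig_status (hypothetical helper): read dig output on stdin and print the
# response code from the header, or TIMEOUT when no reply was received.
dig_status() {
  awk '
    /status: / { sub(/.*status: /, ""); sub(/,.*/, ""); print; found = 1; exit }
    /timed out|no servers could be reached/ { print "TIMEOUT"; found = 1; exit }
    END { if (!found) print "UNKNOWN" }
  '
}

# Live usage (needs network access):
#   dig +noall +comments example.com A | dig_status
```

The printed code (NOERROR, NXDOMAIN, SERVFAIL, REFUSED) maps directly onto the list above.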
15-minute triage: your shortest path to root cause
Run this checklist in order. Stop when you find the problem.
- Check domain status at the registry (WHOIS)
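- Look for clientHold/serverHold or an expired domain, either of which can take the domain out of DNS. A pendingTransfer state does not remove the delegation by itself, but note it: DNS hosted at the losing registrar may stop once the transfer completes.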
- Example commands:
- macOS/Linux: whois example.com
- Online: icann.org/en/lookup
- Confirm nameservers at the registrar
- At the registrar dashboard, note the NS set. Are those the nameservers you expect from your DNS provider?
- Common failure: Migrated DNS providers but forgot to update nameservers at the registrar.
- Query your domain with trace to see delegation
- Use dig to trace resolution from the root down:
dig +trace example.com
- Look for:
- Which NS the TLD (.com) delegates to
- Whether those NS respond
- Any SERVFAIL/NXDOMAIN along the path
- Ask the authoritative nameservers directly
- Identify the authoritative NS from the trace or provider UI, then:
dig @ns1.yourdnsprovider.net example.com A +norecurse
dig @ns1.yourdnsprovider.net www.example.com CNAME +norecurse
dig @ns1.yourdnsprovider.net example.com AAAA +norecurse
- If the authoritative NS doesn’t have the right records, fix them in your DNS provider’s zone.
- Test DNSSEC quickly
- If DNSSEC is enabled (check WHOIS DS records or your DNS provider), validate:
dig example.com A +dnssec
dig +trace example.com +dnssec
- Use dnsviz.net for a visual of the DS/DNSKEY/RRSIG chain. A mismatched DS at the registrar causes SERVFAIL on every validating resolver, while non-validating resolvers keep answering.
- Check for apex and www coverage
- Many outages come from missing either example.com or www.example.com.
- If using a CNAME-like record at apex, ensure your provider supports ALIAS/ANAME; plain CNAME at apex is not allowed in classic DNS.
- Compare answers from multiple resolvers
- Quick propagation sanity:
dig example.com A @1.1.1.1
dig example.com A @8.8.8.8
dig example.com A @9.9.9.9
- If answers differ, some resolvers may be caching old data. Verify TTLs; you may need to wait out or flush caches after a fix.
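The resolver comparison in the last step can be scripted. This is a minimal sketch with made-up helper names (normalize_answer, answers_agree); it sorts each answer first so that round-robin record order is not mistaken for a real difference.

```shell
#!/bin/sh
# normalize_answer (hypothetical): turn one resolver's `dig +short` output
# into a single sorted, comma-joined line.
normalize_answer() {
  sort | paste -sd, -
}

# answers_agree (hypothetical): read one normalized answer per line and
# succeed only when every line is identical.
answers_agree() {
  sort -u | awk 'END { exit (NR > 1) }'
}

# Live usage (needs network access):
#   for r in 1.1.1.1 8.8.8.8 9.9.9.9; do
#     dig +short example.com A @"$r" | normalize_answer
#   done | answers_agree && echo "resolvers agree" || echo "resolvers differ"
```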
Common DNS outage scenarios and how to fix them fast
1) Nameserver mismatch at registrar
- Symptom: NXDOMAIN or timeout. dig +trace shows TLD delegating to old NS; those NS either lack the zone or return no records.
- Cause: You changed DNS providers but didn’t update NS at the registrar, or set incorrect NS hostnames.
- Fix:
- In your registrar’s panel, set the nameservers to those given by your current DNS provider (e.g., ns1.provider.com, ns2.provider.com).
- Save and confirm the update. Nameserver delegation updates can take minutes to hours to propagate across the TLD registry and caches.
- Verify with:
dig NS example.com
dig +trace example.com
- Tip: Avoid mixing old and new NS. All delegations should point to the same provider set.
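To check for a mismatch without eyeballing hostnames, one option is to normalize both lists and compare them. The helper names below (canon_ns, ns_sets_match) are illustrative only, and the provider hostnames are the placeholders used above.

```shell
#!/bin/sh
# canon_ns (hypothetical): print a nameserver list in canonical form:
# lowercase, no trailing dot, one name per line, sorted, de-duplicated.
canon_ns() {
  tr '[:upper:]' '[:lower:]' | tr -s ' ,\t' '\n\n\n' |
    sed -e 's/\.$//' -e '/^$/d' | sort -u
}

# ns_sets_match EXPECTED OBSERVED (hypothetical): succeed only when both
# lists contain exactly the same hostnames.
ns_sets_match() {
  [ "$(printf '%s\n' "$1" | canon_ns)" = "$(printf '%s\n' "$2" | canon_ns)" ]
}

# Live usage (needs network access):
#   ns_sets_match "ns1.provider.com ns2.provider.com" \
#     "$(dig +short NS example.com)" || echo "NS mismatch"
```

A mixed set (one old nameserver, one new) fails the comparison, which is the situation the tip above warns about.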
2) Missing apex A/AAAA or misused CNAME at apex
- Symptom: example.com fails, but www.example.com might work.
- Cause: No A/AAAA at apex; or attempted to use CNAME at apex with a provider that doesn’t support it.
- Fix:
- Add A and AAAA records at example.com pointing to your server or CDN-provided IPs.
- If you need to point apex to another hostname, use ALIAS/ANAME if your provider supports it.
- Ensure www has a CNAME to apex (or, where your provider supports it, an apex ALIAS/ANAME to www), but do not create CNAME loops.
3) Wrong record values or environment drift
- Symptom: Domain resolves to wrong site or old IP, or intermittently wrong after a migration.
- Cause: Old records left in place, typos, or multiple zones for the same domain across providers.
- Fix:
- Inventory all records in the authoritative zone. Remove stale entries and verify current IPs or CNAME targets.
- Set TTLs low (e.g., 60–300 seconds) before planned migrations to enable faster rollback.
- Confirm there is only one authoritative zone set via dig +trace; remove shadow zones at old providers.
4) DNSSEC DS/DNSKEY mismatch
- Symptom: SERVFAIL on most resolvers; domain works only when DNSSEC validation is disabled.
- Cause: DS record at the registrar doesn’t match the DNSKEY in your zone (e.g., after provider change or key rollover).
- Fix:
- If you changed DNS providers or disabled DNSSEC in the zone, remove DS records at the registrar.
- If you keep DNSSEC enabled, publish the correct DS at the registrar using your provider’s DS parameters (algorithm, digest, digest type).
- Validate with dnsviz.net and:
dig example.com DNSKEY +dnssec
dig example.com A +dnssec
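- Note: The registry usually publishes DS changes quickly, but validating resolvers can keep the old DS cached until its TTL expires (as long as a day at some TLDs), so failures may persist for a while after the fix.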
5) Broken delegation with vanity/child nameservers (glue issues)
- Symptom: Timeout or SERVFAIL; dig +trace stops at the TLD or shows “lame delegation.”
- Cause: Using ns1.example.com as your nameserver for example.com without creating glue A/AAAA at the registry; or glue IPs are wrong.
- Fix:
- Register host records for your child nameservers at the registrar (often called “hostnames” or “glue records”), mapping ns1.example.com -> IP.
- Update the domain’s NS to those hostnames.
- Ensure the authoritative zone includes matching A/AAAA for ns1/ns2.
- Re-verify with:
dig +trace example.com
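Whether a nameserver needs glue comes down to whether its hostname sits inside the domain it serves. A small sketch of that test, using a hypothetical helper name (needs_glue):

```shell
#!/bin/sh
# needs_glue NAMESERVER DOMAIN (hypothetical): succeed when NAMESERVER is the
# domain itself or a name underneath it, i.e. the case where the parent zone
# has to carry glue addresses.
needs_glue() {
  _ns=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/\.$//')
  _zone=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]' | sed 's/\.$//')
  case "$_ns" in
    "$_zone" | *."$_zone") return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   needs_glue ns1.example.com example.com && echo "register glue at the registrar"
```

Nameservers under a different domain (for example ns1.provider.net) do not need glue for example.com.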
6) Registrar or registry hold, expired domain, or transfer issues
- Symptom: Sudden NXDOMAIN, or nameservers change to parking. WHOIS shows clientHold/serverHold or expired.
- Cause: Unpaid renewal, verification failure, or compliance hold.
- Fix:
- Renew domain and clear holds through registrar support.
- Wait for registry update; this can take minutes to propagate.
- Reconfirm NS delegation after release.
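A quick way to spot a hold in WHOIS output is to filter for the hold statuses. The function name blocking_statuses is invented for this sketch, and it assumes your grep supports -o (GNU and BSD grep do).

```shell
#!/bin/sh
# blocking_statuses (hypothetical): read WHOIS output on stdin and print any
# hold status found. Exit status is 0 only when at least one hold is present.
blocking_statuses() (
  found=$(grep -i -o -E '(client|server)hold' | sort -u)
  [ -n "$found" ] && printf '%s\n' "$found"
)

# Live usage (needs network access):
#   whois example.com | blocking_statuses && echo "domain is on hold"
```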
7) EDNS/fragmentation/firewall causing timeouts
- Symptom: Timeouts from certain networks or resolvers only; large DNSSEC responses fail.
- Cause: Firewalls dropping fragmented UDP packets or blocking EDNS options.
- Fix:
- Enable TCP fallback on authoritative DNS; many providers handle this automatically.
- Reduce response size (minimal ANY, avoid excessively large TXT/CAA sets).
- Consider disabling ECS (EDNS Client Subnet) if not needed.
- Test with:
dig example.com A +dnssec +bufsize=1232
dig example.com A +tcp
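To judge whether a response is large enough to run into fragmentation, you can read the size dig reports and compare it with the 1232-byte buffer used in the command above. The helper names (msg_size, fits_udp) are assumptions for this sketch.

```shell
#!/bin/sh
# msg_size (hypothetical): read dig output on stdin and print the response
# size in bytes from the "MSG SIZE rcvd" line.
msg_size() {
  awk '/MSG SIZE/ { print $NF; exit }'
}

# fits_udp SIZE [LIMIT] (hypothetical): succeed when SIZE fits within LIMIT
# bytes. LIMIT defaults to 1232, the buffer size used in the test command.
fits_udp() {
  [ "$1" -le "${2:-1232}" ]
}

# Live usage (needs network access):
#   size=$(dig example.com DNSKEY +dnssec +tcp | msg_size)
#   fits_udp "$size" || echo "response of $size bytes will not fit in 1232"
```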
8) AAAA pitfalls, IPv6-only clients, or broken dual-stack
- Symptom: Some users can’t reach your site, especially on IPv6 networks.
- Cause: AAAA points to wrong IP; or you publish AAAA but the server isn’t listening on v6; or firewall blocks v6.
- Fix:
- Verify AAAA points to your server’s correct IPv6 and the server listens on v6.
- If you cannot support IPv6 yet, temporarily remove AAAA to force v4 connectivity.
- Confirm with:
dig example.com AAAA
ping6 your.ipv6.addr (or ping -6 your.ipv6.addr on systems without ping6)
9) CNAME conflicts and wildcard surprises
- Symptom: Some subdomains resolve to unexpected targets or not at all.
- Cause: A record and CNAME at same label (illegal), or wildcard (*) catches subdomains unintentionally.
- Fix:
- Never mix A/AAAA with CNAME at the same name.
- Remove or scope wildcards carefully; define explicit records for critical hosts.
- Validate effective answers with:
dig test-unexpected-sub.example.com A
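If you can export the zone as text, the A-plus-CNAME conflict can be found mechanically. The sketch below assumes records in the "name ttl class type rdata" layout that dig prints; cname_conflicts and the export filename are hypothetical.

```shell
#!/bin/sh
# cname_conflicts (hypothetical): read records in "name ttl class type rdata"
# form and print every owner name that has a CNAME together with any other
# non-DNSSEC record type.
cname_conflicts() {
  awk '
    /^[;$]/ || NF < 5 { next }   # skip comments, directives, short lines
    {
      name = tolower($1); type = toupper($4)
      if (type == "CNAME") cname[name] = 1
      else if (type != "RRSIG" && type != "NSEC") other[name] = 1
    }
    END { for (n in cname) if (n in other) print n }
  ' | sort
}

# Usage (zone-export.txt is a placeholder for your own export):
#   cname_conflicts < zone-export.txt
```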
10) CDN/proxy-specific DNS modes misunderstood
- Symptom: Domain resolves, but origin unreachable or loops through non-existent hostnames.
- Cause: Using CDN “proxied” mode without setting correct origin records or targets; using provider-specific flattened CNAMEs incorrectly.
- Fix:
- Follow your CDN’s instructions for DNS targets (often a CNAME to a cdn.example.net host).
- For apex, use ALIAS/ANAME/flattening if supported by your provider.
- Ensure your origin record (often origin.example.com) is not proxied if the CDN requires direct DNS-only.
Actionable diagnosis with examples
Use these commands and interpret the results to pinpoint issues.
1) Find who’s authoritative and what they say
dig NS example.com
dig +trace example.com
- If +trace shows TLD delegating to ns1.oldprovider.com but you expect ns1.newprovider.com, update your registrar’s NS.
- If authoritative NS respond with REFUSED, SERVFAIL, or NXDOMAIN for example.com, your zone is missing—create/import it at that provider.
2) Validate A/AAAA and www
dig example.com A
dig example.com AAAA
dig www.example.com CNAME
dig www.example.com A
- If www is a CNAME to another host, confirm that host resolves to valid A/AAAA.
- Ensure both apex and www work unless you intentionally redirect one.
3) DNSSEC troubleshooting
dig example.com A +dnssec
- Look for the AD (Authenticated Data) flag when querying validating resolvers like 1.1.1.1:
dig @1.1.1.1 example.com A +dnssec
- If you get SERVFAIL and dnsviz shows a broken chain, fix DS at the registrar or disable DNSSEC until you can roll keys properly.
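Checking for the AD flag can be automated by parsing the flags line of dig's header. A minimal sketch; the function name has_ad_flag is made up for illustration.

```shell
#!/bin/sh
# has_ad_flag (hypothetical): read dig output on stdin and succeed when the
# header flags include "ad" (Authenticated Data).
has_ad_flag() {
  awk '
    /^;; flags:/ {
      sub(/^;; flags:/, "")   # drop the prefix
      sub(/;.*/, "")          # drop the counters after the flag list
      n = split($0, f, " ")
      for (i = 1; i <= n; i++) if (f[i] == "ad") ok = 1
    }
    END { exit !ok }
  '
}

# Live usage (needs network access):
#   dig @1.1.1.1 example.com A +dnssec | has_ad_flag && echo "validated"
```

A missing AD flag on an unsigned zone is normal; it only signals a problem when you expect the zone to validate.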
4) Compare resolvers and flush caches
- Different resolvers:
dig example.com A @1.1.1.1
dig example.com A @8.8.8.8
- Flush local caches:
- Windows: ipconfig /flushdns
- macOS: sudo dscacheutil -flushcache && sudo killall -HUP mDNSResponder
- Linux (systemd): sudo resolvectl flush-caches
- Chrome: chrome://net-internals/#dns (clear host cache)
5) Test authoritatives directly, bypassing recursion
dig @ns1.provider.net example.com SOA +norecurse
dig @ns1.provider.net example.com A +norecurse
- If SOA/records appear only on some NS (ns1 vs ns2), your provider’s zone isn’t fully synced—open a ticket or wait for replication.
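The per-nameserver comparison can be sketched as a serial check. Both helper names (soa_serial, serials_in_sync) are hypothetical, and the code assumes the seven-field layout of dig +short SOA output.

```shell
#!/bin/sh
# soa_serial (hypothetical): read one `dig +short SOA` line and print its
# serial, which is the third field.
soa_serial() {
  awk 'NF >= 7 { print $3; exit }'
}

# serials_in_sync (hypothetical): read "nameserver serial" pairs and succeed
# when every nameserver reported a serial and all serials are equal.
serials_in_sync() {
  awk '
    NF < 2 { bad = 1; next }     # this nameserver returned no SOA
    !seen++ { first = $2 }
    $2 != first { bad = 1 }
    END { exit (bad || !seen) }
  '
}

# Live usage (needs network access):
#   for ns in ns1.provider.net ns2.provider.net; do
#     printf '%s %s\n' "$ns" \
#       "$(dig @"$ns" example.com SOA +norecurse +short | soa_serial)"
#   done | serials_in_sync || echo "nameservers are out of sync"
```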
Resolution playbooks by scenario
Move to a new DNS provider without downtime
- Import your zone to the new provider. Verify records thoroughly.
- Reduce TTLs on the current provider (A/AAAA/CNAME to 60–300 seconds) 24–48 hours before switching.
- Decide how to handle DNSSEC before the cutover: either remove the DS at the registrar and wait for its TTL to expire so the cutover runs unsigned, or coordinate a signed migration with both providers. Do not leave the old provider’s DS in place while the new provider serves the zone.
- Change nameservers at the registrar to the new provider.
- If using DNSSEC, publish the DS that matches the new provider’s DNSKEYs at the registrar, or enable provider-managed DS publishing if supported.
- Monitor with dig +trace and external probes. After full propagation, remove the old zone.
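Before changing nameservers, it helps to confirm that the new provider answers with the same records as the old one. The sketch below compares two sets of dig answer lines while ignoring TTLs, which are expected to differ during a migration; canon_records and records_match are hypothetical names.

```shell
#!/bin/sh
# canon_records (hypothetical): read "name ttl class type rdata" lines and
# print them without the TTL, with lowercased owner names, sorted.
canon_records() {
  awk 'NF >= 5 && $1 !~ /^;/ { $2 = ""; $1 = tolower($1); print }' |
    sed 's/  */ /g' | sort
}

# records_match OLD NEW (hypothetical): succeed when both arguments hold the
# same records once TTLs and ordering are ignored.
records_match() {
  [ "$(printf '%s\n' "$1" | canon_records)" = "$(printf '%s\n' "$2" | canon_records)" ]
}

# Live usage (needs network access; hostnames are placeholders):
#   old=$(dig @ns1.oldprovider.com example.com A +norecurse +noall +answer)
#   new=$(dig @ns1.newprovider.com example.com A +norecurse +noall +answer)
#   records_match "$old" "$new" || echo "providers disagree on example.com A"
```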
Fix a broken DNSSEC configuration in a hurry
- Option A: Temporarily disable DNSSEC
- Remove DS at the registrar. Keep RRSIG/DNSKEY in zone or disable DNSSEC at provider.
- Validate resolution returns to normal (no SERVFAIL).
- Plan a proper re-enable with a clean DS publish.
- Option B: Correct the DS to match active DNSKEY
- Get DS digest from your DNS provider.
- Update DS at the registrar to match. Check with dnsviz and validating resolvers.
Repair vanity nameservers with missing glue
- Register hostnames at the registrar (e.g., ns1.example.com -> 203.0.113.10; ns2.example.com -> 203.0.113.11).
- Set your domain’s NS to ns1.example.com and ns2.example.com at the registrar.
- Inside your zone, create A/AAAA for ns1/ns2 to the same IPs.
- Verify with dig +trace that recursion reaches your NS and they answer authoritatively.
Restore apex and www resolution after an accidental deletion
- Add A to example.com pointing to your server IP; add AAAA if you support IPv6.
- Create www CNAME to example.com (or, where supported, an apex ALIAS/ANAME to www) to avoid divergent configs.
- Set TTL to 300s for quick corrections; increase later to reduce query load.
- Validate from multiple resolvers; flush caches if necessary.
When it’s not DNS (but still looks like DNS)
Don’t lose time chasing DNS if DNS is resolving correctly:
- hosts file overrides: Local /etc/hosts or Windows hosts can hijack resolution.
- Corporate DNS filtering: NXDOMAIN might be injected or domain blocked internally.
- Captive portals: Public Wi-Fi can intercept DNS queries.
- HTTP/TLS errors: If dig returns valid IPs but browser fails with HTTPS/TLS errors, the issue is at the web/app/CDN layer, not DNS.
To confirm: If dig returns the expected A/AAAA for your domain and traceroute/ping reach the IP, DNS is likely fine.
Verification and propagation best practices
- Understand TTL and negative caching:
- Record TTL controls how long resolvers cache your answer.
- Negative TTL controls how long NXDOMAIN is cached; it is the lower of the SOA record’s own TTL and the SOA MINIMUM field.
- After fixes, some users may still see failures until caches expire.
- Use multi-region checks:
- Tools: whatsmydns.net, dnschecker.org, intodns.com, dnsviz.net.
- Confirm that different regions and ISPs agree on NS and A/AAAA.
- Confirm authoritative consistency:
- Check multiple NS:
for ns in ns1.provider.net ns2.provider.net; do dig @"$ns" example.com SOA +norecurse; done
- All should return identical SOA serials. If not, wait or escalate to your provider.
- Revert TTLs after incident:
- Once stable, raise TTLs to 1800–3600+ seconds to reduce query volume and improve cache performance.
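To estimate how long a cached NXDOMAIN can outlive your fix, you can read the negative-caching TTL from the zone's SOA record. Under RFC 2308 it is the smaller of the SOA record's TTL and its MINIMUM field. The function name negative_ttl is an assumption for this sketch.

```shell
#!/bin/sh
# negative_ttl (hypothetical): read a full SOA answer line
# ("name ttl class SOA mname rname serial refresh retry expire minimum")
# and print the negative-caching TTL in seconds.
negative_ttl() {
  awk 'toupper($4) == "SOA" && NF >= 11 { print ($2 < $11 ? $2 : $11); exit }'
}

# Live usage (needs network access); ask an authoritative server so the TTL
# shown is the configured one, not a resolver's countdown:
#   dig @ns1.provider.net example.com SOA +norecurse +noall +answer | negative_ttl
```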
Prevention: build a DNS change safety net
- Version control your zone:
- Store DNS as code (e.g., Terraform, OctoDNS). Review changes via PRs with approvals.
- Staged rollouts and change windows:
- Reduce TTLs 24–48 hours before planned changes.
- Schedule during low-traffic windows with rollback plans.
- Monitoring and alerts:
- Set up health checks for DNS availability from multiple points.
- Monitor for NS changes at the registrar (unauthorized changes are a red flag).
- Watch for DNSSEC validation failures via dnsviz or monitoring tools.
- Provider redundancy (where feasible):
- Multi-DNS providers with identical zones can improve resilience, but require tight automation and DNSSEC planning. If you can’t automate consistently, a single robust provider is safer.
- Document nameserver ownership:
- Keep updated records of registrar login, DNS provider accounts, NS hostnames, and DS settings.
- Train teams:
- Ensure everyone knows the difference between registrar and DNS provider and how to run dig +trace.
Quick role-based fixes
- Site owner (non-technical):
- Log in to your registrar; verify domain is active and nameservers match your DNS provider.
- If you recently changed providers, ensure NS were updated and DNSSEC DS is either removed or updated.
- If stuck, open tickets with registrar and DNS provider; share dig +trace output.
- DevOps/engineer:
- Run dig +trace and dnsviz. Fix zone records, DNSSEC, or glue as indicated.
- Ensure apex A/AAAA and www coverage, correct TTLs, and consistent SOA serial across NS.
- Validate from multiple resolvers; flush caches where possible.
- Registrar support:
- Check domain status and remove holds.
- Confirm NS delegation and DS records.
- Help register glue for vanity nameservers.
- DNS provider support:
- Confirm zone exists, is published to all NS, and DNSSEC keys are current.
- Investigate EDNS/UDP size limits or anycast anomalies if timeouts persist.
Practical examples: from error to fix
Example A: “This site can’t be reached” and NXDOMAIN for example.com
- Diagnosis:
- whois shows domain active.
- dig +trace example.com reveals TLD delegates to ns1.oldhost.com; you moved to NewDNS last week.
- Fix:
- Update nameservers at registrar to ns1.newdns.net, ns2.newdns.net.
- Validate with dig NS example.com until it reflects.
- Confirm records exist at NewDNS and resolve globally.
Example B: SERVFAIL only when DNSSEC is enabled
- Diagnosis:
- dig example.com A +dnssec returns SERVFAIL; dnsviz shows DS at registrar doesn’t match zone DNSKEY after you changed providers.
- Fix:
- Remove DS at registrar or replace with the new provider’s DS.
- Re-test; SERVFAIL disappears and AD flag appears on validating resolvers.
Example C: www works but naked domain doesn’t
- Diagnosis:
- www.example.com CNAME -> app.hosted.net -> resolves fine.
- example.com has no A/AAAA; provider doesn’t support CNAME at apex.
- Fix:
- If provider supports ALIAS/ANAME, point apex to app.hosted.net via ALIAS.
- Otherwise, use A/AAAA to the platform’s provided IPs or move to a provider with apex flattening.
Example D: Intermittent timeouts from some ISPs
- Diagnosis:
- dig +trace sometimes times out at ns2.provider.net; ns1 responds fine.
- Only some regions affected.
- Fix:
- Open a ticket with provider; ask them to check anycast reachability or rate limiting.
- As a stop-gap, ask to remove the failing NS from delegation or add an additional healthy NS if supported.
A 60-minute deep-dive workflow (when quick triage isn’t enough)
- Map the entire delegation chain:
dig +trace example.com
- Record root -> TLD -> NS and note IPs.
- Check authoritative health per server:
for ns in ns1 ns2 ns3; do
  dig @"${ns}.example-dns.net" example.com SOA +norecurse +time=2 +tries=1
  dig @"${ns}.example-dns.net" example.com DNSKEY +norecurse +dnssec
  dig @"${ns}.example-dns.net" www.example.com A +norecurse
done
- Look for timeouts, inconsistent SOAs, missing records.
- Validate EDNS and response sizes:
dig example.com A +dnssec +bufsize=4096
dig example.com A +dnssec +bufsize=1232
dig example.com A +tcp
- If UDP with large bufsize fails but TCP works, suspect fragmentation/firewall.
- Compare external viewpoints:
- Query at least three public resolvers (Cloudflare 1.1.1.1, Google 8.8.8.8, Quad9 9.9.9.9) and cross-check with two checker sites.
- Inspect zone for conflicts:
- Check for illegal combinations (A + CNAME at same label), unintended wildcards, and missing apex.
- Verify that CDN hostnames you point to are valid and provisioned.
- Review registrar settings:
- NS list exactly matches provider’s doc?
- DS matches DNSKEY? If unsure, temporarily remove DS to isolate DNSSEC.
- Decide and execute remediation:
- Correct delegation, fix zone, or disable DNSSEC temporarily.
- Raise provider tickets with evidence (dig outputs, dnsviz link).
- Monitor for recovery; document RCA.
FAQs
- How long do DNS changes take to propagate?
- Most record changes are visible within minutes once the old TTL expires. Nameserver (NS) delegation changes depend on the TTL the TLD sets on the delegation, which can be as long as 48 hours. Plan for up to 48 hours globally, but verify sooner with authoritative queries and multiple resolvers.
- Can I use CNAME at the apex (example.com)?
- Classic DNS forbids it. Use ALIAS/ANAME or provider “flattening,” or point A/AAAA directly.
- Do I need IPv6 records?
- Not strictly, but increasingly recommended. If you publish AAAA, ensure your server and firewall are IPv6-ready.
- Can I have both A and CNAME on the same name?
- No. A name that has a CNAME can’t hold any other record type, apart from DNSSEC records such as RRSIG and NSEC. Use one or the other.
- What is “glue” and when is it needed?
- Glue records are A/AAAA records held at the registry for nameservers that sit under the domain they serve (e.g., ns1.example.com for example.com). Without glue, resolvers can’t find your nameserver’s IP to start resolving.
- Is DNSSEC worth it if it can break things?
- Yes, it protects against tampering. Use it with managed providers that automate key rollovers, and follow safe migration steps to avoid DS mismatches.
A concise incident checklist to keep handy
- Is the domain active (no holds/expired)? WHOIS check.
- Do registrar nameservers match your DNS provider’s NS exactly?
- Does dig +trace show clean delegation to responsive authoritative NS?
- Do apex and www resolve (A/AAAA, CNAME/ALIAS)?
- If DNSSEC is on, does DS match DNSKEY? dnsviz clean?
- Are authoritative NS consistent (same SOA serial, same answers)?
- Any weirdness like apex CNAME, A + CNAME conflict, or unintended wildcards?
- Does IPv6 resolve correctly if published?
- Are responses too large or failing UDP? Try TCP and reduce size.
Final thoughts
DNS can be fragile when changes are rushed or undocumented, but outages are almost always fixable with a systematic approach. Start at the registrar, follow delegation with dig +trace, validate authoritative answers, and keep DNSSEC and glue in mind. Once you recover, invest a bit of time in versioned zones, monitoring, and safer change practices. The next time something breaks, you’ll be back online in minutes—not hours.