docs: salvage network operations patterns

Affaan Mustafa
2026-05-11 07:51:08 -04:00
committed by Affaan Mustafa
parent d52cdccb0d
commit 0e12267ff2
14 changed files with 734 additions and 21 deletions


@@ -0,0 +1,129 @@
---
name: homelab-network-setup
description: Practical home and homelab network planning for gateways, switches, access points, IP ranges, DHCP reservations, DNS, cabling, and common beginner mistakes.
origin: community
---
# Homelab Network Setup
Use this skill to design a home or small-lab network that can grow without
needing a full rebuild.
## When to Use
- Planning a new home network or redesigning an ISP-router-only setup.
- Choosing gateway, switch, and access point roles.
- Designing IP ranges, DHCP scopes, static reservations, and DNS.
- Preparing for future VLANs, Pi-hole, NAS, lab servers, or VPN access.
- Troubleshooting a new network that has double NAT, unstable Wi-Fi, or changing
server addresses.
## How It Works
Start by separating device roles:
```text
Internet
  |
Modem or ONT
  |
Gateway or router     NAT, firewall, DHCP, DNS, inter-VLAN routing
  |
Managed switch        wired clients, AP uplinks, optional VLAN trunks
  |
Access points         Wi-Fi only; ideally wired backhaul
Servers and NAS       stable addresses, DNS names, monitoring
Clients and IoT       DHCP pools, isolated later if VLANs are available
```
Pick a gateway that matches the operator, not just the feature checklist:
| Option | Best fit | Notes |
| --- | --- | --- |
| ISP router | Basic internet only | Limited control and often poor VLAN support |
| UniFi gateway | Managed home network | Good UI, ecosystem lock-in |
| OPNsense or pfSense | Flexible homelab | Strong VLAN, firewall, VPN, and DNS control |
| MikroTik | Advanced network users | Powerful, but easy to misconfigure |
| Linux router | Tinkerers | Document rollback before using as primary gateway |
## IP Plan
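Avoid the most common default, `192.168.1.0/24`, when you expect to use VPNs.
Hotels, offices, and ISP routers often use the same range, and when the remote
network overlaps your home subnet, VPN traffic to home addresses cannot be
routed reliably.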
```text
Example small homelab plan:

192.168.10.0/24   trusted clients
192.168.20.0/24   IoT and media devices
192.168.30.0/24   servers and NAS
192.168.40.0/24   guest Wi-Fi
192.168.99.0/24   network management

Gateway convention:           .1
Infrastructure reservations:  .2 through .49
Dynamic DHCP pool:            .50 through .240
Spare room:                   .241 through .254
```
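The plan above can be sanity-checked before any device is configured. The following is a minimal sketch using Python's standard `ipaddress` module; the `PLAN` dictionary, the `AVOID` list, and both helper names are hypothetical and only mirror the example ranges from the text.

```python
import ipaddress
from itertools import combinations

# Example plan copied from the text above; the keys are illustrative labels.
PLAN = {
    "trusted": "192.168.10.0/24",
    "iot": "192.168.20.0/24",
    "servers": "192.168.30.0/24",
    "guest": "192.168.40.0/24",
    "management": "192.168.99.0/24",
}

# Ranges to stay away from when VPN access is planned (assumption: extend as needed).
AVOID = [ipaddress.ip_network("192.168.1.0/24")]


def plan_conflicts(plan: dict[str, str]) -> list[str]:
    """Report overlaps between planned subnets and with the AVOID list."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in plan.items()}
    problems = []
    for (name_a, net_a), (name_b, net_b) in combinations(nets.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{name_a} overlaps {name_b}")
    for name, net in nets.items():
        for bad in AVOID:
            if net.overlaps(bad):
                problems.append(f"{name} overlaps {bad}")
    return problems


def host_ranges(cidr: str) -> dict[str, tuple[str, str]]:
    """Split a /24 using the .1 / .2-.49 / .50-.240 / .241-.254 convention."""
    base = ipaddress.ip_network(cidr).network_address

    def span(low: int, high: int) -> tuple[str, str]:
        return (str(base + low), str(base + high))

    return {
        "gateway": span(1, 1),
        "infrastructure": span(2, 49),
        "dhcp_pool": span(50, 240),
        "spare": span(241, 254),
    }
```

Running `plan_conflicts(PLAN)` on the example plan should return an empty list, and `host_ranges` gives the start and end address to type into each DHCP scope.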
Use `home.arpa` for local names. RFC 8375 reserves it for home networks, and it
avoids the leakage and conflict problems of ad hoc names like `home.lan`.
```text
nas.home.arpa
pihole.home.arpa
gateway.home.arpa
switch-01.home.arpa
```
## DHCP And DNS
- Use DHCP reservations for anything you SSH into, bookmark, monitor, or expose
as a service.
- Hand out the gateway as DNS until a local resolver is intentionally deployed.
- If using Pi-hole or another DNS filter, give it a reservation first, then point
DHCP DNS options at that address.
- Keep a small static/reserved range per subnet so replacements do not collide
with dynamic leases.
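The reservation rules above can be checked against an exported lease list. This is a rough sketch, assuming reservations are available as a simple name-to-address mapping; `check_reservations` and its parameters are hypothetical and not tied to any particular DHCP server.

```python
import ipaddress


def check_reservations(
    subnet: str,
    pool_start: str,
    pool_end: str,
    reservations: dict[str, str],
) -> list[str]:
    """Flag reservations that are off-subnet, inside the dynamic pool, or duplicated."""
    network = ipaddress.ip_network(subnet)
    start = ipaddress.ip_address(pool_start)
    end = ipaddress.ip_address(pool_end)
    issues = []
    seen: dict[str, str] = {}  # address -> first host that reserved it
    for host, address in reservations.items():
        ip = ipaddress.ip_address(address)
        if ip not in network:
            issues.append(f"{host}: {address} is outside {subnet}")
        elif start <= ip <= end:
            issues.append(f"{host}: {address} is inside the dynamic pool")
        if address in seen:
            issues.append(f"{host}: {address} already reserved for {seen[address]}")
        else:
            seen[address] = host
    return issues
```

Some DHCP servers accept reservations inside the dynamic pool; the check treats that as an issue only because this document recommends a separate reserved range.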
## Cabling And Wi-Fi
- Prefer wired AP backhaul over mesh when you can run Ethernet.
- Use a PoE switch for APs and cameras if the budget allows it.
- Label both ends of each cable and keep a simple port map.
- Put the gateway, switch, DNS server, and NAS on UPS power if outages are common.
## Examples
### Beginner Upgrade
Goal: Keep the ISP router but stabilize a small lab.
1. Set DHCP reservations for NAS, Pi, and any SSH hosts.
2. Move local names to `home.arpa`.
3. Disable duplicate DHCP servers on secondary routers or APs.
4. Wire the main AP instead of relying on wireless backhaul.
### VLAN-Ready Plan
Goal: Prepare for future segmentation without enabling it immediately.
1. Choose non-overlapping /24 ranges for trusted, IoT, servers, guest, and
management.
2. Reserve .1 for the gateway and .2-.49 for infrastructure on every subnet.
3. Buy a gateway and switch that support VLANs and inter-VLAN firewall rules.
4. Document which SSIDs and switch ports will eventually map to each network.
## Anti-Patterns
- Double NAT without a reason or documentation.
- Using `192.168.1.0/24` when VPN access is planned.
- Dynamic addresses for NAS, Pi-hole, Home Assistant, or other service hosts.
- Consumer routers repurposed as APs while their DHCP servers are still enabled.
- Flat networks with cameras, smart plugs, laptops, and servers all sharing the
same trust boundary.
## See Also
- Skill: `network-interface-health`
- Skill: `network-config-validation`


@@ -0,0 +1,210 @@
---
name: network-config-validation
description: Pre-deployment checks for router and switch configuration, including dangerous commands, duplicate addresses, subnet overlaps, stale references, management-plane risk, and IOS-style security hygiene.
origin: community
---
# Network Config Validation
Use this skill to review network configuration before a change window or before
an automation run touches production devices.
## When to Use
- Reviewing Cisco IOS or IOS-XE style snippets before deployment.
- Auditing generated config from scripts or templates.
- Looking for dangerous commands, duplicate IP addresses, or subnet overlaps.
- Checking whether ACLs, route-maps, prefix-lists, or line policies are referenced
but not defined.
- Building lightweight pre-flight scripts for network automation.
## How It Works
Treat config validation as layered evidence, not as a complete parser. Regex
checks are useful for pre-flight warnings, but final approval still needs a
network engineer to review intent, platform syntax, and rollback steps.
Validate in this order:
1. Destructive commands.
2. Credential and management-plane exposure.
3. Duplicate addresses and overlapping subnets.
4. Stale references to ACLs, route-maps, prefix-lists, and interfaces.
5. Operational hygiene such as NTP, timestamps, remote logging, and banners.
## Dangerous Command Detection
```python
import re

DANGEROUS_PATTERNS: list[tuple[re.Pattern[str], str]] = [
    (re.compile(r"\breload\b", re.I), "reload causes downtime"),
    (re.compile(r"\berase\s+(startup|nvram|flash)", re.I), "erases persistent storage"),
    (re.compile(r"\bformat\b", re.I), "formats a device filesystem"),
    (re.compile(r"\bno\s+router\s+(bgp|ospf|eigrp)\b", re.I), "removes a routing process"),
    (re.compile(r"\bno\s+interface\s+\S+", re.I), "removes interface configuration"),
    (re.compile(r"\baaa\s+new-model\b", re.I), "changes authentication behavior"),
    (re.compile(r"\bcrypto\s+key\s+(zeroize|generate)\b", re.I), "changes device SSH keys"),
]


def find_dangerous_commands(lines: list[str]) -> list[dict[str, str | int]]:
    findings = []
    for line_number, line in enumerate(lines, start=1):
        stripped = line.strip()
        for pattern, reason in DANGEROUS_PATTERNS:
            if pattern.search(stripped):
                findings.append({
                    "line": line_number,
                    "command": stripped,
                    "reason": reason,
                })
    return findings
```
## Duplicate IPs And Subnet Overlaps
```python
import ipaddress
import re
from collections import Counter

IP_ADDRESS_RE = re.compile(
    r"^\s*ip address\s+"
    r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3})\s+"
    r"(?P<mask>\d{1,3}(?:\.\d{1,3}){3})\b",
    re.I | re.M,
)


def extract_interfaces(config: str) -> list[dict[str, str]]:
    results = []
    current = None
    for line in config.splitlines():
        if line.startswith("interface "):
            current = line.split(maxsplit=1)[1]
            continue
        match = IP_ADDRESS_RE.match(line)
        if current and match:
            ip = match.group("ip")
            mask = match.group("mask")
            network = ipaddress.ip_interface(f"{ip}/{mask}").network
            results.append({"interface": current, "ip": ip, "network": str(network)})
    return results


def find_duplicate_ips(config: str) -> list[str]:
    ips = [entry["ip"] for entry in extract_interfaces(config)]
    counts = Counter(ips)
    return sorted(ip for ip, count in counts.items() if count > 1)


def find_subnet_overlaps(config: str) -> list[tuple[str, str]]:
    networks = [ipaddress.ip_network(entry["network"]) for entry in extract_interfaces(config)]
    overlaps = []
    for index, left in enumerate(networks):
        for right in networks[index + 1:]:
            if left.overlaps(right):
                overlaps.append((str(left), str(right)))
    return overlaps
```
## Management-Plane Checks
Parse VTY blocks by section so access-class checks do not spill across unrelated
lines.
```python
import re


def iter_blocks(config: str, starts_with: str) -> list[str]:
    """Return each top-level block that starts with the given prefix."""
    blocks = []
    current: list[str] = []
    for line in config.splitlines():
        if line.startswith(starts_with):
            if current:
                blocks.append("\n".join(current))
            current = [line]
            continue
        if current:
            # A non-empty, unindented line ends the current block.
            if line and not line.startswith(" "):
                blocks.append("\n".join(current))
                current = []
            else:
                current.append(line)
    if current:
        blocks.append("\n".join(current))
    return blocks


def check_vty_blocks(config: str) -> list[str]:
    issues = []
    for block in iter_blocks(config, "line vty"):
        # "transport input all" permits Telnet as well as an explicit "telnet".
        if re.search(r"transport\s+input\s+.*\b(?:telnet|all)\b", block, re.I):
            issues.append("VTY allows Telnet; require SSH only.")
        if not re.search(r"\baccess-class\s+\S+\s+in\b", block, re.I):
            issues.append("VTY block has no inbound access-class source restriction.")
        if not re.search(r"\bexec-timeout\s+\d+\s+\d+\b", block, re.I):
            issues.append("VTY block has no explicit exec-timeout.")
    return issues
```
## Security Hygiene Checks
```python
import re

SECURITY_PATTERNS = [
    (re.compile(r"\bsnmp-server community\s+(public|private)\b", re.I),
     "default SNMP community configured"),
    (re.compile(r"\bsnmp-server community\s+\S+", re.I),
     "SNMPv2 community string configured; prefer SNMPv3 authPriv"),
    (re.compile(r"\bip ssh version 1\b", re.I),
     "SSH version 1 enabled"),
    (re.compile(r"\benable password\b", re.I),
     "enable password is present; use enable secret"),
    (re.compile(r"\busername\s+\S+\s+password\b", re.I),
     "local username uses password instead of secret"),
]

BEST_PRACTICE_PATTERNS = [
    (re.compile(r"\bntp server\b", re.I), "NTP server"),
    (re.compile(r"\bservice timestamps\b", re.I), "log timestamps"),
    (re.compile(r"\blogging\s+\S+", re.I), "logging destination or buffer"),
    (re.compile(r"\bsnmp-server group\s+\S+\s+v3\s+priv\b", re.I), "SNMPv3 authPriv group"),
    (re.compile(r"\bbanner\s+(login|motd)\b", re.I), "login banner"),
]


def check_security(config: str) -> list[str]:
    return [message for pattern, message in SECURITY_PATTERNS if pattern.search(config)]


def check_missing_hygiene(config: str) -> list[str]:
    return [
        f"Missing {description}"
        for pattern, description in BEST_PRACTICE_PATTERNS
        if not pattern.search(config)
    ]
```
## Examples
### Change-Window Preflight
1. Run dangerous-command checks on the exact snippet to be pasted.
2. Run duplicate IP and subnet overlap checks against the full candidate config.
3. Confirm every referenced ACL, route-map, and prefix-list exists.
4. Confirm rollback commands and out-of-band access before any management-plane
change.
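Step 3 above has no helper elsewhere in this skill, so here is one possible sketch of a stale-reference check. It assumes IOS-style syntax and covers only the common reference forms (`ip access-group`, `access-class`, and `route-map` or `prefix-list` used after another keyword); ACLs referenced through `match ip address <name>` are not detected. The pattern tables and `find_stale_references` are hypothetical names.

```python
import re

# Definitions start an unindented line in IOS-style configs.
DEFINED = {
    "acl": re.compile(r"^(?:ip access-list \S+|access-list) (\S+)"),
    "route-map": re.compile(r"^route-map (\S+)"),
    "prefix-list": re.compile(r"^ip prefix-list (\S+)"),
}

# References appear after another keyword on the same line.
REFERENCED = {
    "acl": re.compile(r"\b(?:ip access-group|access-class) (\S+)"),
    "route-map": re.compile(r"\S\s+route-map (\S+)"),
    "prefix-list": re.compile(r"\S\s+prefix-list (\S+)"),
}


def find_stale_references(config: str) -> list[str]:
    """List names that are referenced somewhere but never defined."""
    defined: dict[str, set[str]] = {kind: set() for kind in DEFINED}
    referenced: dict[str, dict[str, int]] = {kind: {} for kind in REFERENCED}
    for number, line in enumerate(config.splitlines(), start=1):
        stripped = line.strip()
        is_definition = False
        for kind, pattern in DEFINED.items():
            match = pattern.match(stripped)
            if match:
                defined[kind].add(match.group(1))
                is_definition = True
        if is_definition:
            continue  # a definition line is never counted as a reference
        for kind, pattern in REFERENCED.items():
            match = pattern.search(stripped)
            if match:
                # Remember only the first line that uses each name.
                referenced[kind].setdefault(match.group(1), number)
    return [
        f"line {number}: {kind} '{name}' is referenced but not defined"
        for kind, names in referenced.items()
        for name, number in sorted(names.items(), key=lambda item: item[1])
        if name not in defined[kind]
    ]
```

Definitions are collected over the whole config before reporting, so an ACL defined after the interface that uses it is not flagged.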
### Automation Preflight
Use validation as a blocking gate before Netmiko, NAPALM, Ansible, or vendor API
automation pushes a generated config. Fail closed on dangerous commands and
credentials. Warn on best-practice gaps that are outside the change scope.
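One way to express the fail-closed gate described above is sketched below. The `Finding` type, the category names in `BLOCKING_CATEGORIES`, and `preflight` are all hypothetical; the sketch assumes each validator is a function that takes the config text and returns a list of findings.

```python
from dataclasses import dataclass
from typing import Callable

# Categories that must stop the push (assumption: adjust to local policy).
BLOCKING_CATEGORIES = {"dangerous-command", "credential"}


@dataclass(frozen=True)
class Finding:
    category: str
    message: str


Check = Callable[[str], list[Finding]]


def preflight(config: str, checks: list[Check]) -> tuple[bool, list[str], list[str]]:
    """Run every check; return (allowed, blockers, warnings)."""
    blockers: list[str] = []
    warnings: list[str] = []
    for check in checks:
        try:
            findings = check(config)
        except Exception as error:
            # Fail closed: a crashing validator blocks the push.
            blockers.append(f"{check.__name__} crashed: {error}")
            continue
        for finding in findings:
            if finding.category in BLOCKING_CATEGORIES:
                blockers.append(finding.message)
            else:
                warnings.append(finding.message)
    return (not blockers, blockers, warnings)
```

The automation job would push only when the first element of the result is `True`, and log the warnings alongside the change record.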
## Anti-Patterns
- Treating regex validation as a device parser.
- Applying generated config without a dry-run diff.
- Recommending SNMPv2 community strings as a monitoring requirement.
- Checking VTY blocks with regex that can accidentally span unrelated sections.
- Testing firewall behavior by disabling ACLs instead of reading counters/logs.
## See Also
- Agent: `network-config-reviewer`
- Agent: `network-troubleshooter`
- Skill: `network-interface-health`


@@ -0,0 +1,152 @@
---
name: network-interface-health
description: Diagnose interface errors, drops, CRCs, duplex mismatches, flapping, speed negotiation issues, and counter trends on routers, switches, and Linux hosts.
origin: community
---
# Network Interface Health
Use this skill when a network symptom might be caused by a physical link, switch
port, cable, transceiver, duplex setting, or congested interface.
## When to Use
- A host or VLAN has packet loss, latency spikes, or intermittent reachability.
- A switch or router interface shows CRCs, runts, giants, drops, resets, or flaps.
- You need to compare both ends of a link before replacing hardware.
- A change window needs before/after interface counter evidence.
- Monitoring reports rising `ifInErrors`, `ifOutErrors`, or `ifOutDiscards`.
## How It Works
Interface counters are evidence, but the trend matters more than the absolute
number. Capture a baseline, wait a measurement interval, capture again, then
compare increments.
```text
show interfaces <interface>
show interfaces <interface> status
show logging | include <interface>|changed state|line protocol
```
On Linux hosts:
```text
ip -s link show <interface>
ethtool <interface>
ethtool -S <interface>
```
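The baseline-then-compare approach can be reduced to a small helper once the counters from the commands above have been collected into dictionaries. This is a minimal sketch; `counter_deltas` and the counter names in the usage note are hypothetical, and how the snapshots are gathered is left open.

```python
from typing import Optional


def counter_deltas(
    baseline: dict[str, int],
    current: dict[str, int],
    interval_seconds: float,
) -> dict[str, Optional[dict[str, float]]]:
    """Compare two counter snapshots taken interval_seconds apart."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    report: dict[str, Optional[dict[str, float]]] = {}
    for name, before in baseline.items():
        if name not in current:
            continue  # counter missing from the second capture
        delta = current[name] - before
        if delta < 0:
            # Counter went backwards: cleared, wrapped, or the device reloaded.
            report[name] = None
            continue
        report[name] = {
            "delta": delta,
            "per_minute": delta * 60 / interval_seconds,
        }
    return report
```

For example, a CRC counter that moves from 120 to 180 over 300 seconds yields a delta of 60 and a rate of 12 per minute, while a counter that decreases is reported as `None` so it is not mistaken for a clean interval.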
## Counter Reference
| Counter | Meaning | Common cause |
| --- | --- | --- |
| CRC | Received frame checksum failed | Bad cable, dirty fiber, bad optic, duplex mismatch |
| input errors | Aggregate receive-side errors | Check sub-counters before concluding |
| runts | Frames below minimum Ethernet size | Duplex mismatch, collision domain, faulty NIC |
| giants | Frames larger than expected MTU | MTU mismatch or jumbo-frame boundary |
| input drops | Device could not accept inbound packets | Burst, oversubscription, CPU path, queue pressure |
| output drops | Egress queue discarded packets | Congestion, QoS policy, undersized uplink |
| resets | Interface hardware reset | Flapping, keepalive, driver, optic, power |
| collisions | Ethernet collision counter | Half duplex or negotiation mismatch |
## Diagnosis Flow
### CRCs Or Input Errors
1. Confirm counters are incrementing, not just historical.
2. Check both ends of the link. Receive-side errors mean the frame was already
damaged when it arrived, so suspect the cable, optic, or far-end transmitter
before the port that reports the error.
3. Replace patch cable or clean/replace fiber and optics.
4. Confirm speed/duplex settings match on both sides.
5. Check logs for flap events around the same timestamp.
### Drops
1. Separate input drops from output drops.
2. Compare interface rate against capacity.
3. Check QoS policy, queue counters, and whether the link is an oversubscribed
uplink.
4. Treat queue tuning as secondary. First prove whether the link is congested.
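Steps 2 and 4 above come down to simple arithmetic on byte counters. The sketch below assumes two readings of an interface's output byte counter and a known link speed; both function names and the 80 percent threshold are illustrative assumptions, not vendor guidance.

```python
def utilization_percent(
    bytes_before: int,
    bytes_after: int,
    interval_seconds: float,
    link_bps: int,
) -> float:
    """Average utilization over the interval, as a percentage of link capacity."""
    if interval_seconds <= 0 or link_bps <= 0:
        raise ValueError("interval_seconds and link_bps must be positive")
    bits = (bytes_after - bytes_before) * 8
    return bits * 100 / (interval_seconds * link_bps)


def assess_output_drops(
    drop_delta: int,
    utilization: float,
    busy_threshold: float = 80.0,  # assumed cut-off for "busy"; tune per site
) -> str:
    """Give a first-pass reading of new output drops against utilization."""
    if drop_delta <= 0:
        return "no new output drops in this interval"
    if utilization >= busy_threshold:
        return "drops with high utilization: congestion is the likely cause"
    return "drops with low average utilization: check bursts, QoS policy, and queue counters"
```

An average over tens of seconds hides short bursts, so low utilization with rising drops is a reason to look at queue counters rather than proof that the link is idle.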
### Duplex And Speed
Prefer auto-negotiation on modern Ethernet links when both sides support it. If
one side must be fixed, configure both sides explicitly and document why. Never
mix fixed speed/duplex on one side with auto on the other: the auto side cannot
negotiate and typically falls back to half duplex, which produces a duplex
mismatch.
```text
show interfaces <interface> | include duplex|speed
```
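Once the operational duplex and speed have been read from both ends, the comparison itself is mechanical. A small sketch, assuming each end is represented as a dictionary with `name`, `duplex`, and `speed` keys; `link_mismatches` is a hypothetical helper, and values from different vendors may need normalizing before they compare equal.

```python
def link_mismatches(local: dict[str, str], remote: dict[str, str]) -> list[str]:
    """Compare operational duplex and speed reported by both ends of one link."""
    issues = []
    for field in ("duplex", "speed"):
        left = local[field].strip().lower()
        right = remote[field].strip().lower()
        if left != right:
            issues.append(
                f"{field} mismatch: {local['name']}={local[field]} "
                f"vs {remote['name']}={remote[field]}"
            )
    # Half duplex on a modern switched link is worth flagging on its own.
    for side in (local, remote):
        if side["duplex"].strip().lower() == "half":
            issues.append(f"{side['name']} is running half duplex")
    return issues
```

An empty result only means the two reports agree; it does not rule out a bad cable or optic.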
## Safe Parser Example
Slice each interface block from one header to the next. Do not use an arbitrary
character window; large interface blocks can cause counters to be missed or
assigned to the wrong port.
```python
import re
from typing import Any

HEADER_RE = re.compile(
    r"^(?P<name>\S+) is (?P<status>(?:administratively )?down|up), "
    r"line protocol is (?P<protocol>up|down)",
    re.I | re.M,
)
ERROR_RE = re.compile(r"(?P<input>\d+) input errors, (?P<crc>\d+) CRC", re.I)
# This pattern captures the "output errors" counter, not output drops.
OUTPUT_ERROR_RE = re.compile(r"(?P<output>\d+) output errors", re.I)
DUPLEX_RE = re.compile(r"(?P<duplex>Full|Half|Auto)-duplex,\s+(?P<speed>[^,]+)", re.I)


def parse_show_interfaces(raw: str) -> list[dict[str, Any]]:
    headers = list(HEADER_RE.finditer(raw))
    interfaces = []
    for index, header in enumerate(headers):
        # Each block runs from this header to the next header (or end of output).
        end = headers[index + 1].start() if index + 1 < len(headers) else len(raw)
        block = raw[header.start():end]
        errors = ERROR_RE.search(block)
        output_errors = OUTPUT_ERROR_RE.search(block)
        duplex = DUPLEX_RE.search(block)
        interfaces.append({
            "name": header.group("name"),
            "status": header.group("status"),
            "protocol": header.group("protocol"),
            "duplex": duplex.group("duplex") if duplex else "unknown",
            "speed": duplex.group("speed").strip() if duplex else "unknown",
            "input_errors": int(errors.group("input")) if errors else 0,
            "crc_errors": int(errors.group("crc")) if errors else 0,
            "output_errors": int(output_errors.group("output")) if output_errors else 0,
        })
    return interfaces
```
## Examples
### CRCs On One Switch Port
1. Capture counters on the local port.
2. Capture counters on the connected remote port.
3. Replace the cable or optic before changing routing or firewall rules.
4. Clear counters only after recording the baseline.
5. Recheck after a fixed interval.
### Internet Slow But LAN Is Fine
1. Check WAN interface drops/errors.
2. Check LAN uplink utilization and output drops.
3. Check gateway CPU if the WAN link is clean but throughput is still low.
4. Compare wired and wireless tests before blaming upstream service.
## Anti-Patterns
- Clearing counters before saving a baseline.
- Looking at only one side of a link.
- Assuming all historical CRCs are active problems without a time window.
- Mixing auto-negotiation on one side with fixed speed/duplex on the other.
- Treating output drops as a cable problem before checking congestion.
## See Also
- Agent: `network-troubleshooter`
- Skill: `network-config-validation`
- Skill: `homelab-network-setup`