Most teams say they need “uptime monitoring,” but what they actually need is the right mix of check types. A single HTTP probe cannot tell you whether your DNS is wrong, your SSL certificate is about to expire, your cron jobs stopped running, your Redis instance is refusing connections, or your users in one region are seeing severe packet loss. This guide breaks down the major monitoring checks modern teams should run, what each one catches, where each one falls short, and how to combine them into a monitoring stack that detects real failures before customers do.
Monitoring fails when teams pick a check because it is easy, not because it matches the failure mode they care about. A simple GET /health monitor is useful, but it is not enough. If your TCP port is open but the app returns garbage, TCP alone is useless. If your website returns 200 OK but your login JSON is malformed, ping and TCP both stay green. If your cron job never runs, no pull-based HTTP check will notice. If a gRPC server is reachable but reports NOT_SERVING, a generic port check misses the incident entirely.
The right question is not “Do we have monitoring?” The right question is: Which class of failure are we trying to detect, and which check is designed to catch it?
| Check Type | Best For | Detects | Misses |
|---|---|---|---|
| HTTP / HTTPS | Websites, APIs, web apps | Status codes, latency, body validation, JSON correctness | Low-level network issues hidden behind a green HTTP response |
| TCP Port | Ports and services | Open port, connection acceptance, connect latency | Application correctness after connect |
| Ping / ICMP | Basic reachability | Host reachability, packet loss, RTT | App-layer failures |
| DNS | Domains and failover records | Resolution success, unexpected IPs | Whether the app behind the IP actually works |
| SSL / TLS | HTTPS and secure endpoints | Handshake success, cert validity, expiry warnings | Business logic failures |
| Database | PostgreSQL, Redis | Connectivity, auth, query execution | HTTP/frontend issues |
| SMTP | Email delivery infrastructure | Banner, EHLO, STARTTLS support, auth readiness | Mailbox-level deliverability problems |
| WebSocket | Realtime apps | Handshake, message send/receive, expected response text | Non-realtime paths |
| gRPC Health | gRPC microservices | grpc.health.v1.Health serving state | Endpoints without health implementation |
| Heartbeat / Push | Cron jobs, queues, workers | Missing scheduled execution, delayed jobs | Request/response path issues |
| Network Quality | Latency-sensitive workloads | Jitter, packet loss, RTT variance | Application correctness |
For most software companies, HTTP monitoring is the highest-value check type because it maps most closely to the customer experience. If your frontend, REST API, admin portal, public status page, or webhook endpoint is delivered over HTTP, this is where your monitoring strategy should start.
A mature HTTP check is much more than “does the server return 200.” Modern HTTP checks should support:
- Multiple methods: GET, POST, PUT, and DELETE
- Custom headers and request bodies for authenticated or write-path checks
- Keyword rules, including not_contains for error banners or maintenance pages
- Regex rules, JSON path checks, and JSON schema validation
- Latency thresholds alongside status assertions

HTTP status alone is a weak signal. Many production failures still return 200 OK: error pages rendered with a success code, empty or malformed JSON payloads, and maintenance banners served as normal pages.
That is why keyword checks, regex rules, JSON path validation, and JSON schema validation are so valuable. They catch the incidents users actually notice.
Example HTTP monitor strategy:
- Method: GET
- URL: https://api.example.com/v1/orders
- Header: Authorization: Bearer <token>
- Expected status: 200
- Validation mode: schema + path checks
- Path checks: $.data.length > 0, $.meta.request_id exists
- Latency threshold: 800ms
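The strategy above can be sketched as a single validation function. This is an illustration, not any particular product's API; the function name and the failure-list shape are assumptions, and the path checks mirror the $.data and $.meta.request_id rules listed above.

```python
import json

def validate_http_result(status: int, body: str, latency_ms: float,
                         expected_status: int = 200,
                         latency_threshold_ms: float = 800.0) -> list:
    """Return a list of failure reasons; an empty list means the check passes."""
    failures = []
    if status != expected_status:
        failures.append(f"status {status} != {expected_status}")
    if latency_ms > latency_threshold_ms:
        failures.append(f"latency {latency_ms:.0f}ms over threshold")
    try:
        doc = json.loads(body)
    except ValueError:
        return failures + ["body is not valid JSON"]
    if not isinstance(doc, dict):
        return failures + ["body is not a JSON object"]
    # Path checks mirroring "$.data.length > 0" and "$.meta.request_id exists"
    if not doc.get("data"):
        failures.append("$.data is empty or missing")
    if "request_id" not in doc.get("meta", {}):
        failures.append("$.meta.request_id missing")
    return failures
```

Returning a list of reasons rather than a bare boolean makes alert messages actionable: the notification can say exactly which assertion failed.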
TCP checks answer a lower-level question than HTTP: can a client open a socket to this host and port? That is perfect when the protocol itself is not HTTP or when you want to isolate a network problem from an application problem.
TCP monitors are ideal for services that do not speak HTTP: database ports, mail servers, message queues, and custom daemons with their own protocols.
If the port is closed, refusing connections, or timing out, you know the problem is likely below the application layer. TCP connect timing also gives you a simple but useful latency signal.
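A minimal TCP connect probe needs only the standard library. The helper name tcp_check is illustrative; it captures exactly the two signals described above: whether the socket opens, and how long the connect took.

```python
import socket
import time

def tcp_check(host: str, port: int, timeout: float = 5.0):
    """Try a TCP connect; return (True, connect_latency_ms) or (False, reason)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, (time.monotonic() - start) * 1000.0
    except OSError as exc:
        # Closed port, refused connection, or timeout all land here.
        return False, str(exc)
```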
Ping is the oldest check in the book, and it still has value. It tells you whether a host is reachable and gives you round-trip latency and packet loss. For network operations teams, infrastructure engineers, and hybrid-cloud deployments, that baseline signal is still useful.
Good use cases include baseline reachability checks for servers, network devices, and hybrid-cloud links.
Ping checks typically measure multiple packets, calculate average RTT, and track loss percentage. That gives you a better signal than a single ping.
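That multi-packet summary is simple arithmetic over the raw samples. A sketch, assuming lost packets are recorded as None (ping_stats is a hypothetical helper; actually sending ICMP packets requires elevated privileges and is out of scope here):

```python
def ping_stats(rtts_ms: list) -> dict:
    """Summarize a ping batch; None marks a packet that never came back."""
    received = [r for r in rtts_ms if r is not None]
    sent = len(rtts_ms)
    return {
        "sent": sent,
        "loss_pct": 100.0 * (sent - len(received)) / sent if sent else 0.0,
        "avg_rtt_ms": sum(received) / len(received) if received else None,
    }
```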
But ping has limits. Many hosts block ICMP by policy. And even when ICMP is allowed, a ping-successful host can still have a dead web app, a broken DB, or an overloaded runtime.
DNS is a surprisingly common source of outages, especially during cutovers, failovers, CDN migrations, and certificate rotations. DNS monitors verify that a domain resolves and, when needed, that it resolves to the expected IP addresses.
DNS checks are useful when you are mid-cutover, relying on DNS-based failover, migrating to or from a CDN, or simply need assurance that a domain still points where you expect.
One of the easiest mistakes in infrastructure is assuming “the app is down” when the real problem is “the domain now points somewhere else.” DNS checks prevent that confusion.
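The "expected IPs" variant of a DNS check can be sketched with the standard-library resolver. The helper name resolves_to is illustrative, and a production check would typically query specific record types against specific nameservers rather than the system resolver:

```python
import socket

def resolves_to(domain: str, expected_ips: set) -> bool:
    """True when every address the resolver returns is in the expected set."""
    addresses = {info[4][0] for info in socket.getaddrinfo(domain, None)}
    return bool(addresses) and addresses <= expected_ips
```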
Certificate-expiry outages are painful because they are almost entirely preventable. SSL/TLS checks exist to make sure you never learn about an expired cert from your customers.
A good SSL check validates that the TLS handshake succeeds, that the certificate is currently valid for the hostname, and that expiry is far enough away to raise a warning long before it becomes an outage.
SSL checks should be attached to every public HTTPS service: apps, APIs, status pages, admin portals, gRPC TLS endpoints, and WebSocket endpoints using wss://.
They are especially important in environments with custom domains, automated certificate renewals, and multi-tenant infrastructure. Renewal jobs fail more often than teams expect.
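A sketch of the expiry half of such a check. It assumes the notAfter date format that Python's ssl module returns from getpeercert(); check_certificate performs a real network call, while the date math is separated out so it can be reasoned about on its own.

```python
import socket
import ssl
from datetime import datetime, timezone

CERT_DATE_FORMAT = "%b %d %H:%M:%S %Y %Z"  # e.g. "Jun  1 12:00:00 2030 GMT"

def days_until_expiry(not_after: str, now: datetime = None) -> float:
    """Days remaining before a certificate's notAfter timestamp."""
    expires = datetime.strptime(not_after, CERT_DATE_FORMAT).replace(
        tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (expires - now).total_seconds() / 86400.0

def check_certificate(host: str, port: int = 443, warn_days: float = 14.0):
    """Fetch the peer certificate and flag upcoming expiry (network call)."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    remaining = days_until_expiry(cert["notAfter"])
    return remaining, remaining < warn_days
```

Because create_default_context() verifies the chain and hostname by default, an invalid certificate fails the handshake before the expiry math even runs, which is exactly the behavior a monitor wants.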
When a database is unavailable, the app usually follows shortly after. But sometimes the app still serves cached pages or stale responses, masking the real issue. Database-specific checks let you monitor the dependency directly.
For production teams, database checks are invaluable because they can validate not only that the socket opens, but also that authentication works and a test query succeeds. That is a much better signal than a bare TCP connect.
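Production checks normally go through the database driver itself, but as a dependency-free illustration, Redis's RESP wire protocol is simple enough to probe over a raw socket. Both function names here are hypothetical helpers:

```python
import socket

def encode_resp_command(*parts: str) -> bytes:
    """Encode a command in Redis's RESP format, e.g. PING -> *1\\r\\n$4\\r\\nPING\\r\\n."""
    out = [f"*{len(parts)}\r\n".encode()]
    for part in parts:
        data = part.encode()
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)

def redis_ping(host: str, port: int = 6379, timeout: float = 5.0) -> bool:
    """True if the server answers PING with +PONG (performs a network call)."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(encode_resp_command("PING"))
            return sock.recv(64).startswith(b"+PONG")
    except OSError:
        return False
```

The same pattern, swapped to a driver call like a SELECT 1 query, is what distinguishes a database check from a bare port check: it proves the engine is actually answering, not just listening.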
Common use cases include verifying that PostgreSQL accepts connections, authenticates, and answers a lightweight test query such as SELECT 1, and that Redis responds to commands rather than merely keeping its port open.

Email is still business-critical for password resets, verification flows, receipts, billing alerts, and incident notifications. SMTP monitors verify that the mail server can be reached, that the server banner is sane, that EHLO works, and that STARTTLS and authentication are available when required.
SMTP checks are excellent for transactional email infrastructure: the relays behind password resets, receipts, verification flows, and alert notifications.
SMTP checks are not the same as inbox placement or deliverability monitoring. They tell you whether the mail server path is healthy enough to accept traffic, not whether Gmail will place your message in Promotions.
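Python's smtplib can drive the EHLO conversation; the capability-parsing step can be sketched separately so it is testable without a live relay. parse_ehlo_extensions is an illustrative helper operating on the decoded multi-line 250 reply:

```python
def parse_ehlo_extensions(response: str) -> set:
    """Extract extension keywords (STARTTLS, AUTH, SIZE, ...) from an EHLO reply."""
    extensions = set()
    for line in response.splitlines()[1:]:  # first line is the server greeting
        if line[:4] in ("250-", "250 "):
            parts = line[4:].split()
            if parts:
                extensions.add(parts[0].upper())
    return extensions
```

A check would then assert that, say, STARTTLS is present before attempting the upgrade, turning "the server dropped TLS support" into an alert instead of a silent delivery failure.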
Realtime systems break in ways HTTP checks cannot see. A WebSocket server might accept the initial upgrade, then fail to respond to subscription or heartbeat messages. Or it might accept some event types but not others.
That is why WebSocket monitoring should test more than the handshake. The most useful pattern is a full round trip: open the connection, send a known message, and assert that the response contains the expected text within a timeout.
This is perfect for chat apps, trading feeds, collaborative tools, dashboards, and streaming updates. If your product experience depends on socket-based state sync, a plain HTTP monitor is only telling part of the story.
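The handshake step itself is fully specified by RFC 6455: the server must echo a SHA-1/base64 transform of the client's Sec-WebSocket-Key header. A monitor can recompute the expected value to validate the upgrade before moving on to message round trips:

```python
import base64
import hashlib

WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"  # fixed GUID from RFC 6455

def expected_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value a compliant server must echo."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")
```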
gRPC services should be monitored with a gRPC-aware check, not just TCP or generic HTTP. The standard approach is to call the grpc.health.v1.Health service and inspect the returned serving status.
This matters because a gRPC process can be reachable while still reporting:

- NOT_SERVING
- SERVICE_UNKNOWN

Good gRPC health monitoring lets you query a specific service name or the server as a whole (an empty service name), assert the expected serving status, and use TLS where the endpoint requires it.
If your platform relies on internal gRPC services, this check type is often the fastest route to accurate incident detection.
Some systems are not meant to be polled. A nightly ETL pipeline, hourly billing reconciliation job, queue consumer, or backup worker may not expose a stable endpoint at all. For those workflows, heartbeat monitoring is the right pattern.
Instead of the monitor calling your service, your service calls the monitor. Each successful run pings a unique URL or tokenized endpoint. If the expected heartbeat does not arrive within the configured interval plus grace period, the check goes down.
Heartbeat checks are ideal for cron jobs, queue consumers, backup workers, and any scheduled process whose silence is itself the failure signal.
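The staleness rule described above, interval plus grace period, reduces to a few lines of logic. A minimal sketch, assuming UTC timestamps and an illustrative function name:

```python
from datetime import datetime, timedelta, timezone

def heartbeat_is_down(last_ping: datetime, interval: timedelta,
                      grace: timedelta, now: datetime = None) -> bool:
    """Down when no heartbeat has arrived within one interval plus the grace."""
    now = now or datetime.now(timezone.utc)
    return now - last_ping > interval + grace
```

The grace period matters in practice: a nightly job that normally finishes at 02:00 but occasionally runs ten minutes long should not page anyone at 02:01.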
Some workloads are not merely availability-sensitive. They are quality-sensitive. Voice, streaming, low-latency gaming, remote desktop, live collaboration, trading systems, and edge workloads can all be technically “up” while still being unusable because network quality is poor.
Network quality checks go beyond reachability and measure jitter, packet loss, and RTT variance across a stream of probes.
These checks are especially useful for multi-region services, WAN links, and customer-facing products where stability matters as much as raw uptime.
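Those metrics fall out of a batch of RTT samples. This sketch computes jitter as the mean absolute difference between consecutive replies, one common definition; lost packets are recorded as None:

```python
def network_quality(rtts_ms: list) -> dict:
    """Loss, mean RTT, and jitter (mean absolute delta of consecutive replies)."""
    received = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(received)) / len(rtts_ms)
    if not received:
        return {"loss_pct": loss_pct, "mean_rtt_ms": None, "jitter_ms": None}
    jitter = (sum(abs(b - a) for a, b in zip(received, received[1:]))
              / (len(received) - 1)) if len(received) > 1 else 0.0
    return {"loss_pct": loss_pct,
            "mean_rtt_ms": sum(received) / len(received),
            "jitter_ms": jitter}
```

A link with a healthy mean RTT but high jitter is exactly the "technically up, practically unusable" case these checks exist to catch.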
The strongest monitoring setups use several complementary checks per critical service. A few example combinations:

- Run HTTP checks for pages and APIs, SSL checks for certificate expiry, and DNS checks for the primary domain. Add heartbeat checks for scheduled publishing or cache warmup jobs.
- Use HTTP checks with response validation, latency thresholds, SSL checks, and heartbeat checks for background workers like webhooks, retry queues, and async processors.
- Pair HTTP checks with database checks against PostgreSQL or Redis, plus TCP checks for dependent services if you want fast infrastructure-level visibility.
- Use WebSocket checks, HTTP checks for auth/session APIs, and network quality checks from important regions. If the realtime backend is gRPC-based, add gRPC health checks too.
- Monitor external entry points with HTTP, internal critical services with gRPC or TCP depending on protocol, dependencies with database checks, and all scheduled jobs with heartbeats.
| If you need to know... | Use this check | Add this companion check |
|---|---|---|
| Is my API returning correct JSON? | HTTP with JSON validation | SSL, heartbeat for workers |
| Is this host reachable? | Ping | TCP or HTTP |
| Is the service port open? | TCP | Protocol-aware check |
| Will my certificate expire soon? | SSL | HTTP or WebSocket |
| Did the cron job actually run? | Heartbeat | HTTP for user-visible output |
| Can the mail path accept traffic? | SMTP | HTTP for app workflows |
| Is my gRPC service serving? | gRPC health | TCP, SSL if public |
| Why are calls failing for one region only? | Network quality | HTTP from multiple regions |
Even though every protocol is different, the best monitoring platforms share several important capabilities across check types: response and content validation, configurable latency thresholds, multi-location probing, and sensible retry and grace-period logic.
These features are often more important than the raw number of check types supported. A tool with ten check types but weak validation will still miss the incidents you care about.
- Layer your checks. Critical services almost always need at least two or three layers, for example HTTP + SSL + heartbeat for async jobs.
- Do not trust health endpoints alone. They are valuable, but they can drift away from reality, so monitor real paths that real users or systems depend on.
- Watch the background. Many incidents begin in schedulers, queues, mail relays, or background workers before users notice symptoms elsewhere.
- Validate content. If you are not validating content, schema, or expected fields, you are only partially monitoring the service.
- Probe from multiple locations. A service can be healthy from one geography and degraded from another, and single-location monitoring creates blind spots.
If you want a pragmatic, high-signal rollout, start here:

- HTTP checks with response validation on every customer-facing endpoint
- SSL checks on every public HTTPS domain
- DNS checks on your primary domains
- Heartbeat checks on every cron job and background worker
- Database, TCP, gRPC, WebSocket, and network quality checks as your architecture requires
That stack covers the majority of real production incidents without overwhelming the team.
The best monitoring setups are not the ones with the most dashboards. They are the ones with the right checks for the right failure modes. Mix HTTP, TCP, DNS, SSL, database, SMTP, WebSocket, gRPC, heartbeat, and network-quality checks based on how your system actually breaks.