๐Ÿ” Observability Practices: A Practical Guide With Real-World Examples

This content originally appeared on DEV Community and was authored by CESAR NIKOLAS CAMAC MELENDEZ

Modern software systems are more distributed, dynamic, and complex than ever. Microservices, serverless functions, containers, and event-driven architectures make traditional monitoring insufficient.
To understand what is actually happening inside your application, you need observability.

This article explains the foundational pillars of observability, best practices, common tools, and includes a hands-on real-world example using Grafana + Prometheus for metrics and ELK Stack for logs.

🚦 What Is Observability?

Observability is the ability to understand the internal state of a system based solely on the data it produces, such as metrics, logs, and traces.

Unlike classical monitoring, which answers “Is the system up?”, observability answers:

  • Why is the system slow?
  • Where is the latency coming from?
  • Which dependency failed?
  • What changed recently that caused errors?

📊 The Three Pillars of Observability

1. Metrics

Numeric measurements collected over time.
Examples:

  • CPU usage
  • Request throughput
  • Error rate
  • Database latency

Tools: Prometheus, Grafana, Datadog Metrics, AWS CloudWatch Metrics.

2. Logs

Immutable records about events happening in the system.
Examples:

  • Info logs
  • Warnings
  • Errors
  • Audit logs

Tools: ELK Stack, Datadog Logs, Azure Log Analytics, Loki.

3. Traces

A trace follows a single request across distributed components.
Examples:

  • Microservices call chain
  • Latency across services
  • Errors from downstream dependencies

Tools: Jaeger, Zipkin, OpenTelemetry, AWS X-Ray, New Relic.

🧱 Core Observability Practices

โœ”๏ธ 1. Use Structured Logging

Instead of plain text logs, use JSON.

{
  "timestamp": "2025-11-23T10:15:20Z",
  "level": "ERROR",
  "service": "orders-api",
  "message": "Payment gateway timeout",
  "orderId": 10422,
  "durationMs": 3200
}

Structured logs allow better filtering and search on platforms like ELK, Datadog, and CloudWatch.
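As a minimal sketch of the idea, assuming logs are written to stdout as one JSON object per line, a small helper could build entries shaped like the one above. The name `formatLog` is hypothetical and not part of any library.

```javascript
// Hypothetical helper: builds one structured log line as a JSON string.
function formatLog(level, service, message, fields = {}) {
  return JSON.stringify({
    timestamp: new Date().toISOString(),
    level,
    service,
    message,
    ...fields // extra context such as orderId or durationMs
  });
}

// Usage: one JSON object per line, so a log shipper can parse each entry.
console.log(
  formatLog("ERROR", "orders-api", "Payment gateway timeout", {
    orderId: 10422,
    durationMs: 3200
  })
);
```

Because every entry carries the same keys, a query such as "all ERROR entries for orders-api slower than 3000 ms" becomes a field filter rather than a text search.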

โœ”๏ธ 2. Define Standard Business Metrics

Examples:

  • orders_created_total
  • failed_logins_total
  • payment_latency_seconds

Business KPIs help identify anomalies beyond pure infrastructure.
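To make the naming concrete, here is a small sketch of how counters like these look in the Prometheus text exposition format. The `CounterSet` class is a hypothetical, dependency-free stand-in used only for illustration; in a real service a client library such as prom-client would do this work.

```javascript
// Hypothetical in-memory counters, rendered in Prometheus text exposition format.
class CounterSet {
  constructor() {
    this.counters = new Map(); // name -> { help, value }
  }

  define(name, help) {
    this.counters.set(name, { help, value: 0 });
  }

  inc(name, amount = 1) {
    const counter = this.counters.get(name);
    if (!counter) throw new Error(`Unknown counter: ${name}`);
    if (amount < 0) throw new Error("Counters can only go up");
    counter.value += amount;
  }

  render() {
    const lines = [];
    for (const [name, { help, value }] of this.counters) {
      lines.push(`# HELP ${name} ${help}`);
      lines.push(`# TYPE ${name} counter`);
      lines.push(`${name} ${value}`);
    }
    return lines.join("\n") + "\n";
  }
}

// Usage: count business events where they happen, then expose the text.
const metrics = new CounterSet();
metrics.define("orders_created_total", "Total orders created");
metrics.define("failed_logins_total", "Total failed login attempts");
metrics.inc("orders_created_total");
metrics.inc("orders_created_total");
metrics.inc("failed_logins_total");
console.log(metrics.render());
```

The `_total` suffix follows the Prometheus convention for counters, and `_seconds` marks the unit of a duration metric.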

โœ”๏ธ 3. Trace End-to-End Requests

Use OpenTelemetry to instrument services.
Traces give visibility across microservices.
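Under the hood, services are linked by passing a trace identifier with each call. The sketch below illustrates this with the W3C Trace Context `traceparent` header format (version, trace ID, span ID, flags). It is not the OpenTelemetry API; an OpenTelemetry SDK normally generates and propagates this header for you, and the function names here are hypothetical.

```javascript
const { randomBytes } = require("crypto");

// Start a new trace: random 16-byte trace ID and 8-byte span ID, hex-encoded.
function newTraceparent() {
  const traceId = randomBytes(16).toString("hex");
  const spanId = randomBytes(8).toString("hex");
  return `00-${traceId}-${spanId}-01`; // version 00, flags 01 (sampled)
}

// Continue an incoming trace: keep the trace ID, mint a new span ID.
function childTraceparent(parent) {
  const [version, traceId, , flags] = parent.split("-");
  const spanId = randomBytes(8).toString("hex");
  return `${version}-${traceId}-${spanId}-${flags}`;
}

// Usage: the first service creates the header, each downstream call derives from it.
const incoming = newTraceparent();
const outgoing = childTraceparent(incoming);
console.log({ incoming, outgoing });
```

Because the trace ID stays constant across hops, a tracing backend can stitch the spans from every service into one request timeline.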

โœ”๏ธ 4. Set SLOs and Alert Policies

For example:

  • SLO: 99% of HTTP requests < 150ms
  • Alert: More than 5% errors in 5 minutes

Good alerts are meaningful, not noisy.
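The two example policies above can be sketched as plain functions. This is an illustration under stated assumptions, not how an alerting system is implemented: it assumes an "error" means an HTTP status of 500 or higher, and the function and field names are hypothetical.

```javascript
// SLO check: is the share of requests faster than thresholdMs at least `target`?
function latencySloMet(durationsMs, thresholdMs, target) {
  if (durationsMs.length === 0) return true; // no traffic, nothing violated
  const fast = durationsMs.filter((d) => d < thresholdMs).length;
  return fast / durationsMs.length >= target;
}

// Alert check: did more than maxErrorRatio of requests in the window fail?
// Assumption: a status code >= 500 counts as an error.
function shouldAlert(requests, now, windowMs, maxErrorRatio) {
  const recent = requests.filter(
    (r) => r.timestamp <= now && now - r.timestamp <= windowMs
  );
  if (recent.length === 0) return false;
  const errors = recent.filter((r) => r.status >= 500).length;
  return errors / recent.length > maxErrorRatio;
}

// Usage with made-up sample data: 20 requests, 2 of them failing.
const now = Date.UTC(2025, 10, 23, 10, 20, 0);
const requests = [];
for (let i = 0; i < 20; i++) {
  requests.push({ timestamp: now - i * 10000, status: i < 2 ? 503 : 200 });
}
console.log(shouldAlert(requests, now, 5 * 60 * 1000, 0.05));
console.log(latencySloMet([80, 95, 120, 140, 310], 150, 0.99));
```

Evaluating the error ratio over a window, rather than firing on every failed request, is one way to keep alerts meaningful instead of noisy.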

โœ”๏ธ 5. Centralize All Telemetry Data

Use one platform for metrics + logs + traces.
Examples:

  • Datadog
  • New Relic
  • Grafana Cloud
  • AWS CloudWatch

๐Ÿ› ๏ธ Real-World Example: Observability With Prometheus + Grafana + ELK Stack

Below is a complete example using a simple Node.js API instrumented with Prometheus metrics and ELK logging.

📌 Example Application (Node.js)

We will expose:

  • /metrics endpoint for Prometheus scraping
  • Structured logs sent directly to Elasticsearch (the storage layer of the ELK stack)
  • A simple API endpoint

🔧 Step 1: Install Dependencies

npm install express prom-client winston winston-elasticsearch

🧪 Step 2: Node.js Code With Observability

// app.js
const express = require("express");
const client = require("prom-client");
const winston = require("winston");
const { ElasticsearchTransport } = require("winston-elasticsearch");

const app = express();
const register = new client.Registry();

// ----- PROMETHEUS METRICS -----
// Counter: total requests, broken down by method, route, and status code.
const httpRequestCounter = new client.Counter({
  name: "http_requests_total",
  help: "Total HTTP requests",
  labelNames: ["method", "route", "status"],
  registers: [register] // attach to our registry instead of the global default
});

// Histogram: request latency in seconds, bucketed for percentile queries.
const httpRequestDuration = new client.Histogram({
  name: "http_request_duration_seconds",
  help: "HTTP request latency",
  buckets: [0.1, 0.3, 0.5, 1, 2],
  registers: [register]
});

// Also collect default Node.js process metrics (CPU, memory, event loop).
client.collectDefaultMetrics({ register });

// ----- ELK LOGGING -----
const esTransportOpts = {
  level: "info",
  clientOpts: { node: "http://localhost:9200" }
};

const logger = winston.createLogger({
  transports: [new ElasticsearchTransport(esTransportOpts)]
});

// ----- NORMAL API ROUTE -----
app.get("/api/orders", async (req, res) => {
  // Start a latency timer; calling end() records the elapsed seconds.
  const end = httpRequestDuration.startTimer();

  logger.info({
    message: "Fetching orders",
    service: "orders-api",
    environment: "production",
    timestamp: new Date().toISOString()
  });

  res.json({ orderId: 123, status: "OK" });

  // Count the request, labeled with the method and status code actually used.
  httpRequestCounter.inc({
    method: req.method,
    route: "/api/orders",
    status: res.statusCode
  });

  end();
});

// ----- METRICS ENDPOINT FOR PROMETHEUS -----
app.get("/metrics", async (req, res) => {
  res.set("Content-Type", register.contentType);
  res.send(await register.metrics());
});

// ----- START SERVER -----
app.listen(3000, () => console.log("API running on port 3000"));

📊 Step 3: Prometheus Scrape Configuration

Add to prometheus.yml:

scrape_configs:
  - job_name: "orders-api"
    static_configs:
      - targets: ["localhost:3000"]

Prometheus now collects:

  • request count
  • request latency
  • default Node.js process metrics (from collectDefaultMetrics)

📈 Step 4: Grafana Dashboard

Once Prometheus is added as a data source, Grafana can query its metrics and you can plot:

  • Requests per second
  • Error rate
  • Latency percentiles (P95, P99)

Create a panel and use this query:

rate(http_requests_total[5m])
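To build intuition for what this query returns, here is a simplified sketch of a per-second rate over a window of counter samples. It is not Prometheus's implementation: the real `rate()` also extrapolates to the window boundaries, which is omitted here, and the function name and sample data are hypothetical.

```javascript
// samples: [{ t: seconds, v: counter value }] in time order, all inside the window.
function simpleRate(samples) {
  if (samples.length < 2) return NaN;
  const elapsed = samples[samples.length - 1].t - samples[0].t;
  if (elapsed <= 0) return NaN;

  let increase = 0;
  for (let i = 1; i < samples.length; i++) {
    const delta = samples[i].v - samples[i - 1].v;
    // A drop means the counter reset (for example a process restart),
    // so the new value is counted as growth from zero.
    increase += delta >= 0 ? delta : samples[i].v;
  }
  return increase / elapsed; // average increase per second
}

// Usage: a counter scraped every 60 s that resets once in the middle.
const samples = [
  { t: 0, v: 100 },
  { t: 60, v: 160 },
  { t: 120, v: 10 },
  { t: 180, v: 40 }
];
console.log(simpleRate(samples));
```

This is why counters are exposed as ever-increasing totals: the query layer derives requests per second from the raw totals, and a restart does not produce a negative spike.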

Another useful panel for latency:

histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
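A histogram stores only cumulative bucket counts, so a percentile has to be estimated. The sketch below shows the general idea, linear interpolation inside the bucket that contains the target rank, using the same bucket bounds as the example application. It is a simplified illustration rather than Prometheus's exact algorithm, and the function name and counts are made up.

```javascript
// buckets: cumulative counts sorted by upper bound `le`, ending with Infinity,
// mirroring the *_bucket series a Prometheus histogram exposes.
// Assumes at least one finite bucket before the Infinity bucket.
function estimateQuantile(q, buckets) {
  const total = buckets[buckets.length - 1].count;
  if (total === 0) return NaN;

  const rank = q * total;
  const i = buckets.findIndex((b) => b.count >= rank);

  // Rank falls in the +Inf bucket: the highest finite bound is the best answer.
  if (buckets[i].le === Infinity) return buckets[i - 1].le;

  const lower = i === 0 ? 0 : buckets[i - 1].le;
  const below = i === 0 ? 0 : buckets[i - 1].count;
  const inBucket = buckets[i].count - below;
  if (inBucket === 0) return lower;

  // Assume observations are spread evenly inside the bucket.
  return lower + (buckets[i].le - lower) * ((rank - below) / inBucket);
}

// Usage with made-up counts for 100 requests.
const buckets = [
  { le: 0.1, count: 50 },
  { le: 0.3, count: 80 },
  { le: 0.5, count: 95 },
  { le: 1, count: 99 },
  { le: 2, count: 100 },
  { le: Infinity, count: 100 }
];
console.log(estimateQuantile(0.9, buckets));
console.log(estimateQuantile(0.95, buckets));
```

The estimate can never be more precise than the bucket boundaries, so it helps to place bucket bounds near the latency targets you care about.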

๐Ÿ“ Step 5 โ€” ELK Stack for Logs

Logs go to Elasticsearch, and you can search them with Kibana:

Example search query:

service: "orders-api" AND level: "info"

Example rendered log:

{
  "message": "Fetching orders",
  "service": "orders-api",
  "environment": "production",
  "timestamp": "2025-11-23T10:20:34.123Z"
}

🚀 Final Thoughts

Modern systems demand more than simple health checks: they require full observability to identify, diagnose, and prevent problems in production.

By applying:

  • structured logs
  • custom metrics
  • distributed tracing
  • centralized dashboards
  • SLO-driven alerting

you significantly increase reliability, and most importantly, you gain the ability to understand your system deeply.

