Introduction #
In the distributed architecture landscape of 2025, deploying a microservice without observability is akin to flying a plane blindfolded. When a request fails or latency spikes in a production environment, you cannot rely solely on grep-ing through gigabytes of scattered log files. You need a holistic view of your system’s health.
For Java developers, the ecosystem has matured significantly. Spring Boot 3.x has standardized observability through Micrometer and OpenTelemetry, replacing the older Spring Cloud Sleuth with a more robust, vendor-neutral approach.
This article is a deep dive into building a full-stack observability platform. We will move beyond “Hello World” and construct a production-ready monitoring pipeline that correlates Metrics (Trends), Traces (Context), and Logs (Details).
What You Will Build #
By the end of this guide, you will have a running ecosystem containing:
- A Spring Boot 3.4+ application generating custom metrics and traces.
- Prometheus for scraping and storing time-series metrics.
- Jaeger for distributed tracing and latency analysis.
- Grafana for visualizing the data and correlating traces with metrics via Exemplars.
1. Observability Architecture Overview #
Before writing code, we must understand the data flow. In modern Java applications, we use the Facade pattern provided by Micrometer to abstract the underlying monitoring systems.
Here is how the components interact in our target architecture:
The Data Flow:
- The App: Generates telemetry data.
- Micrometer Tracing: Bridges the app to the OpenTelemetry (OTel) tracer.
- Prometheus: Periodically “scrapes” (pulls) metrics from the app’s /actuator/prometheus endpoint.
- Jaeger: The app pushes trace spans to Jaeger (via the Zipkin or OTLP protocol).
- Grafana: Queries both Prometheus and Jaeger to display unified dashboards.
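To make the “facade” idea concrete, here is a toy sketch (hypothetical code, not Micrometer’s real API): application code records against one interface, and the backend behind it is swappable.

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration of the facade idea (NOT Micrometer's actual API):
// the app records metrics against one interface, and the backend
// (Prometheus, OTLP, in-memory, ...) is interchangeable behind it.
interface MetricsFacade {
    void increment(String counterName);
}

// One possible backend; a Prometheus- or OTLP-backed implementation
// would satisfy the same interface without the app code changing.
class InMemoryMetrics implements MetricsFacade {
    final Map<String, Long> counters = new HashMap<>();

    @Override
    public void increment(String counterName) {
        counters.merge(counterName, 1L, Long::sum);
    }
}
```

In the real stack, Micrometer’s MeterRegistry plays this role, with micrometer-registry-prometheus as the concrete backend.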
2. Prerequisites and Environment #
To follow this tutorial, ensure your development environment meets these standards:
- Java Development Kit (JDK): Version 21 LTS or higher.
- Build Tool: Maven 3.9+ or Gradle 8.5+.
- Containerization: Docker and Docker Compose (V2).
- IDE: IntelliJ IDEA (Ultimate or Community) or Eclipse.
We will use Docker Compose to spin up the infrastructure (Prometheus, Grafana, Jaeger) so you don’t need to install them locally on your OS.
3. Infrastructure Setup: Docker Compose #
We need a foundation before we write the Java code. Let’s create a docker-compose.yml file that orchestrates our observability backend.
Create a folder named observability-stack and create the following files inside it.
Step 3.1: The Docker Compose File #
```yaml
# docker-compose.yml
version: '3.8'

services:
  # 1. Jaeger - Distributed Tracing
  jaeger:
    image: jaegertracing/all-in-one:1.60
    container_name: jaeger
    ports:
      - "16686:16686" # UI
      - "14250:14250" # Model
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
      - "9411:9411"   # Zipkin protocol
    environment:
      - COLLECTOR_ZIPKIN_HOST_PORT=:9411
    networks:
      - monitor-net

  # 2. Prometheus - Metrics DB
  prometheus:
    image: prom/prometheus:v2.54.0
    container_name: prometheus
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --enable-feature=exemplar-storage # Crucial for linking metrics to traces
    networks:
      - monitor-net

  # 3. Grafana - Visualization
  grafana:
    image: grafana/grafana:11.2.0
    container_name: grafana
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    depends_on:
      - prometheus
      - jaeger
    networks:
      - monitor-net

networks:
  monitor-net:
    driver: bridge
```

Step 3.2: Prometheus Configuration #
Prometheus needs to know where to find our Java application. Create a prometheus.yml file in the same directory.
```yaml
# prometheus.yml
global:
  scrape_interval: 5s # Scrape frequently for this demo (default is usually 1m)
  evaluation_interval: 5s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['prometheus:9090']

  - job_name: 'spring-boot-app'
    metrics_path: '/actuator/prometheus'
    scrape_interval: 2s
    static_configs:
      # 'host.docker.internal' allows the container to talk to your host machine (where Java runs)
      - targets: ['host.docker.internal:8080']
```

Note: If you are on Linux, host.docker.internal might not work by default. You may need to use the host’s IP address (e.g., 172.17.0.1) or run the Java app inside the Docker network.
Start the infrastructure:
```shell
docker compose up -d
```

4. Spring Boot Implementation #
Now, let’s build the application. Go to start.spring.io or use your IDE.
Project Metadata:
- Project: Maven
- Language: Java 21
- Spring Boot: 3.4.x
Step 4.1: Maven Dependencies #
This is the most critical part. Spring Boot 3 replaced Spring Cloud Sleuth with the Micrometer Tracing library.
Add the following to your pom.xml:
```xml
<dependencies>
    <!-- Web Starter -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>

    <!-- Actuator: Exposes metrics endpoints -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-actuator</artifactId>
    </dependency>

    <!-- AOP: Required so the @Observed aspect can be woven in -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-aop</artifactId>
    </dependency>

    <!-- Micrometer Prometheus: Formats metrics for Prometheus -->
    <dependency>
        <groupId>io.micrometer</groupId>
        <artifactId>micrometer-registry-prometheus</artifactId>
        <scope>runtime</scope>
    </dependency>

    <!-- TRACING DEPENDENCIES -->
    <!-- Micrometer Tracing: The facade -->
    <dependency>
        <groupId>io.micrometer</groupId>
        <artifactId>micrometer-tracing-bridge-otel</artifactId>
    </dependency>

    <!-- OTel Exporter: Pushes traces to Jaeger (via the Zipkin protocol for simplicity) -->
    <dependency>
        <groupId>io.opentelemetry</groupId>
        <artifactId>opentelemetry-exporter-zipkin</artifactId>
    </dependency>

    <!-- Lombok (Optional, for brevity) -->
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <optional>true</optional>
    </dependency>
</dependencies>
```

Step 4.2: Application Configuration #
Configure Actuator to expose endpoints and set up trace sampling in src/main/resources/application.yml.
```yaml
spring:
  application:
    name: java-devpro-order-service
  threads:
    virtual:
      enabled: true # Let's use Java 21 Virtual Threads!

management:
  tracing:
    sampling:
      probability: 1.0 # Sample 100% of requests (for development ONLY; use 0.1 in prod)
  zipkin:
    tracing:
      endpoint: "http://localhost:9411/api/v2/spans" # Sending to Jaeger via the Zipkin port
  endpoints:
    web:
      exposure:
        include: "health,info,prometheus,metrics"
  metrics:
    tags:
      application: ${spring.application.name}
    distribution:
      percentiles-histogram:
        http:
          server:
            requests: true # Important for generating histograms for Grafana heatmaps
```

Step 4.3: The Business Logic (with Simulated Latency) #
We will create a service that simulates a real-world scenario with database calls and processing delays. This makes the traces look interesting in Jaeger.
The Service Layer:
```java
package com.javadevpro.observability.service;

import io.micrometer.observation.annotation.Observed;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Service;

import java.util.Random;
import java.util.concurrent.TimeUnit;

@Service
public class InventoryService {

    private static final Logger log = LoggerFactory.getLogger(InventoryService.class);
    private final Random random = new Random();

    // @Observed creates a span for this method automatically
    @Observed(name = "inventory.check", contextualName = "checking-inventory-db")
    public boolean checkStock(String productId) {
        log.info("Checking stock for product: {}", productId);
        try {
            // Simulate DB latency
            TimeUnit.MILLISECONDS.sleep(random.nextInt(200) + 100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // Randomly simulate out of stock (~20% of calls)
        return random.nextInt(10) > 1;
    }
}
```

The Controller Layer:
```java
package com.javadevpro.observability.controller;

import com.javadevpro.observability.service.InventoryService;
import io.micrometer.core.annotation.Timed;
import lombok.RequiredArgsConstructor;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/api/orders")
@RequiredArgsConstructor
public class OrderController {

    private static final Logger log = LoggerFactory.getLogger(OrderController.class);
    private final InventoryService inventoryService;

    @PostMapping("/{productId}")
    // Custom metric for this endpoint specifically
    @Timed(value = "order.place.request", description = "Time taken to place an order")
    public ResponseEntity<String> placeOrder(@PathVariable String productId) {
        log.info("Received order request for product: {}", productId);
        if (inventoryService.checkStock(productId)) {
            log.info("Stock confirmed. Processing payment...");
            simulatePayment();
            return ResponseEntity.ok("Order Placed Successfully for " + productId);
        } else {
            log.warn("Stock missing for product: {}", productId);
            return ResponseEntity.status(409).body("Out of Stock");
        }
    }

    private void simulatePayment() {
        try {
            // Simulate payment gateway latency
            Thread.sleep(300);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    @ExceptionHandler(Exception.class)
    public ResponseEntity<String> handleError(Exception e) {
        log.error("Internal error occurred", e);
        return ResponseEntity.internalServerError().body("Error");
    }
}
```

Step 4.4: Main Application Class #
Ensure you have the ObservedAspect bean to make the @Observed annotation work.
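Conceptually, the aspect intercepts calls to @Observed methods, starts an observation, invokes the real method, and records the elapsed time. As a rough illustration only (Micrometer’s real implementation uses AspectJ pointcuts, not this code — the Inventory interface and TimingAspectDemo class are hypothetical), here is the interception idea with a plain JDK dynamic proxy:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-method interface standing in for a service class.
interface Inventory {
    boolean checkStock(String productId);
}

// Toy illustration (NOT Micrometer's code) of what an observation aspect does:
// intercept the call, time it, invoke the target, then record the result.
class TimingAspectDemo {
    static final List<String> recorded = new ArrayList<>();

    static Inventory observed(Inventory target, String observationName) {
        InvocationHandler handler = (proxy, method, args) -> {
            long start = System.nanoTime();
            try {
                return method.invoke(target, args); // the real business logic
            } finally {
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                recorded.add(observationName + " took " + elapsedMs + "ms");
            }
        };
        return (Inventory) Proxy.newProxyInstance(
                Inventory.class.getClassLoader(), new Class<?>[]{Inventory.class}, handler);
    }
}
```

The real ObservedAspect records through the ObservationRegistry instead of a list, which is why the bean below wires the two together.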
```java
package com.javadevpro.observability;

import io.micrometer.observation.ObservationRegistry;
import io.micrometer.observation.aop.ObservedAspect;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;

@SpringBootApplication
public class ObservabilityApplication {

    public static void main(String[] args) {
        SpringApplication.run(ObservabilityApplication.class, args);
    }

    // Required for @Observed to create spans via AOP
    @Bean
    public ObservedAspect observedAspect(ObservationRegistry observationRegistry) {
        return new ObservedAspect(observationRegistry);
    }
}
```

5. Verifying the Stack #
Start your Spring Boot application. Once it is up, generate some traffic. You can use curl or a tool like Postman.
```shell
# Success case
curl -X POST http://localhost:8080/api/orders/laptop-x1

# Failure case (run a few times to trigger random out-of-stock)
curl -X POST http://localhost:8080/api/orders/phone-z2
```

5.1 Checking Prometheus #
Navigate to http://localhost:9090.
In the expression bar, type order_place_request_seconds_count (generated by our @Timed annotation). You should see the count increasing.
5.2 Checking Jaeger #
Navigate to http://localhost:16686.
- Select Service: java-devpro-order-service.
- Click Find Traces.
- Click on a trace.
You will see a waterfall diagram visualizing the lifecycle of the request:
- http post /api/orders/{productId} (the Controller)
- checking-inventory-db (the Service method, thanks to @Observed)
This visualization immediately tells you where the bottleneck is (e.g., is payment taking longer than the DB check?).
6. Connecting the Dots: Grafana Visualization #
The real power comes when you visualize this data together.
Navigate to Grafana at http://localhost:3000 (Login: admin/admin).
Step 6.1: Add Data Sources #
- Prometheus:
- Configuration -> Data Sources -> Add data source -> Prometheus.
- URL: http://prometheus:9090 (Docker networking).
- Click “Save & Test”.
- Jaeger:
- Add data source -> Jaeger.
- URL: http://jaeger:16686.
- Click “Save & Test”.
Step 6.2: Configure Exemplars (The “Holy Grail” Feature) #
Exemplars allow you to click on a dot in a metric graph (e.g., a spike in latency) and jump directly to the Trace ID associated with that specific request.
- Go back to the Prometheus Data Source settings in Grafana.
- Scroll to the “Exemplars” section.
- Turn on “Internal Link”.
- Enable “Data source” and select your Jaeger data source.
- Save.
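Conceptually, an exemplar is just a recent (value, trace ID) sample stored alongside a histogram bucket, which is what lets a dashboard jump from a latency spike to one concrete trace. A minimal sketch of the idea (hypothetical code, not Prometheus’s or Micrometer’s implementation):

```java
import java.util.Optional;

// Toy model: a histogram that keeps, per bucket, the latest observed
// (value, traceId) pair as its "exemplar".
public class ExemplarHistogram {
    record Exemplar(double value, String traceId) {}

    private final double[] bounds;      // upper bound of each bucket ("le")
    private final long[] counts;
    private final Exemplar[] exemplars;

    public ExemplarHistogram(double[] bounds) {
        this.bounds = bounds;
        this.counts = new long[bounds.length];
        this.exemplars = new Exemplar[bounds.length];
    }

    public void record(double value, String traceId) {
        for (int i = 0; i < bounds.length; i++) {
            if (value <= bounds[i]) {
                counts[i]++;
                exemplars[i] = new Exemplar(value, traceId); // keep the latest sample
                return;
            }
        }
    }

    public Optional<Exemplar> exemplarFor(int bucket) {
        return Optional.ofNullable(exemplars[bucket]);
    }
}
```

In the real stack, Micrometer attaches the current span’s trace ID to histogram observations, and Prometheus stores it when exemplar-storage is enabled.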
Step 6.3: Import a Dashboard #
Don’t build from scratch. Import the standard JVM Micrometer dashboard.
- Dashboards -> Import.
- Enter ID 4701 (JVM (Micrometer)).
- Select your Prometheus data source.
- Import.
Now, create a New Dashboard for our custom metrics:
- Add visualization.
- Select Prometheus.
- Query: histogram_quantile(0.95, sum(rate(order_place_request_seconds_bucket[1m])) by (le))
- This shows the 95th percentile latency of your order endpoint.
- If you configured Exemplars correctly, you will see diamonds on the graph line. Clicking one will reveal the traceID and a link to “Query with Jaeger”.
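If you are curious what histogram_quantile actually computes: Prometheus takes the cumulative bucket counts, finds the bucket where the target rank falls, and linearly interpolates inside it. A simplified sketch of that interpolation (ignoring rate windows and edge cases like the +Inf bucket):

```java
// Simplified sketch of PromQL's histogram_quantile interpolation over
// cumulative buckets; upperBounds[i] corresponds to the bucket's "le" label.
public class QuantileSketch {
    public static double histogramQuantile(double q, double[] upperBounds, double[] cumulative) {
        double rank = q * cumulative[cumulative.length - 1];
        for (int i = 0; i < upperBounds.length; i++) {
            if (cumulative[i] >= rank) {
                double lower = (i == 0) ? 0.0 : upperBounds[i - 1];
                double below = (i == 0) ? 0.0 : cumulative[i - 1];
                double inBucket = cumulative[i] - below;
                if (inBucket == 0) return upperBounds[i];
                // Linear interpolation inside the winning bucket
                return lower + (upperBounds[i] - lower) * (rank - below) / inBucket;
            }
        }
        return upperBounds[upperBounds.length - 1];
    }
}
```

This is also why percentiles-histogram: true in the application config matters: without the _bucket series, there is nothing to interpolate over.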
7. Comparison: The Evolution of Java Tracing #
Understanding the shift in tools is vital for migrating legacy apps.
| Feature | Spring Cloud Sleuth (Legacy) | Micrometer Tracing (Modern) |
|---|---|---|
| Boot Version | Spring Boot 2.x | Spring Boot 3.x |
| Core Library | Brave (usually) | Micrometer Observation API |
| Standard | Custom / Zipkin | OpenTelemetry (OTel) |
| Context Propagation | Internal headers | OTel W3C Trace Context |
| Config Difficulty | Low (Auto-config) | Medium (Requires bridge setup) |
| Vendor Lock-in | Higher | Minimal (Vendor agnostic) |
Why the shift? OpenTelemetry has won the “observability wars.” It provides a single standard for traces, metrics, and logs across all languages (Go, Python, Java), making polyglot microservices debugging significantly easier.
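The W3C Trace Context row above is worth making concrete: services pass trace identity in a traceparent HTTP header of the form version-traceId-parentId-flags. A small sketch of parsing it (the TraceParent record is illustrative, not an existing library class):

```java
// Sketch: parsing a W3C Trace Context "traceparent" header, e.g.
// "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01".
// traceId is 32 hex chars, parentId (span id) is 16, flags bit 0 = sampled.
public record TraceParent(String version, String traceId, String parentId, boolean sampled) {
    public static TraceParent parse(String header) {
        String[] parts = header.split("-");
        if (parts.length != 4 || parts[1].length() != 32 || parts[2].length() != 16) {
            throw new IllegalArgumentException("Malformed traceparent: " + header);
        }
        boolean sampled = (Integer.parseInt(parts[3], 16) & 0x01) == 1;
        return new TraceParent(parts[0], parts[1], parts[2], sampled);
    }
}
```

Micrometer Tracing injects and extracts this header for you on outgoing and incoming HTTP calls, which is what keeps a trace intact across services written in different languages.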
8. Best Practices & Common Pitfalls #
1. High Cardinality (The Metrics Killer) #
Do not use unbounded values as tags in Prometheus. Bad:

```java
// DO NOT DO THIS: creates one time series per unique orderId
registry.counter("orders", "orderId", orderId).increment();
```

If you have 1 million unique orderIds, Prometheus will create 1 million time series and likely crash (or cost a fortune). Use Traces for high-cardinality data (like IDs), and Metrics for low-cardinality aggregates (like status=success, region=us-east).
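To see why this explodes, here is a toy model (hypothetical code, not Prometheus internals) of how a metrics backend stores one time series per unique label combination:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a metrics backend: one stored time series per unique
// (metric name + tag key/value) combination, so unbounded tag values
// mean unbounded memory.
public class SeriesStore {
    private final Map<String, Long> series = new HashMap<>();

    public void increment(String name, String tagKey, String tagValue) {
        series.merge(name + "{" + tagKey + "=\"" + tagValue + "\"}", 1L, Long::sum);
    }

    public int seriesCount() {
        return series.size();
    }
}
```

Tagging by orderId grows the series count with traffic forever; tagging by status caps it at the number of distinct statuses.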
2. Sampling Rates #
In the configuration above, we used probability: 1.0 (100%).
- Dev: 1.0 is fine.
- Prod: 1.0 is expensive (storage and network). Use 0.1 (10%) or investigate Tail Sampling (keeping only traces that contain errors or high latency).
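Mechanically, probability: 1.0 means the sampling decision is made once per trace, at the root span: each new trace is kept with a fixed probability. A stripped-down sketch of such a head-based sampler (illustrative only; real tracers also honor the sampled flag propagated from upstream):

```java
import java.util.Random;

// Minimal sketch of head-based probability sampling: decide once,
// up front, whether to keep an entire trace.
public class ProbabilitySampler {
    private final double probability;
    private final Random random;

    public ProbabilitySampler(double probability, long seed) {
        this.probability = probability;
        this.random = new Random(seed); // seeded here only to make the demo repeatable
    }

    public boolean shouldSample() {
        return random.nextDouble() < probability;
    }
}
```

Tail sampling inverts this: it buffers spans first and decides after the trace completes, which is how it can keep exactly the slow or failed traces.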
3. Log Correlation #
Spring Boot 3’s automatic log correlation adds traceId and spanId to the MDC (Mapped Diagnostic Context). Ensure your Logback/Log4j pattern includes them:

```properties
logging.pattern.level=%5p [${spring.application.name:},%X{traceId:-},%X{spanId:-}]
```

This allows you to copy a Trace ID from Jaeger and grep for it in your log files to see exactly what happened during that span.
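Under the hood, the MDC is essentially a per-thread map that the logging pattern reads from when formatting each line. A minimal sketch of the mechanism (illustrative, not SLF4J’s code; LogContext is a hypothetical class):

```java
import java.util.HashMap;
import java.util.Map;

// Toy version of what MDC-based log correlation does: a per-thread
// context map whose traceId/spanId values are stamped onto every log line.
public class LogContext {
    private static final ThreadLocal<Map<String, String>> CTX =
            ThreadLocal.withInitial(HashMap::new);

    public static void put(String key, String value) {
        CTX.get().put(key, value);
    }

    // Mirrors the pattern above: LEVEL [app,traceId,spanId] message
    public static String format(String level, String app, String message) {
        Map<String, String> ctx = CTX.get();
        return String.format("%5s [%s,%s,%s] %s",
                level, app,
                ctx.getOrDefault("traceId", ""),
                ctx.getOrDefault("spanId", ""),
                message);
    }
}
```

Because the map is thread-bound, frameworks need explicit context propagation when work hops threads; Spring Boot and Micrometer handle that for you, including with virtual threads.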
Conclusion #
We have successfully moved from a black-box application to a fully observable system. By integrating Spring Boot 3, Prometheus, and Jaeger, you now have the three pillars of observability covered:
- Metrics tell you that something is wrong (High latency alert).
- Traces tell you where it is wrong (Inventory Service DB call).
- Logs tell you why it is wrong (Exception stack trace linked to the Trace ID).
As you prepare your applications for 2026 and beyond, mastering the OpenTelemetry ecosystem via Micrometer is no longer optional—it is a core competency for senior Java developers.
Further Reading #
- Micrometer Tracing Documentation
- OpenTelemetry Java Instrumentation
- Google SRE Book - Monitoring Distributed Systems
Found this article helpful? Subscribe to the Java DevPro newsletter for more deep dives into cloud-native Java architectures.