In the landscape of modern software development, especially with the widespread adoption of JDK 21 and the revolutionary Virtual Threads (Project Loom), concurrency is no longer an advanced topic reserved for high-frequency trading engines. It is the default state of enterprise Java applications.
However, writing multi-threaded code that compiles is easy; writing multi-threaded code that remains correct under high load on modern multi-core architectures is significantly harder. The root of most elusive “Heisenbugs” in production lies in a misunderstanding of the Java Memory Model (JMM).
If you have ever encountered a variable update that one thread sees but another ignores, or a singleton that initializes partially, you have battled the JMM. In this deep dive, we will demystify the core pillars of concurrency: Atomicity, Visibility, and Ordering, and explore the “Happens-Before” relationship that governs them.
Prerequisites and Environment #
To follow this guide and run the examples, ensure you have the following setup. We are focusing on modern Java development practices.
- Java Development Kit (JDK): Version 21 (LTS) or higher.
- IDE: IntelliJ IDEA 2024.x or Eclipse 2024-xxx.
- Build Tool: Maven 3.9+ or Gradle 8.5+.
- Hardware: A multi-core processor (critical for reproducing concurrency issues).
Maven Dependency #
For most examples, standard JDK libraries suffice. However, for micro-benchmarking concurrency effects, such as the false sharing discussed in Section 7, we recommend JMH.
```xml
<dependencies>
    <!-- JMH for Microbenchmarking -->
    <dependency>
        <groupId>org.openjdk.jmh</groupId>
        <artifactId>jmh-core</artifactId>
        <version>1.37</version>
    </dependency>
    <dependency>
        <groupId>org.openjdk.jmh</groupId>
        <artifactId>jmh-generator-annprocess</artifactId>
        <version>1.37</version>
    </dependency>
</dependencies>
```

1. The Hardware Reality vs. Java Memory Model #
To understand the JMM, you must first understand the hardware. Modern CPUs are incredibly fast, while main memory (RAM) is relatively slow. To bridge this gap, CPUs use a hierarchy of caches (L1, L2, L3).
When a thread running on Core A writes to a variable, it often writes to its L1 Cache or a Store Buffer, not immediately to Main Memory. If a thread on Core B reads that same variable, it might read a stale value from its own cache or Main Memory, missing Core A’s update.
The JMM is an abstract specification that bridges the gap between this complex hardware architecture and the Java bytecode, guaranteeing certain behaviors if (and only if) you follow the rules.
JMM Abstract Architecture #
The JMM abstracts the underlying hardware interaction as follows: every thread has its own "Working Memory" (an abstraction of caches and registers) sitting between it and Main Memory.
Key Takeaway: No thread reads or writes shared variables in Main Memory directly; the JMM dictates when data must be flushed from Working Memory to Main Memory, and when it must be re-read.
2. The Visibility Problem #
Visibility refers to whether the changes made by one thread to shared data are visible to other threads. Without explicit synchronization, the JMM does not guarantee when a write by one thread becomes visible to others.
The “Stuck Loop” Trap #
Let’s look at a classic example where a thread gets stuck in an infinite loop because it never sees a boolean flag change.
```java
package com.javadevpro.jmm;

import java.util.concurrent.TimeUnit;

public class VisibilityProblem {

    // Without 'volatile', changes to this variable might not be visible
    private static boolean running = true;

    public static void main(String[] args) throws InterruptedException {
        Thread workerThread = new Thread(() -> {
            System.out.println("Worker started.");
            // This loop might be optimized by the JIT compiler to:
            //     while (true) {}
            // if the compiler assumes 'running' never changes within this context.
            while (running) {
                // Do some work...
                // NOTE: If you put System.out.println here, it might work,
                // because synchronized blocks (inside println) cause memory flushes.
            }
            System.out.println("Worker stopped.");
        });

        workerThread.start();

        // Let the worker run for a second
        TimeUnit.SECONDS.sleep(1);

        System.out.println("Main thread setting running = false");
        running = false; // Write to Main Memory (conceptually)
        System.out.println("Main thread finished.");
    }
}
```

What happens?
On most modern JVMs (Server mode), the workerThread will likely run forever, even after the main thread prints “Main thread finished.” The workerThread has cached running = true in its working memory (CPU register/cache) and never checks main memory again.
The Solution: The volatile Keyword #
To fix this, we declare the variable as volatile.
```java
private static volatile boolean running = true;
```

How `volatile` works:
- Read Visibility: Every read of a `volatile` variable is (conceptually) read from main memory, bypassing the local cache.
- Write Visibility: Every write to a `volatile` variable is (conceptually) flushed immediately to main memory.
- Atomicity Note: `volatile` guarantees visibility, but not atomicity for compound actions (like `i++`).
3. The Atomicity Problem #
Atomicity deals with “all or nothing” operations. An operation is atomic if it cannot be interrupted or observed in an incomplete state.
A common misconception is that simple math operations are atomic. They are not.
The Race Condition Example #
Consider a simple counter increment: `count++`. This looks like one instruction, but at the bytecode level it is a three-step read-modify-write:

- Load `count` from memory into a register.
- Increment the value in the register.
- Store the new value back to memory.
If two threads execute this simultaneously, they can interleave, leading to lost updates.
```java
package com.javadevpro.jmm;

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicityDemo {

    private int count = 0;
    // Solution: private AtomicInteger atomicCount = new AtomicInteger(0);

    public void increment() {
        count++;
        // atomicCount.incrementAndGet();
    }

    public int getCount() {
        return count;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicityDemo demo = new AtomicityDemo();
        int numberOfThreads = 10;
        int incrementsPerThread = 10_000;
        CountDownLatch latch = new CountDownLatch(numberOfThreads);

        for (int i = 0; i < numberOfThreads; i++) {
            new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    demo.increment();
                }
                latch.countDown();
            }).start();
        }

        latch.await();
        System.out.println("Expected count: " + (numberOfThreads * incrementsPerThread));
        System.out.println("Actual count: " + demo.getCount());
    }
}
```

Result:
You will likely see an output like Actual count: 98452 instead of 100000.
Solutions for Atomicity #
- `synchronized` keyword: Locks the method or block, ensuring only one thread enters at a time.
- `java.util.concurrent.atomic` classes: Use CAS (Compare-And-Swap) hardware instructions for non-blocking atomicity.
- `ReentrantLock`: Explicit locking.

All three approaches are shown side by side in the sketch below.
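As a rough, hedged sketch (the class and method names here are ours, not part of the demo above), this is how each approach looks when applied to a shared counter:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// A minimal sketch of the three fixes applied to a shared counter.
public class SafeCounters {

    // 1. synchronized: mutual exclusion, plus visibility via the monitor lock rule
    private int syncCount = 0;
    public synchronized void incrementSync() {
        syncCount++;
    }

    // 2. CAS-based: non-blocking, retries internally on contention
    private final AtomicInteger casCount = new AtomicInteger(0);
    public void incrementCas() {
        casCount.incrementAndGet();
    }

    // 3. Explicit lock: same guarantees as synchronized, with more flexibility
    private final ReentrantLock lock = new ReentrantLock();
    private int lockedCount = 0;
    public void incrementLocked() {
        lock.lock();
        try {
            lockedCount++;
        } finally {
            lock.unlock(); // always release in finally
        }
    }
}
```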
4. Ordering and Instruction Reordering #
This is the most subtle part of the JMM. To optimize performance, both the Compiler (JIT) and the Processor are allowed to reorder instructions, as long as the semantics for the current thread remain unchanged (As-If-Serial semantics).
However, in a multi-threaded environment, this reordering can break logic.
The Double-Checked Locking Anti-Pattern (and Fix) #
The most famous example of ordering issues is the Singleton pattern.
```java
public class Singleton {

    private static Singleton instance;

    private Singleton() {} // prevent external instantiation

    public static Singleton getInstance() {
        if (instance == null) {                    // 1. First check (no lock)
            synchronized (Singleton.class) {
                if (instance == null) {            // 2. Second check (locked)
                    instance = new Singleton();    // 3. Critical line
                }
            }
        }
        return instance;
    }
}
```

The Problem:
The line `instance = new Singleton();` involves three steps:

- Allocate memory for the object.
- Initialize the object (call the constructor).
- Assign the memory reference to the `instance` variable.
The JMM allows the compiler and processor to reorder steps 2 and 3, so the actual execution may be:

- Allocate memory.
- Assign the reference (`instance` is now NOT null, but points to an uninitialized object).
- Initialize the object.
If Thread A has completed step 2 of this reordered sequence but not step 3 when Thread B enters `getInstance()`, the first check `if (instance == null)` returns false. Thread B then returns a partially constructed object, leading to crashes or subtle data corruption.
The Fix:
Declare `instance` as `volatile`.

```java
private static volatile Singleton instance;
```

The `volatile` write prevents the reordering of object initialization and reference assignment (it acts as a memory barrier), so no thread can observe the reference before the constructor has finished.
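If the goal is simply a lazily initialized singleton, many codebases skip double-checked locking entirely in favor of the initialization-on-demand holder idiom, which leans on the JVM's built-in class-initialization guarantees instead of `volatile`. A minimal sketch (not part of the original example):

```java
public class HolderSingleton {

    private HolderSingleton() {}

    // The JVM guarantees that class initialization is both thread-safe and lazy:
    // Holder is not initialized until getInstance() first touches it.
    private static class Holder {
        static final HolderSingleton INSTANCE = new HolderSingleton();
    }

    public static HolderSingleton getInstance() {
        return Holder.INSTANCE;
    }
}
```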
5. The Happens-Before Relationship #
The JMM defines a partial ordering called Happens-Before. If Action A happens-before Action B, then the results of Action A are guaranteed to be visible to Action B, and Action A creates an ordering constraint.
If there is no happens-before relationship between two operations, the JVM is free to reorder them or cache them arbitrarily.
Key Happens-Before Rules #
- Program Order Rule: Each action in a single thread happens-before every subsequent action in that same thread.
- Monitor Lock Rule: An unlock on a monitor (exiting `synchronized`) happens-before every subsequent lock on that same monitor.
- Volatile Variable Rule: A write to a `volatile` field happens-before every subsequent read of that same field.
- Thread Start Rule: A call to `Thread.start()` happens-before any action in the started thread.
- Thread Join Rule: All actions in a thread happen-before any other thread successfully returns from a `thread.join()` on that thread.
- Transitivity: If A happens-before B, and B happens-before C, then A happens-before C.
Visualizing Happens-Before #
By leveraging the Transitivity rule, if Thread A writes x=1 (non-volatile) then writes v=true (volatile), and Thread B reads v=true then reads x, Thread B is guaranteed to see x=1. This is often called “piggybacking” on synchronization.
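A minimal sketch of that piggybacking pattern (class and field names are ours):

```java
public class Piggyback {
    private int x = 0;                 // deliberately NOT volatile
    private volatile boolean ready = false;

    public void writer() {             // Thread A
        x = 42;                        // 1. ordinary write
        ready = true;                  // 2. volatile write publishes it
    }

    public void reader() {             // Thread B
        if (ready) {                   // 3. volatile read
            // Volatile rule: (2) happens-before (3).
            // Program order:  (1) hb (2), and (3) hb (4).
            // Transitivity:   (1) hb (4), so x is visible here.
            System.out.println(x);     // 4. guaranteed to print 42
        }
    }
}
```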
6. Comparison of Synchronization Mechanisms #
Choosing the right tool is essential for performance. Here is a comparison of common mechanisms in Java 21.
| Feature | `volatile` | `synchronized` | `ReentrantLock` | `AtomicInteger` (CAS) | `VarHandle` |
|---|---|---|---|---|---|
| Visibility | Yes | Yes | Yes | Yes | Yes |
| Atomicity | No (single reads/writes only) | Yes | Yes | Yes | Yes |
| Ordering | Yes (limited) | Yes | Yes | Yes | Yes (fine-grained) |
| Performance Overhead | Low | Medium/High | Medium | Low/Medium | Very Low |
| Use Case | Flags, state markers | Critical sections | Complex locking, timeouts | Counters, accumulators | Library authors |
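To make the last column concrete, here is roughly what `VarHandle`'s fine-grained access modes look like; a sketch in library style, with names of our choosing:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class VarHandleFlag {
    private int flag; // plain field; the VarHandle chooses the memory semantics per access

    private static final VarHandle FLAG;
    static {
        try {
            FLAG = MethodHandles.lookup()
                    .findVarHandle(VarHandleFlag.class, "flag", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public void publish() {
        // Release write: all prior writes become visible to an acquiring reader
        FLAG.setRelease(this, 1);
    }

    public boolean isPublished() {
        // Acquire read: pairs with the release write above
        return (int) FLAG.getAcquire(this) == 1;
    }
}
```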
7. Performance Pitfall: False Sharing #
While understanding JMM correctness is step one, performance is step two. A common JMM-related performance killer is False Sharing.
CPUs cache data in "Cache Lines" (typically 64 bytes). If two independent `volatile long` fields (say, `a` and `b`) sit next to each other in memory, they might occupy the same cache line.
If Thread A updates a and Thread B updates b, the CPU cores will invalidate the entire cache line for each other, causing “cache thrashing” (ping-ponging), even though the threads aren’t logically sharing data.
Code Example: Preventing False Sharing #
In Java 8+, we can use the @Contended annotation (requires -XX:-RestrictContended JVM flag) or manual padding.
```java
import jdk.internal.vm.annotation.Contended;

public class FalseSharingDemo {

    // Without padding, these two fields will likely share a cache line
    public static class DataContainer {
        public volatile long x = 0L;
        public volatile long y = 0L;
    }

    // With @Contended, the JVM pads each field onto its own cache line
    public static class PaddedDataContainer {
        @Contended
        public volatile long x = 0L;
        @Contended
        public volatile long y = 0L;
    }
}
```

Note: `@Contended` lives in an internal package, so compiling against it also requires `--add-exports java.base/jdk.internal.vm.annotation=ALL-UNNAMED` in addition to the `-XX:-RestrictContended` runtime flag. In standard application code, manually adding seven `long` variables between `volatile` fields (padding) is a common "hack" used by high-performance libraries like LMAX Disruptor.
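For completeness, a hedged sketch of that manual-padding trick, in the class-hierarchy style Disruptor uses (field names are arbitrary; splitting the padding across superclasses discourages the JVM from reordering it away from the hot field, since superclass fields are laid out first):

```java
// Left-hand padding: fills the cache line before the hot field.
class LhsPadding {
    protected long p1, p2, p3, p4, p5, p6, p7;
}

// The hot field, isolated between the two padding blocks.
class Value extends LhsPadding {
    protected volatile long value;
}

// Right-hand padding: fills the cache line after the hot field.
public class PaddedLong extends Value {
    protected long p9, p10, p11, p12, p13, p14, p15;

    public void set(long v) { value = v; }
    public long get()       { return value; }
}
```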
8. Best Practices for 2025 #
- Prefer High-Level Concurrency Utilities: Avoid using `volatile` and `synchronized` directly if possible. Use `java.util.concurrent` classes (e.g., `ConcurrentHashMap`, `CompletableFuture`, `ExecutorService`); they encapsulate the complex JMM rules correctly (see the sketch after this list).
- Immutability is King: Immutable objects (all fields `final`) are inherently thread-safe. Once the constructor completes (and provided `this` does not escape during construction), the JMM guarantees visibility to all threads without synchronization.
- Final Fields: Use `final` for fields that don't change. The JMM provides specific initialization-safety guarantees for `final` fields that prevent partially constructed objects from being seen.
- Avoid Premature Optimization: Don't use `VarHandle` or try to avoid `synchronized` unless profiling proves it's a bottleneck. Modern JVM locking is highly optimized (lock coarsening, lock elision, adaptive spinning; note that biased locking was removed in JDK 15).
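As a quick illustration of the first point, here is a thread-safe word counter that needs no explicit JMM reasoning at all; a sketch using `ConcurrentHashMap` and `LongAdder` (class and method names are ours):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class WordCount {
    // ConcurrentHashMap + LongAdder handle all visibility and atomicity
    // concerns internally: no volatile, synchronized, or happens-before
    // reasoning required in our code.
    private final ConcurrentHashMap<String, LongAdder> counts = new ConcurrentHashMap<>();

    public void record(String word) {
        counts.computeIfAbsent(word, k -> new LongAdder()).increment();
    }

    public long countOf(String word) {
        LongAdder adder = counts.get(word);
        return adder == null ? 0 : adder.sum();
    }
}
```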
Conclusion #
The Java Memory Model is the contract that keeps the chaos of hardware architecture in check. By understanding Atomicity (compound actions), Visibility (memory flushes), and Ordering (happens-before relationships), you can write robust multi-threaded applications.
In the era of Java 21, while Virtual Threads make concurrency cheaper to use, the rules of shared memory remain exactly the same. A race condition in a Virtual Thread is just as dangerous as one in a Platform Thread.
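To make that concrete, here is a hedged sketch (the class name is ours) of the Section 3 race rewritten with Java 21 virtual threads; the lost updates remain:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class VirtualThreadRace {
    private static int count = 0; // unsynchronized shared state

    public static void main(String[] args) {
        // Java 21: one cheap virtual thread per task
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 10_000; i++) {
                executor.submit(() -> count++); // same data race as before
            }
        } // close() waits for all submitted tasks to finish
        System.out.println("Count: " + count + " (likely < 10000)");
    }
}
```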
Key Takeaway:
If Thread A writes to a variable and Thread B reads it, you must establish a happens-before relationship between the write and the read. Without it, you are coding on hope, not engineering.
Further Reading #
- Java Concurrency in Practice by Brian Goetz (The Bible of Java Concurrency)
- JSR 133 (Java Memory Model) FAQ
- Oracle Documentation: The Java Language Specification, Chapter 17 ("Threads and Locks"), where the memory model is formally specified