In the landscape of modern software development, especially with the widespread adoption of JDK 21 and the revolutionary Virtual Threads (Project Loom), concurrency is no longer an advanced topic reserved for high-frequency trading engines. It is the default state of enterprise Java applications.
However, writing multi-threaded code that compiles is easy; writing multi-threaded code that remains correct under high load on modern multi-core architectures is significantly harder. The root of most elusive “Heisenbugs” in production lies in a misunderstanding of the Java Memory Model (JMM).
If you have ever encountered a variable update that one thread sees but another ignores, or a singleton that initializes partially, you have battled the JMM. In this deep dive, we will demystify the core pillars of concurrency: Atomicity, Visibility, and Ordering, and explore the “Happens-Before” relationship that governs them.
Prerequisites and Environment #
To follow this guide and run the examples, ensure you have the following setup. We are focusing on modern Java development practices.
- Java Development Kit (JDK): Version 21 (LTS) or higher.
- IDE: IntelliJ IDEA 2024.x or Eclipse 2024-xxx.
- Build Tool: Maven 3.9+ or Gradle 8.5+.
- Hardware: A multi-core processor (critical for reproducing concurrency issues).
Maven Dependency #
For most examples, standard JDK libraries suffice. However, for micro-benchmarking concurrency effects, such as the false sharing discussed in Section 7, we recommend JMH.
```xml
<dependencies>
    <!-- JMH for Microbenchmarking -->
    <dependency>
        <groupId>org.openjdk.jmh</groupId>
        <artifactId>jmh-core</artifactId>
        <version>1.37</version>
    </dependency>
    <dependency>
        <groupId>org.openjdk.jmh</groupId>
        <artifactId>jmh-generator-annprocess</artifactId>
        <version>1.37</version>
    </dependency>
</dependencies>
```

1. The Hardware Reality vs. Java Memory Model #
To understand the JMM, you must first understand the hardware. Modern CPUs are incredibly fast, while main memory (RAM) is relatively slow. To bridge this gap, CPUs use a hierarchy of caches (L1, L2, L3).
When a thread running on Core A writes to a variable, it often writes to its L1 Cache or a Store Buffer, not immediately to Main Memory. If a thread on Core B reads that same variable, it might read a stale value from its own cache or Main Memory, missing Core A’s update.
The JMM is an abstract specification that bridges the gap between this complex hardware architecture and the Java bytecode, guaranteeing certain behaviors if (and only if) you follow the rules.
JMM Abstract Architecture #
The JMM abstracts the underlying hardware interaction as follows: every thread has its own "Working Memory" (an abstraction of caches and registers) sitting between it and Main Memory.
Key Takeaway: No thread reads or writes shared variables in Main Memory directly; the JMM dictates when data must be flushed from Working Memory to Main Memory, and when it must be re-read.
2. The Visibility Problem #
Visibility refers to whether the changes made by one thread to shared data are visible to other threads. Without explicit synchronization, the JMM does not guarantee when a write by one thread becomes visible to others.
The “Stuck Loop” Trap #
Let’s look at a classic example where a thread gets stuck in an infinite loop because it never sees a boolean flag change.
```java
package com.javadevpro.jmm;

import java.util.concurrent.TimeUnit;

public class VisibilityProblem {

    // Without 'volatile', changes to this variable might not be visible
    private static boolean running = true;

    public static void main(String[] args) throws InterruptedException {
        Thread workerThread = new Thread(() -> {
            System.out.println("Worker started.");
            // This loop might be optimized by the JIT compiler to:
            //     while (true) {}
            // if the compiler assumes 'running' never changes within this context.
            while (running) {
                // Do some work...
                // NOTE: If you put System.out.println here, it might work,
                // because synchronized blocks (inside println) cause memory flushes.
            }
            System.out.println("Worker stopped.");
        });

        workerThread.start();

        // Let the worker run for a second
        TimeUnit.SECONDS.sleep(1);

        System.out.println("Main thread setting running = false");
        running = false; // Write to Main Memory (conceptually)
        System.out.println("Main thread finished.");
    }
}
```

What happens?
On most modern JVMs (Server mode), the workerThread will likely run forever, even after the main thread prints “Main thread finished.” The workerThread has cached running = true in its working memory (CPU register/cache) and never checks main memory again.
The Solution: The volatile Keyword #
To fix this, we declare the variable as volatile.
```java
private static volatile boolean running = true;
```

How `volatile` works:
- Read Visibility: Every read of a `volatile` variable is (conceptually) read from main memory, bypassing the local cache.
- Write Visibility: Every write to a `volatile` variable is (conceptually) flushed immediately to main memory.
- Atomicity Note: `volatile` guarantees visibility, but not atomicity for compound actions (like `i++`).
3. The Atomicity Problem #
Atomicity deals with “all or nothing” operations. An operation is atomic if it cannot be interrupted or observed in an incomplete state.
A common misconception is that simple math operations are atomic. They are not.
The Race Condition Example #
Consider a simple counter increment: `count++`. This looks like one instruction, but at the bytecode level it is a three-step read-modify-write:

- Load `count` from memory into a register.
- Increment the value in the register.
- Store the new value back to memory.
If two threads execute this simultaneously, they can interleave, leading to lost updates.
```java
package com.javadevpro.jmm;

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicityDemo {

    private int count = 0;
    // Solution: private AtomicInteger atomicCount = new AtomicInteger(0);

    public void increment() {
        count++;
        // atomicCount.incrementAndGet();
    }

    public int getCount() {
        return count;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicityDemo demo = new AtomicityDemo();
        int numberOfThreads = 10;
        int incrementsPerThread = 10_000;
        CountDownLatch latch = new CountDownLatch(numberOfThreads);

        for (int i = 0; i < numberOfThreads; i++) {
            new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    demo.increment();
                }
                latch.countDown();
            }).start();
        }

        latch.await();
        System.out.println("Expected count: " + (numberOfThreads * incrementsPerThread));
        System.out.println("Actual count: " + demo.getCount());
    }
}
```

Result:
You will likely see an output like Actual count: 98452 instead of 100000.
Solutions for Atomicity #
- `synchronized` keyword: Locks the method or block, ensuring only one thread enters at a time.
- `java.util.concurrent.atomic` classes: Use CAS (Compare-And-Swap) hardware instructions for non-blocking atomicity.
- `ReentrantLock`: Explicit locking.

All three approaches are shown side by side in the sketch below.
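As a rough, hedged sketch (the class and method names here are ours, not part of the demo above), this is how each approach looks when applied to a shared counter:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// A minimal sketch of the three fixes applied to a shared counter.
public class SafeCounters {

    // 1. synchronized: mutual exclusion, plus visibility via the monitor lock rule
    private int syncCount = 0;
    public synchronized void incrementSync() {
        syncCount++;
    }

    // 2. CAS-based: non-blocking, retries internally on contention
    private final AtomicInteger casCount = new AtomicInteger(0);
    public void incrementCas() {
        casCount.incrementAndGet();
    }

    // 3. Explicit lock: same guarantees as synchronized, with more flexibility
    private final ReentrantLock lock = new ReentrantLock();
    private int lockedCount = 0;
    public void incrementLocked() {
        lock.lock();
        try {
            lockedCount++;
        } finally {
            lock.unlock(); // always release in finally
        }
    }
}
```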
4. Ordering and Instruction Reordering #
This is the most subtle part of the JMM. To optimize performance, both the Compiler (JIT) and the Processor are allowed to reorder instructions, as long as the semantics for the current thread remain unchanged (As-If-Serial semantics).
However, in a multi-threaded environment, this reordering can break logic.
The Double-Checked Locking Anti-Pattern (and Fix) #
The most famous example of ordering issues is the Singleton pattern.
```java
public class Singleton {

    private static Singleton instance;

    private Singleton() {} // prevent external instantiation

    public static Singleton getInstance() {
        if (instance == null) {                    // 1. First check (no lock)
            synchronized (Singleton.class) {
                if (instance == null) {            // 2. Second check (locked)
                    instance = new Singleton();    // 3. Critical line
                }
            }
        }
        return instance;
    }
}
```

The Problem:
The line `instance = new Singleton();` involves three steps:

- Allocate memory for the object.
- Initialize the object (call the constructor).
- Assign the memory reference to the `instance` variable.
The JMM allows the compiler and processor to reorder steps 2 and 3, so the actual execution may be:

- Allocate memory.
- Assign the reference (`instance` is now NOT null, but points to an uninitialized object).
- Initialize the object.
If Thread A has completed step 2 of this reordered sequence but not step 3 when Thread B enters `getInstance()`, the first check `if (instance == null)` returns false. Thread B then returns a partially constructed object, leading to crashes or subtle data corruption.
The Fix:
Declare `instance` as `volatile`.

```java
private static volatile Singleton instance;
```

The `volatile` write prevents the reordering of object initialization and reference assignment (it acts as a memory barrier), so no thread can observe the reference before the constructor has finished.
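If the goal is simply a lazily initialized singleton, many codebases skip double-checked locking entirely in favor of the initialization-on-demand holder idiom, which leans on the JVM's built-in class-initialization guarantees instead of `volatile`. A minimal sketch (not part of the original example):

```java
public class HolderSingleton {

    private HolderSingleton() {}

    // The JVM guarantees that class initialization is both thread-safe and lazy:
    // Holder is not initialized until getInstance() first touches it.
    private static class Holder {
        static final HolderSingleton INSTANCE = new HolderSingleton();
    }

    public static HolderSingleton getInstance() {
        return Holder.INSTANCE;
    }
}
```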
5. The Happens-Before Relationship #
The JMM defines a partial ordering called Happens-Before. If Action A happens-before Action B, then the results of Action A are guaranteed to be visible to Action B, and Action A creates an ordering constraint.
If there is no happens-before relationship between two operations, the JVM is free to reorder them or cache them arbitrarily.
Key Happens-Before Rules #
- Program Order Rule: Each action in a single thread happens-before every subsequent action in that same thread.
- Monitor Lock Rule: An unlock on a monitor (exiting `synchronized`) happens-before every subsequent lock on that same monitor.
- Volatile Variable Rule: A write to a `volatile` field happens-before every subsequent read of that same field.
- Thread Start Rule: A call to `Thread.start()` happens-before any action in the started thread.
- Thread Join Rule: All actions in a thread happen-before any other thread successfully returns from a `thread.join()` on that thread.
- Transitivity: If A happens-before B, and B happens-before C, then A happens-before C.
Visualizing Happens-Before #
By leveraging the Transitivity rule, if Thread A writes x=1 (non-volatile) then writes v=true (volatile), and Thread B reads v=true then reads x, Thread B is guaranteed to see x=1. This is often called “piggybacking” on synchronization.
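A minimal sketch of that piggybacking pattern (class and field names are ours):

```java
public class Piggyback {
    private int x = 0;                 // deliberately NOT volatile
    private volatile boolean ready = false;

    public void writer() {             // Thread A
        x = 42;                        // 1. ordinary write
        ready = true;                  // 2. volatile write publishes it
    }

    public void reader() {             // Thread B
        if (ready) {                   // 3. volatile read
            // Volatile rule: (2) happens-before (3).
            // Program order:  (1) hb (2), and (3) hb (4).
            // Transitivity:   (1) hb (4), so x is visible here.
            System.out.println(x);     // 4. guaranteed to print 42
        }
    }
}
```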
6. Comparison of Synchronization Mechanisms #
Choosing the right tool is essential for performance. Here is a comparison of common mechanisms in Java 21.
| Feature | `volatile` | `synchronized` | `ReentrantLock` | `AtomicInteger` (CAS) | `VarHandle` |
|---|---|---|---|---|---|
| Visibility | Yes | Yes | Yes | Yes | Yes |
| Atomicity | No (single reads/writes only) | Yes | Yes | Yes | Yes |
| Ordering | Yes (limited) | Yes | Yes | Yes | Yes (fine-grained) |
| Performance Overhead | Low | Medium/High | Medium | Low/Medium | Very Low |
| Use Case | Flags, state markers | Critical sections | Complex locking, timeouts | Counters, accumulators | Library authors |
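To make the last column concrete, here is roughly what `VarHandle`'s fine-grained access modes look like; a sketch in library style, with names of our choosing:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class VarHandleFlag {
    private int flag; // plain field; the VarHandle chooses the memory semantics per access

    private static final VarHandle FLAG;
    static {
        try {
            FLAG = MethodHandles.lookup()
                    .findVarHandle(VarHandleFlag.class, "flag", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public void publish() {
        // Release write: all prior writes become visible to an acquiring reader
        FLAG.setRelease(this, 1);
    }

    public boolean isPublished() {
        // Acquire read: pairs with the release write above
        return (int) FLAG.getAcquire(this) == 1;
    }
}
```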
7. Performance Pitfall: False Sharing #
While understanding JMM correctness is step one, performance is step two. A common JMM-related performance killer is False Sharing.
CPUs cache data in "Cache Lines" (typically 64 bytes). If two independent `volatile long` fields (say, `a` and `b`) sit next to each other in memory, they might occupy the same cache line.
If Thread A updates a and Thread B updates b, the CPU cores will invalidate the entire cache line for each other, causing “cache thrashing” (ping-ponging), even though the threads aren’t logically sharing data.
Code Example: Preventing False Sharing #
In Java 8+, we can use the @Contended annotation (requires -XX:-RestrictContended JVM flag) or manual padding.
```java
import jdk.internal.vm.annotation.Contended;

public class FalseSharingDemo {

    // Without padding, these two fields will likely share a cache line
    public static class DataContainer {
        public volatile long x = 0L;
        public volatile long y = 0L;
    }

    // With @Contended, the JVM pads each field onto its own cache line
    public static class PaddedDataContainer {
        @Contended
        public volatile long x = 0L;
        @Contended
        public volatile long y = 0L;
    }
}
```

Note: `@Contended` lives in an internal package, so compiling against it also requires `--add-exports java.base/jdk.internal.vm.annotation=ALL-UNNAMED` in addition to the `-XX:-RestrictContended` runtime flag. In standard application code, manually adding seven `long` variables between `volatile` fields (padding) is a common "hack" used by high-performance libraries like LMAX Disruptor.
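For completeness, a hedged sketch of that manual-padding trick, in the class-hierarchy style Disruptor uses (field names are arbitrary; splitting the padding across superclasses discourages the JVM from reordering it away from the hot field, since superclass fields are laid out first):

```java
// Left-hand padding: fills the cache line before the hot field.
class LhsPadding {
    protected long p1, p2, p3, p4, p5, p6, p7;
}

// The hot field, isolated between the two padding blocks.
class Value extends LhsPadding {
    protected volatile long value;
}

// Right-hand padding: fills the cache line after the hot field.
public class PaddedLong extends Value {
    protected long p9, p10, p11, p12, p13, p14, p15;

    public void set(long v) { value = v; }
    public long get()       { return value; }
}
```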
8. Best Practices for 2025 #
- Prefer High-Level Concurrency Utilities: Avoid using `volatile` and `synchronized` directly if possible. Use `java.util.concurrent` classes (e.g., `ConcurrentHashMap`, `CompletableFuture`, `ExecutorService`); they encapsulate the complex JMM rules correctly (see the sketch after this list).
- Immutability is King: Immutable objects (all fields `final`) are inherently thread-safe. Once the constructor completes (and provided `this` does not escape during construction), the JMM guarantees visibility to all threads without synchronization.
- Final Fields: Use `final` for fields that don't change. The JMM provides specific initialization-safety guarantees for `final` fields that prevent partially constructed objects from being seen.
- Avoid Premature Optimization: Don't use `VarHandle` or try to avoid `synchronized` unless profiling proves it's a bottleneck. Modern JVM locking is highly optimized (lock coarsening, lock elision, adaptive spinning; note that biased locking was removed in JDK 15).
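As a quick illustration of the first point, here is a thread-safe word counter that needs no explicit JMM reasoning at all; a sketch using `ConcurrentHashMap` and `LongAdder` (class and method names are ours):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class WordCount {
    // ConcurrentHashMap + LongAdder handle all visibility and atomicity
    // concerns internally: no volatile, synchronized, or happens-before
    // reasoning required in our code.
    private final ConcurrentHashMap<String, LongAdder> counts = new ConcurrentHashMap<>();

    public void record(String word) {
        counts.computeIfAbsent(word, k -> new LongAdder()).increment();
    }

    public long countOf(String word) {
        LongAdder adder = counts.get(word);
        return adder == null ? 0 : adder.sum();
    }
}
```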
Conclusion #
The Java Memory Model is the contract that keeps the chaos of hardware architecture in check. By understanding Atomicity (compound actions), Visibility (memory flushes), and Ordering (happens-before relationships), you can write robust multi-threaded applications.
In the era of Java 21, while Virtual Threads make concurrency cheaper to use, the rules of shared memory remain exactly the same. A race condition in a Virtual Thread is just as dangerous as one in a Platform Thread.
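To make that concrete, here is a hedged sketch (the class name is ours) of the Section 3 race rewritten with Java 21 virtual threads; the lost updates remain:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class VirtualThreadRace {
    private static int count = 0; // unsynchronized shared state

    public static void main(String[] args) {
        // Java 21: one cheap virtual thread per task
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 10_000; i++) {
                executor.submit(() -> count++); // same data race as before
            }
        } // close() waits for all submitted tasks to finish
        System.out.println("Count: " + count + " (likely < 10000)");
    }
}
```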
Key Takeaway:
If Thread A writes to a variable and Thread B reads it, you must establish a happens-before relationship between the write and the read. Without it, you are coding on hope, not engineering.
Further Reading #
- Java Concurrency in Practice by Brian Goetz (The Bible of Java Concurrency)
- JSR 133 (Java Memory Model) FAQ
- Oracle Documentation: The Java Language Specification, Chapter 17 ("Threads and Locks"), where the memory model is formally specified