This content originally appeared on Level Up Coding - Medium and was authored by Wadix Technologies
Break, Remap, Debug: Using the FPB Unit on ARM Cortex M

1. Introduction
When you debug code that lives in Flash, “software” breakpoints aren’t always enough. The Flash Patch and Breakpoint (FPB) unit is a tiny hardware block in the ARM debug fabric that lets you do two powerful things without touching your binary: set true hardware breakpoints on instruction fetches, and — optionally — replace the fetched instruction with one you choose. In practice, that means you can halt exactly at a function in Flash, or even redirect the first instruction to a small veneer to tweak behavior for tests and hotfixes.
2. FPB Features
What is FPB?
The Flash Patch & Breakpoint (FPB) is a CoreSight debug component in ARMv7-M that watches instruction/literal fetches and can either raise a debug event or substitute the fetched word, enabling non-intrusive breakpoints and lightweight patches in code that runs from Flash.
2.1 Hardware breakpoints
The FPB's comparators watch instruction fetches in the Code region (0x00000000 to 0x1FFFFFFF). When a comparator matches, the core raises a debug event, either halting in place (halt mode, with a debugger attached) or entering the DebugMonitor exception, so you can stop in Flash without changing your binary. To work, you must set both enables: the global FP_CTRL.ENABLE and the per-comparator FP_COMP[n].ENABLE. Each comparator holds a word-aligned address in FP_COMP[n].COMP (address bits 28:2), and the REPLACE field (bits 31:30) picks the halfword within that word: 01 breaks on the lower halfword (address bit1 = 0), 10 on the upper halfword (bit1 = 1), and 11 on both. This is the simplest, most reliable way to break on code in Flash, and it's what most debuggers use under the hood.
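A rough sketch of how a breakpoint comparator value can be composed from a function address. It assumes the FPB version 1 register layout found on Cortex-M3/M4 (COMP in bits 28:2, REPLACE in bits 31:30, ENABLE in bit 0); the helper name fpb_bkpt_value and the macro names are hypothetical, and the function is plain C so it can be checked on a host before touching hardware.

```c
#include <stdint.h>

/* FP_COMP field layout, assuming FPB version 1 (Cortex-M3/M4). */
#define FP_COMP_ENABLE      (1u << 0)
#define FP_COMP_ADDR_MASK   0x1FFFFFFCu   /* COMP: address bits [28:2] */
#define FP_COMP_BKPT_LOWER  (1u << 30)    /* REPLACE = 01 */
#define FP_COMP_BKPT_UPPER  (2u << 30)    /* REPLACE = 10 */

/* Hypothetical helper: returns the FP_COMP value that breaks on the
 * instruction at fn_addr, or 0 if the address lies outside the Code
 * region (0x00000000-0x1FFFFFFF) that the comparators can match. */
static uint32_t fpb_bkpt_value(uint32_t fn_addr)
{
    uint32_t a = fn_addr & ~1u;               /* drop the Thumb bit */
    if (a > 0x1FFFFFFFu) {
        return 0u;
    }
    /* Address bit1 decides which halfword of the word holds the instruction. */
    uint32_t half = (a & 2u) ? FP_COMP_BKPT_UPPER : FP_COMP_BKPT_LOWER;
    return (a & FP_COMP_ADDR_MASK) | half | FP_COMP_ENABLE;
}
```

For a function pointer value of 0x08000101 (Thumb bit set, instruction in the lower halfword) this yields 0x48000101; for 0x08000103 the instruction sits in the upper halfword and the value becomes 0x88000101.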

2.2 Flash Patch (remap) — lightweight instruction replacement
Flash Patch (remap) lets the FPB swap one fetched word with a word you preload in RAM. You point FP_REMAP at a 32-byte table (8×32-bit) in SRAM; it must be 32-byte aligned and lives in SRAM because hardware forces FP_REMAP[31:29]=0b001. Program comparator n with the word-aligned target address and leave its REPLACE field at 00 (remap); when the CPU fetches there, the FPB substitutes table[n] for the original 32-bit word. In Thumb, each 32-bit word holds two 16-bit halves: the lower halfword at address A and the upper at A+2. Remap always replaces the whole word, so to change a single 16-bit instruction you copy the untouched half from Flash into the table entry, and a 32-bit Thumb-2 instruction such as B.W can be replaced through one entry only if it starts on a 4-byte boundary. (The other REPLACE values, 01, 10, and 11, select breakpoints on the lower, upper, or both halfwords rather than a remap.) Typical uses are swapping a single instruction, tweaking a literal, or replacing the first instruction with a branch to a veneer in Flash (needed because a single B.W only reaches ±16 MB, so long jumps to RAM require a veneer).
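Because a remap swaps a whole 32-bit word, overriding a single 16-bit Thumb instruction means carrying the neighbouring halfword over from the original Flash word. A minimal sketch, assuming a little-endian core (bits 15:0 at address A, bits 31:16 at A+2); the helper names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helpers: build a remap-table word that changes only one
 * 16-bit Thumb instruction inside an aligned 32-bit Flash word.
 * Little-endian layout: bits 15:0 are at address A, bits 31:16 at A+2. */
static uint32_t remap_word_lower(uint32_t flash_word, uint16_t new_insn)
{
    return (flash_word & 0xFFFF0000u) | new_insn;
}

static uint32_t remap_word_upper(uint32_t flash_word, uint16_t new_insn)
{
    return (flash_word & 0x0000FFFFu) | ((uint32_t)new_insn << 16);
}

/* Table slot used by comparator n: 32-byte-aligned base plus 4*n. */
static uint32_t remap_slot_addr(uint32_t table_base, unsigned n)
{
    return (table_base & ~0x1Fu) + 4u * n;
}
```

For example, if the Flash word is 0xBF00B510 (PUSH {r4, lr} at A, NOP at A+2), replacing only the lower halfword with 0x4770 (BX LR) gives a table entry of 0xBF004770.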

3. How to use FPB
3.1 Hardware breakpoint:
To halt when test() is fetched from Flash without touching the app, turn on the FPB by writing FP_CTRL with ENABLE=1 and KEY=1 together (the FPB ignores FP_CTRL writes when KEY is 0). The example also sets DEMCR.TRCENA=1; that bit gates the DWT and ITM rather than the FPB, so it is harmless here but is not what enables the breakpoint. Then program one comparator (e.g., FP_COMP[0]) with the first-instruction address of test(): clear the Thumb bit (bit0), write address bits 28:2 into COMP, set REPLACE to 01 or 10 depending on whether bit1 of the address is 0 or 1, and set the comparator's ENABLE bit. Issue DSB; ISB barriers, and call test(). On the next fetch, the FPB match raises a debug event: your debugger halts immediately (halt mode), or the DebugMonitor exception fires if DEMCR.MON_EN is set, giving you a true hardware breakpoint in Flash with zero code changes.
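/* Assumes the CMSIS device header is included (CoreDebug, __DSB, __ISB, __NOP). */
typedef struct
{
    volatile uint32_t FP_CTRL, FP_REMAP, FP_COMP[8];
} FPB_Type;
#define FPB ((FPB_Type*)0xE0002000UL)
#define FPB_CTRL_ENABLE (1u<<0)
#define FPB_CTRL_KEY (1u<<1)         /* FP_CTRL writes are ignored unless KEY=1 */
#define FPB_COMP_ENABLE (1u<<0)
#define FPB_COMP_BKPT_LOWER (1u<<30) /* REPLACE=01: break on halfword at A */
#define FPB_COMP_BKPT_UPPER (2u<<30) /* REPLACE=10: break on halfword at A+2 */
/* noinline keeps a real call so the fetch at test() actually happens */
__attribute__((noinline)) void test(void)
{
    /* debugger halts here */
    __NOP();
}
static inline void fpb_enable(void)
{
    /* TRCENA gates DWT/ITM; the FPB itself is enabled through FP_CTRL */
    CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;
    FPB->FP_CTRL = FPB_CTRL_KEY | FPB_CTRL_ENABLE;
    __DSB();
    __ISB();
}
void demo_breakpoint(void)
{
    fpb_enable();
    uint32_t a = ((uint32_t)(uintptr_t)test) & ~1u; /* clear Thumb bit */
    /* COMP takes address bits [28:2]; address bit1 selects the REPLACE halfword */
    uint32_t half = (a & 0x2u) ? FPB_COMP_BKPT_UPPER : FPB_COMP_BKPT_LOWER;
    FPB->FP_COMP[0] = (a & 0x1FFFFFFCu) | half | FPB_COMP_ENABLE;
    __DSB();
    __ISB();
    test();
}
int main(void)
{
    HAL_Init();
    SystemClock_Config();
    demo_breakpoint();
    while (1)
    {
    }
}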
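Before relying on a particular comparator, it can help to check how many the part actually implements. A small sketch, assuming the ARMv7-M FP_CTRL layout (NUM_CODE split across bits 14:12 and 7:4, NUM_LIT in bits 11:8, ENABLE in bit 0); fpb_info_t and fpb_decode_ctrl are hypothetical names. On target you would pass in the value read from the FP_CTRL register; here the decode is kept as a pure function so it can be checked on a host.

```c
#include <stdint.h>

typedef struct {
    unsigned num_code;  /* instruction-address comparators */
    unsigned num_lit;   /* literal-address comparators */
    unsigned enabled;   /* FP_CTRL.ENABLE */
} fpb_info_t;

/* Hypothetical helper: decode a raw FP_CTRL read (ARMv7-M layout). */
static fpb_info_t fpb_decode_ctrl(uint32_t fp_ctrl)
{
    fpb_info_t info;
    /* NUM_CODE is split: upper bits in [14:12], lower bits in [7:4]. */
    info.num_code = (((fp_ctrl >> 12) & 0x7u) << 4) | ((fp_ctrl >> 4) & 0xFu);
    info.num_lit  = (fp_ctrl >> 8) & 0xFu;
    info.enabled  = fp_ctrl & 1u;
    return info;
}
```

A raw value of 0x261 decodes to six instruction comparators, two literal comparators, and ENABLE set, which matches the eight-entry remap table used in this article.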
3.2 Flash patch (remap):
To change behavior at runtime without touching Flash, use the FPB's remap path: instead of executing the word fetched from Flash, the core can substitute a word you preload in a tiny table in SRAM. Point FP_REMAP at a 32-byte (8×32-bit) table that's 32-byte aligned, then program a comparator with the word-aligned address of the first instruction of the function you want to intercept (clear the T-bit and write address bits 28:2), leaving REPLACE at 00 so the match is remapped instead of raising a breakpoint.
Remap replaces the whole 32-bit word, so injecting a B.W through a single table entry requires the function entry to be 4-byte aligned; after a DSB; ISB, the next fetch at that address will read your replacement word from the table, not from Flash. A practical pattern is to inject a 32-bit branch to a tiny veneer in Flash (keeps within the ±16 MB range of B.W), do your tweak there, then BX LR back to the original caller.
/* 3.2 Flash patch (remap): replace the first fetched word of test() with a 32-bit B.W to a veneer */
#define FPB_COMP_REPLACE_REMAP (0u<<30) /* REPLACE=00: substitute the FP_REMAP table word */
__attribute__((aligned(32))) static uint32_t fpb_table[8];
__attribute__((noinline)) static void veneer_dec(void) {
    /* example: adjust state; the normal return (BX LR) goes back to test()'s caller */
    __NOP();
}
/* Encode Thumb-2 unconditional B.W (T4) as the 32-bit word seen in memory */
static inline uint32_t encode_bw(uint32_t pc, uint32_t target) {
    uint32_t imm = ((target & ~1u) - pc) >> 1; // offset in halfwords (Thumb bit dropped)
    uint32_t S = (imm >> 23) & 1u, I1 = (imm >> 22) & 1u, I2 = (imm >> 21) & 1u;
    uint32_t J1 = (~(I1 ^ S)) & 1u, J2 = (~(I2 ^ S)) & 1u;
    uint32_t imm10 = (imm >> 11) & 0x3FFu, imm11 = imm & 0x7FFu;
    uint16_t hw1 = (uint16_t)(0xF000u | (S << 10) | imm10);
    uint16_t hw2 = (uint16_t)(0x9000u | (J1 << 13) | (J2 << 11) | imm11);
    return ((uint32_t)hw2 << 16) | hw1; // little-endian: hw1 sits at the lower address
}
void demo_patch(void) {
    /* 1) Enable the FPB; bit 1 is KEY and must be written as 1 or the write is ignored */
    CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;
    FPB->FP_CTRL = FPB_CTRL_ENABLE | (1u << 1);
    __DSB(); __ISB();
    /* 2) Program FP_REMAP to our 32-byte-aligned table in SRAM */
    uint32_t base = ((uint32_t)(uintptr_t)fpb_table) & ~0x1Fu;
    FPB->FP_REMAP = base;
    /* 3) Build the replacement word: a 32-bit branch to a small veneer in FLASH.
          Remap swaps a whole aligned word, so this sketch requires a 4-byte-aligned entry. */
    uint32_t entry = ((uint32_t)(uintptr_t)test) & ~1u; // clear T-bit
    if (entry & 0x2u)
        return; // B.W would straddle two words; not handled here
    uint32_t pc_for_b = entry + 4u; // PC reads as instruction address + 4
    uint32_t bw = encode_bw(pc_for_b, (uint32_t)(uintptr_t)veneer_dec);
    /* 4) Comparator n uses table slot n, so comparator 0 reads slot 0 */
    ((volatile uint32_t*)base)[0] = bw;
    __DSB(); __ISB();
    /* 5) Arm comparator 0: address bits [28:2], REPLACE=00 (remap), ENABLE */
    FPB->FP_COMP[0] = (entry & 0x1FFFFFFCu)
                    | FPB_COMP_REPLACE_REMAP | FPB_COMP_ENABLE;
    __DSB();
    __ISB();
    /* Call: first fetch of test() is replaced by our B.W → veneer runs, then returns */
    test();
}
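Since a wrong branch encoding only shows up as a fault on target, it can be worth checking the encoder on a host first. The sketch below is one way to do that, assuming the Thumb-2 B.W T4 encoding and a little-endian memory image; bw_in_range and bw_encode are hypothetical helper names, not part of any vendor library.

```c
#include <stdint.h>
#include <stdbool.h>

/* True if a B.W placed at insn_addr can reach target. The T4 range is
 * -16777216 .. +16777214 bytes relative to insn_addr + 4. */
static bool bw_in_range(uint32_t insn_addr, uint32_t target)
{
    int64_t off = (int64_t)(target & ~1u) - ((int64_t)insn_addr + 4);
    return off >= -16777216 && off <= 16777214;
}

/* Encode B.W (T4) and return the two halfwords as they sit in memory:
 * first halfword in bits 15:0, second in bits 31:16 (little-endian). */
static uint32_t bw_encode(uint32_t insn_addr, uint32_t target)
{
    uint32_t imm = ((target & ~1u) - (insn_addr + 4u)) >> 1; /* halfwords */
    uint32_t s   = (imm >> 23) & 1u;
    uint32_t j1  = (~(((imm >> 22) & 1u) ^ s)) & 1u;  /* J1 = NOT(I1 XOR S) */
    uint32_t j2  = (~(((imm >> 21) & 1u) ^ s)) & 1u;  /* J2 = NOT(I2 XOR S) */
    uint32_t hw1 = 0xF000u | (s << 10) | ((imm >> 11) & 0x3FFu);
    uint32_t hw2 = 0x9000u | (j1 << 13) | (j2 << 11) | (imm & 0x7FFu);
    return (hw2 << 16) | hw1;
}
```

Two easy reference points: a branch to itself encodes as the halfwords F7FF BFFE (word 0xBFFEF7FF), and a branch to the next instruction encodes as F000 B800 (word 0xB800F000). A jump from Flash at 0x08000100 to SRAM at 0x20000000 fails the range check, which is why the patch targets a veneer in Flash.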
4. Conclusion:
FPB shines when you need true hardware breakpoints in Flash without touching the binary and when you want fast, reversible hot patches — swap a single instruction, tweak a literal, or redirect a function prologue to a small veneer. It’s especially useful for production debugging and in-field diagnostics where re-flashing is risky.

Wadix Technologies | Sciencx (2025-08-25T02:24:37+00:00) Break, Remap, Debug: Using the FPB Unit on ARM cortex M. Retrieved from https://www.scien.cx/2025/08/25/break-remap-debug-using-the-fpb-unit-on-arm-cortex-m/