Mastering Go profiling

This content originally appeared on DEV Community and was authored by Vladyslav Platonov

What is pprof

pprof is Go's built-in profiling tool that helps you understand where your program spends CPU time, how it uses memory, where goroutines block, and more.
It's part of the standard library (runtime/pprof and net/http/pprof) and can be used both locally and in production via HTTP endpoints.

How it works under the hood

The Go runtime includes internal hooks and counters that collect low-level statistics:

  • CPU - samples stack traces at regular intervals (default 100 Hz).
  • Memory (heap) — records allocation data from the garbage collector.
  • Block / Mutex — tracks delays caused by synchronization (e.g., sync.Mutex, channel, select).
  • Goroutine — captures snapshots of all running goroutines and their stack traces.

pprof gathers this data and can:

  • Write it to files (pprof.WriteHeapProfile, pprof.StartCPUProfile);
  • Expose it via HTTP endpoints (net/http/pprof);
  • Export it in a format compatible with go tool pprof, Speedscope, or Parca (a small illustrative sketch follows this list).
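
To make this concrete, here is a minimal, self-contained sketch (the file name goroutine.pprof and the log output are illustrative) that lists the profiles the runtime maintains and writes one of them in the format go tool pprof understands:

    package main

    import (
        "log"
        "os"
        "runtime/pprof"
    )

    func main() {
        // The runtime keeps several named profiles (heap, allocs, goroutine,
        // block, mutex, threadcreate). CPU is handled separately via
        // StartCPUProfile because it is sampled over a time window.
        for _, p := range pprof.Profiles() {
            log.Printf("profile %q currently holds %d samples", p.Name(), p.Count())
        }

        // Any named profile can be written to a file that go tool pprof
        // can open later. The file name here is arbitrary.
        f, err := os.Create("goroutine.pprof")
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()
        if err := pprof.Lookup("goroutine").WriteTo(f, 0); err != nil {
            log.Fatal(err)
        }
    }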

Main profile types

| Type | Focus | When to use | What it shows | How to get |
|------|-------|-------------|---------------|------------|
| CPU | Execution time | High CPU load, slow processing | Where CPU time is spent | pprof.StartCPUProfile(file) or /debug/pprof/profile |
| Heap | Memory usage | Memory leaks, OOM, high RAM | Memory allocations (live and temporary) | pprof.WriteHeapProfile(file) or /debug/pprof/heap |
| Goroutine | Concurrency snapshot | Deadlocks, leaks, hanging requests | Stack traces of all goroutines | /debug/pprof/goroutine |
| Block | Waiting time | Latency, thread stalls | Where goroutines are blocked | /debug/pprof/block |
| Mutex | Lock contention | Poor scalability | Where mutexes are most frequently held | /debug/pprof/mutex |
| Allocs | Allocation frequency | GC pressure, short-lived allocations | All memory allocations, including freed ones | /debug/pprof/allocs |

CPU Profile - where processing time goes

The Go runtime samples stack traces about 100 times per second during execution. This tells you which functions consume the most CPU time - i.e. where the CPU is actually being used.

When to use

  • The app is slow or CPU-bound;
  • You want to identify hot paths;
  • You're optimizing loops, parsing, serialization, or number crunching.

Typical findings

  • Slow json.Marshal in loops;
  • Overuse of fmt.Sprintf;
  • Too many small allocations per iteration (see the sketch after this list).
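
As a rough illustration of the last point, the sketch below (the function names buildIDsSlow/buildIDsFast and the loop size are made up for the example) shows the kind of hot path a CPU profile tends to surface - fmt.Sprintf in a tight loop - next to the cheaper strconv variant that usually replaces it:

    package main

    import (
        "fmt"
        "strconv"
    )

    // buildIDsSlow shows up high in `top`: fmt.Sprintf parses the format
    // string and allocates on every iteration.
    func buildIDsSlow(n int) []string {
        ids := make([]string, 0, n)
        for i := 0; i < n; i++ {
            ids = append(ids, fmt.Sprintf("id-%d", i))
        }
        return ids
    }

    // buildIDsFast does the same work with strconv, which is usually the
    // cheap replacement the profile points you toward.
    func buildIDsFast(n int) []string {
        ids := make([]string, 0, n)
        for i := 0; i < n; i++ {
            ids = append(ids, "id-"+strconv.Itoa(i))
        }
        return ids
    }

    func main() {
        fmt.Println(len(buildIDsSlow(1_000_000)), len(buildIDsFast(1_000_000)))
    }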

Command

go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30

How to read

  • top shows the most expensive functions;
  • web opens a call-graph view in your browser; the flamegraph view (go tool pprof -http or Speedscope) shows width = time.

Heap Profile - memory usage

Shows how much memory is allocated and where allocations happen. Data is collected from the garbage collector (GC).

When to use

  • Memory usage keeps growing;
  • There's a memory leak;
  • You need to know who allocates too often or too much.

Typical findings

  • Temporary objects inside loops;
  • Unbounded caches or slices;
  • Unreleased references keeping memory alive.

Command

go tool pprof http://localhost:6060/debug/pprof/heap

How to read

  • AllocObjects / AllocSpace → total allocations;
  • InUseObjects / InUseSpace → currently live objects.

Use flags like --alloc_space or --inuse_space to switch views (the heap profile defaults to inuse_space).
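
If you prefer to capture a heap snapshot from inside the program instead of over HTTP, a minimal sketch (the output file name heap.pprof is arbitrary) looks like this:

    package main

    import (
        "log"
        "os"
        "runtime"
        "runtime/pprof"
    )

    func main() {
        f, err := os.Create("heap.pprof") // output file name is arbitrary
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()

        // ... run or finish your workload here ...

        runtime.GC() // make the "in use" numbers reflect only live objects
        if err := pprof.WriteHeapProfile(f); err != nil {
            log.Fatal(err)
        }
    }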

Goroutine Profile - snapshot of all goroutines

Captures stack traces of all goroutines at a single point in time.

When to use

  • The app freezes or stops responding;
  • Goroutine count keeps increasing;
  • You suspect a deadlock or goroutine leak.

Typical findings

  • Goroutines stuck receiving on a channel that nothing ever sends to (or sending with no receiver);
  • WaitGroup never reaches zero;
  • A select with no ready case and no default, blocking forever (see the leak sketch after this list).
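
A minimal, made-up example of the leak pattern above - the handler name and loop count are illustrative - where every call parks a goroutine on a channel receive forever:

    package main

    import "time"

    func handleRequest() {
        result := make(chan int) // unbuffered, and no one will ever send

        go func() {
            <-result // blocks forever: shows up as "chan receive" in the dump
        }()

        // The handler returns without sending on or closing result,
        // leaking the goroutine above.
    }

    func main() {
        for i := 0; i < 1000; i++ {
            handleRequest()
        }
        time.Sleep(time.Minute) // keep the process alive to inspect the profile
    }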

Command

curl http://localhost:6060/debug/pprof/goroutine?debug=2

How to read

You'll see text stack traces. Look for "sleep", "chan receive", "mutex", etc.
Usually it's easy to spot where execution is stuck.

Block Profile - where goroutines are waiting

Tracks how long goroutines are blocked, waiting on synchronization primitives (channels, mutexes, conditions).

When to use

  • The app hangs under low load;
  • High latency in simple operations;
  • You want to find wait points that slow down performance.

Typical findings

  • Overloaded channels;
  • Shared data structures causing contention;
  • Backpressure in worker queues (see the sketch after this list).
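
The following sketch (channel name, producer count, and sleep duration are all illustrative) shows the classic backpressure shape a block profile picks up: many producers blocked on sends to an unbuffered channel drained by one slow consumer:

    package main

    import (
        "runtime"
        "sync"
        "time"
    )

    func main() {
        runtime.SetBlockProfileRate(1) // record every blocking event (dev only)

        jobs := make(chan int) // unbuffered: every send waits for the consumer

        go func() { // single slow consumer
            for range jobs {
                time.Sleep(time.Millisecond)
            }
        }()

        var wg sync.WaitGroup
        for p := 0; p < 8; p++ { // producers spend most of their time waiting
            wg.Add(1)
            go func() {
                defer wg.Done()
                for i := 0; i < 100; i++ {
                    jobs <- i // this send dominates /debug/pprof/block
                }
            }()
        }
        wg.Wait()
        close(jobs)
    }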

Enable in code

runtime.SetBlockProfileRate(1)

Command

go tool pprof http://localhost:6060/debug/pprof/block

Mutex Profile - lock contention

Records mutex contention: how long goroutines spend waiting for locks held by other goroutines, attributed to the code that held the lock.

When to use

  • CPU usage is low but app is slow;
  • Throughput doesn't scale with concurrency;
  • You suspect shared locks or global bottlenecks.

Typical findings

  • A global sync.Mutex in a hot path;
  • Shared maps without sharding;
  • Logging or metrics inside critical sections (a short sketch follows this list).
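
A small, hypothetical example of the first finding - a single global sync.Mutex serializing a hot path - in which sync.(*Mutex).Lock would climb to the top of the mutex profile:

    package main

    import (
        "runtime"
        "sync"
    )

    var (
        mu       sync.Mutex
        counters = map[string]int{}
    )

    func record(key string) {
        mu.Lock()
        counters[key]++ // every goroutine contends on this single lock
        mu.Unlock()
    }

    func main() {
        runtime.SetMutexProfileFraction(1) // sample every contention event (dev only)

        var wg sync.WaitGroup
        for g := 0; g < 16; g++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                for i := 0; i < 100000; i++ {
                    record("requests")
                }
            }()
        }
        wg.Wait()
    }

Typical remedies are sharding the map, switching to atomic counters, or moving work out of the critical section.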

Enable in code

runtime.SetMutexProfileFraction(1)

Command

go tool pprof http://localhost:6060/debug/pprof/mutex

Allocs Profile - all allocations (including freed ones)

Shows all memory allocations, not just the ones still in use.
Useful to understand allocation rate and GC pressure.

When to use

  • The app allocates too frequently (high GC load);
  • You're optimizing short-lived, high-throughput operations.

Typical findings

  • Repeated string concatenations (+, fmt.Sprintf);
  • Allocating a new []byte on each request;
  • Inefficient append or map usage (a small before/after sketch follows this list).
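
A before/after sketch of the string-concatenation case (function names and sizes are invented for the example); the allocs profile would show joinSlow producing one short-lived string per iteration, while joinFast reuses a single strings.Builder buffer:

    package main

    import (
        "fmt"
        "strings"
    )

    func joinSlow(parts []string) string {
        s := ""
        for _, p := range parts {
            s += p + "," // each iteration allocates a brand-new string
        }
        return s
    }

    func joinFast(parts []string) string {
        var b strings.Builder
        b.Grow(len(parts) * 8) // rough pre-size to avoid regrowth
        for _, p := range parts {
            b.WriteString(p)
            b.WriteByte(',')
        }
        return b.String()
    }

    func main() {
        parts := make([]string, 1000)
        for i := range parts {
            parts[i] = "item"
        }
        fmt.Println(len(joinSlow(parts)), len(joinFast(parts)))
    }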

Command

go tool pprof http://localhost:6060/debug/pprof/allocs

Warning! Block and Mutex profiling should not be enabled permanently in production.

Use sampling values to reduce overhead, for example:

  • runtime.SetBlockProfileRate(10000) - record roughly one blocking event per 10,000 nanoseconds of blocked time;
  • runtime.SetMutexProfileFraction(100) - report about 1 in 100 contention events (the sketch below shows where these calls fit).
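
A minimal placement sketch, assuming the service already exposes net/http/pprof on localhost:6060 (the port and sampling values are examples, not requirements):

    package main

    import (
        "log"
        "net/http"
        _ "net/http/pprof" // registers /debug/pprof/ handlers
        "runtime"
    )

    func main() {
        // Sampled rates keep the overhead low enough for production use.
        runtime.SetBlockProfileRate(10000)   // ~1 event per 10,000 ns blocked
        runtime.SetMutexProfileFraction(100) // ~1 in 100 contention events

        go func() {
            log.Println(http.ListenAndServe("localhost:6060", nil))
        }()

        // ... your app logic ...
        select {} // placeholder to keep the example process alive
    }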

Using pprof via HTTP

  1. Enable profiling on a running service:

    import (
        "log"
        "net/http"
        _ "net/http/pprof" // registers /debug/pprof/ handlers on http.DefaultServeMux
    )

    func main() {
        go func() {
            // Serve the profiling endpoints on a separate local port.
            log.Println(http.ListenAndServe("localhost:6060", nil))
        }()

        // your app logic
    }
    
  2. Open the following link in your browser:

    http://localhost:6060/debug/pprof/

  3. Run go tool pprof to collect a profile, for example:

    go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
    

Using pprof directly in code

  1. Recording a CPU profile to a file

    f, err := os.Create("cpu.pprof")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()
    pprof.StartCPUProfile(f)
    defer pprof.StopCPUProfile()

    // Run your workload
    workload()
    
  2. Analyze it with:

    go tool pprof cpu.pprof
    (pprof) top
    (pprof) web
    

Best practices for production

  1. Restrict access to /debug/pprof/ (e.g., Basic Auth, internal IPs, env flag) - see the sketch after this list.
  2. Don't run CPU profiling all the time - it adds ~5–10% overhead.
  3. CPU profiles should run for at least 10–30 seconds.
  4. Heap profiles are safe to collect in production.
  5. For visualization, use:
    • pprof -http — quick interactive inspection;
    • Speedscope — fast and intuitive;
    • Parca — continuous profiling.
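
As one possible way to implement point 1 (the credentials, port, and handler set here are placeholders, not a prescription), you can register the pprof handlers on a private mux, bind it to localhost, and wrap it in Basic Auth:

    package main

    import (
        "crypto/subtle"
        "log"
        "net/http"
        "net/http/pprof"
    )

    func basicAuth(next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            user, pass, ok := r.BasicAuth()
            if !ok ||
                subtle.ConstantTimeCompare([]byte(user), []byte("profiler")) != 1 ||
                subtle.ConstantTimeCompare([]byte(pass), []byte("change-me")) != 1 {
                w.Header().Set("WWW-Authenticate", `Basic realm="pprof"`)
                http.Error(w, "unauthorized", http.StatusUnauthorized)
                return
            }
            next.ServeHTTP(w, r)
        })
    }

    func main() {
        mux := http.NewServeMux()
        mux.HandleFunc("/debug/pprof/", pprof.Index) // also serves named profiles
        mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
        mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
        mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
        mux.HandleFunc("/debug/pprof/trace", pprof.Trace)

        // Bind to localhost only: reachable via SSH tunnel or sidecar,
        // never from the public network.
        log.Fatal(http.ListenAndServe("localhost:6060", basicAuth(mux)))
    }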

Reading profiling results

What a pprof graph represents

pprof generates a call graph, where:

  • Node (box) - a function;
  • Edge (arrow) - a call from one function to another;
  • Node weight - how much CPU time or memory that function consumed;
  • Edge weight - how much cost was passed to the callee functions.

In short, the graph shows who calls whom and where time or memory is being spent.

Red flags to look for:

  • Wide bottom block → hottest function
  • runtime.mallocgc dominating → too many allocations
  • sync.(*Mutex).Lock high → contention
  • Many narrow repeated blocks → allocations inside a loop

Common visualization modes

top

Shows a summary table:

(pprof) top
Showing nodes accounting for 34.29s, 85.73% of 40.00s total
      flat  flat%   sum%        cum   cum%
     10.23s 25.6% 25.6%     18.92s 47.3%  main.work
      8.69s 21.7% 47.3%     10.35s 25.9%  processData

Column meanings:

  • flat - time spent inside the function itself (excluding callees);
  • cum (cumulative) - total time including callees;
  • flat% / cum% - relative to the total runtime.

Highlights:

  • Large flat time - optimize that specific function.
  • Large cum time but small flat - the problem is in a callee.

list funcName

Shows annotated source code lines:

(pprof) list compute
Total: 40s
ROUTINE ======================== compute in main.go
     10.00s     15.00s (flat, cum) 37.50% of Total

Highlights:

  • You can see which specific lines consume CPU or allocate memory — ideal for micro-optimizations.

web (or go tool pprof -http=:8080 - pick any free port that doesn't clash with the app's own :6060)

Opens an interactive call graph in your browser.

Color & size meaning:
  • Red - functions consuming the most resources
  • Yellow - medium cost
  • Green - minor impact
  • Box width - time or memory weight
  • Arrow - function call relationship

Highlights:

  • The wider and redder the box, the hotter the function.

Flamegraph

A flamegraph is the most intuitive format.
Example:

main
 └── handleRequest
      ├── parseJSON
      └── processData
           ├── validate
           └── saveToDB

Each rectangle = a function:
  • X-axis = total time (width = cost)
  • Y-axis = call stack depth
  • Call-chain direction depends on the tool: classic flamegraphs grow upward from main, while pprof's web UI and Speedscope draw main at the top with leaves below, as in the diagram above.

Highlights:

  • If parseJSON is wide - JSON encoding is CPU-heavy.
  • If saveToDB dominates - DB operations are the bottleneck.

Example interpretation

Suppose your flamegraph shows:
main → handle → json.Marshal → reflect.Value.Interface

and reflect.Value.Interface takes 40% of CPU time.
That's a clear indicator of slow reflection-based serialization - replace it with a faster encoder such as jsoniter or easyjson (see the sketch below).
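
For illustration only - verify the API against the jsoniter version you actually pin - the project documents a drop-in mode for encoding/json, so the change can be as small as swapping the package used for Marshal; the User type here is invented for the example:

    package main

    import (
        "fmt"

        jsoniter "github.com/json-iterator/go"
    )

    // Drop-in replacement configuration for the standard encoding/json package.
    var json = jsoniter.ConfigCompatibleWithStandardLibrary

    type User struct {
        ID   int    `json:"id"`
        Name string `json:"name"`
    }

    func main() {
        data, err := json.Marshal(User{ID: 1, Name: "alice"})
        if err != nil {
            panic(err)
        }
        fmt.Println(string(data))
    }

Whether this pays off depends on the workload; re-profile after the change, and for very hot types consider code-generated encoders such as easyjson.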

Practical reading tips

  1. Start with the widest blocks at the bottom — they consume the most time or memory.
  2. Ignore runtime internals (runtime.*, syscall.*) unless something abnormal shows up.
  3. Look for repeating narrow peaks — they often mean inefficient work inside loops.
  4. If CPU looks fine but app hangs, check block or mutex profiles — likely synchronization issues, not CPU load.
  5. Compare before/after snapshots:

    go tool pprof -diff_base heap1.pprof heap2.pprof
    

Heap vs Allocs

| Profile | Shows | Common misunderstanding |
|---------|-------|-------------------------|
| heap (inuse) | live objects currently in memory | "memory leak" — but it may just be a cache or buffer |
| allocs | all allocations (even freed ones) | engineers think "growth = leak" — it's not |

Visualization tools comparison

| Tool | Format | Best for |
|------|--------|----------|
| CLI (top, list) | Text | Quick inspection, remote servers |
| Web UI (pprof -http) | Interactive graph | Exploring call hierarchy |
| Speedscope | Visual | Immediate hotspot identification |
| Parca | Continuous profiling | Real-time production monitoring |

Conclusion

Profiling is one of the most valuable tools for diagnosing performance issues in Go applications, and pprof provides everything you need - from understanding CPU hotspots to uncovering memory leaks, goroutine leaks, synchronization bottlenecks, and inefficient allocation patterns.

The key to using pprof effectively is knowing which profile to capture, how to interpret what you see, and how to compare snapshots over time. CPU profiles reveal hot paths, heap profiles uncover leaks or excessive allocations, goroutine dumps expose deadlocks or runaway concurrency, while block and mutex profiles highlight contention that's invisible to standard metrics.

Most importantly:

  • Always start from the widest blocks in flamegraphs.
  • Use diffing to compare “before/after” optimizations.
  • Enable advanced profiles (block/mutex) only when needed.
  • Treat pprof as part of your standard debugging workflow - not as a last resort.

Mastering pprof turns performance debugging from guesswork into a repeatable, data-driven process. Once your team gets comfortable reading profiles, performance problems that used to take days can be solved in minutes.

Measure, don't guess - profile first.

