Debugging for fun and sanity


This post is simply my attempt to pair actual debugging tools to WHEN I personally choose to use them.

I think this mode of thinking is more useful than HOW to use them … though the how is still important.

Knowing that rr supports reverse-continue is useless if I reach for it every time a loop produces the wrong value. Knowing ASan exists is useless if I compile without -g and get a wall of addresses.

The tool matters less than matching it to the right class of problem.

So here is a breakdown.


First Instinct → printf

“The most effective debugging tool is still careful thought, coupled with judiciously placed print statements.”

  • Brian Kernighan

Before any tool gets opened, I reach for print statements. I don’t have some grand philosophical statement here. It’s just that print is fast, requires zero setup, and I am already “in” the code.

printf("value at step %d: %d\n", i, result);

Often this solves 80% of problems for me. I have a rough hypothesis, and I want a quick confirmation or denial before investing in a proper debugging session.

One thing I am careful of is debugging scope creep. I add one print, it’s not enough, I add five more, now I’m recompiling constantly and drowning in output. To me this is the sign to start looking at breakpoints.

Something I want to build better habits around is using proper log levels instead of raw prints, leaving structured diagnostic output in the code rather than adding and deleting printf statements every debugging session.


Wrong Output → Breakpoints

Logic errors, bad calculations, when print statements become a bit much.

Honestly another tool I SHOULD use more. Being able to step through is so powerful. When I have a hypothesis about where something goes wrong, watching the locals update in human time is deeply satisfying.

I run nvim-dap for this. The workflow is simple:

<leader>db    breakpoint on the suspicious line
<leader>dc    start session
<leader>dn    step over
<leader>di    step into the call
<leader>do    step out

The Scopes panel updates live as I step.

The move that saves the most time: conditional breakpoints.

<leader>dB    pause only when condition is true

If I am debugging a loop that runs ten thousand times and the corruption happens at iteration 9,847, unless I am doing the carpal tunnel speedrun, I just set the condition to i == 9847 and let it go.

REPL (<leader>dr) is powerful. I can evaluate arbitrary expressions in the current frame without modifying code. Hypothesis-driven debugging without the edit-compile-run cycle.


Memory Corruption Far From the Cause → rr

An insidious case. The value is wrong, but the line that caused it is somewhere else entirely. By the time I see the damage, the pointers have long moved on.

The example from MIT’s Missing Semester: a loop that runs i < 4 on an array of size 3. The fourth write spills into adjacent memory. The thing that breaks is a completely different struct field.

rr solves this by recording the entire execution to disk first, then letting me replay it backwards.

rr record ./program
rr replay

Inside the replay, I can set a watchpoint on the corrupted address:

watch students[1].id
continue

It runs forward until the watchpoint fires. Then:

reverse-continue
# OR
rc

It rewinds to the previous write to that address. Easy to find the guilty line of code.

This only works because rr records execution deterministically: it captures every source of nondeterminism (syscall results, signals, thread scheduling) so the replay is instruction-for-instruction identical to the recorded run.

I reach for rr when:

  • The corruption and the cause are in different functions
  • The bug only shows up under specific conditions
  • I wish I could just go back in time

Crashes and Silent Corruption → ASan

Use-after-free, buffer overflows, out-of-bounds writes. All the fun stuff. Stuff that doesn’t crash immediately.

gcc -fsanitize=address -g program.c -o program
./program

ASan instruments every memory access at compile time. When I touch memory I shouldn’t, it aborts immediately with a full report: what happened, which line, when the memory was allocated, when it was freed.

Without sanitizers, a use-after-free might appear to work fine. The freed memory is still mapped to my process, the bytes are still there, nothing zeroes them out. I get garbage output, ASLR randomises it every run, and I waste hours wondering why the behaviour is non-deterministic.

ASan turns that into a controlled abort with a complete timeline. Allocated at line 6, freed at line 10, illegally accessed at line 12. And it angrily yells at me.

Two habits I want to remember:

Always compile with -g. Without debug symbols, ASan shows addresses instead of line numbers. Which is not useful to me until I learn to read hex addresses fluently.

Always null after free.

free(ptr);
ptr = NULL;

I read this one somewhere and it stuck. free() doesn’t zero the pointer. It still holds the old address. Any accidental dereference later becomes a loud null pointer crash instead of silent corruption. Loud failures beat silent ones.

Other sanitizers worth knowing:

-fsanitize=thread      # data races
-fsanitize=memory      # reads of uninitialised memory
-fsanitize=undefined   # integer overflow, bad casts

Slow Code → Profile Before I Touch Anything

The hotspot is almost never where I think it is.

I don’t personally want to scour through code guessing where I could optimise things. So profiling my runs is great.

Step 1 - get a quick overview

perf stat -e task-clock,context-switches,page-faults ./program

Tells me: how much CPU time, how many interruptions, memory overhead. Gives me a baseline before I change anything.

Step 2 - find the actual hotspot

perf record -g ./program
perf report

Two columns matter. Self % is time spent inside that function. Children % is cumulative … everything it called included. Sort by Self to find what’s actually burning CPU.

Step 3 - visualise it

perf script | stackcollapse-perf.pl | flamegraph.pl > flame.svg

Game changer. And impressive to your boss if you show the before-and-after. Open in a browser. Width equals time. The widest bar at the top of a call stack is my bottleneck. Tall narrow towers are deep call chains. Wide flat bars are hot loops.

One thing I noticed while learning more about profiling: the cost of starting a program is enormous relative to what it actually does. Dynamic linking, symbol resolution, locale loading… all before my first line of main(). For short-lived programs, my code is often less than 1% of total instructions. Callgrind will show this clearly if I ever doubt it.


Comparing Two Implementations → hyperfine

So using time for this kind of thing is tricky. From what I understand the OS might context-switch mid-run, the disk cache might be cold, the first run might include lazy initialisation. It’s uncontrolled.

hyperfine --warmup 3 'command_a' 'command_b'

--warmup 3 discards the first three runs per command. Everything after that is hot-cache, steady-state performance. Hyperfine runs both commands multiple times, interleaved, and gives me mean, min, max, and standard deviation.

Low σ means stable results. High σ means something external is interfering.

Something I noted during my own playing with hyperfine: strategy vs algorithm. I was comparing grep and ripgrep. Ripgrep beats grep for two distinct reasons: it uses SIMD for faster pattern matching (algorithm), and it skips .gitignore’d files and binaries by default (strategy). These are not the same thing. If I want to benchmark the algorithm I need to control the input, e.g. plain text files only, nothing to skip. Otherwise I’m measuring strategy, not code.


Unknown Behaviour → strace

Program failing silently, opening wrong files, hanging on I/O, behaving differently in production than locally.

strace intercepts every system call my program makes. Every file open, every read, every memory mapping, every network connection:

strace ./program 2>&1 | grep openat     # every file it touches
strace -T ./program                     # with timing per syscall
strace -f ./program                     # follow child processes

What surprised me working through this: before my main() runs a single line of code, a dynamically linked program has already made dozens of syscalls… loading the dynamic linker, resolving symbols against libc, setting up thread-local storage, loading locale files. ls -l reads /etc/passwd and /etc/group just to convert uid/gid numbers to readable names.

strace makes all of this visible. It’s particularly useful for “why is this failing in production” situations where the program can’t tell me what it’s trying to open.


The Short Version

Problem → Tool

Wrong output, logic errors → nvim-dap breakpoints
Corruption far from the cause → rr + watchpoints + reverse-continue
Memory bugs, use-after-free → ASan (-fsanitize=address)
Slow code → perf record + flame graph
Comparing implementations → hyperfine
Unknown behaviour, file access → strace
Resource usage, CPU pinning → htop + taskset

rr and ASan are complementary, not alternatives. ASan tells me what went wrong at the point of violation. rr tells me why, which is useful when the cause and effect are separated by a hundred function calls.

The underlying principle across all of these: make failures loud and specific. Silent corruption, non-deterministic crashes, and vague error messages are BAD. Every tool here is fundamentally about converting ambiguous failure into precise, reproducible information.


“If debugging is the process of removing software bugs, then programming must be the process of putting them in.”

  • Edsger Dijkstra