Question bank · C · systems · networking

The questions they're likely to ask

Grouped by what each one probes. The trap or the thing the interviewer is listening for is called out next to each, because that's what actually moves the decision.

If you only prep two things, make them the SPSC ring buffer and parsing a packet/descriptor out of a raw byte buffer (endianness + alignment + no aliasing UB). Those two sit right on the NIC datapath and are the most probable hands-on exercises for this team.

1 · Memory layout & alignment

Their bread and butter — descriptors and registers are structs over bytes.

What is sizeof this struct, and why? struct S { char a; int b; char c; };

On a typical system where int needs 4-byte alignment: a at offset 0, then 3 bytes of padding, b at 4–7, c at 8, then 3 bytes of trailing padding so the total is a multiple of the struct's alignment (4, the alignment of its strictest member). That's 12 bytes, not 6.

Reorder largest-to-smallest — struct { int b; char a; char c; } — and you get 8 bytes. #pragma pack(1) forces 6 bytes but b now sits at an unaligned offset: often slower, and unsafe if code later takes an unaligned pointer to it on a strict-alignment CPU. Only pack for on-the-wire / on-disk layouts, and prefer copying fields out with memcpy or explicit byte parsing.

What they're listening for: that you reason about both interior and trailing padding, and that you reach for field reordering before #pragma pack.
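The layout claims above can be checked rather than memorized. The following is a minimal sketch; the names S_reordered and s_padding_bytes are illustrative, and the concrete numbers (12 and 8) assume a typical ABI with a 4-byte, 4-byte-aligned int.

```c
#include <stddef.h>   /* offsetof, size_t */

/* Original order: interior padding after a, trailing padding after c. */
struct S { char a; int b; char c; };

/* Reordered largest-to-smallest: the two chars sit together after b. */
struct S_reordered { int b; char a; char c; };

/* Padding bytes in struct S: total size minus the sum of member sizes. */
size_t s_padding_bytes(void) {
    return sizeof(struct S) - (sizeof(char) + sizeof(int) + sizeof(char));
}
```

On such an ABI, offsetof(struct S, b) is 4 and s_padding_bytes() returns 6; printing offsetof for each member is a quick way to show an interviewer where the padding sits.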

How do you safely parse a packet header out of a byte buffer?

Do not cast a struct hdr * straight onto the uint8_t * buffer. Main hazards: the buffer may not be aligned for the struct (the access can fault or tear), the object may not have that struct's effective type (aliasing), and C struct padding won't match the wire layout. The safe pattern is memcpy into a properly-aligned struct, or parse field-by-field with explicit shifts, which also handles endianness for free. For example:

uint16_t ethertype;
memcpy(&ethertype, p + 12, sizeof ethertype);
ethertype = ntohs(ethertype);

What they're listening for: whether “strict aliasing” and “alignment” come out of your mouth, or you reach for the naive cast.
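The field-by-field alternative might look like the sketch below, which parses an untagged Ethernet header (6-byte destination, 6-byte source, 2-byte EtherType). The names eth_fields, load_be16 and parse_eth are hypothetical, chosen for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_HDR_LEN 14u   /* 6 dst + 6 src + 2 EtherType, no VLAN tag */

/* Host-side view of the header; its layout need not match the wire. */
struct eth_fields {
    uint8_t  dst[6];
    uint8_t  src[6];
    uint16_t ethertype;   /* host byte order after parsing */
};

/* Read a big-endian 16-bit value one byte at a time: no alignment
 * requirement, no aliasing issue, no dependence on host endianness. */
uint16_t load_be16(const uint8_t *p) {
    return (uint16_t)(((unsigned)p[0] << 8) | p[1]);
}

/* Returns false if the buffer is too short to hold the header. */
bool parse_eth(const uint8_t *buf, size_t len, struct eth_fields *out) {
    if (len < ETH_HDR_LEN)
        return false;
    memcpy(out->dst, buf, 6);
    memcpy(out->src, buf + 6, 6);
    out->ethertype = load_be16(buf + 12);
    return true;
}
```

The length check comes first on purpose: bounds validation before any read is the other thing reviewers tend to look for in parsing code.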

When and why would you use a flexible array member?

struct packet { uint16_t len; uint8_t data[]; }; — for a fixed header followed by a variable-length payload in a single allocation: malloc(sizeof(struct packet) + len). One alloc, one free, good cache locality. (Pre-C99 this was the data[1] “struct hack.”)
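A minimal sketch of the single-allocation pattern, assuming a hypothetical constructor named packet_new that copies the payload in:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct packet {
    uint16_t len;
    uint8_t  data[];   /* flexible array member: must be the last member */
};

/* Allocate header and payload together; returns NULL on failure.
 * The caller releases the whole thing with a single free(). */
struct packet *packet_new(const uint8_t *payload, uint16_t len) {
    struct packet *pkt = malloc(sizeof *pkt + len);
    if (pkt == NULL)
        return NULL;
    pkt->len = len;
    if (len != 0)
        memcpy(pkt->data, payload, len);
    return pkt;
}
```

Note that sizeof(struct packet) does not include the array, which is why the payload length is added explicitly.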

2 · Endianness

Networking — very likely to come up.

What do ntohs / htonl do, and when do you need them?

Network byte order is big-endian. POSIX htons/htonl convert host→network, and ntohs/ntohl network→host, for 16/32-bit integers (include <arpa/inet.h>; they're POSIX, not ISO C). Use them for multi-byte numeric header fields: ports, lengths, IPv4 addresses. On a big-endian host they're no-ops; on little-endian (x86, most ARM) they byte-swap.

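Swap a 16-bit value by hand: (uint16_t)(((uint16_t)x >> 8) | ((uint16_t)x << 8)). The inner casts confine x to 16 bits; the operands still promote to int for the shifts, so the outer cast is what truncates the result back to 16 bits. For 32-bit, use __builtin_bswap32 (GCC/Clang) or shift a uint32_t with explicit masks.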
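Written out as functions, the hand-rolled swaps could look like this sketch (swap16 and swap32 are illustrative names, not library functions):

```c
#include <stdint.h>

/* Swap the two bytes of a 16-bit value. x promotes to int for the
 * shifts; the outer cast truncates back to 16 bits. */
uint16_t swap16(uint16_t x) {
    return (uint16_t)((x >> 8) | (x << 8));
}

/* Swap the four bytes of a 32-bit value with explicit masks. */
uint32_t swap32(uint32_t x) {
    return ((x & UINT32_C(0x000000FF)) << 24) |
           ((x & UINT32_C(0x0000FF00)) <<  8) |
           ((x & UINT32_C(0x00FF0000)) >>  8) |
           ((x & UINT32_C(0xFF000000)) >> 24);
}
```

Applying either function twice returns the original value, which makes a handy sanity check.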

How do you detect endianness at runtime?

Write a known value and inspect its first byte through an unsigned char *, the portable way to read an object's representation:

uint32_t x = 0x01020304;
const unsigned char *p = (const unsigned char *)&x;
// little-endian if p[0] == 0x04

What they're listening for: that you inspect bytes via unsigned char * rather than union punning — both work in C, but this one nobody can argue with.

3 · volatile & memory-mapped I/O

Driver-specific — very high signal for this role.

What does volatile do, and when is it required?

It tells the compiler the object can change outside the program's visible control flow, so it must not cache the value in a register, elide a read/write, or reorder accesses relative to other volatile accesses. Commonly used for: memory-mapped hardware registers, a simple (atomically-accessed) variable touched by an ISR in embedded C, a volatile sig_atomic_t flag set by a signal handler, and setjmp/longjmp-visible locals.

What they're listening for: the follow-up they love — volatile is NOT a memory barrier. It gives no atomicity, no ordering against non-volatile accesses, and no cross-CPU visibility. Saying “I'd use volatile to make it thread-safe” is a red flag; cross-core sharing needs _Atomic / barriers. volatile is for the single-core MMIO/ISR case.
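For the MMIO case, a minimal sketch of register accessors, assuming 32-bit registers at hypothetical offsets REG_CTRL and REG_STATUS on an imaginary device. Production drivers normally use the platform's own accessors (for example readl/writel in Linux), which also supply the barriers that volatile alone does not.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register offsets for an imaginary device. */
#define REG_CTRL   0x00u
#define REG_STATUS 0x04u

/* Each call performs exactly one 32-bit load; the compiler may not
 * cache, merge, or drop it. off must be 4-byte aligned. */
uint32_t reg_read32(const volatile void *base, size_t off) {
    const volatile uint8_t *p = (const volatile uint8_t *)base + off;
    return *(const volatile uint32_t *)p;
}

/* Each call performs exactly one 32-bit store. */
void reg_write32(volatile void *base, size_t off, uint32_t val) {
    volatile uint8_t *p = (volatile uint8_t *)base + off;
    *(volatile uint32_t *)p = val;
}
```

A polling loop such as while (!(reg_read32(base, REG_STATUS) & 1u)) { } is the classic case where omitting volatile lets the compiler hoist the load out of the loop and spin forever.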

4 · Bit manipulation

Quick whiteboard — maps onto descriptor flags & register fields.

Set / clear / toggle / test bit n. And: is x a power of two?

x |= 1u << n; (set) · x &= ~(1u << n); (clear) · x ^= 1u << n; (toggle) · (x >> n) & 1u (test).

Power of two (for an unsigned x): x != 0 && (x & (x - 1)) == 0. The expression x & (x - 1) clears the lowest set bit; if the result is zero there was exactly one bit set. State it for unsigned so negatives / INT_MIN can't bite you.

What they're listening for: use an unsigned mask of the right width, e.g. UINT32_C(1) << n with n < 32 — shifting a signed int into the sign bit, or by ≥ the type width, is UB.
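Wrapped as helpers with a fixed-width mask, the same operations could be sketched as follows. The function names are illustrative, and every helper assumes the caller guarantees n < 32.

```c
#include <stdbool.h>
#include <stdint.h>

/* All helpers require n < 32; shifting by the type width or more is UB. */
uint32_t bit_set(uint32_t x, unsigned n)    { return x |  (UINT32_C(1) << n); }
uint32_t bit_clear(uint32_t x, unsigned n)  { return x & ~(UINT32_C(1) << n); }
uint32_t bit_toggle(uint32_t x, unsigned n) { return x ^  (UINT32_C(1) << n); }
bool     bit_test(uint32_t x, unsigned n)   { return ((x >> n) & UINT32_C(1)) != 0; }

/* Exactly one bit set; zero is rejected explicitly. */
bool is_pow2(uint32_t x) { return x != 0 && (x & (x - 1)) == 0; }
```

Because the mask is built from UINT32_C(1), setting bit 31 is well defined, which it would not be with a plain signed 1.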

Count the set bits in a word.

Kernighan — loops once per set bit, not per bit:

for (c = 0; x; c++) x &= x - 1;

In production: __builtin_popcount (compiles to a single POPCNT instruction where available).

5 · Pointers & declarations

Dispatch tables and intrusive lists are everywhere in drivers.

Read these declarations: void (*fp)(int) vs void *fp(int). And int *p[10] vs int (*p)[10].

void (*fp)(int) — a pointer to a function taking int, returning void. void *fp(int) — a function taking int, returning void *. The parentheses bind the * to the name.

int *p[10] — an array of 10 pointers to int. int (*p)[10] — a pointer to an array of 10 ints. Read inside-out, right-to-left.

Delete a node from a singly linked list without special-casing the head.

Walk a pointer-to-pointer — it points at the link you might have to rewrite, so the head is just another link:

// Remove every node with a given value — no head special case.
void remove_val(node **pp, int v) {
    while (*pp) {
        node *e = *pp;
        if (e->val == v) { *pp = e->next; free(e); }
        else             { pp = &e->next; }
    }
}

What they're listening for: the node ** idiom. It's the elegant answer they hope to see; the clumsy version tracks a separate prev and branches on the head.

6 · Undefined behavior & gotchas

Separates experienced from not. Usually “what's wrong with this code?”

Name the classic 'what's the bug?' UB cases.

Returning a pointer to a stack local — dangling on return.
Signed integer overflow is UB — use unsigned, or check before adding.
Reading an uninitialized variable.
sizeof(arr) on an array parameter (not UB, but a classic gotcha): it decayed to a pointer, so you get the pointer size, not the array size.
Off-by-one: writing N+1 bytes into an N buffer.
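For the signed-overflow item, "check before adding" can be made concrete. This is a minimal sketch with a hypothetical helper named add_checked; GCC and Clang also provide __builtin_add_overflow for the same job.

```c
#include <limits.h>
#include <stdbool.h>

/* Stores a + b in *out and returns true, or returns false (leaving *out
 * untouched) if the sum would overflow int. The comparisons are arranged
 * so that they can never overflow themselves. */
bool add_checked(int a, int b, int *out) {
    if (b > 0 && a > INT_MAX - b) return false;   /* would exceed INT_MAX */
    if (b < 0 && a < INT_MIN - b) return false;   /* would go below INT_MIN */
    *out = a + b;
    return true;
}
```

The point to make out loud is that testing a + b < a after the fact is too late: the overflow has already happened and the compiler is entitled to optimize the test away.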

Operator precedence: *p++ vs (*p)++, and what does a & b == c parse as?

*p++ is *(p++) — dereference, then advance the pointer. (*p)++ increments the pointed-to value.

a & b == c parses as a & (b == c) because == binds tighter than bitwise &. A classic bug — always parenthesize bit ops.

7 · Concurrency & atomics

Datapath relevance — this is a low-latency multi-core team.

Prioritize this cluster. A candidate who interviewed for driver roles reported that “just about everyone asked about atomics.” For a NIC/driver team, C11 atomics, memory ordering, cache coherency, and lock-free queues are core, not bonus material. If you're short on time, over-invest here and on the SPSC ring buffer.

Why does a shared counter need a lock or an atomic?

count++ is a read-modify-write, not a single atomic step. Two threads can both read the old value, both add one, and both store, so one increment is lost. Fix with a mutex or atomic_fetch_add. Bonus insight: an atomic RMW is still comparatively expensive (a locked operation that needs exclusive ownership of the cache line), which is why the SPSC ring avoids it entirely: give each index a single writer and you never need an RMW on the hot path.
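The atomic fix is short in C11. The sketch below assumes a hypothetical statistics counter named rx_packets; relaxed ordering is used on the assumption that the counter is only a tally and publishes no other data.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared statistics counter, zero-initialized. */
static atomic_uint_fast64_t rx_packets;

/* Safe to call from any number of threads: the read-modify-write is
 * one indivisible step, so no increment can be lost. */
void rx_count_inc(void) {
    atomic_fetch_add_explicit(&rx_packets, 1, memory_order_relaxed);
}

/* Returns a snapshot of the counter. */
uint64_t rx_count_read(void) {
    return atomic_load_explicit(&rx_packets, memory_order_relaxed);
}
```

If the counter's value were used to signal that other data is ready, relaxed would no longer be enough and a release/acquire pair would be needed.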

What is false sharing, and why does this team care?

Two independent variables that happen to land on the same 64-byte cache line. When two cores each write their own variable, the cache-coherence protocol bounces the whole line between them, serializing work that looks parallel. Fix: pad / align to a cache line (alignas(64)).

What they're listening for: that you connect it to throughput on a hot path — exactly why the SPSC queue puts head and tail on separate cache lines.
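A minimal sketch of the padding fix, assuming 64-byte cache lines (typical on x86-64, but worth confirming for the target CPU). The struct name spsc_indices is illustrative.

```c
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>

#define CACHE_LINE 64   /* assumption: 64-byte cache lines */

/* Each index starts on its own cache line, so the core that writes
 * head never invalidates the line holding tail, and vice versa. */
struct spsc_indices {
    alignas(CACHE_LINE) atomic_size_t head;   /* written by the consumer */
    alignas(CACHE_LINE) atomic_size_t tail;   /* written by the producer */
};
```

The cost is memory: the struct grows from two words to two full cache lines, which is usually a good trade on a hot path and a poor one for data that is rarely written.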

Explain memory_order_relaxed / acquire / release / seq_cst.

relaxed — atomicity only, no ordering with other memory ops (fine for a counter only one thread writes). acquire (on a load) — nothing after it can be reordered before it; you see everything the releasing thread wrote before its release. release (on a store) — nothing before it can be reordered after it; it publishes prior writes. seq_cst — the default: a single global total order, correct but the most expensive.

The SPSC queue uses release to publish the index and acquire to observe it — a release/acquire pair is what creates the happens-before edge that makes the payload visible.

What they're listening for: that you say “pair” — a lone release or a lone acquire orders nothing; they only work together.
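The pairing can be shown with the smallest possible message-passing example. In this sketch the names payload, ready, publish and try_consume are hypothetical; one thread is assumed to call publish and another try_consume.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;         /* plain, non-atomic data */
static atomic_bool ready;   /* the flag that publishes it */

/* Producer: write the data, then publish with a release store.
 * The write to payload cannot be reordered after the store. */
void publish(int value) {
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Consumer: an acquire load that observes true also makes the
 * producer's earlier write to payload visible. */
bool try_consume(int *out) {
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

Downgrading either side to relaxed removes the happens-before edge, and the read of payload becomes a data race.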

8 · OS & memory fundamentals

Reported AMD screens: virtual memory, allocators, process vs thread.

Virtual vs physical addresses, page tables, and the TLB?

Each process sees a private virtual address space; the MMU translates virtual → physical using page tables (multi-level, walked on a miss). The TLB caches recent translations so the walk is skipped on a hit. Relevance here: a NIC does DMA to physical addresses (or through an IOMMU), so a driver must pin/map pages and hand the device the right DMA address — CPU virtual pointers are meaningless to the hardware. Mention DMA coherency too: on a non-coherent system the driver needs cache clean/invalidate around DMA, or a coherent (uncached) mapping.

What they're listening for: that you bridge to DMA/IOMMU — that's the driver-flavoured version of the textbook answer.
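To make the multi-level walk concrete, here is a sketch of how a virtual address splits into table indices. It assumes x86-64 with 4 KiB pages and 4-level paging (a 48-bit address split 9 + 9 + 9 + 9 + 12) and uses the Linux-style level names PGD, PUD, PMD and PTE; the struct and function names are illustrative.

```c
#include <stdint.h>

/* Assumption: x86-64, 4 KiB pages, 4-level paging. */
struct va_parts {
    unsigned pgd, pud, pmd, pte;   /* 9-bit table indices, 0..511 */
    unsigned offset;               /* 12-bit byte offset within the page */
};

/* Split a virtual address into the index used at each level of the walk. */
struct va_parts split_va(uint64_t va) {
    struct va_parts p;
    p.pgd    = (unsigned)((va >> 39) & 0x1FF);
    p.pud    = (unsigned)((va >> 30) & 0x1FF);
    p.pmd    = (unsigned)((va >> 21) & 0x1FF);
    p.pte    = (unsigned)((va >> 12) & 0x1FF);
    p.offset = (unsigned)(va & 0xFFF);
    return p;
}
```

Each index selects one of 512 entries in a 4 KiB table, so a miss means four dependent memory reads before the final access; that dependency chain is what the TLB exists to avoid.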

[Interactive figure: a 4-level page-table walk (PGD → PUD → PMD → PTE) beside a 3-entry LRU TLB with hit/miss counters.]

TLB: a hit translates in ~1 cycle; a miss costs a full 4-level walk (~100 cycles). Re-accessing a cached page is a hit; a new page evicts the LRU entry.

Implement a simple memory allocator with free-block coalescing.

Keep a free list of blocks, each with a header (size + free flag, often a footer too for backward merging). alloc finds a fit (first/best-fit), splits if the remainder is large enough. free marks the block free and coalesces with the physically-adjacent previous/next block if they're also free — that's what fights fragmentation. Mention alignment of returned pointers and the size/flag packed in the header's low bits.

What they're listening for: the word coalescing and how you find the neighbour (boundary tags / footers). This was a reported AMD new-grad phone-screen task.
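A compact sketch of such an allocator over a static arena follows. Several simplifications are assumptions of this sketch, not the expected interview answer: sizes are counted in header-sized units (K&R style), the free flag is a separate field rather than packed into low bits, and coalescing is done with a linear pass over the arena instead of boundary tags. The names block, my_alloc and my_free are hypothetical.

```c
#include <stddef.h>

#define ARENA_UNITS 128u

/* One header-sized unit. The union forces max alignment, so every
 * payload (which starts one unit after its header) is aligned too. */
typedef union block {
    struct { size_t units; int used; } h;   /* units includes the header */
    max_align_t align;
} block;

static block arena[ARENA_UNITS];
static int arena_ready;

void *my_alloc(size_t nbytes) {
    if (!arena_ready) {                      /* first call: one big free block */
        arena[0].h.units = ARENA_UNITS;
        arena[0].h.used = 0;
        arena_ready = 1;
    }
    if (nbytes == 0 || nbytes > (ARENA_UNITS - 1) * sizeof(block))
        return NULL;
    /* Payload rounded up to whole units, plus one unit for the header. */
    size_t need = (nbytes + sizeof(block) - 1) / sizeof(block) + 1;
    for (size_t i = 0; i < ARENA_UNITS; i += arena[i].h.units) {   /* first fit */
        if (arena[i].h.used || arena[i].h.units < need)
            continue;
        if (arena[i].h.units >= need + 2) {  /* split only if the rest is usable */
            arena[i + need].h.units = arena[i].h.units - need;
            arena[i + need].h.used = 0;
            arena[i].h.units = need;
        }
        arena[i].h.used = 1;
        return &arena[i + 1];
    }
    return NULL;
}

void my_free(void *ptr) {
    if (ptr == NULL)
        return;
    ((block *)ptr - 1)->h.used = 0;
    /* Coalesce: merge every free block with a free physical successor. */
    for (size_t i = 0; i < ARENA_UNITS; ) {
        size_t next = i + arena[i].h.units;
        if (!arena[i].h.used && next < ARENA_UNITS && !arena[next].h.used)
            arena[i].h.units += arena[next].h.units;   /* absorb, re-check i */
        else
            i = next;
    }
}
```

The linear pass makes free O(number of blocks). Footers (boundary tags) let free reach the previous block in O(1), which is the refinement worth describing after the basic version works.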

What's the difference between a process and a thread?

A process has its own virtual address space; a thread is a schedulable execution context that shares its process's address space with sibling threads. Threads share code/heap/globals and have their own stack and registers: cheaper to create and switch, but they need synchronization because they share memory. (Reported verbatim at IMC; standard at AMD too.)

9 · “Implement this in C”

A 15–20 min live exercise. The ring buffer is the most likely.

Implement a single-producer / single-consumer ring buffer.

This is the headline exercise — it has its own walkthrough on the NIC datapath page. The four points to say out loud: power-of-two capacity → mask not modulo, free-running indices make full vs empty unambiguous, release/acquire publishes the payload before the index, and cache-line padding avoids false sharing.

What they're listening for: that you know where the memory barrier goes (between writing the data and publishing the index) and why volatile wouldn't cut it.
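As a quick reference for those four points, here is a minimal sketch of an SPSC ring. It is not the walkthrough version: the capacity, the uint32_t element type and the names spsc_ring, ring_push and ring_pop are assumptions made for illustration, and 64-byte cache lines are assumed.

```c
#include <stdalign.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_CAP  8u                 /* must be a power of two */
#define RING_MASK (RING_CAP - 1u)

struct spsc_ring {
    alignas(64) atomic_size_t head;  /* next slot to read; consumer writes it */
    alignas(64) atomic_size_t tail;  /* next slot to write; producer writes it */
    alignas(64) uint32_t slots[RING_CAP];
};

/* Producer side only. Returns false if the ring is full. */
bool ring_push(struct spsc_ring *r, uint32_t v) {
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail - head == RING_CAP)     /* free-running indices: difference = fill */
        return false;
    r->slots[tail & RING_MASK] = v;                 /* 1. write the payload */
    atomic_store_explicit(&r->tail, tail + 1,       /* 2. then publish the index */
                          memory_order_release);
    return true;
}

/* Consumer side only. Returns false if the ring is empty. */
bool ring_pop(struct spsc_ring *r, uint32_t *out) {
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head == tail)
        return false;
    *out = r->slots[head & RING_MASK];              /* read before releasing the slot */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}
```

Each side loads its own index relaxed because it is the only writer of that index, and loads the other side's index with acquire to pair with that side's release store. There is no atomic read-modify-write anywhere on the path.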

Implement strlen (and how does the real one go faster?).

size_t my_strlen(const char *s) {
    const char *p = s;
    while (*p) p++;
    return (size_t)(p - s);
}
// Real libc reads a word at a time and tests for a zero byte with a
// bit trick: (w - 0x0101...) & ~w & 0x8080...  — worth mentioning.

What they're listening for: that you mention the word-at-a-time optimization — it shows you think about how the standard library actually achieves its speed.
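The zero-byte test at the heart of the word-at-a-time approach can be sketched on its own. This shows only the bit trick for a 64-bit word, under the hypothetical name has_zero_byte; a real implementation also has to align the pointer first and then locate which byte was zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true exactly when at least one of the eight bytes of w is 0x00.
 * Subtracting 0x01 from every byte sets the top bit of a byte that was
 * zero; masking with ~w discards bytes whose top bit was already set. */
bool has_zero_byte(uint64_t w) {
    return ((w - UINT64_C(0x0101010101010101)) & ~w &
            UINT64_C(0x8080808080808080)) != 0;
}
```

The win is one subtract, one NOT and two ANDs per eight bytes instead of eight compare-and-branch steps.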

Implement memcpy. What's the difference from memmove?

A byte loop is the baseline; the real one copies a word at a time once pointers are aligned. memcpy's src/dst are restrict — they must not overlap. If they can overlap, that's memmove, which copies backwards when dst > src to avoid clobbering.
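A byte-wise sketch of the overlap-safe version, under the hypothetical name my_memmove. The pointers are compared as uintptr_t because a relational comparison of pointers into different objects is not defined in ISO C.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-wise memmove: correct even when the regions overlap. */
void *my_memmove(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    if ((uintptr_t)d < (uintptr_t)s) {
        for (size_t i = 0; i < n; i++)     /* dst below src: copy forward */
            d[i] = s[i];
    } else if ((uintptr_t)d > (uintptr_t)s) {
        for (size_t i = n; i-- > 0; )      /* dst above src: copy backward */
            d[i] = s[i];
    }
    return dst;                            /* dst == src: nothing to do */
}
```

Dropping the backward branch gives a plain memcpy, which is exactly where an overlapping call would start clobbering source bytes before they are read.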

Reverse a singly linked list.

The bread-and-butter pointer exercise (reported at Xilinx). Three pointers, one pass, no extra memory — flip each next as you go:

// Reverse a singly linked list in place — three pointers, O(1) space.
node *reverse(node *head) {
    node *prev = NULL;
    while (head) {
        node *next = head->next;  // save
        head->next = prev;        // flip the link
        prev = head;              // advance prev
        head = next;              // advance head
    }
    return prev;                  // new head
}

What they're listening for: that you do it iteratively in O(1) space and can name the recursive variant's O(n) stack cost.

How would you build a flow hash table for packet lookup?

Hash the 5-tuple (src/dst IP, src/dst port, protocol). For a datapath, open addressing with linear probing is cache-friendly and avoids a malloc per insert; chaining is simpler but pointer-chases. Size to a power of two, keep load factor moderate, and have a story for collisions and deletion (tombstones).
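One way this could be sketched is shown below: a fixed-size table with linear probing and tombstones. The table size, the value field, the choice of FNV-1a as the hash and all identifiers are assumptions for illustration; a production datapath would more likely use a hardware-assisted or stronger hash and handle resizing.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 64u                    /* power of two: index = hash & mask */
#define TABLE_MASK (TABLE_SIZE - 1u)

struct flow_key {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  proto;
};

enum { SLOT_EMPTY = 0, SLOT_USED, SLOT_TOMBSTONE };

struct flow_slot {
    struct flow_key key;
    uint32_t        value;                /* hypothetical per-flow data */
    uint8_t         state;
};

static bool key_eq(const struct flow_key *a, const struct flow_key *b) {
    return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
           a->src_port == b->src_port && a->dst_port == b->dst_port &&
           a->proto == b->proto;
}

/* FNV-1a, fed field by field so struct padding is never hashed. */
static uint32_t fnv1a_u32(uint32_t h, uint32_t v) {
    for (int i = 0; i < 4; i++) { h ^= (v >> (8 * i)) & 0xFFu; h *= 16777619u; }
    return h;
}

static uint32_t flow_hash(const struct flow_key *k) {
    uint32_t h = 2166136261u;
    h = fnv1a_u32(h, k->src_ip);
    h = fnv1a_u32(h, k->dst_ip);
    h = fnv1a_u32(h, ((uint32_t)k->src_port << 16) | k->dst_port);
    return fnv1a_u32(h, k->proto);
}

/* Probe until the key is found or an EMPTY slot ends the run.
 * Tombstones are skipped, not treated as the end. */
struct flow_slot *flow_find(struct flow_slot *t, const struct flow_key *k) {
    uint32_t i = flow_hash(k) & TABLE_MASK;
    for (uint32_t n = 0; n < TABLE_SIZE; n++, i = (i + 1) & TABLE_MASK) {
        if (t[i].state == SLOT_EMPTY) return NULL;
        if (t[i].state == SLOT_USED && key_eq(&t[i].key, k)) return &t[i];
    }
    return NULL;
}

/* Insert or update. Returns false only if the table is full. */
bool flow_insert(struct flow_slot *t, const struct flow_key *k, uint32_t value) {
    struct flow_slot *s = flow_find(t, k);
    if (s == NULL) {
        uint32_t i = flow_hash(k) & TABLE_MASK;
        for (uint32_t n = 0; n < TABLE_SIZE && s == NULL; n++, i = (i + 1) & TABLE_MASK)
            if (t[i].state != SLOT_USED) s = &t[i];   /* reuse empty or tombstone */
        if (s == NULL) return false;
        s->key = *k;
        s->state = SLOT_USED;
    }
    s->value = value;
    return true;
}

/* Delete by marking a tombstone so later entries in the run stay reachable. */
bool flow_remove(struct flow_slot *t, const struct flow_key *k) {
    struct flow_slot *s = flow_find(t, k);
    if (s == NULL) return false;
    s->state = SLOT_TOMBSTONE;
    return true;
}
```

The table is assumed to start zeroed (all slots SLOT_EMPTY). Tombstones accumulate under churn and lengthen probe runs, so a real table needs periodic rebuilds or a deletion scheme such as backward shifting.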

10 · Rapid-fire

Short, high-frequency. Have one-liners ready.

malloc/free: leak, double-free, use-after-free, stack vs heap?

Leak — you drop the last pointer to a block without freeing it. Double-free — freeing the same block twice corrupts the allocator. Use-after-free — dereferencing a freed pointer (set it to NULL after free). Stack = automatic storage, freed on scope exit; heap = manual lifetime via malloc/free.

What are the two meanings of static?

At file scope: internal linkage — the symbol is not visible to other translation units. On a local variable: static storage duration — it persists across calls and is initialized once.

What's wrong with #define SQ(x) x*x ?

SQ(a + b) expands to a + b * a + b — wrong. Parenthesize everything: #define SQ(x) ((x) * (x)). Also beware double evaluation of side effects (SQ(i++)), and wrap multi-statement macros in do { ... } while (0).
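The fixes can be put side by side in a short sketch. SWAP_INT and sq_int are hypothetical names added here to illustrate the do { ... } while (0) wrapper and the function alternative.

```c
/* Fully parenthesized: each use of the argument and the whole expansion. */
#define SQ(x) ((x) * (x))

/* Multi-statement macro wrapped so it acts as a single statement,
 * even as the body of an unbraced if/else. */
#define SWAP_INT(a, b) do { int tmp_ = (a); (a) = (b); (b) = tmp_; } while (0)

/* A function evaluates its argument once, so sq_int(i++) is safe
 * where SQ(i++) is not. */
static inline int sq_int(int x) { return x * x; }
```

With these definitions SQ(1 + 2) evaluates to 9, whereas the unparenthesized version would give 1 + 2 * 1 + 2 = 5.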

const char *p vs char * const p ?

const char *p — pointer to const char: you can't write *p, but you can repoint p. char * const p — const pointer: you can write *p, but can't repoint p. Read right-to-left.
