How NodeJS Made Me a Masochist: Building a Real-Time Web App in C++ (Part 2)

Or: How I Discovered Why Nginx Doesn't Use 10,000 Threads and Nearly Had a Mental Breakdown

The Great Architectural Revelation

When we left off in Part 1, I had built what I thought was a pretty solid multi-threaded TCP server. It handled multiple connections, had proper cleanup, and even graceful shutdown. I was feeling pretty good about myself until I ran some basic load tests and watched my beautiful creation crumble like a house of cards in a hurricane.

The problem wasn't bugs in my code - it was the fundamental architecture. My thread-per-connection model hit a wall around 200 concurrent connections, and it wasn't even close to graceful degradation. The server didn't slow down - it just started rejecting connections entirely. Memory usage was through the roof, and CPU was spending more time switching between threads than actually doing work.

That's when I learned about the C10K problem - the challenge of handling 10,000 concurrent connections on a single server. This isn't some theoretical computer science puzzle; it's a real limitation that shaped how modern servers work. My innocent little chat server had run headfirst into the same scalability wall that forced the entire industry to rethink network programming.

The solution? Completely abandon the thread-per-connection model and embrace something that sounded terrifying: event-driven programming.

Understanding the Event Loop: The Heart of Modern Servers

If you've worked with Node.js, you've probably heard the phrase "event loop" thrown around like it's some mystical concept. I certainly treated it that way - I knew it was important, but I had no idea what it actually meant or why it mattered.

Here's the fundamental insight that changed everything: instead of dedicating a thread to each connection, you can monitor all connections simultaneously and only do work when something interesting happens. It's like the difference between hiring a personal assistant for each of your friends versus having one receptionist who answers all the phones and routes calls appropriately.

The magic happens through something called I/O multiplexing - operating system facilities that let you monitor hundreds or thousands of file descriptors with a single system call. On Linux, this is epoll. On macOS and BSD systems, it's kqueue. These aren't just performance optimizations - they're fundamentally different approaches to handling concurrent I/O.
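
To make the idea concrete, here's the whole trick in raw epoll terms - a minimal sketch (error handling omitted, and not the abstraction I actually ended up building):

#include <sys/epoll.h>

void watch_and_wait(int listen_fd) {
    int ep = epoll_create1(0);                     // One kernel object...
    epoll_event ev{};
    ev.events = EPOLLIN;
    ev.data.fd = listen_fd;
    epoll_ctl(ep, EPOLL_CTL_ADD, listen_fd, &ev);  // ...can watch any number of descriptors

    epoll_event ready[64];
    int n = epoll_wait(ep, ready, 64, -1);         // One blocking call returns only the fds with activity
    for (int i = 0; i < n; ++i) {
        // ready[i].data.fd is a descriptor that actually has something for us to do
    }
}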

Think about it this way: in my old threaded model, each connection was like a person standing in their own line, waiting for their turn to be served. Most of the time, they're just standing there doing nothing. In the event-driven model, everyone gets a number and sits down. When their number is called (when data arrives), they get served immediately by the next available worker.

Building the Event Notifier: Cross-Platform I/O Multiplexing

The first challenge was abstracting away the differences between epoll and kqueue. These systems do the same job but have completely different APIs. I needed a clean abstraction that would work on both Linux and macOS without littering my code with platform-specific conditionals.

class EventNotifier {
public:
    EventNotifier();
    ~EventNotifier();

    bool add_fd(int fd, bool listen_for_read = true);
    bool remove_fd(int fd);
    std::vector<EventData> wait_for_events(int timeout_ms = 1000);

private:
#ifdef USE_EPOLL
    int epoll_fd;
#elif defined(USE_KQUEUE)
    int kqueue_fd;
#endif
};

The beauty of this abstraction is that the rest of my code doesn't need to know or care which underlying mechanism is being used. The interface is clean and consistent across platforms.
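
One note on the interface: wait_for_events returns a std::vector<EventData>, and a minimal version of that struct only needs to say which descriptor woke up and why - something like this (field names here are illustrative, not necessarily what the repo uses):

struct EventData {
    int fd = -1;            // The file descriptor that became ready
    bool readable = false;  // Data is waiting to be read
    bool writable = false;  // The socket can accept more outgoing data
    bool error = false;     // The kernel reported a hang-up or error condition
};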

But implementing this abstraction taught me just how different these systems really are. Epoll uses a simple approach - you add file descriptors to an interest set, then ask for events that occurred. Kqueue is more event-centric - you register for specific types of events and get notified when they happen.

Here's what the epoll implementation looks like:

bool EventNotifier::add_fd(int fd, bool listen_for_read) {
    epoll_event event{};
    event.events = EPOLLET;                                // Edge-triggered notifications
    event.events |= listen_for_read ? EPOLLIN : EPOLLOUT;  // Watch for read or write readiness
    event.data.fd = fd;                                    // Store the file descriptor in the event

    return epoll_ctl(epoll_fd, EPOLL_CTL_ADD, fd, &event) != -1;
}

The EPOLLET flag is crucial - it enables edge-triggered notifications. Instead of being notified continuously while data is available, you're only notified when the state changes from "no data" to "data available". This is more efficient but requires careful programming: you must keep reading until the socket reports there's nothing left (EAGAIN/EWOULDBLOCK), or the remaining data sits in the kernel buffer with no new notification coming.
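
In practice, "read it all" means looping over recv on the non-blocking socket until the kernel says it's empty - a sketch of the pattern:

std::string data;
char buf[4096];
while (true) {
    ssize_t n = recv(fd, buf, sizeof(buf), 0);
    if (n > 0) {
        data.append(buf, n);            // Keep whatever arrived this round
    } else if (n == 0) {
        break;                          // Peer closed the connection
    } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
        break;                          // Fully drained - safe to wait for the next edge
    } else {
        break;                          // Genuine error - treat the connection as dead
    }
}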

The kqueue version of add_fd is structured differently but accomplishes the same goal:

bool EventNotifier::add_fd(int fd, bool listen_for_read) {
    struct kevent event;
    const short filter = listen_for_read ? EVFILT_READ : EVFILT_WRITE;  // Mirror the read/write choice
    EV_SET(&event, fd, filter, EV_ADD | EV_ENABLE, 0, 0, nullptr);
    return kevent(kqueue_fd, &event, 1, nullptr, 0, nullptr) != -1;
}

Kqueue uses a different mental model where you're registering interest in specific filter types (like EVFILT_READ) rather than setting flags on file descriptors. Same result, completely different API.
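
And here's roughly what wait_for_events looks like on the epoll side - a sketch that assumes the EventData fields from earlier, with the kqueue version following the same shape using kevent and its filters:

std::vector<EventData> EventNotifier::wait_for_events(int timeout_ms) {
    std::vector<EventData> results;

    epoll_event events[64];
    int n = epoll_wait(epoll_fd, events, 64, timeout_ms);   // Blocks until activity or timeout

    for (int i = 0; i < n; ++i) {
        EventData ev;
        ev.fd       = events[i].data.fd;
        ev.readable = events[i].events & EPOLLIN;
        ev.writable = events[i].events & EPOLLOUT;
        ev.error    = events[i].events & (EPOLLERR | EPOLLHUP);
        results.push_back(ev);
    }
    return results;
}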

The Event Loop: Where Everything Comes Together

With the event notifier abstraction in place, I could build the actual event loop. This is the beating heart of the entire server - a single thread that monitors all connections and dispatches work when something happens.

void EventLoop::run() {
    std::cout << "🚀 Event loop started!" << std::endl;

    while (!should_stop_) {
        // Wait for events with a 1-second timeout
        auto events = notifier_->wait_for_events(1000);

        for (const auto& event : events) {
            handle_event(event);
        }
    }
}

This looks deceptively simple, but there's a lot happening in those few lines. The wait_for_events call is where the magic happens - it's a blocking system call that efficiently waits for activity on any of the monitored file descriptors. When something interesting happens (data arrives, connection closes, error occurs), the call returns immediately with a list of events to process.

The timeout parameter is important for responsiveness. Without it, the event loop would block indefinitely if no events occurred, making it impossible to check for shutdown signals or perform periodic maintenance tasks.
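
That periodic wake-up also gives the loop a natural place to do housekeeping, like sweeping connections that have gone quiet. A sketch, assuming the per-connection last_activity timestamp and the connections_ map you'll meet in the next section:

void EventLoop::close_idle_connections(std::chrono::seconds max_idle) {
    auto now = std::chrono::steady_clock::now();
    std::lock_guard<std::mutex> lock(connections_mutex_);

    for (auto it = connections_.begin(); it != connections_.end();) {
        if (now - it->second->last_activity > max_idle) {
            notifier_->remove_fd(it->second->socket_fd);   // Stop monitoring the socket
            close(it->second->socket_fd);                  // Release the descriptor
            it = connections_.erase(it);                   // Drop the connection state
        } else {
            ++it;
        }
    }
}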

Connection Management: The Shared State Challenge

One of the trickiest aspects of the event-driven model is connection management. In the threaded model, each connection's state lived in the thread's local variables. In the event-driven model, connection state needs to be shared between the event loop and the worker threads that actually process requests.

This led me to create a ConnectionState structure that tracks everything about each connection:

struct ConnectionState {
    int socket_fd;                    // The actual socket
    std::string client_ip;            // For logging and debugging
    uint16_t client_port;
    Protocol protocol = Protocol::HTTP;  // HTTP or WebSocket
    std::string http_buffer;          // Accumulates partial requests
    bool http_headers_complete = false;
    bool websocket_handshake_complete = false;
    std::chrono::steady_clock::time_point last_activity;  // For timeout detection

    ConnectionState(int fd, const std::string& ip, uint16_t port)
        : socket_fd(fd), client_ip(ip), client_port(port),
          last_activity(std::chrono::steady_clock::now()) {}
};

The http_buffer field is particularly important - it accumulates partial HTTP requests as data arrives. Network data doesn't always arrive in convenient chunks, so you might receive half a request header in one packet and the rest in another. The buffer lets you reconstruct complete requests regardless of how the data is fragmented.
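
When data arrives, the event loop appends it and checks whether a complete header block has shown up yet - roughly this pattern (a sketch of the idea, not the exact repo code):

// After draining the socket into conn->http_buffer:
if (!conn->http_headers_complete) {
    auto header_end = conn->http_buffer.find("\r\n\r\n");
    if (header_end != std::string::npos) {
        conn->http_headers_complete = true;
        std::string raw_headers = conn->http_buffer.substr(0, header_end + 4);
        // Hand raw_headers off to the thread pool as an HttpRequestTask
        // (or a WebSocketHandshakeTask if the headers ask for an upgrade)
    }
    // Otherwise: do nothing and wait - the rest of the request hasn't arrived yet
}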

Managing these connection objects safely across threads required careful use of shared pointers and mutexes:

// Thread-safe connection storage
std::map<int, std::shared_ptr<ConnectionState>> connections_;
std::mutex connections_mutex_;

// When handling events, grab a connection reference safely
std::shared_ptr<ConnectionState> conn;
{
    std::lock_guard<std::mutex> lock(connections_mutex_);
    auto it = connections_.find(fd);
    if (it == connections_.end()) {
        return;  // Connection might have been closed by another thread
    }
    conn = it->second;
}
// Now we can safely use 'conn' outside the lock

The pattern here is crucial: hold the lock only long enough to get a shared reference to the connection, then release it immediately. This prevents the connection object from being deleted while we're using it, without holding the lock during potentially expensive operations.

HTTP Request Parsing: The Devil in the Details

With the event infrastructure in place, I could finally tackle HTTP request parsing. This seemed straightforward at first - just look for \r\n\r\n to find the end of headers, right?

Wrong. So very wrong.

HTTP parsing is full of edge cases that will make you question your life choices. Headers could historically span multiple lines through a (now-deprecated) mechanism called "folding". The Content-Length header determines how much body data to expect, but it might be missing or invalid. Clients might send requests incrementally over several packets, or they might pipeline multiple requests in a single packet.

Here's how I handle the basic request line parsing:

void HttpRequestTask::execute(int worker_id) {
    CORE::Request req;

    std::istringstream stream(raw_headers_);
    std::string line;

    // Parse the request line: "GET /path HTTP/1.1"
    std::getline(stream, line);
    if (!line.empty() && line.back() == '\r')
        line.pop_back();               // Strip the trailing \r so it doesn't end up in req.version
    std::istringstream reqline(line);
    reqline >> req.method >> req.path >> req.version;

    // Parse headers line by line until the blank line that terminates them
    while (std::getline(stream, line)) {
        // Remove trailing \r if present
        if (!line.empty() && line.back() == '\r')
            line.pop_back();
        if (line.empty())
            break;                     // End of headers

        auto colon = line.find(':');
        if (colon != std::string::npos) {
            std::string key = line.substr(0, colon);
            std::string value = line.substr(colon + 1);

            // Trim leading whitespace from value
            if (!value.empty() && value.front() == ' ')
                value.erase(0, 1);

            req.headers[key] = value;
        }
    }

    // Route the request to the appropriate handler
    CORE::Response res;
    if (!router_.route(req, res)) {
        // Return 404 if no route matches
        res.status_code = 404;
        res.status_text = "Not Found";
        res.headers["Content-Type"] = "text/html";
        res.body = "<h1>404 Not Found</h1>";
    }

    // Send response and close connection
    std::string raw_response = res.to_string();
    send(conn_->socket_fd, raw_response.data(), raw_response.size(), 0);
    close(conn_->socket_fd);
}

The parsing logic handles several subtle details: removing carriage returns, trimming whitespace from header values, and gracefully handling malformed headers. Each of these details represents a potential source of bugs or security vulnerabilities in a real server.

The WebSocket Handshake: Protocol Negotiation

Supporting WebSockets required implementing the upgrade handshake - the process where an HTTP connection transforms into a WebSocket connection. This involves cryptographic hashing and careful header manipulation.

The WebSocket handshake works like this: the client sends a special HTTP request with specific headers indicating they want to upgrade to WebSocket. The server responds with a computed hash that proves it understands the WebSocket protocol. If the handshake succeeds, both sides switch to WebSocket frame format for all subsequent communication.
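
For reference, the client side of that exchange looks roughly like this (the key shown is the illustrative one from RFC 6455; real clients generate a fresh random value per connection):

GET /chat HTTP/1.1
Host: localhost:8080
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13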

void WebSocketHandshakeTask::execute(int worker_id) {
    std::cout << "[Worker " << worker_id << "] Processing WebSocket handshake for fd " 
              << conn_->socket_fd << std::endl;

    // Extract the Sec-WebSocket-Key header from the request
    std::string websocket_key = extract_websocket_key(raw_headers_);

    if (websocket_key.empty()) {
        // Invalid handshake - respond with HTTP 400
        send_bad_request(conn_->socket_fd);
        close(conn_->socket_fd);
        return;
    }

    // Compute the Sec-WebSocket-Accept header
    // This involves SHA-1 hashing with a magic string
    std::string accept_key = compute_websocket_accept(websocket_key);

    // Send the upgrade response
    std::string response = 
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        "Sec-WebSocket-Accept: " + accept_key + "\r\n"
        "\r\n";

    send(conn_->socket_fd, response.data(), response.size(), 0);

    // Mark this connection as successfully upgraded
    conn_->protocol = Protocol::WEBSOCKET;
    conn_->websocket_handshake_complete = true;
}

The compute_websocket_accept function implements the WebSocket specification's required hashing:

std::string compute_websocket_accept(const std::string& websocket_key) {
    // The magic string is defined in the WebSocket RFC
    const std::string magic = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

    // Concatenate key + magic string
    std::string combined = websocket_key + magic;

    // Compute SHA-1 hash
    unsigned char hash[SHA_DIGEST_LENGTH];
    SHA1(reinterpret_cast<const unsigned char*>(combined.c_str()), 
         combined.length(), hash);

    // Base64 encode the result
    return base64_encode(hash, SHA_DIGEST_LENGTH);
}

This cryptographic dance ensures that both client and server understand the WebSocket protocol and prevents certain types of cross-protocol attacks.
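
A nice way to sanity-check the implementation is the worked example from RFC 6455, section 1.3, which pairs a known key with a known accept value - a one-line assert catches most hashing or encoding mistakes:

// Example key/accept pair from RFC 6455, section 1.3 (requires <cassert>)
assert(compute_websocket_accept("dGhlIHNhbXBsZSBub25jZQ==")
       == "s3pPLMBiTxaQ9kYGzzhZRbK+xOo=");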

Thread Pool Architecture: Separating I/O from CPU Work

The event loop handles I/O efficiently, but CPU-intensive work like HTTP parsing and response generation still needs to happen somewhere. That's where the thread pool comes in - a fixed number of worker threads that process tasks as they become available.

class ThreadPool {
public:
    ThreadPool(int num_workers = 4);
    ~ThreadPool();

    void enqueue_task(std::unique_ptr<Task> task);
    void shutdown();

private:
    void worker_function(int worker_id);

    std::queue<std::unique_ptr<Task>> task_queue_;
    std::vector<std::thread> workers_;
    std::mutex queue_mutex_;
    std::condition_variable queue_cv_;
    std::atomic<bool> should_stop_{false};
};

The worker threads spend most of their time sleeping, waiting for tasks to appear in the queue. When the event loop detects incoming data, it creates a task object and adds it to the queue. The condition variable wakes up a sleeping worker, which processes the task and then goes back to sleep.

void ThreadPool::worker_function(int worker_id) {
    while (!should_stop_) {
        std::unique_ptr<Task> task;

        // Wait for a task to become available
        {
            std::unique_lock<std::mutex> lock(queue_mutex_);
            queue_cv_.wait(lock, [this] {
                return !task_queue_.empty() || should_stop_;
            });

            if (should_stop_ && task_queue_.empty())
                break;

            task = std::move(task_queue_.front());
            task_queue_.pop();
        }

        // Process the task outside the lock
        if (task)
            task->execute(worker_id);
    }

    std::cout << "Worker [" << worker_id << "] stopping." << std::endl;
}
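
The producer side of the queue isn't shown above, but it's short - a sketch, using the members from the class declaration:

void ThreadPool::enqueue_task(std::unique_ptr<Task> task) {
    {
        std::lock_guard<std::mutex> lock(queue_mutex_);
        task_queue_.push(std::move(task));
    }
    queue_cv_.notify_one();   // Wake exactly one sleeping worker
}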

This architecture separates concerns beautifully: the event loop focuses on I/O multiplexing and connection management, while the thread pool handles CPU-intensive request processing. The number of threads remains constant regardless of the number of connections, which solves the scalability problem that killed my original design.

Performance Insights: Why This Architecture Works

The transformation from thread-per-connection to event-driven architecture isn't just an optimization - it's a fundamental shift in how the server uses system resources. Instead of creating expensive threads for every connection, the server uses a small, fixed number of threads that stay busy processing actual work.

Consider the resource usage differences:

  • Thread-per-connection: 1000 connections = 1000 threads = ~8GB of reserved stack space alone (at the typical 8MB default stack per thread)
  • Event-driven: 1000 connections = 1 event loop thread + 8 worker threads = ~72MB of reserved stack space

The CPU usage patterns are equally dramatic. In the threaded model, the operating system spends significant time switching between threads, most of which are idle. In the event-driven model, threads only wake up when there's actual work to do.

Memory allocation patterns also improve substantially. Instead of each thread having its own stack space that's mostly unused, the event-driven model allocates memory dynamically for connection state and task objects. This memory is released as soon as it's no longer needed.

Debugging and Observability: Learning to See Inside the System

Building a complex concurrent system taught me the importance of observability - the ability to understand what's happening inside a running system. Print statements aren't enough when you have multiple threads processing events asynchronously.

I added logging throughout the system to track connection lifecycle:

void EventLoop::handle_new_connections() {
    while (true) {
        sockaddr_in client_addr{};
        socklen_t client_len = sizeof(client_addr);
        int client_fd = accept(server_socket_, (sockaddr*)&client_addr, &client_len);

        if (client_fd == -1)
            break;

        make_socket_nonblocking(client_fd);
        notifier_->add_fd(client_fd);

        // Extract client information for logging
        char client_ip[INET_ADDRSTRLEN];
        inet_ntop(AF_INET, &client_addr.sin_addr, client_ip, INET_ADDRSTRLEN);
        uint16_t client_port = ntohs(client_addr.sin_port);

        auto conn = std::make_shared<ConnectionState>(client_fd, client_ip, client_port);

        {
            std::lock_guard<std::mutex> lock(connections_mutex_);
            connections_[client_fd] = conn;
        }

        std::cout << "New client: " << client_ip << ":" << client_port 
                  << " (fd: " << client_fd << ")" << std::endl;
    }
}

This logging proved invaluable during development. I could see exactly when connections were established, when data arrived, which worker threads processed which requests, and when connections were closed. Without this visibility, debugging race conditions and connection management issues would have been nearly impossible.
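
One helper glossed over above is make_socket_nonblocking, which is the usual fcntl two-step - a sketch, since the repo's version may differ slightly:

#include <fcntl.h>

bool make_socket_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);                    // Read the current descriptor flags
    if (flags == -1)
        return false;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1;  // Add O_NONBLOCK on top of them
}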

The Moment It All Clicked

After weeks of struggling with threading, I/O multiplexing, and protocol parsing, there was a magical moment when everything suddenly worked together. I could open multiple browser tabs, each making WebSocket connections to my server, and watch the event loop efficiently dispatch work to the thread pool while maintaining thousands of concurrent connections.

The performance difference was staggering. My original threaded server started rejecting connections around 200 concurrent clients. The event-driven version was comfortably handling 2000+ connections on my laptop with minimal CPU usage.

But more importantly, I finally understood how Node.js works under the hood. The event loop that I'd treated as mysterious black magic was just I/O multiplexing with a task queue. The "callback hell" that everyone complains about in JavaScript is just the natural consequence of event-driven programming - you can't block the event loop, so everything has to be asynchronous.

What I've Learned About System Design

Building this server from scratch taught me lessons that no tutorial or documentation could convey. The most important insight is that architecture matters more than implementation details. My original threaded implementation was bug-free and well-written, but it was fundamentally limited by its architecture. No amount of clever optimization could overcome the scalability wall inherent in the thread-per-connection model.

I also learned that abstractions have costs. Node.js makes concurrent programming feel easy by hiding the complexity of event loops and I/O multiplexing, but that simplicity comes at the cost of understanding. When something goes wrong in a Node.js application, debugging requires understanding the hidden complexity of the event loop.

The experience of building cross-platform I/O multiplexing taught me to appreciate the abstractions that most developers take for granted. The fact that socket.io works identically on Windows, Linux, and macOS represents thousands of hours of careful abstraction and testing.

Looking Forward: The Real-Time Features

With the server architecture solid, the next challenge is implementing the real-time features that motivated this entire project. This means WebSocket frame parsing, message broadcasting, and probably some kind of pub/sub system for managing different chat rooms or channels.

I also want to add HTTP/1.1 keep-alive support. Currently, I'm closing connections after each request, which is incredibly inefficient for modern web applications that make dozens of requests per page load. Supporting persistent connections will require rethinking the connection lifecycle and adding connection pooling.

There's also the question of static file serving. Real web applications need to serve HTML, CSS, and JavaScript files efficiently. This involves file system I/O, MIME type detection, and caching strategies - another rabbit hole of complexity disguised as a simple feature.

The Bigger Picture: Why This Matters

Six months ago, I was just another developer who knew how to npm install solutions to problems. Now I understand why those solutions exist and what problems they solve. This deep understanding changes how I approach system design, performance optimization, and debugging.

When I see a Node.js application struggling with high concurrency, I understand why - the event loop architecture has specific characteristics and limitations. When I read about nginx's performance, I understand the architectural decisions that enable it to handle millions of connections.

Most importantly, I've learned that the best way to understand a system is to build it yourself. Reading documentation and tutorials gives you surface-level knowledge. Building something from first principles gives you the deep understanding that only comes from confronting every edge case and design decision personally.

The journey from "just use Express.js" to building a high-performance event-driven server has been equal parts educational and frustrating. But every moment of frustration represented a gap in my understanding that I've now filled. That understanding is worth more than any tutorial or certification.

Coming Up in Part 3

In the next installment, I'll tackle the real-time features that started this whole journey. This means implementing WebSocket frame parsing, building a message broadcasting system, and probably adding some kind of authentication and room management.

I'll also dive into the performance testing and optimization phase - finding bottlenecks, implementing caching strategies, and seeing how close I can get to the performance of established servers like nginx or Node.js.

Spoiler alert: it's going to involve binary protocols, memory pools, and probably more questioning of my life choices. But at this point, I'm too deep in the rabbit hole to turn back.

The complete code for this project lives at mush1e/see-plus-plus. If you're following along or building something similar, I'd love to hear about your journey into the depths of systems programming. And if you're a recruiter wondering why anyone would choose to build HTTP servers from scratch when perfectly good ones already exist... well, that's a conversation worth having over coffee.

Still looking for internships/entry-level positions where I can channel this obsession with understanding how things work into something productive. Turns out there's something deeply satisfying about building systems from first principles, even when (especially when) it's completely unnecessary.


This content originally appeared on DEV Community and was authored by Mustafa Siddiqui

