Batch Processing Large Datasets in Node.js Without Running Out of Memory

If you've ever tried handling massive datasets in Node.js, you know how quickly memory issues can become a nightmare. One minute, your script is running smoothly; the next, it's crashing with an out-of-memory (OOM) error. I recently faced this issue when extracting millions of log entries from OpenSearch and uploading them to S3, and I want to share what I learned.

Let’s walk through some practical ways to process large datasets without bringing your server to its knees.

The Challenge: Why Large Data Processing Can Be a Problem

So, what causes these memory issues in the first place? When I first attempted to fetch and process 3 million logs, I was running everything in memory, thinking, "How bad could it be?" Turns out, pretty bad.

Common Pitfalls:

  • Holding too much data at once – If you fetch millions of records in one go, you're asking for trouble.
  • Inefficient batch processing – If you don’t clean up memory between batches, things get messy fast.
  • Not using streams – Handling everything as large arrays instead of streaming the data keeps memory consumption unnecessarily high.

The Fix: Smarter Ways to Process Large Data

1. Process Data in Small Chunks

Rather than fetching everything at once, grab smaller chunks and process them incrementally. This way, you never hold more data in memory than necessary.

async function fetchAndProcessLogs() {
    let hasMoreData = true;
    let nextToken = null;

    while (hasMoreData) {
        // Fetch one page of logs at a time, keyed by the pagination token
        const { logs, newNextToken } = await fetchLogsFromOpenSearch(nextToken);
        await processAndUpload(logs);

        // Move on to the next page; stop once there is no token left
        nextToken = newNextToken;
        hasMoreData = !!nextToken;
    }
}

This method ensures that we’re only working with manageable amounts of data at a time.
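
The fetchLogsFromOpenSearch helper is left abstract above. For reference, here is a rough sketch of one way it could be implemented, assuming the official @opensearch-project/opensearch client and search_after pagination; the endpoint, index pattern, batch size, and sort field are placeholders rather than values from my actual setup.

const { Client } = require('@opensearch-project/opensearch');

const client = new Client({ node: 'https://your-opensearch-endpoint:9200' });

async function fetchLogsFromOpenSearch(nextToken, batchSize = 5000) {
    const response = await client.search({
        index: 'logs-*', // placeholder index pattern
        body: {
            size: batchSize,
            // search_after needs a deterministic sort; add a unique tiebreaker field in practice
            sort: [{ '@timestamp': 'asc' }],
            ...(nextToken ? { search_after: nextToken } : {}),
        },
    });

    const hits = response.body.hits.hits;
    const logs = hits.map(hit => hit._source);

    // The sort values of the last hit act as the cursor for the next page
    const newNextToken = hits.length === batchSize ? hits[hits.length - 1].sort : null;

    return { logs, newNextToken };
}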

2. Use Streams Instead of Keeping Data in Memory

Streams allow you to process data as it arrives, rather than waiting for everything to load. When compressing logs before uploading to S3, streams + zlib are your best friends:

const { createGzip } = require('zlib');
const { PassThrough } = require('stream');
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

async function uploadToS3(logs) {
    // Pipe raw JSON lines through gzip; S3 reads from the gzip stream as it fills
    const passThrough = new PassThrough();
    const gzipStream = createGzip();
    passThrough.pipe(gzipStream);

    const uploadPromise = s3.upload({
        Bucket: 'your-bucket-name',
        Key: `logs-${Date.now()}.gz`,
        Body: gzipStream,
        ContentEncoding: 'gzip',
        ContentType: 'application/json',
    }).promise();

    for (const log of logs) {
        if (!passThrough.write(JSON.stringify(log) + '\n')) {
            // Respect backpressure: wait for 'drain' before writing more
            await new Promise(resolve => passThrough.once('drain', resolve));
        }
    }

    passThrough.end();

    // Wait for the gzipped stream to finish flushing to S3
    await uploadPromise;
}

With this approach, logs are streamed directly to S3, avoiding unnecessary memory overhead.
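
The snippet above uses the AWS SDK v2 (aws-sdk). If you're on the v3 SDK, the same streaming pattern should work with the Upload helper from @aws-sdk/lib-storage; the sketch below is illustrative, with the same placeholder bucket name:

const { S3Client } = require('@aws-sdk/client-s3');
const { Upload } = require('@aws-sdk/lib-storage');

async function uploadStreamToS3v3(gzipStream) {
    const upload = new Upload({
        client: new S3Client({}),
        params: {
            Bucket: 'your-bucket-name',
            Key: `logs-${Date.now()}.gz`,
            Body: gzipStream, // same gzip stream as in the v2 example
            ContentEncoding: 'gzip',
            ContentType: 'application/json',
        },
    });

    // Streams the body to S3 as a multipart upload and resolves when it completes
    await upload.done();
}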

3. Allow Garbage Collection to Do Its Job

Garbage collection in Node.js works best when we don’t hold onto unnecessary references. Here’s how you can help it along:

  • Yield back to the event loop between batches, e.g. with setImmediate() or setTimeout(fn, 0), so pending callbacks and garbage collection get a chance to run.
  • Make sure large arrays or objects are dereferenced once they're processed, so nothing keeps a finished batch alive.

async function processAndUpload(logs) {
    await uploadToS3(logs);
    // Optionally nudge GC between batches (global.gc is only defined with --expose-gc)
    global.gc && global.gc();
}
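
Keep in mind that global.gc is only defined when the process is launched with V8's --expose-gc flag:

node --expose-gc yourScript.js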

4. Increase Node.js Memory Allocation (If Needed)

If you’re working with extremely large datasets and still hitting memory issues, you can raise V8's old-space heap limit (the value is in megabytes, so 8192 means roughly 8 GB):

node --max-old-space-size=8192 yourScript.js

That said, fixing memory management should be your first priority before resorting to increasing memory limits.
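
A quick way to check whether the fixes above are actually working is to log heap usage between batches with Node's built-in process.memoryUsage(); if the numbers stay roughly flat from batch to batch, memory management is under control. The helper below is just an illustrative sketch:

function logHeapUsage(label) {
    const { heapUsed, heapTotal } = process.memoryUsage();
    const toMB = bytes => (bytes / 1024 / 1024).toFixed(1);
    console.log(`${label}: ${toMB(heapUsed)} MB used of ${toMB(heapTotal)} MB allocated heap`);
}

// Example: call it inside the batch loop, e.g. logHeapUsage('after batch');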

Final Thoughts

Handling large datasets in Node.js doesn't have to be a struggle. By batch processing, using streams, and optimizing memory usage, you can process millions of records smoothly without running out of RAM.

Have you faced similar challenges? Let’s discuss in the comments!

