Batch processing in Node.js is a technique for handling large amounts of data efficiently by processing it in groups, or batches, rather than one item at a time or all at once. It often involves managing concurrency and asynchronous tasks. Batch processing is useful for data transformation, file processing, database updates, and similar workloads. Because only one batch needs to be held and worked on at a time, it can improve throughput and reduce the risk of memory exhaustion.
Let’s outline the typical workflow of Node.js batch processing:
We gather the data that needs to be processed in batches. This could be an array, a file, data from a database, etc.
We create functions that will handle the processing of a single item and a batch of items.
We determine the optimal batch size based on our system's capabilities and the nature of the data. We calculate the number of batches needed.
We iterate through each batch, extract the data, and process it using the defined functions.
We implement error-handling mechanisms and logging to ensure smooth execution and aid in debugging.
We conclude the batch processing with a completion message.
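The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the helper names `splitIntoBatches` and `processInBatches` and the sample data are hypothetical, and the error handling simply records failures so that one bad item does not stop the run.

```javascript
// Hypothetical helper: split an array into consecutive batches of at most `batchSize` items.
function splitIntoBatches(items, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer');
  }
  const batches = [];
  for (let start = 0; start < items.length; start += batchSize) {
    batches.push(items.slice(start, start + batchSize));
  }
  return batches;
}

// Hypothetical driver: process every item batch by batch, recording failures
// instead of letting one bad item abort the whole run.
function processInBatches(items, batchSize, processItem) {
  const results = [];
  const failures = [];
  const batches = splitIntoBatches(items, batchSize);

  batches.forEach((batch, batchIndex) => {
    for (const item of batch) {
      try {
        results.push(processItem(item));
      } catch (error) {
        failures.push({ item, message: error.message });
        console.error(`Batch ${batchIndex + 1}: failed to process item: ${error.message}`);
      }
    }
    console.log(`Batch ${batchIndex + 1} of ${batches.length} processed.`);
  });

  // Completion message summarizing the run.
  console.log(`Done: ${results.length} succeeded, ${failures.length} failed.`);
  return { results, failures };
}

// Usage with made-up sample data: double each number and reject anything else.
const summary = processInBatches([1, 2, 'oops', 4, 5], 2, (value) => {
  if (typeof value !== 'number') throw new TypeError(`not a number: ${value}`);
  return value * 2;
});
```

Returning both the results and the failures lets the caller decide whether to retry the failed items, which is one possible design among several.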
There are multiple methods to perform Node.js batch processing. These include the following:
Synchronous approach: We process each batch sequentially, one after the other, without parallelization or asynchronous operations.
Promises: We can use promises to handle asynchronous operations, allowing for more readable and organized code when tasks involve asynchronous elements like network requests.
Streams: Node.js Streams can be leveraged for more efficient processing of large datasets, reducing memory consumption.
Parallel batch processing: We can also process several items or batches at the same time to further improve performance, for example by running asynchronous tasks concurrently or by distributing CPU-heavy work across worker threads.
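As a rough sketch of the promise-based approach listed above, the batches below run one after another while the items inside each batch run concurrently. The names `updatePriceAsync` and `processBatchesConcurrently` and the sample orders are hypothetical, and the timer only stands in for a real network or database call. Note that this is concurrency on a single thread, not CPU parallelism.

```javascript
// Hypothetical async task standing in for a network or database call.
function updatePriceAsync(order) {
  return new Promise((resolve, reject) => {
    setTimeout(() => {
      if (typeof order.price !== 'number') {
        reject(new TypeError(`Order ${order.orderId} has no numeric price`));
        return;
      }
      resolve({ ...order, price: parseFloat((order.price * 1.1).toFixed(1)) });
    }, 10);
  });
}

// Batches run sequentially; the items inside a batch run concurrently.
// Promise.allSettled waits for every task, so one rejection does not discard the rest.
async function processBatchesConcurrently(items, batchSize, task) {
  const fulfilled = [];
  const rejected = [];
  for (let start = 0; start < items.length; start += batchSize) {
    const batch = items.slice(start, start + batchSize);
    // The async wrapper turns a synchronous throw inside `task` into a rejection.
    const outcomes = await Promise.allSettled(batch.map(async (item) => task(item)));
    for (const outcome of outcomes) {
      if (outcome.status === 'fulfilled') {
        fulfilled.push(outcome.value);
      } else {
        rejected.push(outcome.reason);
      }
    }
  }
  return { fulfilled, rejected };
}

// Usage with made-up orders; order 3 is deliberately invalid.
const sampleOrders = [
  { orderId: 1, price: 15 },
  { orderId: 2, price: 25 },
  { orderId: 3, price: 'n/a' },
  { orderId: 4, price: 30 },
];

processBatchesConcurrently(sampleOrders, 2, updatePriceAsync).then(({ fulfilled, rejected }) => {
  console.log(`${fulfilled.length} orders updated, ${rejected.length} failed.`);
});
```

Here the batch size doubles as a concurrency limit: at most `batchSize` tasks are in flight at once, which helps avoid overwhelming a downstream service.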
Let’s consider an example of processing customer orders in an e-commerce system. We’ll assume that we have a dataset of customer orders and need to update the order prices based on a new pricing strategy. The example below uses the simple synchronous approach, relying only on basic loops and function calls.
const ordersData = [
  { orderId: 1, product: 'Phone', quantity: 2, price: 15 },
  { orderId: 2, product: 'Tablet', quantity: 1, price: 25 },
  { orderId: 3, product: 'Macbook', quantity: 3, price: 30 },
  { orderId: 4, product: 'Laptop', quantity: 1, price: 20 },
  { orderId: 5, product: 'Microphone', quantity: 4, price: 18 },
  { orderId: 6, product: 'Earphones', quantity: 2, price: 35 },
  { orderId: 7, product: 'Jacket', quantity: 1, price: 22 },
  { orderId: 8, product: 'Shirt', quantity: 2, price: 28 },
  { orderId: 9, product: 'Shorts', quantity: 3, price: 12 },
  { orderId: 10, product: 'Pants', quantity: 2, price: 40 },
  { orderId: 11, product: 'Gazebo', quantity: 1, price: 25 },
  { orderId: 12, product: 'Dressing Table', quantity: 3, price: 20 },
  { orderId: 13, product: 'Charger', quantity: 1, price: 32 },
  { orderId: 14, product: 'Television', quantity: 2, price: 14 },
  { orderId: 15, product: 'Sweater', quantity: 4, price: 26 },
];
const batchSize = 5;

function updateOrderPrice(order) {
  const newPrice = parseFloat((order.price * 1.1).toFixed(1));
  return { ...order, price: newPrice };
}

function processABatch(batch, processingFunction) {
  for (const order of batch) {
    const updatedOrder = processingFunction(order);
    console.log(`Order ${updatedOrder.orderId} - Updated Price: $${updatedOrder.price}`);
  }
}

const numOfBatches = Math.ceil(ordersData.length / batchSize);

for (let batchIndex = 0; batchIndex < numOfBatches; batchIndex++) {
  const start = batchIndex * batchSize;
  const end = Math.min(start + batchSize, ordersData.length);
  const batch = ordersData.slice(start, end);

  // Process the current batch of orders.
  processABatch(batch, updateOrderPrice);

  console.log(`Batch ${batchIndex + 1} processed.`);
  console.log("");
}

console.log('Batch processing of order prices is complete.');
Lines 1–18: We define the ordersData array, which holds the customer orders to be processed, and the batch size (batchSize).
Lines 20–23: We define the updateOrderPrice function to process a single order by increasing the order price by 10%, rounded to 1 decimal place.
Lines 25–30: We define the processABatch function to process each batch of orders. It iterates through each order in the batch and calls the processingFunction function, which in this case is the updateOrderPrice function, for each order.
Line 32: We compute the number of batches based on the data and the batch size that was defined earlier.
Lines 34–44: We loop over each batch, and for each iteration, we extract the batch of data using the slice method.
Line 40: We call the processABatch function to process the current batch of data.
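To make the batch arithmetic described above concrete, here is a small sketch that lists the slice boundaries the loop would use. The helper name `batchRanges` is hypothetical; it only repeats the same Math.ceil and Math.min calculations for a given item count and batch size.

```javascript
// Hypothetical helper: list the [start, end) boundaries for each batch.
function batchRanges(totalItems, batchSize) {
  const numOfBatches = Math.ceil(totalItems / batchSize);
  const ranges = [];
  for (let batchIndex = 0; batchIndex < numOfBatches; batchIndex++) {
    const start = batchIndex * batchSize;
    // Math.min keeps the final batch from running past the end of the data.
    const end = Math.min(start + batchSize, totalItems);
    ranges.push({ batch: batchIndex + 1, start, end, size: end - start });
  }
  return ranges;
}

console.log(batchRanges(15, 5)); // three full batches: [0, 5), [5, 10), [10, 15)
console.log(batchRanges(12, 5)); // the last batch is shorter: [10, 12)
```

With 15 orders and a batch size of 5, every batch is full. When the item count is not a multiple of the batch size, Math.ceil adds one extra, shorter batch for the remainder.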
In real-life scenarios, we might use batch processing for more complex tasks, such as processing large CSV files, updating database records, or transforming data. By customizing the code to a specific use case, we can make the best use of Node.js for batch processing, which can save time, reduce memory usage, and improve efficiency when working with large amounts of data.
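For the CSV case mentioned above, one possible approach is to read the file as a stream and group its lines into batches, so the whole file never has to sit in memory. The sketch below uses the built-in fs and readline modules. The function name `processFileInBatches` and the file name in the usage comment are hypothetical, and real CSV data with quoted fields or a header row would need a proper CSV parser.

```javascript
const fs = require('node:fs');
const readline = require('node:readline');

// Hypothetical helper: read a text file line by line and pass the lines to
// `handleBatch` in groups of at most `batchSize`. Returns the number of batches.
async function processFileInBatches(filePath, batchSize, handleBatch) {
  const lines = readline.createInterface({
    input: fs.createReadStream(filePath, { encoding: 'utf8' }),
    crlfDelay: Infinity, // treat \r\n as a single line break
  });

  let batch = [];
  let batchCount = 0;
  for await (const line of lines) {
    if (line.trim() === '') continue; // skip blank lines
    batch.push(line);
    if (batch.length === batchSize) {
      batchCount += 1;
      await handleBatch(batch, batchCount);
      batch = [];
    }
  }

  // Flush the final, possibly shorter batch.
  if (batch.length > 0) {
    batchCount += 1;
    await handleBatch(batch, batchCount);
  }
  return batchCount;
}

// Example usage, assuming a file named orders.csv exists (hypothetical):
// processFileInBatches('orders.csv', 1000, async (batch, batchNumber) => {
//   console.log(`Batch ${batchNumber}: ${batch.length} lines`);
// }).then((count) => console.log(`Processed ${count} batches.`));
```

Only one batch of lines is kept in memory at a time, which is the main reason to prefer this pattern over loading the file in a single read.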