The CUDA error while loading the dataset is likely caused by a conflict between pinned-memory allocation and the state of your GPU.
If you set the pin_memory=True flag, PyTorch allocates page-locked (pinned) host memory through the CUDA runtime so that data can be transferred to the GPU faster. This allocation can fail with a CUDA error if the GPU does not have the resources to handle the data transfer or the CUDA runtime is in a bad state.
To fix this error, you can try the following:
Disable the pin_memory flag. This will slightly reduce host-to-GPU transfer performance, but it should prevent the CUDA errors.
import tqdm
import torch

def load_dataset(pin_memory=False):
    # Generate 1000 synthetic samples, each with 10 random values
    for i in range(1000):
        data = torch.randn(10)
        if pin_memory:
            # Pin the tensor in page-locked host memory (requires a working CUDA setup)
            data = data.pin_memory()
        yield data

if __name__ == "__main__":
    # Iterate with a progress bar; pinning is disabled here
    for i, data in enumerate(tqdm.tqdm(load_dataset(pin_memory=False))):
        pass

The code defines a load_dataset function that generates a synthetic dataset of 1000 samples, each with ten random values produced by PyTorch's torch.randn() function. It uses the yield keyword to create a generator, enabling memory-efficient data loading. The main block iterates through the dataset with tqdm.tqdm() to display a progress bar while loading the data.
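In real projects, pin_memory is usually set on a torch.utils.data.DataLoader rather than on individual tensors. Below is a minimal sketch of the same fix using a DataLoader, assuming a stand-in TensorDataset of random values in place of your actual data:

import torch
import tqdm
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset: 1000 samples with 10 features each
dataset = TensorDataset(torch.randn(1000, 10))

# pin_memory=False skips page-locked memory allocation entirely;
# switch it back to True once the underlying GPU issue is resolved
loader = DataLoader(dataset, batch_size=32, pin_memory=False)

for (batch,) in tqdm.tqdm(loader):
    pass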
Update to a newer version of tqdm.
pip install --upgrade tqdm
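After upgrading, you can confirm which version is installed from Python:

import tqdm
print(tqdm.__version__)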
If you are loading the dataset in batches, try reducing the batch size to lower the memory requirements during loading. This can help resolve CUDA out-of-memory errors.
import tqdm
import torch

def load_dataset(batch_size=1, pin_memory=False):
    # Generate 1000 synthetic samples, grouped into batches of batch_size
    for i in range(0, 1000, batch_size):
        batch_data = []
        for j in range(batch_size):
            data = torch.randn(10)
            if pin_memory:
                # Pin the tensor in page-locked host memory (requires a working CUDA setup)
                data = data.pin_memory()
            batch_data.append(data)
        # Stack the individual samples into one batch tensor
        yield torch.stack(batch_data)

if __name__ == "__main__":
    batch_size = 32
    for i, data in enumerate(tqdm.tqdm(load_dataset(batch_size=batch_size, pin_memory=False))):
        pass
Here, we set the batch_size variable to 32 and pass it to the load_dataset function, so each iteration of the loop processes a batch of 32 entries from the dataset. If you still hit memory errors, lower this value (for example, to 8 or 16).
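If you are unsure how small the batch needs to be, you can probe for a workable size by halving it whenever a CUDA out-of-memory error occurs. The sketch below assumes a hypothetical run_one_batch(batch_size) helper that loads and processes a single batch of your data:

import torch

def find_workable_batch_size(start=64):
    # Halve the batch size until run_one_batch stops raising CUDA OOM errors.
    # run_one_batch is a hypothetical stand-in for your own loading/step code.
    batch_size = start
    while batch_size >= 1:
        try:
            run_one_batch(batch_size)
            return batch_size
        except RuntimeError as e:
            if "out of memory" not in str(e):
                raise  # re-raise unrelated errors
            torch.cuda.empty_cache()  # release cached blocks before retrying
            batch_size //= 2
    raise RuntimeError("Even batch_size=1 runs out of memory")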