The CUDA error while loading the dataset is likely caused by a conflict between tqdm and PyTorch's pinned-memory data loading. If you set the `pin_memory=True` flag on the dataset, each tensor is placed in page-locked host memory so that it can be copied to the GPU faster. This can sometimes cause a CUDA error if the GPU does not have enough memory to handle the data transfer.
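For context, `pin_memory` is most commonly passed to PyTorch's `DataLoader` rather than set on the dataset itself. The minimal sketch below, assuming a hypothetical synthetic `TensorDataset` of 1000 ten-value samples, shows where the flag usually lives:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical dataset: 1000 samples with ten values each.
dataset = TensorDataset(torch.randn(1000, 10))

# pin_memory=True asks the loader to place each batch in page-locked
# host memory, which speeds up host-to-GPU copies but uses more RAM.
loader = DataLoader(dataset, batch_size=32, pin_memory=True)
```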
To fix this error, you can try the following:
Disable the `pin_memory=True` flag when loading the dataset, as shown below. This may reduce data-transfer performance, but it should prevent the CUDA errors.
```python
import tqdm
import torch

def load_dataset(pin_memory=False):
    # Generate a synthetic dataset of 1000 samples on the fly.
    for i in range(1000):
        data = torch.randn(10)  # one sample with ten random values
        if pin_memory:
            # Move the tensor into page-locked host memory.
            data = data.pin_memory()
        yield data

if __name__ == "__main__":
    # Iterate with a tqdm progress bar; pin_memory is disabled here.
    for i, data in enumerate(tqdm.tqdm(load_dataset(pin_memory=False))):
        pass
```
The code defines a `load_dataset` function that generates a synthetic dataset containing 1000 samples, each with ten random values, using PyTorch's `torch.randn()` function. It uses the `yield` keyword to create a generator, enabling memory-efficient data loading. The main block iterates through the dataset using `tqdm.tqdm()` to display a progress bar while loading the data.
Update to a newer version of tqdm.
```bash
pip install --upgrade tqdm
```
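To confirm the upgrade took effect, you can print the installed version string; a quick sketch:

```python
import tqdm

# Print the installed tqdm version to verify the upgrade.
print(tqdm.__version__)
```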
If you are loading the dataset in batches, try reducing the batch size to lower the memory requirements during loading. This can help resolve CUDA memory errors.
```python
import tqdm
import torch

def load_dataset(batch_size=1, pin_memory=False):
    # Generate the 1000-sample synthetic dataset, batch_size samples at a time.
    for i in range(0, 1000, batch_size):
        batch_data = []
        for j in range(batch_size):
            data = torch.randn(10)
            if pin_memory:
                data = data.pin_memory()
            batch_data.append(data)
        # Stack the samples into a single (batch_size, 10) tensor.
        yield torch.stack(batch_data)

if __name__ == "__main__":
    batch_size = 32
    for i, data in enumerate(tqdm.tqdm(load_dataset(batch_size=batch_size, pin_memory=False))):
        pass
```
Here, in the main block, we set the `batch_size` variable to 32, which is passed to the `load_dataset` function. Each iteration of the `for` loop will hence process a batch of 32 entries from the dataset.
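As a side note, the same batched loading can be expressed with PyTorch's built-in `DataLoader`, where the batch size is a single constructor argument that you can lower as needed. A minimal sketch, assuming the same synthetic 1000-sample data:

```python
import torch
import tqdm
from torch.utils.data import DataLoader, TensorDataset

# Same synthetic data: 1000 samples with ten random values each.
dataset = TensorDataset(torch.randn(1000, 10))

# Lowering batch_size here reduces the memory needed per iteration.
loader = DataLoader(dataset, batch_size=32, pin_memory=False)

for (batch,) in tqdm.tqdm(loader):
    pass  # each batch has shape (32, 10), except possibly the last one
```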