Showing posts with the label cuda

Cuda Matrix Multiplication Non Square

For example, multiplying a 1024x1024 matrix by a 1024x1024 matrix takes 4 times less time than multiplying a 1024x1024 matrix by a 1024x1023 matrix s…

Cuda Matrix Multiplication Optimize

Matrix multiplication in CUDA: this is a toy program for learning CUDA, and some functions are reusable for other purposes. …

Cuda Matrix Multiplication Size

Int ty threadIdx. To calculate the (i, j)-th element of C, we need to multiply the i-th row of A with the j-th column of B (Fig. 1). 5kk7…
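
The row-times-column rule in the excerpt above can be sketched as a naive kernel in which each thread computes one element of C. This is a minimal sketch and not the code from the linked post: the names `matmul_naive` and `matmul_gpu`, the row-major layout, and the 16x16 block size are all assumptions made for illustration. The bounds check lets the dimensions be non-square and not a multiple of the block size. Error checking on the CUDA calls is left out to keep it short.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: one thread per element of C = A * B.
// A is M x K, B is K x N, C is M x N, all stored row-major (assumed layout).
__global__ void matmul_naive(const float* A, const float* B, float* C,
                             int M, int K, int N) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;  // index i in C
    int col = blockIdx.x * blockDim.x + threadIdx.x;  // index j in C
    if (row >= M || col >= N) return;  // threads outside C do nothing

    // Dot product of row i of A with column j of B.
    float acc = 0.0f;
    for (int k = 0; k < K; ++k) {
        acc += A[row * K + k] * B[k * N + col];
    }
    C[row * N + col] = acc;
}

// Hypothetical host helper: copies inputs to the device, launches the
// kernel, and copies the result back. Return codes are not checked here.
std::vector<float> matmul_gpu(const std::vector<float>& A,
                              const std::vector<float>& B,
                              int M, int K, int N) {
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, sizeof(float) * M * K);
    cudaMalloc(&dB, sizeof(float) * K * N);
    cudaMalloc(&dC, sizeof(float) * M * N);
    cudaMemcpy(dA, A.data(), sizeof(float) * M * K, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), sizeof(float) * K * N, cudaMemcpyHostToDevice);

    // Round the grid up so partially filled edge blocks are covered.
    dim3 block(16, 16);
    dim3 grid((N + block.x - 1) / block.x, (M + block.y - 1) / block.y);
    matmul_naive<<<grid, block>>>(dA, dB, dC, M, K, N);

    std::vector<float> C(static_cast<size_t>(M) * N);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(C.data(), dC, sizeof(float) * M * N, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return C;
}
```

As a usage example, calling `matmul_gpu` with A = {1, 2, 3, 4, 5, 6} as a 2x3 matrix and B = {7, 8, 9, 10, 11, 12} as a 3x2 matrix (M = 2, K = 3, N = 2) should give C = {58, 64, 139, 154}. The kernel reads every operand from global memory, so it is a baseline for correctness rather than a tuned implementation.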