Frequently Asked Questions About CUDA Programming


FREQUENTLY ASKED QUESTIONS (FAQ)

Q : CUDA 5.0: how to make the samples' .h files visible in Visual Studio 2010


compiler : exception.h no such file or directory
compiler : helper_string.h no such file or directory

A:
Because you are using Visual Studio 2010, you need to add the samples' include path to your project. Right-click
the project name and select Properties. Under Configuration Properties, select VC++
Directories. Append a ; to the end of Include Directories and then
add C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\common\inc\. The
common directory may also contain a lib folder, which you should add under Library
Directories.
Repeat this for each project that needs these headers. Alternatively, you can copy the headers to your
Visual Studio directory under VC\include.

Q : How to use printf in CUDA C?


A:
Call cudaDeviceReset(); or cudaDeviceSynchronize(); after launching the kernel, so that the
host waits for the kernel to finish and the device-side printf output is flushed.
Example:
#include <stdio.h>

__global__ void helloCUDA(float f)
{
    printf("Hello thread %d, f=%f\n", threadIdx.x, f);
}

int main()
{
    helloCUDA<<<1, 5>>>(1.2345f);
    cudaDeviceReset();
    return 0;
}

Q : General way of solving "Error: Stack around the variable 'x' was corrupted"
A:
A fairly small number of mistakes typically cause this problem:
Improper handling of memory:
    Deleting something twice.
    Using the wrong kind of deallocation (free for something allocated with new, etc.).
    Accessing something after its memory has been deleted.
Returning a pointer or reference to a local variable.
Reading or writing past the end of an array.

Q : Why is the checkCudaErrors function from header_cuda.h not detected, even though I have
added #include <header_cuda.h>?
A : Put #include <header_cuda.h> at the very bottom of the list of includes.

Q : What is an inline function in C code?

A : For more detail, see the Inline file in the Ref folder.
inline is generally regarded as a hint to the compiler to expand the function body at the call site if it can.
To summarise the options:

Declare everything static inline, and ensure that there are no undefined
functions and no functions that call undefined functions.

Declare everything inline for Studio and extern inline for gcc. Then provide a
global version of the function in a separate file.

The downside of inlining is that it can bloat your code size if the function is called
from many places.

We often write small functions that contain only a few simple instructions. Consider the
calling overhead incurred each time such a function is called.
When a normal function call is encountered, the program saves the address of the
instruction immediately following the call, copies the argument values, jumps to the memory
location of the called function, executes its code, stores the return value, and then
jumps back to the saved address. For a very small function, this overhead can outweigh
the work the function does.
The C++ inline function provides an alternative. With the inline keyword, the compiler
can replace the function call with the function code itself (a process called
expansion) and then compile the resulting code. With an inlined function, the program
does not have to jump to another location to execute the function and then jump back,
because the code of the called function is already present at the call site.
Pros:
1. It speeds up your program by avoiding function call overhead.
2. It saves the overhead of pushing and popping variables on the stack when a function is called.
3. It saves the overhead of returning from a function.
4. It increases locality of reference by making better use of the instruction cache.
5. Marking a function inline lets you put its definition in a header file (i.e. it can be included in
multiple compilation units without the linker complaining).
Cons:
1. It increases the executable size due to code expansion.
2. C++ inlining is resolved at compile time, which means that if you change the code of an inlined
function, you need to recompile all the code that uses it to make sure it is updated.
3. When used in a header, it makes the header file larger with information that users do not care about.
4. As mentioned above, it increases the executable size, which may cause thrashing in memory: more
page faults bring down your program's performance.
5. It is sometimes not useful, for example in embedded systems, where a large executable is not
acceptable because of memory constraints.
When to use:
A function can be made inline as the programmer sees fit. Some useful recommendations:
1. Use an inline function when performance is needed.
2. Prefer inline functions over macros.
3. Prefer to use the inline keyword outside the class, with the function definition, to hide
implementation details.

Q : Why does "Error 1 error LNK2005: already defined in" appear when I build my program?

A : For more detail, see the Linker Tools Error LNK2005 file in the Ref folder.
Declare the function or variable named in the error as static.
Example: if the following error appears,
Error 1 error LNK2005: "void __cdecl MergeSort(int *,int *,int)" (?MergeSort@@YAXPAH0H@Z) already defined in merge.cu.obj
F:\New\Eks\Visual Studio 2013\VISUAL C#\Cuda-C#\TestRNG\MergeSortGPU\merge_kernel.cu.obj MergeSortGPU
then add the static specifier to the MergeSort function. If that does not work, see the Linker Tools
Error LNK2005 file.

Note, based on advice from the file
c++ - error LNK2005, already defined- - Stack Overflow (2015-06-24 7-06-53 AM).htm
(http://stackoverflow.com/questions/10046485/error-lnk2005-already-defined):
extern is only useful on variable declarations, not on functions, because function declarations are extern by default.

Q : How to convert ListBox items to an array of integers in C#?

A : http://stackoverflow.com/questions/16713851/convert-listbox-items-to-arrayintegers-c-sharp
Add .ToString() to ratingListBox.Items[i].
It should be:
int[] ratingArray = new int[numberRatingsInt];
for (int i = 0; i < ratingListBox.Items.Count; i++)
{
    ratingArray[i] = Convert.ToInt32(ratingListBox.Items[i].ToString());
}

Just tested: using .Value after ratingListBox.Items[i] can also work:

int[] ratingArray = new int[numberRatingsInt];
for (int i = 0; i < ratingListBox.Items.Count; i++)
{
    ratingArray[i] = Convert.ToInt32(ratingListBox.Items[i].Value);
}

(This was tested and added in reference to @Chris's answer.)

Edit: use ratingListBox.Items.Count in the for loop condition.

Q : "member names cannot be the same as their enclosing type" in C#

A : A method with the same name as its class is a constructor, and constructors
do not have a return type.
Change either the class name or the method name.

Q : PInvokeStackImbalance: C# call to unmanaged C++ function
A : As mentioned in Dane Rose's comment, you can either use __stdcall on your C++
function or declare CallingConvention = CallingConvention.Cdecl on your DllImport.

Q : Use of cudaMalloc(). Why the double pointer?

A:
All CUDA API functions return an error code (or cudaSuccess if no error occurred). Any other
result is therefore returned through a parameter passed by reference. However, plain C has no references, which is
why you have to pass the address of the variable in which you want the result to be
stored. Since the result here is a pointer, you need to pass a double pointer.
Another well-known function that operates on addresses for the same reason is
the scanf function. How many times have you forgotten to write the & before the variable in which
you want to store the value? ;)
int i;
scanf("%d",&i);

It is needed because the function sets the pointer. As with every output parameter in C,
you need to pass a pointer to the actual variable that is set, rather than the value itself.

Q : What is the complete syntax of a CUDA kernel launch?

A : http://cuda-programming.blogspot.com.tr/2013/01/complete-syntax-of-cudakernels.html

GPU Kernel with Streams, Run time Kernel Launch

This article presents the complete syntax of a CUDA kernel launch.
We all love to learn and are always curious to know everything in detail.
I was very disappointed when I was not able to find the complete syntax of
CUDA kernels, so I spent a day searching everywhere. After
a heavy search I found the syntax, and today I am
presenting it to you, the reader.
A CUDA kernel launch takes four things inside the <<< >>> brackets.
The first argument is the grid size, followed by the block size, followed
by the size of shared memory, and finally the stream argument.
Here is the complete syntax:
Kernel_Name<<< GridSize, BlockSize, SMEMSize, Stream >>>(arguments,....);

Grid Size
The grid size is the number of blocks in a grid. On early
CUDA architectures (compute capability 1.x) the grid can only be
organized in two dimensions (X and Y). From compute capability 2.x
onwards the grid can be organized in three
dimensions (X, Y and Z).
Block Size
The block size is the number of threads in each block. A thread is the smallest unit of
execution in parallel programming, and so in CUDA.
Shared Memory (SMEMSize)
This is the size, in bytes, of the shared memory to be used by the kernel for
shared variable space. It is needed when the kernel uses dynamically sized
shared memory.

Streams
A stream is a sequence of operations that are performed in order on the
device.
Streams allow independent, concurrent, in-order queues of
execution. The stream argument tells CUDA in which stream the kernel will execute.
Operations in different streams can be interleaved and overlapped, which
can be used to hide data transfers between host and device.
Q : What is a stream in the CUDA API?
A : http://cuda-programming.blogspot.in/2013/01/cuda-streams-what-is-cudastreams.html

Stream
A stream is a sequence of operations that are performed in order on the
device.
Streams allow independent, concurrent, in-order queues of execution.
Operations in different streams can be interleaved and overlapped, which
can be used to hide data transfers between host and device.

---

Use cudaStreamCreate() (runtime API) to create a stream of type cudaStream_t, or
cuStreamCreate() (driver API) to create one of type CUstream.

The default stream (ID=0) need not be created.

Multiple streams can exist within a single context; they share memory and
other resources.
Copies and kernel launches with the same stream parameter execute
in order.

Function Prototype
Creates a new asynchronous stream.
cudaError_t cudaStreamCreate (cudaStream_t * pStream)
Parameters:
pStream - Pointer to new stream identifier
Returns:
cudaSuccess, cudaErrorInvalidValue
Note that this function may also return error codes from previous,
asynchronous launches.

Q : What are static variables and functions?

A:
Short answer: it depends.
1. Static local variables do not lose their value between function calls. In other
words, they behave like global variables but are scoped to the function they are defined in.

2. Static global variables are not visible outside of the C file they are defined in.

3. Static functions are not visible outside of the C file they are defined in.

In C++, static member functions are functions that do not require an instance of the class, and are
called the same way you access static member variables: with the class name rather than a
variable name (e.g. a_class::static_function(); rather than an_instance.function();). Static member
functions can only operate on static members, as they do not belong to a specific instance of a class.
Static member functions can be used to modify static member variables to keep track of their values.
For instance, you might use a static member function if you chose to use a counter to give each
instance of a class a unique id.

Q : Why do strange values show up in my C / C++ code?

A:
Arrays must be initialized before they are read; an uninitialized local array contains indeterminate values.

Example:

int angka[] = {0};

Q : What is memset and what does it do?

A : http://www.cplusplus.com/reference/cstring/memset/
memset (function, declared in <cstring>)
void * memset ( void * ptr, int value, size_t num );
Fill block of memory

Sets the first num bytes of the block of memory pointed to by ptr to the
specified value (interpreted as an unsigned char).

Parameters
ptr
Pointer to the block of memory to fill.
value
Value to be set. The value is passed as an int, but the function fills the block
of memory using the unsigned char conversion of this value.
num
Number of bytes to be set to the value.
size_t is an unsigned integral type.

Return Value
ptr is returned.

Example
/* memset example */
#include <stdio.h>
#include <string.h>

int main ()
{
    char str[] = "almost every programmer should know memset!";
    memset (str,'-',6);
    puts (str);
    return 0;
}

Output:
------ every programmer should know memset!

Q : How to calculate elapsed time in C?

A : http://www.gnu.org/software/libc/manual/html_node/CPU-Time.html
#include <stdio.h>
#include <time.h>

int main(void)
{
    clock_t start, end;
    double cpu_time_used;
    start = clock();
    /* Do the work. */
    end = clock();
    cpu_time_used = ((double) (end - start)) / CLOCKS_PER_SEC;
    printf("CPU time used: %f seconds\n", cpu_time_used);
    return 0;
}

Q : What do >> and << mean?

A : http://www.c4learn.com/c-programming/c-bitwise-right-shift/

C Bitwise Right Shift : (>>) Operator
1. It is denoted by >>.
2. The bit pattern of the data is shifted to the right by the specified number of
positions.
3. When the data is shifted right, the vacated leading bits are filled with zeros (for unsigned values).
4. The right shift operator is a binary operator (bi = two), that is, an operator that requires two operands.

Quick Overview of Right Shift Operator

Original number A             : 0000 0000 0011 1100
Right shift by 2              : 0000 0000 0000 1111
Leading 2 blanks              : replaced by 0
Direction of movement of data : right ========>>>>>>

Syntax :
[variable]>>[number of places]

Q : How to use nvprof (the NVIDIA profiler) from the Windows Command Prompt?

A : http://stackoverflow.com/questions/21472142/gpu-power-profiling-with-nvprof-andvisual-profiler
"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\bin\nvprof.exe" --print-gpu-trace --system-profiling on .\vectorAdd.exe
