CS 179: GPU Computing: Lecture 2: More Basics

Caltech GPU slides (cs179_2017_lec02)

Recap
Can use GPU to solve highly parallelizable problems
Straightforward extension to C++
◦ Separate CUDA code into .cu and .cuh files and compile with nvcc to create object files (.o files)
Looked at the a[] + b[] -> c[] example
Recap
If you forget everything else, just make sure you understand that CUDA code is simply an extension of the other code you write!
◦ Evident in the .cu/.cuh vs .cpp/.hpp distinction
◦ .cu/.cuh files are compiled by nvcc to produce a .o file
◦ .cpp/.hpp files are compiled by g++; they #include "xxx.cuh" to get the declarations, and the .o file from the CUDA code is simply linked in
◦ No different from how you link in .o files from normal C++ code
.cu/.cuh vs .cpp/.hpp
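A minimal sketch of the split described above, assuming hypothetical file names scale.cuh, scale.cu and main.cpp and a hypothetical wrapper cuda_call_scale_kernel (none of these come from the lecture). It is shown as one listing so it compiles with nvcc on its own; the comments mark which file each part would live in. The key point is that the header exposes only a plain C++ function, so g++ never has to parse the <<<...>>> launch syntax.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// ---- scale.cuh (hypothetical header) --------------------------------------
// A plain C++ declaration only: main.cpp (compiled by g++) can
// #include "scale.cuh" without seeing any CUDA syntax. Returns true on success.
bool cuda_call_scale_kernel(float *h_data, int n, float factor);

// ---- scale.cu (compiled by nvcc into scale.o) -----------------------------
__global__ void scale_kernel(float *d_data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // the last block may contain threads past the end of the data
        d_data[i] *= factor;
    }
}

bool cuda_call_scale_kernel(float *h_data, int n, float factor) {
    assert(n > 0);  // launching 0 blocks is an invalid configuration
    const size_t bytes = n * sizeof(float);

    float *d_data = nullptr;
    if (cudaMalloc(&d_data, bytes) != cudaSuccess) return false;
    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);

    const int block_size = 512;
    const int block_count = (n + block_size - 1) / block_size;  // round up
    scale_kernel<<<block_count, block_size>>>(d_data, n, factor);
    cudaError_t launch_err = cudaGetLastError();  // bad launch configs show up here

    // cudaMemcpy on the default stream waits for the kernel to finish first.
    cudaError_t copy_err = cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    return launch_err == cudaSuccess && copy_err == cudaSuccess;
}

// ---- Build steps (file names and CUDA install path are assumptions) -------
//   nvcc -c scale.cu -o scale.o
//   g++  -c main.cpp -o main.o      (main.cpp does #include "scale.cuh")
//   g++  main.o scale.o -L/usr/local/cuda/lib64 -lcudart -o app
```

The library path in the final link step depends on where the CUDA toolkit is installed on your machine.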
Thread Organization
We will now look at how threads are organized and used in GPUs
◦ Keywords you MUST know to code in CUDA:
◦ Thread
◦ Block
◦ Grid
◦ Keywords you MUST know to code WELL in CUDA:
◦ (Streaming) Multiprocessor
◦ Warp
◦ Warp Divergence
Inside a GPU

The black Xs are just crossing out things you don't have to think about just yet. You'll learn about them later.
Inside a GPU
Think of Device Memory (we will also refer to it as Global Memory) as the RAM for your GPU
◦ Faster than getting memory from the host's actual RAM, but still slow enough that access can (and should) be made faster
◦ Will come back to this in future lectures
GPUs have many Streaming Multiprocessors (SMs)
◦ Each SM has multiple processors but only one instruction unit
◦ Groups of processors within a single SM must run the exact same set of instructions at any given time
Inside a GPU
When a kernel (the thing you define in .cu files) is called, the task is divided up into threads
◦ Each thread handles a small portion of the given task
The threads are divided into a Grid of Blocks
◦ Both Grids and Blocks are 3-dimensional
◦ e.g.
dim3 dimBlock(8, 8, 8);
dim3 dimGrid(100, 100, 1);
Kernel<<<dimGrid, dimBlock>>>(…);
◦ However, we'll often only work with 1-dimensional grids and blocks
◦ e.g. Kernel<<<block_count, block_size>>>(…);
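A sketch of how each thread finds its piece of the task from the built-in blockIdx, blockDim and threadIdx variables, for both the 1D launch and the 3D dimGrid/dimBlock launch above. The kernel names and the fill_indices_3d helper are hypothetical, and flattening (x, y, z) with x varying fastest is one common convention rather than the only one.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// 1D launch, Kernel<<<block_count, block_size>>>(...):
// global index = which block I am in * block size + my index inside the block.
__global__ void write_index_1d(int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;
}

// 3D launch, Kernel<<<dimGrid, dimBlock>>>(...):
// compute x, y, z separately, then flatten into one array offset (x fastest).
__global__ void write_index_3d(int *out, int nx, int ny, int nz) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z * blockDim.z + threadIdx.z;
    if (x < nx && y < ny && z < nz) {  // edge blocks may stick out of the volume
        int flat = (z * ny + y) * nx + x;
        out[flat] = flat;
    }
}

// Hypothetical host helper: launches write_index_3d over an nx*ny*nz volume
// using 8x8x8 blocks and copies the result back into h_out.
bool fill_indices_3d(int *h_out, int nx, int ny, int nz) {
    assert(nx > 0 && ny > 0 && nz > 0);
    size_t bytes = sizeof(int) * nx * ny * nz;
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) return false;

    dim3 dimBlock(8, 8, 8);                                  // 512 threads per block
    dim3 dimGrid((nx + 7) / 8, (ny + 7) / 8, (nz + 7) / 8);  // round up per dimension
    write_index_3d<<<dimGrid, dimBlock>>>(d_out, nx, ny, nz);
    cudaError_t launch_err = cudaGetLastError();

    cudaError_t copy_err = cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return launch_err == cudaSuccess && copy_err == cudaSuccess;
}
```

The bounds checks matter because the grid is rounded up: a 10 x 9 x 3 volume gets a 2 x 2 x 1 grid of 8 x 8 x 8 blocks, so many threads have nothing to do.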
Inside a GPU
Maximum number of threads per block is 512 or 1024 depending on the machine (512 on CC 1.x, 1024 on CC 2.x and later)
Maximum number of blocks per grid dimension is usually 65535 (CC 3.0 and later raise the x-dimension limit to 2^31 - 1)
◦ If you exceed either of these limits, the kernel launch fails and your output will be garbage data (the launch error is reported by cudaGetLastError())
◦ Much of GPU programming is dealing with these kinds of hardware limitations! Get used to it
◦ This limitation also means that your Kernel must compensate for the fact that you may not have enough threads to assign one to each data point
◦ We'll show how to do this later in this lecture
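These limits vary by GPU, so rather than hard-coding them you can query them at runtime with cudaGetDeviceProperties. A sketch, assuming device 0; print_device_limits, noop_kernel and launch_too_many_threads are hypothetical names. The second function shows what "giving up" looks like in practice: the kernel never runs, and cudaGetLastError() returns cudaErrorInvalidConfiguration.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Prints the limits discussed above for one device and returns its properties.
cudaDeviceProp print_device_limits(int device = 0) {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    assert(err == cudaSuccess);
    (void)err;  // silence the unused-variable warning when NDEBUG is set

    printf("%s (compute capability %d.%d)\n", prop.name, prop.major, prop.minor);
    printf("  SMs:                   %d\n", prop.multiProcessorCount);
    printf("  warp size:             %d\n", prop.warpSize);
    printf("  max threads per block: %d\n", prop.maxThreadsPerBlock);
    printf("  max block dims:        %d x %d x %d\n",
           prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
    printf("  max grid dims:         %d x %d x %d\n",
           prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
    return prop;
}

__global__ void noop_kernel() {}

// Asking for one thread more per block than the device allows: the launch is
// rejected, the kernel body never executes, and the error is only visible if
// you check for it.
cudaError_t launch_too_many_threads(const cudaDeviceProp &prop) {
    noop_kernel<<<1, prop.maxThreadsPerBlock + 1>>>();
    return cudaGetLastError();
}
```

Checking cudaGetLastError() right after a launch is what turns "garbage output" into an explicit error you can report.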
Inside a GPU
Each block is assigned to an SM
Inside the SM, the block is divided into Warps of threads
◦ Warps consist of 32 threads
◦ All 32 threads MUST run the exact same set of instructions at the same time
◦ Due to the fact that there is only one instruction unit
◦ Warps are run concurrently in an SM
◦ If your Kernel tries to have threads in a single warp do different things (using if statements, for example), the two tasks will be run sequentially
◦ Called Warp Divergence (NOT GOOD)
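A sketch contrasting a branch that splits threads inside one warp with a branch that is uniform across each warp, assuming 1D blocks (where warp k holds threads 32k to 32k + 31). The kernel and helper names are hypothetical. With bodies this small the compiler may use predication instead of a real branch, so treat this as an illustration of where divergence comes from rather than a benchmark.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Divergent: even and odd lanes of the SAME warp take different branches,
// so every warp has to execute both branches, one after the other.
__global__ void branch_by_lane(int *out) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (threadIdx.x % 2 == 0) out[tid] = 1;
    else                      out[tid] = 2;
}

// Warp-uniform: the condition depends only on the warp index, so all 32
// threads of a warp agree and each warp executes just one branch.
__global__ void branch_by_warp(int *out) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    int warp_id = threadIdx.x / warpSize;  // warpSize is a built-in variable (32)
    if (warp_id % 2 == 0) out[tid] = 1;
    else                  out[tid] = 2;
}

// Hypothetical helper: runs one of the kernels above as a single block of
// block_size threads and copies the per-thread results into h_out.
bool run_branch_kernel(bool by_warp, int *h_out, int block_size) {
    assert(block_size > 0);
    size_t bytes = block_size * sizeof(int);
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) return false;

    if (by_warp) branch_by_warp<<<1, block_size>>>(d_out);
    else         branch_by_lane<<<1, block_size>>>(d_out);
    cudaError_t launch_err = cudaGetLastError();

    cudaError_t copy_err = cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return launch_err == cudaSuccess && copy_err == cudaSuccess;
}
```

Both kernels assign the two values to half of the threads each; the difference is only whether that split happens inside a warp or between warps.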
Inside a GPU
(fun hardware info)
In Fermi Architecture (i.e. GPUs with Compute Capability 2.x), each SM has 32 cores (48 on CC 2.1 parts)
◦ e.g. GTX 400, 500 series
◦ 32 cores is not what makes each warp have 32 threads. The previous architecture also had 32 threads per warp but had fewer than 32 cores (8) per SM
Halo.cms.caltech.edu has 3 GTX 570s
◦ This course will cover CC 2.x
Streaming Multiprocessor
A[] + B[] -> C[] (again)
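A sketch of the a[] + b[] -> c[] kernel written so that it compensates for having fewer threads than data points, using a grid-stride loop: each thread handles every (blockDim.x * gridDim.x)-th element. The names add_kernel and cuda_add are hypothetical, and the lecture's own version may be written differently (for example, as a while loop doing the same thing).

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Grid-stride loop: start at this thread's global index, then jump ahead by
// the total number of threads in the grid until we run past n.
__global__ void add_kernel(const float *a, const float *b, float *c, int n) {
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        c[i] = a[i] + b[i];
    }
}

// Hypothetical host wrapper: copies a and b to the GPU, launches block_count
// blocks of block_size threads, and copies c back to the host.
bool cuda_add(const float *h_a, const float *h_b, float *h_c, int n,
              int block_count, int block_size) {
    assert(n > 0 && block_count > 0 && block_size > 0);
    size_t bytes = n * sizeof(float);
    float *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;

    bool ok = cudaMalloc(&d_a, bytes) == cudaSuccess &&
              cudaMalloc(&d_b, bytes) == cudaSuccess &&
              cudaMalloc(&d_c, bytes) == cudaSuccess;
    if (ok) {
        cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

        add_kernel<<<block_count, block_size>>>(d_a, d_b, d_c, n);
        ok = cudaGetLastError() == cudaSuccess;

        // Blocks until the kernel has finished, then copies the result back.
        ok = ok && cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_a);  // cudaFree(nullptr) is a no-op, so partial failures are safe
    cudaFree(d_b);
    cudaFree(d_c);
    return ok;
}
```

Because the loop advances by the total thread count, block_count can stay well under the grid limit even when n is much larger. Calling cuda_add(a, b, c, 10000, 4, 128), for instance, has each of the 512 threads handle about 20 elements.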
Questions so far?
Stuff that will be useful later
Next Time...
Global Memory access is not that fast
◦ Tends to be the bottleneck in many GPU programs
◦ Especially true if done stupidly
◦ We'll look at what "stupidly" means

Optimize memory access by utilizing hardware-specific memory access patterns
Optimize memory access by utilizing the different caches that come with the GPU
