CS50
Courtesy of https://cs50.harvard.edu/lectures/
Acknowledgements
• Full credit to the students, teachers, staff, and volunteers at Harvard,
CS50, and EdX who helped make this course possible
• This PDF is simply a quick reference to all Notes found on the website;
the only changes are to format and the creation of this page. Nothing is
added or removed from the website version of this information
• Share at will. This information is open to the public, for free
• This PDF is in no way associated with Harvard, CS50, or EdX.
It is compiled for free, by a volunteer who is taking the course and
wished to spread the resource to all. All of the information can be found
at the aforementioned web address. This PDF simply saves you time
Table of Contents
• Week 1
• Week 2
• Week 3
• Week 4
• Week 5
• Week 7
• Week 8
• Week 9
• Week 10
• Week 12
Week 1
Andrew Sellergren
Table of Contents
Announcements and Demos
From Last Time
From Scratch to C
hello, world!
Linux Commands
Compiling
User Input
Loops
• The blue "say" puzzle piece from Scratch has now become printf and the
orange "when green flag clicked" puzzle piece has become main(void).
• However, source code is not something a computer actually
understands. To translate source code into something the computer
understands, we’ll need a compiler. A compiler is a program that takes
source code as input and produces 0’s and 1’s, a.k.a. object code, as
output.
• We won’t trouble ourselves with knowing the exact mapping between a
series of 0’s and 1’s and the "print" command. Rather, we’ll content
ourselves with writing instructions at a higher level that can be
translated to a lower level. This is consistent with one of the themes of
the course: layering on top of the work of others.
• Statements are direct instructions, e.g. "say" in Scratch or printf in C.
• The "forever" loop from Scratch can be recreated with a while
(true) block in C. The "repeat" loop from Scratch can be recreated with
a for block in C.
• Note that in C just as in Scratch, there are multiple ways of achieving the
same goals.
• In C, a loop that increments a variable and announces its value would
look like so:
• int counter = 0;
• while (true)
• {
• printf("%i\n", counter);
• counter++;
• }
hello, world!
• The CS50 Appliance is software running inside of your normal
computer’s environment that simulates the environment of another
operating system, namely Fedora Linux. At the bottom left of the
Appliance window are three icons for gedit, Chrome, and Terminal.
Since we can code in any text editor, let’s start by opening gedit.
• In gedit, there are three main divisions of the window:
o on the left, the source code pane
o on the right, the actual text editor, where we write code
o on the bottom, the terminal, where we run commands
• Note that all of your files by default save to the jharvard directory, which
is unique to your Appliance and is not shared with other students. All of
your files in the Dropbox subdirectory are automatically backed up in the
cloud. In this directory, we’ll save our file as hello.c.
• Now let’s quickly rewrite that first program in C:
• #include <stdio.h>
•
• int main(void)
• {
• printf("hello, world!\n");
• }
Linux Commands
• As an aside, here’s a short list of Linux commands that you’ll find useful:
o ls
stands for "list," shows the contents of the current directory
o mkdir
stands for "make directory," creates a new folder
o cd
stands for "change directory," the equivalent of double
clicking on a folder
o rm
stands for "remove," deletes a file
o rmdir
stands for "remove directory," deletes a directory
Compiling
• When we type make hello in the terminal, the command that actually
runs is as follows:
clang -ggdb3 -O0 -std=c99 -Wall -Werror hello.c -lcs50 -lm -o hello
• make is not actually a compiler, but rather a program that shortcuts these
options to the compiler, which in this case is clang. The shorter version
of the command above is:
clang -o hello hello.c
• -o is a switch or flag, an option that influences the behavior of the
program. In this case, the value provided after -o is hello, which
becomes the name of the executable that the compiler creates. We
could’ve typed -o hihihi and our executable would then have been
named hihihi. The flags that we pass to a program are special examples
of command-line arguments.
User Input
• To make our program more interesting, let’s try asking the user for a
name and saying hello to her. To do this, we need a place to store the
user’s name, i.e. a variable. A variable that stores a word or a phrase is
known as a string. Let’s call this variable name:
• #include <stdio.h>
•
• int main(void)
• {
• string name;
• name = GetString();
• printf("hello, David\n");
• }
• Before we ask the user for her name, the variable name has no value. We
shouldn’t print it out as such.
• GetString is a function provided in the CS50 Library written by the
staff. GetString takes in user input and passes it back to your program as
a string. The = in this case is an assignment operator, meaning "store the
value of the right side in the variable on the left side."
• Now when we try to compile this program, we get all sorts of errors.
When the compiler prints out this many errors, it’s a good idea to work
your way through them from top to bottom because the errors at bottom
might actually have been caused by the errors at the top. The topmost
error is as follows:
hello.c:5:5: error: use of undeclared identifier 'string'; did you mean
'stdin'?
• No, we didn’t mean stdin! However, the variable type string is actually
not built in to C. It’s available via the CS50 Library. To use this library,
we actually need to tell our program to include it like so:
• #include <cs50.h>
• #include <stdio.h>
•
• int main(void)
• {
• string name;
• name = GetString();
• printf("hello, David\n");
• }
• When we compile and run this, the program appears to do nothing: the
cursor simply blinks. This is because it’s waiting for the user to type
something. When we type "Rob," the program still prints out "hello,
David," which isn’t quite what we intended. Let’s add a line to clarify to
the user that he or she is supposed to type something, and pass name to
printf so that the greeting uses what was typed:
• #include <cs50.h>
• #include <stdio.h>
•
• int main(void)
• {
• string name;
• printf("What is your name? ");
• name = GetString();
• printf("hello, %s\n", name);
• }
• What’s between the parentheses after printf are the arguments that we
pass it. Here, we pass two arguments. %s is a placeholder for the second
argument, name, which gets inserted into the first argument.
• In addition to the CS50 Library, we’re including stdio.h, the library that
contains the definition of printf.
Loops
• Let’s write a silly little program with an infinite loop:
• #include <cs50.h>
• #include <stdio.h>
•
• int main(void)
• {
• while (true)
• {
• printf("I am a buggy program");
• }
• }
• Since the loop condition true is always true, the loop continues
executing indefinitely. Compiling and running this program prints a
whole lot of text to the terminal! You don’t need to restart your
Appliance to stop the program; just press Ctrl+C.
• Now let’s write a counter program:
• #include <cs50.h>
• #include <stdio.h>
•
• int main(void)
• {
• for (int i = 0; i < 100; i++)
• {
• printf("I can count to %i\n", i);
• }
• }
• Ignore the cryptic syntax for now, but know that this program counts
(very fast) from 0 up to 99, printing 100 lines. What if we made a mistake
and typed i >= 0 instead of i < 100 as the loop condition? We would
unintentionally induce an infinite loop. On Wednesday we’ll see if this
program has finished!
Week 1, continued
Andrew Sellergren
Table of Contents
Announcements and Demos
Boolean Expressions
Switches
For Loops
Variables
Functions
hello-1.c
hello-2.c
adder.c
conditions-0.c
conditions-1.c
Teaser
• The Boolean operators "and" and "or" are written as && and || in C:
• if (condition && condition)
• {
• // do this
• }
• if (condition || condition)
• {
• // do this
• }
• Note that & and | have different meanings: they are bitwise operators, not Boolean ones!
Switches
For Loops
Variables
Functions
• A function is a piece of code that can take input and can produce output.
In some cases, a function can be a so-called black box. This means that
the details of its implementation aren’t relevant. We don’t care how it
does what it does, just that it does it.
• Let’s represent printf with an actual black box onstage. We can write
"hello, world" on a piece of paper to represent an argument to printf.
We then place this piece of paper in the black box and, by whatever
means, the words "hello, world" appear on the screen!
• To make our hello program more dynamic, we asked the user for his or
her name and passed that to printf:
• string name = GetString();
• printf("hello, %s\n", name);
• printf doesn’t return anything we care about here; we call it only for its
side effect of printing to the screen. GetString, on the other hand, returns
what the user typed in.
• As with printf, we don’t necessarily care how GetString is implemented.
We know that when we call it, we’ll be provided with a string after some
amount of time. We can simulate this by retrieving from the black box a
piece of paper with a student’s name (Obasi) written on it. We actually
then make a copy of this string before storing it in name.
• Now we have name written on one piece of paper which will act as the
second argument to printf. Next we create the first argument by writing
"hello, %s\n" on another piece of paper. Finally, we place these two
pieces of paper in the black box and magically, "hello, Obasi" appears on
the screen.
• Functions that we implemented in the CS50 Library (cs50.h) include:
o GetChar
o GetDouble
o GetFloat
o GetInt
o GetLongLong
o GetString
• Convention holds that C function names are lowercase, but we
capitalized these just to make it clear that they belong to the CS50
Library.
• A float is a number with a decimal point. A double is a number with a
decimal point but with room for more digits after the decimal point. The
types returned by CS50 Library functions require different numbers of
bits to be stored. A char requires 8 bits, a float requires 32 bits, and
a double requires 64 bits. A long long is an integer that is twice as big in
memory (64 bits) as an int (32 bits). More on these types later.
• The CS50 Library also contains two custom types:
o bool
o string
• For convenience, we have created the symbols true and false to
represent 1 and 0. Likewise for convenience, we have created
a string type to store strings.
• The actual types of variables available in C are as follows:
o char
o double
o float
o int
o long long
• The printf function can take many different formatting characters. Just
a few of them are:
o %c for char
o %i (or %d) for int
o %f for float
o %lld for long long
o %s for string
• A few more escape sequences:
o \n for newline
o \r for carriage return (think typewriter)
o \' for single quote
o \" for double quote
o \\ for backslash
o \0 for null terminator
Teaser
• It turns out that computers cannot express some values perfectly
precisely. The protagonists in the movie Office Space take advantage of
this imprecision to rip off their company Initech. Consider that if
banking software stores a number like 0.1 improperly, it could mean
that there are fractions of a cent gained or lost. If you haven’t seen Office
Space, that’s your homework for the weekend.
Last updated 2013-09-13 22:01:40 PDT
Week 2
Andrew Sellergren
Table of Contents
Announcements and Demos
do-while
Scope
Strings
Teaser
Floating Points
floats-0.c
• Let’s write a short program we’ll call floats-0.c:
• #include <stdio.h>
•
• int main(void)
• {
• float f = 1 / 10;
• printf("%.1f\n", f);
• }
• Here we’re simply trying to store the value 0.1 in a float and print it out.
Don’t forget stdio.h! The .1 in front of f means that we want to print
only one decimal place.
• When we compile and run this program, however, we see 0.0 printed to
the screen. Where is the bug? Let’s try printing two decimal places by
writing %.2f. Nope, we get 0.00.
• The problem is that we’re dividing one integer by another. When you do
this, the computer assumes that you want another integer in response.
Since 0.1 is not an integer, the computer actually truncates it, throwing
away everything after the decimal point. When we actually store the
resulting integer in a float, it gets converted to a number that has a
decimal point.
floats-1.c
• To fix this, we could turn the integers into floating points like so:
• #include <stdio.h>
•
• int main(void)
• {
• float f = 1.0 / 10.0;
• printf("%.1f\n", f);
• }
floats-2.c
• Alternatively, we could explicitly cast, or convert, the numbers 1 and 10
to floating points before the division:
• #include <stdio.h>
•
• int main(void)
• {
• float f = (float) 1 / (float) 10;
• printf("%.1f\n", f);
• }
• Since characters are represented as numbers via ASCII, we can use
casting to convert between char and int. Similarly, we can cast between
different types of numbers.
• What if we change %.1f to %.10f or %.20f? We don’t get exactly 0.1, but
rather 0.1 with some numbers in the far right decimal places. Because
the float type has a finite number of bits, we can only use it to represent
a finite number of numbers. At some point, our numbers become
imprecise.
• Don’t think that this imprecision is a big deal? Perhaps this video will
convince you otherwise.
• As we’ll find out later in the semester, MySQL requires you to specify
how many bits it should use to store values.
Scope
• Even after adding #include <cs50.h> we get "unused variable n" and
"undeclared identifier n" errors. It would seem that we are in fact using
the variable n when we check whether it’s less than or equal to zero.
Likewise it would seem that n is not "undeclared" since we initialized it
within the do block. What’s wrong then? Because we’re
declaring n inside the do block, within the curly braces, its scope is
limited to that block. Outside of those curly braces, n effectively doesn’t
exist. What we need to do is declare n outside the loop but set its value
within the loop like so:
• #include <cs50.h>
• #include <stdio.h>
•
• int main(void)
• {
• int n;
• do
• {
• printf("I demand that you give me a positive integer: ");
• n = GetInt();
• }
• while (n <= 0);
• printf("Thanks for the %d!\n", n);
• }
• If we compile and run this program, we find that it works!
• Let’s try banging on it a little. What happens if we type a character
instead of a number? We get a "Retry: " prompt. Since this doesn’t
appear in our program above, it presumably comes from some error
checking in the CS50 Library. When you call GetInt(), we at least make
sure that what you get in return is an int, not a string or a char.
• A suboptimal solution to this problem of scope would have been to
declare a global variable like so:
• #include <cs50.h>
• #include <stdio.h>
•
• int n;
•
• int main(void)
• {
• do
• {
• printf("I demand that you give me a positive integer: ");
• n = GetInt();
• }
• while (n <= 0);
• printf("Thanks for the %d!\n", n);
• }
• Scratch actually implemented global variables as variables declared for
"all sprites."
• Global variables are generally considered poor design.
• Note that declaring a variable and not using it is not strictly an error.
However, for CS50, we’ve cranked up the error checking of the compiler
as a pedagogical exercise. You may have noticed a series of flags that are
passed to clang automatically when you type make. Two of those flags
are -Wall and -Werror, which together mean "enable the common warnings
and make all warnings into errors."
Strings
• There’s a lot more going on under the hood with strings than we’ve let
on so far. Consider the following program:
• #include <cs50.h>
• #include <stdio.h>
• #include <string.h>
•
• int main(void)
• {
• printf("Please give me a string: ");
• string s = GetString();
• for (int i = 0; i < strlen(s); i++)
• {
• printf("%c\n", s[i]);
• }
• }
• strlen returns the length of the string it is passed. Thus, we seem to be
looping through the characters of the string.
• It turns out that strings are stored as characters back-to-back in
memory. We can access those characters using the square bracket
notation, so s[0] gets the first letter, s[1] gets the second letter, and so
on.
• This program prints the characters of the user-provided string, one line
at a time!
Teaser
• Check out the first jailbreak of the iPhone, the winner of an obfuscated C
contest, and a very pretty program!
Last updated 2013-09-19 08:08:49 PDT
Week 2, continued
Andrew Sellergren
Table of Contents
Announcements and Demos
Functions
function-0.c
function-1.c
Strings
string-1.c
string-2.c
capitalize-0.c
capitalize-1.c
capitalize-2.c
Arrays
ages.c
Cryptography
Strings
• Last time, we saw that we can think of strings as collections of
contiguous boxes, each containing a single character requiring a single
byte of memory. To access the individual characters/bytes of a string, we
index into the string using the square bracket notation.
• Realize that there are two types of memory in your computer: disk,
where you store your music and photos, etc., and RAM, or random
access memory, which stores information that needs to be accessed
quickly while a program is running. The memory that stores a string like
"hello" for your program is RAM.
string-1.c
• Consider again the program that takes a string from the user and prints
it one character per line:
• #include <cs50.h>
• #include <stdio.h>
• #include <string.h>
•
• int main(void)
• {
• string s = GetString();
•
• if (s != NULL)
• {
• for (int i = 0; i < strlen(s); i++)
• {
• printf("%c\n", s[i]);
• }
• }
• }
• What’s the deal with the s != NULL? It turns out that GetString will
not always succeed in getting a string from the user. If it fails, perhaps
because the user typed a string that was too long to hold in
memory, GetString will return a special sentinel value named NULL.
Without this check, other things we try to do with s might cause the
program to crash.
• s[i], where i is 0, 1, 2, is an individual character in the string, so we
ask printf to substitute it into %c.
string-2.c
• There’s at least one inefficiency in string-1.c as it’s currently
written. strlen(s) will only ever return one value, yet we’re calling it on
every iteration of the loop. To optimize our program, we should
call strlen once and store the value in a variable like so:
• #include <cs50.h>
• #include <stdio.h>
• #include <string.h>
•
• int main(void)
• {
• string s = GetString();
•
• if (s != NULL)
• {
• for (int i = 0, n = strlen(s); i < n; i++)
• {
• printf("%c\n", s[i]);
• }
• }
• }
• Although computers are very fast these days and this optimization may
not be immediately noticeable, it’s important to look for opportunities to
improve design. These little optimizations can add up over time. One of
the problem sets we’ve done in years past was writing a spellchecker in C
with the goal of making it as fast as possible. An optimization like this
might save a few milliseconds of runtime!
capitalize-0.c
• In capitalize-0.c, we claim to capitalize all the letters in a string:
• #include <cs50.h>
• #include <stdio.h>
• #include <string.h>
•
• int main(void)
• {
• string s = GetString();
•
• for (int i = 0, n = strlen(s); i < n; i++)
• {
• if (s[i] >= 'a' && s[i] <= 'z')
• {
• printf("%c", s[i] - ('a' - 'A'));
• }
• else
• {
• printf("%c", s[i]);
• }
• }
• printf("\n");
• }
• The first condition within the loop is checking if the character is
lowercase. If it is, then we subtract from it the value of 'a' - 'A'. Under the
hood, characters are actually just numbers, which is why we can
compare them with >= and subtract them from each other. The value of
'a' - 'A' is the offset between the lowercase and uppercase characters on the
ASCII chart. By subtracting this offset from s[i], we’re effectively
capitalizing the letter.
capitalize-1.c
• Thus far, we’ve worked with a few libraries of code that gave us
convenient functions. Let’s add one more to that list:
o stdio.h
o cs50.h
o string.h
o ctype.h
• Rather than write our own condition to check if a character is lowercase
and our own logic to capitalize a character, let’s use functions someone
else already implemented:
• #include <cs50.h>
• #include <ctype.h>
• #include <stdio.h>
• #include <string.h>
•
• int main(void)
• {
• string s = GetString();
•
• for (int i = 0, n = strlen(s); i < n; i++)
• {
• if (islower(s[i]))
• {
• printf("%c", toupper(s[i]));
• }
• else
• {
• printf("%c", s[i]);
• }
• }
• printf("\n");
• }
capitalize-2.c
• We don’t strictly need the curly braces around if-else blocks so long as
they are only a single line. However, there’s an even better way to
shorten this program:
• #include <cs50.h>
• #include <ctype.h>
• #include <stdio.h>
• #include <string.h>
•
• int main(void)
• {
• string s = GetString();
•
• for (int i = 0, n = strlen(s); i < n; i++)
• {
• printf("%c", toupper(s[i]));
• }
• printf("\n");
• }
• Turns out that toupper handles both lowercase and uppercase characters
properly, so we don’t even need the islower check. We know this because
we checked the man page, or manual page, for toupper by typing man
toupper at the command line. This page tells us that the return value is
the converted letter or the original letter if conversion was not possible.
Perfect!
The Null Terminator
Arrays
ages.c
• Strings are actually a special case of a data type called an array. Arrays
allow us to store related variables together in one place. For example,
consider a program that stores and prints out the ages of everyone in the
room:
 1  #include <cs50.h>
 2  #include <stdio.h>
 3
 4  int main(void)
 5  {
 6      // determine number of people
 7      int n;
 8      do
 9      {
10          printf("Number of people in room: ");
11          n = GetInt();
12      }
13      while (n < 1);
14
15      // declare array in which to store everyone's age
16      int ages[n];
17
18      // get everyone's age
19      for (int i = 0; i < n; i++)
20      {
21          printf("Age of person #%i: ", i + 1);
22          ages[i] = GetInt();
23      }
24
25      // report everyone's age a year hence
26      printf("Time passes...\n");
27      for (int i = 0; i < n; i++)
28      {
29          printf("A year from now, person #%i will be %i years old.\n",
30                 i + 1, ages[i] + 1);
31      }
32  }
• The first lines should be familiar to you by now: we’re prompting the
user for a positive number. In line 16, we use that number as the
number of places in our array called ages. ages is a bucket with room
for n integers. Using an array is a better alternative than declaring
an int for every single person in the room, especially since we don’t even
know how many there are until the user tells us!
• The rest of the program is pretty straightforward. We iterate
through ages the same way we iterated through strings, accessing each
element using square bracket notation.
Cryptography
• Can you guess what this encrypted string actually says?
Or fher gb qevax lbhe Binygvar
• It says "Be sure to drink your Ovaltine." Each character of the string is
changed to another character using an encryption technique known as
ROT-13. Each letter is simply rotated by 13 around the alphabet. This
isn’t very sophisticated, of course, as there are only 25 different numbers
to rotate by, so it can easily be cracked by brute force.
• On certain systems, your password might be stored as an encrypted
string like so:
• Your password was encrypted using secret-key cryptography. That
means it was translated from plaintext to so-called ciphertext using a
secret. Only with that secret can the ciphertext be decrypted back to
plaintext. In the Hacker Edition of the upcoming problem set, you’ll be
asked to decrypt some passwords without knowing the secret! In the
Standard Edition, we’ll introduce you to some ciphers, one called Caesar
and one called Vigenère.
• Be sure to drink your Ovaltine!
Last updated 2013-09-21 20:44:38 PDT
Week 3
Andrew Sellergren
Table of Contents
Announcements and Demos
Command-line Arguments
argv-0.c
argv-1.c
argv-2.c
Debugging
debug.c
GDB
Security
Command-line Arguments
• Thus far, we’ve started our programs with the line int
main(void). void means that we’re not passing any arguments to
the main function. However, it’s possible to pass arguments to
the main function using the following syntax:
• int main(int argc, string argv[])
• Here, we pass two arguments, argc and argv. argc is an int and argv is an
array of string. argc actually indicates how many arguments we’ve
passed to a program.
• Earlier, when we typed ./ages at the command line, there was 1
command-line argument: the name of the program itself. If we had
typed ./ages hello world, there would have been 3 command-line
arguments. In these cases, argc would have taken the values 1 and 3,
respectively. There is always at least 1 command-line argument.
• Whereas argc contains the number of command-line
arguments, argv contains the command-line arguments themselves.
argv-0.c
• Listen to David describe a short program that uses command-line
arguments:
• #include <cs50.h>
• #include <stdio.h>
•
• int main(int argc, string argv[])
• {
• printf("%s\n", argv[1]);
• }
• We could modify this program to print out the second command-line
argument as an integer:
• #include <cs50.h>
• #include <stdio.h>
• #include <stdlib.h>
•
• int main(int argc, string argv[])
• {
• int x = atoi(argv[1]);
• printf("%d\n", x);
• }
• This works well as long as we actually send two command-line
arguments. What happens when we don’t? The program crashes with
a segmentation fault. This is because we tried to access an element
beyond the bounds of the array. We should add a check to protect
against this:
• #include <cs50.h>
• #include <stdio.h>
• #include <stdlib.h>
•
• int main(int argc, string argv[])
• {
• if (argc < 2)
• {
• printf("Not enough command line arguments\n");
• return 1;
• }
• int x = atoi(argv[1]);
• printf("%d\n", x);
• }
argv-1.c
• Let’s try printing out all of the command-line arguments:
• #include <cs50.h>
• #include <stdio.h>
•
• int main(int argc, string argv[])
• {
• // print arguments
• for (int i = 0; i < argc; i++)
• {
• printf("%s\n", argv[i]);
• }
• }
argv-2.c
• Now, to go even deeper [1], let’s print each character of each command-
line argument on its own line:
• #include <cs50.h>
• #include <stdio.h>
• #include <string.h>
•
• int main(int argc, string argv[])
• {
• // print arguments
• for (int i = 0; i < argc; i++)
• {
• for (int j = 0, n = strlen(argv[i]); j < n; j++)
• {
• printf("%c\n", argv[i][j]);
• }
• }
• }
• argv[i][j] represents character j in command-line argument i.
• How would we go about writing strlen if it weren’t provided to us in the
library string.h? Recall that all strings end with the special \0 character.
If we iterate through each character of the string until we find \0, we’ll
know the length of the string:
• int my_strlen(string s)
• {
• int length = 0;
• while(s[length] != '\0')
• {
• length++;
• }
• return length;
• }
• We could even move this logic into our second loop and avoid a function
call altogether:
• #include <cs50.h>
• #include <stdio.h>
• #include <string.h>
•
• int main(int argc, string argv[])
• {
• // print arguments
• for (int i = 0; i < argc; i++)
• {
• for (int j = 0; argv[i][j] != '\0'; j++)
• {
• printf("%c\n", argv[i][j]);
• }
• }
• }
• Each time we reach a \0 within argv, our array of arrays of characters, we
terminate the inner for loop and execute the outer for loop again
(provided that we still have command-line arguments left, i.e. i < argc).
Debugging
debug.c
• The best way to learn to debug is to work with buggy code like the
following:
• #include <stdio.h>
• #include <cs50.h>
•
• void foo(int i)
• {
• printf("%i\n", i);
• }
•
• int main(void)
• {
• printf("Enter an integer: ");
• int i = GetInt();
•
• while (i > 10)
• {
• i--;
• }
•
• while (i != 0)
• {
• i = i - 3;
• }
•
• foo(i);
• }
• Let’s assume that the user provides an integer greater than 10 (a bad
assumption). The first while loop will then decrement i by 1 until it
equals 10, at which point the loop condition i > 10 will no longer be true
and the loop will exit. The second while loop will decrement i by 3 until
it equals 0. But if it starts at 10, i will go to 7, then 4, then 1, then -2,
and so on, skipping right past 0 toward negative infinity. That’s not what
we intended!
• One useful debugging technique is to add some printf statements:
• #include <stdio.h>
• #include <cs50.h>
•
• void foo(int i)
• {
• printf("%i\n", i);
• }
•
• int main(void)
• {
• printf("Enter an integer: ");
• int i = GetInt();
•
• printf("Outside first while loop\n");
• while (i > 10)
• {
• printf("First while loop: %i\n", i);
• i--;
• }
•
• printf("Outside second while loop\n");
• while (i != 0)
• {
• printf("Second while loop: %i\n", i);
• i = i - 3;
• }
•
• foo(i);
• }
• When we compile and run debug.c now, we can clearly see that the
second while loop is infinite.
• As your programs get longer and more complicated, you’ll find more
sophisticated debugging techniques more useful.
GDB
Security
• In Problem Set 2, you’ll be working with encrypting and decrypting
passwords. Of course, the strength of the encryption doesn’t matter all
that much if you choose a weak password.
• Consider the code that implements the login prompt on your laptop. If
it’s working properly, it will check that what the user types matches your
password before letting the user in. However, if it’s working maliciously,
it might check that what the user types matches some master password
that lets anyone in. We can hope that this backdoor might be caught by
at least one of the many people who have reviewed it.
• But what if the malicious code is in the compiler? Then the compiler
might actually insert the backdoor into the login prompt even though
the code for the login prompt seems safe and has been reviewed. We can
hope again that this would be caught by one of the people who reviewed
the code for the compiler.
• But what if the malicious code is in the compiler that is used to compile
the compiler? Well, then, the backdoor might get inserted without
anyone knowing.
• If you think this scenario is unlikely, consider the speech that Ken
Thompson gave, Reflections on Trusting Trust, when he accepted the
Turing Award (more or less the Nobel Prize of computer science). In it,
he describes this exact technique for compromising a compiler so that it
would introduce a backdoor into a login program. The login program he
refers to, however, is not some toy program, but rather the login
program for all of UNIX. Since delivering this speech, Thompson has
confirmed that this exploit was actually implemented and released to at
least one company, BBN Technologies.
1. Even deeper.
Week 3, continued
Andrew Sellergren
Table of Contents
Announcements and Demos
Searching
Sorting
Bubble Sort
Selection Sort
Insertion Sort
Big O Notation
Searching
• Imagine there are 7 doors with numbers behind them and you want to
find the number 50. If you know nothing about the numbers, you might
just have to open the doors one at a time until you find 50 or until all the
doors are opened. That means it would take 7 steps, or more
generally, n steps, where n is the number of doors. We might call this
a linear algorithm.
• How might our approach change if we know that the numbers are
sorted? Think back to the phonebook problem. We can continually
divide the problem in half! First we open the middle door. Let’s say it’s
16. Then we know that 50 should be in the doors to the right of the
middle, so we can throw away the left. We then look at the middle door
in the right half and so on until we find 50! This algorithm
has logarithmic running time, the green line in the graph below:
• Note that there are plenty of algorithms that are much worse than
linear, as this graph shows:
• Although it looks like n^3 is the worst, 2^n is much worse for large inputs.
Sorting
Bubble Sort
• If the numbers aren’t sorted to begin with, how much time will it take to
sort them before we search?
• To start figuring this out, let’s bring 7 volunteers on stage and have them
hold pieces of paper with the numbers 1 through 7 on them. If we ask
them to sort themselves, it seems to only take 1 step as they apply an
algorithm something like "if there’s a smaller number to my right, move
to the right of it." In reality, though, it takes more than 1 step as there
are multiple moves going on.
• In order to count the number of steps this algorithm takes, we’ll slow it
down and allow only 1 move to happen at a time. So walking left to right
among the volunteers, we examine the two numbers next to each other
and if they’re out of order, we swap them. We may have to walk left to
right more than 1 time in order to finish sorting. How do we know when
they’re sorted? As a human, you can look at it and know, but we need a
way for the computer to know. If we walk left to right among the
volunteers and make 0 swaps, then we can be sure that all the numbers
are in the right order. That means we’ll need to store the number of
swaps made in a variable that we check after each walkthrough.
• This algorithm we just described is called bubble sort. To describe its
running time, let’s generalize and say that the number of volunteers is n.
Each time we walk through the volunteers, we’re taking n-1 steps. Let’s
just round that up and call it n. How many times do we walk left to right
through the volunteers? In the worst case scenario, the numbers will be
perfectly out of order, that is, arranged left to right largest to smallest. In
order to move the 1 from the right side all the way to the left side, we’re
going to have to walk through the volunteers n times. So that’s n steps
per walkthrough and n walkthroughs, so the running time is n^2.
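• The procedure just described, including the swap counter that tells us when to stop, might look like this in C. This is a minimal sketch: the name bubble_sort and the decision to return the number of walkthroughs are assumptions made for illustration, not code from the lecture.

```c
// Bubble sort: walk left to right, swapping neighbors that are out of
// order, and stop after the first walkthrough that makes 0 swaps.
// Returns how many walkthroughs were needed.
int bubble_sort(int values[], int n)
{
    int walkthroughs = 0;
    int swaps;
    do
    {
        swaps = 0;
        for (int i = 0; i < n - 1; i++)
        {
            if (values[i] > values[i + 1])
            {
                int tmp = values[i];
                values[i] = values[i + 1];
                values[i + 1] = tmp;
                swaps++;
            }
        }
        walkthroughs++;
    }
    while (swaps > 0);
    return walkthroughs;
}
```

• For 7 volunteers arranged 7 6 5 4 3 2 1, this sketch makes 7 walkthroughs (6 that swap and a final one that makes 0 swaps); for a list that is already sorted it makes just 1.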
Selection Sort
Insertion Sort
                    Ω       O
linear search       1       n
bubble sort         n       n^2
selection sort      n^2     n^2
insertion sort              n^2
• In the best case for linear and binary search, the number you’re looking
for is the first one you examine, so the running time is just 1. In the best
case for our sorting algorithms, the list is already sorted, but in order to
verify that in bubble sort, we need to walk through the list at least once.
Unfortunately, to verify that in selection sort, we still have to
take on the order of n^2 steps: n walkthroughs, each of which confirms that
the smallest remaining number is in the correct position.
• What about the best case for insertion sort? We’ll fill in that blank next
time.
• Are we doomed to n^2 running time for sorting? Definitely not. Check
out this visualization to see how fast merge sort is compared to bubble
sort, selection sort, and insertion sort. Merge sort leverages the same
"divide and conquer" technique that binary search does.
Last updated 2013-09-27 19:50:45 PDT
Week 4
Andrew Sellergren
Table of Contents
Announcements and Demos
Merge Sort
A Little Math
sigma-1.c
Teaser
noswap.c
Merge Sort
• To see how merge sort compares to the other algorithms we’ve looked at
so far, check out this animation. Notice that bubble sort, insertion sort,
and selection sort are the three worst performers! The flip side is that
they are relatively easy to implement.
• We can describe merge sort with the following pseudocode:
    On input of n elements:
        If n < 2
            Return.
        Else:
            Sort left half of elements.
            Sort right half of elements.
            Merge sorted halves.
• If n is less than 2, then it’s either 0 or 1 and the list is already sorted.
This is the trivial case.
• If n is greater than or equal to 2, then what? We seem to be copping out
with a circular algorithm. Two of the steps begin with the command
"sort" without giving any indication as to how we go about that. When
we say "sort," what we actually mean is reapply this whole algorithm to
the left half and the right half of the original list.
• Will this algorithm loop infinitely? No, because after you’ve halved the
original list enough times, you will eventually have fewer than 2 items left.
• Okay, so we’re halving and halving and halving until we have fewer than 2
items and then we’re returning. So far, nothing seems sorted. The magic
must be in the "Merge sorted halves" step.
• One consideration with merge sort is that we need a second list for
intermediate storage. In computer science, there’s generally a tradeoff
between resources and speed. If we want to do something faster, we may
need to use more memory.
• To visualize merge sort, let’s bring 8 volunteers on stage. We’ll hand
them numbers and sit them down in chairs so that they’re in the
following order:
4 2 6 1 3 7 5 8
• The bracketed numbers are the ones we’re currently focusing on. Merge sort
says to first sort the left half, so let’s consider:
[4 2 6 1] 3 7 5 8
• Now we again sort the left half:
[4 2] 6 1 3 7 5 8
• And again:
[4] 2 6 1 3 7 5 8
• Now we have a list of size 1, so it’s already sorted and we return.
Backtracking, we look at the right half of the final two-person list:
4 [2] 6 1 3 7 5 8
• Again, a list of size 1, so we return. Finally, we arrive at a merge step.
Since the elements are out of order, we need to put them in the correct
order as we merge:
_ _ 6 1 3 7 5 8
2 4 _ _ _ _ _ _
• From now on, the second row will represent the second list we use for
intermediate storage. Now we focus on the right half of the left half of
the original list:
_ _ 6 1 3 7 5 8
2 4 _ _ _ _ _ _
• We insert these two numbers in order into our intermediate list:
_ _ _ _ 3 7 5 8
2 4 1 6 _ _ _ _
• Now we merge the left and right half of the intermediate list:
_ _ _ _ 3 7 5 8
1 2 4 6 _ _ _ _
• Finally, we can insert the intermediate list back into the original list:
1 2 4 6 3 7 5 8
• And we’re done with the "Sort left half" step for the original list!
• Repeat for the right half of the original list, skipping to the "Sort left
half" step:
1 2 4 6 _ _ 5 8
_ _ _ _ 3 7 _ _
• Sort right half:
1 2 4 6 _ _ _ _
_ _ _ _ 3 7 5 8
• Merge:
1 2 4 6 _ _ _ _
_ _ _ _ 3 5 7 8
• Move the right half back to the original list:
1 2 4 6 3 5 7 8
• Now, merge the left half and the right half of the original list:
_ _ _ _ _ _ _ _
1 2 3 4 5 6 7 8
• And ta-da!
1 2 3 4 5 6 7 8
• Merge sort is O(n log n). As before, the log n comes from the dividing by
two. The n thus must come from the merging. You can rationalize this
by considering the last merge step. To figure out which number to place
in the intermediate array next, we point our left hand at the leftmost
number of the left half and our right hand at the leftmost number of the
right half. Then we walk each hand to the right and compare numbers.
All told, we walk through every number in the list, which takes n steps.
• Check out Rob’s visualization of merge sort. You can even hear what
sorting algorithms sound like.
• A function that calls itself is using recursion. In the above pseudocode,
we implemented merge sort using recursion.
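• The pseudocode and the walkthrough with volunteers can be turned into C roughly as follows. This is a sketch under stated assumptions, not the course's implementation: the names merge, sort_range, and merge_sort are hypothetical, and the caller is assumed to supply a scratch array at least as large as the list to serve as the second list for intermediate storage.

```c
#include <string.h>

// Merge the sorted halves values[start..middle) and values[middle..end)
// into scratch, then move the result back into the original list.
void merge(int values[], int scratch[], int start, int middle, int end)
{
    int left = start;    // "left hand": leftmost number of the left half
    int right = middle;  // "right hand": leftmost number of the right half
    for (int i = start; i < end; i++)
    {
        if (left < middle && (right >= end || values[left] <= values[right]))
        {
            scratch[i] = values[left++];
        }
        else
        {
            scratch[i] = values[right++];
        }
    }
    memcpy(values + start, scratch + start, (end - start) * sizeof(int));
}

// Sort values[start..end) by sorting each half and merging them.
void sort_range(int values[], int scratch[], int start, int end)
{
    // base case: a list of 0 or 1 elements is already sorted
    if (end - start < 2)
    {
        return;
    }
    int middle = start + (end - start) / 2;
    sort_range(values, scratch, start, middle);   // sort left half
    sort_range(values, scratch, middle, end);     // sort right half
    merge(values, scratch, start, middle, end);   // merge sorted halves
}

// Sort n elements; scratch must have room for at least n ints.
void merge_sort(int values[], int scratch[], int n)
{
    sort_range(values, scratch, 0, n);
}
```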
A Little Math
• To show mathematically that merge sort is O(n log n), let’s use the
following notation:
T(n) = 0, if n < 2
• So far, all this says is that it takes 0 steps to sort a list of 1 or 0 elements.
This is the so-called base case.
T(n) = T(n/2) + T(n/2) + n, if n > 1
• This notation indicates that the rest of the algorithm, the recursive case,
i.e. sorting a list of n elements, takes as many steps as sorting its two
halves, each of n / 2 elements, plus an extra n steps to do the merging.
• Consider the case where n = 16:
• T(16) = 2 * T(8) + 16
• T(8) = 2 * T(4) + 8
• T(4) = 2 * T(2) + 4
• T(2) = 2 * T(1) + 2
T(1) = 0
• Since the base case, where a list of 0 or 1 is already sorted, takes 0 steps,
we can now substitute 0 in for T(1) and calculate T(2):
• T(2) = 2 * 0 + 2
= 2
• Now we can substitute 2 in for T(2) and so on until we get:
• T(16) = 2 * 24 + 16
• T(8) = 2 * 8 + 8
• T(4) = 2 * 2 + 4
• T(2) = 2 * 0 + 2
T(1) = 0
• Thus, T(16) is 64. This number is actually n log n: with n = 16, log n
(base 2) is 4, and 16 * 4 = 64. Dividing the list successively accounts
for log n, but the additional n factor comes from the merge step.
• Here again with merge sort we’ve returned to the idea of "divide and
conquer" that we saw in Week 0 with the phonebook example.
• In case you want to know what recursion is, try Googling it and checking
out the "Did you mean" suggestion. Hooray for geek humor!
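• The recurrence for T(n) in this section can itself be written as a recursive C function, which makes it easy to check the arithmetic for n = 16. This is just a sketch: the name merge_sort_steps is hypothetical, and it assumes n is a power of 2 so that n / 2 divides evenly.

```c
// T(n) = 0                      if n < 2
// T(n) = T(n/2) + T(n/2) + n    otherwise
int merge_sort_steps(int n)
{
    if (n < 2)
    {
        return 0;
    }
    return merge_sort_steps(n / 2) + merge_sort_steps(n / 2) + n;
}
```

• Calling merge_sort_steps with 2, 4, 8, and 16 gives 2, 8, 24, and 64, matching the substitutions worked out by hand.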
More with Recursion
sigma-0.c
• Let’s write a program that sums up the numbers 0 through n, where n is
provided by the user. We start with some boilerplate code to get a
positive integer from the user using a do-while loop:
    #include <cs50.h>
    #include <stdio.h>

    int main(void)
    {
        int n;
        do
        {
            printf("Positive integer please: ");
            n = GetInt();
        }
        while (n < 1);
    }
• Recall the sigma symbol (Σ), which stands for sum. It makes
sense, then, to call our summing function sigma:
    #include <cs50.h>
    #include <stdio.h>

    int main(void)
    {
        int n;
        do
        {
            printf("Positive integer please: ");
            n = GetInt();
        }
        while (n < 1);

        int answer = sigma(n);

        printf("%i\n", answer);
    }
• Now we need to define sigma:
    #include <cs50.h>
    #include <stdio.h>

    int sigma(int m);

    int main(void)
    {
        int n;
        do
        {
            printf("Positive integer please: ");
            n = GetInt();
        }
        while (n < 1);

        int answer = sigma(n);

        printf("%i\n", answer);
    }

    int sigma(int m)
    {
        if (m < 1)
        {
            return 0;
        }

        int sum = 0;
        for (int i = 1; i <= m; i++)
        {
            sum += i;
        }
        return sum;
    }
• This is pretty straightforward. First, we do some error checking, then we
iterate through all numbers 1 through m, summing them up as we go. sum
+= i is functionally equivalent to sum = sum + i.
• Don’t forget that we need to declare sigma before main if we want to
implement sigma after main!
• When we compile and run this, it seems to work! What happens when
we mess with it by inputting a very large number? Turns out that if our
sum becomes so large that an int doesn’t have enough bits for it, the
value overflows and can show up as a negative number.
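• One way to see the overflow problem without triggering it is to do the summing in a wider type and then ask whether the result would have fit in an int. The sketch below is illustrative only: the names sigma_wide and fits_in_int are hypothetical, and it relies on long long being at least 64 bits, which the C standard guarantees.

```c
#include <limits.h>
#include <stdbool.h>

// Sum the numbers 1 through m using a 64-bit accumulator.
long long sigma_wide(int m)
{
    long long sum = 0;
    for (long long i = 1; i <= m; i++)
    {
        sum += i;
    }
    return sum;
}

// Would this value survive being stored in an int?
bool fits_in_int(long long value)
{
    return value >= INT_MIN && value <= INT_MAX;
}
```

• For example, the sum of 1 through 100000 is 5000050000, which is larger than the biggest value a 32-bit int can hold (2147483647), so the int version of sigma cannot return it correctly.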
sigma-1.c
• Let’s try to approach the same problem using recursion.
• Our implementation of main doesn’t change. sigma, however, now looks
like this:
    int sigma(int m)
    {
        if (m <= 0)
        {
            return 0;
        }
        else
        {
            return (m + sigma(m - 1));
        }
    }
• You might worry that this implementation will induce an infinite loop.
However, the first if condition represents a base case in
which sigma doesn’t call itself.
Teaser
noswap.c
• Consider the following code that claims to swap the values of two
integers:
    #include <stdio.h>

    void swap(int a, int b);

    int main(void)
    {
        int x = 1;
        int y = 2;

        printf("x is %i\n", x);
        printf("y is %i\n", y);
        printf("Swapping...\n");
        swap(x, y);
        printf("Swapped!\n");
        printf("x is %i\n", x);
        printf("y is %i\n", y);
    }

    void swap(int a, int b)
    {
        int tmp = a;
        a = b;
        b = tmp;
    }
• Seems reasonable, right? Unfortunately, when we compile and run this,
we get this output:
    x is 1
    y is 2
    Swapping...
    Swapped!
    x is 1
    y is 2
• Obviously, the numbers haven’t really been swapped. We’ll find out why
next time!
Last updated 2013-10-03 00:26:15 PDT
Week 4, continued
Andrew Sellergren
Table of Contents
Announcements and Demos
Pointers
noswap.c
swap.c
compare-0.c
copy-0.c
compare-1.c
copy-1.c
Teaser
Pointers
noswap.c
• Pointers are one of the more complex topics we cover, so don’t feel bad if
your mind feels stretched in the next few weeks. That’s a good thing!
• Recall last time we ended with a function that didn’t live up to its name:
    #include <stdio.h>

    void swap(int a, int b);

    int main(void)
    {
        int x = 1;
        int y = 2;

        printf("x is %i\n", x);
        printf("y is %i\n", y);
        printf("Swapping...\n");
        swap(x, y);
        printf("Swapped!\n");
        printf("x is %i\n", x);
        printf("y is %i\n", y);
    }

    void swap(int a, int b)
    {
        int tmp = a;
        a = b;
        b = tmp;
    }
• Though we expected to see x and y have the values 2 and 1, respectively,
we actually saw that they still had their original values 1 and 2.
• To see why this doesn’t work, let’s bring a volunteer onstage. We’ll ask
her to pour orange juice and milk into two separate glasses representing
two different int variables. If we ask her to swap the orange juice and milk, she
wisely chooses to use another glass. This glass represents some
temporary storage, which we call tmp in the swap function above.
Interestingly, if we implement the same code directly in main, the
swapping actually works:
    #include <stdio.h>

    int main(void)
    {
        int x = 1;
        int y = 2;

        printf("x is %i\n", x);
        printf("y is %i\n", y);
        printf("Swapping...\n");

        int tmp = x;
        x = y;
        y = tmp;

        printf("Swapped!\n");
        printf("x is %i\n", x);
        printf("y is %i\n", y);
    }
• So why does this logic work in main but not in swap? a and b are actually
copies of x and y, so when we swap a and b, x and y are unchanged.
• One way to fix this would be to make x and y global variables, declaring
them outside of main. In fifteen.c, it made sense to make certain
variables global because they were to be used by the whole program.
However, in a small program like noswap.c, using global variables is
sloppy design.
swap.c
• How can we change the definition of swap to work as intended? Turns
out we just need to add asterisks:
    void swap(int* a, int* b)
    {
        int tmp = *a;
        *a = *b;
        *b = tmp;
    }
• What is an int*? It’s the memory address of an int. More properly
speaking, it is a pointer to an int. If your computer has 2 gigabytes of
RAM, then there are 2 billion bytes, each of which has a memory
address. Let’s say the int that a points to is stored at the 123rd byte of
RAM. The value of a then, is 123. To get at the actual integer value that’s
stored at byte 123, we write *a. *a = *b says "store at location a whatever
is at location b."
• Now that we’ve changed swap, we need to change how we call swap.
Instead of passing x and y, we want to pass the address of x and the
address of y:
    swap(&x, &y);
• & is the "address-of" operator and * is the dereference operator.
• Let’s assume our integers 1 and 2 are stored next to each other in
memory and 1 is stored at byte 123. That means 2 is stored 4 bytes away
(since an int requires 4 bytes), so we’ll assume that it’s stored at byte
127. The values of a and b, then, are 123 and 127. We can simulate
passing those to swap by writing them on pieces of paper and putting
them in a black box.
• We ask a volunteer to come onstage and retrieve the pieces of paper
from the black box. Next he needs to allocate a little bit of memory for
variable tmp. In tmp, he stores the value of the int whose address is in a.
This is 1.
• Next, at address 123, he erases the number 1 and writes in the number 2.
This corresponds to the *a = *b line, which says "store at
location a whatever is at location b."
• Finally, at address 127, he erases the number 2 and writes in the number
1, which was stored in tmp. tmp is a local variable, so it goes away
when swap returns.
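• To see the difference between passing copies and passing addresses side by side, here is a small sketch containing both versions. The name noswap for the broken function is an assumption made for illustration (the notes call both versions swap, in separate files).

```c
// Receives copies of the caller's values, so swapping them
// has no effect on the caller's variables.
void noswap(int a, int b)
{
    int tmp = a;
    a = b;
    b = tmp;
}

// Receives the addresses of the caller's variables, so it can
// change what is stored at those addresses.
void swap(int* a, int* b)
{
    int tmp = *a;   // remember the int stored at address a
    *a = *b;        // store at location a whatever is at location b
    *b = tmp;       // store the remembered int at location b
}
```

• If x is 1 and y is 2, calling noswap(x, y) leaves them unchanged, while calling swap(&x, &y) leaves x equal to 2 and y equal to 1.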
compare-0.c
• For the first few weeks, we have worked with string as a data type.
However, this is a type that we defined for you in the CS50 Library.
A string is really a char*. It’s the address of a char. In fact, it’s the address
of the first char in the string.
• Consider the following program which claims to compare two strings:
    #include <cs50.h>
    #include <stdio.h>

    int main(void)
    {
        // get line of text
        printf("Say something: ");
        string s = GetString();

        // get another line of text
        printf("Say something: ");
        string t = GetString();

        // try (and fail) to compare strings
        if (s == t)
        {
            printf("You typed the same thing!\n");
        }
        else
        {
            printf("You typed different things!\n");
        }
    }
• Here, we simply ask the user for two strings and store them in s and t.
Then we ask if s == t. Seems reasonable, no? We’ve used the == operator
for all the other data types we’ve seen thus far.
• But if we compile and run this program, typing "hello" twice, we always
get "You typed different things!"
• Recall that a string is just an array of characters, so "hello" looks like
this in memory:
h e l l o \0
• Although we’re able to access the first character "h" using bracket
notation, under the hood it’s really located at one of 2 billion or so
memory addresses. Let’s call it address 123 again. Then "e" is at address
124, "l" is at address 125, and so on. A char only takes 1 byte, so this time
the memory addresses are only 1 apart.
• If GetString is getting us this string, then what does it actually return?
The number 123! Before it does so, it allocates the memory necessary to
store "hello" and inserts those characters along with the null terminator.
• But if we only know the memory address of the first character, how do
we know how long the string is? Recall that strings end with the
special \0 character, so we can just iterate until we find it.
• compare-0.c is buggy because it’s comparing the memory addresses of the
two strings, not the strings themselves. Maybe s is stored at memory
address 123 and t is stored at memory address 200. Since 123 does not
equal 200, our program says they’re different strings.
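• Two ideas from this section, finding a string's length by iterating until \0 and comparing the characters rather than the addresses, can be sketched as follows. The function names string_length and same_text are hypothetical; in practice the standard library's strlen and strcmp do these jobs.

```c
#include <stdbool.h>

// Count characters by walking forward from the first char's address
// until we reach the null terminator.
int string_length(const char* s)
{
    int n = 0;
    while (s[n] != '\0')
    {
        n++;
    }
    return n;
}

// Compare the characters the two addresses lead to,
// not the addresses themselves.
bool same_text(const char* s, const char* t)
{
    int i = 0;
    while (s[i] != '\0' && s[i] == t[i])
    {
        i++;
    }
    // equal only if both strings ended at the same position
    return s[i] == t[i];
}
```

• Two separate arrays that both hold "hello" live at different addresses, so comparing the pointers says they differ, while same_text reports that they hold the same characters.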
copy-0.c
• Let’s take a look at a program that tries, but fails to copy a string:
    #include <cs50.h>
    #include <ctype.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        // get line of text
        printf("Say something: ");
        string s = GetString();
        if (s == NULL)
        {
            return 1;
        }

        // try (and fail) to copy string
        string t = s;

        // change "copy"
        printf("Capitalizing copy...\n");
        if (strlen(t) > 0)
        {
            t[0] = toupper(t[0]);
        }

        // print original and "copy"
        printf("Original: %s\n", s);
        printf("Copy: %s\n", t);
    }
• We check that s isn’t NULL in case the user has given us more characters
than we have memory for. NULL is actually the memory address 0. By
convention, no user data can ever be stored at byte 0, so if a program
tries to access this memory address, it will crash.
• Now that we have the user-provided string in s, we assign the value
of s to t. But if s is just a memory address, say 123, then t is now the
same memory address. Both s and t are pointing to the same chunks of
memory.
• To prove that this program is buggy, we’ll try to capitalize t, but not s.
The output, though, shows that both s and t are capitalized.
• To emphasize that their role is to point to other variables, pointers are
often represented as arrows.
compare-1.c
• Finally, a program that truly compares two strings:
    #include <cs50.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        // get line of text
        printf("Say something: ");
        char* s = GetString();

        // get another line of text
        printf("Say something: ");
        char* t = GetString();

        // try to compare strings
        if (s != NULL && t != NULL)
        {
            if (strcmp(s, t) == 0)
            {
                printf("You typed the same thing!\n");
            }
            else
            {
                printf("You typed different things!\n");
            }
        }
    }
• Now that we know a string is really just a char*, we need to be careful it’s
not NULL.
• strcmp is short for "string compare." It’s a function that comes
in string.h, which, according to the man page, returns 0 if two strings
are identical, a negative number if the first string argument comes
before the second string alphabetically, or a positive number if the first
string argument comes after the second string alphabetically.
copy-1.c
• Copying a string is a little more complicated than just using the
assignment operator:
 1  #include <cs50.h>
 2  #include <ctype.h>
 3  #include <stdio.h>
 4  #include <string.h>
 5
 6  int main(void)
 7  {
 8      // get line of text
 9      printf("Say something: ");
10      char* s = GetString();
11      if (s == NULL)
12      {
13          return 1;
14      }
15
16      // allocate enough space for copy
17      char* t = malloc((strlen(s) + 1) * sizeof(char));
18      if (t == NULL)
19      {
20          return 1;
21      }
22
23      // copy string, including '\0' at end
24      int n = strlen(s);
25      for (int i = 0; i <= n; i++)
26      {
27          t[i] = s[i];
28      }
29
30      // change copy
31      printf("Capitalizing copy...\n");
32      if (strlen(t) > 0)
33      {
34          t[0] = toupper(t[0]);
35      }
36
37      // print original and copy
38      printf("Original: %s\n", s);
39      printf("Copy: %s\n", t);
40
41      // success
42      return 0;
43  }
• In line 17, we’re declaring a pointer t and initializing it with the return
value of a function named malloc. malloc takes a single argument, the
number of bytes of memory requested, and returns the address in
memory of the first of those bytes or NULL if the memory couldn’t be
allocated.
• In this case, we’re allocating enough memory for all the characters
in s plus 1 extra for the null terminator. We multiply this number of
characters by sizeof(char), which gives the size in bytes of a char. The C
standard defines sizeof(char) to be 1, so the multiplication doesn’t change
the result here, but it mirrors what we’d write for larger types like int.
• Once we have enough memory, we iterate through all of the characters
in s and assign them one at a time to t.
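• The allocate-then-copy steps from copy-1.c can be packaged into a reusable function. This is a sketch, not part of the lecture code: the name copy_string is hypothetical, and the caller is assumed to be responsible for calling free on the result.

```c
#include <stdlib.h>
#include <string.h>

// Returns a newly allocated copy of s, or NULL if s is NULL or
// the memory couldn't be allocated. The caller must free the copy.
char* copy_string(const char* s)
{
    if (s == NULL)
    {
        return NULL;
    }

    // room for every character plus the null terminator
    size_t n = strlen(s);
    char* t = malloc(n + 1);
    if (t == NULL)
    {
        return NULL;
    }

    // copy string, including '\0' at end
    for (size_t i = 0; i <= n; i++)
    {
        t[i] = s[i];
    }
    return t;
}
```

• Because the copy lives in its own chunk of memory, changing its first character to a capital letter leaves the original untouched, which is exactly what copy-0.c failed to achieve.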
Teaser
Memory
The Stack
Binky
Stack Overflow
Memory
The Stack
Binky
• We left off last time with some code that put Binky in a tough spot:
 1  int main(void)
 2  {
 3      int* x;
 4      int* y;
 5
 6      x = malloc(sizeof(int));
 7
 8      *x = 42;
 9
10      *y = 13;
11
12      y = x;
13
14      *y = 13;
15  }
• What gets stored in x at line 6? The address of the first byte of memory
allocated by malloc.
• Unfortunately, in line 10, we dereference the pointer y before it has been
initialized. y contains some garbage value that we interpret as a memory
address, so when we try to access it, bad things happen. In the video,
this meant decapitation for Binky. In C programs, this usually means a
segmentation fault.
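• For contrast, here is a version of the same sequence that never dereferences an uninitialized pointer. It is only a sketch: the function name binky_fixed and the choice to return the final value (or -1 if malloc fails) are assumptions made so the result can be inspected.

```c
#include <stdlib.h>

// Same steps as above, minus the dereference of an uninitialized y.
// Returns the int that x ends up pointing at, or -1 if malloc fails.
int binky_fixed(void)
{
    int* x = malloc(sizeof(int));
    if (x == NULL)
    {
        return -1;
    }
    int* y = NULL;   // a known value rather than garbage

    *x = 42;

    y = x;           // y now holds the same address as x

    *y = 13;         // safe: y points at memory we allocated

    int result = *x;
    free(x);
    return result;
}
```

• Since x and y hold the same address after y = x, storing 13 through y also changes what x points at, so the function returns 13.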
Stack Overflow
• You may know this as a popular website, but it actually has a specific
technical meaning. If a programmer forgets to check the boundaries of
an array, he or she leaves the program vulnerable to an attack that can
take over control of the program. Consider the following code:
    #include <string.h>

    void foo(char* bar)
    {
        char c[12];
        memcpy(c, bar, strlen(bar));
    }

    int main(int argc, char* argv[])
    {
        foo(argv[1]);
    }
• For a thorough discussion of this attack, check out the Wikipedia article.
• In short, this program passes the first command-line argument to a
function foo that writes it into an array of size 12. If the first command-
line argument is less than 12 characters long, everything works fine. If
the first command-line argument is greater than 12 characters long,
then it will overwrite memory past the bounds of c. If the first
command-line argument is greater than 12 characters long and actually
contains the address in memory of some malicious code, then it could
potentially overwrite the return address of foo. When foo returns, then,
it will give control of the program over to this malicious code rather
than main.
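• One way to close this hole is to check the length of the input against the size of the array before copying anything. The sketch below is an illustration rather than code from the lecture or the Wikipedia article: the function name copy_bounded and its parameters are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

// Copies source into destination only if it fits, including the
// null terminator. Returns false (copying nothing) otherwise.
bool copy_bounded(char* destination, size_t capacity, const char* source)
{
    if (destination == NULL || source == NULL || capacity == 0)
    {
        return false;
    }

    size_t length = strlen(source);
    if (length >= capacity)
    {
        // no room for the characters plus '\0': refuse to overflow
        return false;
    }

    memcpy(destination, source, length + 1);
    return true;
}
```

• With an array of size 12, a string of up to 11 characters is copied along with its terminator, and anything longer is rejected instead of being written past the bounds of the array.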
• Instead of ending on a scary note, let’s end with a joke.
Last updated 2013-10-10 00:17:08 PDT
Week 5, continued
Andrew Sellergren
Table of Contents
Announcements and Demos
Compiling
Memory
The Stack
The Heap
Valgrind
Structs
Images
Compiling
• What we call "compiling" a program actually consists of four steps:
o pre-processing
o compiling
o assembling
o linking
• Lines of code that begin with #, such as #define and #include are pre-
processor directives. When you write #include <stdio.h>, it instructs the
compiler to fetch the contents of stdio.h and paste them into your
program before it begins translating into 0s and 1s. This occurs during
the pre-processing step.
• Compiling actually involves translating C into assembly language.
Assembling translates assembly language to binary. Finally, linking
combines the 0s and 1s of your program with 0s and 1s of other people’s
code.
• To see what’s going on during the compiling step, let’s run clang -S on
our hello.c program. This creates a file named hello.s written in
assembly language. If you open this up, you’ll see instructions
like pushl and movl that manipulate registers, very small memory
containers. These instructions vary between CPUs.
• The following diagram shows how these four steps of compiling connect
with each other:
Memory
The Stack
The Heap
Valgrind
Structs
• Just like we used typedef to create the string type in the CS50 Library,
you can use it to define your own types:
    #include <cs50.h>

    // structure representing a student
    typedef struct
    {
        int id;
        string name;
        string house;
    }
    student;
• Here we’re defining a variable type named student. Inside of this type,
which is actually a struct, there are three variables representing the ID,
name, and house of the student. To access these variables within
a student, we use dot notation:
 1  #include <cs50.h>
 2  #include <stdio.h>
 3  #include <string.h>
 4
 5  #include "structs.h"
 6
 7  // class size
 8  #define STUDENTS 3
 9
10  int main(void)
11  {
12      // declare class
13      student class[STUDENTS];
14
15      // populate class with user's input
16      for (int i = 0; i < STUDENTS; i++)
17      {
18          printf("Student's ID: ");
19          class[i].id = GetInt();
20
21          printf("Student's name: ");
22          class[i].name = GetString();
23
24          printf("Student's house: ");
25          class[i].house = GetString();
26          printf("\n");
27      }
28
29      // now print anyone in Mather
30      for (int i = 0; i < STUDENTS; i++)
31      {
32          if (strcmp(class[i].house, "Mather") == 0)
33          {
34              printf("%s is in Mather!\n\n", class[i].name);
35          }
36      }
37
38      // free memory
39      for (int i = 0; i < STUDENTS; i++)
40      {
41          free(class[i].name);
42          free(class[i].house);
43      }
44  }
• In line 13, we declare an array of student. We then loop through that
array and populate it with data from the user.
Images
• There are many different file formats used to store images. One such
format is a bitmap, or BMP. A very simple bitmap might use 0 to
represent black and 1 to represent white, so a series of 0s and 1s could
store a black-and-white image.
• More sophisticated file formats like JPEG store 0s and 1s for the image
itself but also metadata. In Problem Set 5, you’ll use this fact to detect
JPEGs that have been lost on David’s memory card.
Last updated 2013-10-12 23:33:02 PDT
Week 7
Andrew Sellergren
Table of Contents
Announcements and Demos
User Input
scanf-0.c
scanf-1.c
scanf-2.c
Structs
structs.h
structs-0.c
structs-1.c
Storage
Hard Drives
Floppy Disks
Linked Lists
User Input
• sscanf is what the CS50 Library uses to get input from the user in
functions like GetString.
scanf-0.c
• Take a look at a simple example of using scanf, which is quite similar
to sscanf:
    #include <stdio.h>

    int main(void)
    {
        int x;
        printf("Number please: ");
        scanf("%i", &x);
        printf("Thanks for the %i!\n", x);
    }
• The first argument to scanf resembles an argument we might pass
to printf (The "f" in both denotes "formatted"). The second argument is
the address of x, thus empowering scanf to actually modify the memory
in which x is stored.
• This program behaves as expected if the user provides a number as
input. However, if the user provides a string or any other non-numeric
input, the program behaves strangely. One of the things the CS50
Library provides is some error checking so that if the user provides bad
input, he or she will be prompted to retry.
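• The kind of error checking mentioned above can be sketched with sscanf, which works like scanf but reads from a string instead of the keyboard. Both functions return the number of items they successfully stored, so a return value other than the expected count signals bad input. The function name parse_int is hypothetical, and this is not presented as the CS50 Library's actual code.

```c
#include <stdbool.h>
#include <stdio.h>

// Tries to read exactly one integer from text. Returns true and
// stores it in *value on success; returns false on bad input.
bool parse_int(const char* text, int* value)
{
    int parsed;
    char extra;

    // "%i" reads an integer; " %c" then looks for any leftover
    // non-whitespace character. Exactly 1 item stored means we
    // got an integer and nothing else.
    if (sscanf(text, "%i %c", &parsed, &extra) != 1)
    {
        return false;
    }

    *value = parsed;
    return true;
}
```

• With this check, "50" is accepted, while "hello" and "50 doors" are both rejected, at which point a program could prompt the user to retry.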
scanf-1.c
• scanf-1.c closely resembles scanf-0.c, but introduces one major bug:
    #include <stdio.h>

    int main(void)
    {
        char* buffer;
        printf("String please: ");
        scanf("%s", buffer);
        printf("Thanks for the %s!\n", buffer);
    }
• A buffer is just a generic name for a chunk of memory, a place to store
information.
• The problem here is that buffer is uninitialized. We didn’t ask the
operating system for a chunk of memory in which to store the string the
user gives us. If we run this program, it will probably crash with a
segmentation fault.
scanf-2.c
• One solution to the bug in scanf-1.c would be to allocate memory
for buffer on the stack, as we do in scanf-2.c:
    #include <stdio.h>

    int main(void)
    {
        char buffer[16];
        printf("String please: ");
        scanf("%s", buffer);
        printf("Thanks for the %s!\n", buffer);
    }
• Here, you can see that scanf treats the array buffer as a memory address.
We know that the address is for a chunk of memory of size 16 bytes.
• In what scenario might this program also be buggy? If the user provides
a string longer than 15 characters (not 16 because we need at least one
character for the null terminator), the program may crash with a
segmentation fault.
• How do we know in advance how much memory to request for user
input? We don’t! The CS50 Library has some logic that reads user input
one character at a time with scanf and requests more memory whenever
it runs out.
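• The idea of reading one character at a time and requesting more memory whenever the buffer fills up can be sketched as follows. This is an illustration under stated assumptions, not the CS50 Library's implementation: the function name read_line is hypothetical, it uses fgetc rather than scanf to fetch each character, and it doubles the buffer with realloc each time it runs out of room.

```c
#include <stdio.h>
#include <stdlib.h>

// Reads one line from stream into a buffer that grows as needed.
// Returns a malloc'd string without the newline, or NULL if there
// is nothing left to read or memory runs out. Caller must free it.
char* read_line(FILE* stream)
{
    size_t capacity = 16;
    size_t length = 0;
    char* buffer = malloc(capacity);
    if (buffer == NULL)
    {
        return NULL;
    }

    int c;
    while ((c = fgetc(stream)) != EOF && c != '\n')
    {
        // always keep one byte free for the null terminator
        if (length + 1 == capacity)
        {
            char* bigger = realloc(buffer, capacity * 2);
            if (bigger == NULL)
            {
                free(buffer);
                return NULL;
            }
            buffer = bigger;
            capacity *= 2;
        }
        buffer[length++] = (char) c;
    }

    if (c == EOF && length == 0)
    {
        free(buffer);
        return NULL;
    }

    buffer[length] = '\0';
    return buffer;
}
```

• Unlike the fixed char buffer[16] in scanf-2.c, this approach has no 15-character limit: a longer line simply triggers another realloc.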
Structs
• Let’s revisit the problem of storing information about a number of
students. We might start off just declaring a few variables like so:
    #include <cs50.h>
    #include <stdio.h>

    int main(void)
    {
        string name = GetString();
        string house = GetString();
    }
• What if we want to store another student's information? Well, I guess
we need some more variables.
Storage
Hard Drives
• Hard drives that aren’t SSDs (solid-state drives with no moving parts)
consist of circular metal platters and magnetic heads that read and write
bits on them. The 0s and 1s of files are stored by magnetic particles that
are flipped with either their north or their south poles sticking up.
• Somewhere on the hard drive there exists a table that maps filenames to
their memory addresses. As you can with RAM, you can number all of
the bytes of a hard drive so that each has a memory address. When you
delete a file, say by dragging it to the trash can or even by emptying the
trash can, the contents of the file may not actually be deleted. Rather,
the file’s entry in the location table is simply erased so that the operating
system forgets where the file was stored. Not until the 0s and 1s of the
file are actually overwritten will the file’s contents truly be gone. In the
meantime, the file can be recovered by software like Norton or by a
program like the one you’ll write for Problem Set 5. Having been
provided with the raw bytes of an SD card, you’ll be tasked with
searching through them to look for the particular pattern of bits that
identifies the start of a JPEG file.
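• Searching raw bytes for the start of a JPEG might be sketched like this. The notes don't spell out the pattern, so the signature used here is an assumption: JPEG files conventionally begin with the three bytes 0xff 0xd8 0xff, and the problem set's own specification may ask for a stricter check. The function names starts_like_jpeg and find_jpeg_start are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

// Do these bytes begin with the assumed JPEG signature?
bool starts_like_jpeg(const unsigned char* bytes, size_t count)
{
    return count >= 3
        && bytes[0] == 0xff
        && bytes[1] == 0xd8
        && bytes[2] == 0xff;
}

// Returns the offset of the first position that looks like the
// start of a JPEG, or count if no such position exists.
size_t find_jpeg_start(const unsigned char* bytes, size_t count)
{
    for (size_t i = 0; i + 3 <= count; i++)
    {
        if (starts_like_jpeg(bytes + i, count - i))
        {
            return i;
        }
    }
    return count;
}
```

• This works even for "deleted" files because, as described above, deleting only erases the entry in the location table; the bytes themselves remain until they are overwritten.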
Floppy Disks
• Back in David’s day [1], another type of storage called floppy disks was
popular. Functionally, these are very similar to hard drives in that inside
their plastic casing, there is a circular magnetic platter. You can get your
hands on it just by ripping off the metal tab. Be careful, there’s a spring
in there!
• These days, the size of hard drives is measured in terabytes. A so-called
"high-density" floppy disk can only store 1.44 megabytes, or roughly 1
millionth of a terabyte.
Linked Lists
• Arrays are useful because they enable the storage of similar variables in
contiguous memory. One downside of arrays is that they have a fixed
size. Another downside is that there’s no easy way to insert something in
the middle of an array. To do so, we would have to allocate memory for a
copy of the array and then shift all the elements to the right.
• To solve the problem of fixed size, we’ll relax the constraint that the
memory we use be contiguous. We can take a little bit of memory from
here and a little bit of memory from there just so long as we can connect
them together. This new data structure is called a linked list:
• Each element of a linked list contains not only the data we want to store,
but also a pointer to the next element. The final element in the list has
the NULL pointer.
• To implement a linked list, we’ll borrow some of the syntax we used for
structs:
    typedef struct node
    {
        int n;
        struct node *next;
    }
    node;
• Pictorially, next is the bottom box that points to the next element of the
linked list. Why do we have to declare it as a struct node* then? The
compiler doesn’t yet know what a node is, so we have to call it a struct
node in the meantime.
• There are a few linked list operations that will be of interest to us:
o insert
o delete
o search
o traverse
• The search operation is actually pretty easy to implement:
    bool search(int n, node* list)
    {
        node* ptr = list;
        while (ptr != NULL)
        {
            if (ptr->n == n)
            {
                return true;
            }
            ptr = ptr->next;
        }
        return false;
    }
• search takes two arguments, the number to be searched for and a pointer
to the first node in the linked list. We then declare a pointer ptr that
we’ll use to walk through the list. Since ptr is a pointer to a struct, we
use the arrow syntax (->) to access the elements within the struct. To
advance to the next node in the linked list, we assign ptr->next to ptr.
More on this on Wednesday!
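• To try search out, we need an actual list to walk through. The sketch below builds one by hand and adds a traversal that counts nodes and a function that frees them. The helper names make_node, length, and destroy are hypothetical and are not part of the lecture code; search is the function shown above.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct node
{
    int n;
    struct node* next;
}
node;

// Allocates a node storing n that points at next (NULL on failure).
node* make_node(int n, node* next)
{
    node* new_node = malloc(sizeof(node));
    if (new_node != NULL)
    {
        new_node->n = n;
        new_node->next = next;
    }
    return new_node;
}

// Search: walk the list until we find n or reach the NULL pointer.
bool search(int n, node* list)
{
    node* ptr = list;
    while (ptr != NULL)
    {
        if (ptr->n == n)
        {
            return true;
        }
        ptr = ptr->next;
    }
    return false;
}

// Traverse: visit every node, counting as we go.
int length(node* list)
{
    int count = 0;
    for (node* ptr = list; ptr != NULL; ptr = ptr->next)
    {
        count++;
    }
    return count;
}

// Frees every node, saving each next pointer before freeing.
void destroy(node* list)
{
    while (list != NULL)
    {
        node* next = list->next;
        free(list);
        list = next;
    }
}
```

• Note that destroy copies list->next into a local variable before calling free, since a node's contents must not be read after it has been freed.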
Week 7, continued
Andrew Sellergren
Table of Contents
Linked Lists
list-0.c
Search
Insertion
Hash Tables
Linear Probing
Separate Chaining
Tries
Teaser
Linked Lists
• Arrays are of fixed size, which is both an advantage and a disadvantage.
It’s an advantage because it means you know exactly how much space
you’ll be using, but it’s a disadvantage because it means you have to
allocate an entirely new array if you need more space. Arrays are stored
as a single chunk of memory, which means that we have random
access to all of their elements.
• We introduced the linked list as a data structure of expandable size:
Search
Insertion
• Insertion into a linked list requires handling three different cases: the
beginning, middle, and end of the list. In each case, we need to be
careful in how we update the node pointers lest we end up orphaning
part of the list.
• To visualize insertion, we’ll bring 6 volunteers onstage. 5 of these
volunteers will represent the numbers 9, 17, 22, 26, and 34 that are in
our linked list and 1 volunteer will represent the first pointer.
• Now, we’ll request memory for a new node, bringing one more volunteer
onstage. We’ll give him the number 5, which means that he belongs at
the beginning of the list. If we begin by pointing first at this new node,
then we forget where the rest of the list is. Instead, we should begin by
pointing the new node’s next pointer at the first node of the list. Then we
update first to point to the new node.
• Again, we’ll request memory for a new node, bringing another volunteer
onstage and assigning her the number 55. She belongs at the end of the
list. To confirm this, we traverse the list by updating ptr to the value
of next for each node. In each case, we see that 55 is greater than ptr->n,
so we advance to the next node. However, ultimately, we end up
with ptr equal to NULL because 55 is greater than all of the numbers in the
list. We don’t have a pointer, then, to the last node in the list, which
means we can’t update it. To prevent this, we need to keep track of the
node one to the left of ptr. We’ll store this in a variable called predptr in
our sample code. When we reach the end of the list, predptr will point to
the last node in the list and we can update its next value to point to our
new node.
• Another solution to this problem of keeping track of the previous node is
to implement a doubly linked list. In a doubly linked list, each node has
a next pointer to point to the next node and a prev pointer to point to the
previous node.
• Once more, we’ll request memory for a new node, assigning the value 20
to our last volunteer. This time when we traverse the list, our predptr is
pointing to the 17 node and our ptr is pointing to the 22 node when we
find that ptr->n is greater than 20. To insert 20 into the list, we point
the next pointer of predptr to our new node and the next pointer of our
new node to ptr.
• Linked lists are yet another example that design is very much subjective.
They are not universally better than arrays, but they may be more useful
than arrays in certain contexts. Likewise, arrays may be more useful
than linked lists in certain contexts.
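• The three insertion cases walked through onstage can be sketched in one function. This is only a sketch, assuming a list kept in ascending order and a caller that passes the address of its first pointer; the names insert and newptr are our own, while predptr and ptr follow the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct node
{
    int n;
    struct node *next;
}
node;

// Inserts n into the sorted list whose first pointer is *first.
// Returns false if memory for the new node could not be allocated.
bool insert(node **first, int n)
{
    assert(first != NULL);

    // request memory for a new node
    node *newptr = malloc(sizeof(node));
    if (newptr == NULL)
    {
        return false;
    }
    newptr->n = n;

    // walk the list, keeping predptr one node to the left of ptr
    node *predptr = NULL;
    node *ptr = *first;
    while (ptr != NULL && ptr->n < n)
    {
        predptr = ptr;
        ptr = ptr->next;
    }

    // point the new node at the rest of the list first,
    // so that no part of the list is orphaned
    newptr->next = ptr;
    if (predptr == NULL)
    {
        // beginning of the list (or an empty list)
        *first = newptr;
    }
    else
    {
        // middle or end of the list
        predptr->next = newptr;
    }
    return true;
}
```

• Starting from node *first = NULL, calling insert(&first, 9), then 17, 22, 26, 34, 5, 55, and 20 should leave the list in the order 5, 9, 17, 20, 22, 26, 34, 55.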
Hash Tables
• The holy grail of running time is O(1), i.e. constant time. We’ve already
seen that arrays afford us constant-time lookup, so let’s return to this
data structure and use it to store a list of names. Let’s assume that our
array is of size 26, so we can store a name in the location corresponding
to its first letter. In doing so, we also achieve constant time for insertion
since we can access location i in the array in 1 step. If we want to insert
the name Alice, we index to location 0 and write it there.
• This data structure is called a hash table. The process of getting the
storage location of an element is called hashing and the function that
does so is called a hash function. In this case, the hash function simply
takes the first letter of the name and converts it to a number.
Linear Probing
• What problems might arise with this hash table? If we want to insert the
name Aaron, we find that location 0 is already filled. We could take the
approach of inserting Aaron into the next empty location, but then our
running time deteriorates to linear because in the worst case, we may
have to iterate through all n locations in the array to insert or search for
a name. This approach is appropriately named linear probing.
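• Linear probing might look something like the following. Treat it as a sketch under a few assumptions of our own: the table is an array of 26 string pointers that starts out all NULL, the hash ignores case, and insert returns the index it ended up using.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define SIZE 26

// hashes on the first letter, ignoring case; non-letters fall back to 0
int hash(const char *s)
{
    assert(s != NULL);
    if (!isalpha((unsigned char) s[0]))
    {
        return 0;
    }
    return toupper((unsigned char) s[0]) - 'A';
}

// Stores name in table, probing forward from its hashed location.
// Returns the index used, or -1 if every location is already filled.
int insert(const char *table[SIZE], const char *name)
{
    assert(table != NULL);
    int start = hash(name);
    for (int i = 0; i < SIZE; i++)
    {
        // wrap around to location 0 after the last location
        int index = (start + i) % SIZE;
        if (table[index] == NULL)
        {
            table[index] = name;
            return index;
        }
    }
    return -1;
}
```

• With an empty table, inserting Alice uses location 0 and inserting Aaron then lands in location 1. Notice the knock-on effect: Bob, who hashes to 1, now has to settle for location 2.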
Separate Chaining
• When two elements have the same hash, there is said to be a collision in
the hash table. Linear probing was our first approach to handling
collisions. Another approach is separate chaining. In separate
chaining, each location in the hash table stores a pointer to the first
node of a linked list. When a new element needs to be stored at a
location, it is simply added to the beginning of the linked list.
• Why worry at all about collisions? How likely is it really that they will
happen? It turns out the probability of collisions is actually quite high.
We can phrase this question in a slightly different way that we’ll call the
Birthday Problem:
In a room of n CS50 students, what’s the probability that at least 2
students have the same birthday?
• To answer this question, we’ll consider the opposite: what’s the
probability that no 2 students have the same birthday. If there’s only 1
student in the room, then the probability that no 2 students have the
same birthday is 1. If there are 2 students in the room, then there are
364 possible birthdays out of 365 which the second student could have
that would be different from the first student’s. Thus, the probability
that no 2 students have the same birthday in a room of 2 is 364 ⁄ 365.
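If a third student enters, 363 of the 365 possible birthdays differ from
the first two, which contributes another factor of 363 ⁄ 365. And so on.
To get the probability that no 2 of n students share a birthday, we
multiply all of these factors together: 365 ⁄ 365 × 364 ⁄ 365 × 363 ⁄ 365
and so on, one factor per student. You can see this math here, courtesy of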
Wikipedia:
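• We can also let the computer do the multiplication. The function below is a sketch that assumes 365 equally likely birthdays and ignores leap years; the name shared_birthday is our own.

```c
#include <assert.h>

// Probability that at least 2 of n students share a birthday,
// assuming 365 equally likely birthdays (an idealization).
double shared_birthday(int n)
{
    assert(n >= 0);

    // with more students than days, a shared birthday is certain
    if (n > 365)
    {
        return 1.0;
    }

    // multiply the probabilities that each successive student
    // avoids every birthday already taken
    double none_shared = 1.0;
    for (int i = 0; i < n; i++)
    {
        none_shared *= (365.0 - i) / 365.0;
    }

    // the answer is the opposite of "no 2 students share a birthday"
    return 1.0 - none_shared;
}
```

• Under these assumptions, the probability first exceeds one half at n = 23, far fewer students than most people guess.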
Tries
• One last data structure we’ll discuss is a trie. The word “trie” comes
from the word “retrieval,” but is usually pronounced like “try.” For our
purposes, the nodes in a trie are arrays. We might use a trie to store a
dictionary of names of famous scientists, as this diagram suggests:
• In this trie, each index in the array stands for a letter of the alphabet.
Each of those indices also points to another array of letters. The ∆
symbol denotes the end of a name. We have to keep track of where
words end so that if one word actually contains another word (e.g.
Mendeleev and Mendel), we know that both words exist. In code, the ∆
symbol could be a Boolean flag in each node:
• typedef struct node
• {
• bool word;
• struct node* children[27];
• }
• node;
• One advantage of a trie is that insertion and search times are unaffected
by the number of elements already stored. If there are n elements stored
in the trie and you want to insert the value Alice, it still takes just 5
steps, one for each letter. This runtime we might express as O(k),
where k is the length of the longest possible word. But k is a constant, so
we’re actually just talking about O(1), or constant-time insertion and
lookup.
• Although it may seem like a trie is the holy grail of data structures, it
may not perform better than a hash table in certain contexts. Choosing
between a hash table and a trie is one of many design decisions you’ll
have to make for Problem Set 6.
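• To make the O(k) claim concrete, here is one way insertion and search might be written for the node type above. It is a sketch with assumptions of our own: letters map to indices 0 through 25 regardless of case, index 26 is reserved for the apostrophe, and the root node is allocated with calloc so that it starts out empty. The helper name slot is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct node
{
    bool word;
    struct node *children[27];
}
node;

// maps a letter to 0-25 and an apostrophe to 26 (our assumption);
// returns -1 for any other character
static int slot(char c)
{
    if (isalpha((unsigned char) c))
    {
        return tolower((unsigned char) c) - 'a';
    }
    return (c == '\'') ? 26 : -1;
}

// takes one step per character of s, no matter how many words are stored
bool insert(node *root, const char *s)
{
    assert(root != NULL && s != NULL);
    node *ptr = root;
    for (int i = 0; s[i] != '\0'; i++)
    {
        int index = slot(s[i]);
        if (index < 0)
        {
            return false;
        }
        if (ptr->children[index] == NULL)
        {
            // calloc zeroes the new node: no children, word is false
            ptr->children[index] = calloc(1, sizeof(node));
            if (ptr->children[index] == NULL)
            {
                return false;
            }
        }
        ptr = ptr->children[index];
    }

    // mark the end of a word, like the ∆ symbol in the diagram
    ptr->word = true;
    return true;
}

bool search(const node *root, const char *s)
{
    assert(root != NULL && s != NULL);
    const node *ptr = root;
    for (int i = 0; s[i] != '\0'; i++)
    {
        int index = slot(s[i]);
        if (index < 0 || ptr->children[index] == NULL)
        {
            return false;
        }
        ptr = ptr->children[index];
    }
    return ptr->word;
}
```

• After inserting only Mendel, a search for Mendeleev fails because the path runs out, and a search for Mend fails because the word flag is false at that node.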
Teaser
• Before long, we’ll transition to talking about web development,
including HTML, PHP, and JavaScript. As a brief teaser, enjoy this
trailer to Warriors of the Net.
Last updated 2013-10-26 21:50:16 PDT
Week 8
Andrew Sellergren
Table of Contents
Announcements and Demos
Hash Tables
Tries
Stacks
Queues
Trees
Binary Search Trees
Teaser
Hash Tables
• We also discussed hash tables, which associate keys with values. The
keys are determined by taking a deterministic hash of the values using a
hash function. In our first example, this hash function simply took the
first letter of the name that we wanted to store, 0 for Alice, 1 for Bob,
and so on. In code, this might look like:
• int hash(char* s)
• {
• return s[0] - 'A';
• }
• This function is a little buggy, of course, because we’re not checking
if s is NULL and we’re not accounting for lowercase strings. But you get
the idea.
• Hash tables inevitably have collisions in which two values have the same
hash. Linear probing was the first approach we looked at for handling
collisions. More compelling, however, was the second approach in which
our hash table consisted of pointers to linked lists. If a second value
needed to be inserted at a key, we simply add it to the beginning of the
linked list. By inserting at the beginning of the list, we maintain O(1)
insertion time.
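• Putting the pieces together, separate chaining might be sketched as follows. The hash function here checks for NULL and ignores case, which addresses the two bugs noted above; falling back to 0 for anything that is not a letter is our own assumption, as are the names insert and lookup.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define SIZE 26

typedef struct node
{
    const char *name;
    struct node *next;
}
node;

// first letter of s as 0-25, ignoring case; 0 for NULL or non-letters
int hash(const char *s)
{
    if (s == NULL || !isalpha((unsigned char) s[0]))
    {
        return 0;
    }
    return toupper((unsigned char) s[0]) - 'A';
}

// adds name to the beginning of its chain, which takes O(1) time
bool insert(node *table[SIZE], const char *name)
{
    assert(table != NULL && name != NULL);
    node *newptr = malloc(sizeof(node));
    if (newptr == NULL)
    {
        return false;
    }
    int index = hash(name);
    newptr->name = name;
    newptr->next = table[index];
    table[index] = newptr;
    return true;
}

// walks only the chain that name hashes to
bool lookup(node *table[SIZE], const char *name)
{
    assert(table != NULL && name != NULL);
    for (node *ptr = table[hash(name)]; ptr != NULL; ptr = ptr->next)
    {
        if (strcmp(ptr->name, name) == 0)
        {
            return true;
        }
    }
    return false;
}
```

• Insertion stays constant-time, but lookup can still take as many steps as the longest chain, so the quality of the hash function matters.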
Tries
• The final data structure we examined was a trie. Both insertion time and
search time for a trie are O(k), where k is the length of the word being
inserted or searched for. But k is really a constant since words have a
finite length, so insertion time and search time are actually O(1).
• A trie is a tree structure consisting of arrays of arrays. Each index in
each array stores a pointer to another array. Considering how many
arrays we’re actually storing, then, the tradeoff for a trie implementation
is clearly the memory it requires. Yet again, we see that there is a
tradeoff between memory and running time, between space and speed.
Stacks
• We’ve already seen that part of a program’s memory is called the stack because
of the way in which function frames are layered on top of each other.
More generally, a stack is a data structure that has its own advantages
and disadvantages compared to arrays, linked lists, hash tables, and
tries.
• We interact with stacks using only two basic operations: push and pop.
To add data to the stack, we push it onto the top of the stack. To retrieve
data from the stack, we pop it off the top of the stack. As a result, a stack
exhibits last in first out (LIFO) storage. The only data that we can
retrieve from the stack is the last data we added to it.
• In what contexts might LIFO storage be useful? Clearly it’s useful for
organizing a program’s memory. As we’ll see soon, it’s also useful for
validating the tree structure of a web page’s HTML.
• We might implement a stack like so:
• typedef struct
• {
• int trays[CAPACITY];
• int size;
• }
• stack;
• It’s convenient to think of a stack like the stack of trays in the dining
halls. In the code above, CAPACITY is a constant defining the maximum
number of such trays that a stack can contain. Another integer
named size stores the number of trays currently in the stack.
• Let’s say CAPACITY is 3. Initially, trays contains nothing but garbage
values and size is 0. Let’s say we push the number 9 onto the stack.
Now size becomes 1 and the 0th index of trays is set to 9. Next, we push
17 onto the stack, size becomes 2 and the 1st index of trays is set to 17.
Finally, we push 22 onto the stack, size becomes 3 and the 2nd index
of trays is set to 22. What happens when we try to push 27 onto the
stack? We can’t add it to the stack because we have filled all available
indices in trays. If our push function returned a Boolean, we would want
it to return false in this case.
• One way to implement a stack with dynamic size would be to
declare trays as a pointer and malloc it at runtime. Another way would
be to declare trays as a pointer to a linked list.
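• The push and pop operations for the fixed-size version might be sketched like this. We assume CAPACITY is 3, as in the walkthrough, and that the caller sets size to 0 before the first push; having pop hand back its value through a pointer is our own design choice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CAPACITY 3

typedef struct
{
    int trays[CAPACITY];
    int size;
}
stack;

// adds n to the top of the stack; returns false if the stack is full
bool push(stack *s, int n)
{
    assert(s != NULL);
    if (s->size == CAPACITY)
    {
        return false;
    }
    s->trays[s->size] = n;
    s->size++;
    return true;
}

// removes the top of the stack into *n; returns false if the stack is empty
bool pop(stack *s, int *n)
{
    assert(s != NULL && n != NULL);
    if (s->size == 0)
    {
        return false;
    }
    s->size--;
    *n = s->trays[s->size];
    return true;
}
```

• Pushing 9, 17, and 22 succeeds, pushing 27 returns false, and popping then yields 22, 17, and 9, in that order.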
Queues
• If you’re familiar with the lines that form outside the Apple store when a
new iPhone is released, then you’re familiar with queues. We also
interact with queues using two basic operations: enqueue and dequeue.
Whereas stacks exhibit LIFO storage, however, queues exhibit FIFO, i.e.
first in first out, storage. Imagine how upset the people outside the
Apple store would be if the line were implemented as a stack instead of a
queue!
• We can implement a queue using a struct:
• typedef struct
• {
• int numbers[CAPACITY];
• int front;
• int size;
• }
• queue;
• Note that our queue type is very similar to our stack type. Why do we
need the extra int for queue? front keeps track of the index of the next
value to be dequeued. If we add 9, 17, and 22 as we did to the stack and
then remove 9, we need to know that 17 should be the next value
dequeued. front is incremented whenever a value is dequeued, at least
until we reach CAPACITY.
• What happens when we have one value in the last index of numbers and
we want to enqueue another number? We can use an operator called
modulus (%) to wrap around to index 0 of numbers. Our one value is at
index 2, so if we insert at 3 modulo CAPACITY, we’ll be inserting at index
0.
• Using this approach, insertion into a queue runs in constant time.
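• Here is one way enqueue and dequeue might use the modulus operator. It is a sketch that assumes CAPACITY is 3 and that front and size both start at 0; wrapping front itself back to 0 is our own choice for handling what happens once it reaches CAPACITY.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CAPACITY 3

typedef struct
{
    int numbers[CAPACITY];
    int front;
    int size;
}
queue;

// adds n to the back of the queue; returns false if the queue is full
bool enqueue(queue *q, int n)
{
    assert(q != NULL);
    if (q->size == CAPACITY)
    {
        return false;
    }

    // the back is size places past front, wrapping around to index 0
    q->numbers[(q->front + q->size) % CAPACITY] = n;
    q->size++;
    return true;
}

// removes the front of the queue into *n; returns false if the queue is empty
bool dequeue(queue *q, int *n)
{
    assert(q != NULL && n != NULL);
    if (q->size == 0)
    {
        return false;
    }
    *n = q->numbers[q->front];
    q->front = (q->front + 1) % CAPACITY;
    q->size--;
    return true;
}
```

• After enqueuing 9, 17, and 22 and dequeuing 9, front is 1 and size is 2, so enqueuing 27 writes to index (1 + 2) % 3, which is 0.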
Trees
• A trie is actually a specific type of a data structure called a tree:
Teaser
• Soon we’ll start working in HTML, a markup language that allows you to
specify what a web page should look like, and JavaScript, a
programming language that allows you to execute logic within a
browser. Our first web page will be implemented like so:
• <!DOCTYPE html>
•
• <html>
• <head>
• <title>hello, world</title>
• </head>
• <body>
• hello, world
• </body>
• </html>
• If you get really clever, you may be able to implement a page
like Rob’s or even Hamster Dance.
Last updated 2013-10-30 20:49:09 PDT
Week 8, continued
Andrew Sellergren
Table of Contents
Announcements and Demos
The Internet
HTTP
DNS
TCP/IP
• How does the internet actually work? When you type facebook.com into
your browser (Chrome, Internet Explorer, Firefox, etc.), the browser
makes an HTTP request. HTTP, which stands for hypertext transfer
protocol, defines the language that the browser and web server speak to
each other. Think of a web server exactly like a server at a restaurant:
when you make a request of him, he brings it to you. In the context of
the internet, the server is bringing you a web page written in HTML.
More on HTML later.
• HTTP is a protocol for browsers and servers to talk to each other.
Humans, too, have protocols for talking to each other. Consider that
when you meet someone, you often greet him or her with a handshake.
Browsers and servers also greet and acknowledge each other according
to HTTP.
• Servers do a lot more than just serve web pages. To accommodate
different types of requests, servers use different ports. The default port
number for HTTP requests is 80. Navigating to facebook.com is
identical to navigating to facebook.com:80 because the 80 is implied.
• To see HTTP in action, we can fire up a command-line program
named telnet. We open up a terminal window and type telnet
www.facebook.com 80. This presents us with a prompt like so:
• Trying 31.13.69.32...
• Connected to star.c10r.facebook.com.
Escape character is '^]'.
• 31.13.69.32 is an IP address. IP stands for internet protocol. An IP
address is a unique (more or less) identifier for a computer on the
internet. An IP address is to a computer what a mailing address is to a
house. In this case, the IP address corresponds to one of Facebook’s
servers.
• Now we type the following:
• GET / HTTP/1.1
Host: www.facebook.com
• In turn, we’ll get a response from the server like this:
• HTTP/1.1 302 Found
• Location: http://www.facebook.com/unsupportedbrowser
• Content-Type: text/html; charset=utf-8
• X-FB-Debug: OigNZFku4U2xO68YDYkoMQs95BMNbmwwMqYVgo0yGx8=
• Date: Wed, 30 Oct 2013 17:20:59 GMT
• Connection: keep-alive
Content-Length: 0
• Facebook doesn’t recognize our request as coming from a supported browser, so
it’s redirecting us to a site to tell us that. We can actually trick Facebook
into thinking that we’re coming from a normal browser like Chrome by
adding a line to our HTTP request:
• GET / HTTP/1.1
• Host: www.facebook.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_5)
AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.101
Safari/537.36
• Note that these user agent strings tell websites a lot about your
computer. In this case, it tells Facebook that we’re using an Intel-based
Mac running OS X 10.8.5 and version 30.0.1599.101 of Chrome.
• With this user agent string added to our HTTP request, we get a normal
response back from the server. The server actually still redirects us, this
time to the more secure HTTPS version of the site.
• Why do we write GET /? We’re requesting the root of the website. The
root of the site is denoted with a slash just as the root of a hard drive is.
• If we switch gears and make an HTTP request to www.mit.edu, we get
back an HTTP response that starts with HTTP/1.1 200 OK and actually
contains the HTML that makes up their homepage. 200 is the "all is
well" HTTP status code. You can see this same HTML if you go to View
Source within your browser.
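• What we typed into telnet is just text, so a program can assemble the same request. The function below is a sketch whose name and parameters are our own: it builds the two lines we typed, ends each with a carriage return and line feed, and adds the blank line that tells the server the request is complete.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

// Writes a minimal HTTP/1.1 GET request for path on host into buffer.
// Returns the request's length, or -1 if buffer is too small.
int build_request(char *buffer, size_t size, const char *host, const char *path)
{
    assert(buffer != NULL && host != NULL && path != NULL);

    // request line, Host header, then a blank line to end the headers
    int length = snprintf(buffer, size, "GET %s HTTP/1.1\r\nHost: %s\r\n\r\n",
                          path, host);
    if (length < 0 || (size_t) length >= size)
    {
        return -1;
    }
    return length;
}
```

• Calling build_request(buffer, sizeof(buffer), "www.mit.edu", "/") fills buffer with a request for the root of MIT’s site. Actually sending it would still require opening a connection to port 80, which this sketch leaves out.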
DNS
• How does your browser know the IP address of MIT’s web server? There
are special servers called DNS, or domain name system, servers whose
job it is to translate hostnames like www.mit.edu into IP addresses.
TCP/IP
• TCP/IP is the protocol that defines how information travels through the
internet. Information travels from source to destination via
several routers in between. Routers are other servers that simply take in
bytes and direct them elsewhere. We can see which routers our
information passes through using a command-line program
named traceroute. Each of the lines in the output represents a router
that our request went through. Lines that are just three asterisks
represent routers that ignore this type of request, so we don’t know
where they are. On the right side, there are three time values which
represent three measurements of the number of milliseconds it took to
reach this router.
• Typically, information requires fewer than 30 hops between routers to
get to its destination. If we run traceroute www.mit.edu, we see that the
first few hops are Harvard’s routers, but by step 6, we’re in New York
City. After that, the hops are obscured.
• If we run traceroute www.stanford.edu, we see that we can get from
Boston to Washington DC in ~7 steps and less than 15 milliseconds!
After that, we jump to Houston and LAX and finally to Stanford, all in
under 90 milliseconds or so.
• There are other machines besides routers that sit between your
computer and the information you’re requesting. For example, that
information may live behind a machine that restricts access known as
a firewall. You can think of an HTTP request as an envelope with the
return address being your IP and the to address being the IP of the web
server. A very simple firewall might reject requests solely based on the
IPs they came from, i.e. the return addresses written on the envelopes. A
more advanced firewall might reject e-mails but not web requests,
discriminating using the port number of the request.
• For the sake of efficiency and reliability, HTTP requests are broken up
into chunks of information called packets. These packets don’t have to
follow the same path to reach their destination and some of them never
will because they are dropped by routers in between, whether
intentionally or unintentionally.
• For a more in-depth look at TCP/IP and the internet, check out Warriors
of the Net.
Last updated 2013-11-01 18:55:47 PDT
Week 9
Andrew Sellergren
Table of Contents
Announcements and Demos
PHP
More HTML
Implementing Google
Frosh IMs
froshims-0.php
conditions-1.php
register-0.php
register-3.php
Model-view-controller
Teaser
More HTML
Implementing Google
• When you search for something on Google, the URL changes from
google.com to something with a lot of information after the /. This
information is actually a series of parameters that Google uses to create
its search results for you. One such parameter is your search query. If
you navigate to http://www.google.com/search?q=cats, you’ll notice
that the search term "cats" is already filled in for you. q is a key meaning
"query" and its value, "cats," is specified after the =.
• Let’s implement Google! First, we need an input box for a user’s query:
• <!DOCTYPE html>
•
• <html>
• <head>
• <title>CS50 Search</title>
• </head>
• <body>
• <h1>CS50 Search</h1>
• <form>
• <input type="text"/>
• <input type="submit"/>
• </form>
• </body>
• </html>
• With this <form> tag and a few <input> tags, we have a very simple search
engine…that doesn’t do anything. We need to specify an action for
the <form> tag:
• <!DOCTYPE html>
•
• <html>
• <head>
• <title>CS50 Search</title>
• </head>
• <body>
• <h1>CS50 Search</h1>
• <form action="https://www.google.com/search" method="get">
• <input name="q" type="text"/>
• <br/>
• <input type="submit" value="CS50 Search"/>
• </form>
• </body>
• </html>
• Now we’re telling the form to submit its information directly to Google
using the GET method. There are two methods for submitting form
information, GET and POST. For now, just know that GET means the
information is appended to the URL.
• We’ve also added a name attribute to the text input to match the URL
parameter we saw that Google was using. We changed the text that the
submit button displays to "CS50 Search" using its value attribute.
Finally, we added a line break using the <br/> tag between the two
inputs.
• When we type "cats" and click "CS50 Search," we end up
on http://www.google.com/search?q=cats! We’ve implemented Google!
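• To see what "appended to the URL" means, here is a rough sketch in C of what the browser does with a single text input when the form uses GET. The function name is hypothetical, and the encoding is simplified: letters, digits, and a few punctuation marks pass through, a space becomes +, and any other byte becomes % followed by two hexadecimal digits. We also assume the key itself needs no encoding.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

// Builds action?key=value in url, encoding value for use in a URL.
// Returns the URL's length, or -1 if url is too small.
int build_get_url(char *url, size_t size, const char *action,
                  const char *key, const char *value)
{
    assert(url != NULL && action != NULL && key != NULL && value != NULL);

    // start with the form's action, then ?key=
    int length = snprintf(url, size, "%s?%s=", action, key);
    if (length < 0 || (size_t) length >= size)
    {
        return -1;
    }

    // append the value one character at a time
    for (int i = 0; value[i] != '\0'; i++)
    {
        unsigned char c = (unsigned char) value[i];
        size_t remaining = size - (size_t) length;
        int written;
        if (isalnum(c) || c == '-' || c == '_' || c == '.')
        {
            written = snprintf(url + length, remaining, "%c", c);
        }
        else if (c == ' ')
        {
            written = snprintf(url + length, remaining, "+");
        }
        else
        {
            written = snprintf(url + length, remaining, "%%%02X", (unsigned int) c);
        }
        if (written < 0 || (size_t) written >= remaining)
        {
            return -1;
        }
        length += written;
    }
    return length;
}
```

• With action https://www.google.com/search, key q, and value grumpy cats, the result is https://www.google.com/search?q=grumpy+cats.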
Frosh IMs
froshims-0.php
• Back in the 1990s when David was a freshman at
Harvard, the process of registering for intramural sports was painfully
manual. You had to fill out a paper form and actually drop it off at the
dorm room of the proctor in charge. David decided to change all that by
implementing an online registration form. Although his original
implementation was in Perl, we can recreate it in HTML and PHP:
• <?php
•
• /**
• * froshims-0.php
• *
• * David J. Malan
• * [email protected]
• *
• * Implements a registration form for Frosh IMs.
• * Submits to register-0.php.
• */
•
• ?>
•
• <!DOCTYPE html>
•
• <html>
• <head>
• <title>Frosh IMs</title>
• </head>
• <body style="text-align: center;">
• <h1>Register for Frosh IMs</h1>
• <form action="register-0.php" method="post">
• Name: <input name="name" type="text"/>
• <br/>
• <input name="captain" type="checkbox"/> Captain?
• <br/>
• <input name="gender" type="radio" value="F"/> Female
• <input name="gender" type="radio" value="M"/> Male
• <br/>
• Dorm:
• <select name="dorm">
• <option value=""></option>
• <option value="Apley Court">Apley Court</option>
• <option value="Canaday">Canaday</option>
• <option value="Grays">Grays</option>
• <option value="Greenough">Greenough</option>
• <option value="Hollis">Hollis</option>
• <option value="Holworthy">Holworthy</option>
• <option value="Hurlbut">Hurlbut</option>
• <option value="Lionel">Lionel</option>
• <option value="Matthews">Matthews</option>
• <option value="Mower">Mower</option>
• <option value="Pennypacker">Pennypacker</option>
• <option value="Stoughton">Stoughton</option>
• <option value="Straus">Straus</option>
• <option value="Thayer">Thayer</option>
• <option value="Weld">Weld</option>
• <option value="Wigglesworth">Wigglesworth</option>
• </select>
• <br/>
• <input type="submit" value="Register"/>
• </form>
• </body>
• </html>
• As before, we have <head> and <body> tags. Within the <body>, there’s
a <form> whose action attribute is register-0.php. We see <input> tags
with type set to "text," "checkbox," and "radio." Text and checkbox
should be self-explanatory, but radio refers to the round buttons that
share a name, of which the user can choose only 1. To create a dropdown
menu, we use the <select> tag with <option> tags within it. Finally, we
have our submit button, which displays "Register" because of its value
attribute.
• When we enter in values into this form and click "Register," we’re taken
to the register-0.php URL that was specified in the action attribute of the
form. Unlike with our CS50 Search example, this URL doesn’t have any
of our inputs embedded in it. That’s because we used the POST method
of sending data rather than the GET method.
• register-0.php does nothing more than print out our inputs as
an associative array. Whereas we only worked with numerically
indexed arrays in C, PHP supports arrays that can use strings as well as
integers as keys. An associative array is really just a hash table! Because
POST sends data in the body of the HTTP request rather than in the URL,
it is useful for submitting passwords, credit card numbers, and anything
else that shouldn’t show up in the address bar or browser history. It’s
also useful for sending data that’s too large to embed in the URL, for
example an uploaded photo.
conditions-1.php
• To get a feel for this new language, let’s take a look at how we would
implement conditions-1.c in PHP:
• <?php
•
• /**
• * conditions-1.php
• *
• * David J. Malan
• * [email protected]
• *
• * Tells user if his or her input is positive, zero, or negative.
• *
• * Demonstrates use of if-else construct.
• */
•
• // ask user for an integer
• $n = readline("I'd like an integer please: ");
•
• // analyze user's input
• if ($n > 0)
• {
• printf("You picked a positive number!\n");
• }
• else if ($n == 0)
• {
• printf("You picked zero!\n");
• }
• else
• {
• printf("You picked a negative number!\n");
• }
•
• ?>
• The syntax for PHP is actually quite similar to that of C. Variable names
in PHP are prefixed with a $. Variables also do not need to be declared
with explicit types because PHP is a loosely typed language. In different
contexts, PHP will implicitly cast variables from one type to
another. readline is a new function, but the if-else construct is identical
to C.
register-0.php
• register-0.php is a quick example of commingling PHP and HTML:
• <!DOCTYPE html>
•
• <html>
• <head>
• <title>Frosh IMs</title>
• </head>
• <body>
• <pre>
• <?php print_r($_POST); ?>
• </pre>
• </body>
• </html>
• Within the <pre> HTML tags, we enter PHP mode by inserting
the <?php and ?> tags. Once we’re in PHP mode, we access a variable
named $_POST. This is an associative array which PHP constructs for you
whenever you pass in data via the POST method. If we had used the GET
method, the data would be available in the $_GET variable. print_r is a
function which prints recursively, meaning it prints everything that’s
nested within a variable. When we pass the $_POST variable to print_r, we
see the four inputs that the user provided, each with a key that
corresponds to the name attribute of the input. $_POST and $_GET are
known as superglobal variables because they’re available everywhere.
register-3.php
• In register-3.php, we take the extra step of actually e-mailing the user’s
information:
• <?php
•
• /**
• * register-3.php
• *
• * Computer Science 50
• * David J. Malan
• *
• * Implements a registration form for Frosh IMs. Reports registration
• * via email. Redirects user to froshims-3.php upon error.
• */
•
• // require PHPMailer
• require("PHPMailer/class.phpmailer.php");
•
• // validate submission
• if (!empty($_POST["name"]) && !empty($_POST["gender"]) &&
!empty($_POST["dorm"]))
• {
• // instantiate mailer
• $mail = new PHPMailer();
•
• // use SMTP
• $mail->IsSMTP();
• $mail->Host = "smtp.fas.harvard.edu";
•
• // set From:
• $mail->SetFrom("[email protected]");
•
• // set To:
• $mail->AddAddress("[email protected]");
•
• // set Subject:
• $mail->Subject = "registration";
•
• // set body
• $mail->Body =
• "This person just registered:\n\n" .
• "Name: " . $_POST["name"] . "\n" .
• "Captain: " . $_POST["captain"] . "\n" .
• "Gender: " . $_POST["gender"] . "\n" .
• "Dorm: " . $_POST["dorm"];
•
• // send mail
• if ($mail->Send() == false)
• {
• die($mail->ErrInfo);
• }
• }
• else
• {
• header("Location: http://localhost/src9m/froshims/froshims-3.php");
• exit;
• }
• ?>
•
• <!DOCTYPE html>
•
• <html>
• <head>
• <title>Frosh IMs</title>
• </head>
• <body>
• You are registered! (Really.)
• </body>
• </html>
• The require statement in PHP is similar to the #include directive in C.
First, we check that the user’s name, gender, and dorm are not empty
using the appropriately named empty construct. If none of them is empty,
we begin using a library called PHPMailer to create and send an e-mail.
To use this library, we create a new object of type PHPMailer
named $mail and we call the IsSMTP method of that object by
writing $mail->IsSMTP(). We set the mail server to be smtp.fas.harvard.edu
and call a few more methods to set the to and from addresses as well as
the subject and body. We know the names of these methods simply by
reading the documentation for PHPMailer. To construct a body for our
message, we use the dot operator (.) to concatenate the user’s inputs into
one long string. Finally, we call the Send method and voila, we have just
registered a user for freshman intramurals! If any of the required inputs
is empty, we instead redirect the user back to the form.
• One interesting implication of this is that it’s pretty easy to send e-mails
from any e-mail address to any e-mail address. Be wary!
Model-view-controller
• As you begin to design web applications, you’ll want to think about how
to organize your code. One paradigm for organizing code is called
Model-view-controller (MVC). The View encapsulates the aesthetics of
the website. The Model handles interactions with the database. The
Controller handles user requests, passing data to and from the Model
and View as needed.
• Let’s try to create a course website for CS50 using the MVC framework.
In version 0, the pages are well organized into separate directories, but
there are a lot of files with very similar code.
• To see how we might abstract away some of the logic, let’s jump ahead
to version 5:
• <?php require("../includes/helpers.php"); ?>
•
• <?php render("header", ["title" => "CS50"]); ?>
•
• <ul>
• <li><a href="lectures.php">Lectures</a></li>
• <li><a href="http://cdn.cs50.net/2013/fall/lectures/0/w/syllabus/syllabus.html">Syllabus</a></li>
• </ul>
•
• <?php render("footer"); ?>
• Now the header and footer are being automatically generated by the
function render. This is better design because we can change the header
and footer of all the pages within our site just by changing a few lines of
code.
Teaser
• Soon we’ll dive into the world of databases and even implement our own
e-trading website!
Last updated 2013-11-07 00:27:04 PST
Week 9, continued
Andrew Sellergren
Table of Contents
From Last Time
Reimplementing speller
Sessions and Cookies
counter.php
SQL
Race Conditions
JavaScript
dom-0.html
dom-2.html
Ajax
Teaser
SQL
• We introduced SQL last time as a language to interact with databases.
You can think of a database as an Excel spreadsheet: it stores data in
tables and rows.
• There are four basic SQL commands:
o SELECT
o INSERT
o UPDATE
o DELETE
• For Problem Set 7, we’ve set you up with a database and an application
named phpMyAdmin (not affiliated with PHP) to interact with it. In that
database, there is a users table with id, username, and hash columns:
• Presumably, username is unique, so why bother with an id column? An
id is only 32 bits because it’s an integer, whereas username is a variable-
length string. Comparing strings is not as fast as comparing integers, so
lookups by id will be faster than lookups by username.
• SQL supports the following types:
o CHAR
o VARCHAR
o INT
o BIGINT
o DECIMAL
o DATETIME
• A CHAR is a fixed-length string whereas a VARCHAR is a variable-length
string. It’s slightly faster to search on a CHAR than a VARCHAR.
• SQL tables also have indexes:
o PRIMARY
o INDEX
o UNIQUE
o FULLTEXT
• Adding indexes makes searching a table faster. Because we know that
we’ll be using id to look up rows and we know that id will be unique, we
can specify it as a primary key by choosing PRIMARY from the Index
dropdown menu. Although we decided that id should be the lookup field
rather than email, we still want email to be unique, so we
choose UNIQUE from the Index dropdown menu. This tells MySQL that
the same e-mail address should not be inserted more than once.
Choosing INDEX tells MySQL to build a data structure to make searching
this column more efficient, even though it’s not unique.
Similarly, FULLTEXT allows for wildcard searching on a column.
• Databases can be powered by one of several different engines:
o InnoDB
o MyISAM
o Archive
o Memory
• This is one more design decision that we have to make, but we won’t
trouble ourselves with the particulars for right now.
Race Conditions
JavaScript
• JavaScript is an interpreted programming language that executes
client-side. Whereas PHP is executed on the server, JavaScript is
downloaded by the browser and executed there.
• If you’ve ever been on a page that updates its content without reloading,
you’ve seen JavaScript in action. Behind the scenes, it has actually made
another HTTP request.
• JavaScript is also used to manipulate the DOM, the document object
model, the tree of HTML that we saw earlier.
• The syntax for conditions, Boolean expressions, loops, and switch
statements is much the same in JavaScript as it is in PHP and C. One
new type of loop exists, however:
    for (var i in array)
    {
        // do this with array[i]
    }
• Functionally, this loop is equivalent to the foreach loop in PHP.
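• A small runnable version of that loop, as a sketch. Note that i takes on each index of the array (as a string), not each element, which is why the body indexes into the array. The variable names sum and keys are just for illustration.

```javascript
var numbers = [4, 8, 15, 16, 23, 42];
var sum = 0;
var keys = [];

for (var i in numbers)
{
    // i is the index (as a string, e.g. '0'), not the element itself
    keys.push(i);
    sum += numbers[i];
}

console.log(keys);
console.log(sum);  // 108
```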
• The syntax for arrays is slightly different:
    var numbers = [4, 8, 15, 16, 23, 42];
• Note that we don’t have the $ prefix for variables anymore. We still don’t
specify a type for the variable, though.
• Another built-in data structure in JavaScript is the object:
    var quote = {symbol: "FB", price: 49.26};
• Objects are similar in functionality to associative arrays in PHP or
structs in C.
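• A short sketch of how the fields of such an object can be read and extended. The name field added at the end is a hypothetical extra, not part of the example in lecture.

```javascript
var quote = {symbol: "FB", price: 49.26};

// Dot notation, much like accessing a struct's field in C
var symbol = quote.symbol;

// Bracket notation, much like an associative array in PHP
var price = quote["price"];

// Fields can be added after the object is created
quote.name = "Facebook, Inc.";  // hypothetical extra field

var fields = Object.keys(quote);
console.log(symbol, price, fields);
```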
• JavaScript Object Notation, or JSON, is a very popular format these
days. If you work with APIs like Facebook’s for your Final Project, you’ll
be passed data in JSON. JSON is quite useful because it is self-
describing: each of the fields in an object is named.
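• To make "self-describing" concrete, here is a minimal sketch that converts an object to JSON text and back using the built-in JSON.stringify and JSON.parse. The variable names text and parsed are just for illustration.

```javascript
var quote = {symbol: "FB", price: 49.26};

// Object -> JSON text: every value travels with its field name
var text = JSON.stringify(quote);
console.log(text);  // {"symbol":"FB","price":49.26}

// JSON text -> object: the fields come back under the same names
var parsed = JSON.parse(text);
console.log(parsed.symbol, parsed.price);
```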
dom-0.html
• Let’s start with a "hello, world" for JavaScript:
    <!DOCTYPE html>

    <html>
      <head>
        <script>

          function greet()
          {
              alert('hello, ' + document.getElementById('name').value + '!');
          }

        </script>
        <title>dom-0</title>
      </head>
      <body>
        <form id="demo" onsubmit="greet(); return false;">
          <input id="name" placeholder="Name" type="text"/>
          <input type="submit"/>
        </form>
      </body>
    </html>
• What’s interesting here is that we can embed JavaScript directly in
HTML using the <script> tag. The alert function is a quick and dirty way
of displaying output via a pop-up window. Convention in JavaScript is
to use single quotes for strings, by the way.
• In JavaScript, there exists a special global variable named document that
contains the entire tree structure of the HTML. The document object also
has functions associated with it known as methods. One such method
is getElementById which retrieves an HTML element with the
specified id attribute.
• greet is called when the form is submitted because the form’s onsubmit
attribute invokes it. After that, we have to return false because
otherwise the browser will actually submit the form and navigate to
whatever its action attribute is (or reload the current page if, as
here, there is none).
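• As an alternative sketch (not the version shown in lecture), the same handler can be attached from the script itself with addEventListener, in which case event.preventDefault() plays the role of return false. The helper names greeting and attach are hypothetical; the ids demo and name are the ones from dom-0.html.

```javascript
// Builds the message separately so it can be checked without a browser.
function greeting(name)
{
    return 'hello, ' + name + '!';
}

// Attaches the handler from script rather than from the onsubmit attribute.
// Assumes the form and input ids from dom-0.html ('demo' and 'name').
function attach(doc)
{
    doc.getElementById('demo').addEventListener('submit', function(event) {
        alert(greeting(doc.getElementById('name').value));
        // same role as "return false" in the onsubmit attribute
        event.preventDefault();
    });
}

// Only runs in a browser, once the form exists in the DOM.
if (typeof document !== 'undefined')
{
    document.addEventListener('DOMContentLoaded', function() {
        attach(document);
    });
}
```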
dom-2.html
• jQuery is a library that adds a great deal of convenience to writing
JavaScript. Take a look at it in dom-2.html:
    <!DOCTYPE html>

    <html>
      <head>
        <script src="http://code.jquery.com/jquery-latest.min.js"></script>
        <script>

          $(document).ready(function() {
              $('#demo').submit(function(event) {
                  alert('hello, ' + $('#name').val() + '!');
                  event.preventDefault();
              });
          });

        </script>
        <title>dom-2</title>
      </head>
      <body>
        <form id="demo">
          <input id="name" placeholder="Name" type="text"/>
          <input type="submit"/>
        </form>
      </body>
    </html>
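• For now we’ll wave our hands at the first line of JavaScript above.
Basically, it just waits till the document has loaded before executing
anything. $('#demo') is roughly equivalent to
document.getElementById('demo'), except that it returns a jQuery object
wrapping the element rather than the element itself.
• One feature of JavaScript that we’re leveraging here is the ability to pass
functions as objects to other functions. The only argument to
the submit method is an anonymous function that takes in its own
argument event. Calling event.preventDefault() serves the same purpose
as return false did in dom-0.html: it stops the browser from actually
submitting the form.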
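• Passing functions to other functions doesn’t require jQuery or even a browser. The sketch below uses a hypothetical FakeForm object (not a real jQuery or DOM type) whose submit method stores whatever function it is handed and whose trigger method calls those functions later, which is roughly the pattern at work above.

```javascript
// Hypothetical stand-in for a form: it remembers handlers and calls them later.
function FakeForm()
{
    this.handlers = [];
}

// Like submit(handler) above: takes a function as its argument and stores it.
FakeForm.prototype.submit = function(handler) {
    this.handlers.push(handler);
};

// Simulates the event firing; each handler receives an event object.
FakeForm.prototype.trigger = function() {
    var event = {
        defaultPrevented: false,
        preventDefault: function() { this.defaultPrevented = true; }
    };
    this.handlers.forEach(function(handler) {
        handler(event);
    });
    return event;
};

var messages = [];
var form = new FakeForm();

// An anonymous function passed directly as an argument
form.submit(function(event) {
    messages.push('hello, world!');
    event.preventDefault();
});

// A named function can be passed the same way
function log(event)
{
    messages.push('submitted');
}
form.submit(log);

var result = form.trigger();
console.log(messages, result.defaultPrevented);
```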
Ajax
ajax-2.html
• Ajax is the technology that allows us to request external data without a
page refresh. Take a look at ajax-2.html:
    <!--

    ajax-2.html

    Gets stock quote from quote.php via Ajax with jQuery, embedding result in page itself.

    David J. Malan
    [email protected]

    -->

    <!DOCTYPE html>

    <html>
      <head>
        <script src="http://code.jquery.com/jquery-latest.min.js"></script>
        <script>

          /**
           * Gets a quote via JSON.
           */
          function quote()
          {
              var url = 'quote.php?symbol=' + $('#symbol').val();
              $.getJSON(url, function(data) {
                  $('#price').html(data.price);
              });
          }

        </script>
        <title>ajax-2</title>
      </head>
      <body>
        <form onsubmit="quote(); return false;">
          Symbol: <input autocomplete="off" id="symbol" type="text"/>
          <br/>
          Price: <span id="price">to be determined</span>
          <br/><br/>
          <input type="submit" value="Get Quote"/>
        </form>
      </body>
    </html>
• The quote function appears to be constructing a URL with a stock symbol
as a parameter. If we visit quote.php?symbol=GOOG directly, we’ll get some
JSON spit out to the screen that includes a stock price. In the JavaScript
above, we’re asking for that JSON programmatically, then passing it to a
callback function that inserts the price into the span whose id is price.
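• The two steps described above (build a URL, then pull the price out of the JSON) can be sketched without jQuery. This is a rough equivalent, not the lecture’s code: the helper names buildQuoteUrl and priceFromJson are hypothetical, the sample price is made up, and escaping the symbol with encodeURIComponent is an addition that ajax-2.html itself doesn’t make. The last function uses the browser’s built-in fetch in place of $.getJSON.

```javascript
// Builds the same URL as quote() in ajax-2.html, but escapes the symbol
// so that characters like '&' or spaces cannot alter the query string.
function buildQuoteUrl(symbol)
{
    return 'quote.php?symbol=' + encodeURIComponent(symbol);
}

// Turns the JSON text of a response into the string shown on the page.
// Assumes the response has a "price" field, as the notes describe.
function priceFromJson(text)
{
    var data = JSON.parse(text);
    return String(data.price);
}

// Roughly what $.getJSON does, written with the browser's fetch function.
// Not called here, since it needs a server that provides quote.php.
function quote(symbol, onPrice)
{
    return fetch(buildQuoteUrl(symbol))
        .then(function(response) { return response.text(); })
        .then(function(text) { onPrice(priceFromJson(text)); });
}

console.log(buildQuoteUrl('GOOG'));  // quote.php?symbol=GOOG
console.log(priceFromJson('{"symbol": "GOOG", "price": 12.5}'));  // made-up price
```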
Teaser
• Check out geolocation-0.html which seems to know where you are!
Last updated 2013-11-10 06:44:05 PST
Week 10
Andrew Sellergren
Table of Contents
Announcements and Demos
Hourglass Architecture
The Cycle
Security
A Final Question
Unowned Technologies
Hourglass Architecture
— Arthur C. Clarke
— Leigh Brackett
The Cycle
Security
• This is also a lesson in trust and security. Malicious software has only
gotten more sophisticated, beginning with the likes of the Storm Worm
and progressing to the hardly detectable Stuxnet.
• An amusing anecdote: the Cap’n Crunch Bosun whistle emitted a tone at
the exact frequency that AT&T recognized as an idle line. If you blew it
into the telephone receiver, you could get free long distance! Because
theirs was an owned technology, AT&T could quickly fix this
vulnerability. Vulnerabilities in unowned technologies, for example
viruses, malicious links, or even remote access tools (RATs), cannot be
so easily fixed.
A Final Question
• As a CS50 grad, who are you in this riddle? You have a tool with which
you can change everything. Use it to forge systems that distribute power
rather than focus it.
Last updated 2013-11-14 01:12:38 PST
Final Project
Web Hosting
Security
Session Hijacking
Final Project
• Just to plant one seed in your mind, check out the list of e-mail
addresses for the various cell providers here. With these, you can send
text messages programmatically! Be careful, lest you send some 20,000
text messages mistakenly, as David did during lecture last year.
• Receiving text messages is a little more difficult. You can use the service
provided by textmarks.com. For example, if you send a text message to
41411 like "SBOY mather quad," you’ll get a response from the CS50
Shuttleboy app.
• Consider using Parse as your backend database instead of MySQL!
• CS50 has its own authentication service called CS50 ID. Check out
the manual to see how to verify that a user is someone from Harvard.
Web Hosting
• Check out your options for web hosting if you want your Final Project to
live outside of the Appliance. Namecheap is just one!
• To see who owns a particular domain, you can look it up
using whois from the command line. Under the "Name Servers" heading,
you’ll see a list of servers that are the canonical sources for returning the
IP address of the domain you looked up. When you type in this domain
into your browser, the browser will eventually query these name servers
to find the final IP address. When you register for web hosting, you’ll
need to tell the registrar what your name servers are. Since CS50 uses
DreamHost, you’ll enter in NS1.DREAMHOST.COM,
NS2.DREAMHOST.COM, and NS3.DREAMHOST.COM if you use
CS50’s hosting account.
• SSL stands for Secure Sockets Layer and is indicated by a URL that
begins with https. To use SSL for your own website, you need a unique
IP address for which you’ll have to pay a web hosting company a few
more dollars per month.
Security
• As a random segue into security, check out the first volume of CS50
Flights.
• As we talked about on Monday, it’s important to be careful when
installing software. Often you’ll be prompted to give permission to an
installer as a security measure because it needs to run as an
administrator. This has very serious security implications because you’re
giving this installer the ability to execute almost any command on your
computer.
• The trust you implicitly or explicitly give to the software you run can
easily be abused. Sony got a lot of flak a few years ago for including
rootkits on the CDs they sold. These rootkits would actually hide
themselves so that you couldn’t see they were running if you opened
Task Manager.
• What does the padlock icon on a website mean in terms of security?
Virtually nothing. But we’ve been conditioned to think that a website is
secure when we see that padlock. That means it’s just as easy for an
adversary to put a padlock on his malicious website and trick you into
trusting him.
• Some browsers like Chrome go one step further in showing the owner of
the SSL certificate. When you navigate to Bank of America’s website,
Chrome shows "Bank of America Corporation [US]" in green in the
address bar.
• But how many of you have actually noticed or changed your behavior
because of these security measures?
Session Hijacking
• You can see the actual value of the cookie that Facebook plants on your
computer by using Developer Tools in Chrome. Usually this cookie is
planted when you first visit Facebook. But how did you get to Facebook?
You probably didn’t type "https" to begin with, so you must have been
redirected to the SSL-enabled version of the website. During that
redirection, your cookie was forwarded along. If a bad guy is on the
same network as you, he may be able to intercept this cookie while
you’re being redirected. This attack is called session hijacking.
• A bad guy could even intercept your HTTP request and respond with his
own fake version of Facebook in order to steal your credentials. This
attack is called man in the middle.
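• To see why the cookie is exposed during that first unencrypted request, here is a simplified sketch of the browser’s decision about whether to attach a cookie. It assumes the standard Secure cookie attribute, which tells the browser to send a cookie only over https; lecture didn’t cover it, so treat this as an aside. The cookie names and values and the function names wouldSendCookie and setCookieHeader are made up.

```javascript
// Decides whether a browser would attach a cookie to a request.
// Simplified: real browsers also check domain, path, and expiry.
function wouldSendCookie(cookie, url)
{
    var isHttps = url.indexOf('https://') === 0;
    if (cookie.secure && !isHttps) {
        return false;
    }
    return true;
}

// Hypothetical session cookies; names and values are made up.
var plain = {name: 'session', value: 'abc123', secure: false};
var secured = {name: 'session', value: 'abc123', secure: true};

// Formats the Set-Cookie header a server would send for a cookie.
function setCookieHeader(cookie)
{
    var header = 'Set-Cookie: ' + cookie.name + '=' + cookie.value;
    if (cookie.secure) {
        header += '; Secure';
    }
    return header;
}

console.log(wouldSendCookie(plain, 'http://www.facebook.com/'));    // true
console.log(wouldSendCookie(secured, 'http://www.facebook.com/'));  // false
console.log(setCookieHeader(secured));  // Set-Cookie: session=abc123; Secure
```

• Without the attribute, the cookie rides along on the plain http request, which is exactly the moment at which someone on the same network could read it.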
SQL Injection Attack
Farewell
• It has been a pleasure teaching you this semester! We leave you with a
reel of outtakes.
Last updated 2013-11-27 20:07:40 PST