An Intuitive Introduction To Limits


by Kalid Azad

Limits, the Foundations Of Calculus, seem so artificial and weasely: "Let
x approach 0, but not get there, yet we'll act like it's there..." Ugh.
Here's how I learned to enjoy them:
What is a limit? Our best prediction of a point we didn't observe.
How do we make a prediction? Zoom into the neighboring points. If
our prediction is always in-between neighboring points, no matter how
much we zoom, that's our estimate.
Why do we need limits? Math has "black hole" scenarios (dividing by
zero, going to infinity), and limits give us an estimate when we can't
compute a result directly.
How do we know we're right? We don't. Our prediction, the limit, isn't
required to match reality. But for most natural phenomena, it sure seems
to.
Limits let us ask "What if?". If we can directly observe a function at a value
(like x=0, or x growing infinitely), we don't need a prediction. The limit
wonders, "If you can see everything except a single value, what do you think is
there?".
When our prediction is consistent and improves the closer we look, we
feel confident in it. And if the function behaves smoothly, like most real-world
functions do, the limit is where the missing point must be.

Key Analogy: Predicting A Soccer Ball


Pretend you're watching a soccer game. Unfortunately, the connection is
choppy:

Ack! We missed what happened at 4:00. Even so, what's your prediction for
the ball's position?
Easy. Just grab the neighboring instants (3:59 and 4:01) and predict the ball to
be somewhere in-between.
And it works! Real-world objects don't teleport; they move through
intermediate positions along their path from A to B. Our prediction is "At 4:00,
the ball was between its position at 3:59 and 4:01". Not bad.
With a slow-motion camera, we might even say "At 4:00, the ball was between
its positions at 3:59.999 and 4:00.001".
Our prediction is feeling solid. Can we articulate why?
The predictions agree at increasing zoom levels. Imagine the 3:59-4:01
range was 9.9-10.1 meters, but after zooming into 3:59.999-4:00.001, the
range widened to 9-12 meters. Uh oh! Zooming
should narrow our estimate, not make it worse! Not every zoom level
needs to be accurate (imagine seeing the game every 5 minutes), but to
feel confident, there must be some threshold where subsequent zooms
only strengthen our range estimate.
The before-and-after agree. Imagine at 3:59 the ball was at 10
meters, rolling right, and at 4:01 it was at 50 meters, rolling left. What
happened? We had a sudden jump (a camera change?) and now we can't
pin down the ball's position. Which one had the ball at 4:00? This
ambiguity shatters our ability to make a confident prediction.
With these requirements in place, we might say "At 4:00, the ball was at 10
meters". This estimate is confirmed by our initial zoom (3:59-4:01, which
estimates 9.9 to 10.1 meters) and the following one (3:59.999-4:00.001, which
estimates 9.999 to 10.001 meters).
Limits are a strategy for making confident predictions.
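If you like to poke at ideas in code, here is a rough sketch of the "zoom in and see if the range tightens" check. Everything in it is made up for illustration: ball_position is a hypothetical function (time in minutes, 4.0 standing in for 4:00) that deliberately breaks at the missing instant, and zoom_ranges is just a helper name I picked.

```python
def zoom_ranges(f, c, widths):
    """For each half-width w, return (low, high): the range of f sampled
    on both sides of c, never evaluating f at c itself."""
    ranges = []
    for w in widths:
        # 50 offsets on each side of c, all strictly greater than zero
        offsets = [w * k / 50 for k in range(1, 51)]
        values = [f(c - d) for d in offsets] + [f(c + d) for d in offsets]
        ranges.append((min(values), max(values)))
    return ranges


def ball_position(t):
    # Hypothetical position in meters; dividing by (t - 4) means the
    # instant t = 4 itself cannot be observed (ZeroDivisionError)
    return 10 * (t - 4) / (t - 4) + 0.1 * (t - 4)


ranges = zoom_ranges(ball_position, 4.0, [1.0, 0.1, 0.001])
for low, high in ranges:
    print(f"ball was between {low:.4f} and {high:.4f} meters")
```

Each tighter zoom gives a narrower range that still surrounds 10 meters, which is the behavior we want before trusting the prediction. Sampling a finite number of points is only a spot-check, not a proof.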

Exploring The Intuition


Let's not bring out the math definitions just yet. What things, in the real world,
do we want an accurate prediction for but can't easily measure?
What's the circumference of a circle?
Finding pi experimentally is tough: bust out a string and a ruler?
We can't measure a shape with seemingly infinite sides, but we can wonder "Is
there a predicted value for pi that is always accurate as we keep increasing the
sides?"
Archimedes figured out that pi had a range of

$3\tfrac{10}{71} < \pi < 3\tfrac{1}{7}$

using a process like this (trapping the circle between polygons with more and
more sides):

It was the precursor to calculus: he determined that pi was a number that
stayed between his ever-shrinking boundaries. Nowadays, we have
modern limit definitions of pi.
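Here is a small sketch of that squeeze, assuming the standard polygon-doubling recurrence (harmonic mean for the outside polygon, geometric mean for the inside one). It is not Archimedes' actual arithmetic: he worked by hand with fractions, while this leans on math.sqrt. The name archimedes_bounds is my own.

```python
import math


def archimedes_bounds(doublings):
    """Lower and upper bounds on pi from polygons inside and outside a
    circle of radius 1, starting from hexagons and doubling the sides."""
    outer = 2 * math.sqrt(3)  # half the perimeter of the outside hexagon
    inner = 3.0               # half the perimeter of the inside hexagon
    for _ in range(doublings):
        outer = 2 * outer * inner / (outer + inner)  # harmonic mean
        inner = math.sqrt(outer * inner)             # geometric mean
    return inner, outer


for doublings in range(5):
    low, high = archimedes_bounds(doublings)
    print(6 * 2 ** doublings, "sides:", low, "< pi <", high)
```

Four doublings give 96-sided polygons, the size Archimedes stopped at, and the two bounds keep closing in from both sides without ever crossing.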
What does perfectly continuous growth look like?
e, one of my favorite numbers, can be defined like this:

$e = \lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n$

We can't easily measure the result of infinitely-compounded growth. But, if
we could make a prediction, is there a single rate that is ever-accurate? It
seems to be around 2.71828...
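A quick sketch of that prediction settling down: compound 100% growth over more and more periods and watch the result. The function name compound is just my label for the expression inside the limit.

```python
import math


def compound(n):
    """Grow 1 unit at a 100% rate, split into n compounding periods."""
    return (1 + 1 / n) ** n


estimates = {n: compound(n) for n in (1, 10, 1000, 1_000_000)}
for n, value in estimates.items():
    print(n, value, "gap to math.e:", math.e - value)
```

The gap keeps shrinking as n grows, which is the "ever-accurate" behavior we hoped for. Floating-point arithmetic eventually gets in the way for enormous n, so treat this as an illustration rather than a way to compute e.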
Can we use simple shapes to measure complex ones?
Circles and curves are tough to measure, but rectangles are easy. If
we could use an infinite number of rectangles to simulate curved area, can we
get a result that withstands infinite scrutiny? (Maybe we can find the area of a
circle.)

Can we find the speed at an instant?
Speed is funny: it needs a before-and-after measurement (distance traveled /
time taken), but can't we have a speed at individual instants? Hrm.

Limits help answer this conundrum: predict your speed when traveling to a
neighboring instant. Then ask the "impossible question": what's your predicted
speed when the gap to the neighboring instant is zero?
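As a sketch, assume a hypothetical position function (t squared meters after t seconds) and measure the average speed to nearer and nearer instants. Both position and average_speed are names I made up for the example.

```python
def position(t):
    # Hypothetical: distance in meters after t seconds
    return t ** 2


def average_speed(position, t, gap):
    """Average speed between instants t and t + gap (gap must be nonzero)."""
    return (position(t + gap) - position(t)) / gap


gaps = [1.0, 0.1, 0.001, 0.00001]
predictions = [average_speed(position, 3.0, gap) for gap in gaps]
for gap, speed in zip(gaps, predictions):
    print("gap", gap, "-> speed", speed)
```

We never plug in a gap of zero (that would be 0 / 0), yet the predictions crowd around 6 meters per second, so that is our estimate for the speed at t = 3.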
Note: The limit isn't a magic cure-all. We can't assume one exists, and there
may not be an answer to every question. For example: Is the number of
integers even or odd? The quantity is infinite, and neither the "even" nor "odd"
prediction stays accurate as we count higher. No well-supported prediction
exists.
For pi, e, and the foundations of calculus, smart minds did the proofs to
determine that "Yes, our predicted values get more accurate the closer we
look." Now I see why limits are so important: they're a stamp of approval on
our predictions.

The Math: The Formal Definition Of A Limit


Limits are well-supported predictions. Here's the official definition:

$\lim_{x \to c} f(x) = L$

means for all real $\varepsilon > 0$ there exists a real $\delta > 0$ such that for all x with
$0 < |x - c| < \delta$, we have $|f(x) - L| < \varepsilon$.
Let's make this readable:

Math English -> Human English

$\lim_{x \to c} f(x) = L$ means
    -> When we "strongly predict" that f(c) = L, we mean
for all real $\varepsilon > 0$
    -> for any error margin we want (+/- .1 meters)
there exists a real $\delta > 0$
    -> there is a zoom level (+/- .1 seconds)
such that for all x with $0 < |x - c| < \delta$, we have $|f(x) - L| < \varepsilon$
    -> where the prediction stays accurate to within the error margin

There are a few subtleties here:

The zoom level (delta, $\delta$) applies to the function input, i.e. the time in the video
The error margin (epsilon, $\varepsilon$) is the most the function output (the ball's
position) can differ from our prediction throughout the entire zoom level
The absolute value condition ($0 < |x - c| < \delta$) means positive and
negative offsets must work, and we're skipping the black hole itself
(when $|x - c| = 0$).
We can't evaluate the black hole input, but we can say "Except for the missing
point, the entire zoom level confirms the prediction f(c) = L." And because f(c)
= L holds for any error margin we can find, we feel confident.
Could we have multiple predictions? Imagine we predicted L1 and L2 for f(c).
There's some difference between them (call it .1), therefore there's some error
margin (.01) that would reveal the more accurate one. Every function output in
the range can't be within .01 of both predictions. We either have a single,
infinitely-accurate prediction, or we don't.

Yes, we can get cute and ask for the "left hand limit" (prediction from before
the event) and the "right hand limit" (prediction from after the event), but we
only have a real limit when they agree.
A function is continuous when it always matches the predicted value (and
discontinuous if not):

$\lim_{x \to c} f(x) = f(c)$

Calculus typically studies continuous functions, playing the game "We're
making predictions, but only because we know they'll be correct."

The Math: Showing The Limit Exists


We have the requirements for a solid prediction. Questions asking you to
"Prove the limit exists" ask you to justify your estimate.
For example: Prove the limit at x=2 exists for

$f(x) = \frac{(2x + 1)(x - 2)}{(x - 2)}$

The first check: do we even need a limit? Unfortunately, we do: just plugging in
x=2 means we have a division by zero. Drats.
But intuitively, we see the same "zero" (x - 2) could be cancelled from the top
and bottom. Here's how to dance this dangerous tango:
Assume x is anywhere except 2 (It must be! We're making a prediction
from the outside.)
We can then cancel (x - 2) from the top and bottom, since it isn't zero.
We're left with f(x) = 2x + 1. This function can be used outside the "black
hole".
What does this simpler function predict? That f(2) = 2*2 + 1 = 5.
So f(2) = 5 is our prediction. But did you see the sneakiness? We pretended x
wasn't 2 [to divide out (x - 2)], then plugged in 2 after that troublesome item
was gone! Think of it this way: we used the simple behavior from outside the
event to predict the gnarly behavior at the event.
We can prove these shenanigans give a solid prediction, and that f(2) = 5 is
infinitely accurate.
For any accuracy threshold ($\varepsilon$), we need to find the "zoom range" ($\delta$) where we
stay within the given accuracy. For example, can we keep the estimate
between +/- 1.0?
Sure. We need to find out where

$|f(x) - 5| < 1.0$

so

$|2x + 1 - 5| < 1.0$
$|2x - 4| < 1.0$
$|2(x - 2)| < 1.0$
$|x - 2| < 0.5$

In other words, x must stay within 0.5 of 2 to maintain the initial accuracy
requirement of 1.0. Indeed, when x is between 1.5 and 2.5, f(x) goes from
f(1.5) = 4 to f(2.5) = 6, staying +/- 1.0 from our predicted value of 5.
We can generalize to any error tolerance ($\varepsilon$) by plugging it in for 1.0 above. We
get:

$|x - 2| < 0.5 \cdot \varepsilon$

If our zoom level is $\delta = 0.5 \cdot \varepsilon$, we'll stay within the original error. If our error
is 1.0 we need to zoom to .5; if it's 0.1, we need to zoom to 0.05.
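To see the rule "zoom to half the error margin" in action, here is a numerical spot-check. It is a sketch, not a proof: stays_within (a name I invented) only samples a finite number of points in the zoom range, while the algebra above covers all of them.

```python
def f(x):
    # The example function; the division fails at exactly x = 2
    return (2 * x + 1) * (x - 2) / (x - 2)


def stays_within(f, c, L, epsilon, delta, samples=1000):
    """Spot-check |f(x) - L| < epsilon at sample x with 0 < |x - c| < delta."""
    for k in range(1, samples + 1):
        offset = delta * k / (samples + 1)  # strictly between 0 and delta
        for x in (c - offset, c + offset):
            if abs(f(x) - L) >= epsilon:
                return False
    return True


results = {eps: stays_within(f, 2, 5, eps, 0.5 * eps) for eps in (1.0, 0.1, 0.001)}
print(results)

# A zoom range that is too wide for the requested accuracy fails the check
too_wide = stays_within(f, 2, 5, 0.1, 1.0)
print(too_wide)
```

Every error margin passes when delta is half of epsilon, and the deliberately sloppy zoom range fails, which matches the algebra.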
This simple function was a convenient example. The idea is to start with the
initial constraint ($|f(x) - L| < \varepsilon$), plug in f(x) and L, and solve for the distance
away from the black-hole point ($|x - c| < ?$). It's often an exercise in algebra.
Sometimes you're asked to simply find the limit (plug in 2 and get f(2) = 5),
other times you're asked to prove a limit exists, i.e. crank through the
epsilon-delta algebra.

Flipping Zero and Infinity


Infinity, when used in a limit, means "grows without stopping". The symbol $\infty$
is no more a number than the sentence "grows without stopping" or "my
supply of underpants is dwindling". They are concepts, not numbers (for our
level of math, Aleph me alone).
When using $\infty$ in a limit, we're asking: "As x grows without stopping, can we
make a prediction that remains accurate?". If there is a limit, it means the
predicted value is always confirmed, no matter how far out we look.
But, I still don't like infinity because I can't see it. But I can see zero. With
limits, you can rewrite

$\lim_{x \to \infty}$

as

$\lim_{\frac{1}{x} \to 0}$

You can get sneaky and define y = 1/x, replace items in your formula, and then
use

$\lim_{y \to 0^+}$

so it looks like a normal problem again! (Note from Tim in the comments: the
limit is coming from the right, since x was going to positive infinity). I prefer
this arrangement, because I can see the location we're narrowing in on (we're
always running out of paper when charting the infinite version).
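Here is a sketch of the swap with a made-up function, f(x) = (2x + 1) / x, which settles toward 2 as x grows. The names f and g are only for this example.

```python
def f(x):
    # Hypothetical example: heads toward 2 as x grows without stopping
    return (2 * x + 1) / x


def g(y):
    # Same formula after substituting x = 1/y, so "x grows without
    # stopping" becomes "y shrinks toward 0 from the right"
    return f(1 / y)


far_out = [f(x) for x in (10, 1_000, 1_000_000)]
zoomed_in = [g(y) for y in (0.1, 0.001, 0.000001)]
print(far_out)
print(zoomed_in)
```

Both lists tell the same story, but the second one narrows in on a spot we can actually point to (y = 0) instead of running off the edge of the chart.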

Why Aren't Limits Used More Often?


Imagine a kid who figured out that "Putting a zero on the end" made a number
10x larger. Have 5? Write down "5" then "0" or 50. Have 100? Make it 1000.
And so on.
He didn't figure out why multiplication works, why this rule is justified, but,
you've gotta admit, he sure can multiply by 10. Sure, there are some edge
cases (Would 0 become "00"?), but it works pretty well.
The rules of calculus were discovered informally (by modern standards).
Newton deduced that "The derivative of $x^3$ is $3x^2$" without rigorous justification.
Yet engines whirl and airplanes fly based on his unofficial results.
The calculus pedagogy mistake is creating a roadblock like "You must know
Limits before appreciating calculus", when it's clear the inventors of calculus
didn't. I'd prefer this progression:
Calculus asks seemingly impossible questions: When can rectangles
measure a curve? Can we detect instantaneous change?
Limits give a strategy for answering "impossible" questions ("If you can
make a prediction that withstands infinite scrutiny, we'll say it's ok.")
They're a great tag-team: Calculus explores, limits verify. We memorize
shortcuts for the results we verified with limits ($\frac{d}{dx} x^3 = 3x^2$), just like we
memorize shortcuts for the rules we verified with multiplication (adding a
zero means times 10). But it's still nice to know why the shortcuts are
justified.
Limits aren't the only tool for checking the answers to impossible questions;
infinitesimals work too. The key is understanding what we're trying to
predict, then learning the rules of making predictions.
Happy math.

Why Do We Need Limits and Infinitesimals?


by Kalid Azad

So many math courses jump into limits, infinitesimals and Very Small Numbers (TM)
without any context. But why do we care?
Math helps us model the world. We can break a complex idea (a wiggly curve) into
simpler parts (rectangles):

But, we want an accurate model. The thinner the rectangles, the more accurate the
model. The simpler model, built from rectangles, is easier to analyze than dealing
with the complex, amorphous blob directly.
The tricky part is making a decent model. Limits and infinitesimals help us create
models that are simple to use, yet share the same properties as the original item
(length, area, etc.).
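To make "thinner rectangles, more accurate model" concrete, here is a sketch that measures one arch of a sine curve (true area: exactly 2) with rectangles. The helper name rectangle_area and the choice to size each rectangle by the curve's height at the middle of its base are my own assumptions.

```python
import math


def rectangle_area(f, a, b, n):
    """Approximate the area under f on [a, b] with n equal-width
    rectangles, each as tall as f at the midpoint of its base."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width


errors = {n: abs(rectangle_area(math.sin, 0, math.pi, n) - 2) for n in (4, 40, 400)}
for n, error in errors.items():
    print(n, "rectangles -> off by", error)
```

The model never stops being a staircase, but the error drops quickly as the slices thin out.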
The Paradox of Zero

Breaking a curve into rectangles has a problem: How do we get slices so thin we
don't notice them, but large enough to "exist"?
If the slices are too small to notice (zero width), then the model appears identical to
the original shape (we don't see any rectangles!). Now there's no benefit: the
"simple" model is just as complex as the original! Additionally, adding up zero-width
slices won't get us anywhere.
If the slices are tiny but measurable, the illusion vanishes. We see that our model is a
jagged approximation, and won't be accurate. What's a mathematician to do?
We want the best of both: slices so thin we can't see them (for an accurate model)
and slices thick enough to create a simpler, easier-to-analyze model. A dilemma is at
hand!
The Solution: Zero is Relative

The notion of zero is biased by our expectations. Is 0 + i, a purely imaginary
number, the same as zero?
Well, i sure looks like zero when we're on the real number line: the "real part" of i,
Re(i), is indeed 0. Where else would a purely imaginary number go? (How far East is
due North?)
Here's a different brain bender: did your weight change by zero pounds while reading
this sentence? Yes, by any scale you have nearby. But an atomic measurement would
show some mass change due to sweat evaporation, exhalation, etc.
You see, there are two answers (so far!) to the "be zero and not zero" paradox:

Allow another dimension: Numbers measured to be zero in our dimension
might actually be small but nonzero in another dimension (infinitesimal
approach: a dimension infinitely smaller than the one we deal with)

Accept imperfection: Numbers measured to be zero are probably nonzero at
a greater level of accuracy; saying something is "zero" really means "it's 0 +/-
our measurement error" (limit approach)

These approaches bridge the gap between "zero to us" and "nonzero at a greater
level of accuracy".
Overview of Limits & Infinitesimals

Let's see how each approach would break a curve into rectangles:

Limits: "Give me your error margin (I know you have one, you limited,
imperfect human!), and I'll draw you a curve. What's the smallest unit on your
ruler? Inches? Fine, I'll draw you a staircasey curve at the millimeter level and
you'll never know. Oh, you have a millimeter ruler, do you? I'll draw the curve
in nanometers. Whatever your accuracy, I'm better. You'll never see the
staircase."

Infinitesimals: "Forget accuracy: there's an entire infinitely small
dimension where I'll make the curve. The precision is totally beyond your reach:
I'm at the sub-atomic level, and you're a caveman who can barely walk and
chew gum. It's like getting to the imaginary plane from the real one: you just
can't do it. To you, the rectangular shape I made at the sub-atomic level is the
most perfect curve you've ever seen."

Limits stay in our dimension, but with "just enough" accuracy to maintain the illusion
of a perfect model. Infinitesimals build the model in another dimension, and it looks
perfectly accurate in ours.
The trick to both approaches is that the simpler model was built beyond our level of
accuracy. We might know the model is jagged, but we can't tell the difference: any
test we do shows the model and the real item as the same.
That trick doesn't work, does it?

Oh, but it does. We're tricked by "imperfect but useful" models all the time:

Audio files don't contain all the information of the original signal. But can you
tell the difference between a high-quality mp3 and a person talking in the other
room?

Computer printouts are made from individual dots too small to see. Can you tell
a handwritten note from a high-quality printout of the same?

Video shows still images at 24 times per second. This "imperfect" model is fast
enough to trick our brain into seeing fluid motion.

On and on it goes. We resist because of our artificial need for precision. But audio and
video engineers know they don't need a perfect reproduction, just quality good
enough to trick us into thinking it's the original.
Calculus lets us make these technically imperfect but "accurate enough" models in
math.
Working In Another Dimension

We need to be careful when reasoning with the simplified model. We need to do our
work at the level of higher accuracy, and bring the final result back to our world.
We'll lose information if we don't.
Suppose an imaginary number (i) visits the real number line. Everyone thinks he's
zero: after all, Re(i) = 0. But i does a trick! "Square me!" he says, and they do: i * i =
-1 and the other numbers are astonished.
To the real numbers, it appeared that 0 * 0 = -1, a giant paradox.
But their confusion arose from their perspective: they only thought it was 0 * 0 =
-1. Yes, Re(i) * Re(i) = 0, but that wasn't the operation! We want Re(i * i), which is
different entirely! We square i in its own dimension, and bring that result back to
ours. We need to square i, the imaginary number, and not 0, our idea of what i was.
Beware similar mistakes in calculus: we deal with tiny numbers that look like zero to
us, but we can't do math assuming they are (just like treating i like 0). No, we need to
do the math in the other dimension and convert the results back.
Limits and infinitesimals have different perspectives on how this conversion is done:

Limits: "Do the math at a level of precision just beyond your detection
(millimeters), and bring it back to numbers on your scale (inches)"

Infinitesimals: "Do the math in a different dimension, and bring it back to
the standard one (just like taking the real part of a complex number; you take
the standard part of a hyperreal number, more later)"

Nobody ever told me: Calculus lets you work at a better level of accuracy, with a
simpler model, and bring the results back to our world.
A Real Example: sin(x) / x

Let's try a conceptual example. Suppose we want to know what happens to sin(x) / x
at zero. Now, if we just plug in x = 0 we get a nonsensical result: sin(0) = 0, so we
get 0 / 0 which could be anything.

Let's step back: what does "x = 0" mean in our world? Well, if we're allowing the
existence of a greater level of accuracy, we know this:

Things that appear to be zero may be nonzero in a different dimension (just like i might
appear to be 0 to us, but isn't)

We're going to say that x can be really, really close to zero at this greater level of
accuracy, but not "true zero". Intuitively, you can think of x as 0.0000...00001, where
the "..." is enough zeros for you to no longer detect the number.
(In limit terms, we say x = 0 + d (delta, a small change that keeps us within our error
margin) and in infinitesimal terms, we say x = 0 + h, where h is a tiny hyperreal
number, known as an infinitesimal)
Ok, we have x at "zero to us, but not really". Now we need a simpler model of sin(x).
Why? Well, sine is a crazy repeating curve, and it's hard to know what's happening.
But it turns out that a straight line is a darn good model of a curve over short
distances:

Just like we can break a filled shape into tiny rectangles to make it simpler, we can
dissect a curve into a series of line segments. Around 0, sin(x) looks like the line "x".
So, we switch sin(x) with the line "x". What's the new ratio?

Well, x/x is 1. Remember, we aren't really dividing by zero, because in this
super-accurate world x is tiny but non-zero (0 + d, or 0 + h). When we "take the limit" or
"take the standard part" it means we do the math (x / x = 1) and then find the
closest number in our world (1 goes to 1).
So, 1 is what we get for sin(x) / x as x approaches zero: that is, we make x as small as
possible so it becomes 0 to us. If x became pure, true zero, then the ratio would be
undefined (and it is at the infinitesimal level!). But we're never sure if we're at
perfect zero: something like 0.00000001 looks like zero to us.

So, sin(x)/x looks like x/x = 1 as far as we can tell. (Intuitively, the result makes
sense once we read about radians.)
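If you'd like to see the prediction hold up, here is a quick numerical sketch. The helper name ratio is mine, and the inputs are in radians. We sample both sides of zero without ever touching it.

```python
import math


def ratio(x):
    # Undefined at exactly x = 0 (that would be 0 / 0), so only sample nearby
    return math.sin(x) / x


for x in (0.5, 0.1, 0.001, 0.00001):
    print(x, ratio(x), ratio(-x))
```

The closer x gets to zero, from either side, the closer the ratio gets to 1, just as the straight-line model x / x suggested.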
Visualizing The Process

Today's goal isn't to solve limit problems, it's to understand the process of solving
them. To solve this example:

Realize x=0 is not reachable from our accuracy; a "small but nonzero" x is always
available at a greater level of accuracy
Replace sin(x) by a straight line as a simpler model
Do the math with the simpler model (x / x = 1)
Bring the result (1) back into our accuracy (stays 1)

Here's how I see the process:

In later articles, we'll learn the details of setting up and solving the models.
Caveats: The Trick Doesn't Always Work

Some functions are really "jumpy" and they might differ on an
infinitesimal-by-infinitesimal level. That means we can't reliably bring them back to our world. It looks
like the function is unstable at microscopic level and doesn't behave "smoothly".
The rigorous part of limits is figuring out which functions behave well enough that
simple yet accurate models can be made. Fortunately, most of the natural functions
in the world (x, x^2, sin, e^x) behave nicely and can be modeled with calculus.
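For a feel of what "jumpy" looks like, here is a sketch using sin(1/x) near zero. That function is my pick, not one named above; it is a standard example of a function with no limit at 0. The helper output_range is also my own name.

```python
import math


def output_range(delta, samples=200):
    """Range of sin(1/x) over sample inputs x with 0 < x < delta."""
    # Step 1/x in increments of 0.1 so the samples cover several full waves
    xs = [1 / (1 / delta + 0.1 * j) for j in range(1, samples + 1)]
    values = [math.sin(1 / x) for x in xs]
    return min(values), max(values)


ranges = {delta: output_range(delta) for delta in (1e-1, 1e-3, 1e-6)}
for delta, (low, high) in ranges.items():
    print("zoom", delta, "-> outputs between", low, "and", high)
```

No matter how far we zoom, the outputs still swing across nearly the whole span from -1 to 1, so there is no error margin we can honor and no prediction to bring back.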
Limits Or Infinitesimals?

Logically, both approaches solve the problem of "zero and nonzero". I like
infinitesimals because they allow "another dimension" which seems a cleaner
separation than "always just outside your reach". Infinitesimals were the foundation
of the intuition of calculus, and appear inside physics and other subjects that use it.
This isn't an analysis class, but the math robots can be assured that infinitesimals
have a rigorous foundation. I use them because they click for me.

Summary

Phew! Some of these ideas are tricky, and I feel like I'm talking from both sides of my
mouth: we want to be simpler, yet still perfectly accurate?
This dilemma about being "zero" sometimes, and "non-zero" others, is a famous
critique of calculus. It was mostly ignored since the results worked out, but in the
1800s limits were introduced to really resolve the dilemma. We learn limits today, but
without understanding the nature of the problem they were trying to solve!
Here are the key concepts:

Zero is relative: something can be zero to us, and non-zero somewhere else
Infinitesimals ("another dimension") and limits ("beyond our accuracy") resolve the
dilemma of "zero and nonzero"
We create simpler models in the more accurate dimension, do the math, and bring the
result to our world
The final result is perfectly accurate for us

My goal isn't to do math, it's to understand it. And a huge part of grokking calculus is
realizing that simple models created beyond our accuracy can look "just fine" in our
dimension. Later on we'll learn the rules to build and use these models. Happy math.
Other Posts In This Series
1. A Gentle Introduction To Learning Calculus
2. How To Understand Derivatives: The Product, Power & Chain Rules
3. How To Understand Derivatives: The Quotient Rule, Exponents, and Logarithms
4. An Intuitive Introduction To Limits
5. Why Do We Need Limits and Infinitesimals?
6. Learning Calculus: Overcoming Our Artificial Need for Precision
7. Prehistoric Calculus: Discovering Pi
8. A Calculus Analogy: Integrals as Multiplication
9. Calculus: Building Intuition for the Derivative
10. Understanding Calculus With A Bank Account Metaphor
11. A Friendly Chat About Whether 0.999... = 1

36 comments

1. Camilo Martin says:
Hi Kalid! =) Wonderful to have those insights as always!
Just one very unimportant correction; you said:
"Video shows still images at 24 times per second"
However the correct would be to say that "Film shows the frames at 24
times/second". Even when converted to digital, it is sped-up to 25 FPS
(PAL video standard) or telecined to 29.97 FPS (NTSC standard, USA's
video)
Note: It's not so simple, there are other standards and variants, I was
referring to SD video (not HD, which is commonly the double FPS) and
also: digital video on a computer can have any FPS, even floating
numbers and variable frame-rate (a totally complex thing to handle,
easier to get in than to get out of it)
And as I'm talking about video, I'd like to suggest an idea for an article:
how digital video and image works. It's a fabulous bunch of "AHA!"
moments when you get the concept behind all of those blocky artifacts
and color schemes other than ye olde RGB.
And I loved to think on infinitesimals in a new way! Thanks! =D
2. Parag Shah says:
Hi Kalid,
Thanks for the wonderful post. I have totally forgotten all my math and
have been thinking of re-learning it (especially from a computer science
perspective).
I found your post very useful, and I think it will also give that little push I
needed to get started.
BTW, I too share your passion for helping others learn. I have aggregated
various open computer science course videos on my website.
3. Murugesh Prabhu says:
Hi. This one is as good as the previous posts! I appreciate ur enthusiasm
in promoting the interest in Math among young readers! I enjoyed every
bit of the article man. Thank you very much.
4. Kalid says:
@Camilo: Ah, thank you for the clarification! I'll change video to film :).
That'd be a really cool article. I don't know too much about the video
formats, but know that MPEG has some really neat technology to make it
compress well.
@Parag: Great, glad you enjoyed the article! Checking out your site now,
thanks for collecting all those links. I'm hoping to go back and refresh a
lot of my cs knowledge also :).
@Murugesh: Thanks for the support, I've had a lot of fun trying to get my
brain around these concepts again, but being able to ask "Wait, what
does it _really_ mean to me?". Glad it was useful for you!
5. Anonymous says:
I found a few typos:
Paradox of zero: "slices so thin we can't _seem_ them" (see)
Summary: "I feel like I'm talking from _boths ides_ of my mouth" (both
sides)
Summary: "_Here's_ the key concepts" (Here are)
Thanks for another great article!
6. Kalid says:
@Anonymous: You're welcome, and thanks for the corrections! I just
made them now.
7. nanoturkiye says:
Again I started reading and did not realized that I finished a long article!
It was very entertaining. Thank you very much for your efforts.
8. Kalid says:
@nanoturkiye: You're welcome! Glad you enjoyed the article.
9. Arbie Samong says:
Hi Khalid, props for this great series which I just found out recently and
reading your posts has been a daily habit for me.
Just one confusion in this topic, can you elaborate more on this:

Around 0, sin(x) looks like the line x.

I think of x as the x-axis in the plane that was demonstrated. It could also
be just the variable in the equation. But neither makes sense. I know x/x
is 1, but how come sin(x) is x?
thanks! and more power!
10. mcmlxxxvi says:
Hello, Kalid,
Very well-written and descriptive. Thank you for giving me a good and
pleasant read on things past and nearly forgotten!
I could only wish that more people like you were teaching in high schools
and universities. Around here, the tutors are often skilled in their field,
but regularly and gravely fail to convey the meaning behind the
definitions, theorems and proofs they teach (only the items themselves),
and the educational process plummets.
Arbie:

"Around 0, sin(x) looks like the line x."

I believe this means the line y = x. Thus y_1 = sin(x), y_2 = x and y_1
~= y_2 for x -> 0.
11. Kalid says:
@Arbie: Wow, I like that functional representation of it! Yes, integrations
are a general "applying one function to another", vs. some static
multiplication just to find area (area just limits our creativity/intuition I
think).
Ah, I should be more clear about that: I meant the line "y = x", that
is, a 45 degree line extending from the origin. So the equation y = sin(x)
looks very similar to y = x for very small numbers (sin(x) extends 45
degrees from the origin when it first starts off).
Hope this helps!
12. Kalid says:
@mcmlxxxvi: Glad you enjoyed it, and thanks for the comment. I too
wish there was more emphasis on true understanding vs. the "let's learn
enough to pass the next test" mentality. Learning the intuition may take
a bit longer than memorizing in the short term, but in the long run it
gives you a more flexible set of knowledge, and not to mention it's way
more fun. I sometimes see grades as a curse because rather than being
an indication of knowledge, they become an end in itself vs. the learning
it should represent. It's very hard to test intuition: it's a gut-check you
need to ask yourself. But with no grades there's no incentive (carrot or
stick). I don't know the answer, but I too wish there was another way.
13. Anonymous says:
This paper offers similar views about mathematics education as well as a
criticism of the cultural opinion of mathematics that you might
like. http://maadotorg/devlin/devlin_03_08.html
14. Kalid says:
@Anonymous: Thank you, I've seen the essay and really like it :).
15. asdf says:
Smooth Infinitesimal Analysis handles infinitesimals better than
Non-Standard Analysis:

http://en.wikipedia.org/wiki/Smooth_infinitesimal_analysis

In intuitionistic math, the law of excluded middle is rejected (i.e. not not
A doesn't imply A) so you must provide an algorithm for constructing all
your objects.
There is no general procedure for detecting whether or not 2 objects are
equal. You must explicitly provide an algorithm for showing 2 objects are
equal.
The trichotomy law (a < b, a = b, or a > b) doesn't hold in general.
All functions are continuous. Piecewise functions are nonsensical.
In other words, the continuum is unbreakable into points. Functions
transform the continuum onto the continuum.
With this as our basis, Smooth Infinitesimal Analysis introduces an object
called epsilon.
There is no algorithm to tell whether or not epsilon != 0 or epsilon = 0.
This avoids the first problem entirely.
epsilon^2 = 0 though, which gives us a way to get rid of them from our
formulas.
So I view infinitesimals as the glue that makes the continuum
unbreakable and there is no algorithm to decide if the expression
epsilon = 0 or epsilon != 0 is true (see why we have to reject the law of
excluded middle to make this work?).
16. Kalid says:
@asdf: Wow, really interesting stuff! I like that insight of infinitesimals as
the glue that makes the continuum unbreakable. Great analogy.
17. Dave says:
Hey, Kalid, I've just got a quick question to ask.
If you learn calculus via the use of infinitesimals, is it possible to then
make the leap over to using limits? While I doubt it would happen, I'd like
to be an amateur mathematician in the vein of Fermat some time and
develop proofs (more as a beauty thing, to be honest), but writing in a
fashion that is contrary to the norm is rather like handing out Spanish
pamphlets in an English neighborhood: they might understand, but they
won't like it.
So, yeah, can you jump from infinitesimals over to limits? From what I can
tell, limits are mainly used because they're easy to rigorously define and
to keep the constructivist camp from yelling at you.
18. Kalid says:
@Dave: Great question. I can't say I'm completely comfortable with
limits, but I think you can jump back and forth (the Keisler Calculus book
has some examples like this I believe). I think the bigger goal is to figure
out what is being said, i.e. "What does this equation equal, within some
level of tolerance?". Limits and infinitesimals are two ways to define that
tolerance threshold, but infinitesimals are easier in that it's "built in"
(and you don't need to explicitly define epsilon, delta, etc.).
19.

werterber says:

Hello, I have a silly question. How to intuitively explain that cos x/x is
undefined?
Here is the graph: http://www.wolframalpha.com/input/?i=Plot%5B{cos%5Bx%5D%2C+x}%2C+{x%2C+-1.0%2C+1.0}%5D
thx
20.

Kalid says:

@werterber: Not a silly question at all! In my head, it's saying "what's the
ratio of width [cos(x)] to distance traveled (x)?"
As our distance traveled goes to 0 (we aren't moving from the starting
point), cos(x) tends towards 1 -- we're pretty much at the same width. So
it becomes 1 / 0 in my head.
21.

Kostya says:

This comes down to this: we can't possibly describe what we can't
possibly imagine. That's why it must always be "small enough
rectangles" of a sort.
Interestingly, Brian Greene in his Elegant Universe gives us to understand
that superstring theory, along with the expected resolution of some
fundamental problems, must bring about radical change in mathematical
modes, so that you can't decrease the size of those "small rectangles"
down to infinity, but that it must have its limit somewhere around the
level of the Planck constant, ~10^-34. After which further decrease will
actually mean increase.
Now every theory serves for some convenience. Therefore, aren't we free
to take such approximation with those rectangles, as will serve our
purpose the best? And not bother any more than we can help? Cause
that's what we do anyway.
22.

Anonymous says:

Hey, Kalid. You hold a marvelous escape valve from the mountains of
unintuitive theorems and corollaries contained in every textbook. Outside,
our memory rests in peace, and the big picture awakes our deep passions
about math. Oh, precious and full of insight escape valve.

23.

kalid says:

@Anon: Thanks :).


24.

Tue Nguyen says:

Thanks Kalid,
Your articles did help me a lot.
By the way, what software do you use to illustrate examples in your
articles (like this one)? Thanks
25.

kalid says:

Thanks Tue, I use PowerPoint 2007 to make the diagrams.


26.

skrsccrfrk says:

Hey khalid plz i am getting a doubt !


You said that infinitesimal are the values which we cannot measure ! My
question is can we imgaine infinitesimal ?? According to me , humans can
think only of finite values .so whenever we try to assign a value to
infinitesimal it woud be of finite digits and tat would be against the
defination of infinitesimal So according to me due to the limited
scope of human brain we can never think of what value wud be of
infinitesimal Am i correct plz ????
27.

skrsccrfrk says:

Hi khalid plz i hav a doubt ??


My question is can we think about wat number would be infinitesimal ??
According to me we humans hav a limited horizon of thinking and so we
can just think of finit numbers. So even if we assign any value to
infinitesimal it would be some fiite value and a value smaller than it will
still exist.. So is the limit which we are talking about is the limit of our
brains to comprehend such small amounts ??? Plz help ??
28.

andy521 says:

@kalid
@ kalid
Hi kalid plz i hav a doubt ?? My question is can we think about wat
number would be infinitesimal ?? According to me we humans hav a
limited horizon of thinking and so we can just think of finit numbers.
So even if we assign any value to infinitesimal it would be some fiite
value and a value smaller than it will still exist.. So is the limit which
we are talking about is the limit of our brains to comprehend such small
amounts ??? Plz help ??
29.

prashant sharma says:

very nice, i loved the way, you taught us. Very interesting!
30.

Eric V says:

@Dave
post 17
Regarding your question, "If you learn calculus via the use of
infinitesimals, is it possible to then make the leap over to using limits?", I
suppose it is possible, for I have (in a way) done it, though I never knew I
was learning infinitesimals.
I must admit that prior to reading this post I have never even heard
about the defined mathematic concept of infinitesimal. I also never took
a formal Calculus course. I originally learned Calc in my AP physics class
in high school. Our teacher (one of the few who truly loved the craft of
teaching and had a passion for what she did) had both the constraint of
putting her Physics class on hold to teach Calc to those who have never
seen it, and also the freedom that brevity provided; she was free to teach
the idea of calculus without the strict procedural rigor that a formal class
drags its pupil through. We learned the basic idea of the integral before
the derivative, heresy in Calc101. Here it is 21 years later and I can still
hear her voice saying "Taking the integral just means add up a whole
bunch of things, and taking a differential element of just means cut the
thing into really teenie weenie chunks." We learned the idea of a
derivative as slope of a function without being given 2 points, just one
point and an interval to the next. After seeing what happened as the
interval got smaller we finally visualized slope at a point. Only afterward
were we shown the official formula with a limit in it. I saw it as a
perfectly nice piece of legal-eez that made the rest of the world happy for
me to have learned the right way, and I was enormously grateful our
teacher taught us the intuitive way.
31.

Eric V says:

Fascinating article Kalid!


This is something new for me. After reading this post I started some
research on infinitesimals, and quickly re-affirmed how valuable your
common sense approach is by comparison to an army of equations,
lemmas, and theorems.
My great a-ha moment was your description of infinitesimals as another
dimension, similar to the way imaginary numbers are another dimension
to reals. In a strange way, that may not be obvious at first, it reminded
me of a conundrum I faced learning the history of physics. It seems that
every time we define what an element is -- the smallest indivisible
component of a thing -- some clever lad comes along later and figures out
a way to break that element into something smaller. This means, of
course, that the old thing never was a true element, we just thought it
was. But then what about this new element, how can we know it is the
smallest thing?
A revelation came when I realized that in order to be an element we
don't really need it to be true that you can't break it apart, it just means
that if you do break it down further then it is no longer the same stuff.
Thus the element is really just the smallest possible piece of a thing
WHICH can still be the same thing. E.g. an element of water (H2O) can
be broken down, but it is no longer water, just hydrogen and oxygen
atoms. An atom can be broken down into protons, neutrons, and
electrons, but it is no longer the same stuff as the original atom. A little
chunk of matter (a superstring exhibiting one class of vibration in 10
dimensions) can indeed be broken down, it is just no longer matter. It is
also not exactly energy, but when the stuff comes back together in a
different pattern (the superstring having the same vibration just in a
different dimension) it appears to us as a little chunk of energy.
It seems natural to me to take a cue from the physical world to
comprehend numbers. When we look at an element and it appears we've
hit the limit in terms of breaking it up, but we can go further, it just
means we have to view it in a different dimension. Why then could we
not do the same with numbers? Here's a rational number; you can
break it apart only to a certain extent and no smaller. I know you may
object and say take that number and divide by 2, it is smaller and still
rational. But take notice of the irrationals, like sqrt(2). It does exist,
sitting there staring us in the face. It is in between rationals. So how does
there exist any space between rationals? How can the rationals be
broken down finer than it is possible to break them down? Imagine
thinking you understand that atoms are elementary particles, then this
clown Rutherford comes along and experimentally identifies this object
(nucleus) in the middle of an atom.
I say the best way forward is to take as true those things that must be
true and re-evaluate our preconceived notions that have pigeon-holed us
into an apparent paradox. It is difficult and un-nerving. You can be
guaranteed you'll get it wrong a few times before you make some
progress, but some progress is far better than the certainty of smaller
minds.
32.

kalid says:

Thanks Eric, that's a really thought-provoking comment.
I think the element analogy is apt: we're able to function at a certain
level (water molecules) and while we *can* go to a deeper level
(individual subatomic particles) those details presumably don't change
the measurements we're making at the macro level. In the same way,
infinitesimals can bounce around in funny ways but not affect the
numbers one level up. (I.e., when we switch domains, the infinitesimal
part goes to 0.)
Trying to fill in the number line with rationals is another great example.
We have a smooth continuum on the number line, but the rationals are
so sparse they'll never complete it! There must be another way to get to
those in-between numbers, and it isn't by dividing the ones we have into
smaller bits.
33.

John Briggs says:

Kalid
Leibniz and Newton originated calculus in the 17th century, long before
imaginary numbers were around. Can't we just say that limits are
paradoxical but they work and leave it at that?

A Calculus Analogy: Integrals as Multiplication


by Kalid Azad 111 comments

Integrals are often described as finding the area under a curve. This
description is too narrow: it's like saying multiplication exists to find the area of
rectangles. Finding area is a useful application, but not the purpose. Integrals
help us combine numbers when multiplication can't.
I wish I had a minute with myself in high school calculus:
"Psst! Integrals let us 'multiply' changing numbers. We're used to "3 x 4 = 12",
but what if one quantity is changing? We can't multiply changing numbers, so
we integrate.
You'll hear a lot of talk about area -- area is just one way to visualize
multiplication. The key isn't the area, it's the idea of combining quantities into
a new result. We can integrate ("multiply") length and width to get plain old
area, sure. But we can integrate speed and time to get distance, or length,
width and height to get volume.
When we want to use regular multiplication, but can't, we bring out the big
guns and integrate. Area is just a visualization technique, don't get too caught
up in it. Now go learn calculus!"
That's my aha moment: integration is a "better multiplication" that works on
things that change. Let's learn to see integrals in this light.
Understanding Multiplication

Our understanding of multiplication changed over time:


With integers (3 x 4), multiplication is repeated addition
With real numbers (3.12 x sqrt(2)), multiplication is scaling
With negative numbers (-2.3 * 4.3), multiplication is flipping and scaling
With complex numbers (3 * 3i), multiplication is rotating and scaling
We're evolving towards a general notion of "applying" one number to another,
and the properties we apply (repeated counting, scaling, flipping or rotating)
can vary. Integration is another step along this path.
Understanding Area

Area is a nuanced topic. For today, let's see area as a visual representation of
multiplication:

With each count on a different axis, we can "apply them" (3 applied to 4) and
get a result (12 square units). The properties of each input (length and length)
were transferred to the result (square units).
Simple, right? Well, it gets tricky. Multiplication can result in "negative area" (3
x (-4) = -12), which doesn't exist.
We understand the graph is a representation of multiplication, and use the
analogy as it serves us. If everyone were blind and we had no diagrams, we
could still multiply just fine. Area is just an interpretation.
Multiplication Piece By Piece

Now let's multiply 3 x 4.5:

What's happening? Well, 4.5 isn't a count, but we can use a "piece by piece"
operation. If 3x4 = 3 + 3 + 3 + 3, then
3 x 4.5 = 3 + 3 + 3 + 3 + 3x0.5 = 3 + 3 + 3 + 3 + 1.5 = 13.5
We're taking 3 (the value) 4.5 times. That is, we combined 3 with 4 whole
segments (3 x 4 = 12) and one partial segment (3 x 0.5 = 1.5).

We're so used to multiplication that we forget how well it works. We can break
a number into units (whole and partial), multiply each piece, and add up the
results. Notice how we dealt with a fractional part? This is the beginning of
integration.
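Here is a minimal sketch of that "piece by piece" idea in Python. The function name `piecewise_multiply` is made up for illustration, and it assumes the count is not negative: it adds the value once per whole segment, then adds a scaled partial segment.

```python
def piecewise_multiply(value, count):
    """Combine `value` with `count` segments: whole pieces first, then the partial one."""
    whole_units = int(count)              # 4 whole segments in 4.5
    partial_unit = count - whole_units    # 0.5 of a segment left over

    total = 0.0
    for _ in range(whole_units):
        total += value                    # 3 + 3 + 3 + 3
    total += value * partial_unit         # 3 x 0.5 = 1.5
    return total


print(piecewise_multiply(3, 4.5))  # 13.5
```

The partial piece still uses an ordinary multiplication, which is the point: each piece is easy, and the total is just the pieces added up.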
The Problem With Numbers

Numbers don't always stay still for us to tally up. Scenarios like "You drive
30mph for 3 hours" are for convenience, not realism.
Formulas like "distance = speed * time" just mask the problem; we still need to
plug in static numbers and multiply. So how do we find the distance we went
when our speed is changing over time?
Describing Change

Our first challenge is describing a changing number. We can't just say "My
speed changed from 0 to 30mph". It's not specific enough: how fast is it
changing? Is it smooth?
Now let's get specific: every second, I'm going twice that in mph. At 1 second,
I'm going 2mph. At 2 seconds, 4mph. 3 seconds is 6mph, and so on:

Now this is a good description, detailed enough to know my speed at any
moment. The formal description is "speed is a function of time", and means we
can plug in any time (t) and find our speed at that moment ("2t" mph).
(This doesn't say why speed and time are related. I could be speeding up
because of gravity, or a llama pulling me. We're just saying that as time
changes, our speed does too.)
So, our multiplication of "distance = speed * time" is perhaps better written:

$$\text{distance} = \text{speed}(t) \cdot t$$
where speed(t) is the speed at any instant. In our case, speed(t) = 2t, so we
write:

$$\text{distance} = 2t \cdot t$$
But this equation still looks weird! "t" still looks like a single instant we need to
pick (such as t=3 seconds), which means speed(t) will take on a single value
(6mph). That's no good.
With regular multiplication, we can take one speed and assume it holds for the
entire rectangle. But a changing speed requires us to combine speed and time
piece-by-piece (second-by-second). After all, each instant could be different.
This is a big perspective shift:
Regular multiplication (rectangular): Take the amount of distance moved
in one second, assume it's the same for all seconds, and "scale it up".
Integration (piece-by-piece): See time as a series of instants, each with
its own speed. Add up the distance moved on a second-by-second basis.
We see that regular multiplication is a special case of integration, when the
quantities aren't changing.
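A rough sketch of the two perspectives in Python, assuming the running example speed(t) = 2t. The helper name `integrate` is hypothetical, and units are left abstract (the article mixes mph and seconds, so read the totals as "speed units x time units"):

```python
def speed(t):
    """The article's example: speed is twice the elapsed time."""
    return 2 * t


def integrate(f, start, end, pieces):
    """Piece-by-piece multiplication: add up f(position) * width for every piece."""
    dt = (end - start) / pieces          # width of each piece of time
    total = 0.0
    for i in range(pieces):
        t = start + i * dt               # position: where this piece begins
        total += f(t) * dt               # value at that position, times the width
    return total


coarse = integrate(speed, 0, 3, 3)                # second-by-second: 0*1 + 2*1 + 4*1
fine = integrate(speed, 0, 3, 3000)               # millisecond-by-millisecond
constant = integrate(lambda t: 30, 0, 3, 3)       # an unchanging speed of 30

print(coarse)    # 6.0
print(fine)      # close to 9
print(constant)  # 90.0, the same as plain 30 * 3
```

With a constant speed the piece-by-piece sum collapses to ordinary multiplication, which is the "special case" described above.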
How large is a "piece"?

How large is a "piece" when going piece by piece? A second? A millisecond? A
nanosecond?
Quick answer: Small enough where the value looks the same for the entire
duration. We don't need perfect precision.
The longer answer: Concepts like limits were invented to help us do piecewise
multiplication. While useful, they are a solution to a problem and can distract
from the insight of "combining things". It bothers me that limits are introduced
in the very start of calculus, before we understand the problem they were
created to address (like showing someone a seatbelt before they've even seen
a car). They're a useful idea, sure, but Newton seemed to understand calculus
pretty well without them.
What about the start and end?

Let's say we're looking at an interval from 3 seconds to 4 seconds.


The speed at the start (3x2 = 6mph) is different from the speed at the end
(4x2 = 8mph). So what value do we use when doing "speed * time"?
The answer is that we break our pieces into small enough chunks (3.00000 to
3.00001 seconds) until the difference in speed from the start and end of the
interval doesn't matter to us. Again, this is a longer discussion, but "trust me"
that there's a time period which makes the difference meaningless.
On a graph, imagine each interval as a single point on the line. You can draw a
straight line up to each speed, and your "area" is a collection of lines which
measure the multiplication.
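To see the start-versus-end difference fade, here is a small Python sketch (the function name `left_and_right_sums` is an assumption for illustration). It totals the interval from 3 to 4 seconds twice: once using the speed at the start of each piece, once using the speed at the end.

```python
def speed(t):
    return 2 * t


def left_and_right_sums(f, start, end, pieces):
    """Total f * dt over [start, end], using start-of-piece and end-of-piece values."""
    dt = (end - start) / pieces
    left = sum(f(start + i * dt) * dt for i in range(pieces))
    right = sum(f(start + (i + 1) * dt) * dt for i in range(pieces))
    return left, right


for pieces in (1, 10, 1000):
    left, right = left_and_right_sums(speed, 3, 4, pieces)
    print(pieces, left, right, right - left)
```

With one piece the two answers are 6 and 8. As the pieces shrink, both totals squeeze toward 7 and the gap between them becomes too small to care about.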
Where is the "piece" and what is its value?

Separating a piece from its value was a struggle for me.


A "piece" is the interval we're considering (1 second, 1 millisecond, 1
nanosecond). The "position" is where that second, millisecond, or nanosecond
interval begins. The value is our speed at that position.
For example, consider the interval 3.0 to 4.0 seconds:
"Width" of the piece of time is 1.0 seconds
The position (starting time) is 3.0
The value (speed(t)) is speed(3.0) = 6.0mph
Again, calculus lets us shrink down the interval until we can't tell the difference
in speed from the beginning and end of the interval. Keep your eye on the
bigger picture: we are multiplying a collection of pieces.
Understanding Integral Notation

We have a decent idea of "piecewise multiplication" but can't really express it.
"Distance = speed(t) * t" still looks like a regular equation, where t and
speed(t) take on a single value.
In calculus, we write the relationship like this:

$$\text{distance} = \int \text{speed}(t) \, dt$$

The integral sign (s-shaped curve) means we're multiplying things piece-by-piece
and adding them together.
dt represents the particular "piece" of time we're considering. This is
called "delta t", and is not "d times t".
t represents the position of dt (if dt is the span from 3.0-4.0, t is 3.0).
speed(t) represents the value we're multiplying by (speed(3.0) = 6.0).
I have a few gripes with this notation:
The way the letters are used is confusing. "dt" looks like "d times t" in
contrast with every equation you've seen previously.
We write speed(t) * dt, instead of speed(t_dt) * dt. The latter makes it
clear we are examining "t" at our particular piece "dt", and not some
global "t"
You'll often see $\int \text{speed}(t)$, with an implicit dt. This makes it easy to forget
we're doing a piece-by-piece multiplication of two elements.
It's too late to change how integrals are written. Just remember the higher-level
concept of 'multiplying' something that changes.
Reading In Your Head

When I see

$$\text{distance} = \int \text{speed}(t) \, dt$$
I think "Distance equals speed times time" (reading the left-hand side first) or
"combine speed and time to get distance" (reading the right-hand side first).
I mentally translate "speed(t)" into speed and "dt" into time and it becomes a
multiplication, remembering that speed is allowed to change. Abstracting
integration like this helps me focus on what's happening ("We're combining
speed and time to get distance!") instead of the details of the operation.
Bonus: Follow-up Ideas

Integrals are a deep idea, just like multiplication. You might have some follow-up
questions based on this analogy:
If integrals multiply changing quantities, is there something to divide
them? (Yes -- derivatives)
And do integrals (multiplication) and derivatives (division) cancel? (Yes,
with some caveats).
Can we re-arrange equations from "distance = speed * time" to "speed =
distance/time"? (Yes.)
Can we combine several things that change? (Yes -- it's called multiple
integration)
Does the order we combine several things matter? (Usually not)
Once you see integrals as "better multiplication", you're on the lookout for
concepts like "better division", "repeated integration" and so on. Sticking with
"area under the curve" makes these topics seem disconnected. (To the math
nerds, seeing "area under the curve" and "slope" as inverses asks a lot of a
student).
Reading integrals

Integrals have many uses. One is to explain that two things are "multiplied"
together to produce a result.
Here's how to express the area of a circle:

$$\text{Area} = \int 2 \pi r \, dr$$
We'd love to take the area of a circle with multiplication. But we can't -- the
height changes as we go along. If we "unroll" the circle, we can see the area
contributed by each portion of radius is "radius * circumference". We can write
this relationship using the integral above. (See the introduction to calculus for
more details).
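As a sanity check on the "unrolled rings" picture, here is a sketch in Python that adds up ring after ring, each contributing circumference times a sliver of radius. The function name and the ring count are assumptions chosen for illustration.

```python
import math


def circle_area_by_rings(radius, rings):
    """Add up thin rings: each contributes (circumference at r) * (sliver of radius)."""
    dr = radius / rings                   # thickness of each ring
    total = 0.0
    for i in range(rings):
        r = i * dr                        # position: inner edge of this ring
        circumference = 2 * math.pi * r   # the value at that position
        total += circumference * dr
    return total


approx = circle_area_by_rings(1.0, 100_000)
exact = math.pi * 1.0 ** 2

print(approx, exact)  # the two agree to several decimal places
```

No single multiplication gives the area, but many small "circumference * dr" multiplications land on pi * r^2.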
And here's the integral expressing the idea "mass = density * volume":

$$\text{mass} = \int_V \rho(r) \, dv$$

What's it saying? Rho ($\rho$) is the density function -- telling us how dense a material
is at a certain position, r. dv is the bit of volume we're looking at. So we
multiply a little piece of volume (dv) by the density at that position, $\rho(r)$,
and add them all up to get mass.
We'd love to multiply density and volume, but if density changes, we need to
integrate. The subscript V is a shortcut for "volume integral", which is
really a triple integral for length, width, and height! The integral involves four
"multiplications": 3 to find volume, and another to multiply by density.
We might not solve these equations, but we can understand what they're
expressing.
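For the curious, the mass integral can be sketched numerically too. Everything specific here is an assumption for illustration: the box shape, the made-up density function (1 + x), and the choice to sample each little block at its center rather than its corner.

```python
def density(x, y, z):
    """Hypothetical density: the material gets heavier toward larger x."""
    return 1.0 + x


def mass_of_box(density_fn, length, width, height, steps):
    """Triple sum: density at each little block, times that block's volume."""
    dx, dy, dz = length / steps, width / steps, height / steps
    dv = dx * dy * dz                     # three multiplications give the volume piece
    total = 0.0
    for i in range(steps):
        for j in range(steps):
            for k in range(steps):
                # Sample at the center of the block (a choice, not a requirement)
                x, y, z = (i + 0.5) * dx, (j + 0.5) * dy, (k + 0.5) * dz
                total += density_fn(x, y, z) * dv   # the fourth multiplication
    return total


mass = mass_of_box(density, 1.0, 1.0, 1.0, 20)
print(mass)  # about 1.5 for this made-up density
```

The three nested loops are the "triple integral", and the line inside them is the density-times-volume multiplication, done one block at a time.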
Onward and upward

Today's goal isn't to rigorously understand calculus. It's to expand our mental
model, and realize there's another way to combine things: we can add,
subtract, multiply, divide... and integrate.
See integrals as a better way to multiply: calculus will become easier, and
you'll anticipate concepts like multiple integrals and the derivative. Happy
math.
1. Matt says:
@Frank:
It might make more sense if you imagine dividing up the area between
the x-axis and the function y=x into many vertical rectangles, and adding
up their areas. The more rectangles you use, the better the
approximation of the area. The idea behind integration is that if I divide
up the area into infinitely many rectangles with infinitely small width, no
matter how far you zoom in, you'll never see the difference between
the "real" shape (which is triangular) and my "approximated" shape
(which is composed of many rectangles). So it's reasonable to say that
the area is in fact the same. Now how exactly do we add up infinitely
many infinitely small things to get a real number? Uh... Kalid?!?
2. Kalid says:
@Peter: Cool, I'll check it out!
@Matt: Thanks for the comment! One of the hardest parts is getting my
head around the idea of "accurate enough".
Here's how I think about it. In real life, we hit this all the time: A screen
image is a grid of pixels, yet we can see perfectly smooth shapes like
curves, circles, faces, etc. Similarly, inkjet printers spray a matrix of dots
on a paper, but to us it looks like a smooth unbroken image or line.
The key is realizing that the approximation is only an approximation at
that higher level of accuracy -- at the level that we work at, it appears
indistinguishable from the real thing. Calculus helps formalize some of
these ideas with limits (informally, two numbers that have a difference
less than our error margin appear the same to us).
Unfortunately, we don't really talk about this much, and we sometimes
say numbers are equal, and sometimes say they aren't. There's a notion
of infinitely small numbers which makes this clearer, and is used in
physics. That is, you can talk about how infinitely small numbers interact
with each other, and with infinity, to give numbers we can detect. A poor
analogy but it may work: A caveman could probably not conceive of an
individual atom, or the gargantuan Avogadro's number (6 x 10^23), but
when this tiny particle and huge number combine we can get something
we can detect.
The key is writing this idea down in the language of math: numbers that
are too small and too large for us to detect can interact to give us
numbers we can work with.
3. ram says:
Hi Kalid,
your explanations of the underlying concepts of mathematics do bring
the subject to a democratic level, a level on which people communicate,
collaborate and work towards making the subject useful for greater
number of people.
Now coming to the subject, cud i say dat, differentiation is inverse of
integration?
and going by dat if i have to apply differentiation, lets say on the
example of circle, all i have to do is to run a playback, i.e. to peel of all
those tiny rings (or, in other words) thus divide the circle into the tiniest
possible rings.
Once i m done peeling I would measure this ring, to see the result of
differentiation application, which should be 2*pi*r.
BTW, would not I b applying multiplication again, to measure that tiniest
ring, i.e. finding the area of that ring, pi*r^2?
4. Arbie Samong says:
One way I understood the basic integral notation is with my crude
understanding of sets and functional programming. Using the given
example above (speed and time):
There's an implied set of values of time, and we take a piece of it or a
member of that set. That becomes the slice of time. We then apply it to a
function of time that is speed. This results in another set whose members
are results from each function result using the said function given a slice
(or element of implied set of time) as input. Finally, we apply the
integrate operator, or probably a map to the integrate function; or, to
put it simply, use the integrate function on all members of the resulting
set to return the integrated value.

or something like:
map(integrate, (getSpeed(t) | t <= time_slices))
5. bill says:
note to extend idea by Kalid above:
circumference of a circle (a 1d distance) = 2 pi r
area of a circle (a 2d area) = pi r ^ 2
(note the integral / derivative of each other.)
surface area of a sphere = 4 pi r ^ 2
volume of a sphere = 4/3 pi r ^3
try this with squares and cubes -- hint, base it on the shortest distance
from the centre to a side.
how cool is that!

Calculus: Building Intuition for the Derivative


by Kalid Azad 64 comments

How do you wish the derivative was explained to you? Here's my take.
Psst! The derivative is the heart of calculus, buried inside this definition:

$$f'(x) = \lim_{dx \to 0} \frac{f(x + dx) - f(x)}{dx}$$
But what does it mean?


Let's say I gave you a magic newspaper that listed the daily stock market
changes for the next few years (+1% Monday, -2% Tuesday...). What could you
do?
Well, you'd apply the changes one-by-one, plot out future prices, and buy low /
sell high to build your empire. You could even hire away the monkeys who
currently throw darts at newspapers.
Others call the derivative "the slope of a function" -- it's so bland! Like having
the magic newspaper, the derivative is a crystal ball that lets you see how a
pattern will play out. You can plot the past/present/future, find
minimums/maximums, and yes, staff your simian workforce to pick stocks.
Step away from the gnarly equation. Equations exist to convey ideas:
understand the idea, not the grammar.
Derivatives create a perfect model of change from an imperfect
guess.
This result came over thousands of years of thinking, from Archimedes to
Newton. Let's look at the analogies behind it.
We all live in a shiny continuum

Infinity is a constant source of paradoxes ("headaches"):


A line is made up of points? Sure.
So there's an infinite number of points on a line? Yep.
How do you cross a room when there's an infinite number of points to
visit? (Gee, thanks Zeno).
And yet, we move. My intuition is to fight infinity with infinity. Sure, there's
infinity points between 0 and 1. But I move two infinities of points per second
(somehow!) and I cross the gap in half a second.
Distance has infinite points, motion is possible, therefore motion is in terms of
"infinities of points per second".
Instead of thinking of differences ("How far to the next point?") we can
compare rates ("How fast are you moving through this continuum?").

It's strange, but you can see 10/5 as "I need to travel 10 'infinities' in 5
segments of time. To do this, I travel 2 'infinities' for each unit of time".
Analogy: See division as a rate of motion through a continuum of
points
What's after zero?

Another brain-buster: What number comes after zero? .01? .0001?


Hrm. Anything you can name, I can name smaller (I'll just halve your number...
nyah!).
Even though we can't calculate the number after zero, it must be there, right?
Like demons of yore, it's the "number that cannot be written, lest ye be
smitten".
Call the gap to the next number "dx". I don't know exactly how big it is, but it's
there!
Analogy: dx is a "jump" to the next number in the continuum.
Measurements depend on the instrument

The derivative predicts change. Ok, how do we measure speed (change in
distance)?
Officer: Do you know how fast you were going?
Driver: I have no idea.
Officer: 95 miles per hour.
Driver: But I haven't been driving for an hour!
We clearly don't need a "full hour" to measure your speed. We can take a
before-and-after measurement (over 1 second, let's say) and get your
instantaneous speed. If you moved 140 feet in one second, you're going
~95mph. Simple, right?
Not exactly. Imagine a video camera pointed at Clark Kent (Superman's alter-ego).
The camera records 24 pictures/sec (about 40ms per photo) and Clark seems
still. On a second-by-second basis, he's not moving, and his speed is 0mph.
Wrong again! Between each photo, within that 40ms, Clark changes to
Superman, solves crimes, and returns to his chair for a nice photo. We
measured 0mph but he's really moving -- he goes too fast for our instruments!
Analogy: Like a camera watching Superman, the speed we measure
depends on the instrument!
Running the Treadmill

We're nearing the chewy, slightly tangy center of the derivative. We need
before-and-after measurements to detect change, but our measurements
could be flawed.
Imagine a shirtless Santa on a treadmill (go on, I'll wait). We're going to
measure his heart rate in a stress test: we attach dozens of heavy, cold
electrodes and get him jogging.
Santa huffs, he puffs, and his heart rate shoots to 190 beats per minute. That
must be his "under stress" heart rate, correct?
Nope. See, the very presence of stern scientists and cold electrodes increased
his heart rate! We measured 190bpm, but who knows what we'd see if the
electrodes weren't there! Of course, if the electrodes weren't there, we
wouldn't have a measurement.
What to do? Well, look at the system:
measurement = actual amount + measurement effect
Ah. After lots of studies, we may find "Oh, each electrode adds 10bpm to the
heartrate". We make the measurement (imperfect guess of 190) and remove
the effect of electrodes ("perfect estimate").
Analogy: Remove the "electrode effect" after making your
measurement
By the way, the "electrode effect" shows up everywhere. Research studies
have the Hawthorne Effect where people change their behavior because they
are being studied. Gee, it seems everyone we scrutinize sticks to their diet!
Understanding the derivative

Armed with these insights, we can see how the derivative models change:

Start with some system to study, f(x):


1. Change by the smallest amount possible (dx)
2. Get the before-and-after difference: f(x + dx) - f(x)

3. We don't know exactly how small "dx" is, and we don't care: get the rate
of motion through the continuum: [f(x + dx) - f(x)] / dx
4. This rate, however small, has some error (our cameras are too slow!).
Predict what happens if the measurement were perfect, if dx wasn't
there.
The magic's in the final step: how do we remove the electrodes? We have two
approaches:
Limits: what happens when dx shrinks to nothingness, beyond any error
margin?
Infinitesimals: What if dx is a tiny number, undetectable in our number
system?
Both are ways to formalize the notion of "How do we throw away dx when it's
not needed?".
My pet peeve: Limits are a modern formalism, they didn't exist in Newton's
time. They help make dx disappear "cleanly". But teaching them before the
derivative is like showing a steering wheel without a car! It's a tool to help the
derivative work, not something to be studied in a vacuum.
An Example: f(x) = x^2

Let's shake loose the cobwebs with an example. How does the function f(x) =
x^2 change as we move through the continuum?
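The algebra can be sketched by following the four steps listed earlier (change by dx, take the difference, divide by dx, then drop dx). This is a reconstruction from the surrounding text rather than the original figure; the last two lines are the rate with dx still in it, and the rate once dx is set to 0:

```latex
\begin{align*}
f(x) &= x^2 \\
f(x + dx) &= (x + dx)^2 = x^2 + 2x \, dx + dx^2 \\
f(x + dx) - f(x) &= 2x \, dx + dx^2 \\
\frac{f(x + dx) - f(x)}{dx} &= 2x + dx \\
f'(x) &= 2x \qquad \text{(letting } dx = 0\text{)}
\end{align*}
```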

Note the difference in the last 2 equations:


One has the error built in (dx)
The other has the "true" change, where dx = 0 (we assume our
measurements have no effect on the outcome)
Time for real numbers. Here are the values for f(x) = x^2, with intervals of dx =
1:
1, 4, 9, 16, 25, 36, 49, 64...
The absolute change between each result is:
1, 3, 5, 7, 9, 11, 13, 15...
(Here, the absolute change is the "speed" between each step, where the
interval is 1)
Consider the jump from x=2 to x=3 (3^2 - 2^2 = 5). What is "5" made of?

Measured rate = Actual Rate + Error


5 = 2x + dx
5 = 2(2) + 1
Sure, we measured a "5 units moved per second" because we went from 4 to 9
in one interval. But our instruments trick us! 4 units of speed came from the
real change, and 1 unit was due to shoddy instruments (1.0 is a large jump,
no?).
If we restrict ourselves to integers, 5 is the perfect speed measurement from 4
to 9. There's no "error" in assuming dx = 1 because that's the true interval
between neighboring points.
But in the real world, measuring every 1.0 seconds is too slow. What if our
dx was 0.1? What speed would we measure at x=2?
Well, we examine the change from x=2 to x=2.1:
2.1^2 - 2^2 = 0.41
Remember, 0.41 is what we changed in an interval of 0.1. Our speed-per-unit
is 0.41 / 0.1 = 4.1. And again we have:
Measured rate = Actual Rate + Error
4.1 = 2x + dx
4.1 = 2(2) + 0.1
Interesting. With dx=0.1, the measured and actual rates are close (4.1 to 4,
2.5% error). When dx=1, the rates are pretty different (5 to 4, 25% error).
Following the pattern, we see that throwing out the electrodes (letting dx=0)
reveals the true rate of 2x.
In plain English: We analyzed how f(x) = x^2 changes, found an "imperfect"
measurement of 2x + dx, and deduced a "perfect" model of change as 2x.
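The numbers above can be reproduced with a few lines of Python. This is only a sketch: `rate_breakdown` is a made-up helper name, and it assumes we already know the actual rate is 2x so that the leftover can be labeled as error.

```python
def f(x):
    return x * x


def rate_breakdown(x, dx):
    """Split the measured rate of f(x) = x^2 into actual rate + error."""
    measured = (f(x + dx) - f(x)) / dx   # what the instrument reports
    actual = 2 * x                       # the model with dx removed
    error = measured - actual            # should come out close to dx
    return measured, actual, error


for dx in (1.0, 0.1, 0.01):
    measured, actual, error = rate_breakdown(2.0, dx)
    percent = 100 * error / actual
    print(f"dx={dx}: measured={measured:.4f}, actual={actual}, "
          f"error={error:.4f} ({percent:.2f}%)")
```

In each row the error should track dx itself, which is the "Measured rate = Actual Rate + Error" pattern from the text.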
The derivative as "continuous division"

I see the integral as better multiplication, where you can apply a changing
quantity to another.
The derivative is "better division", where you get the speed through the
continuum at every instant. Something like 10/5 = 2 says "you have a constant
speed of 2 through the continuum".
When your speed changes as you go, you need to describe your speed at each
instant. That's the derivative.
If you apply this changing speed to each instant (take the integral of the
derivative), you recreate the original behavior, just like applying the daily stock
market changes to recreate the full price history. But this is a big topic for
another day.
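For the discrete case, "apply the changes to recreate the original" can be sketched in a few lines. The helper names `changes` and `rebuild` are assumptions for illustration, and the data is the x^2 sequence used earlier (with 0 prepended as the starting point).

```python
from itertools import accumulate


def changes(values):
    """Step-to-step differences, like daily stock market changes."""
    return [b - a for a, b in zip(values, values[1:])]


def rebuild(start, deltas):
    """Apply each change in turn to recover the original sequence."""
    return list(accumulate(deltas, initial=start))


squares = [x * x for x in range(9)]   # 0, 1, 4, 9, ..., 64
deltas = changes(squares)             # 1, 3, 5, ..., 15
print(deltas)
print(rebuild(squares[0], deltas))
```

With a fixed interval of 1 the reconstruction is exact; the continuous version replaces the running sum with an integral.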
Gotcha: The Many meanings of "Derivative"

You'll see "derivative" in many contexts:

"The derivative of x^2 is 2x" means "At every point, we are changing by
a speed of 2x (twice the current x-position)". (General formula for
change)
"The derivative is 44" means "At our current location, our rate of change
is 44." When f(x) = x^2, at x=22 we're changing at 44 (Specific rate of
change).
"The derivative is dx" may refer to the tiny, hypothetical jump to the next
position. Technically, dx is the "differential" but the terms get mixed up.
Sometimes people will say "derivative of x" and mean dx.
Gotcha: Our models may not be perfect

We found the "perfect" model by making a measurement and improving it.
Sometimes, this isn't good enough -- we're predicting what would happen if dx
wasn't there, but added dx to get our initial guess!
Some ill-behaved functions defy the prediction: there's a difference between
removing dx with the limit and what actually happens at that instant. These
are called "discontinuous" functions, which is essentially "cannot be modeled
with limits". As you can guess, the derivative doesn't work on them because
we can't actually predict their behavior.
Discontinuous functions are rare in practice, and often exist as "Gotcha!" test
questions ("Oh, you tried to take the derivative of a discontinuous function,
you fail"). Realize the theoretical limitation of derivatives, and then realize
their practical use in measuring natural phenomena. Nearly every
function you'll see (sine, cosine, e^x, polynomials, etc.) is continuous.
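To see what "defying the prediction" can look like, here is a small sketch using a step function that jumps from 0 to 1 at x = 0. The function `step` and the helper `measured_rate` are assumptions picked for illustration, not something from the article.

```python
def step(x):
    """A function with a jump: 0 to the left of zero, 1 from zero onward."""
    return 0.0 if x < 0 else 1.0


def measured_rate(f, x, dx):
    """Before-and-after difference per unit of dx (dx may be negative)."""
    return (f(x + dx) - f(x)) / dx


for dx in (0.1, 0.01, 0.001):
    from_right = measured_rate(step, 0.0, dx)    # look just ahead of the jump
    from_left = measured_rate(step, 0.0, -dx)    # look just behind the jump
    print(f"dx={dx}: from right = {from_right}, from left = {from_left}")
```

Measured from the right the rate stays at 0, while from the left it keeps growing as dx shrinks, so the measurements never settle on one prediction.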
Gotcha: Integration doesn't really exist

The relationship between derivatives, integrals and anti-derivatives is nuanced
(and I got it wrong originally). Here's a metaphor. Start with a plate, your
function to examine:
Differentiation is breaking the plate into shards. There is a specific
procedure: take a difference, find a rate of change, then assume dx isn't
there.
Integration is weighing the shards: your original function was "this" big.
There's a procedure, cumulative addition, but it doesn't tell you what the
plate looked like.
Anti-differentiation is figuring out the original shape of the plate from the
pile of shards.
There's no algorithm to find the anti-derivative; we have to guess. We make a
lookup table with a bunch of known derivatives (original plate => pile of
shards) and look at our existing pile to see if it's similar. "Let's find the integral
of 10x. Well, it looks like 2x is the derivative of x^2. So... scribble scribble...
10x is the derivative of 5x^2.".

Finding derivatives is mechanics; finding anti-derivatives is an art. Sometimes
we get stuck: we take the changes, apply them piece by piece, and
mechanically reconstruct a pattern. It might not be the "real" original plate,
but is good enough to work with.
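A sketch of the two routes for the 10x example: mechanically adding up the pieces, versus using the guessed anti-derivative 5x^2. The helper `riemann_sum`, the midpoint rule, the slice count, and the interval 0 to 3 are all assumptions made for illustration.

```python
def riemann_sum(rate, a, b, n=100_000):
    """Integration as 'multiply and add': sum rate * dx over n thin slices,
    sampling each slice at its midpoint."""
    dx = (b - a) / n
    return sum(rate(a + (i + 0.5) * dx) for i in range(n)) * dx


def antiderivative(x):
    """The guessed original plate: 5x^2, whose derivative is 10x."""
    return 5 * x * x


mechanical = riemann_sum(lambda x: 10 * x, 0.0, 3.0)
recognized = antiderivative(3.0) - antiderivative(0.0)
print(mechanical, recognized)
```

Both routes should agree closely; the sum needs no cleverness, while the second route depends on having recognized the pattern first.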
Another subtlety: aren't the integral and anti-derivative the same? (That's what
I originally thought)
Yes, but this isn't obvious: it's the fundamental theorem of calculus! (It's like
saying "Aren't a^2 + b^2 and c^2 the same? Yes, but this isn't obvious: it's
the Pythagorean theorem!"). Thanks to Joshua Zucker for helping sort me out.
Reading math

Math is a language, and I want to "read" calculus (not "recite" calculus, i.e. like
we can recite medieval German hymns). I need the message behind the
definitions.
My biggest aha! was realizing the transient role of dx: it makes a
measurement, and is removed to make a perfect model. Limits/infinitesimals
are a formalism, we can't get caught up in them. Newton seemed to do ok
without them.
Armed with these analogies, other math questions become interesting:
How do we measure different sizes of infinity? (In some sense they're all
"infinite", in other senses the range (0,1) is smaller than (0,2))
What are the real rules about making "dx go away"? (How do
infinitesimals and limits really work?)
How do we describe numbers without writing them down? "The next
number after 0" is the beginnings of analysis (which I want to learn).
AK says:
I just wanted to let you know that I really appreciate the effort you put
into this. I only discovered this website a few days ago, and I've been
having a blast reading all those intuitive approaches!!
You should consider writing an elementary and high school book of
mathematics, as well as teaching on Khan Academy.
Please keep this flowing, and if there's any way we, the
audience, can support you, please do mention how!
kalid says:
@AK: Thanks for the comment, really appreciate the support! I'm
actually looking at ways to help tap into the community. One idea is
getting a little section after each post to share the analogies that worked
(or questions that are still outstanding). I'd love certain articles (like the
one on e, for example) to become a living reference about "What actually
made it click". Wikipedia is great for strict definitions, Khan and others
for detailed tutorials / practice problems, and I'd like to contribute aha!
moments (i.e. the last step that turned the light bulb on). Definitely
something I'm looking to develop, I'll be posting on this soon =).
Pat Shaughnessy says:
What a great explanation! It took me back to my days as a Physics major
in college, only I wish I had this explanation back then.
zaine_ridling says:
Wow, now that's power. Takes a brilliant mind to break complexity down,
making this one of the best sites online!
kalid says:
@Pat: Thanks, glad you liked it! Oh man, how I wish I could go back in
time and give myself some tutorials :).
@Zaine: Thanks, I really appreciate it!
kalid says:
Joshua Zucker emailed me after the comment form ate his reply, pasting
below:
Apparently my long comment on your recent post got eaten somewhere
along the line. Darn.
Anyway, my point was that you really misrepresent integrals. They're
easier than derivatives, not harder. It's antiderivatives that are
tough, and although the fundamental theorem says they're the same as
integrals, the whole point of the theorem is that there's something
meaningful to say there! Well, actually, antiderivatives aren't
really tough, it's just that we're picky about wanting to write them
in terms of certain kinds of functions, which is your "break lots of
plates" analogy. We know exactly how the pieces were made, so we can
just glue them back together. The hard part is recognizing the brand
name of the plate when we're done, not reassembling the plate.
You also seem inconsistent about saying in your intro that you can use
the rate of change to reconstruct the future prices, and then later
saying that putting the pieces back together is hard. Integrals are,
as you say, "better multiplication": you just have to multiply and
add.
There is lots and lots of good stuff in the post too, of course! I
particularly love the idea of the derivative as an inference of what
the perfect tool would measure, from approximations using imperfect
tools. I don't think I ever thought of it as a tool in quite that
sense, and it's a useful thing. I mean, I have thought of the
derivative at a point as a local property, and the derivative as an
operator that maps functions to functions, but this feels more like a
caliper that is open to some finite amount and then you're reducing
that amount to see what's going on; it captures more of the limit
process in there.
Oh, one more note: Oddly, I'm totally comfortable with the idea of dx
= "the next number right after 0", or the jump between adjacent real
numbers, but I am really bothered by the analogy of dividing 5
infinities by 2 infinities of points to get 5/2.
====
Hi Joshua,
Great feedback, I think the nuances of integrals vs. anti-derivatives
were previously lost on me :). After a little reading
(http://mathforum.org/library/drmath/view/53755.html) I think I'm up to
speed:
* Integration is literally the process of gluing the pieces together
(mechanical, finding the sum of many products)
* Anti-derivatives are the function whose derivative is f (i.e., the brand
as you say)
The essence of the FTOC (which I've previously missed) is that Integrals
are *computable* from anti-derivatives, which is pretty amazing. Literally
gluing pieces isn't hard, but saying "this reconstructed plate is an Ikea
Furjen" is the tricky part (realizing what function, easily defined, would
create such an integral).
"the idea of the derivative as an inference of what the perfect tool
would measure, from approximations using imperfect tools": I love this
concise description, that's exactly it. Yes, in this context it's like a little
caliper which is prodding, only to disappear again to help figure out a
greater result. The operator and local property / slope interpretations are
other ones to switch between. When writing this article, I was ruminating
on the purpose of limits, which always bothered me because they were
ignored so often in engineering classes (even though the derivative
wasn't!). In this case, limits were mathematical scaffolding.
The 5 vs 2 infinities doesn't quite sit right with me either; it's my gut
screaming for there to be some way to move through an infinitude of
points. My analysis knowledge is very limited, but perhaps something like
a Lebesgue measure could capture this notion (that 0-5 is a larger infinite
range than 0-2)? (http://en.wikipedia.org/wiki/Lebesgue_measure).
Really appreciate the discussion, I love refining these thoughts! I'll
update the article soon, as I get my intuitions in order.

===
Josh: I think a better analogy is this:
Integration is piling all the shards on a scale and reading the total.
Antidifferentiation is putting the shards carefully back together in
exactly the right order and recognizing the plate.
Ogbuka chukwuma says:
I don't understand cumulative frequency so well. Please help.
BASSMAN says:
I need more details on how to solve partial fractions and integrations.
Asmaul Hoque says:
It is really interesting. I have enjoyed it and learned a lot. Today I understood
what a derivative is. I had actually been searching for this, and you gave it to us. Thank you so
much. Please keep giving us this opportunity to learn math.
John Jordan says:
Khalid,
Long-time lurker, first-time poster. Firstly, just wanted to say congrats on
all your work here, really impressive. This is my favourite maths site on
the web; I see the seeds of an educational revolution here. Reminds me
of the time I got a weighty book, "Applying Maths in the Chemical and
Biological Sciences". I was hoping for an interesting novel, what I got was
almost pure grammar, i.e. I was looking for semantics but all I got was
syntax. Your articles explain the meaning, i.e. utility, of these abstract
notions. Your complex numbers article helped solve the riddle of how
imaginary numbers could be used in the real world, so thanks!
Like the (modified!) analogy for the distinction of integral and
antiderivative, which was yet another one of those esoteric relationships that
was never explored in high school; are you going to amend the original
article?
Regards,
John
kalid says:
@Bassman, Ogbuka: I'll take those as suggestions for future topics,
thanks.
@Asmaul: Glad it was helpful!
@John: Thanks for the note, really appreciate it! I hear you, so many
math explanations just focus on the grammar, like the lifeless language
classes that nobody ever seems to learn from (contrasted with learning a
language by actually being immersed in it and speaking it, vs. trying to
crunch through the rules like a computer).
I'm going to update the article right now with the new
integral/antiderivative analogy. Thanks again for posting!
Anonymous says:
This came at a pretty good time for me since its publication coincided
with my own autodidactic journey through math! I was fresh into
calc/derivatives when this came and I skimmed through, initially getting
about half of it. Then while walking my dogs today I got deep into
thinking about really understanding derivatives after a few plug and
chug sessions, and I began recalling what you had written (especially
regarding the actual rate+error part) and the superman analogy.
In retrospect it was a good thing I was walking in the barren woods
because the unconscious OOOOOOOOOOHHH! of my aha moment was
so loud. My dogs didn't seem to care though, they were busy pooping
and such.
Thank you, thank you, thank you!
Kalid says:
@Anonymous: Awesome, I'm glad the aha! came :). I'm planning on
making some changes to the site to help share and discuss the individual
aha! moments, really appreciate the note!
Sebastian Marquez says:
Khalid,
This is great! Derivatives were always out of focus to me but this is
helping clear things up.
Sebastian
kalid says:
@Sebastian: Thanks, glad it helped :).
just a kid says:
Hey Kalid, another great article!
But I noticed something; couldn't you just, instead of even doing all the
other math, just take the exponent of the original number, multiply the
number in front of it and then minus one from the exponent? If you didn't
get that, here's what I mean: the derivative of x^2=2*1(x)^(2-1), which
equates to 2x. It also works in the reverse of finding the original number
using the derivative: 10x^1;10x^(1+1)=10x^2; (10/2)x^2=5x^2.
Should I have put this here, or on your new aha moments and FAQ thingy?
Thank you,
Just a kid.
kalid says:
@just a kid: Thanks for the comment! For posting, either method is fine!
The aha!/FAQ thingy is a way to have longer discussions, since regular
wordpress comments don't have threading (and the discussions could get
hard to follow).
Your shortcut definitely works (take the exponent, decrease by one). It's
neat to see why this works: if we're taking the derivative of x^n (x raised
to some power), we make a model like this:
[(x + dx)^n - x^n] / dx
= [(x^n + Something * x^(n-1) * dx + Something2 * x^(n-2) * dx^2 +
...) - x^n] / dx
= Something * x^(n-1) + Something2 * x^(n-2) * dx + ...
Most of the other terms go away because we want dx to be zero (i.e.,
assume a perfect model). We're left with
Something * x^(n-1)
And what is the Something? Well, it's the number n (this is due to the
Binomial Theorem), more details
here: http://betterexplained.com/articles/how-to-understand-combinations-using-multiplication/
But yep, you got it: there's a shortcut to figure out how the derivative
of a regular polynomial (x^n) will behave :).
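To check that the "Something" really is n, here is a small sketch that lists the terms of [(x + dx)^n - x^n] / dx using binomial coefficients. The helper names `rate_terms` and `surviving_terms` are made up for illustration; `math.comb` is the standard-library binomial coefficient (Python 3.8+).

```python
from math import comb


def rate_terms(n):
    """Terms of [(x + dx)^n - x^n] / dx as (coefficient, power of x, power of dx).

    The Binomial Theorem gives (x + dx)^n = sum of comb(n, k) * x^(n-k) * dx^k.
    Subtracting x^n removes the k = 0 term; dividing by dx lowers each
    dx power by one.
    """
    return [(comb(n, k), n - k, k - 1) for k in range(1, n + 1)]


def surviving_terms(n):
    """Keep only the terms with no dx left in them (what remains when dx = 0)."""
    return [(coef, x_pow) for coef, x_pow, dx_pow in rate_terms(n) if dx_pow == 0]


for n in (2, 3, 5):
    coef, x_pow = surviving_terms(n)[0]
    print(f"derivative of x^{n}: {coef} * x^{x_pow}")
```

For n = 2 the full term list is 2x + dx, matching the earlier example; only the first term survives once dx is dropped.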
wm tanksley says:
This is a fair explanation of the theory behind derivatives; but I like how
Wilberger explains and motivates tangent curves (which are directly and
simply related to derivatives). Not only does he NOT use the idea of dx
(which doesn't actually exist in any system of numbers beyond the
integers, since there is no unique number that is closest to zero), but he
winds up defining the theory so that it works on arbitrary algebraic
curves (not only functions).
Check it out: look at his (njwilberger's) Math Foundations series on
YouTube. Most people reading here will be able to skip to something like
the episode on doing calculus on the unit circle, but don't expect to
understand EVERYTHING if you do that. The interesting thing is that he
defines this without using limits at all; the essential point is that he uses
the nth degree polynomial that best approximates the surface at that
point (of course, this is the Taylor expansion at that point).

-Wm
kalid says:
@wm: Thanks for the pointer! I'll check it out.
STILL LEARNING says:
Really great stuff. Mathematics is the foundation of all science and
science is the compass to help us navigate the universe. Keep up the
good work. Very much appreciated.
kalid says:
@Still learning: Thanks, really appreciate the encouragement!
Sudar says:
Hi Khalid,
Great article. I have always been fascinated by calculus and always
wanted to decipher the true meaning of derivative. Your article gives me
a great insight. However I would beg you to clarify the following
confusion that has arisen.
We all know that the derivative of y = x^2 is 2x. When you calculate values
of y for x=2 and 3, you get y = 4 and 9 respectively. The change in y
here is 9-4 = 5. However, if I substitute x = 2 in the derivative function
dy/dx it gives me 2x = 4. You showed us why this difference exists. It is
because of the dx factor (shoddy instrument). But the reality is that y
changed by 5 units when x changed from 2 to 3. Are you saying that
dy/dx, or the derivative, is not here to calculate the rate of change for such large
changes, and that if you use it for large changes the results are inaccurate? Does
that mean that dy/dx can only be used to calculate very small changes?
Earlier I thought that if you want to find how a function f(x) is changing w.r.t. x
between 2 values without substituting the values, you just calculate the
derivative and substitute x, but it seems I was wrong?
Also I didn't understand when you say
"The derivative is 44" means "At our current location, our rate of change is
44."
Change is a relative term. How can there be a change at a current
location? It has always got to be between two locations.
here says:
Heya, I just hopped over to your website through StumbleUpon. Not
something I would typically browse, but I enjoyed your thoughts
nonetheless. Thank you for making something worth reading through.
wm tanksley says:
Sudar, I understand your confusion.

Your last paragraph is the most important. The differential MIGHT
be understood as the rate of change at a single point, but it also might be
confusing if you think of it like that. It's important that you see that the
differential is not the same thing as the difference. The difference requires two
different points; the differential takes only one point.
Another way to think of the differential is that it's the slope of the line
that best approximates the curve at that point. This definition gives you
some surprising algebraic power, and it also suggests some other
operations, such as the linear subderivative, which is the _line_ that
best approximates the curve, and from that the quadratic subderivative
(and so on). These are very cleanly defined operations on algebraic
curves, and require only algebra, no analysis or limits.
-Wm
Nikhil Panikkar says:
The derivative is a concept that relates a continuous property (average
change) to a discrete one (instantaneous change). Even if one had a
perfect instrument to measure instantaneous change, one wouldn't be
able to because of our conception (and consequent definition) of speed.
To properly understand a derivative you would need the concept of a
limit. Limits are to calculus what de Broglie's wavelength is to quantum
physics (it bridges the gap between wave and particle properties,
between discrete and continuous).
Also, the limit is not a way to make the derivative work. It is just one
application of the limit.
In physics and signal theory certain functions are so complicated that you
have to use limits to define them; we call them generalized functions.
The derivative can be taught without limits (since the derivative deals
with rate of change), but if you are introducing infinitesimals then I think
you could have introduced the limit too.
wm tanksley says:
Nikhil, you do not need limits or infinitesimals to properly understand the
derivative. The derivative is sufficiently understood as the slope of the
line tangent to a curve at a point. This geometric understanding does not
invoke limits or infinitesimals. You can add in limits to this definition to
handle piecewise continuous curves, but as-is this definition can handle
arbitrary curves, rather than being limited to functions.
If one is learning general calculus then infinitesimals are essential; but if
one is learning the derivative they are not, and therefore no limits are
needed. Iverson actually wrote a Calculus text without using limits, and
he only used infinitesimals informally. It's available online
at http://www.jsoftware.com/jwiki/Books. Aside from that oddity, the text
is notable for its computational focus and for its treatment of some
advanced theoretical topics such as fractional integrals (Wikipedia calls
this the differintegral).
-Wm
Nikhil Panikkar says:
"Nikhil, you do not need limits or infinitesimals to properly understand
the derivative."
I never said that you need limits to understand the derivative. Read the
last paragraph of my comment.
In your explanation, you talk about infinity and the continuum. What I
was saying is that the conceptual leap from there to that of a limit is
very small. So there is no need to avoid the concept of a limit.
One doesn't need the epsilon delta definition to introduce the concept
of a limit.
Nikhil Panikkar says:
Tanksley, you say that the concept of a limit restricts the definition of a
derivative to functions and makes it inapplicable to arbitrary curves. I did
not follow this. Could you elaborate?
wm tanksley says:
Nikhil, you said "To properly understand a derivative you would need
the concept of a limit." That's the sentence I was seeking to correct. Your
last paragraph claims that you don't need limits but then implies that you
need infinitesimals, and this is also something I disputed. But assuming
your post is not self-contradictory, your claim would imply that you need
infinitesimals in order to understand derivatives _improperly_, and if you
add limits you can understand them _properly_, and there's no other way
to even begin to understand derivatives.
I contradicted this claim by saying that there is another way of
understanding the derivative: the geometric definition. It requires no
limits, no infinitesimals, no continuum. It works not only on smooth
functions, but also on arbitrary smooth curves. (I'll explain in my next
comment.)
You said that I mentioned infinity and the continuum. I didn't mention
either; the only place I can find those concepts is in the original post. I
would also disagree entirely with the original post's take on them; for
example, there is not only one infinitesimal, rather, there are an unknown
number of them, so you cannot iterate through the continuum by adding
just any infinitesimal to a number (if you do this, you'll miss points on the
continuum).
On the other hand, I do agree that one does not need epsilon-delta to
introduce limits; one can introduce limits for other purposes. Or one can
introduce limits for their own sake. But this has nothing to do with the
topic of understanding the derivative.
-Wm
Nikhil Panikkar says:
"You said that I mentioned infinity and the continuum. I didn't mention
either; the only place I can find those concepts is in the original post."
Yes, I was referring to the original post. I just stumbled upon this article
while doing a Google search, and assumed that you were its author. Now I
have explored the site, and discovered it was Kalid.
I was thinking about your comment "The derivative is sufficiently
understood as the slope of the line tangent to a curve at a point." I have
some doubts regarding this definition. But I will first wait for your
comment on the application of the derivative to arbitrary curves and how
the limit restricts this applicability.
wm tanksley says:
(Note: I hope the LaTeX below works. I wish there were a preview
mode)
I claimed that using limits and infinitesimals to define the derivative led
to restricting ourselves to functions, while using the geometric definition
of the derivative allowed arbitrary curves rather than only functions.
(There are other advantages; for example, using the geometric definition
allows you to reason about derivatives of curves over arbitrary fields
rather than only the continuum.)
Recall that the geometric definition of the derivative is the slope of the
line tangent to the curve at any point on the curve. First let me
distinguish a function from a curve. Every function is a curve, but a
function has at most one value per input, while a curve can have any
number of values. We can consider the subset of general curves called
the algebraic curves, consisting of the Cartesian graphs of the
polynomials of the appropriate number of variables for the dimension
we're examining; analytic curves are also amenable to this analysis, or
curves on other coordinate systems.
And a simple example of that is the classical unit circle. In order to find
the derivative of the unit circle using limits, one has to split the circle into
upper and lower halves. If one uses the geometric definition, however,
there is only one curve, and computing a formula for its tangent line is
simple algebra. The result is a formula for the tangent line to the circle at
every point on the plane (sometimes called the first order
semiderivative), and it's easy to see how to extract the slope of that
line.
The algebra one performs in order to extract this is to evaluate the curve
at (x + r, y + s), where r and s are variables representing arbitrary
numbers, then express the result in terms of powers of x and y, and
finally evaluate that at (x - r, y - s), thereby giving a net effect of adding
and subtracting zero and rewriting the expression in terms of powers
of (x - r) and (y - s). (This action substitutes for adding and subtracting an
infinitesimal, but we need no assumption that infinitesimals exist.) If the
original curve was algebraic it will also be analytic, and so the rewritten
result will be a Taylor expansion.
Now, to find the slope of the tangent line, one needs only to see that the
equation of the tangent line is the equation setting all the zeroth and first
order terms in the Taylor expansion to zero (and discarding all the higher
order terms); and the slope of that line is simply the negative of the
coefficient of (x - r) divided by the coefficient of (y - s).
So, let's compute the first order semiderivative of the unit circle.
The curve is x^2 + y^2 - 1 = 0. Evaluating at (x + r, y + s), we get the translated
curve (x + r)^2 + (y + s)^2 - 1 = 0, which expands
to x^2 + 2rx + y^2 + 2sy + (r^2 + s^2 - 1) = 0. The Taylor expansion is
therefore (r^2 + s^2 - 1) + 2r(x - r) + 2s(y - s) + (x - r)^2 + (y - s)^2 = 0.
To find the equation of the tangent line (the first-order semiderivative
with respect to x and y), we set the zeroth and first order terms of the
Taylor expansion equal to zero:
(r^2 + s^2 - 1) + 2r(x - r) + 2s(y - s) = 0.
Putting this in the standard y=mx+b form, we
get y = -(r/s)x + (r^2 + s^2 + 1)/(2s) as the equation of the line tangent to the
unit circle at (r,s). Therefore, the derivative of the unit circle curve at the
point (r,s) on the circle is -r/s for all points where s ≠ 0.
This follows directly for all algebraic curves, and can be confirmed for all
analytic curves. For non-analytic curves, it can be shown that we can
approximate the derivative as closely as desired.
-Wm
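As a quick numerical sanity check (a sketch added for illustration, not part of the comment above): the standard result for the unit circle is that the tangent at a point (r, s) on the circle has slope -r/s when s is not 0. The code below compares that single formula against a finite-difference slope on the upper and lower half-circles, which is the "split the circle in two" route. The function names and the sample point (0.6, 0.8) are assumptions.

```python
import math


def tangent_slope(r, s):
    """Slope of the tangent to x^2 + y^2 = 1 at the point (r, s), s != 0."""
    if s == 0:
        raise ValueError("vertical tangent: slope is undefined when s == 0")
    return -r / s


def half_circle_slope(r, upper=True, dx=1e-6):
    """Finite-difference slope of y = +sqrt(1 - x^2) or y = -sqrt(1 - x^2) at x = r."""
    sign = 1.0 if upper else -1.0

    def y(x):
        return sign * math.sqrt(1.0 - x * x)

    # Central difference: look a little to each side of r.
    return (y(r + dx) - y(r - dx)) / (2 * dx)


print(tangent_slope(0.6, 0.8), half_circle_slope(0.6, upper=True))
print(tangent_slope(0.6, -0.8), half_circle_slope(0.6, upper=False))
```

One formula covers both halves; the function-based route needs a separate branch for each.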
wm tanksley says:
Sorry about the LaTex. Ugly.
wm tanksley says:
Nikhil said: "But I will first wait for your comment on the application of
the derivative to arbitrary curves and how the limit restricts this
applicability."
Thank you for reminding me that I said that; I forgot to explain that
part.
I just explained how to apply the geometric definition of the derivative to
arbitrary algebraic curves. More complex curves are also available, and
there are proofs that the geometric definition yields both exact solutions
and a simple method for deriving approximations.
I also explained one obvious way in which the geometric definition is
superior, in that it allows derivatives of curves that aren't simple
functions. But I didn't explain in what aspects the infinitesimal definition
of the derivative is inadequate. Notice that I'm not trying to say that it's
bad or wrong, or that it's ALWAYS inadequate; rather, I'm pointing out
some specific problems that hinder certain uses. Also notice that I'm not
complaining about limits; I'm talking specifically about the use of
infinitesimals in the definition of the derivative. Limits may still be useful
(for example, I mentioned piecewise smooth functions, whose derivatives
require limits).
The most interesting problem is that infinitesimals require the use of the
continuum, and not all numbers are embedded in a continuum. The
rationals are very useful for most purposes; and floating point
computation is a use of a special type of rational number. There are other
infinite fields as well, and obviously the finite fields cannot be
approached with limits at all (but are quite easily approached with
geometry). And yes, the definition of algebraic curve applies over any
field, finite or infinite, so this method will find its derivative. Complex
numbers are reachable as well; in fact, you can probably see that the
equation I derived for the tangent line has values over the entire plane,
not just on the unit circle, and in fact those values are geometrically
meaningful.
There are more interesting results as well. The tangent line is interesting
and useful, but there are also tangent conics, cubics, and so on.
-Wm
Nikhil Panikkar says:
Thank you Tanksley, for your explanation. But I have to admit, there is a
lot in the above explanation that I am not familiar with (like the first order
semiderivative), so I'll have to go through it step by step. I hope you'll
stay on the site to clarify my doubts!
In the meantime, can we discuss your earlier comment "The derivative is
sufficiently understood as the slope of the line tangent to a curve at a
point"?
Let's say we want to draw a tangent to a curve. This raises the question:
what is a tangent?
1. Let's say the tangent at a point is a line that best approximates the curve
at the point. This raises the question: what is meant by best
approximation?
2. A simplified answer to this question would be that it should have the
same value at the point as the curve.
So if your function is y = x^2, then at x = 2, y = 4. But you can draw
any number of lines through the point (2,4). So how do you go from
there?
For a circle or a conic you can draw a line from the centre or the foci
and then define the tangent as the line that is perpendicular to this line.
But how would you draw a tangent to an arbitrary curve (one that has no
centre or foci)?
I would also like to know if my statements 1 and 2 are correct, or do they
need some mathematical refinement.
Nikhil Panikkar says:
Tanksley, I went through your last post again and I think I am beginning
to understand the definition of the derivative as the slope of the tangent.
Statement 1 should be: the tangent is the best first-order approximation
to the curve at a point, i.e. it should have the same value as the original
curve and also the same rate of change at that point.
So, if my function is x^3, and I want to draw a tangent at x = a:
change in y = (x + a)^3 - x^3
= 3a^2 x + 3a x^2 + a^3
The zeroth-order approximation is found by setting the first and second
degree terms to zero. This would be y0 = a^3. This has the same value as
the function at x = a.
The first-order approximation is found by setting the second degree
terms to zero. This would be y1 = 3a^2 x + a^3. The slope of this line
has the same value as the derivative of the function at x = a, i.e. 3a^2.
Intuitively too, this makes sense. Let's consider a body starting from rest
(at t = 0) and undergoing uniform acceleration of 1m per second
squared. When I say this body has an instantaneous velocity of 8m per
second at t = 8, what it means is that the body has a potential to travel
8m per second, if it were moving at a constant velocity of 8m per second,
in either direction. (But this doesn't happen, because by the time t
becomes 9, the body has already accelerated through 1m per second
squared. So the distance travelled between t = 8 and t = 9 is not 8m.)
Is my understanding correct, or is there something that I've missed?
Nikhil Panikkar says:
There's a mistake in my above post. I said Statement 1 should be: the
tangent is the best first-order approximation to the curve at a point, i.e. it
should have the same value as the original curve and also the same rate
of change at that point.
This is what one would expect given the traditional definition of the
tangent, i.e. the tangent line to a plane curve at a given point is the
straight line that "just touches" the curve at that point.
But if you look at the equation of the tangent line derived in my last post,
y1 = 3a^2 x + a^3,
at x = a, y = 4a^3. The point (a, 4a^3) does not lie on the curve y =
x^3.
So is there something wrong with the definition, or is there something
I've missed?
wm tanksley says:
I'm really sorry, but I'm just not able to get the time to reply this
weekend. You're on the right track in general (in fact, I'm quite
impressed, given the tiny bit of explanation I've been able to give); but
there's more to do.
If you don't mind, I'm going to point you to a YouTube video where a fairly
complex curve is analyzed according to these rules.
http://www.youtube.com/watch?v=i9o0OfvQYmA
Unfortunately, he uses some unusual terms while doing this; for
example, he denotes the curve using a "polynumber", which he writes as
an array of integers. You may be able to figure out how a polynumber is like a
polynomial without explicitly written variables; if you need a better
explanation the previous videos in his series will explain completely. See
the entire playlist at:
http://www.youtube.com/playlist?list=PL5A714C94D40392AB&feature=plcp
-Wm
Nikhil Panikkar says:
Ok Tanksley, I'll check out the videos, and then I'll post what I've
understood. But since I am unfamiliar with a lot of what is being
discussed here, I'll need your confirmation to be sure what I understood
is correct. I'll wait for your comment.
And thank you for pointing me to these videos. It's a new approach for
me, integrating algebra, geometry and calculus. The only hindrance is
my own less rigorous math background. So I'll have to go through it step
by step. I hope you will stay on the site to comment on my progress.
wm tanksley says:
No problem, I'll be here.
And if it helps any, the playlist I pointed you to is "MathFundamentals",
so it's no problem not knowing math. He starts at counting with tallies, if
you want to start at the beginning.
If you wanted an advanced playlist, he's got one on universal hyperbolic
geometry and another on algebraic topology. Whew!
-Wm
Alex says:
I'm uncertain about calling the derivative a "better division", although it's
better than "continuous division". I'd probably call it "generalised
division". It does follow the pattern (established with integrals) that the
derivative is about changing quantities. I believe the problematic aspect
of the derivative, is that it is a number at a specific point a, f'(a), but a
function at a generalised point, f'(x).
I do like the 4-step procedure: (1) Choose an interval; (2) Find the raw
change; (3) find the rate of change; and (4) Make your model perfect. But
the limit is not only about making your model perfect, because it is also
used to *simplify* a problem by neglecting the contribution of a certain
component.
That last step, "Make your model perfect", seems to be what the change
from Hyperreal numbers to Real numbers (by taking the standard part) is
all about. Or at least that was something that immediately sprung to
mind.
Argh. Somebody mentioned the epsilon-delta definition. It's not so
much the definition itself (that basically relates input error to output
error), but most explanations are just so... ugh.
Thanks
robin says:
Dear Kalid, thank you very much for your efforts and the time that you
put in to write such excellent explanations of basic math concepts. I am a
student of psychology. I am using this to learn more about calculus and
maths. A note: I believe mathematics pedagogy in schools all over the
world has to radically change. It is essential and beneficial to use 3 D
animations and other visual techniques to impart mathematical ideas.
Until that happens there would always be potential mathematicians who
would never go on to do set theory or matrices or calculus. Also with
better 3 D visual representations and animations of mathematical ideas,
fundamental concepts like calculus, graph theory and even dynamical
systems could be imparted to students at an earlier age.
kalid says:
Thanks Robin, I really appreciate the note. I love it when people in other
fields are able to take away some insights. I definitely think math
pedagogy needs to change, to use other techniques, but really, to just
ask "Are we actually learning here?". I feel there's a giant emperor's
clothes problem where nobody wants to admit "Hey, this concept we're
supposed to be teaching, it's not clicking at an intuitive level, and it
should." 3D visualizations and other tools can help get ideas to really sink
in. Appreciate the note!
kishore says:
One thing that always bothers me is the chicken-and-egg problem: which
comes first, the differential equation or the function? Let me give an
example. Take for instance decay laws, which are stated as differential
equations. But when you carry out practical experiments, we would plot a
graph and would approximate the graph to a function through curve
fitting techniques. Now where is the differential equation fitting in? Because I
can make all the predictions through a function. What is the point in
representing an event through differential equations if my function could do
all the job?
wm tanksley says:
kishore, the differential equation doesn't come from a function. It comes
from a model that predicts the observed values. The model happens to
imply certain relationships between physical measurements, which (when
stated mathematically, using known laws of physics, and expressed with
the smallest number of independent variables) often winds up having
integrals and differentials embedded in it; hence it's a differential
equation.
simo says:
Does not dx ( in our case dx=1) represent acceleration in speed in
interval x=2, x=3?
Between points x=2.8 and x=2.9 is speed
5.7 = 5.6 + 0.1
velpandian says:
Excellent!!!!
I hope there are more people who can explain maths like this, Removes
the fear of maths and puts the joy of learning it.
themythof says:
I find many pluses and minuses with these types of approaches to topics.
It's good because some people find it more approachable. I feel it's bad if
they are not able to convert it to a logical mathematical understanding
using mathematical language.
This is one of the most harmful aspects of math education today.
Everyone is focused on pushing standardized testing, and standardized
testing destroys the nurturing of problem solving and logical
understanding of concepts because there is no time and everyone has to
play the rat race within education to get the golden ticket to a nice
expensive college. This is what you get when you turn education into a
product and students into consumers. Those students who have
excellent problem solving skills never have time to nurture them and
even wind up having those skills stunted.
My rant aside, no, the slope of the tangent line is not a bland description
at all. The problem here usually is that students don't have a solid
foundation in algebra first, which is a must and then a very good
foundation in pre-calculus.
To see a whole topic on derivatives without a single graph is doing a
disservice.
Good to see students get something here but calculus needs a unified
approach and the understanding of the derivative begins with a strong
foundation in algebra (coordinate geometry) and pre-calculus.
Harpreet Singh says:
I think currently your approach is better than others.
I don't know why people want to sandwich the Aha! moments with fast
track academies.
Keep it up! Khalid you have been blessed to reduce complexity to bring in
simplicity.
Thanks.
Soham Chowdhury says:
I really dig your articles and I'm going through some of them (mostly
because I like math, and (also) because of the fact that I can have some
eighth-grade swag at knowing calculus :D).
Have you ever considered teaching?
On another note, why don't you use MathJax for your equations? It's so
much better.
Gaurav M Tulsiani says:
Thanks a ton Khalid. I stumbled on betterexplained while searching for
some limits explanations on the web. You are doing an awesome job by
reminding people of the importance of intuition & beauty of maths. As of now
before starting any new topic I go through better explained to know what
I am going to do.
kalid says:
@Soham: Thanks for the comment! I haven't thought about in-person
teaching that much, but it might be something in the future. I'd like to
integrate MathJax as well, the only problem is it doesn't work in RSS
feeds/email. I might find a way to use MathJax on the website and fall
back to the images on the other places.
@Gaurav: Awesome, glad you're enjoying the site :).
ansh choubey says:
Aha moment came near reading your fab articles must write a high
school book soon I wanna read more and more. ,. Plzzz do write on
physics too.
kalid says:
Hi Ansh, glad you enjoyed it!
ansh choubey says:
Boom
Ademilson says:
Congratulations!!! It's really a very intuitive explanation!
Analogy is the key! As above so below
Namaste!
Math enthusiast: Northwestern Student says:
This was absolutely great... well done. What an excellent explanation of
calculus. Also, I would like to add this.
I was doing a lot of research and thinking and came to the conclusion
that, like you mentioned, the integral in some respects is not directly
related to differentiation. More precisely, the definite integral is unrelated
to differentiation, and anti-differentiation is the imperfect
reversal (opposite operation) of differentiation (very intuitive). The reason
why is simple: the definite integral computes the signed area under a
curve and the change in position of the original function (i.e. (dx/dt) times
(dt) equals dx), which is completely useless if you are trying to find the
original function. The antiderivative, however, is useful for that as long
as the constant is defined. The indefinite integral is virtually the same
as an antiderivative except its syntax actually means nothing in
nature. Have you ever wondered why there is a dx (or the appropriate
differential) at the end of the integrand even though there are no bounds
of integration? dx would represent an infinitesimally small width, but
since there are no bounds of integration the dx means nothing; it's a
dummy variable, as some would say.
great stuff

Adrienne says:
I LOVE THIS. Helps me appreciate math so much more. You're awesome
Kalid.
elisen says:
thanks
elisen says:
Dear Khalid,
I just wanted to say thank you again and the fact that you are very clever
and how your dream lies in helping other people understand is great.
I haven't been listening to my maths classes recently and therefore I
need to do a lot of work.
Your website had made me more confident.
indeed the idea of a rate of change at a point is very confusing.
I want to ask whether your passion for maths or any learning stems from
your curiousity, whether you have read history on your maths.
and i hope you check this out.
this guy is quite smart and uses analogies too.
i want to be able to make analogies myself so i can understand and
relate ideas so i can apply them to life and make use of them. because all
learning is precious.
because i am a person who needs to understand,
therefore your ahas help (my mother on the other hand is one of those who
remember and don't question haha, but indeed there are different people,
and their way of learning and how their brain functions, their behaviour,
their attitude to learning approaches is different)
http://www.scotthyoung.com/blog/2007/03/25/how-to-ace-your-finalswithout-studying/
i want to thank you again and how you have left a Question part shows
your dedication.
Yours SIncerely
Elisen
mahendra says:
Voila outstanding ,kudos what a lucid explanation ,keep the great work
flowing
cheers
kalid says:
@mahendra: Thanks!

@Elisen: Really appreciate the note, thank you. (Scott is a friend and I
really like how he breaks down his methods!).
My passion for math (or learning in general) came when I realized how
much simpler an idea could be if we looked at it the right way. Something
which was once confusing becomes simple with the right approach (think
about how difficult multiplication is with Roman numerals, but how easy
it is with decimal numbers). I had this belief that any idea could be made
simple, and it's what keeps me going. If something seems difficult, it's ok;
it just means I haven't found the simple version of it yet.
Really glad the site has been helping :).

How To Understand Derivatives: The Product, Power & Chain Rules


by Kalid Azad 40 comments

The jumble of rules for taking derivatives never truly clicked for me. The
addition rule, product rule, quotient rule: how do they fit together? What are
we even trying to do?
Here's my take on derivatives:

We have a system to analyze, our function f
The derivative f' (aka df/dx) is the moment-by-moment behavior
It turns out f is part of a bigger system (h = f + g)
Using the behavior of the parts, can we figure out the behavior of the
whole?
Yes. Every part has a "point of view" about how much change it
added. Combine every point of view to get the overall behavior. Each
derivative rule is an example of merging various points of view.
And why don't we analyze the entire system at once? For the same reason you
don't eat a hamburger in one bite: small parts are easier to wrap your head
around.
Instead of memorizing separate rules, let's see how they fit together:

The goal is to really grok the notion of "combining perspectives". This
installment covers addition, multiplication, powers and the chain rule. Onward!
Functions: Anything, Anything But Graphs

The default calculus explanation writes f(x) = x^2 and shoves a graph in
your face. Does this really help our intuition?

Not for me. Graphs squash input and output into a single curve, and hide the
machinery that turns one into the other. But the derivative rules are about the
machinery, so let's see it!
I visualize a function as the process "input(x) => f => output(y)".

It's not just me. Check out this incredible, mechanical targeting computer
(beginning of youtube series).
The machine computes functions like addition and multiplication with gears;
you can see the mechanics unfolding!

Think of function f as a machine with an input lever "x" and an output lever
"y". As we adjust x, f sets the height for y. Another analogy: x is the input
signal, f receives it, does some magic, and spits out signal y. Use
whatever analogy helps it click.
Wiggle Wiggle Wiggle

The derivative is the "moment-by-moment" behavior of the function. What
does that mean? (And don't mindlessly mumble "The derivative is the
slope". See any graphs around these parts, fella?)
The derivative is how much we wiggle. The lever is at x, we "wiggle" it, and
see how y changes. "Oh, we moved the input lever 1mm, and the output
moved 5mm. Interesting."
The result can be written "output wiggle per input wiggle" or "dy/dx" (5mm /
1mm = 5, in our case). This is usually a formula, not a static value, because it
can depend on your current input setting.
For example, when f(x) = x^2, the derivative is 2x. Yep, you've memorized
that. What does it mean?
If our input lever is at x = 10 and we wiggle it slightly (moving it by dx=0.1 to
10.1), the output should change by dy. How much, exactly?

We know f'(x) = dy/dx = 2 * x
At x = 10 the "output wiggle per input wiggle" is = 2 * 10 = 20. The
output moves 20 units for every unit of input movement.
If dx = 0.1, then dy = 20 * dx = 20 * .1 = 2
And indeed, the difference between 10^2 and (10.1)^2 is about 2. The
derivative estimated how far the output lever would move (a perfect, infinitely
small wiggle would move 2 units; we moved 2.01).
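The lever-wiggling above is easy to try numerically. Here is a minimal sketch in Python (the helper name compare_wiggle is made up for illustration) that compares the derivative's estimate, dy = f'(x) * dx, with how far the output actually moves for a few wiggle sizes:

```python
def f(x):
    """The squaring machine from the text: f(x) = x^2."""
    return x ** 2


def f_prime(x):
    """Its derivative, 2x: the output wiggle per input wiggle."""
    return 2 * x


def compare_wiggle(x, dx):
    """Return (estimated, actual) output wiggle for an input wiggle dx at x."""
    estimated = f_prime(x) * dx   # dy = f'(x) * dx
    actual = f(x + dx) - f(x)     # how far the output lever really moved
    return estimated, actual


for dx in (0.1, 0.01, 0.001):
    est, act = compare_wiggle(10, dx)
    print(f"dx={dx}: estimated dy={est:.6f}, actual dy={act:.6f}")
```

As dx shrinks, the gap between the estimate and the actual movement shrinks much faster than dx itself, which is the sense in which the derivative describes a "perfect" wiggle.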
The key to understanding the derivative rules:

Set up your system
Wiggle each part of the system separately, see how far the output moves
Combine the results
The total wiggle is the sum of wiggles from each part.

Addition and Subtraction

Time for our first system:

h(x) = f(x) + g(x)

What happens when the input (x) changes?
In my head, I think "Function h takes a single input. It feeds the same input to f
and g and adds the output levers. f and g wiggle independently, and don't
even know about each other!"
Function f knows it will contribute some wiggle (df), g knows it will contribute
some wiggle (dg), and we, the prowling overseers that we are, know their
individual moment-by-moment behaviors are added:

dh = df + dg

Again, let's describe each "point of view":

The overall system has behavior dh
From f's perspective, it contributes df to the whole [it doesn't know about
g]
From g's perspective, it contributes dg to the whole [it doesn't know
about f]
Every change to a system is due to some part changing (f and g). If we add the
contributions from each possible variable, we've described the entire system.
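To see the two points of view add up, here is a small sketch with made-up parts (f squares its input and g triples it; both are assumptions chosen only for illustration). Each part reports its own wiggle, and the system's wiggle is their sum:

```python
def f(x):
    return x ** 2          # hypothetical part f


def g(x):
    return 3 * x           # hypothetical part g


def h(x):
    return f(x) + g(x)     # the overall system: h = f + g


def wiggles(x, dx):
    """Return (df, dg, dh): each part's change and the system's change."""
    df = f(x + dx) - f(x)  # f's point of view
    dg = g(x + dx) - g(x)  # g's point of view
    dh = h(x + dx) - h(x)  # what the overseer sees
    return df, dg, dh


df, dg, dh = wiggles(2.0, 0.001)
print(f"df={df:.6f}  dg={dg:.6f}  df+dg={df + dg:.6f}  dh={dh:.6f}")
```

Neither part needs to know about the other: the overseer just adds the reports.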

df vs df/dx

Sometimes we use df, other times df/dx. What gives? (This confused me for a
while)
df is a general notion of "however much f changed"
df/dx is a specific notion of "however much f changed, in terms of how
much x changed"
The generic "df" helps us see the overall behavior.
An analogy: Imagine you're driving cross-country and want to measure the fuel
efficiency of your car. You'd measure the distance traveled, check your tank to
see how much gas you used, and finally do the division to compute "miles per
gallon". You measured distance and gasoline separately; you didn't jump into
the gas tank to get the rate on the go!
In calculus, sometimes we want to think about the actual change, not the ratio.
Working at the "df" level gives us room to think about how the function
wiggles overall. We can eventually scale it down in terms of a specific input.
And we'll do that now. The addition rule above can be written, on a "per dx"
basis, as:

dh/dx = df/dx + dg/dx

Multiplication (Product Rule)

Next puzzle: suppose our system multiplies parts f and g. How does it
behave?

h(x) = f(x) * g(x)

Hrm, tricky: the parts are interacting more closely. But the strategy is the
same: see how each part contributes from its own point of view, and combine
them:
total change in h = f's contribution (from f's point of view) + g's
contribution (from g's point of view)
Check out this diagram:

What's going on?

We have our system: f and g are multiplied, giving h (the area of the
rectangle)
Input x changes by dx off in the distance. f changes by some amount df
(think absolute change, not the rate!). Similarly, g changes by its own
amount dg. Because f and g changed, the area of the rectangle changes
too.
What's the area change from f's point of view? Well, f knows he changed
by df, but has no idea what happened to g. From f's perspective, he's the
only one who moved and will add a slice of area = df * g
Similarly, g doesn't know how f changed, but knows he'll add a slice of
area dg * f
The overall change in the system (dh) is the two slices of area:

dh = f * dg + g * df

Now, like our miles per gallon example, we "divide by dx" to write this in terms
of how much x changed:

dh/dx = f * dg/dx + g * df/dx

(Aside: Divide by dx? Engineers will nod, mathematicians will frown.
Technically, df/dx is not a fraction: it's the entire operation of taking the
derivative (with the limit and all that). But infinitesimal-wise, intuition-wise, we
are "scaling by dx". I'm a smiler.)
The key to the product rule: add two slivers of area, one from each point of
view.
Gotcha: But isn't there some effect from both f and g changing
simultaneously (df * dg)?
Yep. However, this area is an infinitesimal * infinitesimal (a "2nd-order
infinitesimal") and invisible at the current level. It's a tricky concept, but (df *
dg) / dx vanishes compared to normal derivatives like df/dx. We vary f and g
independently and combine the results, and ignore results from them moving
together.
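A quick numerical sketch makes the vanishing corner concrete. The sides f(x) = x^2 and g(x) = x + 1 below are made-up examples, and area_change is a hypothetical helper; the point is to watch the corner piece (df * dg) / dx fade as dx shrinks while the two slices stay put:

```python
def f(x):
    return x ** 2        # hypothetical side f of the rectangle


def g(x):
    return x + 1         # hypothetical side g of the rectangle


def area_change(x, dx):
    """Split the change in h = f * g into two slices plus the corner."""
    df = f(x + dx) - f(x)
    dg = g(x + dx) - g(x)
    slice_f = df * g(x)  # f's point of view: only f moved
    slice_g = dg * f(x)  # g's point of view: only g moved
    corner = df * dg     # both moving together
    dh = f(x + dx) * g(x + dx) - f(x) * g(x)
    return slice_f, slice_g, corner, dh


for dx in (0.1, 0.01, 0.001):
    slice_f, slice_g, corner, dh = area_change(3.0, dx)
    print(f"dx={dx}: slices/dx={(slice_f + slice_g) / dx:.4f}, "
          f"corner/dx={corner / dx:.4f}, dh/dx={dh / dx:.4f}")
```

The exact area change is always the two slices plus the corner; on a "per dx" basis the corner's share heads toward zero, leaving f * dg/dx + g * df/dx.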
The Chain Rule: It's Not So Bad

Let's say g depends on f, which depends on x:

y = g(f(x))

The chain rule lets us "zoom into" a function and see how an initial change (x)
can affect the final result down the line (g).
Interpretation 1: Convert the rates
A common interpretation is to multiply the rates:

dg/dx = dg/df * df/dx

x wiggles f. This creates a rate of change of df/dx, which wiggles g by dg/df.
The entire wiggle is then:

dg/df * df/dx

This is similar to the "factor-label" method in chemistry class:

miles/second * (60 seconds / 1 minute) * (60 minutes / 1 hour) = miles/hour

If your "miles per second" rate changes, multiply by the conversion factor to
get the new "miles per hour". The second doesn't know about the hour directly;
it goes through the second => minute conversion.
Similarly, g doesn't know about x directly, only f. Function g knows it should
scale its input by dg/df to get the output. The initial rate (df/dx) gets modified
as it moves up the chain.
Interpretation 2: Convert the wiggle
I prefer to see the chain rule on the per-wiggle basis:
x wiggles by dx, so
f wiggles by df, so
g wiggles by dg

Cool. But how are they actually related? Oh yeah, the derivative! (It's the
output wiggle per input wiggle):

df = dx * df/dx

Remember, the derivative of f (df/dx) is how much to scale the initial wiggle.
And the same happens to g:

dg = df * dg/df

It will scale whatever wiggle comes along its input lever (f) by dg/df. If we write
the df wiggle in terms of dx:

dg = (dx * df/dx) * dg/df

We have another version of the chain rule: dx starts the chain, which results in
some final result dg. If we want the final wiggle in terms of dx, divide both
sides by dx:

dg/dx = df/dx * dg/df

The chain rule isn't just factor-label unit cancellation; it's the propagation of
a wiggle, which gets adjusted at each step.
The chain rule works for several variables (a depends on b depends on c), just
propagate the wiggle as you go.
Try to imagine "zooming into" different variables' point of view. Starting from
dx and looking up, you see the entire chain of transformations needed before
the impulse reaches g.
Chain Rule: Example Time

Let's say we put a "squaring machine" in front of a "cubing machine":

input(x) => f:x^2 => g:f^3 => output(y)
f:x^2 means f squares its input. g:f^3 means g cubes its input, the value of f.
For example:
input(2) => f(2) => g(4) => output:64
Start with 2, f squares it (2^2 = 4), and g cubes this (4^3 = 64). It's a 6th
power machine:

g(f(x)) = (x^2)^3 = x^6

And what's the derivative?

f changes its input wiggle by df/dx = 2x
g changes its input wiggle by dg/df = 3f^2
The final change is:

dg/dx = dg/df * df/dx = 3f^2 * 2x = 3(x^2)^2 * 2x = 6x^5

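The squaring-then-cubing machine can be traced step by step in code. This is only a sketch (the propagate helper is a made-up name): it pushes a wiggle dx through f and then g, scaling it by each machine's derivative along the way, and compares the result with 6x^5 and with the machine's real output change:

```python
def f(x):
    return x ** 2            # squaring machine


def g(f_value):
    return f_value ** 3      # cubing machine; its input is whatever f produced


def propagate(x, dx):
    """Follow a wiggle dx through f, then g, scaling it at each step."""
    df = dx * (2 * x)                # f scales its input wiggle by df/dx = 2x
    dg = df * (3 * f(x) ** 2)        # g scales its input wiggle by dg/df = 3f^2
    return df, dg


x, dx = 2.0, 0.001
df, dg = propagate(x, dx)
actual = g(f(x + dx)) - g(f(x))      # what the 6th-power machine really did

print(f"propagated dg/dx = {dg / dx:.4f}")
print(f"6x^5             = {6 * x ** 5:.4f}")
print(f"actual dg/dx     = {actual / dx:.4f}")
```

Notice that g only ever sees f's value and f's wiggle; it never looks at x directly, which is the "blob" idea in the gotchas below.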
Chain Rule: Gotchas

Functions treat their inputs like a blob

In the example, g's derivative ((x^3)' = 3x^2) doesn't refer to the original "x",
just whatever the input was ((foo^3)' = 3*foo^2). The input was f, and it treats f
as a single value. Later on, we scurry in and rewrite f in terms of x. But g has
no involvement with that; it doesn't care that f can be rewritten in terms of
smaller pieces.
In many examples, the variable "x" is the "end of the line".
Questions ask for df/dx, i.e. "Give me changes from x's point of view". Now, x
could depend on some deeper variable, but that's not being asked for. It's
like saying "I want miles per hour. I don't care about miles per minute or miles
per second. Just give me miles per hour". df/dx means "stop looking at inputs
once you get to x".
How come we multiply derivatives with the chain rule, but add them
for the others?
The regular rules are about combining points of view to get an overall picture.
What change does f see? What change does g see? Add them up for the total.
The chain rule is about going deeper into a single part (like f) and seeing if it's
controlled by another variable. It's like looking inside a clock and saying "Hey,
the minute hand is controlled by the second hand!". We're staying inside the
same part.
Sure, eventually this "per-second" perspective of f could be added to some
perspective from g. Great. But the chain rule is about diving deeper into f's
root causes.
Power Rule: Oft Memorized, Seldom Understood

What's the derivative of x^4? 4x^3? Great. You brought down the exponent
and subtracted one. Now explain why!
Hrm. There's a few approaches, but here's my new favorite: x^4 is really x * x
* x * x. It's the multiplication of 4 "independent" variables. Each x doesn't
know about the others, it might as well be x * u * v * w.
Now think about the first x's point of view:

It changes from x to x + dx
The change in the overall function is [(x + dx) - x][u * v * w] = dx[u * v *
w]
The change on a "per dx" basis is [u * v * w]
Similarly,
From u's point of view, it changes by du. It contributes (du/dx)*[x * v * w]
on a "per dx" basis
v contributes (dv/dx) * [x * u * w]
w contributes (dw/dx) * [x * u * v]
The curtain is unveiled: x, u, v, and w are the same! The "point of view"
conversion factor is 1 (du/dx = dv/dx = dw/dx = dx/dx = 1), and the total
change is

[x * x * x] + [x * x * x] + [x * x * x] + [x * x * x] = 4x^3

In a sentence: the derivative of x^4 is 4x^3 because x^4 has four identical
points of view which are being combined. Booyeah!
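Here is one way to act out the four points of view, as a rough sketch (product and contributions are made-up helper names). Each factor takes a turn wiggling while the other three hold still, and the four "per dx" contributions are added:

```python
def product(x, u, v, w):
    return x * u * v * w


def contributions(x, dx):
    """Wiggle one factor at a time, holding the other three still."""
    base = product(x, x, x, x)
    return [
        (product(x + dx, x, x, x) - base) / dx,   # first x's point of view
        (product(x, x + dx, x, x) - base) / dx,   # u's point of view
        (product(x, x, x + dx, x) - base) / dx,   # v's point of view
        (product(x, x, x, x + dx) - base) / dx,   # w's point of view
    ]


x = 2.0
parts = contributions(x, 0.001)
print("each point of view:", [round(p, 6) for p in parts])
print("sum:", round(sum(parts), 6), " 4x^3:", 4 * x ** 3)
```

Each point of view contributes x^3, and there are four of them, so the total is 4x^3.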
Take A Breather

I hope you're seeing the derivative in a new light: we have a system of parts,
we wiggle our input and see how the whole thing moves. It's about combining
perspectives: what does each part add to the whole?
In the follow-up article, we'll look at even more powerful rules (exponents,
quotients, and friends). Happy math.
1. Gourav says:
Awesome article. The description of the product rule really changed how I
think about them.
Out of curiosity, how do you think your idea of the power rule extends to
negative, fractional and irrational powers? It's a bit harder to think about
since you can't just split them into linear parts.
2. kalid says:
Hi Gourav, thanks for the note. Great question about the negative,
fractional and irrational powers. To follow the analogy, we could use the
chain rule; suppose we have f(x) = x^-3. See x^-3 as shorthand for
1/x^3. We can do:
d/dx x^-3 = d/dx 1/x^3 = d/dx 1/u = -1/u^2 * du/dx
du/dx can be understood intuitively (3x^2), and we divide it by (x^3)^2.
We can see the "x" powers fight it out as (x-1) - 2x = -x - 1 [The (x-1)
power is from du/dx, and -2x is from 1/u^2. With x=3, get -3 - 1 = -4 as
the power]. Notice how we still brought down the 3 (which was in
du/dx). Hope this part made sense.
Once we get to fractional and irrational powers, it's probably easiest to
rewrite things in terms of e: x^3.4 = e^[ln(x)*3.4]. From here, we can
use the chain rule and product rule and exponent rule (to be explained
next time) to get the result. Essentially, even a complex idea like a
fractional exponent can be further broken down. It's something I'd like to
write more about; it's helping to really test my intuition :).
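As a sanity check on the reply above, here is a small numerical sketch. It does not derive anything; it just estimates the slope of 1/x^3 and of e^[ln(x)*3.4] with a symmetric wiggle (a central difference, which is my own choice of technique here, as is the helper name numeric_derivative) and compares each with the "bring down the exponent" formula n * x^(n-1):

```python
import math


def numeric_derivative(func, x, dx=1e-6):
    """Estimate the slope at x by wiggling dx in both directions."""
    return (func(x + dx) - func(x - dx)) / (2 * dx)


def power_rule(n, x):
    """The power rule's prediction for the derivative of x^n."""
    return n * x ** (n - 1)


x = 2.0
neg = numeric_derivative(lambda t: 1 / t ** 3, x)                      # x^-3
frac = numeric_derivative(lambda t: math.exp(math.log(t) * 3.4), x)    # x^3.4

print(f"x^-3 : numeric {neg:.6f}, power rule {power_rule(-3, x):.6f}")
print(f"x^3.4: numeric {frac:.6f}, power rule {power_rule(3.4, x):.6f}")
```

Both estimates line up with the power rule, which suggests the same pattern carries over to negative and fractional exponents for positive x.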
3. Jisoo says:
I found the following website useful for understanding the product rule
using what I already know.
http://woobiola.net/math/calc2b.htm
4. John Paton says:
Nice post Kalid. I've spent the last couple of hours trying to develop this
machine-like intuition. Any chance that you could also post some
examples on the quotient rule? I've been trying to work it out on my own
but haven't managed to get there. Honestly this is slightly worrying. I feel
that if I truly understood what you are saying then the quotient rule
should be no big deal. Thanks.
5. kalid says:
@John: Thanks for the comment. Intuitively, the quotient rule can be
seen as a variation of the product rule since division is a variation of
multiplication (in my head, multiplying by a quantity that is getting
smaller). So, the quotient rule should look a lot like the product rule (two
slices to take into account), but one of the slices is a shrinking one. I'll
be posting a follow-up soon.
And great gut-check by the way. If a concept isn't clicking deep down it
means there's more intuition to build (and, probably, the explanation can
use some refinement).
@Jisoo: Great, I like the simple diagrams!
6. Phoenix says:
First of all, great initiative and material. Loved the way u analyse things. I
read this page a couple of months ago.
Recently I also read about binomial series and somehow I was able to
narrate how the power rule was actually derived. So here it goes.
Let's take the simplest function y = x^2. Now what do we mean by
derivative? It is simply the change in the output when we tweak the input
a little.
Now let's take two numbers, x and x+1. Now I want to find out how much y
changes when we change x.
Change = (x+1)^2 - x^2
Conventional calculus tells us that it is 2*x.
But the actual value can be obtained by using the binomial theorem.
We all know the (a+b)^2 formula, = a^2 + b^2 + 2*a*b
Now (x+1)^2 = x^2 + 1 + 2*x
Change = x^2 + 1 + 2*x - x^2 = 2*x + 1
Haha, we have arrived at the answer. The calculus value and the actual
value differ by 1. To remove that we apply the rule that the change in
input is very small compared to the input value: x >> 1.
Applying the above, we can approximate 2*x + 1 as 2*x.
In the same way, I applied the same to x^3 and the difference is 1 +
3*x*1*(x+1), which can be approximated as 1 + 3*x*(x), which can be
reduced to 3*x^2 since x >> 1.
In general, for x^n, we have n+1 terms in the series. Of those we omit all
powers of x up to n-2. We take only the x^n and x^(n-1) terms. The coefficient
of x^(n-1) is n and hence the power rule is given as
(d/dx) of x^n = n*(x^(n-1)).
Thanks to the admin for invoking a interest in me to solve this. Hope it
helps.
7. kalid says:
@Phoenix: Thanks for the comment. Yep, that's the essence of it. To get
more particular, turn the 1 into dx (the amount of change, so it's
(x+dx)^2), then do the binomial theorem and throw away the dx at the
end (i.e., assume your change was perfect).
8. Trae Barlow says:
"We know f'(x) = dy/dx = 2 * x
At x = 10 the output wiggle per input wiggle is = 2 * 10 = 20. The
output moves 20 units for every unit of input movement.
If dx = 0.1, then dy = 20 * dx = 20 * .1 = 2"
Umm what?
If F(x) = x^2 and x=10 then the result of that would be 100, not 20.
If 2*10=2 then the output would move 2 units for every unit of input
movement, not 10, this doesnt make any sense at all.
9. Trae Barlow says:
That said, I speak as someone who passed a college level Calc course but
never understood a lick of anything I was doing (dunno how that's
possible). Simply memorizing everything, I just remember an
overwhelming urge to bash my brains out on my school desk. Actually
getting that same urge now.
Crazy ehh?

10. Trae Barlow says:
"If our input lever is at x = 10 and we wiggle it slightly (moving it by
dx=0.1 to 10.1), the output should change by dy. How much, exactly?
We know f'(x) = dy/dx = 2 * x
At x = 10 the output wiggle per input wiggle is = 2 * 10 = 20. The
output moves 20 units for every unit of input movement.
If dx = 0.1, then dy = 20 * dx = 20 * .1 = 2"
To be clear, let me explain what I'm confused with. "The output moves 20
units for every unit of input movement." What are we calling a unit? An
integer? A dozen? twenty? (.1? 1? 20?)
The other thing that confuses me is that you go from dy/dx = 5 to a
whole different formula/equation. Consistency would help those not as
literate in mathematics out (like myself) quite a bit.
11. andy says:
What's the payoff in learning all of this? The bank account metaphor was
insightful, population models might serve as a good example, but also
remember that learning is different for each individual depending on
what resides in their subconscious.
12. kalid says:
@andy: The payoff is understanding something you didn't before!
Yep, if the analogy works for you, it works.
13. Matthias says:
Thank you! I really wanted to understand how and why these rules work,
not just how to apply them
14. JJ says:
Kalid,
Thanks for your extraordinarily simple explanations of calc! I'm currently
a sophomore in high school, and I could have just waited until next year
to take the class, but I've wanted to learn for too long already! It's
amazing how simple and easy the math of change is! Now I get to make
the semi-intelligent juniors feel dumb for someone a grade below them
knowing more about the subject than they do
15. Vishwas says:
To use calculus on any changing system, is it mandatory that the system
MUST follow a particular rule of change?
For example, when some system is changing by ratio of 1:2, then one can
find out the change as 2:4 or 100:200 finally. What is the rule of change
in calculus? If not, will calculus be able to find an accurate answer every
time?
16. kalid says:
@Matthias: Glad it helped.
@JJ: Awesome, glad you're getting a head start! You got it, the math isn't
much more than algebra, it's just seeing how to put the variables
together.
@Vishwas: Calculus is made for instantaneous rates of change, i.e. the
rate of change at a certain moment in time. As you move away from that
moment, the rate of change varies and is no longer accurate [and you
use integration to add-up these constantly-shifting moments].
17. Finne Gillan says:
Still does not make any sense unfortunately including the mechanical
computer video and wiggles. Perhaps it's hopeless and I will never
understand calculus despite wanting to. I was lost earlier. If f(x) = x^2
and input is 10. Wiggle of 0.1 gives wiggle of 2.01 and wiggle of 0.01
gives 0.2001. How do these relate to the derivative?
18. Silrak says:
Hi, am working through the tutorials from naught, to gather an
understanding of calculus and within 4 weeks when my course will start.
Is there anyone who might help me excel more than is possible on my
own via skype or phone? Based in Melb, 24.01.14.
19. kalid says:
Hi Silrak, there's a full series on calculus
here: http://betterexplained.com/calculus/ which might help.
20. raju says:
Why do you neglect the df * dg area?
If the rate of change is larger, doesn't that produce an error in the intuition?
I understand 95% of the product rule except this part.
What an intuition for the product rule! I really like it except for neglecting df * dg.
21. Eric V says:
@raju
Great question! This question, and a few others very much like it, gave
me a bit of trouble until I performed a bit of mental jujitsu to convince
myself I understood it. I'll have a crack at answering this, if it helps great,
if not feel free to ignore me, I won't be offended.
I can answer it with a bit of simple, but creative, algebra. To do so I'm
going to need to explain two different sizing operations: the integral (∫)
and the differential element (d). I don't want to throw a bunch of
integration at you in trying to explain derivatives, but if you blur out the
strict definitions and just look at the ideas, then ∫ and d are just two
different ideas that are a kind of complement to each other.
∫ add up a whole bunch of things
d take a thing and cut it into a bunch of itty bitty pieces, or i.b.p.
Taking a differential element of a pizza, or d(pizza), is just shaving off a
little bit, it's lifting a pepperoni and licking the bottom then replacing it
while nobody's looking. The small amount of pepperoni grease that's
missing is so small compared to the whole size of the pizza that no one
notices it's missing.
In that vein I'm going to talk about taking some i.b.p.s of a few things.
Start with:
h(x) = f(x) * g(x)
h is the full size of h, the output
dh is an i.b.p. of the output
x is the full size of an input
dx is an i.b.p. of an input
I'm going to take a few liberties and do some algebra magic with
quantities like dx, please understand that what I'm about to insinuate is
not technically allowed, and if we followed the mathematically correct
path we would perform a whole lot of weird calculus operations just to
come up with the same result, see Kalid's Aside note about the
Engineers nodding and the mathematician frowning.
I'm going to assume you followed Kalid's logic and got to this part:
dh = f*dg + g*df
dh is just an i.b.p. of h, it is the little bit of rectangle you add on to the full
size rectangle of f*g.
f*dg is the vertical sliver rectangle
g*df is the horizontal sliver rectangle
So it seems your question becomes the following: if I want to describe all
of that i.b.p. of h that I'm adding I can see I need to add the two slivers,
but don't I also need to add that little square? Sure it's teeny, but the
slivers are teeny too, aren't they?
How about we don't disregard it, how about we add it in then see why it
vanishes and the slivers are big enough to stay.
We should have:
dh = f*dg + g*df
but logic tells us we have:
dh = f*dg + g*df + (df*dg)    ** that pesky little square
Now divide both sides by dx:
dh/dx = f * dg/dx + g * df/dx + (df*dg)/dx      ** re-arrange () in 3rd term
dh/dx = f * dg/dx + g * df/dx + df * (dg/dx)    ** put that 3rd term right after the 2nd
      = f * dg/dx + df * (dg/dx) + g * df/dx    ** factor out the dg/dx
      = (f + df) (dg/dx) + g * df/dx            ** almost there, compare it to what we should have:
      = (f) (dg/dx) + g * df/dx
What happens to (f + df) as df gets eensy weensy? It gets really close to
f. The full size of f on its own is indistinguishable from the full size of f
plus an i.b.p. of f. No one sees the little bit of pepperoni grease missing
compared to the full size of the pie (I live in New England, we call pizza
pie up here, we also take r's out of words and stick 'em other places
they don't belong: pahk the cah, Delter airlines).
With that pesky little square we have:
dh/dx = (f + df) (dg/dx) + g * df/dx
Without we have:
dh/dx = (f ) (dg/dx) + g * df/dx
But these two are the same!

If I haven't confused you yet maybe this will throw you off guard (I jest, I
really do want you to understand)!
Here's another reason that little square ( df * dg ) vanishes but the slivers
remain. Let's use an analogy that all of us understand so well on an
intuitive level: Boolean Algebra! (Bang head against wall now)
+ means OR
* means AND
P(a) means probability event a happens
P(a) + P(b) means probability of a or b happening
P(a) * P(b) means probability of a happening AND THEN b happening
That little square is df * dg, it's kind of like take a little chunk of f, then a
little chunk of g. It's like licking the pepperoni, AND THEN a little flea
jumps on your tongue and licks a little grease from your tongue. You
don't notice it compared to the large little bit of grease you got, which
itself is small compared to the pie. Just df on its own is you licking the
pepperoni. Just dg on its own is the flea licking your tongue. The sum df
+ dg is either you licking the pepperoni or the flea licking your tongue.
But df * dg is licking the pepperoni AND THEN getting licked by the flea, it
means comparing the flea's very little bit of grease to the whole pie.
Hope this helps, or at least that you got the chance to laugh at Calculus
for a little while.
Excelsior,
Eric V
22. Eric V says:
Sorry if the formatting is hard to follow. Can't use tab so I used a bunch of
spaces instead to keep my = lined up under each other. The Posting
Gnome ate my spaces. He also messed up my comment markers ** at
the end of lines.
23. Melissa says:
AAAHHH LIGHTBULB!!!
24. Theo A.H. says:
This video helped me get the Chain Rule by working out d/dx(sin(x^2))
compared to d/dx(sin(x)) in a really visual and intuitive
way. http://youtu.be/bcGOZLL1v4Y
25. Bonnie says:
This is definitely written for people who have already taken calculus and
not understood it, versus someone with almost no exposure who is trying
to learn. I don't even think reading over this would be helpful. You
assume I already know all this stuff! Do you know of any good place to
START learning calculus? Maybe I can come back to this site after I
memorize all those rules, if I'm still confused.
26. kalid says:
Hi Bonnie, yep, this lesson is definitely geared for someone in the tail end
of a calculus class. If you're just starting out, check out:
http://betterexplained.com/calculus/lesson-1
Hope that helps!

How To Understand Derivatives: The Quotient Rule, Exponents, and Logarithms


by Kalid Azad 19 comments

Last time we tackled derivatives with a machine metaphor. Functions are a
machine with an input (x) and output (y) lever. The derivative, dy/dx, is how
much output wiggle we get when we wiggle the input:

Now, we can make a bigger machine from smaller ones (h = f + g, h = f * g,
etc.). The derivative rules (addition rule, product rule) give us the overall
wiggle in terms of the parts. The chain rule is special: we can zoom into a
single derivative and rewrite it in terms of another input (like converting miles
per hour to miles per minute: we're converting the time input).
And with that recap, let's build our intuition for the advanced derivative rules.
Onward!
Division (Quotient Rule)

Ah, the quotient rule: the one nobody remembers. Oh, maybe you
memorized it with a song like "Low dee high, high dee low", but that's not
understanding!
It's time to visualize the division rule (who says "quotient" in real life?). The
key is to see division as a type of multiplication:
h = f / g = f * (1 / g)
We have a rectangle, we have area, but the sides are f and 1/g. Input x
changes off on the side (by dx), so f and g change (by df and dg), but how
does 1/g behave?
Chain rule to the rescue! We can wrap up 1/g into a nice, clean variable and
then zoom in to see that yes, it has a division inside.
So let's pretend 1/g is a separate function, m. Inside function m is a division,
but ignore that for a minute. We just want to combine two perspectives:
f changes by df, contributing area df * m = df * (1 / g)
m changes by dm, contributing area dm * f = ?
We turned m into 1/g easily. Fine. But what is dm (how much 1/g changed) in
terms of dg (how much g changed)?
We want the difference between neighboring values of 1/g: 1/g and 1/(g + dg).
For example:
What's the difference between 1/4 and 1/3? 1/12
How about 1/5 and 1/4? 1/20
How about 1/6 and 1/5? 1/30
How does this work? We get the common denominator: for 1/3 and 1/4, it's
12. And the difference between neighbors (like 1/3 and 1/4) will be 1 /
common denominator, aka 1 / (x * (x + 1)). See if you can work out why!
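One way to convince yourself of the pattern is to compute a few neighbor gaps exactly. This is only a quick sketch using Python's fractions module; the helper name neighbor_gap is made up for illustration and isn't from the article.

```python
from fractions import Fraction

def neighbor_gap(x):
    """Exact difference between the neighbors 1/x and 1/(x + 1)."""
    return Fraction(1, x) - Fraction(1, x + 1)

for x in (3, 4, 5, 100):
    gap = neighbor_gap(x)
    # The gap should always be 1 / (x * (x + 1)).
    assert gap == Fraction(1, x * (x + 1))
    print(f"1/{x} - 1/{x + 1} = {gap}")
```

The reason it works: 1/x - 1/(x+1) = ((x+1) - x) / (x * (x+1)), and the numerator always collapses to 1.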

If we make our derivative model perfect, and assume there's no difference
between neighbors, the +1 goes away and we get:
1 / (x * x) = 1 / x^2
(This is useful as a general fact: the change from 1/100 to 1/101 is roughly one
ten thousandth.)
The difference is negative, because the new value (1/4) is smaller than the
original (1/3). So what's the actual change?
g changes by dg, so 1/g becomes 1/(g + dg)
The instant rate of change is -1/g^2 [as we saw earlier]
The total change = dg * rate, or dg * (-1/g^2)
A few gut checks:
Why is the derivative negative? As dg increases, the denominator gets
larger, the total value gets smaller, so we're actually shrinking (1/3 to 1/4
is a shrink of 1/12).
Why do we have -1/g^2 * dg and not just -1/g^2? (This confused me at
first). Remember, -1/g^2 is the chain rule conversion factor between the
g and 1/g scales (like saying 1 hour = 60 minutes). Fine. You still
need to multiply by how far you went on the g scale, aka dg! An hour
may be 60 minutes, but how many do you want to convert?
Where does dm fit in? m is another name for 1/g. dm represents the total
change in 1/g, which as we saw, was -1/g^2 * dg. This substitution trick
is used all over calculus to help split up gnarly calculations. "Oh, it looks
like we're doing a straight multiplication. Whoops, we zoomed in and saw
one variable is actually a division: change perspective to the inner
variable, and multiply by the conversion factor."
Phew. To convert our dg wiggle into a dm wiggle we do:
dm = (-1/g^2) * dg
And get:
dh = df * (1/g) + f * (-1/g^2) * dg
Yay! Now, your overeager textbook may simplify this to:
dh/dx = (g * df/dx - f * dg/dx) / g^2
and it burns! It burns! This simplification hides how the division rule is just a
variation of the product rule. Remember, there are still two slivers of area to
combine:
The f (numerator) sliver grows as expected
The g (denominator) sliver is negative (as g increases, the area gets
smaller)
Using your intuition, you know it's the denominator that's contributing the
negative change.
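If you'd like to see the two slivers with actual numbers, here's a rough numerical sketch. The example functions f(x) = x^3 and g(x) = x^2 + 1, the point x = 2, and the slope helper are all arbitrary choices for illustration, not part of the article.

```python
def f(x):
    return x ** 3          # example numerator (arbitrary choice)

def g(x):
    return x ** 2 + 1      # example denominator (arbitrary choice)

def slope(func, x, dx=1e-6):
    """Estimate a derivative with a small symmetric wiggle around x."""
    return (func(x + dx) - func(x - dx)) / (2 * dx)

x = 2.0
df_dx = slope(f, x)
dg_dx = slope(g, x)

# Sliver 1: f changes, 1/g held still.
numerator_sliver = df_dx * (1 / g(x))
# Sliver 2: 1/g changes (rate -1/g^2 per unit of g), f held still.
denominator_sliver = f(x) * (-1 / g(x) ** 2) * dg_dx
two_slivers = numerator_sliver + denominator_sliver

# Direct estimate of the slope of f/g, for comparison.
direct = slope(lambda t: f(t) / g(t), x)

print("numerator sliver:  ", numerator_sliver)
print("denominator sliver:", denominator_sliver)
print("sum of slivers:    ", two_slivers)
print("direct estimate:   ", direct)
```

The denominator sliver comes out negative, and the sum of the slivers agrees with the direct estimate up to the small error of the numerical wiggle.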
Exponents (e^x)

e is my favorite number. It has the property
d/dx e^x = e^x
which means, in English, e changes by 100% of its current amount (read
more).
The current amount assumes x is the exponent, and we want changes from
x's point of view (df/dx). What if u(x)=x^2 is the exponent, but we still want
changes from x's point of view?

It's the chain rule again: we want to zoom into u, get to x, and see how a
wiggle of dx changes the whole system:
x changes by dx
u changes by du/dx, or d(x^2)/dx = 2x
How does e^u change?
Now remember, e^u doesn't know we want changes from x's point of view. e
only knows its derivative is 100% of the current amount, which is the exponent
u:
d(e^u)/du = e^u
The overall change, on a per-x basis is:
d(e^u)/dx = d(e^u)/du * du/dx = e^u * 2x = e^(x^2) * 2x

This confused me at first. I originally thought the derivative would require us to
bring down u. No, the derivative of e^foo is e^foo. No more.
But if foo is controlled by anything else, then we need to multiply the rate of
change by the conversion factor (d(foo)/dx) when we jump into that inner point
of view.
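A quick numerical check of the e^(x^2) case, as a sketch: the point x = 1.5 and the slope helper are arbitrary choices for illustration. It compares "e^u times the conversion factor du/dx" against a direct wiggle of the whole function.

```python
import math

def slope(func, x, dx=1e-6):
    """Estimate a derivative with a small symmetric wiggle around x."""
    return (func(x + dx) - func(x - dx)) / (2 * dx)

x = 1.5
u = x ** 2                 # the inner function, u = x^2
du_dx = 2 * x              # conversion factor from u's scale to x's scale

# e^u changes by 100% of itself per unit of u; convert to a per-x rate.
chain_rule = math.exp(u) * du_dx

# Direct estimate: wiggle x and watch e^(x^2).
direct = slope(lambda t: math.exp(t ** 2), x)

print("chain rule:", chain_rule)
print("direct:    ", direct)
```

Dropping the du_dx factor gives e^(x^2) alone, which visibly disagrees with the direct estimate.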
Natural Logarithm

The derivative of ln(x) is 1/x. It's usually given as a matter-of-fact.
My intuition is to see ln(x) as the time needed to grow to x:
ln(10) is the time to grow from 1 to 10, assuming 100% continuous
growth
Ok, fine. How long does it take to grow to the next value, like 11? (x + dx,
where dx = 1)
When we're at x=10, we're growing exponentially at 10 units per second. It
takes roughly 1/10 of a second (1/x) to get to the next value. And when we're
at x=11, it takes 1/11 of a second to get to 12. And so on: the time to the next
value is 1/x.
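To see how rough "roughly 1/10 of a second" is, here's a small sketch comparing the true growing time, ln(x + dx) - ln(x), with the dx/x estimate. The helper name time_ratio and the step sizes are just illustrative choices.

```python
import math

def time_ratio(x, dx):
    """True time to grow from x to x + dx, divided by the dx/x estimate."""
    exact_time = math.log(x + dx) - math.log(x)
    estimate = dx / x
    return exact_time / estimate

x = 10.0
for dx in (1.0, 0.1, 0.001):
    exact_time = math.log(x + dx) - math.log(x)
    print(f"dx={dx}: exact={exact_time:.6f}, estimate={dx / x:.6f}, "
          f"ratio={time_ratio(x, dx):.6f}")
```

With dx = 1 the estimate overshoots a little (we speed up on the way to 11), and the ratio heads toward 1 as the step shrinks.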
The derivative
d/dx ln(x) = 1/x
is mainly a fact to memorize, but it makes sense with a "time to grow"
interpretation.
A Hairy Example: x^x

Time to test our intuition: what's the derivative of x^x?
This is a bad mamma jamma. There are two approaches:

Approach 1: Rewrite everything in terms of e.
Oh e, you're so marvelous:
x^x = [e^ln(x)]^x = e^(ln(x) * x)
Any exponent (a^b) is really just e in different clothing: [e^ln(a)]^b. We're just
asking for the derivative of e^foo, where foo = ln(x) * x.
But wait! Since we want the derivative in terms of x, not foo, we need to
jump into x's point of view and multiply by d(foo)/dx:
d/dx e^foo = e^foo * d(foo)/dx
The derivative of ln(x) * x is just a quick application of the product rule. If
h=x^x, the final result is:
dh/dx = x^x * (ln(x) + 1)
We wrote e^[ln(x)*x] in its original notation, x^x. Yay! The intuition was
"rewrite in terms of e and follow the chain rule".

Approach 2: Independent Points Of View


Remember, derivatives assume each part of the system works independently.
Rather than seeing x^x as a giant glob, assume it's made from two interacting
functions: u^v. We can then add their individual contributions. We're sneaky
though, u and v are the same (u = v = x), but don't let them know!
From u's point of view, v is just a static power (i.e., if v=3, then it's u^3) so we
have:
d(u^v)/du = v * u^(v-1)
And from v's point of view, u is just some static base (if u=5, we have 5^v).
We rewrite into base e, and we get
d(u^v)/dv = d(e^(ln(u) * v))/dv = ln(u) * u^v
We add each point of view for the total change:
d(u^v) = v * u^(v-1) * du + ln(u) * u^v * dv
And the reveal: u = v = x! There's no conversion factor for this new viewpoint
(du/dx = dv/dx = dx/dx = 1), and we have:
d(x^x)/dx = x * x^(x-1) + ln(x) * x^x = x^x * (ln(x) + 1)
It's the same as before! I was pretty excited to approach x^x from a few
different angles.
By the way, use Wolfram Alpha (like so) to check your work on derivatives
(click show steps).
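You can also sanity-check a derivative with a few lines of code. The sketch below wiggles x^x directly and compares against the standard result x^x * (ln(x) + 1); the point x = 2 and the slope helper are arbitrary choices for illustration.

```python
import math

def slope(func, x, dx=1e-6):
    """Estimate a derivative with a small symmetric wiggle around x."""
    return (func(x + dx) - func(x - dx)) / (2 * dx)

x = 2.0

# u's point of view (v held still) plus v's point of view (u held still),
# with u = v = x so both conversion factors are 1.
from_u = x * x ** (x - 1)          # v * u^(v-1)
from_v = math.log(x) * x ** x      # ln(u) * u^v
both_views = from_u + from_v

formula = x ** x * (math.log(x) + 1)
direct = slope(lambda t: t ** t, x)

print("sum of viewpoints:", both_views)
print("formula:          ", formula)
print("direct estimate:  ", direct)
```

All three numbers should agree to several decimal places; a mismatch usually means a dropped conversion factor.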
Question: If u were more complex, where would we use du/dx?
Imagine u was a more complex function like u=x^2 + 3: where would we
multiply by du/dx?
Let's think about it: du/dx only comes into play from u's point of view (when v
is changing, u is a static value, and it doesn't matter that u can be further
broken down in terms of x). u's contribution is
d(u^v)/du = v * u^(v-1)
if we wanted the dx point of view, we'd include du/dx here:
v * u^(v-1) * du/dx
We're multiplying by the du/dx conversion factor to get things from x's point
of view. Similarly, if v were more complex, we'd have a dv/dx term when
computing v's point of view.
Look what happened: we figured out the generic d/du and converted it into a
more specific d/dx when needed.
Its Easier With Infinitesimals

Separating dy from dx in dy/dx is against the rules of limits, but works great
with infinitesimals. You can figure out the derivative rules really quickly:
Product rule:
d(f * g) = (f + df) * (g + dg) - f * g = f * dg + g * df + df * dg
We set df * dg to zero when jumping out of the infinitesimal world and back
to our regular number system.
Think in terms of "How much did g change? How much did f change?" and
derivatives snap into place much easier. Divide through by dx at the end.
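To watch the df * dg corner fade away, here's a small sketch with finite (not truly infinitesimal) steps. The functions f(x) = x^2 and g(x) = 3x + 1, the point x = 2, and the helper name pieces are arbitrary choices for illustration.

```python
def f(x):
    return x ** 2          # example function (arbitrary choice)

def g(x):
    return 3 * x + 1       # example function (arbitrary choice)

def pieces(x, dx):
    """Split the change in f*g into the two slivers and the corner square."""
    df = f(x + dx) - f(x)
    dg = g(x + dx) - g(x)
    slivers = f(x) * dg + g(x) * df
    corner = df * dg
    total = f(x + dx) * g(x + dx) - f(x) * g(x)
    return slivers, corner, total

x = 2.0
for dx in (0.1, 0.01, 0.001):
    slivers, corner, total = pieces(x, dx)
    # Per unit of dx: the slivers settle near f*g' + g*f', the corner heads to 0.
    print(f"dx={dx}: slivers/dx={slivers / dx:.4f}, corner/dx={corner / dx:.4f}")
```

At x = 2 the slivers per dx settle near 40 (that's f * g' + g * f' = 4 * 3 + 7 * 4), while the corner per dx keeps shrinking along with dx.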
Summary: See the Machine

Our goal is to understand calculus intuition, not memorization. I need a few
analogies to get me thinking:
Functions are machines, derivatives are the wiggle behavior
Derivative rules find the overall wiggle in terms of the wiggles of each
part
The chain rule zooms into a perspective (hours => minutes)
The product rule adds area
The quotient rule adds area (but one area contribution is negative)
e changes by 100% of the current amount (d/dx e^x = 100% * e^x)
natural log is the time for e^x to reach the next value (x units/sec means
1/x to the next value)
With practice, ideas start clicking. Don't worry about getting tripped up: I still
tried to overuse the chain-rule when working with exponents. Learning is a
process!
Happy math.

Appendix: Partial Derivatives

Let's say our function depends on two inputs:
f(x, y)
The derivative of f can be seen from x's point of view (how does f change with
x?) or y's point of view (how does f change with y?). It's the same idea: we
have two independent perspectives that we combine for the overall behavior
(it's like combining the point of view of two Solipsists, who think they're the
only real people in the universe).
If x and y depend on the same variable (like t, time), we can write the
following:
df/dt = (df/dx) * (dx/dt) + (df/dy) * (dy/dt)   [df/dx and df/dy are partial derivatives]
It's a bit of the chain rule: we're combining two perspectives, and for each
perspective, we dive into its root cause (time).
If x and y are otherwise independent, we represent the derivative along each
axis in a vector:
(df/dx, df/dy)
This is the gradient, a way to represent "From this point, if you travel in the x
or y direction, here's how you'll change". We combined our 1-dimensional
points of view to get an understanding of the entire 2d system. Whoa.
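Here's a rough sketch of both ideas with numbers. The function f(x, y) = x^2 * y and the time dependence x = t, y = t^2 are made-up examples, and the partial derivatives are estimated by wiggling one input while holding the other still.

```python
def f(x, y):
    return x ** 2 * y            # hypothetical two-input function

def partial_x(x, y, h=1e-6):
    """Wiggle x only; y stays put."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def partial_y(x, y, h=1e-6):
    """Wiggle y only; x stays put."""
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

def gradient(x, y):
    """The two 1-dimensional points of view, side by side."""
    return (partial_x(x, y), partial_y(x, y))

# Gradient at the point (3, 2): roughly (2*x*y, x^2) = (12, 9).
print(gradient(3.0, 2.0))

# Both inputs driven by time: x = t, y = t^2 (also hypothetical).
t = 2.0
x, y = t, t ** 2
dx_dt, dy_dt = 1.0, 2 * t
df_dt = partial_x(x, y) * dx_dt + partial_y(x, y) * dy_dt

# Here f = t^2 * t^2 = t^4, so the per-time change should be about 4 * t^3 = 32.
print(df_dt)
```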

1. Joe says:
When will Math curriculums begin combining concepts in meaningful
ways like this? Calculus classes like to split Power Rule, Quotient Rule,
and Chain Rule into discrete sections, when really they're consequences
of the same basic idea. Perhaps it's less labor-intensive teaching distinct
formulas to be memorized, but it's just another reason people hear
"Calculus" and immediately glaze over.
And while I'm lamenting: your mention of infinitesimals brings up another
sore spot of mine. A Calc TA told me how separating dy/dx is "against
the rules", as you say, and I took it to heart. Imagine poor, confused me a
couple semesters later in DiffEq: "I thought this was against the rules!"
The limit-based approach to teaching Calculus needs some serious
revision, particularly for non-mathematicians moving into practical fields.
2. kalid says:

@Joe: I hear you, we slice and dice concepts and miss the cohesive
whole. All the calculus rules are just examples of how different subparts
can contribute to the whole, but I'm only seeing that now, 10+ years
after high school. Ugh.
And yeah, there's so much "don't do this, I don't know why, but don't!"
in math. Why is it against the rules? What are the rules? Limits are a
seatbelt introduced to address theoretical concerns many, many years
after Calculus was put into use. Learning about seatbelts is fine, but don't
dive into them before you explain what a car [i.e., calculus] is!
1. Jackson says:
Thank you for the time you've put into these articles, they've helped me a
lot and I'm glad to know there are people who care about intuition and
share it, but I'm confused about your intuition of the natural log. Why is
the derivative always predicting the next increment by one? Why not .5?
Shouldn't it be infinitesimally small because it is using the input of a
naturally growing function?
2. kalid says:
Thanks Jackson, great question. Intuitively, think about taking a single
step forward, which is 1*dx. Another way of seeing it: when taking the
derivative, we split our continuous function into discrete steps (a single
dx wide at each step) and see our rate of change when we increment by
the next dx.
An analogy: we represent a photo with individual pixels (dx) and step
through one pixel at a time. The pixels are chosen at an infinitely small
retina resolution where we don't notice them at the macro scale.
(There's more on limits later in this series.)
3. Jackson says:
Sorry for the inconvenience but I'm confused how you got 1/x. Wouldn't
the derivative be dx/x because dx would be the change and x would be
the current value as dx approaches 0? I'm just confused why 1=dx instead
of approaching 0.
4. kalid says:
No worries, great question, I realize it can be unclear. I start with
scenarios where dx = 1 (which is a GIANT step) to estimate results in
my head. Then, I can set dx = 0 (taking the limit) to get an exact
prediction.
Let's say I want the derivative of x^2. I imagine going from 10^2 to
11^2 (we jumped from x=10 to x=11, so dx=1). The difference is 21, or
2x + dx (20 + 1). I can then set dx = 0 and get the exact answer of 2x.
(If there was no gap between x and the next value, the derivative would
be 2x.)
The natural log is harder to compute: it's the time e^x needs to grow
from 1 to x. How does it change?
Imagine going from 10 to 11 (again, dx=1). Here, we're at 10 and we
grow exponentially up to 11. Since e^x assumes we're growing at 100%
of our current value, it takes 1/10 of a unit time to get to 11. (10 +
(1/10)*10 = 11).
Now, this isn't *quite* accurate because as we're going to 11, we're
getting faster. I.e., when we're at 10.5 we're growing at 10.5 units per
unit time, not the 10 we expected. Removing the imaginary dx fixes this
(we assume there is no midpoint between x=10 and the next value, so it
really is a perfect 1/x amount of time we wait).

Understanding Calculus With A Bank Account Metaphor


by Kalid Azad 32 comments

Calculus examples are boring. "Hey kids! Ever wonder about the distance,
velocity, and acceleration of a moving particle? No? Well you're locked in here
for 50 minutes!"
I love physics, but it's not the best lead-in. It makes us wait till science class
(9th grade?) and worse, it implies calculus is "math for science class". Couldn't
we introduce the themes to 5th graders, and relate it to everyday life?
I think so. So here's the goal:
Use money, not physics, to introduce calculus concepts
Explore how patterns relate (bank account to salary; salary to raises)
Use our intuition to explore potential issues (can we keep drilling into
patterns?)
Strap on your math helmet, time to dive in.
Money money money

My favorite calculus example is the relationship between your bank account,
salary, and raises.
Here's Joe ("Hi, Joe"). You, the sly scoundrel you are, sneak onto Joe's computer
and monitor his bank account each week. What can you learn?

Ack. Clearly, not much happened -- Joe isn't earning anything. And what if you
see this?

Easy enough: Joe's making some money. And how much? With a quick
subtraction, we can figure out his weekly paycheck. Turns out Joe is making a
steady $100/week.
Key idea: If I know your bank account, I know your salary
The bank account is dependent on the salary -- it changes because of the
weekly salary.
Raise the roof

Let's go deeper: knowing the salary, what else can we figure out? Well, the
salary is another pattern to analyze -- we can see if it changes! That is, we can
tell if Joe's salary is changing week by week (is he getting a raise?).
The process:
Look at Joe's weekly bank account
Take the difference in bank account to get the weekly salary
Take the difference in salary to get the weekly raise (if any)
In the first example ($100/week), it's clear there's no raise (sorry, Joe). The
main idea is to "take the difference" to analyze the first pattern (bank account
to salary) and "take the difference again" to find yet another pattern (salary to
raise).
Working backwards

We just went "down", from bank account to salary. Does it work the other way:
knowing the salary, can I predict the bank account?
You're hesitating, I can tell. Yes, knowing Joe gets $100/week is nice. But...
don't we need to know the starting account balance?
Yes! The changes to his account (salary) are not enough -- where did it start? For
simplicity (i.e., what you see in homework problems) we often assume Joe
starts with $0. But, if you are actually making a prediction, you want to know
the initial conditions (the "+ C").
A More Complex Pattern

Let's say Joe's account grows like this: 100, 300, 600, 1000, 1500...

What's going on? Is it random? Well, we can do our week-by-week subtraction
to get this:

Interesting -- Joe's income is changing each week. We do another week-by-week
difference and get this:

And yep, Joe's getting a steady raise of $100/week. Let's get wild and chart
them on the same graph:

One way to think about it: Joe gets a raise each week, which changes his
salary, which changes his bank account. As the raises continue to appear, his
salary continues to increase and his bank account rises. You can almost think
of the raise "pushing up" the salary, which "pushes up" the bank account.
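The "take the difference, then take it again" process is easy to sketch in code. This version assumes Joe started at $0 before week 1 (the same simplification as in the homework-style examples); the helper name differences is just for illustration.

```python
def differences(values, start=0):
    """Week-by-week changes, treating `start` as the value before week 1."""
    previous = start
    changes = []
    for value in values:
        changes.append(value - previous)
        previous = value
    return changes

account = [100, 300, 600, 1000, 1500]   # Joe's balance at the end of each week
salary = differences(account)           # [100, 200, 300, 400, 500]
raises = differences(salary)            # [100, 100, 100, 100, 100]

print("account:", account)
print("salary: ", salary)
print("raise:  ", raises)
```

One more round of differences on the raise would expose the starting assumption (the first entry is the jump from $0) and then nothing but zeros: there is no "raise raise".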
So... Where's the Calculus?

What's the formula for Joe's bank account for any week? Well, it's the sum of
his salaries up to that point:
100 + 200 + 300 + 400... = 100 * n * (n + 1)/2
The formula for adding up a series of numbers (1 + 2 + 3 + 4...) is very close
to n^2/2, and gets closer as the number of steps increases.
This is our first "calculus" relationship:
A constant raise ($100/week) leads to a...
Linear increase in salary (100, 200, 300, 400) which leads to a...
Quadratic (something * n^2) increase in bank account (100, 300, 600,
1000... you see it curve!)
Now, why is it roughly 1/2 * n^2 and not n^2? One intuition: The linear increase in
salary (100, 200, 300) gives us a triangle. The area of the triangle represents
all the payments so far, and the area is 1/2 * base * height. The base is n (the
number of weeks) and the height (income) is 100 * n.
Geometric arguments get more difficult in higher dimensions -- just because
we can work out 2*100 with addition doesn't mean it's the easiest way.
Calculus gives us the rules to jump between patterns (taking derivatives and
integrals).
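Here's a small sketch that adds up the paychecks the slow way and compares the result with both the exact formula and the triangle estimate. The helper name account_after and the chosen week counts are just for illustration.

```python
def account_after(n, raise_per_week=100):
    """Add up the weekly salaries 100, 200, ..., 100 * n one at a time."""
    return sum(raise_per_week * week for week in range(1, n + 1))

for n in (4, 10, 100, 1000):
    exact = account_after(n)
    formula = 100 * n * (n + 1) // 2      # the discrete sum formula
    triangle = 100 * n ** 2 / 2           # area of the triangle: 1/2 * base * height
    assert exact == formula
    print(f"n={n}: exact={exact}, triangle={triangle:.0f}, "
          f"ratio={triangle / exact:.4f}")
```

The ratio works out to n / (n + 1), so the triangle estimate creeps toward the exact sum as the number of weeks grows.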

Points to Explore

Our understanding of bank accounts, salaries, and raises lets us explore
deeper.
Could we figure out the total earnings between weeks 1 and 10?
Sure! There's two ways: we could add up our income for each week (week 1
salary + week 2 salary + week 3 salary...) or just subtract the bank account
(Week 10 bank account - week 1 bank account). This idea has a beefy name:
the Fundamental Theorem of Calculus!
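A sketch of the two routes, using the steady-raise salary pattern (100, 200, 300...) and assuming a $0 starting balance. One detail worth seeing in the numbers: subtracting the week 1 balance counts the paychecks from week 2 onward, so to include week 1's paycheck you subtract the balance from before week 1.

```python
from itertools import accumulate

salary = [100 * week for week in range(1, 11)]   # paychecks for weeks 1..10
account = [0] + list(accumulate(salary))         # account[k] = balance after week k

# Route 1: add up the individual paychecks for weeks 2 through 10.
added_up = sum(salary[1:])

# Route 2: subtract two bank balances.
subtracted = account[10] - account[1]

print(added_up, subtracted)

# Including week 1's paycheck means starting from the balance before week 1.
print(sum(salary), account[10] - account[0])
```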
Can we keep going "down" (taking derivatives) beyond the raise?
Well, why not? If the raise is $100/week, if we take the difference again we see
it drops to 0 (there is no "raise raise", aka the raise is always steady). But, we
can imagine the case where the raise itself is raising (week1 raise = 100,
week2 raise = 200). Using our intuition: if the "raise raise" is constant, the
raise is linear (something * n), the income is quadratic (something * n^2) and
the bank account is cubic (something * n^3). And yes, it's true!
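To check the "constant, linear, quadratic, cubic" chain, here's a sketch that goes "up" three times with running totals. The $100 "raise raise", the 50-week horizon, and the $0 starting values are assumptions for illustration.

```python
from itertools import accumulate

weeks = 50
raise_raise = [100] * weeks                # constant
raises = list(accumulate(raise_raise))     # 100, 200, 300, ...  (linear)
income = list(accumulate(raises))          # 100, 300, 600, ...  (quadratic)
account = list(accumulate(income))         # 100, 400, 1000, ... (cubic)

n = weeks
exact_formula = 100 * n * (n + 1) * (n + 2) // 6   # discrete sum, exactly
cubic_estimate = 100 * n ** 3 / 6                  # the "something * n^3" part

print(account[-1], exact_formula, cubic_estimate)
```

The discrete total has the same n^3 / 6 leading term as the smooth version, with extra lower-order chunks from the "chunky" weekly intervals.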
Can derivatives go on forever?
Yep. Maybe the connection is bank account => salary => raise => inflation
=> milk output of Farmer Joe's cow => how much Joe feeds the cow each
week. Many patterns "stop having derivatives" once we get to the root cause.
But certain interesting patterns, like exponential growth, have an infinite
number of components! You have interest, which earns interest, which earns
interest, which earns interest... forever! You can never find the single "root
cause" of your bank account because an infinite number of components went
into it (pretty trippy).
What happens if the raise goes negative?
Interesting question. As the raise goes negative, his salary will start lowering.
But, as long as the salary is above zero, the bank account will keep rising!
After all, going from $200 to $100 per week, while bad to you, still helps your
bank account. Eventually, a negative raise will overpower the salary, making it
negative, which means Joe is now paying his employer. But up until that point,
Joe's bank account would be growing.
How quickly can we check for differences?
Suppose we're measuring a stock portfolio, not a bank account. We might want
a second-by-second model of our salary and account balance. The idea is to
measure at intervals short enough to get the detail we need -- a large aspect
of calculus is deciding what "limit" is enough to say "Ok, this is accurate
enough for me!".

The calculus formulas you typically see (integral of x = 1/2 * x^2) are different
from the "discrete" formulas (sum of 1 to n = 1/2 * n * (n + 1)) because the
discrete case is using "chunky" intervals.
Key Takeaways

Why do I care about the analogy used? The traditional "distance, velocity,
acceleration" doesn't lead to the right questions. What's the next derivative of
acceleration? (It's called "jerk", and it's rarely used). Such a literal example is
like having kids think multiplication is only for finding area, and only works on
two numbers at a time.
Here's the key points:
Calculus helps us find related patterns (bank account, to salary, to raises)
The "derivative" is going "down" (finding week-by-week changes to get
your salary)
The "integral" is going "up" (adding up your salary to get your bank
account)
We can figure out a formula for a pattern (given my bank account, predict
my salary) or get a specific value (what's my salary at week 3?)
Calculus is useful outside the hard sciences. If you have a pattern or
formula (production rate, size of a population, GDP of a country) and
want to examine its behavior, calculus is the tool for you.
Textbook calculus involves memorizing the rules to derive and integrate
formulas. Learn the basics (x^n, e, ln, sin, cos) and leave the rest to
machines. Our brainpower is better spent learning how to translate our
thoughts into the language of math.
In my fantasy world, derivatives and integrals are just two everyday concepts.
They're "what you can do" to formulas, just like addition and subtraction are
"what you can do" to numbers.
"Hey kids, we find the total mass using addition (Mass1 + Mass2 = Mass3).
And to find out how our position changes, we use the derivative".
"Duh -- addition is how you combine stuff. And yeah, you take the derivative to
see how your position is changing. What else would you do?"
One can always dream. Happy math.
1. Johann says:
Hi Kalid,
The thing about physics is that it's more appropriate for
describing continuous variations. Money is a discrete process and thus
a case of non-continuous variations.
But I agree it is a nice view to explain it, and maybe link it to the boring
use made in physics (probably because it's always presented in the same
way).
Again, great reading ! Keep up the good work

Bests,
Johann
2. Prudhvi says:
ingenious!
it's true that a lot of people get bored by the direct physics application of
calculus if it's introduced too soon
money seems like a natural way to understand it
of course it might not be able to so easily give an intuitive
understanding of the more grainy aspects of calculus, but whatever,
thanks!
3. Kalid says:
@Johann: Thanks for dropping by, and great point about continuous vs.
discrete. The funny thing is that many physicists treat the formulas as
discrete (i.e. using infinitesimal dx, dy, dz quantities to make a 3d
cube, for example) and then let it disappear to make it continuous
again. The neat thing is that using discrete quantities really shows how
the error margin is there (the difference between the actual sum of
squares and 1/2 * x^2) and how limits / Riemann sum help us shrink this.
I agree though, that physics would be cool if it were shown to be an
example of these general principles (and not the definition as is often
seen).
@Prudhvi: Yep, there's always details that you can't get to when you
make analogies. But you have to start somewhere :).
4. MJ says:
If someone had outright told me at any point in Calc I or Calc II that the
+ C can be thought of as an initial condition, I might have actually
remembered to tack it to the end of integrals, instead of considering it an
arbitrary annoyance that has little context.
That makes so much sense (and yet is so, so, so, painfully obvious), that
it's not even funny.
5. Kalid says:
@MJ: Thanks for the comment -- yeah, it's *way* too easy to think of the
+C as some mathematical details to keep track of, instead of something
_needed_ to figure out how to make your model work.

A Gentle Introduction To Learning Calculus


by Kalid Azad 319 comments

I have a love/hate relationship with calculus: it demonstrates the beauty of
math and the agony of math education.
Calculus relates topics in an elegant, brain-bending manner. My closest
analogy is Darwin's Theory of Evolution: once understood, you start seeing
Nature in terms of survival. You understand why drugs lead to resistant germs
(survival of the fittest). You know why sugar and fat taste sweet (encourage
consumption of high-calorie foods in times of scarcity). It all fits together.
Calculus is similarly enlightening. Don't these formulas seem related in some
way?

They are. But most of us learn these formulas independently. Calculus lets us
start with circumference = 2 * pi * r and figure out the others -- the Greeks
would have appreciated this.
Unfortunately, calculus can epitomize what's wrong with math
education. Most lessons feature contrived examples, arcane proofs, and
memorization that body slam our intuition & enthusiasm.
It really shouldn't be this way.
Math, art, and ideas

I've learned something from school: Math isn't the hard part of math;
motivation is. Specifically, staying encouraged despite:
Teachers focused more on publishing/perishing than teaching
Self-fulfilling prophecies that math is difficult, boring, unpopular or "not
your subject"
Textbooks and curriculums more concerned with profits and test results
than insight
"A Mathematician's Lament" [pdf] is an excellent essay on this issue
that resonated with many people:

"if I had to design a mechanism for the express purpose of destroying a
child's natural curiosity and love of pattern-making, I couldn't possibly do as
good a job as is currently being done -- I simply wouldn't have the imagination
to come up with the kind of senseless, soul-crushing ideas that constitute
contemporary mathematics education."
Imagine teaching art like this: Kids, no fingerpainting in
kindergarten. Instead, let's study paint chemistry, the physics of light, and
the anatomy of the eye. After 12 years of this, if the kids (now teenagers) don't
hate art already, they may begin to start coloring on their own. After all, they
have the rigorous, testable fundamentals to start appreciating art. Right?
Poetry is similar. Imagine studying this quote (formula):
"This above all else: to thine own self be true, and it must follow, as night
follows day, thou canst not then be false to any man." -- William Shakespeare,
Hamlet
It's an elegant way of saying "be yourself" (and if that means writing
irreverently about math, so be it). But if this were math class, we'd be counting
the syllables, analyzing the iambic pentameter, and mapping out the subject,
verb and object.
Math and poetry are fingers pointing at the moon. Don't confuse the
finger for the moon. Formulas are a means to an end, a way to express a
mathematical truth.
We've forgotten that math is about ideas, not robotically manipulating the
formulas that express them.
Ok bub, what's your great idea?

Feisty, are we? Well, here's what I won't do: recreate the existing textbooks. If
you need answers right away for that big test, there's plenty of websites, class
videos and 20-minute sprints to help you out.
Instead, let's share the core insights of calculus. Equations aren't
enough -- I want the "aha!" moments that make everything click.
Formal mathematical language is just one way to communicate. Diagrams,
animations, and just plain talkin' can often provide more insight than a page
full of proofs.
But calculus is hard!

I think anyone can appreciate the core ideas of calculus. We don't need to be
writers to enjoy Shakespeare.
It's within your reach if you know algebra and have a general interest in math.
Not long ago, reading and writing were the work of trained scribes. Yet today
that can be handled by a 10-year old. Why?
Because we expect it. Expectations play a huge part in what's possible.
So expect that calculus is just another subject. Some people get into the
nitty-gritty (the writers/mathematicians). But the rest of us can still admire what's
happening, and expand our brain along the way.
It's about how far you want to go. I'd love for everyone to understand the core
concepts of calculus and say "whoa".
So what's calculus about?

Some define calculus as "the branch of mathematics that deals with limits and
the differentiation and integration of functions of one or more variables". It's
correct, but not helpful for beginners.
Here's my take: Calculus does to algebra what algebra did to arithmetic.
Arithmetic is about manipulating numbers (addition, multiplication,
etc.).
Algebra finds patterns between numbers: a^2 + b^2 = c^2 is a
famous relationship, describing the sides of a right triangle. Algebra finds
entire sets of numbers -- if you know a and b, you can find c.
Calculus finds patterns between equations: you can see how one
equation (circumference = 2 * pi * r) relates to a similar one (area = pi *
r^2).
Using calculus, we can ask all sorts of questions:
How does an equation grow and shrink? Accumulate over time?
When does it reach its highest/lowest point?
How do we use variables that are constantly changing? (Heat, motion,
populations, ...).
And much, much more!
Algebra & calculus are a problem-solving duo: calculus finds new equations,
and algebra solves them. Like evolution, calculus expands your
understanding of how Nature works.
An Example, Please

Let's walk the walk. Suppose we know the equation for circumference (2 * pi *
r) and want to find area. What to do?
Realize that a filled-in disc is like a set of Russian dolls.

Here are two ways to draw a disc:


Make a circle and fill it in
Draw a bunch of rings with a thick marker
The amount of space (area) should be the same in each case, right? And how
much space does a ring use?
Well, the very largest ring has radius r and a circumference 2 * pi * r. As the
rings get smaller their circumference shrinks, but it keeps the pattern of 2 * pi
* current radius. The final ring is more like a pinpoint, with no circumference at
all.

Now here's where things get funky. Let's unroll those rings and line them
up. What happens?

We get a bunch of lines, making a jagged triangle. But if we take thinner
rings, that triangle becomes less jagged (more on this in future articles).
One side has the smallest ring (0) and the other side has the largest ring
(2 * pi * r)
We have rings going from radius 0 up to r. For each possible radius
(0 to r), we just place the unrolled ring at that location.
The total area of the ring triangle = 1/2 base * height = 1/2 * r * (2 * pi
* r) = pi * r^2, which is the formula for area!
Yowza! The combined area of the rings = the area of the triangle = area of
circle!
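If you'd rather check the ring triangle with numbers than with pipe cleaners, here is a rough sketch. The function name `ring_area_sum` is made up for this example, and each ring is measured by its inner edge (an assumption), so the jagged triangle slightly undershoots the true area.

```python
import math

def ring_area_sum(radius, num_rings):
    """Approximate a disc's area by adding up thin, unrolled rings."""
    dr = radius / num_rings                 # thickness of each ring
    total = 0.0
    for i in range(num_rings):
        r = i * dr                          # inner radius of ring i (starts at 0)
        total += 2 * math.pi * r * dr       # circumference * thickness
    return total

exact = math.pi * 1.0 ** 2                  # pi * r^2 for r = 1
for n in (10, 100, 1000):
    approx = ring_area_sum(1.0, n)
    print(n, approx, exact - approx)        # the gap shrinks as rings get thinner
```

With only 10 thick rings the triangle is visibly jagged and the total falls short. As the rings get thinner, the gap to pi * r^2 shrinks toward zero.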

(Image from Wikipedia)


This was a quick example, but did you catch the key idea? We took a disc, split
it up, and put the segments together in a different way. Calculus showed us
that a disc and ring are intimately related: a disc is really just a bunch of rings.
This is a recurring theme in calculus: Big things are made from little
things. And sometimes the little things are easier to work with.
A note on examples

Many calculus examples are based on physics. That's great, but it can be hard
to relate: honestly, how often do you know the equation for velocity for an
object? Less than once a week, if that.
I prefer starting with physical, visual examples because it's how our minds
work. That ring/circle thing we made? You could build it out of several pipe
cleaners, separate them, and straighten them into a crude triangle to see if
the math really works. That's just not happening with your velocity equation.
A note on rigor (for the math geeks)

I can feel the math pedants firing up their keyboards. Just a few words on
rigor.
Did you know we don't learn calculus the way Newton and Leibniz discovered
it? They used intuitive ideas of fluxions and infinitesimals which were
replaced with limits because "Sure, it works in practice. But does it work
in theory?".
We've created complex mechanical constructs to rigorously prove calculus,
but have lost our intuition in the process.
We're looking at the sweetness of sugar from the level of brain-chemistry,
instead of recognizing it as Nature's way of saying "This has lots of energy. Eat
it."
I don't want to (and can't) teach an analysis course or train researchers. Would
it be so bad if everyone understood calculus to the non-rigorous level that
Newton did? That it changed how they saw the world, as it did for him?
A premature focus on rigor dissuades students and makes math hard to learn.
Case in point: e is technically defined by a limit, but the intuition of growth is
how it was discovered. The natural log can be seen as an integral, or the time
needed to grow. Which explanations help beginners more?
Let's fingerpaint a bit, and get into the chemistry along the way. Happy math.
(PS: A kind reader has created an animated powerpoint slideshow that helps
present this idea more visually (best viewed in PowerPoint, due to the
animations). Thanks!)
Andre says:

Great article! I love the insights. I'm currently taking up calculus class and I
find it hard to learn its essence just by taking it up in school. I mean,
depending upon the style of one's professor, I think math is a subject one can
get by without much thinking by just knowing its procedure (except integral
calculus, I think). But I find myself being reluctant to score that way (and also
find integral calculus challenging), so I surfed the Internet to seek for a website
that would make me understand what calculus really means, and your website
turns out to be exactly what I'm looking for!
I can really relate to you Kalid -- I also feel that our math education system
today is being head over the clouds and must be more down-to-earth to
beginners. Not understanding the essence of mathematics makes the majority
of people not appreciate it. To give an analogy, it's like they're seeing music in
written form and calling it music without even listening to it. In order to
understand what an abstract word really means, one must get a hold first of its
manifestations in the concrete world, and then how the abstract thereafter
relates to the concrete. I think whenever people say they hate the beautiful
subject math, they just don't really understand what it means.
I've read some of the others' comments regarding evolution. I feel moved to
share some facts, inferences and insights regarding its validity.
Our scientific formulae are so predictive only because each scientific formula
represents a scientific generalisation that has been based on factual
observations. Its because we have observed a set of phenomena to be
consistent that we classify them together and make a scientific generalisation
out of them, taking advantage of their consistency to make predictions for
future purposes. We keep on observing sets of phenomena in this way.
However, that does not explain how they can be consistent. Therefore one is
left with two general categories to explain the consistency of each of them: (1)
occurrences ensue; (2) otherwise, they're being controlled. What do we call
these certainties in the universe? Physical laws, which are certain, can't be just
some chance events, which are random and uncertain. Our lack of knowledge
permits us to believe that some things just happen by chance when we don't
know what caused it, but that's not the attitude of a scientist; science
attempts to explain causes or it won't have a cause. If the universe wouldn't
follow physical laws, we wouldn't be able to classify anything (e.g. atoms), let
alone observe any consistency. What intuition do you think drove us to call
physical laws laws? Laws are commands. Nothing comes from nothing. The law
of conservation of energy signifies this. If one wants to believe that something
can arise by itself, it shouldnt be the universe, because the universe is under
the law of conservation of energy. This therefore makes us conclude that the
universe has always existed from eternity past. However, the universe began.
Our universe is characterised by cosmic expansion. The second law of
thermodynamics indicates that the longer time has elapsed, the greater the
overall entropy of the universe shall be. Given that the universe is currently
not at a state of maximum entropy, the first and second laws of
thermodynamics indicate that the universe must not have always existed from
eternity past. Matter, energy, space and time, which constitute the universe,
have not always existed. Therefore, because the universe began to exist,
either some Being or something must have caused it. This cause of the
universe must be immaterial, because the cause of the universe cannot be the
universe itself, which is the totality of all material things, as nothing can cause
itself that has not arisen from nothing. In other words, something causing itself
is like saying that it appeared out of nowhere. Something arising out of nothing
can only be true if that thing is not under the law of conservation of energy, or,
if some Being xor some other thing caused it that, being able to create energy,
is above the law of conservation of energy. Because of laws such as the laws of
thermodynamics, only the Creator can and will create the universe from
nothing. Being transcendent, the Creator of the universe must possess a
unique nature distinct from the universe or from anything in it as much as the
Creator of the universe hasnt caused the universe or anything in it to bear
resemblance to the Creators nature. This nature then doesnt necessarily have
to be tangible nor visible to our eyes.
The theory of evolution holds that millions and millions of years ago, fish
began evolving by means of little cumulative changes over long periods of
time. Over approximately 170000000 years, fish managed to evolve to
amphibians. Over approximately 60000000 years, amphibians evolved to
reptiles. Some of these reptiles evolved to nonmonkey mammals, still over a
long period of time, which then evolved to monkeys -- simply put, our
ancestors. Of course, fish came all the way from a common ancestor. This is
what Darwin has proposed. After the discovery of DNA, however, the theory of
evolution itself evolved to include nonliving chemicals that happened to live by
time and incredible luck.
There is no substantial evidence, however, to support this. It doesnt follow
that similarities in DNA should indicate a common descent. The assertion that
genus evolves to another genus over a very long period of time is contrary to
science (genome is the total of all the genetic possibilities for a given species,
and should not be confused for genotype). I understand that, in order to
appear as though it was falsifiable, and thus be convincing, this assertion
depends on natural selection. But its not the other way around; it is not
requisite for this unobservable assertion to be true in order for natural
selection to be true, or for natural things to serve some purpose. One purpose
of natural selection is to eliminate the abnormal (mutations cause
abnormalities). However, natural selection doesn't cause adaptation; all it does
are to eliminate the weak and the mutated and to spare the survivors to live a
longer reproductive life. Too much of this and extinction would occur. Living
beings adapt to their surroundings because of the way they were designed --
not because of natural selection; without design in the first place, natural
selection would be meaningless. What's observable in nature are adaptation,
death and the fact that species can only produce species of their own kind. No
one has ever observed actual evolution happen naturally. One only sees
supposed evolution in some man-made books with pictures and in man-made
realistic 3D animation movies. All proponents of the theory of evolution can
show are some fossil remains with similarities, which have already undergone
decomposition.
Earnest A. Hooton, from Harvard University, states, To attempt to restore the
soft parts is an even more hazardous undertaking. The lips, the eyes, the ears,
and the nasal tip leave no clues on the underlying bony parts. You can with
equal facility model on a Neanderthaloid skull the features of a chimpanzee or
the lineaments of a philosopher. These alleged restorations of ancient types of
man have very little if any scientific value and are likely only to mislead the
public So put not your trust in reconstructions.
(Up From The Ape p. 332)
Similarities in DNA do not indicate a common descent; similarities in DNA
indicate a one language -- the language of DNA itself -- and this we have
evidence of. The fact that one language was used to design, and to dictate all
the functions of, all living beings on Earth is just undeniable. After all, all living
beings on Earth have one thing in common -- life. If one has ever used a
programming language before, one would understand the necessity of reusing
a set of specific codes to a number of different programs. Computer
programmers though have a way of converting lengthy codes to just a short
one by saving codes in header files because it would be tiring for humans to
retype lengthy codes over and over again. Information is contained in our DNA,
and our bodies were designed, and functions, as well, according to the
specifications of this information. What happens when a living being is
exposed to harmful things such as radiation? Mutations are alterations that
take place in the DNA -- damaging the information in it. It's impossible for living
beings to acquire new organs through mutations, because mutations do not
add new genetic information. Things dont just happen by chance to an
omniscient being; our lack of knowledge permits us to believe that some
things just happen by chance when we dont know what caused it, but
intelligence identifies with intelligence like archaeologists do. Information
never originates by itself in matter; it always comes from an intelligent source.
During Darwins time, this extremely complex chemical macromolecule called
the DNA was not yet discovered. The outdated microscopes of their time made
the very complex structure of the cell look so simple. However, if we would
subscribe to the current scientific discoveries, as well as the technologies, of
our time, we would begin to apprehend that the indications never really
pointed to the theory of evolution. As science progresses, intelligent design
becomes more evident. We shouldnt limit ourselves therefore in Darwins
worldview.
Darwin himself wrote, "If it could be demonstrated that any complex organ
existed, which could not possibly have been formed by numerous, successive,
slight modifications, my theory would absolutely break down."
(The Origin of Species p. 189)
Note: He didn't write [my theory would absolutely evolve].
What else is the meaning of evidence? Everywhere we look, the more attentive
we are to the details, the more evident intelligent design becomes.
Mathematical formulae are symbolic representations of mathematical ideas,
and ideas can only be conceived by the mind. We experience this whenever
we conceive mathematical ideas. The Fibonacci numbers is one such idea.
Fibonacci numbers and golden section often occur in nature, even in our
bodies, and this repetition goes against mere coincidences. It seems to me
then that just as we humans can make something only out of that which has
already been created, we can not conceive mathematical ideas other than that
which has already been thought by an immaterial intelligent Being prior to the
universe, as we humans rely upon the universe to derive our conclusions and
mathematical ideas from. All mathematical ideas that we know of are
embedded throughout the whole universe. As a matter of fact, mathematics is
so pervasive it even permeates science. This does not contradict intelligence
prior to the universe, but rather, proves it.
I understand that not all religions can be trusted to teach one what is true, but
lies exist not only in religion. People shouldnt be throwing the baby with the
bathwater and not leaving room for creation just because some people who
hold false beliefs happen to believe also in creation. It doesnt follow that
creation should be false due to that. I think people who dismiss intelligent
design because of other peoples attitude against reason should be less biased
in their focus and consider also the scientists who believe in creation due to
the intelligently designed things that surround us. One should learn upon the
insights of the reasonable, rather than calling the untaught ignorant without
even educating them.
I just feel like sharing these facts, inferences and insights of mine because I
believe iron sharpens iron and because I believe its important for you to
know the truth. Im always glad to hear others insights about the truth (what
is true) in general. I love math because to me there is nothing more beautiful
than the truth, and math to me is also the realisation of the quantitative
objective aspect of the truth (algebraic logic counts truth value 0, 1).

4. Learning The Official Terms

We've been able to describe our step-by-step process with analogies (X-Rays, Timelapses, and rings) and diagrams:

However, this is a very elaborate way to communicate. Here are the Official Math
terms:

| Intuitive Concept | Formal Name | Symbol |
|---|---|---|
| X-Ray (split apart) | Take the derivative (derive) | $\frac{d}{dr}$ |
| Time-lapse (glue together) | Take the integral (integrate) | $\int$ |
| Arrow direction | Integrate or derive "with respect to" a variable | $dr$ implies moving along $r$ |
| Arrow start/stop | Bounds or range of integration | $\int_{\text{start}}^{\text{end}}$ |
| Slice | Integrand (shape being glued together, such as a ring) | Equation, such as $2\pi r$ |

Let's walk through the fancy names.

The Derivative

The derivative is splitting a shape into sections as we move along a path (i.e., X-Raying it). Now here's the trick: although the derivative generates the entire sequence
of sections (the black line), we can also extract a single one.
Think about a function like $f(x) = x^2$. It's a curve that describes a giant list of
possibilities (1, 4, 9, 16, 25, etc.). We can graph the entire curve, sure, or examine the
value of f(x) at a specific value, like x=3.
The derivative is similar. Officially, it's the entire pattern of sections, but we can zero
in on a specific one by asking for the derivative at a certain value. (The derivative is a
function, just like $f(x) = x^2$; if not otherwise specified, we're describing the entire
function.)
What do we need to find the derivative? The shape to split apart, and the path to follow
as we cut it up (the orange arrow). For example:
The derivative of a circle with respect to the radius creates rings
The derivative of a circle with respect to the perimeter creates slices
The derivative of a circle with respect to the x-axis creates boards
I agree that "with respect to" sounds formal: Honorable Grand Poombah radius, it is
with respect to you that we derive. Math is a gentleman's game, I suppose.
Taking the derivative is also called "differentiating", because we are finding the
difference between successive positions as a shape grows. (As we grow the radius of a
circle, the difference between the current disc and the next size up is that outer ring.)
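Here's a small numerical sketch of that "difference between successive discs" idea. The helper names `disc_area` and `outer_ring_rate` and the step size `dr=1e-6` are choices made for this illustration, not part of any official notation.

```python
import math

def disc_area(r):
    """Area of a filled-in disc of radius r."""
    return math.pi * r ** 2

def outer_ring_rate(r, dr=1e-6):
    """Area of the thin outer ring (next disc minus current disc), per unit of dr."""
    return (disc_area(r + dr) - disc_area(r)) / dr

r = 3.0
print(outer_ring_rate(r))   # difference between successive discs, scaled by dr
print(2 * math.pi * r)      # circumference at the same radius
```

The two printed values should agree closely: growing the disc by a sliver dr adds a ring whose size per unit of dr is the circumference, which is the sense in which deriving the area with respect to the radius "creates rings".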
The Integral, Arrows, and Slices

The integral is gluing together (time-lapsing) a group of sections and measuring the
final result. For example, we glued together the rings (into a "ring triangle") and saw it
accumulated to $\pi r^2$, aka the area of a circle.
Here's what we need to find the integral:

Which direction are we gluing the steps together? Along the orange line
(the radius, in this case)
When do we start and stop? At the start and end of the arrow (we start at 0,
no radius, and move to r, the full radius)
How big is each step? Well each item is a "ring". Isn't that enough?
Nope! We need to be specific. We've been saying we cut a circle into "rings" or "pizza
slices" or "boards". But that's not specific enough; it's like a BBQ recipe that says "Cook
meat. Flavor to taste."
Maybe an expert knows what to do, but we need more specifics. How large, exactly, is
each step (technically called the "integrand")?

Ah. A few notes about the variables:
If we are moving along the radius r, then dr is the little chunk of radius in the
current step
The height of the ring is the circumference, or $2\pi r$
There are several gotchas to keep in mind.
First, dr is its own variable, and not "d times r". It represents the tiny section of the
radius present in the current step. This symbol (dr, dx, etc.) is often separated from
the integrand by just a space, and it's assumed to be multiplied (written $2\pi r \, dr$).
Next, if r is the only variable used in the integral, then dr is assumed to be there. So if
you see $\int 2\pi r$ this still implies we're doing the full $\int 2\pi r \, dr$. (Again, if there are two
variables involved, like radius and perimeter, you need to clarify which step we're
using: dr or dp?)
Last, remember that r (the radius) changes as we time-lapse, starting at 0 and
eventually reaching its final value. When we see r in the context of a step, it means "the
size of the radius at the current step" and not the final value it may ultimately have.
These issues are extremely confusing. I'd prefer we use $r_{dr}$ to indicate an intermediate
"r at the current step" instead of a general-purpose "r" that's easily confused with the
max value of the radius. I can't change the symbols at this point, unfortunately.
Practicing The Lingo

Let's learn to talk like calculus natives. Here's how we can describe our X-Ray
strategies:
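| Intuitive Visualization | Formal description | Symbol |
|---|---|---|
| (image: rings) | derive the area of a circle with respect to the radius | $\frac{d}{dr} \text{Area}$ |
| (image: slices) | derive the area of a circle with respect to the perimeter | $\frac{d}{dp} \text{Area}$ |
| (image: boards) | derive the area of a circle with respect to the x-axis | $\frac{d}{dx} \text{Area}$ |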

Remember, the derivative just splits the shape into (hopefully) easy-to-measure steps,
such as rings of size $2\pi r \, dr$. We broke apart our lego set and have pieces scattered on
the floor. We still need an integral to glue the parts together and measure the new size.
The two commands are a tag team:
The derivative says: "Ok, I split the shape apart for you. It looks like a bunch of
pieces $2\pi r$ tall and dr wide."
The integral says: "Oh, those pieces resemble a triangle -- I can measure that!
The total area of that triangle is $\frac{1}{2} \cdot \text{base} \cdot \text{height}$, which works out to $\pi r^2$ in this
case."
Here's how we'd write the integrals to measure the steps we've made:

| Formal description | Symbol | Measures Total Size Of |
|---|---|---|
| integrate 2 * pi * r * dr from r=0 to r=r | $\int_0^r 2\pi r \, dr$ | (image: rings) |
| integrate [a pizza slice] from [p = min perimeter] to [p = max perimeter] | $\int_{p=\min}^{p=\max} (\text{pizza slice}) \, dp$ | (image: slices) |
| integrate [a board] from [x = min value] to [x = max value] | $\int_{x=\min}^{x=\max} \text{board} \, dx$ | (image: boards) |

A few notes:

Often, we write an integrand as an unspecified "pizza slice" or "board" (use a
formal-sounding name like s(p) or b(x) if you like). First, we set up the
integral, and then we worry about the exact formula for a board or slice.
Because each integral represents slices from our original circle, we know they
will be the same. Gluing any set of slices should always return the total area,
right?
The integral is often described as "the area under the curve". It's accurate, but
shortsighted. Yes, we are gluing together the rectangular slices under the curve.
But this completely overlooks the preceding X-Ray and Time-Lapse thinking.
Why are we dealing with a set of slices vs. a curve in the first place? Most likely,
because those slices are easier than analyzing the shape itself (how do you
"directly" measure a circle?).
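As a rough check that different ways of slicing glue back to the same total, here is a sketch of the "integrate [size of step] from [start] to [end]" command. The `integrate` helper is written just for this example, and the board height `2 * sqrt(R^2 - x^2)` is an assumption of mine (a vertical strip of a circle centered at the origin), not a formula given above.

```python
import math

def integrate(step_size, start, end, num_steps=100_000):
    """Glue together num_steps slices, each step_size(position) tall and d wide."""
    d = (end - start) / num_steps            # width of each step (the dr or dx)
    total = 0.0
    for i in range(num_steps):
        position = start + (i + 0.5) * d     # sample the middle of each step
        total += step_size(position) * d
    return total

R = 1.0  # full radius of the circle

# Rings: each step is 2 * pi * r tall, moving along the radius from 0 to R.
rings = integrate(lambda r: 2 * math.pi * r, 0.0, R)

# Boards: each step is assumed to be 2 * sqrt(R^2 - x^2) tall, moving along x from -R to R.
boards = integrate(lambda x: 2 * math.sqrt(R ** 2 - x ** 2), -R, R)

print(rings, boards, math.pi * R ** 2)
```

Note that inside `integrate` the variable `position` plays the role of "r at the current step", while `R` is the final radius. Keeping the two names separate sidesteps the confusion the notation invites.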
Questions

1) Can you think of another activity which is made simpler by shortcuts and notation,
vs. written English?
2) Interested in performance? Let's drive the calculus car, even if you can't build it yet.

Question 1: How would you write the integrals that cover half of a circle?

Each should be similar to:

integrate [size of step] from [start] to [end] with respect to [path variable]

(Answer for the first half and the second half. This links to Wolfram Alpha, an online
calculator, and we'll learn to use it later on.)

Question 2: Can you find the complete way to describe our "pizza-slice" approach?

The "math command" should be something like this:


integrate [size of step] from [start] to [end] with respect to [path variable]

Remember that each slice is basically a triangle (so what's the area?). The slices move
around the perimeter (where does it start and stop?). Have a guess for the command?
Here it is, the slice-by-slice description.

Question 3: Can you figure out how to move from volume to surface area?

Assume we know the volume of a sphere is 4/3 * pi * r^3. Think about the instructions
to separate that volume into a sequence of shells. Which variable are we moving
through?
