Instant Download Business Statistics, 4th Global Edition Norean R. Sharpe PDF All Chapter
This is a special edition of an established title widely used by colleges and
universities throughout the world. Pearson published this exclusive edition
for the benefit of students outside the United States and Canada. If you
purchased this book within the United States or Canada, you should be aware
that it has been imported without the approval of the Publisher or Author.
Business Statistics narrows the gap between theory and practice by focusing on relevant statistical
methods, thus empowering business students to make good, data-driven decisions. Using the latest
GAISE (Guidelines for Assessment and Instruction in Statistics Education) report, which included
extensive revisions to reflect both the evolution of technology and new wisdom on statistics education,
this fourth edition brings a modern edge to teaching business statistics. This includes a focus on the
report’s key recommendations: teaching statistical thinking, focusing on conceptual understanding,
integrating real data with a context and a purpose, fostering active learning, using technology to
explore concepts and analyze data, and using assessments to improve and evaluate student learning.
By presenting statistics in the context of real-world businesses and by emphasizing analysis and
understanding over computation, this book helps students be more analytical, prepares them to
make better business decisions, and shows them how to effectively communicate results.
Key Features
Norean R. Sharpe • Richard D. De Veaux • Paul F. Velleman
• Improved organization A streamlined design and a data-first presentation of information
provide students with both the motivation to learn statistics and a foundation of real
business decisions on which to build their statistical understanding. Chapters 5–7 now cover
probability trees and Bayes’ rule. Chapter 21 is a brand-new chapter on data mining and Big Data.
• Motivating Vignettes Each chapter opens with a vignette, which uses data from or about
real-world companies, that helps students relate key statistical concepts to real business events.
Companies featured include Visa, H&M, and Whole Foods Market.
• Case Studies The book provides four cases based on realistically large datasets that challenge
students to respond to accompanying open-ended business questions. These cases encourage
students to bring together methods they have learned throughout the book.
• Section Exercises Each chapter provides straightforward exercises targeted at the topics
covered in each section and designed to check students’ understanding.
Available separately for purchase with this book is MyLab Statistics, the teaching and learning
platform that empowers instructors to personalize learning for each student. When combined with
trusted educational content, MyLab Statistics provides countless opportunities for practice—with
the help of statistics-specific resources and tools—that enhance a student’s experience and their
comprehension.
Business Statistics
Norean R. Sharpe
St. John’s University
Richard D. De Veaux
Williams College
Paul F. Velleman
Cornell University
With Contributions by David Bock
and Special Contributor Eric M. Eisenstein
Harlow, England • London • New York • Boston • San Francisco • Toronto • Sydney • Dubai • Singapore • Hong Kong
Tokyo • Seoul • Taipei • New Delhi • Cape Town • São Paulo • Mexico City • Madrid • Amsterdam • Munich • Paris • Milan
Paul F. Velleman (Ph.D. Princeton University) has an international reputation for innovative
statistics education. He designed the Data Desk® software package and is also the author
and designer of the award-winning ActivStats® multimedia software, for which he received the
EDUCOM Medal for innovative uses of computers in teaching statistics and the ICTCM Award
for Innovation in Using Technology in College Mathematics. He is the founder and CEO of Data
Description, Inc. (www.datadesk.com), which supports both of these programs. Data Description
also developed and maintains the Internet site Data and Story Library (DASL; dasl.datadescription.com),
which provides all of the datasets used in this text as well as many others useful for
teaching statistics, and the statistics conceptual tools at astools.datadesk.com. Paul coauthored
(with David Hoaglin) the book ABCs of Exploratory Data Analysis. Paul is Emeritus Professor
of Statistical Sciences at Cornell University, where he was awarded the MacIntyre Prize for
Exemplary Teaching. Paul earned his M.S. and Ph.D. from Princeton University, where he studied
with John Tukey. His research often focuses on statistical graphics and data analysis methods.
Paul is a Fellow of the American Statistical Association and of the American Association for the
Advancement of Science. He was a member of the working group that developed the GAISE
2016 guidelines for teaching statistics. Paul’s experience as a professor, entrepreneur, and
business leader brings a unique perspective to the book.
Richard De Veaux and Paul Velleman have authored successful books in the introductory
college and AP High School market with David Bock, including Intro Stats, Fifth Edition (Pearson,
2018); Stats: Modeling the World, Fifth Edition (Pearson, 2019); and Stats: Data and Models,
Fourth Edition (Pearson, 2016).
Special Contributor
Eric M. Eisenstein (Ph.D. Wharton School of Business) is an internationally known
educator, researcher, and consultant. Eric has taught at multiple business schools,
including Wharton, Cornell’s Johnson School, ESADE, and Temple University’s Fox
School of Business. At Fox, he serves as the Director of the MS in Business Analytics
in the department of Statistical Science, Director of Graduate Programs in the
department of Marketing and Supply Chain Management, and Chair of the Undergraduate
Program (curriculum) Committee. Eric teaches data analytics, quantitative strategy, and
marketing. His research focuses on the psychology of expertise, how to improve
decision making, and strategic analytics. Prior to becoming an academic, Eric worked at
Mercer Management Consulting (now Oliver Wyman) where he focused on management
of technology and marketing research in the financial services and telecommunications
industries. His teams won the outstanding team award three times consecutively; clients
invested over $30 million based on the recommendations of his teams, and the teams’
strategic recommendations affected more than $10 billion in revenue and $2 billion
in profits. He continues to consult and serve on the board of numerous companies
and charities. Eric earned his Ph.D. in Applied Economics and an M.A. in Statistics at
the Wharton School of Business, University of Pennsylvania and graduated from the
Management and Technology dual degree program at the University of Pennsylvania,
where he concurrently earned a B.S. in Economics from Wharton and a B.S. in Computer
Systems Engineering from the School of Engineering and Applied Science. He is the
proud father to three children.
Index of Applications
Chapter 7 The Normal and Other Continuous Distributions (The NYSE)
7.1 The Standard Deviation as a Ruler • 7.2 The Normal Distribution
• 7.3 Normal Probability Plots • 7.4 The Distribution of Sums of
Normals • 7.5 The Normal Approximation for the Binomial • 7.6 Other
Continuous Random Variables
Ethics in Action
From Learning to Earning
Tech Support: Probability Calculations and Plots
Brief Case: Price/Earnings and Stock Value
Part VI Analytics
Chapter 21 Introduction to Big Data and Data Mining (Paralyzed
Veterans of America)
21.1 Data Mining and the Big Data Revolution • 21.2 The Data Mining Process
• 21.3 Data Mining Algorithms: A Sample • 21.4 Models Built from Combining
Other Models • 21.5 Comparing Models • 21.6 Summary
Ethics in Action
From Learning to Earning
Appendixes
A. Answers
B. Tables and Selected Formulas
C. Credits
Index
1
Unfortunately, not the question most students are asking themselves on the first day of the course.
Our Approach
Statistical Thinking
For all of our improvements, examples, and updates in this edition of Business Statistics,
we haven’t lost sight of our original mission: writing a modern business statistics
text that addresses the importance of statistical thinking in making business
decisions and that acknowledges how Statistics is actually used in business.
Statistics is practiced with technology, and this insight informs everything
from our choice of forms for equations (favoring intuitive forms over calculation
forms) to our extensive use of real data. But most important, understanding the
value of technology allows us to focus on teaching statistical thinking rather than
calculation. The questions that motivate each of our hundreds of examples are not
“How do you find the answer?” but “How do you think about the answer?”; “How
does it help you make a better decision?”; and “How can you best communicate
your decision?” Our redesigned “In Practice” elements in each chapter have been
recast as conversations between managers and analysts to emphasize the business
relevance of each method and its importance in making good business decisions.
Our focus on statistical thinking ties the chapters of the book together. An
introductory Business Statistics course covers an overwhelming number of new
terms, concepts, and methods, and it is vital that students see their central core:
how we can understand more about the world and make better decisions by understanding
what the data tell us. From this perspective, it is easy to see that the patterns
we look for in graphs are the same as those we think about when we prepare
to make inferences. And it is easy to see that the many ways to draw inferences
from data are several applications of the same core concepts. It follows naturally
that when we extend these basic ideas into more complex (and even more realistic)
situations, the same basic reasoning is still at the core of our analyses.
Coverage
The topics covered in a Business Statistics course are generally mandated by our
students’ needs in their studies and in their future professions. But the order of
these topics and the relative emphasis given to each is not well established. Business
Statistics presents some topics sooner or later than other texts. Although many
chapters can be taught in a different order, we urge you to consider the order we
have chosen.
We’ve been guided in the order of topics by the fundamental goal of designing
a coherent course in which concepts and methods fit together to provide a
new understanding of how reasoning with data can uncover new and important
truths. Each new topic should fit into the growing structure of understanding that
students develop throughout the course. For example, we teach inference concepts
with proportions first and then with means. Most people have a wider experience
with proportions, seeing them in polls and advertising. And by starting with proportions,
we can teach inference with the Normal model and then introduce inference
for means with the Student’s t-distribution.
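As a minimal sketch of the first step in that sequence, the Python below computes a Normal-model confidence interval for a single proportion. The function name proportion_interval and the poll figures are hypothetical and chosen only for illustration; they do not come from the text. The analogous interval for a mean would replace the Normal quantile with a Student’s t quantile, which the Python standard library does not provide, so that step is left out here.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Normal-model (z) confidence interval for one proportion.

    Hypothetical helper for illustration; assumes the sample is large
    enough for the Normal model to be reasonable.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p_hat = successes / n
    # Critical value from the standard Normal model (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Standard error of the sample proportion.
    se = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se


# Hypothetical poll: 520 of 1,000 respondents favor a proposal.
low, high = proportion_interval(520, 1000)
print(f"95% interval for the proportion: {low:.3f} to {high:.3f}")
```

The interval is the sample proportion plus or minus a Normal critical value times the standard error, which is why proportions are a convenient entry point before the t-distribution is needed.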
We introduce the concepts of association, correlation, and regression early in
Business Statistics. Our experience in the classroom shows that introducing these
fundamental ideas early makes statistics useful and relevant even at the beginning
of the course. By Chapter 4, students can discuss relationships among variables in a
meaningful way. Later in the semester, when we discuss inference, it is natural and
relatively easy to build on the fundamental concepts learned earlier and enhance
them with inferential methods.
GAISE Report
We’ve been guided in our choice of what to emphasize by the 2016 GAISE
(Guidelines for Assessment and Instruction in Statistics Education) report, which
emerged from extensive studies of how students best learn Statistics
(www.amstat.org/asa/files/pdfs/GAISE/GaiseCollege_Full.pdf). The GAISE report was extensively
revised in 2016 to reflect the evolution of technology and new wisdom about
teaching statistics. The new recommendations have been officially adopted and
recommended by the American Statistical Association and urge (among other
detailed suggestions) that statistics education should:
1. Teach statistical thinking.
2. Focus on conceptual understanding.
3. Integrate real data with a context and a purpose.
4. Foster active learning.
5. Use technology to explore concepts and analyze data.
6. Use assessments to improve and evaluate student learning.
In this sense, this book is thoroughly modern.
Syllabus Flexibility
To be effective, a course must fit comfortably with the instructor’s preferences.
The early chapters (Chapters 1–15) cover core material that will be part of most
introductory courses. Chapters 16–20 (multiple regression, model building,
time series, and Analysis of Variance) may be included in an introductory course, but
our organization provides flexibility in the order and choice of specific topics.
Chapters 21–25 may be viewed as “special topics” and selected and sequenced to
suit the instructor or the course requirements.
Continuing Features
A textbook isn’t just words on a page. A textbook is many elements that come
together to form a big picture. The features in Business Statistics provide a real-
world context for concepts, help students apply these concepts, promote problem
solving, and integrate technology—all of which help students understand and see
the big picture of Business Statistics.
Applying Concepts
In Practice. Almost every section of every chapter includes focused examples
that illustrate and apply the concepts or methods of that section to a real-world
business context. Each one now ends with a specific written report. They are now
structured as conversations between a manager and an analyst or employee with
the requirement that a report be made to the manager. This format helps to frame
the issues in a practical way.
Step-by-Step Guided Examples. The answer to a statistical question is almost
never just a number. Statistics is about understanding the world and making better
decisions with data. Guided Examples model a thorough solution in the right column
with commentary in the left column. The overall analysis follows our innovative
Plan, Do, Report template. Each analysis begins with a clear question about
a business decision and an examination of the data (Plan), moves to calculating
the selected statistics (Do), and finally concludes with a Report that specifically
addresses the question. To emphasize that our goal is to address the motivating
question, we present the Report step as a business memo that summarizes the
results in the context of the example and states a recommendation if the data are
able to support one. To preserve the realism of the example, whenever it is appropriate,
we include limitations of the analysis or models in the concluding memo, as
one should in making such a report.
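The Plan, Do, Report sequence can be sketched in a few lines of Python. This is only an illustration of the template’s shape, not the book’s own worked example: the delivery-time data, the 3.5-day target, and the memo wording are all hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical data: delivery times in days for 8 recent orders.
delivery_days = [2.0, 3.5, 2.5, 4.0, 3.0, 2.5, 3.5, 3.0]
target_days = 3.5  # hypothetical service target

# Plan: state the business question and examine the data first.
question = "Are typical delivery times within the 3.5-day target?"
if len(delivery_days) < 2 or any(d < 0 for d in delivery_days):
    raise ValueError("need at least two non-negative delivery times")

# Do: calculate the selected statistics.
n = len(delivery_days)
avg = mean(delivery_days)
sd = stdev(delivery_days)   # sample standard deviation
se = sd / sqrt(n)           # standard error of the mean

# Report: a short memo that answers the question and notes limitations.
memo = (
    f"Question: {question}\n"
    f"Finding: across {n} orders, the mean delivery time was {avg:.1f} days "
    f"(SD {sd:.2f}, SE {se:.2f}), compared with a target of {target_days} days.\n"
    "Limitation: this is a small sample of recent orders, so the result "
    "should be confirmed with more data before changing policy."
)
print(memo)
```

Keeping the question, the calculation, and the memo as separate steps mirrors the template: the number alone is never the answer, and the limitation belongs in the report itself.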
By Hand. Even though we encourage the use of technology to calculate statistical
quantities, we recognize the pedagogical benefits of occasionally doing a calculation
by hand. The By Hand boxes break apart the calculation of some of the simpler
formulas and help the student through the calculation of a worked example.
Reality Check. We regularly offer reminders that statistics is about understanding
the world and making decisions with data. Results that make no sense are probably
wrong, no matter how carefully we think we did the calculations. Mistakes are
often easy to spot with a little thought, so we ask students to stop for a reality check
before interpreting results.
Notation Alert. Throughout this book, we emphasize the importance of clear
communication. Proper notation is part of the vocabulary of statistics, but it can
be daunting. We’ve found that it helps students when we are clear about the letters
and symbols statisticians use to mean very specific things, so we’ve included Notation
Alerts whenever we introduce a special notation that students will see again.
Math Boxes. When we present the mathematical underpinnings of the statistical
methods and concepts, we set proofs, derivations, and justifications apart from
the narrative. In this way, the underlying mathematics is there for those who want
greater depth, but the text itself presents the logical development of the topic at
hand without distractions.
From Learning to Earning. Each chapter ends with a From Learning to Earning
summary that includes learning objectives and definitions of terms introduced in
the chapter. Students should use these as study guides. We encourage them to take
this opportunity to see the “big picture” of the chapter and see how it applies to
making business decisions.
Ethics in Action. Statistics is not just plugging numbers into formulas; most statistical
analyses require a fair amount of judgment. Ethics in Action vignettes—
updated for this edition—in each chapter provide a context for some of the
judgments needed in statistical analyses. Possible errors, a link to the American
Statistical Association’s Ethical Guidelines, and ethically and statistically sound
alternative approaches are presented in the Instructor’s Solutions Manual.
Section Exercises. The exercises for each chapter begin with straightforward
exercises targeted at the topics in each section. These are designed to check understanding
of specific topics. Because they are labeled by section, it is easy to turn
back to the chapter to clarify a concept or review a method.
Chapter Exercises. These exercises are designed to be more realistic than section
exercises and to lead to conclusions about the real world. They may combine
concepts and methods from different sections, and they contain relevant, modern,
and real-world questions. Many come from news stories; some come from recent
research articles. The exercises marked with a T indicate that the data are available
on the book’s companion website, in a variety of formats. We pair the exercises so
that each odd-numbered exercise (with answer in the back of the book) is followed
by an even-numbered exercise on the same statistics topic. Exercises are roughly
ordered within each chapter by both topic and level of difficulty.
Integrating Technology
Data and Sources. Most of the data used in examples and exercises are from real-world
sources, and whenever we can, we include URLs for Internet data sources. The
data we use are usually available at the online Data and Story Library (DASL) at
dasl.datadescription.com and on the companion website, www.pearsonglobaleditions.com.
Preparedness
One of the biggest challenges in many mathematics and statistics courses is making
sure students are adequately prepared with the prerequisite skills needed
to successfully complete their course work. Pearson offers a variety of content
and course options to support students with just-in-time remediation and
key-concept review.
• Build homework assignments, quizzes, and tests to support your course
learning outcomes. From Getting Ready (GR) questions to the Conceptual
Question Library (CQL), we have your assessment needs covered from the
mechanics to the critical understanding of Statistics. The exercise libraries
include technology-led instruction, including new Excel-based exercises, and
learning aids to reinforce your students’ success.
• Using proven, field-tested technology, auto-graded Excel Projects allow instructors
to seamlessly integrate Microsoft® Excel® content into their course
without having to manually grade spreadsheets. Students have the opportunity
to practice important statistical skills in Excel, helping them to master
key concepts and gain proficiency with the program.
pearson.com/mylab/statistics
StatCrunch
StatCrunch, a powerful, web-based statistical software, is integrated into MyLab, so
students can quickly and easily analyze datasets from their text and exercises. In addition,
MyLab includes access to www.StatCrunch.com, the full web-based program where users can
access tens of thousands of shared datasets, create and conduct online surveys, interact
with a full library of applets, and perform complex analyses using the powerful
statistical software.
Thanks to feedback from instructors and students from more than 10,000 institutions,
MyLab Statistics continues to transform, delivering new content, innovative
learning resources, and platform updates to support students and instructors,
today and in the future.
StatTalk Videos: Fun-loving statistician Andrew Vickers takes to the streets of
Brooklyn, New York, to demonstrate important statistical concepts through
interesting stories and real-life events. This series of 24 videos includes
available assessment questions and an instructor’s guide.
Deliver Trusted Content
You deserve teaching materials that meet your own high standards for your course.
That’s why Pearson partners with highly respected authors to develop interactive
content and course-specific resources that you can trust and that keep your
students engaged.
Empower Each Learner
Each student learns at a different pace. Personalized learning pinpoints the precise
areas where each student needs practice, giving all students the support they
need, when and where they need it, to be successful.
Acknowledgments
This book would not have been possible without many contributions from David Bock, our
coauthor on several other texts. Many of the explanations and exercises in this book benefit
from Dave’s pedagogical flair and expertise. We are honored to have him as a colleague and
friend.
Many people have contributed to this book from the first day of its conception to its
publication. Business Statistics would have never seen the light of day without the assistance
of the incredible team at Pearson. The Director of Portfolio Management, Deirdre Lynch,
was central to the support, development, and realization of the book from day one. Patrick
Barbera, Senior Portfolio Management Analyst; Morgan Danna, Editorial Assistant;
Kaylee Karlson, Product Marketing Manager; and Shannon McCormack, Marketing
Support Assistant, were essential in managing all of the behind-the-scenes work that needed
to be done. Peggy McMahon, Content Producer, and Chere Bemelmans, Project Manager
at SPi Global, worked miracles to get the book out the door. We are indebted to them.
Aimee Thorne, Senior Producer, put together a top-notch media package for this book.
Designer Jerilyn Bokorick and Cenveo® Publisher Services are responsible for the wonderful
way the book looks.
We’d also like to thank our accuracy checker, whose monumental task was to make sure
we said what we thought we were saying: Dirk Tempelaar, Maastricht University.
We also thank those who provided feedback through focus groups, class tests, and
reviews:
Finally, we want to thank our families. This has been a long project, and it has required
many nights and weekends. Our families have sacrificed so that we could write the book we
envisioned.
Norean Sharpe
Richard De Veaux
Paul Velleman
Eric Eisenstein
Likert, Rensis, 23-1 Investment options, 557 Juries, 416, 417, 448, 453–454
Lowell, James Russell, 422 Investment strategies, 335, 336, 24-22 Jury duty, 448
MacArthur, Douglas, 22-2 Movie budgets/revenues, 669–670, 772 National Highway Transportation Safety Administration,
Malkiel, Burton, 745n Mutual funds, 125–126, 130, 132, 175, 176, 183, 278, 25-32
Mann, H. B., 23-4 279, 280, 519, 588, 24-22 Presidential elections, 284, 298, 779
Mao Zedong, 88 Profits, 247 Seatbelt use, 283
McGwire, Mike, 126 Purchase amounts, 405 Securities Act, 175
Concrete formulation, 25-26 Business-to-business sales, 81 Service Industries and Social Issues
Customer satisfaction, 22-24–25 Buying from a friend, 490–494, 23-6–7 Fundraising, 368
Dental floss, 22-31 Car prices, 126, 132, 482, 483, 485, 494–495, 496 Paralyzed Veterans of America (PVA), 187–188,
DVDs, 22-32 Car sales, 52, 251 717–718, 778–779, 781, 785–786, 788–789
Games, 22-32 Catalog purchases, 25-2, 25-4 Power, 82
Graphite production, 22-33–34 Catalogs, 51, 642–643, 648–649, 660
Historical background, 22-3–6 Cell phone screen defroster, 24-19
Sports
Milk, 22-31 Cereal, 267
Archery, 250, 251
Packaging defects, 249 Chia seeds, 621
Baseball, 126, 433, 22-31, 22-36–37, 23-24, 23-27
Product defects, 129, 205, 247, 248, 249, 475, 22-34–35 Coffee, 179, 282
Bicycling, 282, 612, 24-22, 22-27
Product inspections and testing, 131, 246–247, 249, Concert tickets, 625–626, 627
Dirt bikes, 677–678, 726, 727
269, 283, 309, 311, 369, 370, 476–477, 553, 584, Convenience stores, 122
Employee athletes, 521–522
593, 594, 22-35, 25-27 Coupons, 338, 339
Fishing, 247
Product recalls, 247, 249 Department stores, 153
Football, 56, 447
Product reliability, 221 Diamond prices, 600–601, 603, 606–607, 615–616,
Frisbee, 25-25
Product weight, 22-6, 22-10, 22-15–16, 22-17–18, 685–686, 694–695, 698–699, 711–712, 23-15
Golf, 128, 668
22-31 eBay, 248
Hockey, 126
Production process, 25-29–30 Food stores, 124, 128, 25-31–32
Horse racing, 130
Rulers and yardsticks, 22-33 Forecasting, 744, 754–758, 766, 769, 771
Olympics, 518, 708–710
Six Sigma, 22-26, 25-28 Grocery shopping, 473
Running, 337
Specifications, 22-26, 22-30–31 Growth of sales, 767
Skiing, 708–710
Sports equipment, 22-31, 22-32–33, 22-36–37 Housing starts and Home Depot sales, 23-28
Skydiving, 338
Warranties, 218, 249 International sales index, 588
Swimming, 339, 518–519
Web browsers, 233, 234–235 Inventories, 227–228
Tennis, 283
Loyalty programs, 550
Trophies, 23-24
Real Estate Medical equipment sales, 664
Weightlifting, 282
Broker profit, 249 Motorcycles, 52
Commercial real estate, 616–619, 23-6, 23-8, 23-16 Movie concessions, 584, 585
Number of employees, 150 Surveys and Opinion Polls
Foreclosures, 83, 23-24 Cell phone surveys, 374
Home features, 216, 220, 221 Packaging and sales, 180
Pizza sales and prices, 180 Company surveys, 366, 367, 373, 374
Home sales and prices, 131, 182, 204–205, 362, 403, Consumer polls and surveys, 52, 217–218, 293, 300,
443, 508, 510, 522, 587, 588, 628, 634, 643–645, Profits, 591, 592
Promotions, 200–203, 215 308, 309, 374, 22-24–25
646–647, 649–653, 656–657, 669, 672–673, 674, E-mail surveys, 305, 307, 366
699–703, 704–707, 23-20, 24–14–15 Regional sales, 181
Sales growth, 634 Fortune Survey, 284
Home size and prices, 82–83, 137–138, 141, 148–149, Gallup polls, 284, 308, 349, 365, 372, 770
150, 152, 154–155, 165–166 Sales predictions, 718
Sales representatives, 249, 251, 553 Instant polls, 309
Home values, 99–101, 122, 639–642 International polls and surveys, 213, 308–309, 372
House ages, 508, 509, 510, 720–721, 23-21, 23-22, Seasonal spending, 143–145, 374, 497–499, 508, 596,
767, 771, 809 Internet polls and surveys, 300
23-23 Library use, 556
Housing bubble crash, 411 Self-checkout stations, 344–345
Solar panels, 628 Mail surveys, 307, 366
Housing costs, 185–186 Market surveys, 83, 122
Income and housing cost, 23-28 Store performance, 511
Travel packages, 713 Pew Research Center for the People and the Press, 290, 308
Property values, 589 Political surveys and polls, 309
Racial discrimination, 309, 558, 559 Used cars, 221, 588, 589
Walmart revenue, 675, 677 Public opinion polls, 285, 356
Time on market, 660–662 Real estate, 373
Zillow.com, 639–640, 649, 137–138 Weekly sales, 178–179
Wine prices, 129, 673 Student surveys, 50, 307, 366, 367, 406
Telephone surveys, 220, 290, 366, 573–574
Salary and Benefits Science
Bonuses, 25-29 Activating yeast, 25-25 Technology
CEO compensation, 132, 389–390, 407 Biotechnology, 371 Apps, 55, 61, 67–68
Day care, 367, 444 Chemicals and congenital abnormalities, 447 Area codes, 41
Football players, 180 Colorblindness, 634 Bank websites, 444
GDP and salary, 182, 183 Concrete formulation, 25-26 Big Data, 809
Job types, 181 Cuckoos, 23-26 Blogs, 559–560
Managers’ hourly wages, 25-26 Intelligence and foot size, 605 Cable, phone, and Internet packages, 345–346
Salaries, 176, 25-25 Intelligence of dogs, 23-21, 23-22, 23-23 Cell phones, 66–67, 163, 250, 278, 281, 25-24
Secretaries’ salaries, 674–675 Mineral hardness, 23-21, 23-22, 23-23 Character recognition, 368, 369
Weekly earnings, 768–769 Noise and mazes, 25-25 Computers, 221, 227–228, 249, 310, 400, 438, 586,
Observatories, 22-32 592, 22-30–31, 22-35
Sales and Retail Rat reaction times, 450 Customer satisfaction, 250
Advertising, 184 Research funding and data, 45, 169 Databases, 51
Appliance sales, 673 Seasonality of births, 551 DVDs, 22-32
Assets and sales, 164, 589, 594 Space flights, 310 E-mail, 217, 371, 474, 475
Bicycles, 282, 24-22 Twins, 447 Help desk, 24-2–8, 24-9–10, 24-11
Bookstores, 176, 177, 178, 245, 585, 586 Water height and phase of moon, 23-21, 23-22 Information technology, 556
Internet activity of consumers, 559–560 Transportation Commuting, 247–248, 377–379, 396–398, 23-11
Internet music, 373, 374 Air, 214–215, 248, 306, 307, 321, 366, 409, Driving tests, 420
Internet use, 552 448, 630, 634, 773–774, 24-3, 24-5, Emissions testing, 447, 475
Investment in technology companies, 516 24-6, 24-7, 24-11, 24-19, 24-20, 25-28 Freeway speed and congestion, 138–139
Online magazine, 447 Auto batteries, 589–590 Horsepower of cars, 697–698, 707
PDAs, 24-19 Auto warranties, 215, 216 Motorcycles, 452, 454–456, 677–678, 726
Security, 25-27–28 Automotive safety, 25-32 Parking fees, 408
Self-checkout stations, 344–345 Border crossings, 734, 754 Road signs, 476
Social media, 534–535, 536, 552, 760 Car dealerships, 251 Seatbelts, 283
Software, 308 Car inspection, 219 Ship, 62–64, 65–66, 69–70, 218, 452
Telemarketing, 528–529 Car prices, 158–162 Texas Transportation Institute (TTI), 138
Video games, 339 Car quality, 553 Tire mileage, 281
Web browsers, 233 Car rentals, 23-2–4, 23-8–10 Traffic accidents, 140, 23-26
Website design, 213–214, 325, 336, 473, 508–509, 510 Car speeds, 281 Traffic congestion, 721
Websites, 50 Cars, 50, 221, 374 Train, 286–287, 315–316, 25-5, 25-8, 25-10–11, 25-14
1.1 Data
1.2 The Role of Data in Decision Making
1.3 Variable Types
1.4 Data Sources: Where, How, and When
H&M
Even if you haven’t bought something from H&M recently, chances
are good that you’ve passed by one of their stores. With over 4000 stores
in 64 markets worldwide, they are one of the largest and fastest-growing
clothing retailers in the world. Over the past decade, H&M has built new
stores at an astounding rate of over 10% a year. Thanks to this growth, the
CEO, Karl-Johan Persson, grandson of the founder, is now the richest person
in Sweden.
Like most companies, H&M’s online presence has been increasing as well.
Of their 64 worldwide markets, 35 offer e-commerce where customers can
shop 24 hours a day, 7 days a week, with just the click of a mouse. H&M now
reaches their customers in ways no one could even imagine just a generation
ago. But what of the future? Will the company be better off continuing to
grow brick and mortar stores at the same pace, or should they devote more
resources to the digital space?1
1 We developed this hypothetical example in late 2017 based on our business and consulting experience.
As we were going to press, the news caught up with us. It turns out that indeed H&M had been struggling
with their balance of online sales vs. brick and mortar inventory. Perhaps if this book had been
published a year earlier, they could have solved the problem:
www.nytimes.com/2018/03/27/business/hm-clothes-stock-sales.html
A few generations ago, many store owners knew their customers and their
business well. With that knowledge, they could forecast growth, see trends,
and even personalize their suggestions to customers, guessing which items
particular customers might like. Businesses today rely on similar information
to make decisions, but most never meet their customers. With 4000 different
stores and thousands of online customers, H&M has to obtain and analyze
their data in other ways.
The key to turning data into information and knowledge is Statistics—the
collection of tools that extract information from data. These tools that you
will learn also provide the foundation for more advanced methods like data
mining and analytics. According to CEO Karl-Johan Persson, “advanced
analytics provide an important support for our operations. The algorithms we
have started to use will contribute to improvements within everything from
assortment planning and logistics to sales.”2 Using statistical methods to turn
data into information, information into knowledge, and knowledge into smart
business decisions is the key to all successful modern business enterprises.
And it all starts . . . with data.
Thomasine has just landed her first job out of school as a marketing and strategy
analyst working for H&M. Her team’s first assignment is to decide whether to
build more brick and mortar stores or invest more in online operations. To
help make the decision, they investigate store sales data over the past ten years and
display them in the following graph:
FIGURE 1.1 H&M’s growth in stores vs. overall operating profit growth, 2006 to 2015.
H&M’s store growth has remained steady at just over 10% a year, but operating profit
growth seems to be coming down.
[Line chart. Vertical axis: growth in percent, 5 to 25. Horizontal axis: year. Series:
Operating profit growth, Stores growth.]
Thomasine wonders if the decline she sees in the stores’ profit growth (the blue
line in Figure 1.1) means she should recommend putting more resources into
online sales instead of just building more stores.
Displays like this, called data visualizations, can summarize large amounts of
data in a concise way that helps make good business decisions, and can often reveal
things that weren’t expected.
2 H&M Group annual report, 2016, about.hm.com/en/media/news/financial-reports/2017/1/2441626.html
(in dollars) in each country he should divide the total sales in each country by the
population size, creating the new variable Sales per Capita. When he displayed this
variable on a map, management was shocked:3
MANAGER We know that we sell more in the United States than anywhere else in
the world, but why are some countries redder than the U.S.?
CONSULTANT In this color scheme, low Sales per Capita ($ spent per customer) is
indicated by dark blue, average by white (grey) and higher than average by red. The
countries in the brightest red are the ones with the highest sales per person.
MANAGER You mean we sell more per person in Norway, Finland, and even Australia
than in the U.S.?
CONSULTANT Exactly. Norway has the highest sales at more than $600 per person,
compared with the U.S. at $364.
MANAGER Wow! I had no idea. I never would have guessed that. Thank you for
the insight!
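The consultant's calculation can be sketched in a few lines of Python. This is only an illustration: the country names, sales totals, and population sizes below are hypothetical placeholders chosen to make the arithmetic easy to follow, not the figures from the story. The point is that the country with the largest total sales need not have the largest Sales per Capita.

```python
# Hypothetical totals chosen only to illustrate the calculation;
# they are NOT the company's figures from the story.
countries = {
    # country: (total sales in dollars, population size)
    "Country A": (1_200_000_000, 300_000_000),
    "Country B": (30_000_000, 5_000_000),
    "Country C": (50_000_000, 25_000_000),
}


def sales_per_capita(total_sales, population):
    """Return total sales divided by population size."""
    return total_sales / population


# Create the new variable Sales per Capita for every country.
per_capita = {
    name: sales_per_capita(sales, population)
    for name, (sales, population) in countries.items()
}

# Rank countries from highest to lowest sales per person.
ranking = sorted(per_capita, key=per_capita.get, reverse=True)

# The country with the largest total is not necessarily first in the ranking.
largest_total = max(countries, key=lambda name: countries[name][0])

for name in ranking:
    total = countries[name][0]
    print(f"{name}: total ${total:,}, per capita ${per_capita[name]:.2f}")
```

In this made-up data, Country A sells the most overall but ranks second per person, which is the same kind of surprise the managers saw on the map.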
1.1 Data
Every time you make an online purchase, more information is captured than just
the details of the purchase itself. What pages did you search to get to your purchase?
How much time did you spend looking at each? These recorded values, whether
numbers or labels, together with their context are called data. They are recorded
and stored electronically, in vast digital repositories called data warehouses.
Businesses have always relied on data to make good decisions, but today, more than
ever before, companies use data to make decisions about virtually all aspects of
their business, from inventory to advertising to website design.
3 This is based on a true story. We can’t reveal the name of the company due to a non-disclosure
agreement.
Every swipe of your credit card and every click of your mouse has helped
these data warehouses grow. The challenges of collecting, managing, storing, and
curating all of this information collectively fall under the term Big Data.
But data alone can’t make good decisions. To start the process of turning data
into useful information, you first need to know what decisions you want to make.
Without a question, you have no idea what might be interesting about the data.
Should you look at the time of transactions, their location, their price, which
products were bought, or something else? Your knowledge of the business issues and
the questions you want to answer will help guide your search for insights from the
data, and help you harness data to make better decisions.
Why are you taking this course?
The typical answer is “because it’s required.” But why is it required? Because
these are the tools that will help you leverage your business domain knowledge
with data.
Once you have data and a clear vision of the problem, the statistics techniques
in this book can empower your decision making. They will help you in two
ways: You’ll learn how to estimate the likely values needed for your decisions
and, possibly more important, you’ll learn how to quantify the uncertainty of
those estimates.
Albert Einstein is credited with saying “If I had one hour to save the world, I’d
spend 55 minutes defining the problem and 5 minutes solving it.”4 The wisdom of
using your business acumen to define your question will be clear throughout this
book.
Before H&M introduces a new product they usually test market it to a small
sample of customers and collect data on the product’s performance before committing
to it worldwide. Statistics helps them make the leap from a sample to an
understanding of the world at large. We hope this text will empower you to draw
conclusions from data and make valid business decisions in response to such
questions as:
• Will the new design of our website increase click-through rates and result in
more sales?
• What is the effect of advertising on sales?
• Do aggressive, “high-growth” mutual funds really have higher returns than
more conservative funds?
• Is there a seasonal cycle in your firm’s profits?
• What is the relationship between shelf location and cereal sales?
• Do students around the world perceive issues in business ethics differently?
• Are there common characteristics about your customers and why they choose
your products? And, more importantly, are those characteristics the same
among those who aren’t your customers?
Your ability to answer questions such as these and make sound business
decisions with data depends largely on your ability to take a business problem,
translate it into a question that data can answer, and communicate that answer
to others. The steps to follow are shown in the box in the margin. The Plan, Do,
Report strategy is found throughout the book. The main headings will stay
the same although the specific subparts will vary slightly depending on the topic
we’re learning.
Plan (1–2)
1. Define the problem.
2. Collect and/or find data and identify the variables.
Do (3–6)
3. Prepare and wrangle data.
4. Characterize the data.
5. Explore the data.
   Summarize
   Visualize
6. Model (if appropriate).
   Check conditions and assumptions for modeling.
   Fit the model and make the necessary calculations.
Report (7)
7. Communicate and present.
Rarely does the journey from problem definition to solution proceed straight
from Step 1 to Step 7. As you learn more about your data you’ll probably want
to rethink earlier steps, possibly even modifying the original question itself. Or
you may decide to collect different data after you see the limitations of your
current model. But bearing this process in mind will help you to strategize your
data analytics process and keep you on the road toward the goal of delivering
good decisions.
4 According to quoteinvestigator.com there is “no substantive evidence that Einstein ever made a
remark of this type.” It appeared in a paper by William H. Markle, who credited an unnamed Yale
professor. But many people, including those at goodreads.com, still give the credit to Einstein.
[Figure: the Plan, Do, Report cycle. Plan: 1. Define problem; 2. Collect/Find data.
Do: 3. Prepare data; 4. Characterize data; 5. Explore data; 6. Build models.
Report: 7. Present. The steps ITERATE.]
Order Number         Name          State/Country  Price  Area Code  Album Download  Gift?  Stock ID    Artist
105-2686834-3759466  Katherine H.  Ohio            5.99  440        Identity        N      B00000I5Y6  James Fortune & Flya
105-9318443-4200264  Samuel P.     Illinois        9.99  312        Port of Morrow  Y      B000002BK9  The Shins
105-1872500-0198646  Chris G.      Massachusetts   9.99  413        Up All Night    N      B000068ZVQ  Syco Music UK
103-2628345-9238664  Monique D.    Canada         10.99  902        Fallen Empires  N      B000001OAA  Snow Patrol
002-1663369-6638649  Katherine H.  Ohio           11.99  440        Sees the Light  N      B002MXA7Q0  La Sera
TABLE 1.1 Example of a data table. The variable names are in the top row. Typically,
the Who of the table is found in the leftmost column.
of the cases. The columns are called variables. You’ll usually find the name of the
variable at the top of the column as in Table 1.1.
We call cases by different names, depending on the situation. Individuals who
answer a survey are referred to as respondents. People on whom we experiment are
subjects or (in an attempt to acknowledge the importance of their role in the
experiment) participants, but animals, plants, websites, and other inanimate subjects are
often called experimental units. Often we call cases just what they are: for example,
customers, economic quarters, or companies. When referring to a transaction, rows are
often called records. In Table 1.1, the rows are the individual orders, or purchase
records. A common place to find the who of the table is the leftmost column. It’s
often an identifying variable for the cases, in this example, the order number.
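As a rough sketch of how cases and variables look to a program, the rows of Table 1.1 can be read with Python's standard csv module. The comma-separated layout and the names TABLE_1_1, rows, and variables are our own choices for illustration; the values are copied from the table. Counting distinct values in a column is one simple way to check whether it can serve as the identifying variable.

```python
import csv
import io

# Table 1.1 typed in as comma-separated text; the first row holds the variable names.
TABLE_1_1 = """Order Number,Name,State/Country,Price,Area Code,Album Download,Gift?,Stock ID,Artist
105-2686834-3759466,Katherine H.,Ohio,5.99,440,Identity,N,B00000I5Y6,James Fortune & Flya
105-9318443-4200264,Samuel P.,Illinois,9.99,312,Port of Morrow,Y,B000002BK9,The Shins
105-1872500-0198646,Chris G.,Massachusetts,9.99,413,Up All Night,N,B000068ZVQ,Syco Music UK
103-2628345-9238664,Monique D.,Canada,10.99,902,Fallen Empires,N,B000001OAA,Snow Patrol
002-1663369-6638649,Katherine H.,Ohio,11.99,440,Sees the Light,N,B002MXA7Q0,La Sera
"""

# Each row becomes one case (a dict); each key is one variable.
rows = list(csv.DictReader(io.StringIO(TABLE_1_1)))
variables = list(rows[0].keys())
n_cases = len(rows)

# An identifying variable takes a different value for every case.
order_numbers = {row["Order Number"] for row in rows}
names = {row["Name"] for row in rows}

print("Variables:", variables)
print("Number of cases:", n_cases)
print("Distinct order numbers:", len(order_numbers))
print("Distinct names:", len(names))
```

Every order number is distinct, while one name appears twice, which is consistent with the rows being orders rather than people.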
JUST CHECKING
1 What is the “who” of Table 1.1? That is, does each row refer to a) a person or
b) an order? How can you tell?
Metadata
Metadata became a common term when the National Security Agency (NSA) claimed that
they weren’t collecting Americans’ phone calls but only the information about the
phone calls: the phone numbers of the caller and recipient, the time and duration of
the call, and any bank information used to make the call. In other words, the metadata.
If you collect the data yourself, you’ll know what the cases are and how the
variables are defined. But, often, you’ll be looking at data that someone else
collected. The information about the data, called the metadata, might have to come
from the company’s database administrator or from the information technology
department of a company. Metadata typically contains information about how,
when, and where (and possibly why) the data were collected; who each case
represents; and the definitions of all the variables.
A general term for a data table like the one shown in Table 1.1 is a spreadsheet,
a name that comes from bookkeeping ledgers of financial information. The
data were typically spread across facing pages of a bound ledger, the book used by
an accountant for keeping records of expenditures and sources of income. For the
accountant, the columns were the types of expenses and income, and the rows
were transactions, typically invoices or receipts. These days, it is common to keep
modest-size datasets in a spreadsheet even if no accounting is involved. It is usually
easy to move a data table from a spreadsheet program to a program designed for
statistical graphics and analysis, either directly or by copying the data table and
pasting it into the statistics program.
Although data tables and spreadsheets are great for relatively small data sets,
they are cumbersome for the complex data sets that companies must maintain on a
day-to-day basis. Try to imagine a spreadsheet from a company the size of Amazon
with customers in the rows and products in the columns. Amazon has hundreds of
millions of customers and millions of products. But very few customers have
purchased more than a few dozen items, so almost all the entries in the spreadsheet
would be blank, which is not a very efficient way to store information. For that reason,
various other database architectures are used to store data. The most common is a
relational database.
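To see why the customers-by-products layout is wasteful, here is a toy sketch in Python. The customer IDs, product IDs, and purchases are invented for illustration and are far smaller than anything a real retailer would hold. It compares the number of cells in the wide spreadsheet with the number of rows needed when only actual purchases are recorded.

```python
# Invented toy data: 4 customers, 6 products, 4 purchases.
customers = ["C1", "C2", "C3", "C4"]
products = ["P1", "P2", "P3", "P4", "P5", "P6"]
purchases = [("C1", "P2"), ("C1", "P5"), ("C3", "P1"), ("C4", "P6")]

# Wide layout: one row per customer, one column per product.
# None marks a blank cell (no purchase).
wide = {c: {p: None for p in products} for c in customers}
for customer, product in purchases:
    wide[customer][product] = 1

total_cells = len(customers) * len(products)
blank_cells = sum(
    value is None for row in wide.values() for value in row.values()
)

# Alternative layout: store one row per purchase and nothing else.
long_rows = len(purchases)

print(f"Wide layout: {total_cells} cells, {blank_cells} of them blank")
print(f"One row per purchase: {long_rows} rows")
```

Even in this tiny example most of the wide table is blank; with hundreds of millions of customers and millions of products the imbalance would be far more extreme.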
In a relational database, two or more separate data tables are linked together
so that information can be merged across them. Each data table is a relation because
it is about a specific set of cases with information about each of these cases for all
(or at least most) of the variables (“fields” in database terminology). For example, a
table of H&M customers, along with demographic information on each, is such a
relation. A data table of all the items sold by the company, including information
on price, inventory, and past history, is another relation. Transactions may be held
in a third “relation” that references each of the other two relations. Table 1.2 shows
a small example.
In statistics, analyses are typically performed on a single relation because all
variables must refer to the same cases. But often the data must be retrieved from a
relational database. Retrieving data from these databases may require specific
expertise with that software. In the rest of the book, we’ll assume that the data have
been retrieved and placed in a data table or spreadsheet with variables listed as
columns and cases as the rows.
Customers
Customer Number  Name         City           State  ZIP Code  Customer since  Gold Member?
473859           R. De Veaux  Williamstown   MA     01267     2007            No
127389           N. Sharpe    New York City  NY     10021     2000            Yes
335682           P. Velleman  Ithaca         NY     14580     2003            No
…
Items
Product ID  Name                  Price  Currently in Stock?
42-8719     Resort Shirt          24.99  Yes
73-2671     Lace Dress            69.99  No
35-0518     Cashmere Sweater     129.00  Yes
72-9665     Leather Derby Shoes   69.00  Yes
Transactions
Transaction Number  Date      Customer Number  Product ID  Quantity  Shipping Method  Free Ship?
T23478923           9/15/17   473859           42-8719     1         UPS 2nd Day      N
T23478924           9/15/17   473859           35-0518     1         UPS 2nd Day      N
T63928934           10/20/17  335682           73-2671     3         UPS Ground       N
T72348299           12/22/17  127389           72-9665     1         Fed Ex Ovnt      Y
TABLE 1.2 A relational database shows all the relevant information for three separate
relations linked together by customer and product numbers.
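The linking described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a description of how any company actually stores its data: the table and column names are our own, only a few of the columns from Table 1.2 are kept, and the database lives in memory. The query joins the three relations on customer number and product ID to produce a single data table with one row per transaction.

```python
import sqlite3

# An in-memory database holding simplified versions of the three relations in Table 1.2.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE customers (
    customer_number INTEGER PRIMARY KEY,
    name TEXT, city TEXT, state TEXT
);
CREATE TABLE items (
    product_id TEXT PRIMARY KEY,
    name TEXT, price REAL
);
CREATE TABLE transactions (
    transaction_number TEXT PRIMARY KEY,
    date TEXT,
    customer_number INTEGER REFERENCES customers(customer_number),
    product_id TEXT REFERENCES items(product_id),
    quantity INTEGER
);
""")

cur.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", [
    (473859, "R. De Veaux", "Williamstown", "MA"),
    (127389, "N. Sharpe", "New York City", "NY"),
    (335682, "P. Velleman", "Ithaca", "NY"),
])
cur.executemany("INSERT INTO items VALUES (?, ?, ?)", [
    ("42-8719", "Resort Shirt", 24.99),
    ("73-2671", "Lace Dress", 69.99),
    ("35-0518", "Cashmere Sweater", 129.00),
    ("72-9665", "Leather Derby Shoes", 69.00),
])
cur.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?, ?)", [
    ("T23478923", "9/15/17", 473859, "42-8719", 1),
    ("T23478924", "9/15/17", 473859, "35-0518", 1),
    ("T63928934", "10/20/17", 335682, "73-2671", 3),
    ("T72348299", "12/22/17", 127389, "72-9665", 1),
])

# Merge the three relations into one data table: one row (case) per transaction.
merged = cur.execute("""
    SELECT t.transaction_number, c.name, i.name, t.quantity, i.price
    FROM transactions AS t
    JOIN customers AS c ON c.customer_number = t.customer_number
    JOIN items AS i ON i.product_id = t.product_id
    ORDER BY t.transaction_number
""").fetchall()
conn.close()

for row in merged:
    print(row)
```

The result is a single relation whose cases are transactions, which is the form assumed for the analyses in the rest of the book.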
In the Eastern Nyansa Region.
Katoto and Mwansa. Ukerewe. Ukara. The Baumann Gulf. Skirmishes in
Mugango. The Schaschi lands. Ngoroïne. Ikoma. Fighting in Ututwa.
Ntussu. Meatu. Munyihemedi's settlement. To the Nyarasa steppe.
The Simbiti salt river. The elephant hunters. The women of the caravan.
Usmau and Usukuma. Mwansa.
Paddle blade, Ukerewe.
Ukara.