Learning Statistics: Concepts and Applications in R

Course No. 1480
Professor Talithia Williams, Ph.D.
Harvey Mudd College
4.1 out of 5
33 Reviews
69% of reviewers would recommend this product

What Will You Learn?

  • How to use R and RStudio; how to import and export data, write code, and generate plots; how to customize R features.
  • Fundamentals of descriptive statistics: e.g., normal distribution, central limit theorem, correlation.
  • Fundamentals of statistical inference: e.g., hypothesis testing, regression, ANOVA.
  • Advanced topics: e.g., experimental design, spatial statistics, time series analysis, Bayesian inference.

Course Overview

“Show me the data!” This is the coin of the realm in science, medicine, business, education, journalism, and countless other fields. Of course, it’s more complicated than that, because raw data without interpretation is useless. What people really mean is “Show me the statistics”: well-founded, persuasive distillations of data that support a claim under discussion.

The ability of statistics to extract insights from a random collection of facts is one of the most astonishing and useful feats of applied mathematics. That power is all the more accessible today through the statistical programming language R, a free, open-source computer language with millions of users worldwide—everyone from students and nonprofessionals to managers and researchers at the forefront of their disciplines.

In this era of big data, with a solid understanding of statistics and the tools for interpreting data, you don’t have to trust someone else’s analysis of medical treatments, financial returns, crop yields, voting trends, home prices, or any other interpretation of data. You can do it yourself.

Designed for those who appreciate math or want an introduction to an essential toolkit for thinking about the uncertainty inherent in all sorts of information, Learning Statistics: Concepts and Applications in R teaches you elementary statistical methods and how to apply them in R, which is made even more powerful when combined with the user interface of RStudio. (Both R and RStudio are free and downloadable for multiple platforms.)

In 24 challenging and in-depth half-hour lectures, award-winning Professor Talithia Williams of Harvey Mudd College walks you through major concepts of an introductory college-level statistics course, and beyond, using examples developed and presented in R. Compared with “canned” statistics packages, R brings users into a more hands-on, mind-engaging approach that is becoming the standard at top-tier statistics programs throughout the country.

An Associate Professor of Mathematics and the Associate Dean for Research and Experiential Learning at Harvey Mudd, Dr. Williams is a nationally recognized innovator in statistics education, noted for her popular TED Talk, “Own Your Body’s Data,” and she is cohost of the PBS NOVA series NOVA Wonders.

R You Ready for a Fresh Approach to Statistics?

In a course that repays multiple viewings, Professor Williams presents the most widely used statistical measures, concepts, and techniques: how and when to use them, what they mean, and how to recognize when arguments or conclusions based on statistical data are suspect or wrong.

Learning Statistics will especially benefit those who want to go beyond a beginner level and get a deeper, fuller understanding of the discipline. And for anyone who learned statistics many years ago, this course gives an updated experience of what is going on in the field today and how user access to the R programming language is transforming the everyday practice of statistics.

The special advantages of this video-only course include:

  • Statistics concepts combined with R examples: Viewers get a two-for-one combination of thorough grounding in statistical concepts with ground-up demonstrations of how problems are solved with the R programming language.
  • A guided tour of R in action: Viewers get a gentle introduction to R in use—from how to download R and RStudio, to importing and exporting data, writing code, and generating plots. All examples in the course are conducted in R.
  • Enhanced graphics: On-screen graphics are based on outputs from RStudio, but with frequent enhancements to make the visuals even easier to read and understand.
  • Large screen or handheld: The presentation has been optimized for everything from TVs and computers to mobile devices, meaning you can watch it on a handheld device with the same comfort and clarity as on a television screen.
  • Links to the R community: When you finish these lectures, you are not on your own. Professor Williams helps you join the worldwide community of R users, who have been advising the novice and expert alike for two decades.

Professor Williams has organized the course so that it can be taken straight through, proceeding from elementary descriptive statistics to standard and advanced techniques in statistical inference. Those with a background in other statistics software may also find the progression very helpful, while students seeking help in specific areas can jump in and out at any point throughout the course.

Discover a Powerful Set of Statistical Tools

Learning Statistics begins with an overview of the field, including how to calculate and display summaries of data. Professor Williams then introduces R and discusses its advantages over other statistical analysis packages. Unlike many such products, which are costly to purchase and upgrade, R and RStudio are entirely free. Before the end of Lecture 2, you are up and running R code.

The next six lectures cover descriptive statistics and probability, in which you learn to draw conclusions from a given sample of data by using visual aids such as histograms, scatterplots, and box plots. Employing concepts such as the normal distribution, central limit theorem, and correlation, you explore a variety of probability distributions and graphical analysis techniques. You are introduced to the formulas for these operations as well as the simple R commands that run them automatically.
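
As a rough illustration of the kind of simple R commands described above, the sketch below computes a few summaries and draws the three plot types on `mtcars`, a data set that ships with base R. The choice of data set and the variable names are assumptions made for illustration; they are not taken from the lectures.

```r
# Descriptive summaries of fuel economy (mpg) from the built-in mtcars data.
mpg_mean   <- mean(mtcars$mpg)
mpg_median <- median(mtcars$mpg)
mpg_sd     <- sd(mtcars$mpg)

# Correlation between car weight and fuel economy (dimensionless, in [-1, 1]).
wt_mpg_cor <- cor(mtcars$wt, mtcars$mpg)

# Standard normal distribution: probability below 1.96 and the 97.5th percentile.
p_below <- pnorm(1.96)
q_975   <- qnorm(0.975)

# The three plot types named in the text.
hist(mtcars$mpg, main = "Histogram of mpg", xlab = "Miles per gallon")
boxplot(mpg ~ cyl, data = mtcars, xlab = "Cylinders", ylab = "Miles per gallon")
plot(mtcars$wt, mtcars$mpg, xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
```

Run interactively in RStudio, the plots appear in the Plots pane; the numeric results can be inspected by typing a variable name such as `wt_mpg_cor` at the console.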

Starting in Lecture 8, you explore the remarkable power of statistics to make inferences about an entire population, based on a small sample. You discover how to frame a hypothesis, build a model, and deduce propositions from the resulting data. You study simple linear regression, multiple linear regression, ANOVA (analysis of variance), and other cornerstone techniques, while also using R to run simulations of many different scenarios from the R Datasets Package.
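
To make these techniques concrete, here is a minimal sketch of a simple linear regression and a one-way ANOVA in base R. It assumes the built-in `mtcars` and `chickwts` data sets from the datasets package; whether the lectures use exactly these calls is not stated, so treat the code as one plausible way to run such analyses.

```r
# Simple linear regression: fuel economy as a function of car weight.
fit_lm <- lm(mpg ~ wt, data = mtcars)
slope  <- coef(fit_lm)[["wt"]]   # estimated change in mpg per 1000 lbs of weight
print(summary(fit_lm))           # coefficients, R-squared, and p-values

# One-way ANOVA: do mean chick weights differ across feed types?
fit_aov <- aov(weight ~ feed, data = chickwts)
p_feed  <- summary(fit_aov)[[1]][["Pr(>F)"]][1]  # p-value of the F-test for feed

# Tukey's honest significant difference: which pairs of feed types differ?
print(TukeyHSD(fit_aov))
```

A negative `slope` indicates that heavier cars tend to have lower fuel economy, and a small `p_feed` suggests that at least one feed type has a different mean weight.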

In the last third of the course, you learn how statisticians go beyond what beginners are often taught, developing branches of applied statistics that have spun off to form their own immensely productive specialties. These include:

  • Experimental design: While there are many techniques for analyzing data you already have, even more powerful is designing an experiment to decide how data is collected from the start. Consider such elements of good design as blocking, randomization, and replication to ensure that your experiment produces sound statistical results.
  • Spatial statistics: Maps have always been information-rich artifacts, but they are now more useful than ever thanks to the advent of GPS-enabled data-gathering devices and powerful computers, combined with a panoply of statistical tools for treating spatial autocorrelation as a rich new source of information.
  • Time series analysis: Just as fascinating as spatial data is information collected sequentially over time—in finance, meteorology, biology, agriculture, and other fields. One of the most important goals of time series analysis is forecasting, which extracts short- and longer-term patterns in the data.
  • Bayesian inference: Textbook statistics is often based on a “frequentist” paradigm, in which sampling is theoretically unlimited. But for many real-life situations, your information is almost always incomplete, and likely to be revised. This is the forte of Bayesian inference.
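
The idea of revising incomplete information can be sketched with the simplest Bayesian model, a beta prior updated by binomial data. The prior parameters and the counts below are hypothetical values chosen only for illustration.

```r
# Prior belief about a success probability: Beta(1, 1), i.e. uniform on [0, 1].
prior_a <- 1
prior_b <- 1

# Hypothetical data: 7 successes in 10 trials.
successes <- 7
trials    <- 10

# Conjugate update: the posterior is Beta(a + successes, b + failures).
post_a <- prior_a + successes
post_b <- prior_b + (trials - successes)

# Posterior mean and a 95% credible interval for the success probability.
post_mean <- post_a / (post_a + post_b)
cred_int  <- qbeta(c(0.025, 0.975), post_a, post_b)
```

If more data arrive later, the posterior parameters `post_a` and `post_b` can serve as the new prior and the same update can be applied again.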

You close the course with a lecture on how to customize R to select and combine information in whatever way you want, so that R best serves your own needs.

Dr. Williams has made it her life’s work to get students, parents, educators, and the community at large excited about mathematics and especially statistics, which she describes as “a powerful framework for THINKING—for reaching insights and solving problems.” As witnessed by her TED Talk, which has been viewed over one million times, Dr. Williams has a gift for demystifying statistics and making it relevant to everyone—because whenever you hear a statistical argument that directly affects your health, livelihood, autonomy, or your firmly held beliefs, you should say, “Show me the data, so I can decide for myself.” With this course, you will be able to do exactly that.

24 lectures | Average 29 minutes each
  • 1
    How to Summarize Data with Statistics
    Confront how ALL data has uncertainty, and why statistics is a powerful tool for reaching insights and solving problems. Begin by describing and summarizing data with the help of concepts such as the mean, median, variance, and standard deviation. Learn common statistical notation and graphing techniques, and get a preview of the programming language R, which will be used throughout the course.
  • 2
    Exploratory Data Visualization in R
    Dip into R, which is a popular open-source programming language for use in statistics and data science. Consider the advantages of R over spreadsheets. Walk through the installation of R, the installation of a companion IDE (integrated development environment), RStudio, and how to download specialized data packages from within RStudio. Then, try out simple operations, learning how to import data, save your work, and generate different plots.
  • 3
    Sampling and Probability
    Study sampling and probability, which are key aspects of how statistics handles the uncertainty inherent in all data. See how sampling aims for genuine randomness in the gathering of data, and probability provides the tools for calculating the likelihood of a given event based on that data. Solve a range of problems in probability, including a case of medical diagnosis that involves the application of Bayes' theorem.
  • 4
    Discrete Distributions
    There's more than one way to be truly random! Delve deeper into probability by surveying several discrete probability distributions, those defined by discrete variables. Examples include Bernoulli, binomial, geometric, negative binomial, and Poisson distributions, each tailored to answer a specific question. Get your feet wet by analyzing several sets of data using these tools.
  • 5
    Continuous and Normal Distributions
    Focus on the normal distribution, which is the most celebrated type of continuous probability distribution. Characterized by a bell-shaped curve that is symmetrical around the mean, the normal distribution shows up in a wide range of phenomena. Use R to find percentiles, probabilities, and other properties connected with this ubiquitous data pattern.
  • 6
    Covariance and Correlation
    When are two variables correlated? Learn how to measure covariance, which is the association between two random variables. Then use covariance to obtain a dimensionless number called the correlation coefficient. Using an R data set, plot correlation values for several variables, including the physical measurements of a sample population.
  • 7
    Validating Statistical Assumptions
    Graphical data analysis was once cumbersome and time-consuming, but that has changed with programming tools such as R. Analyze the classic Iris Flower Data Set, the standard for testing statistical classification techniques. See if you can detect a pattern in sepal and petal dimensions for different species of irises by using scatterplots, histograms, box plots, and other graphical tools.
  • 8
    Sample Size and Sampling Distributions
    It’s rarely possible to collect all the data from a population. Learn how to get a lot from a little by “bootstrapping,” a technique that lets you gauge the uncertainty of an estimate by resampling the same data set over and over. It sounds like magic, but it works! Test tools such as the Q-Q plot and the Shapiro-Wilk test, and learn how to apply the central limit theorem.
  • 9
    Point Estimates and Standard Error
    Take your understanding of descriptive techniques to the next level, as you begin your study of statistical inference, learning how to extract information from sample data. In this lecture, focus on the point estimate: a single number that provides a sensible value for a given parameter. Consider how to obtain an unbiased estimator, and discover how to calculate the standard error for this estimate.
  • 10
    Interval Estimates and Confidence Intervals
    Move beyond point estimates to consider the confidence interval, which provides a range of possible values. See how this tool gives an accurate estimate for a large population by sampling a relatively small subset of individuals. Then learn about the choice of confidence level, which is often specified as 95%. Investigate what happens when you adjust the confidence level up or down.
  • 11
    Hypothesis Testing: 1 Sample
    Having learned to estimate a given population parameter from sample data, now go the other direction, starting with a hypothesized parameter for a population and determining whether a given sample could plausibly have come from that population. Practice this important technique, called hypothesis testing, with a single parameter, such as whether a lifestyle change reduces cholesterol. Discover the power of the p-value in gauging the significance of your result.
  • 12
    Hypothesis Testing: 2 Samples, Paired Test
    Extend the method of hypothesis testing to see whether data from two different samples could have come from the same population, for example, chickens on different feed types or an ice skater’s speed in two contrasting maneuvers. Using R, learn how to choose the right tool to differentiate between independent and dependent samples. One such tool is the matched pairs t-test.
  • 13
    Linear Regression Models and Assumptions
    Step into fully modeling the relationship between data with the most common technique for this purpose: linear regression. Using R and data on the growth of wheat under differing amounts of rainfall, test different models against criteria for determining their validity. Cover common pitfalls when fitting a linear model to data.
  • 14
    Regression Predictions, Confidence Intervals
    What do you do if your data doesn't follow linear model assumptions? Learn how to transform the data to reduce increasing or decreasing variance (called heteroscedasticity), bringing the data closer to the assumptions of a linear model, such as constant variance, normality, and linearity. One of your test cases uses the R data set for miles per gallon versus weight in 1973-74 model automobiles.
  • 15
    Multiple Linear Regression
    Multiple linear regression lets you deal with data that has multiple predictors. Begin with an R data set on diabetes in Pima Indian women that has an array of potential predictors. Evaluate these predictors for significance. Then turn to data where you fit a multiple regression model by adding explanatory variables one by one. Learn to avoid overfitting, which happens when too many explanatory variables are included.
  • 16
    Analysis of Variance: Comparing 3 Means
    Delve into ANOVA, short for analysis of variance, which is used for comparing three or more group means for statistical significance. ANOVA answers three questions: Do categories have an effect? How is the effect different across categories? Is this significant? Learn to apply the F-test and Tukey's honest significant difference (HSD) test.
  • 17
    Analysis of Covariance and Multiple ANOVA
    You can combine features of regression and ANOVA to perform what is called analysis of covariance, or ANCOVA. And that's not all: Just as you can extend simple linear regression to multiple linear regression, you can also extend ANOVA to multiple ANOVA, known as MANOVA, or multivariate analysis of variance. Learn when to apply each of these techniques.
  • 18
    Statistical Design of Experiments
    While a creative statistical analysis can sometimes salvage a poorly designed experiment, gain an understanding of how experiments can be designed from the outset to collect far more reliable statistical data. Consider the role of randomization, replication, blocking, and other criteria, along with the use of ANOVA to analyze the results. Work several examples in R.
  • 19
    Regression Trees and Classification Trees
    Delve into decision trees, which are graphs that use a branching method to determine all possible outcomes of a decision. Trees for continuous outcomes are called regression trees, while those for categorical outcomes are called classification trees. Learn how and when to use each, producing inferences that are easily understood by non-statisticians.
  • 20
    Polynomial and Logistic Regression
    What can be done with data when transformations and tree algorithms don't work? One approach is polynomial regression, a form of regression analysis in which the relationship between the independent and dependent variables is modeled as an nth-degree polynomial. Another is step functions, which fit smaller, local models instead of one global model. Or, if we have binary data, there is logistic regression, in which the response variable has categorical values such as true/false or 0/1.
  • 21
    Spatial Statistics
    Spatial analysis is a set of statistical tools used to find additional order and patterns in spatial phenomena. Drawing on libraries for spatial analysis in R, use a type of graph called a semivariogram to plot the spatial autocorrelation of the measured sample points. Try your hand at data sets involving the geographic incidence of various medical conditions.
  • 22
    Time Series Analysis
    Time series analysis provides a way to model response data that is correlated with itself, from one point in time to the next, such as daily stock prices or weather history. After disentangling seasonal changes from longer-term patterns, consider methods that can model a dependency on time, collectively known as ARIMA (autoregressive integrated moving average) models.
  • 23
    Prior Information and Bayesian Inference
    Turn to an entirely different approach for doing statistical inference: Bayesian statistics, which assumes a known prior probability and updates the probability based on the accumulation of additional data. Unlike the frequentist approach, the Bayesian method does not depend on an infinite number of hypothetical repetitions. Explore the flexibility of Bayesian analysis.
  • 24
    Statistics Your Way with Custom Functions
    Close the course by learning how to write custom functions for your R programs, streamlining operations, enhancing graphics, and putting R to work in a host of other ways. Professor Williams also supplies tips on downloading and exporting data, and making use of the rich resources for R, a truly powerful tool for understanding and interpreting data in whatever way you see fit.
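
To give a flavor of the custom functions mentioned in the final lecture, the sketch below defines a small helper that estimates a standard error by bootstrapping, the resampling idea from Lecture 8. The function name `boot_se`, its arguments, and the seed are hypothetical choices for illustration, not code from the course.

```r
# Hypothetical helper: bootstrap estimate of the standard error of a statistic.
#   x      numeric vector of observations
#   stat   function that reduces a vector to one number (default: mean)
#   n_boot number of bootstrap resamples
boot_se <- function(x, stat = mean, n_boot = 2000) {
  # Resample x with replacement n_boot times and apply the statistic each time.
  reps <- replicate(n_boot, stat(sample(x, replace = TRUE)))
  # The spread of the resampled statistics estimates the standard error.
  sd(reps)
}

set.seed(1480)                      # fix the random seed so results are repeatable
se_hat    <- boot_se(mtcars$mpg)    # bootstrap standard error of the mean mpg
se_theory <- sd(mtcars$mpg) / sqrt(nrow(mtcars))  # textbook formula for comparison
```

Passing a different function, for example `boot_se(mtcars$mpg, stat = median)`, reuses the same helper for statistics that have no simple standard error formula.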


What's Included

What Does Each Format Include?

Video Download Includes:
  • Download 24 video lectures to your computer or mobile app
  • Downloadable PDF of the course guidebook
  • FREE video streaming of the course from our website and mobile apps
DVD Includes:
  • 24 lectures on 4 DVDs
  • 408-page printed course guidebook
  • Downloadable PDF of the course guidebook
  • FREE video streaming of the course from our website and mobile apps
  • Closed captioning available

What Does The Course Guidebook Include?

Course Guidebook Details:
  • 408-page printed course guidebook
  • Charts, data, and graphics
  • Suggested readings
  • Problems and solutions

Enjoy This Course On-the-Go with Our Mobile Apps!*

  • App Store: iPhone + iPad
  • Google Play: Android devices
  • Kindle Fire: Kindle Fire tablet + Fire phone
*Courses can be streamed from anywhere you have an internet connection. Standard carrier data rates may apply in areas that do not have wifi connections pursuant to your carrier contract.

About Your Professor

Talithia Williams, Ph.D.
Harvey Mudd College
Talithia Williams is an Associate Professor of Mathematics and the Associate Dean for Research and Experiential Learning at Harvey Mudd College. She earned her bachelor’s degree in Mathematics from Spelman College, a master’s degree in Mathematics from Howard University, and her Ph.D. in Statistics from Rice University. Her professional experiences include research appointments at NASA’s Jet Propulsion Laboratory,...

Reviews

Learning Statistics: Concepts and Applications in R is rated 4.1 out of 5 by 33.
Rated 3 out of 5: Not basic enough
Not basic enough. I found myself stopping the lecture, going to YouTube, and finding more basic explanations with the happy tutor. This course is really not for the total novice.
Date published: 2018-12-07
Rated 2 out of 5: Too rushed to teach either topic successfully
The course aims to teach statistics through the use of the R software package, but tries to take on far too much by assuming no prior knowledge in either. It might be more successful either as a statistics course assuming prior knowledge of R, or as an R course assuming prior knowledge of statistics. As it stands, key concepts in both subjects are rushed through or even omitted completely, or information to understand a concept or follow an example only appears later. Many of the statistical concepts are introduced in a completely abstract way with no examples at all, and it is often clear that the presenter is severely time constrained. It does not help that when the lecturer does use examples, she jumps from one dataset to another, and at least in the beginning does not indicate how these datasets are accessed. As a result a lot of time is wasted with basic explanations of what the data contain. Results are often shown on the screen for just a few seconds, and it seems to be random whether or not the R code is shown to generate the results yourself, so you cannot always pause the video and explore or experiment with the particular result yourself. The presentation of equations is often unhelpful, with the lecturer painfully reading out even long complex equations rather than spending more time explaining or deriving them. Would the course work as a 36-lecture series? Stretching out the existing material into 50% more time would address some of the problems raised above, but I think most of the problems would be alleviated rather than resolved completely. Even over 36 lectures I think the course would work better either as a statistics course or as an R course.
Date published: 2018-08-21
Rated 5 out of 5: Statistics with R
Excellent introduction to the capability. Easy to follow and apply.
Date published: 2018-06-24
Rated 5 out of 5: Excellent course
Good build from basics to advanced topics in statistics through R. Takes you from beginner to advanced level in nice chunks of 30-minute lectures.
Date published: 2018-05-08
Rated 5 out of 5: Excellent Overview of the Topic
Having seen the reviews, I spent a couple of weeks getting a feel for 'R' by reviewing some basic books on the subject. I think that was useful. The course itself was very good. Dr. Williams clearly did not intend to make the reader an expert in all the intricacies and calculations in each topic area. I believe she intended to introduce the topics, have the reader understand when each is used, and then to give an introduction to the use of 'R' as a tool to solve example problems. She did not intend that the reader become an expert in either statistics or 'R' in my view. I think the course could be improved with the use of more graphics to illustrate concepts, e.g., acceptance and rejection of hypotheses and confidence intervals. Overall, well worth the investment of time and effort.
Date published: 2018-03-16
Rated 3 out of 5: Good course, but some glaring omissions
I have experience with R and my primary goal is to get some ideas for my teaching. I am currently at Lecture 8 and, after some getting used to too many cameras madly switching angles, I am happy with the teacher and the content. However, I was disappointed to find that for some of the more complex plots in the lectures, no R code is shown in either the videos or the course guidebook. That is a big omission which should be corrected as soon as possible. All the code to generate the plots should be available for download.
Date published: 2018-02-17
Rated 5 out of 5: Good purchase, worth the wait
Met my expectations. The instructor is great. Thanks.
Date published: 2018-02-17
Rated 3 out of 5: Good information but instructor talks too much
There is some good information in here if you have already studied statistics and are looking for a review. The only problem is that the teacher seems to babble on a lot. For instance, at one point she spends about three minutes talking about the kind of milk her kids drink. Hardly any time is then spent explaining the R code. In some cases, the code is flashed up for two seconds. The data sets used are also pretty boring. Things like petal and sepal length. More interesting examples would have made the lectures more engaging.
Date published: 2018-01-31
