| Title: | Functions and datasets for STAT 1080: Foundations of Data Science |
|---|---|
| Description: | This package provides companion materials for the course STAT 1080: Foundations of Data Science. Materials include datasets for homework assignments, as well as helpful functions and datasets for in-class assignments. Users should expect regular updates to this package and should re-install it prior to completing any in-class or homework assignment. |
| Authors: | Brennan Bean |
| Maintainer: | Brennan Bean <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.1.0 |
| Built: | 2026-05-25 08:14:18 UTC |
| Source: | https://github.com/beanb2/stat1080r |
Data regarding campus crime.
campus_crime_hawkes
A data.frame with 20 rows and 4 columns.
The number of crimes as applicable to the school
Number of police officers at the school
Total number of students enrolled at the school
Indicator for a private school: 1 if private, 0 otherwise.
https://www.hawkeslearning.com/Statistics/dataset-descriptions.html#campus-crime
A dataset that provides nutritional information regarding cereals.
cereal
A data.frame with 77 rows and 15 variables.
One of Top, Bottom, or Middle, describing the position of the cereal on the shelf at the grocery store.
The name of the cereal.
The manufacturer of the cereal: American Home Food Products (A), General Mills (G), Kellogg's (K), Nabisco (N), Post Brands (P), Quaker Oats (Q), or Ralston (R).
One of cold (C) or hot (H).
The calories in one serving.
The number of grams of protein in one serving.
The number of grams of fat in one serving.
The number of milligrams (mg) of sodium in one serving.
The number of grams of fiber in one serving.
The number of grams of carbohydrates in one serving.
The number of grams of sugar in one serving.
Vitamins and minerals: 0, 25, or 100, indicating the typical percentage of the FDA recommended daily value.
The weight of one serving of cereal.
The number of cups in one serving.
https://www.hawkeslearning.com/Statistics/beg3/datasets.html
https://www.kaggle.com/jeandsantos/breakfast-cereals-data-analysis-and-clustering
A dataset that provides nutritional information regarding cereals, but includes a nefarious twist.
cereal_infected
A data.frame with 77 rows and 15 variables.
One of Top, Bottom, or Middle, describing the position of the cereal on the shelf at the grocery store.
The name of the cereal.
The manufacturer of the cereal: American Home Food Products (A), General Mills (G), Kellogg's (K), Nabisco (N), Post Brands (P), Quaker Oats (Q), or Ralston (R).
One of cold (C) or hot (H).
The calories in one serving.
The number of grams of protein in one serving.
The number of grams of fat in one serving.
The number of milligrams (mg) of sodium in one serving.
The number of grams of fiber in one serving.
The number of grams of carbohydrates in one serving.
The number of grams of sugar in one serving.
Vitamins and minerals: 0, 25, or 100, indicating the typical percentage of the FDA recommended daily value.
The weight of one serving of cereal.
The number of cups in one serving.
https://www.hawkeslearning.com/Statistics/beg3/datasets.html
https://www.kaggle.com/jeandsantos/breakfast-cereals-data-analysis-and-clustering
A subset of data from the U.S. Department of Education's College Scorecard, containing information about institutions across the U.S. An institution is retained only if it offers 4-year programs, has more than 2,000 students, has an endowment larger than $10 million, and has non-missing values for all variables in the dataset. The data are the most recent statistics available, compiled in May 2025.
colleges
A data frame with 831 rows and 10 columns.
Name of the college or university
Size of the school endowment at the beginning of the fiscal year, in billions of dollars
Size of the school endowment at the end of the fiscal year, in billions of dollars
Number of enrolled undergraduates
Proportion of male undergraduates
Proportion of female undergraduates
Is the school a public institution?
Average cost of attendance, in thousands of dollars
Median family income, in thousands of dollars
Median earnings 10 years after enrollment, in thousands of dollars
<https://collegescorecard.ed.gov/data/>
A snapshot of cumulative COVID-19 case and death counts by country, as prepared by the World Health Organization.
covid_case
A data frame with 241 rows and 6 columns.
Country name
World Health Organization region
Cumulative number of cases
Cumulative number of cases per 100,000 people
Cumulative number of deaths
Cumulative number of deaths per 100,000 people
<https://data.who.int/dashboards/covid19/>
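The two per-100,000 columns express each cumulative count relative to population size. Population figures are not among the listed variables, so the minimal Python sketch below assumes the standard definition (count × 100,000 / population) and uses a hypothetical country. `per_100k` is an illustrative helper, not a function provided by this package.

```python
def per_100k(count: float, population: float) -> float:
    """Express a raw count as a rate per 100,000 people.

    Assumes the standard definition: count * 100,000 / population.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiplying first keeps the intermediate value a whole number
    # when both inputs are integers.
    return count * 100_000 / population


# Hypothetical country: 2,500 cumulative cases in a population of 1,000,000.
print(per_100k(2_500, 1_000_000))  # 250.0
```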
Weekly COVID-19 statistics by country, as reported by the World Health Organization.
covid_hos
A data frame with 241 rows and 6 columns.
Date of report, in yyyy-mm-dd format
Country name
World Health Organization region
Number of new hospitalizations reported that week.
<https://data.who.int/dashboards/covid19/>
COVID-19 case and death counts from the World Health Organization. The data were scraped on November 17, 2023; the tables on the WHO website are updated over time, so current values may differ.
covid_raw
A data.frame with 333,459 rows and 8 variables.
The date in y-m-d format.
Two-letter country code.
Name of the country.
World Health Organization region.
New COVID-19 cases reported on that day.
Total number of cases reported since January 2020.
New COVID-19 deaths reported on that day.
Total number of deaths reported since January 2020.
https://covid19.who.int/WHO-COVID-19-global-data.csv
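Within a single country, each cumulative column would be expected to equal a running total of the corresponding daily new counts. The sketch below assumes the rows for one country are sorted by date and that earlier figures have not been revised (WHO revisions could break this equality). `cumulative_from_new` is a hypothetical helper and the counts are made up.

```python
from itertools import accumulate


def cumulative_from_new(new_counts):
    """Running total of daily new counts for one country.

    Assumes the counts are already sorted by date.
    """
    return list(accumulate(new_counts))


daily_new_cases = [5, 0, 12, 3]  # hypothetical daily new cases
print(cumulative_from_new(daily_new_cases))  # [5, 5, 17, 20]
```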
Data regarding high school completion and crime rates across U.S. states in 2014.
crime_hawkes
A data.frame with 50 rows and 5 columns.
U.S. State
The 4-year adjusted cohort graduation rate (ACGR): the number of students who graduate in 4 years with a regular high school diploma, divided by the number of students who form the adjusted cohort for the graduating class. The rate is expressed as a percentage and rounded to the nearest whole number.
Rate of violent and property crimes per 100,000 people
Rate of violent crimes (murder, rape, robbery, aggravated assault) per 100,000 people
Rate of property crimes per 100,000 people
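The graduation-rate column follows the ACGR definition: graduates divided by the adjusted cohort, reported as a whole-number percentage. The sketch below uses integer arithmetic and rounds halves upward. The source does not say how ties were rounded, so that choice is an assumption, and `acgr_percent` and its inputs are hypothetical.

```python
def acgr_percent(graduates: int, adjusted_cohort: int) -> int:
    """4-year adjusted cohort graduation rate as a whole-number percentage.

    Computes 100 * graduates / adjusted_cohort, rounding .5 upward.
    """
    if adjusted_cohort <= 0:
        raise ValueError("adjusted cohort must be positive")
    # floor(100*g/c + 1/2) == (200*g + c) // (2*c) for non-negative integers,
    # which avoids floating-point error and Python's round-half-to-even rule.
    return (200 * graduates + adjusted_cohort) // (2 * adjusted_cohort)


# Hypothetical state: 8,745 of 10,000 students in the cohort graduate on time.
print(acgr_percent(8_745, 10_000))  # 87
```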
Post-election survey of 7,606 voters in the 2016 U.S. presidential election.
election_survey_2016
A data.frame with 7606 rows and 6 columns.
Voter ID
A weight reflecting how representative the voter is of the overall population
How the voter cast their ballot: by mail or in person
Candidate voted for: Clinton, Trump, or "Other" for all other candidates
Voter's self-identified income bracket
Voter's self-reported interest in the daily news
https://www.kaggle.com/datasets/democracy-fund/2016-voter-survey?resource=download
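Because each respondent carries a weight reflecting how representative they are of the overall population, population-level estimates such as candidate vote shares are typically computed as weighted proportions rather than raw counts. A minimal sketch, assuming the weights are intended for this use; the function name and the four respondents are hypothetical.

```python
from collections import defaultdict


def weighted_shares(choices, weights):
    """Weighted proportion of respondents in each category.

    Each respondent contributes their weight (not 1) to their category.
    """
    if len(choices) != len(weights):
        raise ValueError("choices and weights must have the same length")
    totals = defaultdict(float)
    for choice, weight in zip(choices, weights):
        totals[choice] += weight
    grand_total = sum(totals.values())
    return {choice: total / grand_total for choice, total in totals.items()}


# Hypothetical respondents and weights.
choices = ["Clinton", "Trump", "Trump", "Other"]
weights = [1.5, 0.5, 1.0, 1.0]
print(weighted_shares(choices, weights))
# {'Clinton': 0.375, 'Trump': 0.375, 'Other': 0.25}
```

Unweighted, Trump would account for 2 of the 4 respondents (0.5); the weights shift the estimate toward the more heavily weighted Clinton respondent.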
A sample of data collected on Darwin's finches on Daphne Major Island.
finch_hawkes
A data.frame with 100 rows and 11 columns.
The individual's identifier: the number on the metal leg band it was given
Species name
Male, female, or unknown. The "unknown" category exists because young males look like females; after one or more years they molt into a plumage with some black feathering that identifies them as males
The year after the individual hatched from an egg
The last year of that individual's life
Weight, in grams
Wing length, in millimeters
Tarsus length (part of leg), in millimeters
Beak length, in millimeters
Beak depth, in millimeters
Beak width, in millimeters
https://www.hawkeslearning.com/Statistics/dataset-descriptions.html#finch-data
Information about properties for sale in three subdivisions of Mount Pleasant, South Carolina, in the year 2017.
housing_hawkes
A data.frame with 245 rows and 24 columns.
Property ID number
Asking price in dollars
Is the property a duplex?
Number of bedrooms
Total number of bathrooms (baths_full + baths_half)
Number of full bathrooms
Number of half bathrooms
Number of stories
Name of the subdivision the property is located in
Estimated floor area inside the house
The year the house was constructed
The size of the lot in acres
Has the house been lived in previously?
The style of the house
Number of covered parking spots on the property
Miscellaneous exterior features (text string)
Does the home have a private pool?
Does the home have a private dock?
Does the home have a fenced yard?
Does the home have a screened porch?
Text string of other amenities associated with the property
Is the home located on a golf course?
Does the home have a fireplace?
Number of fireplaces
https://www.hawkeslearning.com/Statistics/dataset-descriptions.html#mount-pleasant-real-estate
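The total bathroom count is documented as the sum of the full and half bathroom counts. A quick consistency check over the rows might look like the sketch below. The key `baths` for the total is a hypothetical label (only `baths_full` and `baths_half` are named in this documentation), and the rows are made up.

```python
def inconsistent_bath_rows(rows):
    """Return indices of rows where the total differs from full + half baths.

    Each row is a dict; the key "baths" (total) is a hypothetical name.
    """
    return [
        i
        for i, row in enumerate(rows)
        if row["baths"] != row["baths_full"] + row["baths_half"]
    ]


rows = [
    {"baths": 3, "baths_full": 2, "baths_half": 1},
    {"baths": 2, "baths_full": 2, "baths_half": 1},  # deliberately inconsistent
]
print(inconsistent_bath_rows(rows))  # [1]
```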
Data on vehicle fuel economy for model years 1984 to 2019.
mpg_hawkes
A data.frame with 38,693 rows and 11 columns.
city miles per gallon
number of cylinders in the vehicle engine
engine displacement (liters)
drive axle type (front-wheel, four-wheel, etc.)
type of fuel (regular, premium, diesel, etc.)
highway miles per gallon
manufacturer
model name
type of transmission
vehicle size class
model year
https://www.seattlecentral.edu/qelp/sets/057/057.html
https://www.hawkeslearning.com/Statistics/dataset-descriptions.html#miles-per-gallon
Contains the run times for women who recorded valid run times in both the 2021 and 2022 NCAA National Championship Meets.
ncaa_times
A data.frame with 98 rows and 3 columns.
The name of the athlete
The 6k run time in seconds for the 2021 meet
The 6k run time in seconds for the 2022 meet
https://xc.tfrrs.org/results_search.html
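Because both times are stored in seconds, a paired comparison for each athlete is a single subtraction, and a negative difference means the athlete ran faster in 2022. The sketch below also converts seconds to the familiar minutes:seconds format. The helper names and the sample times are hypothetical.

```python
def format_time(seconds: float) -> str:
    """Format a time in seconds as minutes:seconds with one decimal place."""
    # Round first so values like 1259.96 become 21:00.0 rather than 20:60.0.
    minutes, secs = divmod(round(seconds, 1), 60)
    return f"{int(minutes)}:{secs:04.1f}"


def change_in_seconds(time_2021: float, time_2022: float) -> float:
    """2022 time minus 2021 time; negative values mean a faster 2022 run."""
    return time_2022 - time_2021


print(format_time(1234.5))  # 20:34.5
print(change_in_seconds(1250.0, 1240.5))  # -9.5
```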
Medical statistics for a sample of 100 patients from a hospital.
patient_hawkes
A data.frame with 100 rows and 8 columns
patient id number
patient gender (dichotomous)
one of married (M), widowed (W), divorced (D) or single (S)
patient age
patient weight in pounds
patient total cholesterol
patient systolic blood pressure in mmHg
age group: patients grouped by decade of age
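Grouping patients by decade of age amounts to integer division by 10. The sketch assumes groups are labeled like "40-49"; the labels actually used in the dataset are not documented here, and `decade_group` is a hypothetical helper.

```python
def decade_group(age: int) -> str:
    """Label an age by its decade, e.g. 47 -> "40-49"."""
    if age < 0:
        raise ValueError("age must be non-negative")
    start = (age // 10) * 10
    return f"{start}-{start + 9}"


print([decade_group(a) for a in (23, 47, 50)])  # ['20-29', '40-49', '50-59']
```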
A web-scraped dataset of U.S. presidential inaugural addresses. It includes only first inaugural addresses (no second or third addresses), and it excludes addresses given by vice presidents who took office mid-term after the death of a president.
pres_speech_df
A data.frame with 53 rows and 2 variables.
The year in which the inaugural address was given.
The full text of the inaugural speech as a single character string.
Reliability-targeted design snow loads for the state of Utah. A design snow load is the pressure exerted by accumulated snow that a building needs to be able to withstand in order to prevent collapse.
rt_load_ut
A data.frame with 257 rows and 6 columns
name of the weather station at which the measurements were recorded
decimal degrees latitude (geographic location)
decimal degrees longitude
altitude of the station, measured in feet
the level I ecological region as defined by the U.S. Environmental Protection Agency
the design snow load, measured in pounds per square foot (psf)
https://doi.org/10.26077/200k-pr86
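Design snow loads are reported in pounds per square foot. Readers working in SI units can convert using 1 psf ≈ 47.880 Pa (one pound-force spread over one square foot). The sketch below is a minimal conversion helper; the 60 psf input is purely illustrative, not a value from the dataset.

```python
# 1 lbf = 4.4482216152605 N and 1 ft^2 = 0.09290304 m^2,
# so 1 psf = 4.4482216152605 / 0.09290304 Pa (about 47.880 Pa).
PA_PER_PSF = 4.4482216152605 / 0.09290304


def psf_to_kpa(load_psf: float) -> float:
    """Convert a pressure in pounds per square foot to kilopascals."""
    return load_psf * PA_PER_PSF / 1000


print(round(psf_to_kpa(60), 3))  # 2.873
```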
Data regarding employee satisfaction.
satisfy_hawkes
A data.frame with 15,000 rows and 12 columns.
The ID number associated with the individual employee
How satisfied the employee is in their position (scale of 0 to 1)
How management rated employee performance during the last evaluation (scale of 0 to 1)
The number of projects an employee is currently working on
The average number of hours the employee works in a month
The number of years the employee has worked at the company
A binary variable that indicates whether the employee experienced an accident at work
A binary variable that indicates whether an employee left the company
A binary variable that indicates whether an employee received a promotion in the last 5 years
The department that the employee works in
The employee's salary level (low, medium, or high)
The dollar range for the salary levels
https://www.hawkeslearning.com/Statistics/dataset-descriptions.html#employee-satisfaction
https://www.kaggle.com/ludobenistant/hr-analytics
Data obtained from the National Water and Climate Center (NWCC) report generator. It contains snow water equivalent (SWE) measurements from two snowpack telemetry (SNOTEL) stations in Utah: one at Tony Grove Lake, up Logan Canyon, and another at Ben Lomond Peak, east of Ogden.
swe
A data.frame with 32,972 rows and 5 variables.
The date of the measurement in y-m-d format.
The SNOTEL measurement ID.
The SNOTEL station name.
The measured snow water equivalent, in tenths of a millimeter.
An NWCC quality control flag where "S" stands for suspect measurements.
https://wcc.sc.egov.usda.gov/reportGenerator/
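Since SWE is stored in tenths of a millimeter, dividing by 10 gives millimeters, and dividing millimeters by 25.4 gives inches. The helper names and the sample value below are hypothetical.

```python
MM_PER_INCH = 25.4  # exact by definition


def swe_to_mm(swe_tenths_mm: float) -> float:
    """Convert SWE from tenths of a millimeter to millimeters."""
    return swe_tenths_mm / 10


def swe_to_inches(swe_tenths_mm: float) -> float:
    """Convert SWE from tenths of a millimeter to inches."""
    return swe_to_mm(swe_tenths_mm) / MM_PER_INCH


print(swe_to_mm(254), swe_to_inches(254))  # 25.4 1.0
```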
Comparison of lot size vs work hours at the Toluca Company
toluca
A data.frame with 25 rows and 2 columns
lot size (no units originally provided, but assumed to be thousands of square feet)
average hours it takes to make a single part
http://www.cnachtsheim-text.csom.umn.edu/Kutner/Chapter%20%201%20Data%20Sets/CH01PR20.txt