Wayne's Github Page

A place to learn about statistics

HW5 - More data wrangling

Q0 Correcting data format

One of the students had an issue loading data for a different class into R. It turns out this dataset contains a few good exercises. The final result should be a data frame with 2 numeric columns. I have only changed the name of the file; the rest is real.

For Q0, answers are only required for the sub-bullet points; the main bullet points are there for context.

Side comment: You should NOT correct the file poorly_formatted_data.csv directly. When the cleaning is done in code, your script records every change made to the data, so anyone can replicate your steps.
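As a minimal sketch of cleaning in code rather than by hand, the example below writes a small made-up file to a temporary location and turns it into a data frame with 2 numeric columns. The file contents, the column names x and y, and the specific problems (a title line before the header, thousands separators, a text marker for a missing value) are assumptions for illustration only; the actual issues in poorly_formatted_data.csv may be different.

```r
# Hypothetical messy file; the real poorly_formatted_data.csv may look different.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "Exported report",   # stray title line before the header
  "x,y",
  "\"1,200\",3.5",     # thousands separator inside a quoted field
  "15,missing",        # text marker for a missing value
  "7,2.25"
), path)

# Read everything as text first so nothing is converted silently.
raw <- read.csv(path, skip = 1, colClasses = "character")

# Strip the thousands separators, then convert both columns to numeric.
clean <- data.frame(
  x = as.numeric(gsub(",", "", raw$x)),
  y = suppressWarnings(as.numeric(raw$y))  # "missing" becomes NA
)

str(clean)
```

Because the original file is never modified, rerunning the script from the untouched file reproduces the same cleaned data frame.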

Q1 Spatial data

The government has been collecting weather information for many years, and a calibrated version of this data is maintained by NOAA.

For this problem, imagine that you are a research intern for a climate scientist who wants to study the annual precipitation for different locations over time.

In the hw5_spatial folder on Canvas, there is data for k stations that was converted into CSV files for you using a process similar to Q0 above. In the future, you could download the “raw” data (not CSVs!) using the information under Data Access on the NOAA page. We will not work with the raw data yet.

Please download the files in the folder on Canvas (no need to use R to do this). The folder contains 2 types of CSVs:

The description above is all context; nothing needs to be reported for it yet. The following questions intentionally give fewer instructions than past homeworks.
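For orientation only, here is a rough sketch of how several per-station CSVs might be read from a folder and stacked into one data frame. The folder name hw5_spatial_demo, the file names, and the columns year and precip are all made up, and the sketch assumes one file per station with identical columns, which may not match the files on Canvas.

```r
# Build a throwaway folder with two made-up station files.
dir_path <- file.path(tempdir(), "hw5_spatial_demo")
dir.create(dir_path, showWarnings = FALSE)
write.csv(data.frame(year = c(2000, 2001), precip = c(1.5, 2.0)),
          file.path(dir_path, "station_A.csv"), row.names = FALSE)
write.csv(data.frame(year = c(2000, 2001), precip = c(0.5, 3.0)),
          file.path(dir_path, "station_B.csv"), row.names = FALSE)

# Find every CSV in the folder.
files <- list.files(dir_path, pattern = "\\.csv$", full.names = TRUE)

# Read one file and record which station it came from (taken from the file name).
read_station <- function(f) {
  d <- read.csv(f)
  d$station <- sub("\\.csv$", "", basename(f))
  d
}

# Stack the per-station data frames into a single data frame.
all_stations <- do.call(rbind, lapply(files, read_station))
print(all_stations)
```

Keeping the station name as a column means the combined data frame can later be grouped by station.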

Q2 tapply() or aggregate() or group_by() + summarize() practice

Please summarize the data frame from Q1 to obtain the total precipitation per decade for each station.
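The grouping idea can be sketched on toy data using base R. The column names station, year, and precip, and the values themselves, are hypothetical and are not taken from the Q1 data; labeling a decade by its first year is also just one possible convention.

```r
# Toy data; station, year, and precip are hypothetical column names.
df <- data.frame(
  station = c("A", "A", "A", "B", "B", "B"),
  year    = c(1998, 1999, 2001, 1995, 2003, 2007),
  precip  = c(10, 20, 5, 1, 2, 3)
)

# Label each row with the first year of its decade (1998 -> 1990).
df$decade <- floor(df$year / 10) * 10

# Option 1: tapply() returns a station-by-decade matrix of totals.
by_tapply <- tapply(df$precip, list(df$station, df$decade), sum)

# Option 2: aggregate() returns a long data frame, one row per group.
by_aggregate <- aggregate(precip ~ station + decade, data = df, FUN = sum)

print(by_tapply)
print(by_aggregate)
```

The two results hold the same totals in different shapes: tapply() gives a wide matrix (with NA for any station and decade pair that has no rows), while aggregate() gives one row per station and decade that actually appears in the data.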