Wayne's Github Page

A place to learn about statistics

HW6 - Calling APIs and Text Manipulation

Q0 Getting USDA Agriculture Data

This problem focuses on using httr to call QuickStats, the US Department of Agriculture (USDA) API, to obtain state-level agricultural data. Commodity traders pay attention to this data because it has major implications for prices: a good harvest tends to lower prices, and a poor harvest tends to raise them.

This problem is broken into 4 steps:

  1. Requesting credentials, called the API key
  2. Calling one API endpoint to understand the options
  3. Calling another API endpoint to get the records
  4. Visualizing the data over time

Q0.1 Request an API key

Please follow the instructions under “Request API Key” for USDA’s QuickStats API. For convenience, please save your credentials in a file called usda_quickstat_api_key.txt, following this example:

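library(jsonlite)
# Replace MY_API_KEY with the key you received from USDA
cred <- list("usda_quick_stat_api_key" = "MY_API_KEY")
# Save the credentials as JSON in the working directory
write_json(cred, "usda_quickstat_api_key.txt")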

Please do NOT show your code for this step to the grader; you will need this file later. In general, you should never share your credentials with other people. We will take a point off if your credentials appear anywhere in the assignment. We will trust that you did this step, i.e. you do not need to print or show anything, since it is necessary for finishing the later tasks.

Side comment: storing credentials in plain text is not the most secure approach, but security is not the focus of this class.
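For the later steps, the key has to be read back from the file. A minimal sketch of one way to do that is below; read_usda_key() is a hypothetical helper name, and the round trip uses a temporary file with a placeholder key rather than real credentials.

```r
library(jsonlite)

# Hypothetical helper (not part of the assignment): read the key from disk.
read_usda_key <- function(path = "usda_quickstat_api_key.txt") {
  cred <- read_json(path)
  # write_json() stores a length-1 character vector as a JSON array by default,
  # so take the first element to get a plain string.
  cred[["usda_quick_stat_api_key"]][[1]]
}

# Round-trip demonstration with a temporary file and a placeholder key.
tmp <- tempfile(fileext = ".txt")
write_json(list("usda_quick_stat_api_key" = "MY_API_KEY"), tmp)
demo_key <- read_usda_key(tmp)
```

In your own work you would call read_usda_key() with no arguments and avoid printing the result.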

Q0.2 Understanding the options

The API has many parameters, and we do not know in advance which values each parameter accepts. To understand the options available to us, we need to call a specific API endpoint, /api/get_param_values, also listed under “Usage”.

Specifically, the parameters of the API define different dimensions of the data, e.g. crop type (like corn vs soybeans), sampling method (like census vs surveys), and sector (like crops vs animals). Each parameter can take on different values, e.g. “CORN” vs “Corn - Sweet”, which specify what we want to retrieve within that dimension.

For the list of parameters, look at the table at the bottom of the “Usage” section of the API page. For the possible values of a given parameter, we need the API endpoint /api/get_param_values, also listed under “Usage”.
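A minimal sketch of calling /api/get_param_values with httr might look like the following. The host name, the query parameter names key and param, and the response shape (a JSON object with one field named after the requested parameter) are assumptions to check against the “Usage” page; get_param_values() and parse_param_values() are hypothetical helper names.

```r
# Assumed base URL for QuickStats; confirm it on the API "Usage" page.
base_url <- "https://quickstats.nass.usda.gov"

# Hypothetical helper: pull the values out of the response body.
# Assumes the body looks like {"<param>": ["value 1", "value 2", ...]}.
parse_param_values <- function(json_text, param) {
  unlist(jsonlite::fromJSON(json_text)[[param]])
}

# Hypothetical helper: ask the API which values `param` can take.
get_param_values <- function(param, key) {
  resp <- httr::GET(
    paste0(base_url, "/api/get_param_values/"),
    query = list(key = key, param = param)  # assumed query parameter names
  )
  httr::stop_for_status(resp)  # fail loudly on HTTP errors such as a bad key
  body <- httr::content(resp, as = "text", encoding = "UTF-8")
  parse_param_values(body, param)
}
```

Keeping the parsing step separate from the HTTP call makes it possible to check the parsing logic on a hand-written JSON string before spending any API requests.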

Please use the API to get all possible values of commodity_desc that contain the word corn or maize (case-insensitive, so CORN or Maize should also be matched if they exist). Please show all code and print out the possible values.
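The case-insensitive matching described above can be sketched with grepl(). The input vector here is made up for illustration and is not actual API output; match_corn_or_maize() is a hypothetical helper name.

```r
# Made-up example values, NOT actual API output.
example_values <- c("CORN", "SWEET CORN", "SOYBEANS", "Maize", "POPCORN", "WHEAT")

# Keep values that contain "corn" or "maize" in any letter case.
match_corn_or_maize <- function(values) {
  values[grepl("corn|maize", values, ignore.case = TRUE)]
}

corn_like <- match_corn_or_maize(example_values)
print(corn_like)
```

Note that this is a substring match, so a value such as "POPCORN" is kept as well. If only whole words should match, a pattern with word boundaries, such as "\\b(corn|maize)\\b", is one alternative.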

Q0.3 Getting the data

Before we can plot the data in the next step, we need to retrieve it. Please grab the data from the API endpoint /api/api_GET with the following specifications:
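As a rough sketch of how a request to /api/api_GET might be assembled with httr (the filter values below are placeholders, not the assignment's specifications): the host name, the format parameter, the data field in the response, and the comma-formatted numbers are all assumptions to verify against the “Usage” page, and get_records() and parse_value() are hypothetical helper names.

```r
# Hypothetical helper: QuickStats is assumed to return numbers as text with
# thousands separators (e.g. "1,234"); non-numeric codes become NA.
parse_value <- function(x) {
  suppressWarnings(as.numeric(gsub(",", "", x, fixed = TRUE)))
}

# Hypothetical helper: fetch the records matching `filters` (a named list).
get_records <- function(filters, key) {
  resp <- httr::GET(
    "https://quickstats.nass.usda.gov/api/api_GET/",  # assumed host
    query = c(list(key = key, format = "JSON"), filters)
  )
  httr::stop_for_status(resp)  # fail loudly on HTTP errors
  body <- httr::content(resp, as = "text", encoding = "UTF-8")
  jsonlite::fromJSON(body)$data  # assumed: records sit under "data"
}

# Placeholder filters for illustration only; substitute the assignment's own.
example_filters <- list(commodity_desc = "CORN", year__GE = "2000")
```

A call would then look like get_records(example_filters, key), with key read from your credentials file.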

Q0.4 Visualize the data

Please plot, for each year, the ratio of the I-states' total production to the total production of the entire US.
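One way to compute and plot such a ratio is sketched below on a made-up data frame. The column names, the numbers, the reading of "I-states" as Iowa, Illinois, and Indiana, and the choice to take the US total as the sum over all states in the data are all assumptions for illustration; i_state_ratio() and plot_ratio() are hypothetical helper names.

```r
# Assumption: "I-states" means Iowa, Illinois, and Indiana.
i_states <- c("IA", "IL", "IN")

# Made-up production numbers, NOT real USDA data.
toy <- data.frame(
  year = c(2019, 2019, 2019, 2019, 2020, 2020, 2020, 2020),
  state_alpha = c("IA", "IL", "IN", "NE", "IA", "IL", "IN", "NE"),
  production = c(10, 8, 2, 20, 12, 6, 6, 8)
)

# Ratio of the selected states' production to the all-state total, per year.
i_state_ratio <- function(df, states) {
  us_total <- tapply(df$production, df$year, sum)
  in_group <- df$state_alpha %in% states
  group_total <- tapply(df$production[in_group], df$year[in_group], sum)
  years <- names(us_total)
  data.frame(
    year = as.numeric(years),
    ratio = as.numeric(group_total[years] / us_total[years])
  )
}

# Line plot of the ratio over time using base graphics.
plot_ratio <- function(ratios) {
  plot(ratios$year, ratios$ratio, type = "b",
       xlab = "Year", ylab = "I-states share of US production")
}

ratios <- i_state_ratio(toy, i_states)
```

Calling plot_ratio(ratios) draws the chart. With real data, the US total may instead come from a national-level record, in which case the denominator would be taken from that record rather than summed.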

Q1 Text manipulation

On Canvas, you’ll find a file titled indeed_job_descs_2021_03_18_california.json. This problem will walk you through how we generated the dataset for the first practice midterm. Ultimately, we want to know the number of job descriptions that list Python vs R.
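To give a sense of the counting involved, here is a small sketch on made-up descriptions (not taken from the Canvas file, whose structure is not assumed here). The helper name count_mentions() is hypothetical. "Python" is matched case-insensitively, while "R" is matched as a standalone capital letter, since a lowercase "r" appears in almost every sentence.

```r
# Made-up job descriptions for illustration only.
descs <- c(
  "Experience with Python and R required.",
  "Python scripting and SQL.",
  "Proficiency in R or SAS.",
  "Excel and PowerPoint only."
)

# Count how many descriptions match a pattern at least once.
count_mentions <- function(texts, pattern, ignore_case = FALSE) {
  sum(grepl(pattern, texts, ignore.case = ignore_case, perl = TRUE))
}

# \\b marks a word boundary, so "PowerPoint" does not count as "R".
n_python <- count_mentions(descs, "\\bpython\\b", ignore_case = TRUE)
n_r <- count_mentions(descs, "\\bR\\b")
```

A word-boundary pattern for R is still imperfect: a phrase such as "R&D" would also match, so real descriptions may need extra filtering.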

Q2 Simulation