Most of us end up having to work with Census data on a regular basis. In the past, we used American Factfinder (since replaced by data.census.gov) or the Inter-university Consortium for Political and Social Research (ICPSR) databases to grab the data we needed. Mercifully, with the opening up of a lot of government data, accessing Census data has become very easy. There are several packages that allow you to do so, but I will focus on two – {tidycensus} and {censusapi} – given their ease of use. In addition to these packages, I will also show you how to use APIs to access data from the World Bank and from the U.S. Geological Survey (USGS). Please also be sure to look at IPUMS.
{tidycensus}
Let us start with {tidycensus}. As usual, we install the package and, since we will need it, get a Census API key. You can sign up for the key here. Check your email, note your key somewhere, and remember it. Install {tidycensus} next.
library(tidycensus)
library(tidyverse)
census_api_key("YOUR API KEY GOES HERE", install = TRUE)
Your key will be saved to .Renviron and automatically used in future sessions with {tidycensus}, so be sure not to use the install = TRUE switch on subsequent runs; otherwise you will be asked every time whether you want to overwrite the previous setting in .Renviron.
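If you ever want to confirm that the key is available in a new session, a quick check along these lines works (a minimal sketch; {tidycensus} stores the key in the CENSUS_API_KEY environment variable):

# reload .Renviron in an already-open session, then look up the stored key
readRenviron("~/.Renviron")
Sys.getenv("CENSUS_API_KEY")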
The two data-streams of interest will be either the decennial census or, most likely, the regularly updated American Community Survey (ACS) series. You can get each with specific commands:
load_variables(
  year = 2016,
  dataset = "acs5",
  cache = TRUE
) -> my.vars.acs

load_variables(
  year = 2000,
  dataset = "sf3",
  cache = TRUE
) -> my.vars.d.sf3
What if I am interested in the Summary File 1 from 2010 or Summary File 3 from 2000?1
load_variables(
  year = 2010,
  dataset = "sf1",
  cache = TRUE
) -> my.vars.d.sf1

load_variables(
  year = 2000,
  dataset = "sf3",
  cache = TRUE
) -> my.vars.d.sf3
Each of these will show you what variables are available, making it easier to choose the ones you want to work with. Assume we want total population, variable P001001, from the decennial census
get_decennial(
  geography = "state",
  variables = c(totpopn = "P001001")
) -> state.popn.d

get_decennial(
  geography = "county",
  variables = c(totpopn = "P001001"),
  state = "OH"
) -> county.popn.d

get_decennial(
  geography = "tract",
  variables = c(totpopn = "P001001"),
  state = "OH",
  county = "Athens"
) -> tract.popn.d
and then from the ACS. Note that the default year is 2010 for the decennial census and the most recent five-year series (2012-2016) for the ACS.
get_acs(
  geography = "state",
  variables = c(totpopn = "B01003_001")
) -> state.popn.acs

get_acs(
  geography = "county",
  variables = c(totpopn = "B01003_001"),
  state = "OH"
) -> county.popn.acs

get_acs(
  geography = "tract",
  variables = c(totpopn = "B01003_001"),
  state = "OH",
  county = "Athens"
) -> tract.popn.acs
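Both functions also accept a year argument, so you are not stuck with the defaults if you need a different vintage. A minimal sketch (the object name is just illustrative):

# e.g., county populations from the 2015-2019 ACS instead of the default series
get_acs(
  geography = "county",
  variables = c(totpopn = "B01003_001"),
  state = "OH",
  year = 2019
) -> county.popn.acs19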
You can also get an entire table instead of having to list variables, as shown below, with the default decennial census for the package’s current version (2010).
get_decennial(
  geography = "county",
  table = c("P012")
) -> county.table.d

get_decennial(
  geography = "state",
  table = c("P012")
) -> state.table.d
Population data for all tracts in the country?
library(tidycensus)
library(purrr)

unique(fips_codes$state)[1:51] -> us

map_df(us, function(x) {
  get_acs(
    geography = "tract",
    variables = "B01003_001",
    state = x,
    geometry = TRUE
  )
}) -> totalpop
Often you want to convert some measure into a percent. You can do that either by also downloading the denominator, or by leaning on built-in functions (as shown below). In this example, I am grabbing the B02001 table for all tracts in Cuyahoga County, Ohio.
get_acs(
  "tract",
  county = "Cuyahoga",
  state = "Ohio",
  year = 2019,
  table = "B02001",
  summary_var = "B02001_001",
  geometry = TRUE
) -> cuy

cuy %>%
  head()
Simple feature collection with 6 features and 7 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -81.60645 ymin: 41.52275 xmax: -81.5981 ymax: 41.53004
Geodetic CRS: NAD83
GEOID NAME variable
1 39035118400 Census Tract 1184, Cuyahoga County, Ohio B02001_001
2 39035118400 Census Tract 1184, Cuyahoga County, Ohio B02001_002
3 39035118400 Census Tract 1184, Cuyahoga County, Ohio B02001_003
4 39035118400 Census Tract 1184, Cuyahoga County, Ohio B02001_004
5 39035118400 Census Tract 1184, Cuyahoga County, Ohio B02001_005
6 39035118400 Census Tract 1184, Cuyahoga County, Ohio B02001_006
estimate moe summary_est summary_moe geometry
1 1203 199 1203 199 MULTIPOLYGON (((-81.60645 4...
2 18 18 1203 199 MULTIPOLYGON (((-81.60645 4...
3 1137 193 1203 199 MULTIPOLYGON (((-81.60645 4...
4 11 17 1203 199 MULTIPOLYGON (((-81.60645 4...
5 0 11 1203 199 MULTIPOLYGON (((-81.60645 4...
6 0 11 1203 199 MULTIPOLYGON (((-81.60645 4...
Now I can run mutate(...)
to convert each estimate into
its corresponding percentage.
cuy %>%
  mutate(
    pct = round(estimate / summary_est) * 100
  ) -> cuy

cuy %>%
  head()
Simple feature collection with 6 features and 8 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -81.60645 ymin: 41.52275 xmax: -81.5981 ymax: 41.53004
Geodetic CRS: NAD83
GEOID NAME variable
1 39035118400 Census Tract 1184, Cuyahoga County, Ohio B02001_001
2 39035118400 Census Tract 1184, Cuyahoga County, Ohio B02001_002
3 39035118400 Census Tract 1184, Cuyahoga County, Ohio B02001_003
4 39035118400 Census Tract 1184, Cuyahoga County, Ohio B02001_004
5 39035118400 Census Tract 1184, Cuyahoga County, Ohio B02001_005
6 39035118400 Census Tract 1184, Cuyahoga County, Ohio B02001_006
estimate moe summary_est summary_moe geometry
1 1203 199 1203 199 MULTIPOLYGON (((-81.60645 4...
2 18 18 1203 199 MULTIPOLYGON (((-81.60645 4...
3 1137 193 1203 199 MULTIPOLYGON (((-81.60645 4...
4 11 17 1203 199 MULTIPOLYGON (((-81.60645 4...
5 0 11 1203 199 MULTIPOLYGON (((-81.60645 4...
6 0 11 1203 199 MULTIPOLYGON (((-81.60645 4...
pct
1 100
2 0
3 100
4 0
5 0
6 0
The White count is B02001_002 and the Black count is B02001_003. Say I wanted to map just these two for the tracts.
cuy %>%
  filter(
    variable %in% c('B02001_002', 'B02001_003')
  ) %>%
  mutate(
    popgroup = case_when(
      variable == "B02001_002" ~ 'White Alone',
      variable == "B02001_003" ~ 'Black Alone'
    )
  ) -> cuy_df
cuy_df %>%
  ggplot() +
  geom_sf(
    aes(fill = pct)
  ) +
  facet_wrap(~ popgroup) +
  theme_void() +
  theme(legend.position = 'top') +
  scale_fill_viridis_c(option = "rocket") +
  labs(
    fill = "Percent"
  )
I threw in the mapping here to show how easy it would be to pipe through from getting data to visualizing it.
{censusapi}
This package will allow you to grab a vast array of Census products, and very easily I might add.
library(censusapi)
listCensusApis() -> apis_df
Application Programming Interfaces (APIs) can have slightly varying parameters, so it pays to check the API documentation available here.
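Since apis_df is just a data frame, you can also browse the catalog directly. A small sketch, assuming (as listCensusApis() documents) that the result includes name and title columns:

# look for anything mentioning "sahie" in the catalog created above
apis_df %>%
  filter(grepl("sahie", name, ignore.case = TRUE)) %>%
  select(title, name)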
It is easy to find variable names via the built-in function listCensusMetadata. As an example, the bureau’s small area health insurance estimates are shown below, as well as the small area income and poverty estimates.
listCensusMetadata(
  name = "timeseries/healthins/sahie",
  type = "variables"
) -> sahie_vars

listCensusMetadata(
  name = "timeseries/poverty/saipe",
  type = "variables"
) -> saipe_vars
Curious about available geographies? Switch type = to "geography":
listCensusMetadata(
  name = "timeseries/healthins/sahie",
  type = "geography"
) -> sahie_geos

listCensusMetadata(
  name = "timeseries/poverty/saipe",
  type = "geography"
) -> saipe_geos
Grab the most recent county-level data, but note that the latest SAHIE are for 2015 while the latest SAIPE are for 2016. Every variable is downloaded as chr, so you will need to convert what should be numeric into numeric.
getCensus(
  name = "timeseries/healthins/sahie",
  vars = c("NAME", "IPRCAT", "IPR_DESC", "PCTUI_PT"),
  region = "county:*",
  regionin = "state:39",
  time = 2015
) -> sahie_counties

head(sahie_counties, n = 12L)

getCensus(
  name = "timeseries/poverty/saipe",
  vars = c("NAME", "SAEPOVRTALL_PT", "SAEMHI_PT"),
  region = "county:*",
  regionin = "state:39",
  time = 2016
) -> saipe_counties

head(saipe_counties, n = 12L)

saipe_counties %>%
  mutate(
    prate = as.numeric(SAEPOVRTALL_PT),
    mdhinc = as.numeric(SAEMHI_PT)
  ) -> saipe_counties
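The same conversion applies to the SAHIE pull; for instance, PCTUI_PT (the percent-uninsured estimate) also arrives as character. A sketch mirroring the mutate() above:

# convert the uninsured-rate estimate to numeric
sahie_counties %>%
  mutate(pctui = as.numeric(PCTUI_PT)) -> sahie_counties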
If you want data for a lot of geographies, the package will let you do it in seconds. For example, if you want tract-level data for all tracts, you can get it as shown below:
fips
[1] "01" "02" "04" "05" "06" "08" "09" "10" "11" "12" "13" "15" "16"
[14] "17" "18" "19" "20" "21" "22" "23" "24" "25" "26" "27" "28" "29"
[27] "30" "31" "32" "33" "34" "35" "36" "37" "38" "39" "40" "41" "42"
[40] "44" "45" "46" "47" "48" "49" "50" "51" "53" "54" "55" "56"
tracts <- NULL
for (f in fips) {
  stateget <- paste("state:", f, sep = "")
  temp <- getCensus(
    name = "dec/sf1",
    vintage = 2010,
    vars = c("NAME", "P001001", "H010001"),
    region = "tract:*",
    regionin = stateget
  )
  tracts <- rbind(tracts, temp)
}
head(tracts)
state county tract NAME
1 01 001 020100 Census Tract 201, Autauga County, Alabama
2 01 001 020500 Census Tract 205, Autauga County, Alabama
3 01 001 020300 Census Tract 203, Autauga County, Alabama
4 01 001 020400 Census Tract 204, Autauga County, Alabama
5 01 001 020200 Census Tract 202, Autauga County, Alabama
6 01 001 020600 Census Tract 206, Autauga County, Alabama
P001001 H010001
1 1912 1912
2 10766 10585
3 3373 3373
4 4386 4386
5 2170 1989
6 3668 3668
Notice that you had to specify the region, and since tracts are nested within states, you had to add regionin = stateget.
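If you prefer to avoid the explicit loop, the same region/regionin logic can be expressed with purrr. A minimal sketch (tracts2 is just an illustrative name):

library(purrr)

# one getCensus() call per state FIPS code, row-bound into a single data frame
map_dfr(fips, function(f) {
  getCensus(
    name = "dec/sf1",
    vintage = 2010,
    vars = c("NAME", "P001001", "H010001"),
    region = "tract:*",
    regionin = paste0("state:", f)
  )
}) -> tracts2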
{data360r} and {wbstats}
Two packages are available to access data gathered by the World Bank: {data360r} and {wbstats}. Install both packages, and then we can start with {data360r}, seeing what indicators, basic country-level information, and data-sets are available.
library(data360r)

# get all indicator metadata in Govdata360
get_metadata360(
  site = "gov",
  metadata_type = "indicators"
) -> df_indicators

# get all country metadata in TCdata360
get_metadata360(
  metadata_type = 'countries'
) -> df_countries

# get all dataset metadata in TCdata360
get_metadata360(
  metadata_type = 'datasets'
) -> df_datasets
Once you have identified a particular table, note its ID, and then pull it. Say I want indicator 90, "Can a married woman register a business in the same way as a married man?" I can do the same thing for more than one indicator by specifying the IDs.
get_data360(
  indicator_id = c(90)
) -> df_ind90

get_data360(
  indicator_id = c(28130, 28131)
) -> df_indtwo
If I want all data for a specific country, or just the indicators we pull into df_ind1 for a specific country, I could do:
get_data360(
  country_iso3 = "IND"
) -> df_allone

search_360(
  "woman business",
  search_type = "indicator",
  limit_results = 5
) -> df_ind1

get_data360(
  indicator_id = df_ind1$id,
  country_iso3 = "IND"
) -> df_allindtwo
Now an example with two measures – the legal age of marriage for boys and for girls. Note that the package allows you to specify the long format (preferred) rather than the default wide format the earlier results were returned in. Note also the use of timeframes = c(), which allows you to specify the specific time period you want the indicators for.
search_360("marriage", search_type = "indicator")
id
1: 204
2: 205
3: 206
4: 207
5: 43472
6: 40083
7: 40080
8: 40081
9: NA
name
1: What is the legal age of marriage for boys?
2: What is the legal age of marriage for girls?
3: Are there any exceptions to the legal age of marriage?
4: Does the law prohibit or invalidate child or early marriage?
5: Does the law grant spouses equal administrative authority over assets during marriage?
6: What is the minimum age of marriage with judicial authorization for girls?
7: What is the minimum age of marriage with parental consent for boys?
8: What is the minimum age of marriage with parental consent for girls?
9: Are there penalties in the law for authorizing or knowingly entering into the child or early marriage?'
slug type score dataset
1: age.marr.male indicator 0.11111111 Women, Business and the Law
2: age.marr.fem indicator 0.11111111 Women, Business and the Law
3: marry.age.except indicator 0.10000000 Women, Business and the Law
4: chld.marr indicator 0.10000000 Women, Business and the Law
5: hf8cada17 indicator 0.08333333 Women, Business and the Law
6: h5e7b11fe indicator 0.08333333 Women, Business and the Law
7: haab6520b indicator 0.08333333 Women, Business and the Law
8: hdecd0041 indicator 0.08333333 Women, Business and the Law
9: hb9d0a754 indicator 0.05882353 Women, Business and the Law
redirect
1: FALSE
2: FALSE
3: FALSE
4: FALSE
5: FALSE
6: FALSE
7: FALSE
8: FALSE
9: FALSE
get_data360(
  indicator_id = c(204, 205),
  timeframes = c(2016),
  output_type = 'long'
) -> df_marriage
ggplot(
  df_marriage, aes(
    x = Observation,
    group = Indicator,
    fill = Indicator
  )
) +
  geom_bar() +
  theme(legend.position = "none") +
  facet_wrap(~ Indicator, ncol = 1) +
  labs(
    x = "Legal Age of Marriage",
    y = "Frequency",
    title = "Legal Age of Marriage for Boys vs. Girls",
    subtitle = "(2016)",
    caption = "Source: World Bank Data"
  )
The {wbstats} package does pretty much the same thing. Let us see the core functionality by loading the library and then seeing what is available in terms of indicators, topics, and so on. The package ships with a snapshot of this information in wb_cachelist; we can refresh it via wb_cache() and store the result in new_cache. Doing so speeds up operations and ensures that you are getting the latest data.
library(wbstats)
str(wb_cachelist, max.level = 1)
List of 8
$ countries : tibble [297 × 18] (S3: tbl_df/tbl/data.frame)
$ indicators : tibble [16,643 × 8] (S3: tbl_df/tbl/data.frame)
$ sources : tibble [62 × 9] (S3: tbl_df/tbl/data.frame)
$ topics : tibble [21 × 3] (S3: tbl_df/tbl/data.frame)
$ regions : tibble [42 × 4] (S3: tbl_df/tbl/data.frame)
$ income_levels: tibble [7 × 3] (S3: tbl_df/tbl/data.frame)
$ lending_types: tibble [4 × 3] (S3: tbl_df/tbl/data.frame)
$ languages : tibble [23 × 3] (S3: tbl_df/tbl/data.frame)
wb_cache() -> new_cache
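To actually lean on the refreshed cache, you can pass it along to the query functions. A minimal sketch, assuming the cache argument works as described in ?wb_search (gdp_vars is just an illustrative name):

# search against the freshly downloaded cache rather than the built-in snapshot
wb_search(pattern = "gdp", cache = new_cache) -> gdp_vars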
What indicators are available on the topic of corruption?
wb_search(pattern = "corruption") -> corruption_vars
head(corruption_vars)
# A tibble: 6 × 3
indicator_id indicator indicator_desc
<chr> <chr> <chr>
1 CC.EST Control of Corruption: Estimate "Control of C…
2 CC.NO.SRC Control of Corruption: Number of So… "Control of C…
3 CC.PER.RNK Control of Corruption: Percentile R… "Control of C…
4 CC.PER.RNK.LOWER Control of Corruption: Percentile R… "Control of C…
5 CC.PER.RNK.UPPER Control of Corruption: Percentile R… "Control of C…
6 CC.STD.ERR Control of Corruption: Standard Err… "Control of C…
If I want information from a particular source, say from Bloomberg,
wb_search(
  pattern = "Bloomberg",
  fields = "source_org"
) -> blmbrg_vars
head(blmbrg_vars)
# A tibble: 2 × 3
indicator_id indicator indicator_desc
<chr> <chr> <chr>
1 GFDD.OM.02 Stock market return (%, year-on-year) Stock market ret…
2 GFDD.SM.01 Stock price volatility Stock price vola…
Searching for indicators tied to multiple subjects is easy as well:
wb_search(
  pattern = "poverty | unemployment | employment"
) -> povemply_vars
head(povemply_vars)
# A tibble: 6 × 3
indicator_id indicator indicator_desc
<chr> <chr> <chr>
1 1.0.HCount.1.90usd Poverty Headcount ($1.90 a day) The poverty h…
2 1.0.HCount.2.5usd Poverty Headcount ($2.50 a day) The poverty h…
3 1.0.HCount.Mid10to50 Middle Class ($10-50 a day) Hea… The poverty h…
4 1.0.HCount.Ofcl Official Moderate Poverty Rate-… The poverty h…
5 1.0.HCount.Poor4uds Poverty Headcount ($4 a day) The poverty h…
6 1.0.HCount.Vul4to10 Vulnerable ($4-10 a day) Headco… The poverty h…
Once we identify what we want, downloading the data is easy as well: we just need to specify the indicator(s), the start and end dates, and specific country codes if we want data for particular countries. Below I am pulling total population.
wb_data(
  indicator = "SP.POP.TOTL",
  start_date = 1960,
  end_date = 2016
) -> pop_data1

wb_data(
  country = c("ABW", "AF", "SSF", "ECA", "IND", "CHN"),
  indicator = "SP.POP.TOTL",
  start_date = 1960,
  end_date = 2016
) -> pop_data2

wb_data(
  country = c("ABW", "AF", "SSF", "ECA", "IND", "CHN"),
  indicator = c("SP.POP.TOTL", "NY.GDP.MKTP.CD"),
  start_date = 1960,
  end_date = 2016
) -> pop_data3
head(pop_data3)
# A tibble: 6 × 6
iso2c iso3c country date NY.GDP.MKTP.CD SP.POP.TOTL
<chr> <chr> <chr> <dbl> <dbl> <dbl>
1 AW ABW Aruba 1960 NA 54208
2 AW ABW Aruba 1961 NA 55434
3 AW ABW Aruba 1962 NA 56234
4 AW ABW Aruba 1963 NA 56699
5 AW ABW Aruba 1964 NA 57029
6 AW ABW Aruba 1965 NA 57357
wb_data(
  country = c("ABW", "AF", "SSF", "ECA", "IND", "CHN"),
  indicator = c("SP.POP.TOTL", "NY.GDP.MKTP.CD"),
  start_date = 1960,
  end_date = 2016,
  return_wide = TRUE
) -> pop_data4
head(pop_data4)
# A tibble: 6 × 6
iso2c iso3c country date NY.GDP.MKTP.CD SP.POP.TOTL
<chr> <chr> <chr> <dbl> <dbl> <dbl>
1 AW ABW Aruba 1960 NA 54208
2 AW ABW Aruba 1961 NA 55434
3 AW ABW Aruba 1962 NA 56234
4 AW ABW Aruba 1963 NA 56699
5 AW ABW Aruba 1964 NA 57029
6 AW ABW Aruba 1965 NA 57357
By default, wb_data() will return the data in long format, but not necessarily in a tidy format. If you want the data returned in a wide format, specify return_wide = TRUE and you will have tidy data.
If you will be working with dates, whether for plotting purposes or
otherwise, then activate the date_as_class_date = TRUE
switch. Otherwise you will have to manually format the date
variables.
wb_data(
  country = c("IND", "CHN"),
  indicator = c("SP.POP.TOTL"),
  start_date = 1960,
  end_date = 2016,
  return_wide = TRUE,
  date_as_class_date = TRUE
) -> pop_data5
head(pop_data5)
# A tibble: 6 × 10
iso2c iso3c country date SP.POP.TOTL unit obs_status footnote
<chr> <chr> <chr> <date> <dbl> <chr> <chr> <chr>
1 CN CHN China 1960-01-01 667070000 <NA> <NA> <NA>
2 CN CHN China 1961-01-01 660330000 <NA> <NA> <NA>
3 CN CHN China 1962-01-01 665770000 <NA> <NA> <NA>
4 CN CHN China 1963-01-01 682335000 <NA> <NA> <NA>
5 CN CHN China 1964-01-01 698355000 <NA> <NA> <NA>
6 CN CHN China 1965-01-01 715185000 <NA> <NA> <NA>
# … with 2 more variables: last_updated <date>, obs_resolution <chr>
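Had we skipped the date_as_class_date switch, the date column would come back as a plain numeric year (as in pop_data4 above), and we would have to build a Date ourselves before plotting. A minimal sketch of that manual fix (mydate is just an illustrative name):

# turn the numeric year into a proper Date (January 1 of each year)
pop_data4 %>%
  mutate(mydate = as.Date(paste0(date, "-01-01"))) -> pop_data4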
library(scales)

ggplot(
  pop_data5, aes(
    x = date,
    y = SP.POP.TOTL
  )
) +
  geom_line(
    aes(
      color = country
    )
  ) +
  scale_y_continuous(labels = comma) +
  scale_x_date(date_breaks = "10 years") +
  labs(
    x = "Date",
    y = "Total Population"
  ) +
  theme_minimal() +
  # theme() comes after theme_minimal() so the legend placement is not overridden
  theme(legend.position = "bottom")
If you want the most recent data, use the mrv switch; the number you supply indicates how many of the most recent values you would like to access. This will also return rows of data where there are no recent values. If you wish to avoid that, use mrnev instead.
wb_data(
  country = "all",
  indicator = c("SP.POP.TOTL"),
  mrv = 1,
  return_wide = TRUE,
  date_as_class_date = TRUE
) -> pop_data6
head(pop_data6)
# A tibble: 6 × 10
iso2c iso3c country date SP.POP.TOTL unit obs_status footnote
<chr> <chr> <chr> <date> <dbl> <chr> <chr> <chr>
1 AW ABW Aruba 2020-01-01 106766 <NA> <NA> <NA>
2 <NA> AFE Africa… 2020-01-01 677243299 <NA> <NA> <NA>
3 AF AFG Afghan… 2020-01-01 38928341 <NA> <NA> <NA>
4 <NA> AFW Africa… 2020-01-01 458803476 <NA> <NA> <NA>
5 AO AGO Angola 2020-01-01 32866268 <NA> <NA> <NA>
6 AL ALB Albania 2020-01-01 2837743 <NA> <NA> Extrapo…
# … with 2 more variables: last_updated <date>, obs_resolution <chr>
Eritrea has missing data. To avoid this we could have done the following (code not executed below) …
wb_data(
  country = "all",
  indicator = c("SP.POP.TOTL"),
  mrnev = 1,
  freq = "Y",
  return_wide = TRUE,
  date_as_class_date = TRUE
) -> pop_data7
head(pop_data7)
We could have also asked for data by countries only, administrative regions, and so on.
wb_data(
  country = "regions_only",
  indicator = c("SP.POP.TOTL"),
  mrv = 1
) -> regions

wb_data(
  country = "admin_regions_only",
  indicator = c("SP.POP.TOTL"),
  mrv = 1
) -> admin_regions

wb_data(
  country = "income_levels_only",
  indicator = c("SP.POP.TOTL"),
  mrv = 1
) -> income_levels

wb_data(
  country = "lending_types_only",
  indicator = c("SP.POP.TOTL"),
  mrv = 1
) -> lending_types
What if I want the data in Chinese?
wb_data(
  country = "lending_types_only",
  indicator = c("SP.POP.TOTL"),
  mrv = 1,
  lang = "zh"
) -> lending_types_chinese
head(lending_types_chinese)
# A tibble: 3 × 9
iso2c iso3c country date SP.POP.TOTL unit obs_status footnote
<chr> <chr> <chr> <dbl> <dbl> <chr> <chr> <chr>
1 XF IBD 只有IBRD 2020 4862388283 <NA> <NA> <NA>
2 XH IDB IDA混合 2020 574159138 <NA> <NA> <NA>
3 XI IDX 只有IDA 2020 1134444535 <NA> <NA> <NA>
# … with 1 more variable: last_updated <date>
{dataRetrieval}
The {dataRetrieval} package gives you easy access to water data gathered and warehoused by the USGS, USDA, EPA, and other entities. The package has an excellent tutorial available here, so I will not go into too many details and nuances. Start by installing the package and then loading it.
library(dataRetrieval)
You will need to know the site(s) you are interested in, as well as the parameters and statistics of interest. The package comes with a built-in data-set that shows you the parameters available, and the complete list of statistics is available here. Sites can be located from this inventory.
parameterCdFile -> parameterCdFile

names(parameterCdFile)
[1] "parameter_cd" "parameter_group_nm" "parameter_nm"
[4] "casrn" "srsname" "parameter_units"
If you are curious about a specific parameter, you can see what all is available for it. I’ll look for anything related to the keyword storm, and also what parameter units are available.
parameterCdFile[grep("storm",
                     parameterCdFile$parameter_nm,
                     ignore.case = TRUE), ] -> stormq

unique(stormq$parameter_units)
[1] "nu" "hours" "minutes" "Mgal" "ft3/s" "mgd"
[7] "in"
Let us do a quick grab of some data for the Hocking River at Athens, Ohio.
<- "03159500"
siteNo <- "00065"
pCode <- "2014-10-01"
start.date <- "2021-03-20"
end.date
readNWISuv(
siteNumbers = siteNo,
parameterCd = pCode,
startDate = start.date,
endDate = end.date
-> hocking )
Since the column names are based on parameter codes and hence cryptic, you can clean them up, and also see other attributes embedded in the data-set.
names(hocking)
[1] "agency_cd" "site_no" "dateTime"
[4] "X_00065_00000" "X_00065_00000_cd" "tz_cd"
renameNWISColumns(hocking) -> hocking
names(hocking)
[1] "agency_cd" "site_no" "dateTime" "GH_Inst" "GH_Inst_cd"
[6] "tz_cd"
names(attributes(hocking))
[1] "names" "row.names" "class" "url"
[5] "siteInfo" "variableInfo" "disclaimer" "statisticInfo"
[9] "queryTime"
If I wanted data for multiple sites, I could find the site numbers and then grab the data.
<- c("03158200", "03159246")
sites <- "00065"
pcode <- "2014-10-01"
start.date <- "2019-01-01"
end.date
readNWISuv(
siteNumbers = sites,
parameterCd = pcode,
startDate = start.date,
endDate = end.date
-> hocking2
)
renameNWISColumns(hocking2) -> hocking2
Now a simple time-series plot of gage height for both sites. Note that although I asked for data going far back, not all sites have all data for all time periods so it helps to check the site inventory first.
attr(hocking2, "variableInfo") -> parameterInfo

ifelse(
  hocking2$site_no == "03158200",
  "Monday Creek at Doanville",
  "Sunday Creek Below Millfield"
) -> hocking2$station

as.Date(
  as.character(hocking2$dateTime),
  format = "%Y-%m-%d"
) -> hocking2$mydates

ggplot(
  data = hocking2,
  aes(
    x = mydates,
    y = GH_Inst,
    color = station
  )
) +
  geom_line() +
  labs(
    x = "",
    y = parameterInfo$variableDescription
  ) +
  scale_x_date(date_breaks = "16 weeks") +
  theme_minimal() +
  theme(legend.position = "bottom")
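To check a site’s inventory before requesting data (as suggested above), {dataRetrieval} provides whatNWISdata(); a quick sketch (doanville_inventory is just an illustrative name):

# list what data the Monday Creek site actually offers
whatNWISdata(siteNumber = "03158200") -> doanville_inventory
head(doanville_inventory)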
1. There was no Summary File 3 generated for 2010.
For attribution, please cite this work as
Ruhil (2022, Feb. 16). Using APIs from the Census Bureau & other sources. Retrieved from https://aniruhil.org/courses/mpa6020/handouts/module06.html
BibTeX citation
@misc{ruhil2022using,
  author = {Ruhil, Ani},
  title = {Using APIs from the Census Bureau & other sources},
  url = {https://aniruhil.org/courses/mpa6020/handouts/module06.html},
  year = {2022}
}