Last updated: 2021-12-01

Checks: 7 passed, 0 failed

Knit directory: ebird_light_pollution/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(20211122) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version f639050. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rproj.user/

Untracked files:
    Untracked:  data/.here

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/6_predicting_density_and_light_pollution.Rmd) and HTML (docs/6_predicting_density_and_light_pollution.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
html f639050 markravinet 2021-12-01 Build site.
Rmd 1823e59 markravinet 2021-12-01 fix title
html ab07fa7 markravinet 2021-12-01 Build site.
Rmd 5d285b4 markravinet 2021-12-01 tutorial 6

Introduction

In the last tutorial, we extracted the light pollution data for the points of occurrence of our bird species. This time, we will read that data back into R and then predict the relationship between the two. There are some similarities here to what we did when we drew a density map, so it may be worth looking over that tutorial if needed.

Step 1: Installing packages and getting the data

We are once again only using packages we have already worked with, so we just need to load them like so:

library(auk)
library(tidyverse)
library(sf)
library(rnaturalearth)
library(rnaturalearthdata) # necessary to draw a detailed map
library(rgeos)
library(raster)
library(rgdal)
library(viridis)

Next we need to read in the species and light pollution data we created in the last tutorial. We can set the filepath variable first. We will also set the outfile variable, which we will use to write out our results later.

filepath <- "./house_sparrow_light_data_2017.csv"
outfile <- "./house_2017_light_occurrence_prediction.csv"

Then we read in the data like so:

my_data <- read_csv(filepath)

We are now ready to filter it for our analysis. We will do this in the next section.

Step 2: Filtering the data

As with tutorial 4, we need to extract the data for our downstream analysis. Do take a moment to look at tutorial 4 for a refresher on what this R code is actually doing.

# this is the zero-filled checklist bird data with the light data
my_data_sf <- my_data %>%
  # this extracts coordinates
  st_as_sf(coords = c("longitude", "latitude"), crs = 4326) %>%
  # this projects the presence absence data across an equal area
  st_transform(crs ="+proj=laea +lat_0=44 +lon_0=-71.70") %>% 
  # this bit gets the species information
  inner_join(ebird_taxonomy %>% 
               dplyr::select(species_code, scientific_name, common_name),
             by = "scientific_name") %>% 
  # finally we extract the data we want 
  dplyr::select(species_code, common_name, scientific_name, species_observed, light)

Once the data is filtered, we have a table with our species code and names, whether or not the species was observed at each location, the light pollution value, and finally the geometry - i.e. the latitude and longitude for each row of the data frame (i.e. each observation). We’re now ready for our spatial analysis.

Step 3: Performing a spatial analysis

First, we need to take our data and convert it to a raster object. This is really simple to do:

# then we convert to a raster file
r_template <- raster(as(my_data_sf, "Spatial"))

This gives us presence and absence, plus light pollution values for each pixel. We can then estimate the ‘mean’ presence/absence (somewhat meaningless on its own - we will come to this in the next tutorial) and the mean light pollution values for our given spatial scale. As before, we will set the scale to 5000 metres, or 5km.

# we can now set the resolution we want to visualise density at - we will go for 5km grid cells
res(r_template) <- 5000

Next we perform rasterization in order to calculate the mean values of light and presence absence across these grid squares.

# next we calculate the frequency of occurrence AND the mean light data
my_raster <- rasterize(my_data_sf, r_template, field = c("light", "species_observed"), fun = mean)
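To see what rasterize with fun = mean is doing conceptually, here is a minimal base-R sketch: each point is binned into a grid cell and the field values are averaged per cell. The coordinates and light values below are invented for illustration only.

```r
# toy points (hypothetical values, not from the real dataset)
x <- c(1000, 2000, 6000, 7000)   # x coordinates in metres
y <- c(1000, 2000, 6000, 7000)   # y coordinates in metres
light <- c(10, 20, 30, 50)       # light pollution value at each point

# bin points into 5000 m grid cells, mimicking res(r_template) <- 5000
cell_id <- paste(floor(x / 5000), floor(y / 5000))

# mean light per occupied grid cell - this is what fun = mean computes
cell_means <- tapply(light, cell_id, mean)
cell_means
```

Here the first two points fall in one cell (mean light 15) and the last two in another (mean light 40), just as rasterize averages each field within every 5km grid square.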

With this done we are ready to take the next steps for looking at the data. To do this, we actually have to convert back from a raster to point data - i.e. as a data frame or tibble.

# finally we convert the raster back to points
my_predict <- rasterToPoints(my_raster) %>% as_tibble()
# this is the predicted frequency of occurrence and mean light data

Take a moment to look at this. What we have here is the predicted probability of occurrence in a 5km grid square (i.e. species_observed) and the mean light pollution values, alongside their x (longitude) and y (latitude) coordinates. It is this we can use to visualise our relationships and get an estimate of the level of association between species and light.
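To make the expected shape concrete, here is a toy illustration of the kind of table my_predict contains - one row per occupied grid cell, with the cell coordinates plus the two averaged fields. The values below are invented, not taken from the real dataset.

```r
# hypothetical example of the my_predict structure (invented values)
my_predict_example <- data.frame(
  x = c(0, 5000),                   # grid cell x coordinate (metres)
  y = c(0, 5000),                   # grid cell y coordinate (metres)
  light = c(12.5, 40.2),            # mean light pollution in the cell
  species_observed = c(0.25, 0.60)  # frequency of occurrence in the cell
)
my_predict_example
```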

Step 4: Finishing up

We can take a look at the relationship between light pollution and whether a species is observed with a simple scatter plot. We do this using ggplot2 like so.

# this means we can visualise the relationship between prob of occurrence and light pollution
my_predict %>% ggplot(aes(light, species_observed)) + geom_point()
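One quick way to put a rough number on the association is a simple correlation. With the real data this would be cor(my_predict$light, my_predict$species_observed); the sketch below uses invented per-cell values so it can stand alone.

```r
# hypothetical per-cell values (the real call would use the columns
# of my_predict instead)
light_vals <- c(5, 10, 20, 40, 60)        # mean light per cell
occ_vals   <- c(0.1, 0.2, 0.4, 0.6, 0.7)  # frequency of occurrence per cell

# Pearson correlation between light pollution and occurrence frequency
cor(light_vals, occ_vals)
```

Note that a single correlation coefficient assumes a roughly linear relationship, so treat it as a first look rather than a formal analysis.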

However, this isn’t really all that clear - perhaps there is a better way to do this? We shall investigate that in the final tutorial. For now, we will write out the data and save it for later.

# before we go any further - write out the data to save it!
write_csv(my_predict, outfile)

sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] viridis_0.6.2           viridisLite_0.4.0       rgdal_1.5-27           
 [4] raster_3.5-2            rgeos_0.5-8             sp_1.4-5               
 [7] rnaturalearthdata_0.1.0 rnaturalearth_0.1.0     sf_1.0-3               
[10] forcats_0.5.1           stringr_1.4.0           dplyr_1.0.7            
[13] purrr_0.3.4             readr_2.0.2             tidyr_1.1.4            
[16] tibble_3.1.5            ggplot2_3.3.5           tidyverse_1.3.1        
[19] auk_0.5.1               workflowr_1.6.2        

loaded via a namespace (and not attached):
 [1] httr_1.4.2         jsonlite_1.7.2     modelr_0.1.8       assertthat_0.2.1  
 [5] cellranger_1.1.0   yaml_2.2.1         pillar_1.6.4       backports_1.3.0   
 [9] lattice_0.20-45    glue_1.5.0         digest_0.6.28      promises_1.2.0.1  
[13] rvest_1.0.2        colorspace_2.0-2   htmltools_0.5.2    httpuv_1.6.3      
[17] pkgconfig_2.0.3    broom_0.7.10       haven_2.4.3        scales_1.1.1      
[21] whisker_0.4        terra_1.4-11       later_1.3.0        tzdb_0.2.0        
[25] git2r_0.28.0       proxy_0.4-26       generics_0.1.1     ellipsis_0.3.2    
[29] withr_2.4.2        cli_3.1.0          magrittr_2.0.1     crayon_1.4.2      
[33] readxl_1.3.1       evaluate_0.14      fs_1.5.0           fansi_0.5.0       
[37] xml2_1.3.2         class_7.3-19       tools_4.1.2        hms_1.1.1         
[41] lifecycle_1.0.1    munsell_0.5.0      reprex_2.0.1       compiler_4.1.2    
[45] jquerylib_0.1.4    e1071_1.7-9        rlang_0.4.12       classInt_0.4-3    
[49] units_0.7-2        grid_4.1.2         rstudioapi_0.13    rmarkdown_2.11    
[53] codetools_0.2-18   gtable_0.3.0       DBI_1.1.1          R6_2.5.1          
[57] gridExtra_2.3      lubridate_1.8.0    knitr_1.36         fastmap_1.1.0     
[61] utf8_1.2.2         rprojroot_2.0.2    KernSmooth_2.23-20 stringi_1.7.5     
[65] Rcpp_1.0.7         vctrs_0.3.8        dbplyr_2.1.1       tidyselect_1.1.1  
[69] xfun_0.28