This repository stores the scripts used to generate the analyses in "Landscape Analyses Using eDNA Metabarcoding and Earth Observation Predict Community Biodiversity in California" by Lin et al. in press at Ecological Applications.
The complete repository including raw data, maps and rasters and derived data can be found at zenodo archive https://doi.org/10.5281/zenodo.4516670.
The raw sequencing results are deposited in Sequence Reads Archive (Bioproject: PRJNA702201).
Contact: Meixi Lin (meixilin[at]ucla[d0t]edu)
Please adapt the scripts input and output directory accordingly to your working directories. Important derived data can be found in the zenodo archive as well.
-
Computational environment
- Most downstream analyses can be conducted on a PC/Mac.
- Some analyses (e.g. sequence processing and gradient forest analyses) were performed on UCLA Hoffman2 cluster program
- Some environmental layers were obtained using Google Earth Engine
-
Most analyses are performed using R. The required R packages are listed in install_packages.R
-
The utility script that records necessary variables and functions can be sourced at function_transect.R
.
├── asv_tables
│ ├── 1_Transect_data_decontamination.R
│ ├── 2_remove_5_dup_sites_1028.R
│ └── 3_import_taxonomy_detailed_10172019.R
├── metadata
│ ├── 1_reproject_raster.R
│ ├── 2_extract_raster.R
│ ├── 3_calculate_geographical_dist_0827.R
│ ├── 3_combine_old_meta.R
│ ├── 4_specify_biom_RData_0412.R
│ ├── 5_update_coastal_0531.R
│ ├── 6.1_correlation_test_metadata_1026.R
│ ├── 6.2_correlation_grouping_metadata_0523.R
│ ├── 7.1_density_distribution.R
│ ├── function_raster.R
│ ├── get_bioclim.sh
│ ├── get_earth_engine.js
│ ├── get_land_cover.js
│ ├── get_soilgrid_100m.sh
│ ├── get_uncertainty_earth_engine.js
│ ├── get_uncertainty_percent_earth_engine.js
│ ├── metadata_reextract_04302019.sh
│ ├── metadata_reproject_04032019.sh
│ ├── x.2_get_other_uncertainty_20200930.R
│ └── x_get_soilgrid_uncertainty_20200904.R
└── sample_map
├── ca_base_map.R
└── sample_map.R
- decontaminate asv tables and remove five replicate sites
- obtain and extract environmental layers
- final taxonomy tables and metadata
- environmental layers with Statewide coverage
.
├── 1_step1_read_newdeco_data_0412.R
├── 2_step1_rarefaction_1107.R
└── rarefaction_evaluation
├── 1_rarefaction_plot_0506.R
├── 2_evaluate_rarefaction_0529.R
└── 3_iNext_evaluation.R
-
create
phyloseq
object from final dataset -
perform rarefaction evaluation and select the appropriate rarefaction depth.
- decontaminated phyloseq objects
- rarefied phyloseq objects
├── derive_data
│ ├── phy_deco # phyloseq objects of each decontaminated metabarcode dataset
│ │ ├── phy_deco_16S.RData
│ │ ├── phy_deco_18S.RData
│ │ ├── phy_deco_CO1.RData
│ │ ├── phy_deco_FITS.RData
│ │ ├── phy_deco_PITS.RData
│ │ ├── phy_deco_all.RData
│ │ ├── phydeco.RData
│ │ └── phydeco_uc.RData
│ ├── phy_rare # phyloseq objects of each rarefied metabarcode dataset
│ │ ├── phy_1000_CO1.RData
│ │ ├── phy_1000_PITS.RData
│ │ ├── phy_2000_16S.RData
│ │ ├── phy_4000_18S.RData
│ │ ├── phy_4000_FITS.RData
│ │ └── phyrare.RData
│ ├── step1_create_phyloseq # evaluation output
│ │ ├── eval_rarefaction
│ │ └── physeq_rare_otu
.
├── concordance
│ ├── 1_summary_5_dup_sites_20200107.R
│ ├── 2_describe_5_dup_sites_20200107.R
│ ├── 2_describe_5_dup_sites_full_20200107.R
│ ├── 3_summary_plots.R
│ ├── 3_summary_plots_cutoff.R
│ ├── 4_pcoa_dup_sites_supp_fig6.R
│ └── describe_5_dup_sites_call.sh
├── gbif_tos
│ └── 1_check_gbif_tos_20200928.R
├── summary_stats
│ ├── 1_decontam_seqs_read_depths.R
│ ├── 2.1_metadata_distributions.R
│ ├── 2.2_raster_data_distribution.R
│ ├── 3_category_correlation.R
│ ├── 4_step1_physeq_basic_stats_0830.R
│ ├── 4_step1_physeq_basic_stats_rarefied.R
│ ├── 5_step1_physeq_plot_stats_06122019.R
│ ├── 6_step1_physeq_plot_by_sample_stats_09022019.R
│ ├── 7_README_step1_physeq_phylo_tree_plot.md
│ ├── 7_step1_physeq_phylo_tree_plot.R
│ └── 7_step1_physeq_phylo_tree_plot.sh
└── ucnrs
├── 1_mk_phyloseq_ucnrs_sites.R
├── 2.1_clean_ucnrs_records_format_typo.R
├── 2.2_clean_ucnrs_taxlevel_genus.R
├── 2.3_clean_ucnrs_taxlevel_fillin.R
├── 3.1_compare_ucnrs_subset_sites.R
├── 3.2_compare_ucnrs_intersect.R
├── 3.3_compare_ucnrs_overall.R
├── README_ucnrs.md
├── README_ucnrs.pdf
└── x.1_match_loc_ucnrs_name.R
- Perform evaluation of stability by comparing the concordance of the five replicated sites
- Generate summary statistics such as metadata distribution, read depth, read count per taxa, etc.
- Perform cleaning of ucnrs records and comparison of ucnrs records with eDNA
- Concordance analyses phyloseq objects and results
- Summary statistics: metadata distribution, sequencing read distribution and taxonomic coverage distribution
- UCNRS analyses: UCNRS cleaned species list and comparison results
.
├── 1.2_step2_alpha_diversity_summary_20200113.R
├── 1_step2_alpha_diversity_calc_0530.R
├── 2.1_step2_alpha_diversity_kruskal_20200113.R
├── 2.2_step2_alpha_diversity_kruskal_eval_20200113.R
├── 3.1_step2_alpha_diversity_individual_lm_20200125.R
├── 3.2_step2_alpha_diversity_indi_lm_plot_FITS_20200504.R
├── 4.1_step2_alpha_diversity_reduce_pls_20200125.R
├── 5.1_step2_alpha_diversity_plot_fig2.R
├── 5.2_step2_alpha_diversity_plot_supp_fig7_loc.R
├── 5.3_step2_alpha_diversity_plot_supp_fig8.R
└── 6.1_step2_alpha_diversity_map.R
- Calculate
observed
andShannon index
for rarefied dataset. - Perform Kruskal Walis testing on the categorical variables.
- Perform individual linear regression on the continuous variables.
- Perfrom partial least square analyses on the continuous variables.
- Alpha diversity values.
- Test results.
.
├── 0_util_get_var_color.R
├── 0_util_query_var_color.R
├── 1_step4_beta_diversity_cal_diss_60152019.R
├── 2.1_step4_beta_diversity_lcbd_06242019.R
├── 3.1_step4_beta_diversity_adonis_0804.R
├── 3.2_step4_beta_diversity_out_adonis_0804.R
├── 4.1_step4_beta_diversity_pcoa_generate_plotobject.R
├── 5.1_step4_beta_diversity_pcoa_envfit.R
├── 5.2_step4_beta_diversity_pcoa_envfit_ordisurf_plot.R
├── 6.1_step4_beta_diversity_cap_generate_plotobject.R
├── 6.2_step4_beta_diversity_out_cap_0805.R
├── manuscript_plotting
│ ├── m_step4_beta_diversity_combined_lcbd_fig3.R
│ ├── m_step4_beta_diversity_combined_ordinations_fig4.R
│ ├── m_step4_beta_diversity_combined_ordinations_resub_fig2.R
│ ├── m_step4_beta_diversity_combined_ordinations_supp_figs.R
│ └── m_step4_beta_diversity_pcoa_envfit_ordisurf_plot_fig6.R
├── no_coast
│ ├── x.1_step3_beta_diversity_cal_diss_nocoast_60152019.R
│ ├── x.2_step3_beta_diversity_nocoast_adonis_0804.R
│ ├── x.3_step3_beta_diversity_nocoast_out_adonis_0804.R
│ ├── x.4_step4_beta_diversity_nocoast_pcoa_plot_0805.R
│ └── x.5_step3_beta_diversity_pcoa_combine_0805.R
└── zoom_in
├── z.1_step4_beta_diversity_adonis_minor_in_major_09302019.R
└── z.2_step4_beta_diversity_pcoa_minor_in_major_09302019.R
- Calculate binary Jaccard dissimilarity
- Generate relative abundance plots
- Perform beta dispersion and PERMANOVA analysis
- Perform PCoA ordination and plotting
- Perform envfit on the PCoA ordination post hoc
- Perform additional beta dispersion and PERMANOVA analysis on samples 1) excluded coastal sites 2) according to minor habitats within major habitats
- Perform cap scale and varpart analyses to remove location effects
- Jaccard dissimilarity measures
- Analyses output
Author: Ariel Levi Simons. His github repository for this section
.
├── CALeDNAZetaFactors.R
├── LICENSE
├── PresenceAbsence.R
├── README.md
├── Zeta4ClassMap.R
├── Zeta4ClassMapCluster.R
├── Zeta4FamilyMap.R
├── Zeta4FamilyMapCluster.R
├── Zeta4eDNA.R
├── ZetaClassCluster.R
├── ZetaFamilyCluster.R
├── ZetaMap.R
├── ZetaeDNA.R
├── step5_glom_to_family.R
├── step5_subset_phyloseq.R
└── zeta.sh
- Calculate
$\zeta_4$ diversity at family level - Perform variable importance testing
.
├── 1_step6_gradient_forest_final
│ ├── 1_step6_gradient_forest_final_all_06060219.R
│ ├── 1_step6_gradient_forest_final_all_qsub_06060219.sh
│ ├── 1_step6_gradient_forest_final_coast_10182019.R
│ ├── 1_step6_gradient_forest_final_coast_qsub_10182019.sh
│ ├── 1_step6_gradient_forest_final_nocoast_10182019.R
│ ├── 1_step6_gradient_forest_final_nocoast_qsub_10182019.sh
│ ├── 2_step4_gradient_forest_final_run_stability_10152019.R
│ └── 2_step4_gradient_forest_final_run_stability_10152019.sh
├── 2_step6_gf_permutate
│ ├── 3.2_plot_permutation_result.R
│ ├── 3_step4_gradient_forest_permutate_06060219.R
│ ├── 3_step4_gradient_forest_permutate_qsub_06060219.sh
│ └── 3_step4_gradient_forest_permutate_qsub_wrapper_06060219.sh
├── 3_step6_gf_plot_object
│ ├── 4.1_step6_gradient_forest_summary_stability.R
│ ├── 4.1_step6_gradient_forest_summary_stability_output.R
│ ├── 4.2_plot_permutation_result.R
│ ├── 4.3_validate_reads_10172019.R
│ ├── 4.4_check_occurrences_predictability.R
│ ├── 4_step6_gradient_forest_plotting_for_publication.R
│ ├── 4_step6_gradient_forest_plotting_function_local.R
│ ├── 4_step6_gradient_forest_species_response_10162019.R
│ ├── other_save_gf_x_y_manuscript.R
│ └── species_response_legend_key.R
└── functions
├── change_density_plot.R
├── change_performance.plot.R
├── change_species_cumulative_plot.R
└── function_gfprep.R
- Perform gradient forest data preparation
- Perform gradient forest analyses using classification tree setting
- Perform gradient forest validation
- Plot gradient forest output
- gradient forest input table
- gradient forest result
.
├── 0.1_prepare_latlong_raster.R
├── 0.2_prepare_raster.R
├── 0.2_prepare_raster_qsub.sh
├── 0.3_prepare_raster_splits.R
├── 0.3_prepare_raster_splits_qsub.sh
├── 0.4_get_random_points.sh
├── all_100m
│ ├── 1_step7_make_prediction_all_20201014.R
│ ├── 1_step7_make_prediction_extrap_qsub.sh
│ ├── 1_step7_make_prediction_noextrap_qsub.sh
│ ├── 2_step7_combine_prediction_20201014.R
│ ├── 2_step7_combine_prediction_extrap_qsub.sh
│ ├── 2_step7_combine_prediction_noextrap_qsub.sh
│ ├── 3_step7_prediction_pc_20201014.R
│ ├── 3_step7_prediction_pc_extrap_qsub.sh
│ ├── 3_step7_prediction_pc_noextrap_qsub.sh
│ ├── 4.1_step7_ref_scale_grid_20201014.R
│ ├── 4.1_step7_ref_scale_grid_qsub.sh
│ ├── 4.2_step7_ref_scale_grid_plot_20201014.R
│ ├── 4.2_step7_ref_scale_grid_plot_qsub.sh
│ ├── 4.3_step7_store_ref_scale_grid_20201014.R
│ ├── 4.3_step7_store_ref_scale_grid_qsub.sh
│ ├── 5_step7_store_prediction_20201014.R
│ ├── 5_step7_store_prediction_extrap_qsub.sh
│ ├── 5_step7_store_prediction_noextrap_qsub.sh
│ ├── record_all_100m_predictions.sh
│ ├── util_compare_previous.R
│ ├── util_compare_previous_qsub.sh
│ └── util_get_md5sum.sh
├── manuscript_plotting
│ ├── 1_fig9_nobio_prediction_pc_plots.R
│ ├── 1_fig9_noextrap_prediction_pc_plots.R
│ └── record_manuscript_plotting_20201025.sh
└── sites_rand_points
├── 1_step7_make_prediction_gf_sites.R
├── 2_step7_make_prediction_random_sites.R
├── 3_step7_ref_scale_sites_random_sites_20200421.R
├── 3_step7_ref_scale_sites_random_sites_20200421_qsub.sh
├── 4_evaluate_prediction.R
└── 5_evaluate_prediction_uncertainty.R
- Perform predictions based on the gradient forest output
- Stacked layers of 33 environmental variables in California at 100 m x 100 m resolution.
- Community turnover map predicted from the gradient forest model.