diff --git a/.DS_Store b/.DS_Store new file mode 100644 index 0000000..aa27799 Binary files /dev/null and b/.DS_Store differ diff --git a/config.yaml b/config.yaml index 37ac1ee..e832b95 100644 --- a/config.yaml +++ b/config.yaml @@ -59,9 +59,8 @@ contact: 'Daria Orlowska at daria.orlowska@wmich.edu' # Order of episodes in your lesson episodes: -- introduction.md - dmp.md -- data-resources.md +- dmp-resources.md - supporting-researchers.md - prioritizing-services.md diff --git a/episodes/.DS_Store b/episodes/.DS_Store new file mode 100644 index 0000000..24fd1fc Binary files /dev/null and b/episodes/.DS_Store differ diff --git a/episodes/data-resources.md b/episodes/data-resources.md deleted file mode 100644 index 8d79d32..0000000 --- a/episodes/data-resources.md +++ /dev/null @@ -1,101 +0,0 @@ ---- -title: 'data-resources' -teaching: 10 -exercises: 2 ---- - -:::::::::::::::::::::::::::::::::::::: questions - -- How do you write a lesson using R Markdown and `{sandpaper}`? - -:::::::::::::::::::::::::::::::::::::::::::::::: - -::::::::::::::::::::::::::::::::::::: objectives - -- Explain how to use markdown with the new lesson template -- Demonstrate how to include pieces of code, figures, and nested challenge blocks - -:::::::::::::::::::::::::::::::::::::::::::::::: - -## Introduction - -This is a lesson created via The Carpentries Workbench. It is written in -[Pandoc-flavored Markdown][pandoc] for static files (with extension `.md`) and -[R Markdown][r-markdown] for dynamic files that can render code into output -(with extension `.Rmd`). Please refer to the [Introduction to The Carpentries -Workbench][carpentries-workbench] for full documentation. - -What you need to know is that there are three sections required for a valid -Carpentries lesson template: - - 1. `questions` are displayed at the beginning of the episode to prime the - learner for the content. - 2. `objectives` are the learning objectives for an episode displayed with - the questions. - 3. `keypoints` are displayed at the end of the episode to reinforce the - objectives. - -:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: instructor - -Inline instructor notes can help inform instructors of timing challenges -associated with the lessons. They appear in the "Instructor View" - -:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: - -::::::::::::::::::::::::::::::::::::: challenge - -## Challenge 1: Can you do it? - -What is the output of this command? - -```r -paste("This", "new", "lesson", "looks", "good") -``` - -:::::::::::::::::::::::: solution - -## Output - -```output -[1] "This new lesson looks good" -``` - -::::::::::::::::::::::::::::::::: - - -## Challenge 2: how do you nest solutions within challenge blocks? - -:::::::::::::::::::::::: solution - -You can add a line with at least three colons and a `solution` tag. 
- -::::::::::::::::::::::::::::::::: -:::::::::::::::::::::::::::::::::::::::::::::::: - -## Figures - -You can use pandoc markdown for static figures with the following syntax: - -`![optional caption that appears below the figure](figure url){alt='alt text for -accessibility purposes'}` - -![You belong in The Carpentries!](https://raw.githubusercontent.com/carpentries/logo/master/Badge_Carpentries.svg){alt='Blue Carpentries hex person logo with no text.'} - -## Math - -One of our episodes contains $\LaTeX$ equations when describing how to create -dynamic reports with {knitr}, so we now use mathjax to describe this: - -`$\alpha = \dfrac{1}{(1 - \beta)^2}$` becomes: $\alpha = \dfrac{1}{(1 - \beta)^2}$ - -Cool, right? - -::::::::::::::::::::::::::::::::::::: keypoints - -- Use `.md` files for episodes when you want static content -- Use `.Rmd` files for episodes when you need to generate output -- Run `sandpaper::check_lesson()` to identify any issues with your lesson -- Run `sandpaper::build_lesson()` to preview your lesson locally - -:::::::::::::::::::::::::::::::::::::::::::::::: - diff --git a/episodes/dmp-resources.md b/episodes/dmp-resources.md new file mode 100644 index 0000000..aa68a68 --- /dev/null +++ b/episodes/dmp-resources.md @@ -0,0 +1,184 @@ +--- +title: 'DMP Resources' +teaching: 10 +exercises: 3 +--- + +::: questions +- Where can one find Funder requirements? +- Example DMPs? +- Appropriate data repositories and data standards? +::: + +::: objectives +1. find funder requirements for a DMP +2. Successfully search for example DMPs +3. Find FAIR data repositories appropriate for a patron's research project +4. Match a data type with appropriate data standards +::: + +## Introduction +As librarians, we use a variety of resources to answer researchers’ questions, such as library databases like ERIC or PsycInfo, or reference sources like Credo. When answering data management plan questions, you will use a new set of resources. In this lesson, we will introduce you to places to find data management planning information to answer common researcher questions. + +## Funder Requirements +Funders are increasingly including DMPs in their requirements for grant applications. Due to the 2022 Ensuring Free, Immediate, and Equitable Access to Federally Funded Research memo, colloquially known as the “[Nelson Memo](https://www.whitehouse.gov/wp-content/uploads/2022/08/08-2022-OSTP-Public-access-Memo.pdf)”, all federal granting agencies in the US are required to establish data sharing policies. When assisting a researcher writing a DMP for a grant application, the first step is to get a handle on the funder’s requirements for the plan. + +Here are some places to check for funder requirements: + +1. **The funding announcement.** Most grant programs create an announcement, which may be called by any of a number of acronyms such as a “CFP” (call for proposals) or “NOFO” (Notice of funding opportunity) to publicize their funding opportunity. After navigating to the funding announcement, you can scan through the associated links to look for information on their data management plan requirements. Below, you can see the information provided in a National Institutes of Health Notice of Funding Opportunity:\ +![](fig/Copy of NIH NOFO DMP info.PNG){alt="NIH NOFO DMP info.png"}\ +In case the funding announcement does not have the information you need, proceed to the other items on this list. + + +1. 
**Funder application instructions or website.** Large funders will have a website set up to help researchers through the application process. Looking through the documentation can help you understand their requirements for data management plans. This example from the NIH application instructions redirects you to sharing.nih.gov, their website specifically for data sharing: ![](fig/Copy of NIH application instructions.PNG){alt="NIH application instructions.PNG"}
+
+1. [The SPARC directory of data sharing requirements](https://datasharing.sparcopen.org/) **of federal agencies.** SPARC stands for the Scholarly Publishing and Academic Resources Coalition, and is a non-profit that supports open research and education systems. Through this website, you can view and compare data sharing policies from top funding organizations.
+
+1. **Google search.** Often googling *"[funder name]" "data sharing requirements" OR "data management plan" OR "dmp"* will direct you to the appropriate documentation.
+
+1. **Contact the funder.** If you are still not sure about what guidance to follow, consider reaching out to the research office contact in the funding announcement.
+
+## Example DMPs
+
+After locating the funder requirements, researchers may find it useful to see an example DMP written by others as part of their application to the same program or funding agency.
+
+Here are some places to check when looking for example DMPs:
+
+1. **University libguides.** Many research university libraries have created libguides that provide researchers with guidance on writing DMPs. Some library resources include boilerplate templates for specific funding agencies, or successfully funded researcher DMPs. To find university libguides, you can either search through the [LibGuides Community](https://community.libguides.com/) or search Google for DMP AND [funding agency] AND libguide.
+
+
+1. **DMPTool.** The [DMPTool](https://dmptool.org/) is a free online application that helps guide researchers through writing plans, and even allows librarians to provide feedback along the way. The website [maintains templates](https://dmptool.org/public_templates) based on funder and program requirements, in addition to hosting public DMPs created by researchers using the DMPTool. In the [public DMPs](https://dmptool.org/public_plans) list, you can narrow your search by funding agency to find relevant examples. We will explore the DMPTool more in Episode 4.
+
+
+1. **Funder example DMPs.** Some funding agencies, [like the NIH](https://sharing.nih.gov/data-management-and-sharing-policy/planning-and-budgeting-for-data-management-and-sharing/writing-a-data-management-and-sharing-plan#sample-plans), have created example DMPs to give researchers an idea of what they would like to see as part of their grant applications.
+
+
+1. **Granting agency DMP repositories.** Some granting agencies share data management plans submitted as part of grant applications through a public repository. An example of this is the Department of Transportation's (DOT) [Repository & Open Science Access Portal (ROSA P)](https://rosap.ntl.bts.gov/gsearch?collection=&terms=DMP+data+management+plan).
+
+
+1. **Example DMS plans website.** The Working Group on NIH DMSP Guidance created the [Example DMS Plans website](https://example-dms-plans.github.io/examples/), which aggregated stable versions of publicly available DMPs. It was compiled ahead of the rollout of the NIH DMS Policy.
+
+
+## FAIR Data Repositories
+
+In data management, we often speak of making data “FAIR.” [FAIR](https://www.go-fair.org/fair-principles/) is an acronym for Findable, Accessible, Interoperable and Reusable. Using these principles, we can evaluate the “openness” of a data set. Repositories should promote these characteristics in order to make data housed there user-friendly.
+
+According to the [NNLM’s data glossary](https://www.nnlm.gov/guides/data-glossary/repository), “a repository is a tool to share, preserve, and discover research outputs, including but not limited to data or datasets.” Generally, a data repository is a website that houses a collection of datasets, making them available to a broad(er) audience. Repositories manage data sharing infrastructure and provide a stable location for researchers to share their work.
+
+Why choose a data repository? For most researchers, a data repository is the best practice for data sharing. Using a personal or lab website, or even uploading the dataset as journal article supplementary files, is not advisable because the long-term sustainability of the website is unknown: it can be updated or taken down without warning. In addition, simply adding “available upon request” to the data availability statement in a publication is not sufficient. [Studies have demonstrated](https://www.sciencedirect.com/science/article/abs/pii/S089543562200141X) a lack of author compliance when data is actually requested.
+
+::: callout
+In addition to lacking sustainability, personal websites, article supplementary files and availability upon request are typically difficult to find. Data repositories mint DOIs (digital object identifiers), providing persistent access to data even if it is updated or moved, and will provide reasoning for removal if the dataset needs to be taken down. Data repositories also provide structures that link README files or other metadata schemas to the dataset, allowing for greater reusability in the future.
+:::
+
+Through repositories, several options are available to researchers for data sharing:
+
+- **Public access.** Public datasets are available to all without restriction. This is commonly used for animal studies or data without privacy concerns.
+- **Controlled access.** In a controlled access repository, researchers must verify their identity before they are allowed to download and analyze data. This can take the form of verifying a university-associated email address, signing a data use agreement, or sending in an application before access is granted. Some repositories, such as Vivli, which specializes in clinical trial data, require that sensitive data be analyzed in a controlled cloud computing environment. Others, like ICPSR, may require that their restricted datasets be accessed on-site, using a computer not connected to the internet.
+- **Embargoes.** Most repositories allow for datasets to be embargoed. Datasets may be embargoed for a number of reasons. For example, the researchers may not wish to publish their data until the accompanying article is available, or they may be pursuing a patent based on their discoveries.
+
+Here are the types of data repositories that researchers can use for sharing data:
+
+- **Specialist data repository.** Specialist data repositories accept scholarship from certain disciplines or on a specific topic. These include [ICPSR](https://www.icpsr.umich.edu/), the Inter-university Consortium for Political and Social Research.
+- **Generalist data repository.** Generalist data repositories accept any scholarship from any discipline. These include [Figshare](https://figshare.com/), [Zenodo](https://zenodo.org/), [Mendeley](https://data.mendeley.com/), and [OSF](https://osf.io/). Some repositories grouped with the generalists have a narrower focus: [Vivli](https://vivli.org/) accepts only clinical data, and [Dryad](https://datadryad.org/) primarily accepts data from the sciences.
+- **Institutional data repository.** Some institutions host their own data repository to encourage their researchers to deposit data. These are typically limited to affiliates of the host institution, though you should check if your researcher’s collaborators are affiliated with an institution hosting a data repository. Other institutional repositories, like the [Harvard Dataverse](https://dataverse.harvard.edu/), allow any researcher to deposit their datasets regardless of affiliation.
+
+::: callout
+Institutional repositories vary in their ability to accept and maintain data. Before committing to using an institutional repository, check that they routinely accept data.
+:::
+
+![](fig/Copy of Repository choice flow chart.png){alt='Flow chart for choosing a data repository'}
+
+Recommending a data repository for inclusion in a DMP can be challenging. Generally, it is best to recommend a specialist repository, followed by an institutional or generalist repository. Here are some tools to help you find the right data repository for your researcher:
+
+- **Repository indices.** Locating the right data repository for your researcher among the thousands in existence may be challenging, especially if you are not familiar with where others in the discipline are depositing their datasets. Luckily, there are a number of repository indices that aggregate data repositories and provide filters to facilitate pinpointing the one that works best for your researcher. [FAIRsharing](https://fairsharing.org/) and [re3data](https://www.re3data.org/) are good starting points when you are not aware if a specialist repository exists for a given discipline.
+- **Funder recommendations.** Some funders, like the [AHA](https://professional.heart.org/en/research-programs/awardee-policies/aha-approved-data-repositories) or the [NIH provide recommendations](https://sharing.nih.gov/data-management-and-sharing-policy/sharing-scientific-data/repositories-for-sharing-scientific-data) for where their funded projects should share their data after the active research phase has concluded. The [NNLM Data Repository Finder](https://www.nnlm.gov/finder) provides more guidance for finding an NIH-supported repository.
+- **Publisher recommendations.** For researchers without funding or whose funders provided no guidance on which data repositories to use, some publishers like [Nature have requirements](https://www.nature.com/sdata/policies/repositories) for where researchers should deposit their data before their articles are published.
+
+::: callout
+Publishers requiring data deposit before article publication will also require a data availability statement, a description of where the dataset is publicly available, located at the bottom of the article.
+:::
+
+## Metadata and data standards
+
+According to the [NNLM data glossary](https://www.nnlm.gov/guides/data-glossary/data-standards), a data standard is “a type of standard, which is an agreed upon approach to allow for consistent measurement, qualification or exchange of an object, process, or unit of information. [...] 
Data standards refer to methods of organizing, documenting, and formatting data in order to aid in data aggregation, sharing and reuse.”
+
+Data standards help to promote the FAIR principles. By using a data standard when creating and describing their data, researchers make their data easier to discover and reuse. For instance, if you use a standardized survey instrument when collecting data, your data can be easily compared and combined with the results of other researchers using the same instrument.
+
+Some of the most common data management questions data librarians receive revolve around standards. Many researchers, even if they support the principles of open science, are not trained to find and utilize data standards. Often, they are not thinking about their process from the point of view of data reuse. For librarians, data standards present an opportunity for us to educate researchers.
+
+There are many types of data standards, including:
+
+- **File type.** When curating a dataset to share, researchers should convert their data to an [open file format](https://opendatahandbook.org/guide/en/appendices/file-formats/). For instance, spreadsheets should be made available as a CSV rather than an Excel document (XLSX). Using standardized open file types is a data standard (a short conversion sketch appears later in this section).
+- **Controlled vocabularies/ontologies.** A controlled vocabulary ensures data standardization by limiting the number of terms that can be used in a given field. Librarians often use controlled vocabularies when cataloging, for example [MeSH](https://www.ncbi.nlm.nih.gov/mesh/) for medical subject terms, or the [Getty AAT](https://www.getty.edu/research/tools/vocabularies/aat/index.html) for art terms. Researchers can also use controlled vocabularies in their work to ensure interoperability across studies.
+- **Minimum information.** Minimum information standards, such as [MINSEQE](https://zenodo.org/record/5706412), specify the minimum amount of metadata and data required for different data types. This helps to facilitate reuse and prevent mystery datasets without documentation from coming into a repository.
+- **Metadata schema.** A metadata schema defines the elements of metadata for an object and how those elements can be used to describe a specific resource. Many librarians are familiar with metadata schemas such as [MARC](https://www.loc.gov/marc/) or [Dublin Core](https://www.dublincore.org/), but there are also specialized metadata schemas for particular research fields.
+
+Finding appropriate data standards can be tricky for both librarians and researchers. The data standard landscape is still evolving, and the availability of data standards varies widely by field. There may be no widely accepted standard for a researcher’s project. If there is a lack of appropriate data standards, this information should be included in the DMP.
+
+One strategy when answering reference questions about data standards is to work backwards. If you or the researcher has already picked out a data repository, look in the documentation of that repository to find what data standards they are using. For example, the [NIMH National Data Archive](https://nda.nih.gov/) has a page describing their data standards.
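+
+To make the file-type point above concrete, here is a minimal sketch of converting a spreadsheet to an open format. This is an illustration rather than part of any funder's guidance; it assumes R with the `readxl` and `readr` packages installed, and the file name is hypothetical.
+
+```r
+# Read a sheet from a proprietary Excel workbook (hypothetical file name)
+library(readxl)
+library(readr)
+
+responses <- read_excel("survey_responses.xlsx", sheet = 1)
+
+# Write the same table out as CSV, an open format most software can read
+write_csv(responses, "survey_responses.csv")
+```
+
+The resulting CSV copy can then be deposited in a repository alongside (or instead of) the original spreadsheet.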
+
+If working backwards from a repository isn't possible, here are some resources to find data standards:
+
+- Research Data Alliance [Metadata Standards Catalog](https://rdamsc.bath.ac.uk/)
+- [FAIRsharing](https://fairsharing.org/) has a registry of standards
+- [BioPortal](https://bioportal.bioontology.org/) catalogs ontologies, with a biomedical focus
+
+## Quiz
+
+As part of an intervention study, a researcher will be conducting surveys of adolescents in the juvenile justice system who have exhibited suicidal behavior. In addition to surveys, there will be in-depth interviews with some of the research subjects. The results will eventually be published in a peer-reviewed journal. Accordingly, the datasets will be shared and preserved. The researcher has learned that their university has an institutional repository that has not typically collected scientific data.
+
+
+::: challenge
+
+## Which of the data types mentioned would be applicable to the data management plan?
+
+a. The interviews (qualitative)
+b. The survey data (quantitative)
+c. Both
+
+
+
+::: solution
+c. Both
+:::
+
+## What aspects of managing and sharing this data should this researcher consider as they prepare their data management plan?
+
+a. The choice of repository.
+b. Usage of standard questionnaires for the survey.
+c. Facilitating the potential reuse of the data.
+d. All of the above
+
+::: solution
+d. All of the above
+:::
+
+## Which repository is most appropriate for this researcher's dataset?
+
+a. [ICPSR](https://www.icpsr.umich.edu/web/pages/)
+b. [Dryad](https://datadryad.org/stash)
+
+::: solution
+ICPSR
+
+The example study contains sensitive data because it deals with a non-adult incarcerated sample exhibiting suicidal behavior. In order to be shared, it requires additional safeguards. ***Dryad*** does not have a controlled access feature and therefore is not an appropriate choice in this case. ***ICPSR*** aligns more closely with the discipline of this study, and requires researcher screening and secure access before data can be viewed.
+:::
+
+
+:::
+
+## Think-Pair-Share (Optional activity)
+
+A researcher who is planning to conduct a clinical trial for a new multiple sclerosis medication comes to you. They know that their funder will require them to submit a DMP and share their data. They need to find out what data standards are common for clinical trials and decide in which repository to deposit their data.
+
+1. Recommend a repository for this researcher.
+    a. Answers will vary, but one acceptable response is Vivli. To find which data repositories accept clinical trial data, use the NNLM Data Repository Finder and check off "Clinical Trials" under question 4.
+1. Recommend a data standard this researcher could consider using.
+    a. Answers will vary. We can see [Vivli's guidance on data standards](https://vivli.org/wp-content/uploads/2023/01/NIH-DMSP-Template-and-Budget-Justification-Using-Vivli-v1.0.docx#:~:text=VIVLI%20Notes%3A%20Vivli%20does%20not,%2C%20csv%29%20used%20for%20analysis.), where they recommend following [CDISC standards](https://www.cdisc.org/standards/therapeutic-areas).
+
+Discuss your answer and how you arrived at that conclusion with a partner.
+
+
+
+
+
diff --git a/episodes/dmp.md b/episodes/dmp.md
index e4c39e5..f8b152f 100644
--- a/episodes/dmp.md
+++ b/episodes/dmp.md
@@ -1,7 +1,7 @@
 ---
-title: 'DMP'
+title: 'Data Management Plan (DMP) Overview'
 teaching: 10
-exercises: 2
+exercises: 5
 editor_options:
   markdown:
     wrap: 72
@@ -20,87 +20,151 @@ editor_options:
 ## What is Data Management?
-Data management is a broad term that encompasses collecting, storing, -sorting, organizing and sharing data. We all use some form of data -management in our lives: making lists, saving photos in a folder with -the name of the trip, or even scrapbooking. For researchers, data -management is important to make sense of the often vast amounts of data -they collect. Good data management makes it easy to find a specific -piece of information again, enhances reproducibility by making it clear -what data were used to support which conclusion, and increases security -by keeping track of sensitive information. - -This is a lesson created via The Carpentries Workbench. It is written in -[Pandoc-flavored Markdown][pandoc] for static files (with extension -`.md`) and [R Markdown][r-markdown] for dynamic files that can render -code into output (with extension `.Rmd`). Please refer to the -[Introduction to The Carpentries Workbench][carpentries-workbench] for -full documentation. - -What you need to know is that there are three sections required for a -valid Carpentries lesson template: - -1. `questions` are displayed at the beginning of the episode to prime - the learner for the content. -2. `objectives` are the learning objectives for an episode displayed - with the questions. -3. `keypoints` are displayed at the end of the episode to reinforce the - objectives. - -::: instructor -Inline instructor notes can help inform instructors of timing challenges -associated with the lessons. They appear in the "Instructor View" +Data management is a broad term that encompasses collecting, storing, sorting, organizing and sharing data. We all use some form of data management in our lives: making to-do lists, putting paperwork in labeled file folders, organizing photos by trip name. For researchers, data management is an essential part of the research process that enables them to make sense of the vast amounts of data they collect. Data management practices strive to make data FAIR - Findable, Accessible, Interoperable and Reusable. In other words, good data management makes it easy to find, understand, and reuse a specific piece of information again. It also enhances reproducibility by making it clear what data were used to support which conclusion, and increases security by keeping track of sensitive information. + +We often speak of the research data lifecycle when educating researchers about data management +

+![Source: https://zenodo.org/record/8076168](fig/RDM-lifecycle-v5.png){alt='Diagram of the research data lifecycle' width="80%"}

+ +The research data lifecycle represents the stages of data collection, use, and reuse. How data is managed is integral to each step in the cycle. For example, when a researcher collects data, they should use standardized file naming and develop documentation to facilitate its interpretation and analysis. In order to analyze and collaborate, the researcher must organize their data and make it comprehensible to outside parties. In this lesson, we will focus on the data management plan, which should be composed during the “Plan & Design” stage, though it can be updated throughout the lifecycle. + +The National Center for Data Services (part of the Network of the National Library for Medicine) defines Data Management Plans in their [Data Glossary](https://www.nnlm.gov/guides/data-glossary): + +::: callout +“A Data Management Plan (DMP or DMSP) details how data will be collected, processed, analyzed, described, preserved, and shared during the course of a research project. A data management plan that is associated with a research study must include comprehensive information about the data such as the types of data produced, the metadata standards used, the policies for access and sharing, and the plans for archiving and preserving data so that it is accessible over time. Data management plans ensure that data will be properly documented and available for use by other researchers in the future. + +Data management plans are often required by grant funding agencies, such as the National Science Foundation (NSF) or National Institute of Health (NIH), and are ~2-page documents submitted as part of a grant application process.” ::: -::: challenge -## Challenge 1: Can you do it? +A data management plan, sometimes also called a data management and sharing plan, is generally written by a researcher as part of the planning process before embarking on a project. Spending the time writing a DMP itself can clarify how to carry out data management tasks throughout the entire research data lifecycle. The process also creates a document that can be shared with lab staff or referenced as needed. DMPs are considered a living document and should be updated as circumstances inevitably change through the course of a research project. -What is the output of this command? +::::::: challenge -``` r -paste("This", "new", "lesson", "looks", "good") -``` +## Which of the following are uses of a DMP? *(select all that apply)* + +1. Component of a grant proposal to inform the agency of your data plans and funding needs. +2. Planning how the project will manage its data. +3. Creating documentation that can be referenced throughout the research process. +4. Listing citations the researcher plans to use in their published paper. + +:::: solution +1. Component of a grant proposal to inform the agency of your data plans and funding needs. +2. Planning how the project will manage its data. +3. Creating documentation that can be referenced throughout the research process. +:::: + +## How often should a DMP be updated? *(multiple choice, select one)* + +1. Never. Data management plans are created during the funding proposal and need to be followed accurately. +2. As needed. DMPs are living documents that are updated as circumstances change throughout the course of a research project. +3. Constantly. DMPs should be written as you work on the project, and therefore should be updated every week during the data collection and analysis phases. ::: solution -## Output +2. As needed. 
DMPs are living documents that are updated as circumstances change throughout the course of a research project. +::: +::: + +## What are the components of a DMP? +The components of a DMP may vary depending on the funding agency. Always check the funding announcement for specific instructions on how your plan should be structured within your proposal and the level of detail required. In general, most DMPs will address the following five elements (each section is followed by an example): + +### Data Description and Format +The [2013 OSTP Memo](https://obamawhitehouse.archives.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf) defines data as “digital recorded factual material commonly accepted in the scientific community as necessary to validate research findings including data sets used to support scholarly publications, but does not include laboratory notebooks, preliminary analyses, drafts of scientific papers, plans for future research, peer review reports, communications with colleagues, or physical objects, such as laboratory specimens (OMB circular A-110)”. This section of a DMP provides a brief description of what data will be collected as part of the research project and their formats. Information about general files size (MB / GB per file) and estimated total number of files can be helpful. It is not necessary for researchers to describe their experimental process in this section. -``` output -[1] "This new lesson looks good" -``` +::: discussion +**From a project examining the link between religion and sexual violence.** This study will generate data primarily through (1) participant observations of support groups for those abused by clergy and (2) in-depth, semi-structured interviews with these individuals. Data will be collected via phone calls and video calls hosted on encrypted and passcode-protected conferencing platforms. Data will be collected in the form of audio recordings (MP3, collected on an external recording device free of any network connections), transcriptions of these recordings, physical notes taken during participant observation sessions, and any documents (e.g., email correspondences, scanned copies of letters or photographs) that respondents voluntarily choose to share with the researchers. All data in this study will be de-identified and associated with an anonymizing alpha-numeric code. The research team anticipates that most of these data will be preserved in DOCX, JPG, MP3, PDF, PNG, TXT, or XLSX format. [Source](https://dmptool.org/plans/48540/export.pdf?export%5Bquestion_headings%5D=true) (slightly modified) ::: -## Challenge 2: how do you nest solutions within challenge blocks? +### Metadata and Data Standards +Metadata is information that describes, explains, locates, classifies, contextualizes, or documents an information resource ([NNLM](https://www.nnlm.gov/guides/data-glossary/metadata)). Succinctly, metadata is data about data. A library catalog entry is an example of metadata. Metadata is compiled according to different standards: Dublin Core is an example of a general metadata schema. There are also specific metadata standards for different types of data: for example, MIAME and MINSEQE are commonly used for genomic data. We will discuss data standards further in Episode 2. This section provides information about what standards will be used, giving context to the data generated for easier interpretation and reuse. Often, discipline-specific data repositories will specify a particular metadata standard for their platform. 
-::: solution
-You can add a line with at least three colons and a `solution` tag.
+::: discussion
+**From a project quantifying the ecological role of sea coral gardens at multiple spatial scales:** Field observation data will be stored in flat ASCII files, which can be read easily by different software packages. Field data will include date, time, latitude, longitude, cast number, and depth, as appropriate. Metadata will be prepared in accordance with BCO-DMO conventions (i.e. using the BCO-DMO metadata forms) and will include detailed descriptions of collection and analysis procedures. [Source](https://dmptool.org/plans/43691/export.pdf?export%5Bquestion_headings%5D=true)
+:::
+
+### Preservation and Access Timeline
+The data timeline includes information about when data will be backed up, preserved, and published. Some agencies specify in their policies when the dataset must be shared, such as at the end of the reporting period (the active research phase). In addition to specifying their timelines, this section requires that researchers consider what measures they need to take to ensure data security. Raw data may include identifiers such as PII or sensitive information such as the location of endangered species that should be protected during collection and processing. Examples of good security practices include using access restrictions such as passwords, encryption, power supply backup, and virus and intruder protections. Active storage location and appropriate software will depend on data sensitivity level. In addition, before sharing any sensitive data with collaborators or depositing it into a repository, the dataset should be de-identified or aggregated.
+
+::: callout
+Although there are many data de-identification methods, they are beyond the scope of this lesson. For more information, please see the further reading resources in Episode 6.
+:::
+
+::: discussion
+**From a project screening for protein biomarkers in human samples**: Concomitant with publication of the results of the study, participant level data that have been stripped of demographic information will be published as supplementary data and/or made publicly available (with restricted access as laid out below) in the PI’s institutional data repository, which will mint a DOI and continue providing access for at least 10 years, or as long as the repository exists. Raw proteomics data and accompanying metadata will be made publicly available for at least 10 years. [Source](https://osf.io/euaty)
+:::
+
+### Access and Reuse
+Access refers to where the data will be made publicly available, and includes a justification of why the repository chosen will help with dissemination, preservation, and reuse. In this section, researchers should also consider if their data will need to be embargoed, or if their dataset can only be published in a controlled access repository. Controlled access repositories require some form of verification before data can be accessed, and are commonly used for human subject research data. In addition, this section includes information on who can reuse the data, typically indicated by a license, and if reuse requires a [data use (DUA) or data sharing agreement (DSA)](https://dataverse.org/best-practices/sample-dua). In the event that data cannot be shared due to its sensitive nature, the researcher can use this section to specify why the dataset will not be published, including any ethical or legal issues around sharing data.
+
+::: discussion
+**From a project combining passive acoustic monitoring, telemetry, and oceanographic data to track marine species**: We will maintain a one-year embargo on data to organize and archive it, performing quality assurance checks prior to making it publicly available on one of the data sharing sites. Some species detected may be listed as endangered (Atlantic sturgeon, right whales) and these locations may not be publicly listed until management agencies have been notified and can protect the species involved. Intellectual property rights will reside with the persons identifying the species detected in passive acoustic data using data classification algorithms or by listening to the sounds. Identified species will have sample sound recordings deposited in the Macaulay Library of Natural Sounds (http://macaulaylibrary.org/). Geographic data of tracks of the AWG will be made publicly available as part of the ECU Coastal Atlas (http://www.ecu.edu/renci/Focus/NCCoastalAtlas.html). Acoustic tag telemetry data detections will be archived in the Atlantic Telemetry Network (ACT http://www.theactnetwork.com/). Fluorometer data (Turner C3) are typically from a data stream taken at 1 to 2 Hz that is combined with CTD data as separate voltage channels into one file. Files will comprise a line of text for every second measurement, so if complexed with CTD data over long periods of time, they may reach GB size. We will submit our data to DataOne (https://www.dataone.org/). [Source](https://dmptool.org/plans/101113/export.pdf?export%5Bquestion_headings%5D=true)
+:::
-## Figures
-You can use pandoc markdown for static figures with the following
-syntax:
+
+### Oversight
+This section includes information on who is responsible for data oversight, which includes deciding how often or when actions such as backup, converting files to open access versions, depositing the data into a repository, long-term preservation, and data destruction will occur. This also includes education for other members of the project team on the DMP and how to follow it. Generally, the PI is ultimately responsible for ensuring proper data management, but researchers may name any lab staff who will be working on data management as part of the oversight team.
-`![optional caption that appears below the figure](figure url){alt='alt text for accessibility purposes'}`
+
+::: discussion
+**From a project examining the effects of placental dysfunction on brain growth in congenital heart disease**: Lead PI and the eight co-investigators from the three sites mentioned above who are directly engaged in the research will be responsible for day-to-day oversight of data management activities and data sharing. Lead PI will meet monthly with key study personnel to ensure the timeliness of data entry and review data to ensure the quality of data entry. Lead PI will ensure that the metadata are sufficient and appropriate and that the data management and sharing plan follows the FAIR data principles. Lead PI will report the DMS related activities as outlined in this DMS plan in RPPR and request approval for a revised plan if there is any deviation from the approved DMS plan. At the project conclusion, the final progress report will summarize how the DMS objectives were fulfilled and provide links to the shared dataset(s). [Source](https://dmptool.org/plans/93022/export.pdf?export%5Bquestion_headings%5D=true)
+:::
-![Blue Carpentries hex person logo with no
-text.](https://raw.githubusercontent.com/carpentries/logo/master/Badge_Carpentries.svg)
-## Math
+
+### Budget
-One of our episodes contains $\LaTeX$ equations when describing how to
-create dynamic reports with {knitr}, so we now use mathjax to describe
-this:
+Although most data management plans do not have a dedicated section on costs, data management should be considered when budgeting for a project, especially when writing a grant application. 
Costs associated with data management may include:
+ - Staff time for data management: writing documentation, curating data, maintaining data integrity
+ - Software to process and manage data
+ - Data storage above what is normally provided by the university
+ - De-identification services
+ - Repository deposit and curation fees
+
+## Quiz Questions: DMP Sections
+
+::: challenge
+
+## Which information is not found within a DMP? *(multiple choice, select one)*
+
+1. File formats of the data
+2. Software, tools and code used to create this data
+3. What additional information (metadata) will be provided to allow for understanding the data, and what disciplinary standards it will follow
+4. The name of the journals where articles using this data will be published
+5. Schedules for backing up, publishing, and creating open format versions of files, including who is responsible
+6. Where the data will be published after the granting period and any conditions for reuse
-`$\alpha = \dfrac{1}{(1 - \beta)^2}$` becomes:
-$\alpha = \dfrac{1}{(1 - \beta)^2}$
+
+::: solution
+4. The name of the journals where articles using this data will be published
+:::
+
+## What information should be found within the data type section? *(select all that apply)*
-Cool, right?
+
+1. The general purpose of the project
+2. What data will be generated by the project
+3. What is the expected file type of the data
+4. How much data will be generated by the project
+5. How much data storage will cost
+6. Who in the research team will be responsible for collecting the data
-::: keypoints
-- Use `.md` files for episodes when you want static content
-- Use `.Rmd` files for episodes when you need to generate output
-- Run `sandpaper::check_lesson()` to identify any issues with your
-  lesson
-- Run `sandpaper::build_lesson()` to preview your lesson locally
+
+::: solution
+2. What data will be generated by the project
+3. What is the expected file type of the data
+4. How much data will be generated by the project
 :::
+
+## Match the sample text with the DMP component:
+| DMP component | Sample text |
+| :---------- | :------------- |
+| 1. Data Description and Format | **a.** We will use the Brain Imaging Data Structure (BIDS) for our data |
+| 2. Metadata Standards | **b.** Data will be deposited in the Zenodo generalist data repository |
+| 3. Preservation and Access Timeline | **c.** The PI will oversee implementation of this plan, with assistance from the lab data manager |
+| 4. Access and Reuse | **d.** As required by the NIMH, we will upload our data in 6-month intervals from the beginning of the project. |
+| 5. Oversight | **e.** We expect to collect 200 survey results, which will be stored in .csv format. 
| + + +::: solution +1-e, 2-a, 3-d, 4-b, 5-c +::: + +::: + + + + + diff --git a/episodes/fig/Copy of NIH NOFO DMP info.PNG b/episodes/fig/Copy of NIH NOFO DMP info.PNG new file mode 100644 index 0000000..e7065a3 Binary files /dev/null and b/episodes/fig/Copy of NIH NOFO DMP info.PNG differ diff --git a/episodes/fig/Copy of NIH application instructions.PNG b/episodes/fig/Copy of NIH application instructions.PNG new file mode 100644 index 0000000..17a6f0c Binary files /dev/null and b/episodes/fig/Copy of NIH application instructions.PNG differ diff --git a/episodes/fig/Copy of Repository choice flow chart.png b/episodes/fig/Copy of Repository choice flow chart.png new file mode 100644 index 0000000..d759656 Binary files /dev/null and b/episodes/fig/Copy of Repository choice flow chart.png differ diff --git a/episodes/fig/RDM-lifecycle-v5.png b/episodes/fig/RDM-lifecycle-v5.png new file mode 100644 index 0000000..693d3b5 Binary files /dev/null and b/episodes/fig/RDM-lifecycle-v5.png differ diff --git a/episodes/introduction.md b/episodes/introduction.md deleted file mode 100644 index 87aff83..0000000 --- a/episodes/introduction.md +++ /dev/null @@ -1,114 +0,0 @@ ---- -title: "Introduction" -teaching: 10 -exercises: 2 ---- - -:::::::::::::::::::::::::::::::::::::: questions - -- How do you write a lesson using Markdown and `{sandpaper}`? - -:::::::::::::::::::::::::::::::::::::::::::::::: - -::::::::::::::::::::::::::::::::::::: objectives - -- Explain how to use markdown with The Carpentries Workbench -- Demonstrate how to include pieces of code, figures, and nested challenge blocks - -:::::::::::::::::::::::::::::::::::::::::::::::: - -## Introduction - -This is a lesson created via The Carpentries Workbench. It is written in -[Pandoc-flavored Markdown](https://pandoc.org/MANUAL.txt) for static files and -[R Markdown][r-markdown] for dynamic files that can render code into output. -Please refer to the [Introduction to The Carpentries -Workbench](https://carpentries.github.io/sandpaper-docs/) for full documentation. - -What you need to know is that there are three sections required for a valid -Carpentries lesson: - - 1. `questions` are displayed at the beginning of the episode to prime the - learner for the content. - 2. `objectives` are the learning objectives for an episode displayed with - the questions. - 3. `keypoints` are displayed at the end of the episode to reinforce the - objectives. - -:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: instructor - -Inline instructor notes can help inform instructors of timing challenges -associated with the lessons. They appear in the "Instructor View" - -:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: - -::::::::::::::::::::::::::::::::::::: challenge - -## Challenge 1: Can you do it? - -What is the output of this command? - -```r -paste("This", "new", "lesson", "looks", "good") -``` - -:::::::::::::::::::::::: solution - -## Output - -```output -[1] "This new lesson looks good" -``` - -::::::::::::::::::::::::::::::::: - - -## Challenge 2: how do you nest solutions within challenge blocks? - -:::::::::::::::::::::::: solution - -You can add a line with at least three colons and a `solution` tag. 
- -::::::::::::::::::::::::::::::::: -:::::::::::::::::::::::::::::::::::::::::::::::: - -## Figures - -You can use standard markdown for static figures with the following syntax: - -`![optional caption that appears below the figure](figure url){alt='alt text for -accessibility purposes'}` - -![You belong in The Carpentries!](https://raw.githubusercontent.com/carpentries/logo/master/Badge_Carpentries.svg){alt='Blue Carpentries hex person logo with no text.'} - -::::::::::::::::::::::::::::::::::::: callout - -Callout sections can highlight information. - -They are sometimes used to emphasise particularly important points -but are also used in some lessons to present "asides": -content that is not central to the narrative of the lesson, -e.g. by providing the answer to a commonly-asked question. - -:::::::::::::::::::::::::::::::::::::::::::::::: - - -## Math - -One of our episodes contains $\LaTeX$ equations when describing how to create -dynamic reports with {knitr}, so we now use mathjax to describe this: - -`$\alpha = \dfrac{1}{(1 - \beta)^2}$` becomes: $\alpha = \dfrac{1}{(1 - \beta)^2}$ - -Cool, right? - -::::::::::::::::::::::::::::::::::::: keypoints - -- Use `.md` files for episodes when you want static content -- Use `.Rmd` files for episodes when you need to generate output -- Run `sandpaper::check_lesson()` to identify any issues with your lesson -- Run `sandpaper::build_lesson()` to preview your lesson locally - -:::::::::::::::::::::::::::::::::::::::::::::::: - -[r-markdown]: https://rmarkdown.rstudio.com/ diff --git a/episodes/supporting-researchers.md b/episodes/supporting-researchers.md index ffa11cf..ad1f148 100644 --- a/episodes/supporting-researchers.md +++ b/episodes/supporting-researchers.md @@ -1,101 +1,126 @@ --- -title: 'supporting-researchers' +title: 'Supporting Researchers' teaching: 10 exercises: 2 --- -:::::::::::::::::::::::::::::::::::::: questions +::: questions +- How does a data interview compare to a reference interview? +- What are common researcher questions about the DMP process? +::: -- How do you write a lesson using R Markdown and `{sandpaper}`? - -:::::::::::::::::::::::::::::::::::::::::::::::: - -::::::::::::::::::::::::::::::::::::: objectives - -- Explain how to use markdown with the new lesson template -- Demonstrate how to include pieces of code, figures, and nested challenge blocks - -:::::::::::::::::::::::::::::::::::::::::::::::: +::: objectives +- Identify the difference between a data interview and a reference interview +- Construct questions for a data interview +- Answer common researcher questions about the DMP process +::: ## Introduction +In this lesson, we pivot from learning about DMPs into how to apply this knowledge when serving patrons. We will provide insights into common questions and concerns researchers have about the DMP process, and describe strategies on how to effectively conduct a data interview. + +## Data Interview +According to the [NNLM’s data glossary](https://www.nnlm.gov/guides/data-glossary/data-interview), “a data interview in the library context refers to an interaction between a librarian and a researcher with a structured or semi-structured set of questions designed to elicit information about the researcher’s data practices and/or needs.” +This process is essentially a specialized subcategory of the reference interview, and is a good first step in helping a researcher prepare a DMP. 
+ +Just like the reference interview begins with establishing a background purpose (“what is this information being used for”), you might want to begin broadly by asking researchers about their project and its purpose. This can help you to begin formulating follow-up questions that will extrapolate the researcher’s needs. + +Librarian: “Please tell me a little bit about your project and its purpose” +Researcher: “I am running a project about the impact of pets on the emotional well-being of children.” + +Even short responses can give you an idea of who/what is the subject of the research, how sensitive this data may be, and the potential formats of the data. Even with this short response, you have already found out this is a human subjects study, and that this researcher will need additional accommodations if they want to share their dataset. Like in a reference interview, it is useful to paraphrase the project back to the researcher and ask clarifying questions to make sure you have a good grasp of the research purpose. + +After establishing the purpose of the project, it is helpful to ask about where the researcher is applying for grant funding and their timeline for submitting materials. Researchers who are not applying for grant funding can still benefit from writing a data management plan, and these questions can help them consider their project needs and what workflows need to be put into place. + +Next, we move on to follow-up questions that relate to the DMP sections. Like in a reference interview, these questions move from open-ended to closed, specific questions to clarify needs not raised by the researcher. Use the purpose of the project to inform your questions, and help the researcher think through their workflow and needs. Sometimes the researcher will answer “I am not sure”. This is an opportunity to explore what they think they will do, and to provide some options as to how they may proceed. Remind the researcher that a DMP can (and should) be updated as necessary to better align with their procedures as the project evolves. + +### Follow-up questions on data description and size + +- What is your target sample size? +- How are you collecting data? + - Are you using structured questionnaires or interviews? + - Are you interacting with subjects directly or indirectly? + - Talking with the subject + - Talking to their guardians or a third party + - Recording observation + - Are you taking video, audio recordings, or images? + - What devices are you using? + - For videos/audio recordings, how long are the recordings? In what format? + - Are you collecting data any other way? + - Scans + - Measurements +- Is the collected data in a physical format (such as on paper) or in a digital format (through a computer or other electronic device)? +- How often are you collecting information for each subject throughout the study? + +### Follow-up questions on metadata and data standards + +- How are you documenting your variables? + - Are you using abbreviations that need defining? + - Does your data have units that need clarification? + - Are you using derived variables (variables obtained by combining or coding other variables)? +- Does your discipline have any requirements for how you should be describing your dataset? + - Are you using a set of words standard to your field (controlled vocabulary)? +- What minimum information would colleagues need to know to + - Recreate your research study? + - Recreate your analyses? 
+
+### Follow-up questions on preservation and access
+- Where are you storing the paper copies of the questionnaires?
+- Where are you storing the audio recordings of your interviews?
+  - Are you planning on transcribing your interviews?
+- What software are you using to
+  - code your data (Excel, Google Sheets, SPSS, etc.)?
+  - analyze your data?
+- Have you considered file naming conventions or file structures to help you find your files more easily?
+- If using proprietary software, are you planning on saving your files in an open format for sharing and long-term preservation?
+
+::: callout
+Proprietary software is owned by an organization that requires a license or a fee to access. Typically, this software will generate file formats specific to it (such as Excel .xlsx), and these files might be difficult to open or manipulate using other software. Converting these data files into an open format, a version that is easily accessible by many pieces of software, makes data more FAIR (such as from .xlsx to .csv or .tsv). For a list of open access file formats, please see the resources in Episode 6.
+:::
+
+### Follow-up questions on access and reuse
+- Are you planning on sharing your data in the future?
+  - Do you have any obligations from your funder to share your data?
+  - Where are you planning on publishing your articles? Does the publisher have any data sharing requirements?
+- If you are planning on sharing your data in the future, is data sharing explicitly addressed in the consent form?
+- If you are planning on sharing data in the future, do you have a sense of where you want to deposit your research data when the time comes?
+  - Discuss repository options
+- Do you need to de-identify or aggregate your data before you can share your data?
+  - Discuss embargoes, controlled vs. open access
+
+### Follow-up questions on oversight
+- Who is coding your data? How are you maintaining accuracy?
+  - Data checks? Double entry? Controlled entry?
+- Who is responsible for backing up your data? How often?
+- Who is responsible for preserving your data long term?
+- Who is responsible for depositing your data?
+
+::: callout
+Researchers may ask if they can list you as the librarian for helping them plan the data management activities specified in the DMP. Unless they are compensating you for your time and writing your name into the grant to manage the data on the project, remind them that this section is for listing who is carrying out these activities. Typically, the PI (principal investigator) is responsible for this activity; however, lab managers or other staff may also be listed.
+:::
+
+### Follow-up questions on budget
+
+- Where are you planning on storing your data during the active research phase?
+  - Do you need additional or specific types of platforms that the university does not provide? Do these have costs?
+- Do you need to pay someone to manage your research data?
+- Do you need to pay for data de-identification or curation?
+- Do you need to pay for your dataset deposit?
+
+## Tips for talking with researchers
+- Many researchers have not been formally trained in data management and may not think about their project through this lens
+- Researchers may speak a different language: they may assign a different meaning to metadata or data standards
+- Many researchers are not accustomed to submitting data to a repository
+- There are many reasons a researcher may be hesitant to share their data. 
This can include a lack of sharing culture within their disciple, fear of their research getting “scooped” (having your research idea or results published by someone else), or the additional labor associated with preparing their dataset after the active research phase. + + +## Assessments + +### Mock Data Interview + +Conduct a data interview with a classmate. The “researcher” will read the scenario below, but the “librarian” will not. The researcher can feel free to fill in any details needed to answer the questions from the librarian – these scenarios have been left intentionally domain agnostic. Then switch. + + +**Scenario 1:** You are a researcher writing a grant proposal to be submitted to the NIH. You have heard that a data management and sharing plan is required for NIH grant applications, but you don’t know any details.\ +**Scenario 2:** You are a researcher working to publish an article in a journal. You have just found out you need to make your data open by depositing it in a repository to satisfy journal requirements. You aren’t sure which repository to choose. -This is a lesson created via The Carpentries Workbench. It is written in -[Pandoc-flavored Markdown][pandoc] for static files (with extension `.md`) and -[R Markdown][r-markdown] for dynamic files that can render code into output -(with extension `.Rmd`). Please refer to the [Introduction to The Carpentries -Workbench][carpentries-workbench] for full documentation. - -What you need to know is that there are three sections required for a valid -Carpentries lesson template: - - 1. `questions` are displayed at the beginning of the episode to prime the - learner for the content. - 2. `objectives` are the learning objectives for an episode displayed with - the questions. - 3. `keypoints` are displayed at the end of the episode to reinforce the - objectives. - -:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: instructor - -Inline instructor notes can help inform instructors of timing challenges -associated with the lessons. They appear in the "Instructor View" - -:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: - -::::::::::::::::::::::::::::::::::::: challenge - -## Challenge 1: Can you do it? - -What is the output of this command? - -```r -paste("This", "new", "lesson", "looks", "good") -``` - -:::::::::::::::::::::::: solution - -## Output - -```output -[1] "This new lesson looks good" -``` - -::::::::::::::::::::::::::::::::: - - -## Challenge 2: how do you nest solutions within challenge blocks? - -:::::::::::::::::::::::: solution - -You can add a line with at least three colons and a `solution` tag. - -::::::::::::::::::::::::::::::::: -:::::::::::::::::::::::::::::::::::::::::::::::: - -## Figures - -You can use pandoc markdown for static figures with the following syntax: - -`![optional caption that appears below the figure](figure url){alt='alt text for -accessibility purposes'}` - -![You belong in The Carpentries!](https://raw.githubusercontent.com/carpentries/logo/master/Badge_Carpentries.svg){alt='Blue Carpentries hex person logo with no text.'} - -## Math - -One of our episodes contains $\LaTeX$ equations when describing how to create -dynamic reports with {knitr}, so we now use mathjax to describe this: - -`$\alpha = \dfrac{1}{(1 - \beta)^2}$` becomes: $\alpha = \dfrac{1}{(1 - \beta)^2}$ - -Cool, right? 
- -::::::::::::::::::::::::::::::::::::: keypoints - -- Use `.md` files for episodes when you want static content -- Use `.Rmd` files for episodes when you need to generate output -- Run `sandpaper::check_lesson()` to identify any issues with your lesson -- Run `sandpaper::build_lesson()` to preview your lesson locally - -:::::::::::::::::::::::::::::::::::::::::::::::: diff --git a/site/README.md b/site/README.md index 42997e3..0a00291 100644 --- a/site/README.md +++ b/site/README.md @@ -1,2 +1,2 @@ This directory contains rendered lesson materials. Please do not edit files -here. +here.