---
title: "Library AI Trials"
description: "Building, testing, and questioning AI in libraries."
site: distill::distill_website
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
# Learn more about creating websites with Distill at:
# https://rstudio.github.io/distill/website.html
# Learn more about publishing to GitHub Pages at:
# https://rstudio.github.io/distill/publish_website.html#github-pages
```
<div class="section" id="current-active">
<h2>CURRENT - ACTIVE</h2>
<div class="section" id="spoc-species-occurrences">
<h3>SPOC (Species Occurrences)</h3>
<p><a class="reference external" href="https://sul-dlss-labs.github.io/spoc/">Project Book</a> - Data, models, and documentation of the project</p>
<p><a class="reference external" href="https://github.com/sul-dlss-labs/spoc">GitHub Repository</a> - Code repository and project communications</p>
<p>Observations of marine plants and animals are “hidden” in the text of undergraduate student research papers (Paper > TIFF > OCR > plain text). These historical observations are important markers for studies of biodiversity and the effects of climate change on species. The goal of this project is to extract species occurrences (genus-species, place, time) from student papers held in libraries along the western coast of the U.S., verify those observations, and contribute them to the <a class="reference external" href="https://www.gbif.org/">Global Biodiversity Information Facility (GBIF)</a>.</p>
</div>
</div>
<div class="section" id="previous-projects">
<h2>PREVIOUS PROJECTS</h2>
<div class="section" id="electronic-theses-and-dissertations">
<h3>Electronic Theses and Dissertations</h3>
<p>This project was our first dive into how machine learning and natural language processing might help us automate the assignment of subject headings from the <a class="reference external" href="https://www.oclc.org/research/areas/data-science/fast.html">FAST (Faceted Application of Subject Terminology)</a> vocabulary to ETDs.</p>
<p><strong>Objective</strong>: To become familiar with the data, experiment with methods, and determine whether we can successfully automate or semi-automate the assignment of FAST subject headings to ETDs.</p>
<p><strong>Outcomes:</strong></p>
<ul class="simple">
<li><p><a class="reference external" href="https://github.com/sul-dlss-labs/druid_2_text">druid_2_text</a> (to retrieve PDFs from our repository and convert them to plain text for pre-processing)</p></li>
<li><p><a class="reference external" href="https://github.com/sul-dlss-labs/etd_structure_classifier">Document Structure Classifier</a> (to extract bibliographies)</p></li>
<li><p><a class="reference external" href="https://biology-fast-etds.herokuapp.com/">Biology ETDs FAST tagging app</a></p></li>
<li><p><a class="reference external" href="https://etd-abstract-similarity.herokuapp.com/">Abstract Similarity/clustering app</a></p></li>
</ul>