Commit

Merge pull request #28 from ismaelgutier/work_dog
updating README and vignettes
ismaelgutier authored Nov 22, 2024
2 parents 607c2e2 + f6515a3 commit a7f02d9
Showing 14 changed files with 312 additions and 502 deletions.
6 changes: 3 additions & 3 deletions .Rhistory
@@ -1,6 +1,3 @@
ungroup() %>%
select(ID, task, item_ID, item, response = Response, RA, Attempt, accessed)
# For IGC_long_phon
IGC_long_phon <- IGC_long_phon %>%
mutate(Attempt = as.numeric(Attempt)) %>% # Convert Attempt to numeric
group_by(task) %>% # Group by task only
@@ -510,3 +507,6 @@ dplyr::filter(!stringr::str_detect(task_type, "nonword")) %>%
dplyr::arrange(ID)
document()
install()
document()
install()
print(333333)
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,7 +1,7 @@
Package: sunflower
Type: Package
Title: Managing Multiple Responses, Computing Formal Quality Measures, and Classifying Language Production Errors
Version: 0.16.11
Version: 0.22.11
Author: Gutiérrez-Cordero, I [aut][cre](<https://orcid.org/0000-0003-1508-4203>)
Authors@R: person(
given = "Ismael",
4 changes: 2 additions & 2 deletions README.Rmd
@@ -19,7 +19,7 @@ knitr::opts_chunk$set(

<!-- badges start -->

![](https://img.shields.io/badge/sunflower-v._0.16.11-orange?style=flat&link=https%3A%2F%2Fgithub.com%2Fismaelgutier%2Fsunflower) [![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0) ![](https://img.shields.io/badge/Language-grey?style=flat&logo=R&color=grey&link=https%3A%2F%2Fwww.r-project.org%2F)
![](https://img.shields.io/badge/sunflower-v._0.22.11-orange?style=flat&link=https%3A%2F%2Fgithub.com%2Fismaelgutier%2Fsunflower) [![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0) ![](https://img.shields.io/badge/Language-grey?style=flat&logo=R&color=grey&link=https%3A%2F%2Fwww.r-project.org%2F)

<!-- badges end -->

@@ -202,7 +202,7 @@ errors_classified %>%

*sunflower* allows for the classification of production errors once some indexes related to responses to a stimulus have been obtained and contextualized based on whether they come from repeated attempts or single productions. This process involves three steps.

First, a lexicality check of the response is performed using the `lexicality_check()` function, which involves determining whether the response is a real word. To do this, the package searches for the response in a database such as *BuscaPalabras* ([BPal](https://www.uv.es/~mperea/Davis_Perea_in_press.pdf)) and compares its frequency with the target word to determine if it is a real word based on whether it has a higher frequency or not when the parameter `criterion = "database"` is set. Alternatively, the response can be checked against a dictionary (*sunflower* searches for responses among entries from the *Real Academia Española*, [RAE](https://www.rae.es/)) when the parameter `criterion = "dictionary"` is used.
First, a lexicality check of the response is performed using the `check_lexicality()` function, which involves determining whether the response is a real word. To do this, the package searches for the response in a database such as *BuscaPalabras* ([BPal](https://www.uv.es/~mperea/Davis_Perea_in_press.pdf)) and compares its frequency with the target word to determine if it is a real word based on whether it has a higher frequency or not when the parameter `criterion = "database"` is set. Alternatively, the response can be checked against a dictionary (*sunflower* searches for responses among entries from the *Real Academia Española*, [RAE](https://www.rae.es/)) when the parameter `criterion = "dictionary"` is used.

Next, similarity measures between the targets and the responses are obtained using various algorithms within the `get_formal_similarity()` function. Finally, the cosine similarity between the two productions is computed if possible using the `get_semantic_similarity()` function, based on an NLP model. In our case, the parameter `model = m_w2v` refers to a binary file containing a Spanish Billion Words embeddings corpus created using the word2vec algorithm. This file is included in the zip file (for more information, see the markdown in the vignettes) located within the <a href="https://osf.io/mfcvb" style="color: purple;">dependency-bundle zip</a>, which can be found in our supplementary [OSF repository mirror](https://osf.io/akuxv/).

10 changes: 5 additions & 5 deletions README.md
@@ -5,7 +5,7 @@

<!-- badges start -->

![](https://img.shields.io/badge/sunflower-v._0.16.11-orange?style=flat&link=https%3A%2F%2Fgithub.com%2Fismaelgutier%2Fsunflower)
![](https://img.shields.io/badge/sunflower-v._0.22.11-orange?style=flat&link=https%3A%2F%2Fgithub.com%2Fismaelgutier%2Fsunflower)
[![License: GPL
v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
![](https://img.shields.io/badge/Language-grey?style=flat&logo=R&color=grey&link=https%3A%2F%2Fwww.r-project.org%2F)
@@ -75,7 +75,7 @@ formal_metrics_computed = df_to_formal_metrics %>%
response_col = "response",
attempt_col = "attempt",
group_cols = c("ID", "item_ID"))
#> The function get_formal_similarity() took 2.52 seconds to be executed
#> The function get_formal_similarity() took 3.44 seconds to be executed
```

Display some of the results from the formal quality analysis.
@@ -137,9 +137,9 @@ errors_classified = df_to_classify %>%
get_semantic_similarity(item_col = "item", response_col = "response", model = m_w2v) %>%
classify_errors(response_col = "response", item_col = "item",
access_col = "accessed", RA_col = "RA", also_classify_RAs = T)
#> The function check_lexicality() took 0.49 seconds to be executed
#> The function check_lexicality() took 0.54 seconds to be executed
#> The function get_formal_similarity() took 0.68 seconds to be executed
#> The function get_semantic_similarity() took 0.75 seconds to be executed
#> The function get_semantic_similarity() took 0.73 seconds to be executed
#> The function classify_errors() took 0.80 seconds to be executed
```

@@ -165,7 +165,7 @@ contextualized based on whether they come from repeated attempts or
single productions. This process involves three steps.

First, a lexicality check of the response is performed using the
`lexicality_check()` function, which involves determining whether the
`check_lexicality()` function, which involves determining whether the
response is a real word. To do this, the package searches for the
response in a database such as *BuscaPalabras*
([BPal](https://www.uv.es/~mperea/Davis_Perea_in_press.pdf)) and
172 changes: 172 additions & 0 deletions vignettes/functioning_example.Rmd
@@ -0,0 +1,172 @@
---
title: "Data Analysis Workflow using the Sunflower Package"
author: "Ismael Gutiérrez-Cordero"
date: "`r Sys.Date()`"
output:
  html_document:
    toc: true
    toc_float: true
    number_sections: true
    theme: cerulean
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

In this vignette, we present a practical example of using the sunflower package to work with datasets that include a column of responses containing multiple answers. We demonstrate how to convert the dataset into a long format to obtain formal similarity metrics. Additionally, we illustrate how to perform error classification based on classical criteria found in the literature (e.g., [Dell et al., 1997](https://doi.org/10.1037/0033-295x.104.4.801); [Gold & Kertesz, 2001](https://doi.org/10.1006/brln.2000.2441); see also, [García-Orza et al., 2020](https://doi.org/10.1016/j.cortex.2020.03.020)).

# Environment Setup

```{r}
# Clear the workspace and unload all packages
#rm(list = ls())
#invisible(lapply(paste("package:", names(sessionInfo()$otherPkgs), sep = ""),
# detach, character.only = TRUE, unload = TRUE))
# Install and load the `devtools` package
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
library(devtools)
# Install RTools on Windows (if applicable)
# Visit: https://cran.rstudio.com/bin/windows/Rtools/ for installation.
# Probably not needed on macOS or Linux (I do not use either, so this is a guess)
```

# Install and Load Required Packages

```{r}
# Install additional packages if needed
possible_dependencies <- c("tidyverse", "htmlTable", "knitr")
for (pkg in possible_dependencies) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
}
# Install the `xfun` package (if necessary)
if (!requireNamespace("xfun", quietly = TRUE)) {
  install.packages("xfun", type = "source")
}
# Install the `sunflower` package from GitHub
if (!requireNamespace("sunflower", quietly = TRUE)) {
  devtools::install_github("ismaelgutier/sunflower")
}
# Load required libraries
library(sunflower)
```

# Step 1
## Load and Wrangle Data

```{r}
# Load dataset
dataframe0 <- sunflower::IGC_sample
# Separate responses
dataframe1 <- dataframe0 %>%
  sunflower::separate_responses(col_name = "response",
                                separate_with = ", ")
# Extract attempts and clean blank spaces
dataframe2 <- dataframe1 %>%
  sunflower::get_attempts(first_production = attempt_1,
                          drop_blank_spaces = TRUE)
```
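
To make the reshaping step more concrete, the sketch below shows the general idea of splitting a multi-response column into one row per attempt, using base R only. It is not the implementation behind `separate_responses()` or `get_attempts()`; the toy data frame and its values are invented for illustration, and only the column names and the `", "` separator are taken from the chunk above.

```r
# Toy data: one row per item, with several attempts packed into one string.
# The values are invented; only the column names mirror the vignette.
toy <- data.frame(
  item = c("casa", "perro"),
  response = c("cata, casa", "pero"),
  stringsAsFactors = FALSE
)

# Split each response string on ", " (the separator used in the vignette).
attempts <- strsplit(toy$response, ", ", fixed = TRUE)

# Build the long format: one row per attempt, numbered within each item.
toy_long <- data.frame(
  item = rep(toy$item, lengths(attempts)),
  attempt = sequence(lengths(attempts)),
  response = unlist(attempts),
  stringsAsFactors = FALSE
)

print(toy_long)
```

Each attempt keeps a pointer to its target item, which is what the later similarity steps need in order to compare every production with the intended word.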

# Step 2
## Formal Similarity Analysis

```{r}
# Calculate formal similarity
dataframe3 <- dataframe2 %>%
  sunflower::get_formal_similarity(item_col = "item",
                                   response_col = "response",
                                   attempt_col = "attempt",
                                   group_cols = c("task_item_ID"))
```
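
As a rough illustration of what a formal similarity metric measures, the following sketch computes a Levenshtein distance with base R's `utils::adist()` and turns it into a normalised score. This is an assumption-laden example: the metrics actually returned by `get_formal_similarity()` may be defined differently, and the variable names (`lev_similarity`, `match_pos`) are hypothetical.

```r
# Compare one target with one response (toy strings, invented for illustration).
item <- "casa"
response <- "cata"

# Levenshtein (edit) distance: number of insertions, deletions, substitutions.
lev_distance <- drop(utils::adist(item, response))

# Normalise by the longer string so the score falls in [0, 1].
lev_similarity <- 1 - lev_distance / max(nchar(item), nchar(response))

# Toy position-by-position match string: "1" where characters agree, else "0".
# This is a simplified analogue, not the package's own positional measure.
item_chars <- strsplit(item, "")[[1]]
response_chars <- strsplit(response, "")[[1]]
n <- min(length(item_chars), length(response_chars))
match_pos <- paste(
  as.integer(item_chars[seq_len(n)] == response_chars[seq_len(n)]),
  collapse = ""
)

print(c(distance = lev_distance, similarity = lev_similarity))
print(match_pos)
```

Normalising by the longer string keeps scores comparable across targets of different lengths, which matters when short and long words appear in the same task.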

# Step 2.1
## Positional Accuracy

```{r}
# Calculate positional accuracy
dataframe3.1 <- dataframe3 %>%
  sunflower::positional_accuracy(item_col = "item",
                                 response_col = "response",
                                 match_col = "adj_strict_match_pos")
```

# Step 3
## Lexicality Check

```{r}
# Check lexicality
dataframe4 <- dataframe3 %>%
  sunflower::check_lexicality(item_col = "item",
                              response_col = "response",
                              criterion = "dictionary")
```
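
Conceptually, a dictionary-based lexicality check is a membership test: is the response an entry in a word list? The sketch below shows that idea with a tiny, hypothetical dictionary in base R. It does not reproduce how `check_lexicality()` queries its resources, and the output column name `lexicality` is an assumption made for this example.

```r
# Hypothetical mini-dictionary; a real check would use a far larger word list.
toy_dictionary <- c("casa", "cata", "caza", "perro")

# Toy responses: two real Spanish words and one nonword.
responses <- c("cata", "caza", "cafa")

# A response counts as a word if it appears in the dictionary (case-insensitive).
is_word <- tolower(responses) %in% toy_dictionary

lexicality_table <- data.frame(
  response = responses,
  lexicality = as.integer(is_word),
  stringsAsFactors = FALSE
)

print(lexicality_table)
```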

## Semantic Similarity Analysis

```{r}
# Load a pre-trained word2vec model
model <- word2vec::read.word2vec(file = file.choose(), normalize = FALSE)
# Calculate semantic similarity
dataframe5 <- dataframe4 %>%
  sunflower::get_semantic_similarity(item_col = "item",
                                     response_col = "response",
                                     model = model)
```
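
Semantic similarity between two embeddings is typically summarised as the cosine of the angle between their vectors. The sketch below computes it in base R on small, made-up vectors; real word2vec embeddings have hundreds of dimensions, and the helper name `cosine_similarity()` is hypothetical rather than part of *sunflower*. The comparison against 0.46 simply reuses the `cosine_limit_value` passed to the classification functions in this vignette, on the assumption that it acts as a relatedness threshold.

```r
# Cosine similarity: dot product divided by the product of the vector norms.
cosine_similarity <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Made-up 3-dimensional "embeddings" (illustrative values only).
v_item <- c(0.2, 0.7, 0.1)
v_response <- c(0.1, 0.8, 0.3)
v_unrelated <- c(0.9, -0.1, 0.0)

sim_related <- cosine_similarity(v_item, v_response)
sim_unrelated <- cosine_similarity(v_item, v_unrelated)

# Compare each score with the cut-off used later in the vignette.
cosine_limit_value <- 0.46
print(c(related = sim_related, unrelated = sim_unrelated))
print(c(related = sim_related >= cosine_limit_value,
        unrelated = sim_unrelated >= cosine_limit_value))
```

Because cosine similarity ignores vector length, it compares the direction of two embeddings rather than their magnitude, which is why it is a common choice for word vectors.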

## Error Classification

### Classify Errors Considering Retrieval Attempts (RAs)

```{r}
dataframe6a <- dataframe5 %>%
  dplyr::select(-correct) %>% # drop the old `correct` column (rename it instead if you want to keep it)
  dplyr::mutate(accessed = ifelse(item == response, 1, 0)) %>%
  sunflower::classify_errors(access_col = "accessed",
                             RA_col = "RA",
                             response_col = "response",
                             item_col = "item",
                             also_classify_RAs = TRUE,
                             cosine_limit_value = 0.46)
knitr::kable(dataframe6a) %>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed")) %>%
  kableExtra::scroll_box(width = "120%", height = "500px")
```

### Classify Errors Without Considering RAs

```{r}
dataframe6b <- dataframe5 %>%
  dplyr::select(-correct) %>% # drop the old `correct` column (rename it instead if you want to keep it)
  dplyr::mutate(accessed = ifelse(item == response, 1, 0)) %>%
  sunflower::classify_errors_regular(access_col = "accessed",
                                     response_col = "response",
                                     item_col = "item",
                                     cosine_limit_value = 0.46)
knitr::kable(dataframe6b) %>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed")) %>%
  kableExtra::scroll_box(width = "120%", height = "500px")
```

# Conclusion

This R Markdown document provides a complete workflow for analyzing data using the sunflower package, incorporating data wrangling, similarity metrics, and error classification.

# R Session Info.
```{r}
sessionInfo()
```
Binary file removed vignettes/sepex_presentation/f1_initial.png
Binary file not shown.
Binary file removed vignettes/sepex_presentation/f1_second.png
Binary file not shown.
Binary file removed vignettes/sepex_presentation/f2.png
Binary file not shown.
Binary file removed vignettes/sepex_presentation/f2b.png
Binary file not shown.
Binary file removed vignettes/sepex_presentation/f3.png
Binary file not shown.
Binary file removed vignettes/sepex_presentation/pa.png
Binary file not shown.