Wilcoxon Test Suite

Overview

The Wilcoxon Test Suite is a comprehensive set of R scripts designed for conducting non-parametric Wilcoxon tests across multiple datasets. This suite is particularly useful for comparing two groups when the assumption of normality is violated. It includes functionality to handle single and multiple CSV files, providing results and visualizations in a structured manner.

How to cite

@misc{MultiLabelWilcoxon2024,
  author = {Elaine Cecília Gatto},
  title = {MultiLabelWilcoxon: A package to performing wilcoxon between many methods},  
  year = {2024},
  note = {R package version 0.1.0. Licensed under CC BY-NC-SA 4.0},
  doi = {10.13140/RG.2.2.33711.96169},
  url = {https://github.com/cissagatto/MultiLabelWilcoxon}
}

Comparing Two Groups Using the Wilcoxon Test

When the assumption of normality is violated, the Wilcoxon test is an effective method for comparing two groups. As a non-parametric test, it does not rely on the data conforming to any specific parametric family of probability distributions. Non-parametric tests share the same goals as their parametric counterparts but offer two key advantages: they do not require the assumption of normal distribution, and they are more robust to outliers. However, non-parametric tests are typically less powerful than parametric tests when the normality assumption is met. As a result, when the data follow a normal distribution, non-parametric tests are less likely to reject the null hypothesis when it is false.

Independent Samples

Mann-Whitney-Wilcoxon Test: Also known as the Wilcoxon rank-sum test or the Mann-Whitney U test, this test is used for comparing two independent samples. It is the non-parametric equivalent of the Student’s t-test for independent samples.
Wilcoxon Signed-Rank Test: This test is applied when comparing two paired or dependent samples. It serves as the non-parametric equivalent of the Student’s t-test for paired samples.

Hypotheses of the Wilcoxon Test

The null and alternative hypotheses for the Wilcoxon test are as follows:

H0: The two methods are equal in terms of the variable of interest.
H1: The two methods differ in terms of the variable of interest.

Interpreting the Results

The p-value (displayed after "p =" in the plot subtitle) indicates whether we reject the null hypothesis. In this case, with a p-value of 0.17, we do not reject the null hypothesis, suggesting that [?] are not significantly different.

Note: A small p-value (typically less than 0.05) suggests that there is sufficient evidence to reject the null hypothesis.

Functionality

Main Features

Single File Analysis: Compute Wilcoxon tests for a single CSV file and save results to the specified directory.
Batch File Analysis: Perform Wilcoxon tests for multiple CSV files within a directory and generate results for each file.

CSV Example

Dataset	Model_1	Model_2	....	Model_N
Dataset1
Dataset2
........
DatasetD

Mandatory: Rename the first column of the dataframe to "Dataset". This is crucial because the code assumes the first column is named "Dataset" for proper functioning.

Code Structure

wilcoxon.R: Contains the core functions for performing the Wilcoxon test and plotting results.
utils.R: Includes utility functions used throughout the scripts.
libraries.R: Sources additional libraries required for the analysis.

How It Works

For a Single CSV File

Define the path to a single CSV file.
Read the CSV file into a data frame and rename the first column to "Dataset". If you don't rename, there will be a problem! This is
Perform the Wilcoxon test using the wilcoxon function.
Save the results in the specified results directory.

For Multiple CSV Files

Set the working directory to the folder containing the CSV files.
List and normalize paths for all CSV files.
Extract measure names from file paths.
Loop through each file, perform the Wilcoxon test, and save results.

Progress Tracking

A progress bar is included to provide real-time updates on the analysis status, especially useful for batch processing.

Folder Structure

Ensure the following folder structure is set up:

FolderRoot: Root directory of the project.
FolderData: Directory where CSV data files are stored.
FolderResults: Directory where results and plots are saved.

Documentation

For more detailed documentation on each function, check out the ~/MultiLabelWilcoxon/docsfolder

A complete example is available in ~/MultiLabelWilcoxon/example folder

Instalation

# install.packages("devtools")
library("devtools")
devtools::install_github("https://github.com/cissagatto/MultiLabelWilcoxon")
library(MultiLabelWilcoxon)

Example for one single csv file

library(MultiLabelWilcoxon)
FolderResults <- "path/to/your/results"
filename <- "path/to/your/accuracy.csv"
df <- data.frame(read.csv(filename))
names(df)[1] <- "Dataset"
res <- wilcoxon(data = df, measure.name = "accuracy", folder.path = FolderResults)

Example for many csv files

library(MultiLabelWilcoxon)
FolderResults <- "path/to/your/results"
setwd(FolderData)
files <- list.files(full.names = TRUE)
measure_names <- extract_measure_names(files)
for (i in seq_along(files)) {
  data <- data.frame(read.csv(files[i]))
  names(data)[1] <- "Dataset"
  res <- wilcoxon(data = data, measure.name = measure_names[i], folder.path = FolderResults)
}

📚 Contributing

We welcome contributions from the community! If you have suggestions, improvements, or bug fixes, please submit a pull request or open an issue in the GitHub repository.

Acknowledgment

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.
This study was financed in part by the Conselho Nacional de Desenvolvimento Científico e Tecnológico - Brasil (CNPQ) - Process number 200371/2022-3.
The authors also thank the Brazilian research agencies FAPESP financial support.

📧 Contact

For any questions or support, please contact:

Prof. Elaine Cecilia Gatto (elainececiliagatto@gmail.com)

Links

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
R		R
data		data
docs		docs
example		example
man		man
results/accuracy		results/accuracy
DESCRIPTION		DESCRIPTION
LICENSE		LICENSE
MultiLabelWilcoxon.Rproj		MultiLabelWilcoxon.Rproj
NAMESPACE		NAMESPACE
README.md		README.md
_pkgdown.yml		_pkgdown.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Wilcoxon Test Suite

Overview

How to cite

Comparing Two Groups Using the Wilcoxon Test

Independent Samples

Hypotheses of the Wilcoxon Test

Interpreting the Results

Functionality

Main Features

CSV Example

Code Structure

How It Works

For a Single CSV File

For Multiple CSV Files

Progress Tracking

Folder Structure

Documentation

Instalation

Example for one single csv file

Example for many csv files

📚 Contributing

Acknowledgment

📧 Contact

Links

About

Releases

Packages

Languages

License

cissagatto/MultiLabelWilcoxon

Folders and files

Latest commit

History

Repository files navigation

Wilcoxon Test Suite

Overview

How to cite

Comparing Two Groups Using the Wilcoxon Test

Independent Samples

Hypotheses of the Wilcoxon Test

Interpreting the Results

Functionality

Main Features

CSV Example

Code Structure

How It Works

For a Single CSV File

For Multiple CSV Files

Progress Tracking

Folder Structure

Documentation

Instalation

Example for one single csv file

Example for many csv files

📚 Contributing

Acknowledgment

📧 Contact

Links

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages