add option to limit candidate regulators to subset
tmichoel committed Jun 8, 2024
1 parent 3ed9b8f commit 111cde8
Showing 3 changed files with 27 additions and 12 deletions.
2 changes: 1 addition & 1 deletion Project.toml
@@ -1,7 +1,7 @@
 name = "BioFindr"
 uuid = "77580646-997d-4218-a3cc-42097ecd1c68"
 authors = ["tmichoel <11647967+tmichoel@users.noreply.github.com> and contributors"]
-version = "1.0.3"
+version = "1.0.4"

 [deps]
 DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
20 changes: 13 additions & 7 deletions src/findr.jl
@@ -199,10 +199,12 @@ The input dataframes are:
 - `dX` - DataFrame with expression data, columns are genes
 - `dG` - DataFrame with genotype data, columns are variants (SNPs)
 - `dE` - DataFrame with eQTL results, must contain columns with gene and SNP IDs that can be mapped to column names in `dX` and `dG`, respectively
+The numeric mapping between column indices in `Matrix(dG)` and `Matrix(dX)` is obtained from these inputs using the [`getpairs`](@ref) function and the optional parameters:
 - `colG` - name or number of variant ID column in `dE`, default 1
 - `colX` - name or number of gene ID column in `dE`, default 2
-The numeric mapping between column indices in `Matrix(dG)` and `Matrix(dX)` is obtained from these inputs using the [`getpairs`](@ref) function.
+- `namesX` - names of a possible subset of columns in `dX` to be considered as potential causal regulators (default `names(dX)`)
 The optional parameter `method` determines the LLR mixture distribution fitting method and can be either `moments` (default) for the method of moments, or `kde` for kernel-based density estimation.
@@ -212,12 +214,12 @@ The optional parameter `sorted` determines if the output must be sorted by incre
 See also [`findr(::Matrix,::Array,::Matrix)`](@ref), [`getpairs`](@ref), [`combineprobs`](@ref), [`stackprobs`](@ref), [`globalfdr!`](@ref).
 """
-function findr(dX::T, dG::T, dE::T; colG=1, colX=2, method="moments", combination="IV", FDR=1.0, sorted=true) where T<:AbstractDataFrame
+function findr(dX::T, dG::T, dE::T; colG=1, colX=2, namesX=[], method="moments", combination="IV", FDR=1.0, sorted=true) where T<:AbstractDataFrame
     if combination == "none"
         error("Returning posterior probabilities for individual tests is not supported with DataFrame inputs. Set combination argument to one of \"IV\", \"mediation\", or \"orig\", or use matrix inputs.")
     elseif combination in Set(["IV","mediation","orig"])
         # Create the array with SNP-Gene pairs
-        pairGX = getpairs(dX, dG, dE; colG = colG, colX = colX)
+        pairGX = getpairs(dX, dG, dE; colG = colG, colX = colX, namesX = namesX)
         # Call BioFindr on numeric data
         PP = findr(Matrix(dX), Matrix(dG), pairGX; method = method, combination = combination)
         dP = stackprobs(PP, names(dX)[pairGX[:,2]], names(dX))
@@ -278,7 +280,11 @@ end
 Wrapper for `findr(Matrix(dX1), Matrix(dX2), Matrix(dG), pairGX2)` when the inputs `dX1`, `dX2`, and `dG` are in the form of a DataFrame. The output is then also wrapped in a DataFrame with `Source`, `Target`, (Posterior) `Probability`, and `qvalue` columns. When DataFrames are used, only combined posterior probabilities can be returned (`combination="IV"` (default), `"mediation"`, or `"orig"`).
-The numeric mapping between column indices in `Matrix(dG)` and `Matrix(dX2)` is obtained from the DataFrame inputs using the [`getpairs`](@ref) function.
+The numeric mapping between column indices in `Matrix(dG)` and `Matrix(dX2)` is obtained from these inputs using the [`getpairs`](@ref) function and the optional parameters:
+- `colG` - name or number of variant ID column in `dE`, default 1
+- `colX` - name or number of gene ID column in `dE`, default 2
+- `namesX` - names of a possible subset of columns in `dX` to be considered as potential causal regulators (default `names(dX)`)
 The optional parameter `method` determines the LLR mixture distribution fitting method and can be either `moments` (default) for the method of moments, or `kde` for kernel-based density estimation.
@@ -288,12 +294,12 @@ The optional parameter `sorted` determines if the output must be sorted by incre
 See also [`findr(::Matrix,::Array,::Array,::Matrix)`](@ref), [`combineprobs`](@ref), [`stackprobs`](@ref), [`globalfdr!`](@ref).
 """
-function findr(dX1::T, dX2::T, dG::T, dE::T; colG=1, colX=2, method="moments", combination="IV", FDR=1.0, sorted=true) where T<:AbstractDataFrame
+function findr(dX1::T, dX2::T, dG::T, dE::T; colG=1, colX=2, namesX=[], method="moments", combination="IV", FDR=1.0, sorted=true) where T<:AbstractDataFrame
     if combination == "none"
         error("Returning posterior probabilities for individual tests is not supported with DataFrame inputs. Set combination argument to one of \"IV\", \"mediation\", or \"orig\", or use matrix inputs.")
     elseif combination in Set(["IV","mediation","orig"])
         # Create the array with SNP-Gene pairs
-        pairGX = getpairs(dX2, dG, dE; colG = colG, colX = colX)
+        pairGX = getpairs(dX2, dG, dE; colG = colG, colX = colX, namesX = namesX)
         # Call BioFindr on numeric data
         PP = findr(Matrix(dX1), Matrix(dX2), Matrix(dG), pairGX; method = method, combination = combination)
         dP = stackprobs(PP, names(dX2)[pairGX[:,2]], names(dX1))
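
Both wrappers build the regulator labels passed to `stackprobs` by indexing the gene column names with the second column of `pairGX`. A minimal sketch of that indexing step, using plain vectors as hypothetical stand-ins for `names(dX2)` and for a pair matrix that has already been restricted with `namesX`, suggests that only the selected regulators should show up as sources:

```julia
# Hypothetical stand-ins: gene column names and an already filtered pair matrix
genes = ["g1", "g2", "g3"]
pairGX = [1 1; 3 3]          # rows that would remain for namesX = ["g1", "g3"]

# Column 2 holds gene column indices, so indexing the names gives one label per pair
sources = genes[pairGX[:, 2]]
```

Under these assumptions `g2` never appears among the source labels, although its expression column is still present in the data.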
17 changes: 13 additions & 4 deletions src/utils.jl
@@ -1,17 +1,26 @@
"""
 getpairs(dX::T, dG::T, dE::T; colG=1, colX=2)
-Get pairs of indices of matching columns from dataframes `dX` and `dG`, with column names that should be matched listed in dataframe `dE`. The optional parameters `colG` (default value 1) and `colX` (default value 2) indicate which columns of `dE` need to be used for matching, either as a column number (integer) or column name (string).
+Get pairs of indices of matching columns from dataframes `dX` and `dG`, with column names that should be matched listed in dataframe `dE`. The optional parameters `colG` (default value 1) and `colX` (default value 2) indicate which columns of `dE` need to be used for matching, either as a column number (integer) or column name (string). The optional parameter `namesX` can be used to match rows in `dE` to only a subset of the column names of `dX`.
 """
-function getpairs(dX::T, dG::T, dE::T; colG=1, colX=2) where T<:AbstractDataFrame
+function getpairs(dX::T, dG::T, dE::T; colG=1, colX=2, namesX=[]) where T<:AbstractDataFrame
+    # if namesX is empty, use all column names of dX
+    if isempty(namesX)
+        namesX = names(dX)
+    end

     # Extract dX ID column from dE
     idX = select(dE, colX)[:,1]

     # Extract dG ID column from dE
     idG = select(dE, colG)[:,1]

+    # Keep only rows in idX and idG where idX is in namesX
+    row_select = findall(x -> x in namesX, idX)
+    idX = idX[row_select]
+    idG = idG[row_select]

     # Create the array with idG-idX pairs
-    pairsGX = zeros(Int64,nrow(dE),2);
+    pairsGX = zeros(Int64,length(idX),2);
     for rowE = axes(pairsGX,1)
         pairsGX[rowE,1] = findfirst(idG[rowE] .== names(dG))
         pairsGX[rowE,2] = findfirst(idX[rowE] .== names(dX))
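
The row filtering and index lookup in this hunk can be sketched without DataFrames. The following is a stand-alone illustration, not the package code: `pairs_sketch` and its plain-vector arguments are hypothetical stand-ins for `names(dX)`, `names(dG)`, and the two ID columns of `dE`. The explicit check for IDs that `findfirst` cannot locate is an assumption added here and does not appear in the diff.

```julia
# Stand-alone sketch of the subset filtering and index lookup (hypothetical helper).
# genes/snps stand in for names(dX)/names(dG); idG/idX for the ID columns of dE.
function pairs_sketch(genes, snps, idG, idX; namesX=String[])
    # An empty namesX means "use every gene", mirroring the default in the diff
    subset = isempty(namesX) ? genes : namesX
    # Keep only the eQTL rows whose gene ID is in the requested subset
    keep = findall(in(subset), idX)
    pairs = zeros(Int, length(keep), 2)
    for (row, k) in enumerate(keep)
        g = findfirst(==(idG[k]), snps)
        x = findfirst(==(idX[k]), genes)
        # Added assumption: findfirst returns nothing for an unknown ID, so fail clearly
        (g === nothing || x === nothing) && error("ID in row $k not found among column names")
        pairs[row, 1] = g
        pairs[row, 2] = x
    end
    return pairs
end

genes = ["g1", "g2", "g3"]
snps = ["s1", "s2", "s3"]
idG = ["s1", "s2", "s3"]
idX = ["g1", "g2", "g3"]

all_pairs = pairs_sketch(genes, snps, idG, idX)
sub_pairs = pairs_sketch(genes, snps, idG, idX; namesX=["g3", "g1"])
```

In this sketch the order of `namesX` does not affect the result, because rows are kept in the order in which they appear in the eQTL table.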

2 comments on commit 111cde8

@tmichoel

@JuliaRegistrator register

Release notes:

  • add option to perform causal inference for a subset of regulators without having to modify the input data

@JuliaRegistrator

Registration pull request created: JuliaRegistries/General/108555

Tagging

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the github interface, or via:

git tag -a v1.0.4 -m "<description of version>" 111cde8c20d64d72d9a079134d15a3368390fde5
git push origin v1.0.4
