Seqfold (#295)
* feat: add seqfold

Add seqfold package, Go port of the python project at:
https://github.com/Lattice-Automation/seqfold

Add general useful types and maps of energies used to calculate the
folding structures.

Add to checks/checks.go a couple of functions to determine if a sequence
is DNA or RNA. This is needed by `seqfold` to select, for example, the
energies used in the folding.

* feat: add seqfold functions

Add the functions to calculate the folding energies. The `Fold` function
is the core function to calculate the folding structures and energies of
a nucleotide sequence. The result can be used to print more information,
e.g. dot-bracket notation of the folded sequence.

* feat: add seqfold tests and example

Add the unit tests ported from the python codebase and a small example.

* renamed seqfold to fold.

* renamed FoldEnergy to MinimumFreeEnergy.

* renamed s to structure in most of fold.go.

* renamed DNA and RNA energy constants.

* renamed fc to foldContext

* renamed H and S in energies to EnthalpyH and EntropyS.

* changed NN to nearest neighbors.

* renamed BP energy to Matching Basepair energy and s to structure.

* more variable renaming.

* renamed V and W.

* renamed V and W and fixed errors...

* fixed all underscored names in fold_test.go and moved to fold package.

* fixed accidentally modified rebase test data.

* made most functions private.

* more public to private functions and structs

* random: add random.RNASequence

* transform: add RNA specific transformations

* fold: use the transform package for complement

And some cleanup

* added minimal biological context and package level comments.

* fold: add fold.Result, un-export most of lib

Add `fold.Result` to hold the result of fold.Fold and move
`MinimumFreeEnergy()` and `DotBracket()` there as methods.
In this way we can keep the API surface small without compromising the
package functionality. More methods can be added to `fold.Result` in
case we need them, e.g. `result.JSON()` to obtain a representation of the
resulting structure in JSON or other formats.

* fold: move the 1600 magic number into a `const`

Add some explanation as well.

* fold: move the 30 magic number to a const

Add a small explanation as well.

* fold: rename variables

Rename some variables in `unpairedMinimumFreeEnergyW()`

* fold: move magic number 4 into a const

It seems to be used as a minimum length for which stable nucleotide
structures can be made.

* fold: more variable renaming and cleanup

* fold: more var and const rename and refactor

* fold: more renaming

* fold: more magic numbers lifted into constants

* changed fold.Fold to fold.Zuker.

* changed ExampleFold to ExampleZuker.

---------

Co-authored-by: Timothy Stiles <tim@stiles.io>
Co-authored-by: Tim <TimothyStiles@users.noreply.github.com>
3 people authored Jul 22, 2023
1 parent c03951e commit 4b25a31
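
The commit message above describes a result type that keeps its data unexported and exposes `MinimumFreeEnergy()` and `DotBracket()` as methods. The following is only a minimal sketch of that design, not the package's actual implementation: the `pair` type, the field names, and the energy values are all hypothetical and chosen purely for illustration.

```go
package main

import "fmt"

// pair is a HYPOTHETICAL record of two paired positions and the free
// energy attributed to that pairing. The real package's internals are
// not shown in this commit excerpt and may look entirely different.
type pair struct {
	i, j   int     // 0-based indices into the sequence, with i < j
	energy float64 // kcal/mol; illustrative values only
}

// Result mimics the shape described in the commit message: the data is
// unexported and callers interact only through methods.
type Result struct {
	length int    // length of the folded sequence
	pairs  []pair // paired positions (assumed non-crossing)
}

// MinimumFreeEnergy sums the energies of all recorded pairs.
func (r Result) MinimumFreeEnergy() float64 {
	total := 0.0
	for _, p := range r.pairs {
		total += p.energy
	}
	return total
}

// DotBracket renders unpaired positions as '.' and each pair as a
// matching '(' and ')'.
func (r Result) DotBracket() string {
	out := make([]byte, r.length)
	for k := range out {
		out[k] = '.'
	}
	for _, p := range r.pairs {
		out[p.i] = '('
		out[p.j] = ')'
	}
	return string(out)
}

func main() {
	// A made-up hairpin: positions 0-8 and 1-7 are paired.
	r := Result{length: 9, pairs: []pair{{0, 8, -1.5}, {1, 7, -2.0}}}
	fmt.Println(r.DotBracket())
	fmt.Printf("%.1f\n", r.MinimumFreeEnergy())
}
```

Because callers only see the methods, a further accessor such as the `result.JSON()` idea mentioned in the commit message could be added later without changing existing call sites.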
Showing 15 changed files with 3,140 additions and 18 deletions.
24 changes: 24 additions & 0 deletions checks/checks.go
@@ -23,3 +23,27 @@ func GcContent(sequence string) float64 {
	GuanineAndCytosinePercentage := float64(GuanineCount+CytosineCount) / float64(len(sequence))
	return GuanineAndCytosinePercentage
}

func IsDNA(seq string) bool {
	for _, base := range seq {
		switch base {
		case 'A', 'C', 'T', 'G':
			continue
		default:
			return false
		}
	}
	return true
}

func IsRNA(seq string) bool {
	for _, base := range seq {
		switch base {
		case 'A', 'C', 'U', 'G':
			continue
		default:
			return false
		}
	}
	return true
}
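
As written, the two checks are case-sensitive, both return true for an empty string, and both accept a sequence that contains neither `T` nor `U` (for example `GACA`). A caller that needs to pick between DNA and RNA energies may want to handle those cases explicitly. The sketch below illustrates one way to do so; `classify` is a hypothetical helper that is not part of the commit, and `isDNA`/`isRNA` are local copies of the functions from the diff so that the example compiles on its own.

```go
package main

import "fmt"

// isDNA is a local copy of checks.IsDNA from the diff above.
func isDNA(seq string) bool {
	for _, base := range seq {
		switch base {
		case 'A', 'C', 'T', 'G':
			continue
		default:
			return false
		}
	}
	return true
}

// isRNA is a local copy of checks.IsRNA from the diff above.
func isRNA(seq string) bool {
	for _, base := range seq {
		switch base {
		case 'A', 'C', 'U', 'G':
			continue
		default:
			return false
		}
	}
	return true
}

// classify is a HYPOTHETICAL helper (not in the commit) that makes the
// edge cases of the two checks explicit.
func classify(seq string) string {
	dna, rna := isDNA(seq), isRNA(seq)
	switch {
	case seq == "":
		return "empty" // both checks return true for ""
	case dna && rna:
		return "ambiguous" // only A, C, G present
	case dna:
		return "DNA"
	case rna:
		return "RNA"
	default:
		return "unknown" // lowercase, spaces, or other symbols
	}
}

func main() {
	for _, s := range []string{"GATTACA", "GAUUACA", "GACA", "gattaca", ""} {
		fmt.Printf("%q -> %s\n", s, classify(s))
	}
}
```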
62 changes: 62 additions & 0 deletions checks/checks_test.go
@@ -24,3 +24,65 @@ func TestGcContent(t *testing.T) {
		t.Errorf("GcContent did not properly calculate GC content")
	}
}

func TestIsDNA(t *testing.T) {
	tests := []struct {
		name string
		args string
		want bool
	}{
		{
			name: "Success",
			args: "GATTACA",
			want: true,
		},
		{
			name: "FailRNA",
			args: "GAUUACA",
			want: false,
		},
		{
			name: "FailUnknown",
			args: "RANDOM STRING",
			want: false,
		},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			if got := checks.IsDNA(tt.args); got != tt.want {
				t.Errorf("IsDNA() = %v, want %v", got, tt.want)
			}
		})
	}
}

func TestIsRNA(t *testing.T) {
	tests := []struct {
		name string
		args string
		want bool
	}{
		{
			name: "Success",
			args: "GAUUACA",
			want: true,
		},
		{
			name: "FailDNA",
			args: "GATTACA",
			want: false,
		},
		{
			name: "FailUnknown",
			args: "RANDOM STRING",
			want: false,
		},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			if got := checks.IsRNA(tt.args); got != tt.want {
				t.Errorf("IsRNA() = %v, want %v", got, tt.want)
			}
		})
	}
}