I am sharing this code repository to explain the basic working principles of various bioinformatics algorithms in Python. I am writing it with two main intentions. First and foremost, I want to build myself a code base to look back on: a record of how I learnt bioinformatics and how the skeleton of the code works and interacts with my subject of interest, biology. Second, I want to help you understand Biopython, a popular Python library for biology that almost feels like a cheat sheet for solving complex algorithmic problems on biological and genetic data.
So, let's dive into this journey together, starting from the basics and moving on to the more complex concepts that are needed to solve real-world problems in bioinformatics.
Season 1 explains the very basic, foundational concepts, primarily:
- Counting pattern occurrences
- Frequency map of a DNA sequence
- Most frequent K-mers
- Complementary DNA string
- Pattern Matching
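For a flavour of what the topics listed above boil down to, here is a minimal plain-Python sketch of each one. The function names are hypothetical and are not necessarily the ones used in this repository; the sketch assumes uppercase DNA strings over A, C, G and T, and reports positions 0-based.

```python
from typing import Dict, List


def pattern_count(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text, including overlapping ones."""
    k = len(pattern)
    return sum(1 for i in range(len(text) - k + 1) if text[i:i + k] == pattern)


def frequency_map(text: str, k: int) -> Dict[str, int]:
    """Map every k-mer that appears in text to how many times it appears."""
    freq: Dict[str, int] = {}
    for i in range(len(text) - k + 1):
        kmer = text[i:i + k]
        freq[kmer] = freq.get(kmer, 0) + 1
    return freq


def frequent_words(text: str, k: int) -> List[str]:
    """All k-mers that reach the highest count, in order of first appearance."""
    freq = frequency_map(text, k)
    if not freq:
        return []
    top = max(freq.values())
    return [kmer for kmer, n in freq.items() if n == top]


def reverse_complement(dna: str) -> str:
    """Read the string backwards, swapping A<->T and C<->G."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in reversed(dna))


def pattern_matching(pattern: str, genome: str) -> List[int]:
    """0-based start positions of every (overlapping) match of pattern."""
    k = len(pattern)
    return [i for i in range(len(genome) - k + 1) if genome[i:i + k] == pattern]
```

For example, `reverse_complement("AAAACCCGGT")` returns `"ACCGGGTTTT"`, and `pattern_count("GCGCG", "GCG")` returns `2` because overlapping matches are counted.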
This may all seem intimidating at first, but trust me, every one of these topics is explained in an engaging way. I urge you to take a look; you'll thank me later.
Season 2 touches on somewhat more complex concepts that took me a lot more effort to understand. I have tried my best to give the simplest possible interpretation of each concept and to address every question I had, and that you might have, during the learning journey.
The primary topics covered here are:
- Symbol Array in DNA seq
- Symbol Array but faster (Extended SymbolArray)
- Skew Array in DNA seq (2 methods explained)
- Minimum Skew in DNA seq
- Calculating Hamming Distance between DNA strings
- Approximate Pattern Matching in DNA
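These topics can be sketched the same way. This is a minimal sketch under a few assumptions: the genome is treated as circular, the symbol-array window is half the genome length, and skew is defined as (number of G) minus (number of C) over each prefix. All function names are hypothetical. `faster_symbol_array` slides the window one base at a time, subtracting the base that leaves and adding the base that enters, instead of recounting every window.

```python
from typing import Dict, List


def symbol_array(genome: str, symbol: str) -> Dict[int, int]:
    """Naive version: recount symbol in every half-genome window (circular)."""
    n, half = len(genome), len(genome) // 2
    extended = genome + genome[:half]  # wrap around the circular genome
    return {i: extended[i:i + half].count(symbol) for i in range(n)}


def faster_symbol_array(genome: str, symbol: str) -> Dict[int, int]:
    """Sliding-window version: update the previous count instead of recounting."""
    n, half = len(genome), len(genome) // 2
    extended = genome + genome[:half]
    counts = {0: extended[:half].count(symbol)}
    for i in range(1, n):
        counts[i] = counts[i - 1]
        if extended[i - 1] == symbol:         # base leaving the window
            counts[i] -= 1
        if extended[i + half - 1] == symbol:  # base entering the window
            counts[i] += 1
    return counts


def skew_array(genome: str) -> List[int]:
    """skew[i] = #G - #C in the first i bases (so the list has len + 1 entries)."""
    skew = [0]
    for base in genome:
        step = 1 if base == "G" else -1 if base == "C" else 0
        skew.append(skew[-1] + step)
    return skew


def minimum_skew(genome: str) -> List[int]:
    """Every position where the skew reaches its minimum."""
    skew = skew_array(genome)
    low = min(skew)
    return [i for i, value in enumerate(skew) if value == low]


def hamming_distance(p: str, q: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(p) != len(q):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(1 for a, b in zip(p, q) if a != b)


def approximate_pattern_matching(text: str, pattern: str, d: int) -> List[int]:
    """Start positions where pattern matches text with at most d mismatches."""
    k = len(pattern)
    return [i for i in range(len(text) - k + 1)
            if hamming_distance(text[i:i + k], pattern) <= d]
```

Recounting each of the n windows costs time proportional to the window length, so the naive `symbol_array` is O(n²) overall, while the sliding version is O(n). For example, `faster_symbol_array("AAAAGGGG", "A")` gives `{0: 4, 1: 3, 2: 2, 3: 1, 4: 0, 5: 1, 6: 2, 7: 3}`.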
Season 3 and the upcoming final season involve a more "umbrella" style of code: I first walk you through each small function separately, and then we combine those small functions into something genuinely meaningful.
So stick with me and enjoy the magic you are about to experience at the end of this journey.
The primary topics covered here are:
- Basic understanding of NumPy arrays
- Counting Nucleotide Frequencies in DNA
- Profile Matrix from DNA
- Consensus Sequence from DNA
- Scoring DNA Motifs
- Profile Most Probable K-mer
- Greedy Motif Search (the grand finale: the main standalone function, which combines all of the functions above into a greedy algorithm)
- Finding Patterns with Mismatches (extras)
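Here is a minimal sketch of how these building blocks can fit together into a greedy motif search. It uses plain Python dictionaries instead of NumPy arrays so it has no dependencies, and it assumes ties are broken in A, C, G, T order for the consensus and by first occurrence for the most probable k-mer. The names are hypothetical, not the repository's own functions.

```python
from typing import Dict, List

Profile = Dict[str, List[float]]


def count(motifs: List[str]) -> Dict[str, List[int]]:
    """Per-column nucleotide counts for a list of equal-length motifs."""
    k = len(motifs[0])
    counts = {base: [0] * k for base in "ACGT"}
    for motif in motifs:
        for j, base in enumerate(motif):
            counts[base][j] += 1
    return counts


def profile(motifs: List[str]) -> Profile:
    """Column-wise frequencies: each count divided by the number of motifs."""
    t = len(motifs)
    return {base: [c / t for c in col] for base, col in count(motifs).items()}


def consensus(motifs: List[str]) -> str:
    """Most frequent base per column (ties go to the first of A, C, G, T)."""
    counts = count(motifs)
    k = len(motifs[0])
    return "".join(max("ACGT", key=lambda b: counts[b][j]) for j in range(k))


def score(motifs: List[str]) -> int:
    """Total mismatches between every motif and the consensus."""
    cons = consensus(motifs)
    return sum(1 for motif in motifs for a, b in zip(motif, cons) if a != b)


def pr(kmer: str, prof: Profile) -> float:
    """Probability of kmer under prof: product of its per-column frequencies."""
    p = 1.0
    for j, base in enumerate(kmer):
        p *= prof[base][j]
    return p


def profile_most_probable_kmer(text: str, k: int, prof: Profile) -> str:
    """The k-mer of text with the highest probability (first one wins ties)."""
    best, best_p = text[:k], -1.0
    for i in range(len(text) - k + 1):
        p = pr(text[i:i + k], prof)
        if p > best_p:
            best, best_p = text[i:i + k], p
    return best


def greedy_motif_search(dna: List[str], k: int, t: int) -> List[str]:
    """Seed with each k-mer of the first string, then greedily pick the
    profile-most-probable k-mer from each remaining string; keep the best set."""
    best_motifs = [s[:k] for s in dna[:t]]
    for i in range(len(dna[0]) - k + 1):
        motifs = [dna[0][i:i + k]]
        for j in range(1, t):
            motifs.append(profile_most_probable_kmer(dna[j], k, profile(motifs)))
        if score(motifs) < score(best_motifs):
            best_motifs = motifs
    return best_motifs
```

On the five sequences `GGCGTTCAGGCA`, `AAGAATCAGTCA`, `CAAGGAGTTCGC`, `CACGTCAATCAC` and `CAATAATATTCG` with k = 3, this sketch returns `CAG CAG CAA CAA CAA`. Note that a single base never seen in a column makes a k-mer's probability exactly 0, which is the weakness pseudocounts address.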
The series finale is this last season, which brings this Git repository to a close and, with it, your initial journey through some of the major bioinformatics algorithms.
Here we primarily cover two important topics:
- GreedyMotifSearch with Pseudocounts
- Randomized Motif Search
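These last two topics can be sketched as follows, again under stated assumptions. Pseudocounts are implemented here with Laplace's rule of succession (add 1 to every count before normalising), and the randomized search restarts many times from random k-mers drawn with a seeded `random.Random`, keeping the lowest-scoring result. The number of restarts (`runs=1000`) and every function name are hypothetical choices for illustration.

```python
import random
from typing import Dict, List

Profile = Dict[str, List[float]]


def profile_with_pseudocounts(motifs: List[str]) -> Profile:
    """Column frequencies where every count starts at 1 (Laplace's rule of
    succession), so no nucleotide ever gets probability 0."""
    k = len(motifs[0])
    counts = {base: [1] * k for base in "ACGT"}
    for motif in motifs:
        for j, base in enumerate(motif):
            counts[base][j] += 1
    total = len(motifs) + 4  # every column now sums to t + 4
    return {base: [c / total for c in col] for base, col in counts.items()}


def score(motifs: List[str]) -> int:
    """Per column: number of motifs that disagree with the most common base."""
    return sum(len(motifs) - max(col.count(b) for b in "ACGT")
               for col in zip(*motifs))


def most_probable_kmer(text: str, k: int, prof: Profile) -> str:
    """Highest-probability k-mer of text under prof (first one wins ties)."""
    best, best_p = text[:k], -1.0
    for i in range(len(text) - k + 1):
        p = 1.0
        for j, base in enumerate(text[i:i + k]):
            p *= prof[base][j]
        if p > best_p:
            best, best_p = text[i:i + k], p
    return best


def greedy_motif_search_pseudo(dna: List[str], k: int, t: int) -> List[str]:
    """Greedy motif search in which every profile is built with pseudocounts."""
    best = [s[:k] for s in dna[:t]]
    for i in range(len(dna[0]) - k + 1):
        motifs = [dna[0][i:i + k]]
        for j in range(1, t):
            prof = profile_with_pseudocounts(motifs)
            motifs.append(most_probable_kmer(dna[j], k, prof))
        if score(motifs) < score(best):
            best = motifs
    return best


def randomized_motif_search(dna: List[str], k: int, t: int,
                            rng: random.Random) -> List[str]:
    """One run: pick random k-mers, then alternate profile -> motifs until
    the score stops improving."""
    motifs = []
    for s in dna[:t]:
        start = rng.randint(0, len(s) - k)  # randint includes both ends
        motifs.append(s[start:start + k])
    best = motifs
    while True:
        prof = profile_with_pseudocounts(motifs)
        motifs = [most_probable_kmer(s, k, prof) for s in dna[:t]]
        if score(motifs) < score(best):
            best = motifs
        else:
            return best


def repeated_randomized_search(dna: List[str], k: int, t: int,
                               runs: int = 1000, seed: int = 0) -> List[str]:
    """Keep the best result over many random restarts (seeded for repeatability)."""
    rng = random.Random(seed)
    best = randomized_motif_search(dna, k, t, rng)
    for _ in range(runs - 1):
        candidate = randomized_motif_search(dna, k, t, rng)
        if score(candidate) < score(best):
            best = candidate
    return best
```

Because each run stops as soon as the score fails to improve, a single run often lands in a local optimum; repeating the search many times and keeping the best result is what makes the randomized approach practical.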
But wait, there's more! Keep an eye out for other exciting repositories on Bioinformatics, Computational Drug Design, and Next-Gen Sequencing over on my GitHub.
That's all from my side for now. Until next time, this is your man Mohit @mhtjsh signing off. Cheers! ✌️