
A highly efficient, powerful, and feature-rich algorithm for analyzing DNA sequences


LimesKey/DNAnalyzer

 
 


UPDATE: Development on DNAnalyzer was on temporary hiatus due to a busy schedule. However, we are back and ready to continue development on this project. We are currently working on the following features:
  • Partnering with genomics and computer science researchers from the University of Victoria and the University of Washington to develop a new algorithm for analyzing DNA sequences using machine learning.
  • Updating the software to support the latest versions of Java and Python.
  • Creating a GUI (local + web) for the software that allows users to upload their DNA data and analyze it using the new algorithm.
  • Implementing a new feature that allows users to upload their DNA data from 23andMe, AncestryDNA, and other DNA testing services.
  • New website design.

If you would like to contribute to the project, please feel free to open a PR. New feature issues will be created by June 30th at the latest.

DNAnalyzer-modified


DNAnalyzer

Revolutionizing DNA analysis and making it accessible to all through innovative AI-powered analysis and interpretive tools.

DNAnalyzer is a fiscally sponsored 501(c)(3) nonprofit organization (EIN: 81-2908499) dedicated to revolutionizing the field of DNA analysis. We aim to democratize access to DNA analysis tools for a deeper understanding of human health and disease, and to push the boundaries of what is possible in genetics research. The project was created by Piyush Acharya and is currently led by him and @LimesKey.

Summary

DNAnalyzer is your gateway to deciphering the secrets of DNA. Our innovative AI-powered analysis and interpretive tools empower geneticists, physicians, and researchers to gain deep insights into DNA sequences, revolutionizing how we understand human health and disease.



Background

The human genome is composed of over 3 billion base pairs, making manual analysis nearly impossible. Consequently, powerful computational and statistical methods are necessary to decode the functional information hidden in DNA sequences. The genome is also extremely intricate and contains a plethora of data, which needs to be organized and converted into an analyzable form. Current analytical tools and software make this arduous for both geneticists and physicians, restricting them from acquiring crucial information to better understand humans. [1]

Features

  • Start and Stop Codons
    • Start and stop codons mark where the coding sequence of a protein begins and ends. There are 20 different amino acids. A protein consists of one or more chains of amino acids (called polypeptides) whose sequence is encoded in a gene. [2]
  • High Coverage Regions
    • Promoter regions of protein-coding genes have a relatively high proportion of guanine and cytosine among the four nucleotide bases (45-60% GC content). Such CpG islands are likely to reveal important information about the genome. [3]
  • Neurodevelopmental Disorders
    • A group of disorders, usually characterized by longer genes, caused by genetic mutations that affect the development of the brain and nervous system. They include autism, attention deficit hyperactivity disorder (ADHD), and schizophrenia. [4]
  • Core Promoter Elements
    • Promoter sequences are short DNA sequences located upstream of a gene that are responsible for initiating transcription. Core promoter elements include BRE, TATA, INR, and DPE. [5]
  • FASTA File Support
    • Supports multi-line and single-line FASTA database files. Files can either be uploaded or linked to from the web. [7]
  • Command-Line Interface (CLI)
    • The Methionine command-line interface (abbreviated as Met CLI) is a unified tool for running DNAnalyzer services from the command line and for scripting sequences of commands. You can currently access all the core features of DNAnalyzer without logging in, although account support will be implemented soon. For Met CLI installation instructions and the currently supported commands, refer to the Met CLI GitHub repository.
  • Web UI Coming Soon
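
As a rough illustration of the FASTA, GC-content, and start/stop codon features listed above, here is a minimal Java sketch. It is not DNAnalyzer's actual implementation: the class name `SequenceSketch` and all of its methods are hypothetical, the stop codons used are the three standard ones (TAA, TAG, TGA), and the 45-60% window thresholds are simply taken from the feature description. It assumes Java 9 or later (for `Set.of`).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class for illustration only; not part of DNAnalyzer.
final class SequenceSketch {
    // Standard start codon and the three standard stop codons (DNA alphabet).
    static final String START = "ATG";
    static final Set<String> STOPS = Set.of("TAA", "TAG", "TGA");

    /** Parses FASTA text (single-line or multi-line records) into header -> sequence. */
    static Map<String, String> parseFasta(String text) {
        Map<String, String> records = new LinkedHashMap<>();
        String header = null;
        StringBuilder seq = new StringBuilder();
        for (String line : text.split("\\R")) {
            line = line.trim();
            if (line.isEmpty()) continue;
            if (line.startsWith(">")) {
                // A new header closes the previous record, if any.
                if (header != null) records.put(header, seq.toString());
                header = line.substring(1).trim();
                seq.setLength(0);
            } else if (header != null) {
                seq.append(line.toUpperCase());
            }
        }
        if (header != null) records.put(header, seq.toString());
        return records;
    }

    /** Fraction of G and C bases in the sequence, in [0, 1]; 0 for an empty sequence. */
    static double gcContent(String dna) {
        if (dna.isEmpty()) return 0.0;
        long gc = dna.chars().filter(c -> c == 'G' || c == 'C').count();
        return (double) gc / dna.length();
    }

    /** Start offsets of non-overlapping windows whose GC fraction lies in [min, max]. */
    static List<Integer> gcWindows(String dna, int window, double min, double max) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i + window <= dna.length(); i += window) {
            double gc = gcContent(dna.substring(i, i + window));
            if (gc >= min && gc <= max) hits.add(i);
        }
        return hits;
    }

    /** Finds the first ATG and the first in-frame stop codon after it.
     *  Returns {start, endExclusive}, or null if no complete frame exists. */
    static int[] firstOpenReadingFrame(String dna) {
        int start = dna.indexOf(START);
        if (start < 0) return null;
        for (int i = start + 3; i + 3 <= dna.length(); i += 3) {
            if (STOPS.contains(dna.substring(i, i + 3))) return new int[] {start, i + 3};
        }
        return null;
    }

    public static void main(String[] args) {
        // A made-up two-line FASTA record used purely as sample input.
        String dna = parseFasta(">demo\nCCATGGCG\nTTTTAAGG\n").get("demo");
        int[] orf = firstOpenReadingFrame(dna);
        System.out.println("GC content: " + gcContent(dna));
        System.out.println("ORF: " + dna.substring(orf[0], orf[1]));
        System.out.println("GC-rich windows: " + gcWindows(dna, 8, 0.45, 0.60));
    }
}
```

The sketch scans only the forward strand and reports only the first reading frame; a fuller analysis would also cover the reverse complement and all three frame offsets.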

Quick Introduction to DNA

DNA

DNA, present in most cells of the body, holds the blueprint for creating over 200 distinct cell types. It works much like a programming language, but one exclusive to living organisms. With the aid of machine learning (ML), we can decode and comprehend DNA, leading to life-saving discoveries and valuable insights.

Databases

A DNA database is crucial for interpreting DNA sequences. By leveraging machine learning, predictions can be made on previously unseen DNA sequences. This is the foundation on which modern DNA analysis programs operate.

Getting Started

Please refer to the Getting Started document for more information on how to use DNAnalyzer.

Future Support and Improvements

Optimized SQL Database for Genomic Data

Our goal is to find the SQL database fork best suited to high-performance workloads and vertical scaling. We will store and query genomic data from thousands of species, including their genes and mutations. This will help us train our machine learning model more effectively.

Improved Neural Network for Genotyped Data

This will make it possible to use genotyped data from third-party DNA testing services with our algorithm. In the future, a simple $99 DNA test will be all you need to use every feature of DNAnalyzer.

DIAMOND Implementation, a BLAST fork

This will combine DIAMOND's performance advantage with BLAST's algorithm.

Citations

View our in-line citations in the Citations document.

Contributing

Terms of Use

You are solely responsible for your use of this application, including all actions and consequences that arise from it. While the DNAnalyzer Team is dedicated to addressing significant issues, whether reported by users or revealed by new research, it cannot be held accountable for any losses users may experience due to the application's use, irrespective of circumstances. For further inquiries, please email help@dnanalyzer.org.

If you use this software in your research, we request that you provide the appropriate citation.

Copyright © Piyush Acharya 2024. DNAnalyzer is a fiscally sponsored 501(c)(3) nonprofit (EIN: 81-2908499). Licensed under the MIT License.
