Sample code for analyzing VCF (Variant Call Format) files in Azure Synapse after converting them to Parquet with Glow.
- Convert VCF files to Parquet: ConvertVCFsToParquet.md
- Create an External Table over the VCF-based Parquet Files in Azure Synapse: CreateVCFTable.md
- Sample SQL Queries: SampleQueries.md
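For orientation, the conversion step listed above can be sketched in PySpark roughly as follows. This is a minimal sketch, not the code from ConvertVCFsToParquet.md: it assumes a Spark cluster with the Glow library installed (Glow provides the `vcf` data source), and the helper names `parquet_path_for` and `convert_vcf_to_parquet`, the example paths, and the output naming scheme are all hypothetical.

```python
import posixpath


def parquet_path_for(vcf_path, parquet_root):
    """Derive an output folder for one VCF file (hypothetical naming scheme)."""
    name = posixpath.basename(vcf_path)
    # Drop the VCF extension, including the common compressed variants.
    for suffix in (".vcf.gz", ".vcf.bgz", ".vcf"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    return posixpath.join(parquet_root.rstrip("/"), name + ".parquet")


def convert_vcf_to_parquet(spark, vcf_path, parquet_root):
    """Read one VCF with Glow's 'vcf' data source and write it out as Parquet.

    `spark` is an existing SparkSession on a cluster where Glow is installed
    (an assumption of this sketch).
    """
    out_path = parquet_path_for(vcf_path, parquet_root)
    variants = spark.read.format("vcf").load(vcf_path)
    variants.write.mode("overwrite").parquet(out_path)
    return out_path
```

On such a cluster, a call like `convert_vcf_to_parquet(spark, "/data/vcf/chr22.vcf.gz", "/data/parquet")` would write the variants to `/data/parquet/chr22.parquet`, which an external table in Azure Synapse could then point at. See ConvertVCFsToParquet.md for the steps actually used in this demo.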
The sample VCF data used in this demo comes from the Phase 3 release of the 1000 Genomes Project. The release includes ~168 GB of VCF data, which can be downloaded from the project's FTP site.
- This repository accompanies the BlueGranite blog post: https://www.bluegranite.com/blog/query-millions-of-genomic-variants-in-minutes-with-azure-synapse
- Demo video on YouTube: https://www.youtube.com/watch?v=4B-8cviFPYU
- Building a Genomics Data Lake in Azure eBook: https://www.bluegranite.com/genomics-data-lake-ebook
- BlueGranite Genomics Page: https://www.bluegranite.com/genomics