With the correct data structure for describing a generic GWAS output, this use case becomes a simple search problem. The standard data structure could be searched across all species-specific databases to look for results matching those the user specifies. The data structure needs to carry all the metadata described here, identifying the different types of GWAS experiments that could occur, so that the user can narrow down the parameters of their search. The tool might also need the capability to convert between different types of GWAS output so that they can be compared (where possible and reasonable). A cached index of the GWAS metadata could be built to make searches faster. Once all the relevant data is found and retrieved from all the databases, the tool can let the user play with it, filter it at a finer granularity, produce visualizations, save it as evidence for future claims, etc. With the right data standard, the search-and-retrieve part of the tool is the simplest part compared to all the other functions the tool could perform once it has the data.
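To make the idea of a "generic GWAS output" plus a cached metadata index a bit more concrete, here is a minimal sketch. It is not based on any existing standard or BrAPI endpoint: the class names, fields (`trait_term`, `reference_assembly`, `reports_pvalues`, etc.) and the `MetadataIndex.search` method are all hypothetical, chosen only to show how study-level metadata could let a user narrow a search before any association data is retrieved.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class GwasStudy:
    """Hypothetical metadata record describing one GWAS experiment."""
    study_id: str
    species: str
    trait_term: str           # assumed: an ID from a shared trait ontology
    reference_assembly: str   # reference/version the positions are reported on
    genotyping_platform: str
    reports_pvalues: bool     # False if only putatively associated variants are listed


@dataclass(frozen=True)
class Association:
    """One variant-trait association in a hypothetical generic GWAS format."""
    study_id: str
    chrom: str
    position: int
    pvalue: Optional[float]   # None when the study gives no significance level


class MetadataIndex:
    """In-memory cache of study metadata, keyed by trait term for fast lookup."""

    def __init__(self, studies: Iterable[GwasStudy]) -> None:
        self._by_trait: Dict[str, List[GwasStudy]] = {}
        for study in studies:
            self._by_trait.setdefault(study.trait_term, []).append(study)

    def search(self, trait_term: str, species: Optional[str] = None,
               require_pvalues: bool = False) -> List[GwasStudy]:
        """Return studies for a trait, optionally narrowed by species and p-value availability."""
        return [
            s for s in self._by_trait.get(trait_term, [])
            if (species is None or s.species == species)
            and (not require_pvalues or s.reports_pvalues)
        ]
```

Keying the index by trait term assumes searches start from a trait, as in this use case; in a federated setting the index would presumably be populated from metadata fetched from each participating site, and the matching `Association` records retrieved only for the studies that pass the filter.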
@BrapiCoordinatorSelby one aspect of the use case that I think makes it a little less simple is the comparison across species. I think you are more or less correct that defining a generic GWAS output (essentially variant positions with a measure of the statistical significance of their association with traits) would make a federated search across multiple data sources feasible. The more challenging part, I think, comes when trying to decide whether "hits" observed in different species have any meaningful correspondence to one another. Synteny is probably the main criterion most people would use to decide whether regions correspond, while broadly applicable trait ontologies would hopefully help with deciding whether phenotypes observed in different species are "homologous" (e.g. pod shattering in legumes vs. grain shattering in cereals). We developed an application that tries to handle some of these aspects, described in more detail at https://www.legumeinfo.org/blog/2022/02/17/zzbrowse.html. In principle, if the generic search you described were implemented by multiple sites with a consistent API, I think it would be possible to adapt applications such as the one linked above to obtain data in a federated manner. That's the dream, though it's probably worth mentioning that the application already makes use of microservices defined by another application that takes an explicitly federated approach to generating synteny across sites on demand.
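As a rough illustration of using synteny to decide whether hits in two species correspond, the sketch below projects a hit region from species A onto species B through precomputed syntenic blocks and then checks which species-B hits fall inside the projected interval. Everything here is hypothetical and deliberately simplified: the `SyntenicBlock` structure, the linear interpolation of coordinates within a block (which ignores inversions and local rearrangements), and the helper names are assumptions, not how the linked application works. Trait correspondence (e.g. via a shared ontology) would be a separate check.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Interval = Tuple[str, int, int]   # (chromosome, start, end)
Hit = Tuple[str, int]             # (chromosome, position)


@dataclass(frozen=True)
class SyntenicBlock:
    """Hypothetical pairwise syntenic block, assumed to be in the same orientation."""
    a_chrom: str
    a_start: int
    a_end: int
    b_chrom: str
    b_start: int
    b_end: int


def project_region(blocks: Sequence[SyntenicBlock], chrom: str,
                   start: int, end: int) -> List[Interval]:
    """Map an A-species region onto B via every block it overlaps.

    Uses linear interpolation within each block, a simplifying assumption
    that ignores inversions and local rearrangements.
    """
    projected = []
    for blk in blocks:
        if blk.a_chrom != chrom or blk.a_end < start or blk.a_start > end:
            continue  # block does not overlap the query region
        lo, hi = max(start, blk.a_start), min(end, blk.a_end)
        scale = (blk.b_end - blk.b_start) / max(1, blk.a_end - blk.a_start)
        b_lo = blk.b_start + round((lo - blk.a_start) * scale)
        b_hi = blk.b_start + round((hi - blk.a_start) * scale)
        projected.append((blk.b_chrom, b_lo, b_hi))
    return projected


def corresponding_hits(blocks: Sequence[SyntenicBlock], a_region: Interval,
                       b_hits: Sequence[Hit]) -> List[Hit]:
    """Return species-B hits that fall inside the projection of an A-species region."""
    regions = project_region(blocks, *a_region)
    return [(c, p) for (c, p) in b_hits
            if any(c == rc and rlo <= p <= rhi for rc, rlo, rhi in regions)]
```

For example, with a single block linking chr1:1,000,000-2,000,000 in species A to chr5:3,000,000-5,000,000 in species B, the A-species region chr1:1,400,000-1,500,000 projects to chr5:3,800,000-4,000,000, so a B-species hit at chr5:3,900,000 would be reported as a candidate correspondence.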
A user finds a region from a GWAS experiment with an apparent association to a trait of interest. They want to strengthen confidence in the result by looking for similar associations in other species and, if possible, to narrow a list of candidate genes. This requires both genomic and genetic data; the underlying phenotypic data are complex and may be inaccessible. Some GWAS experiments only report associations above a certain significance threshold, or a fixed number of variants representing the highest-ranking associations, while others report putatively associated variants without significance levels. Variants may be based on different genotyping technologies, and within-species results may be based on different references/versions and so need to be "lifted" over. Haplotypes and structural variation (pangenomics) may be important but not assayed directly. Can end users make choices about how to aggregate the data within a tool?
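One way a tool could expose aggregation choices to end users is sketched below: hits from several studies are grouped into fixed-size genomic windows, and the user decides the window size, the p-value cutoff, and whether variants reported without a significance level should count. The function and parameter names and the default values are illustrative assumptions. The sketch also assumes all positions are already on the same reference (i.e. lifted over where needed), and fixed windows are a simplification; a real tool might group by LD blocks or haplotypes instead.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Optional, Set, Tuple


class Hit(NamedTuple):
    study_id: str
    chrom: str
    position: int
    pvalue: Optional[float]  # None: the study reported the variant without significance


def aggregate_by_window(hits: Iterable[Hit], window_bp: int = 100_000,
                        max_pvalue: float = 1e-5,
                        include_unscored: bool = False) -> Dict[Tuple[str, int], int]:
    """Count how many distinct studies support each (chromosome, window index)."""
    windows: Dict[Tuple[str, int], Set[str]] = defaultdict(set)
    for h in hits:
        if h.pvalue is None:
            if not include_unscored:
                continue  # user chose to ignore variants without significance levels
        elif h.pvalue > max_pvalue:
            continue  # not significant enough under the user's cutoff
        windows[(h.chrom, h.position // window_bp)].add(h.study_id)
    return {key: len(studies) for key, studies in windows.items()}
```

Counting distinct studies rather than variants keeps a single densely genotyped study from dominating a window, and toggling `include_unscored` lets the user decide how much weight to give studies that only list putatively associated variants.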