Weird output #20
Comments
Hi Jonas Meisner, thanks for reaching out! The results do look strange indeed. Could we arrange a way to get the actual data you are using, and perhaps the results that the original ADMIXTURE produces? Please send me an email at dmasmont@stanford.edu. Regards,
Yes! The data is the newest version of the 1KGP. However, that also includes the related individuals, so I have attached the names of the original 2504 individuals, the 500K sampled sites, as well as a run from ADMIXTURE with K=5. Best,
Hey, thanks!
Hi @Rosemeis and @haneenih7,

Thanks to both of you for your interest and for testing the software!

@Rosemeis, thanks for pointing the issue out and for sending us your results along with the data! Having checked it out, it looks like the Q values initialized by the network were very different across runs, and sometimes too far off to be recoverable during training. We did not see this at all while developing the method, so we suspect it might be caused by a change introduced in a dependency.

To stabilize the results, we have added a "warmup training" stage for initialization, in which the encoder is trained in a supervised manner to estimate Q values, using as labels a function of the distance to the initial values of the P matrix in PCA space. This way we obtain a sensible initialization not only for the P matrix, but also for the encoder which computes Q. In practice, there is no change in how the algorithm is called from the CLI. Convergence checks are now also performed only from epoch 15 onwards, to avoid stopping too early.

This is how the results look for the data you provided, using the default parameters you were using (for K=5):

Let me know in case of any follow-up; happy to discuss the issue a bit more if necessary. To install the upgrades, simply run:

@haneenih7, regarding bootstrapping, there is currently no such option in Neural ADMIXTURE. I will open a new issue for the feature and we will try to publish it in the next release, along with cross-validation!
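For readers curious about the idea, here is a minimal, hypothetical sketch of such a supervised warmup: the encoder is fitted to soft pseudo-labels obtained from each sample's distance to the K initial "anchors" (initial P columns) in PCA space. All names (`warmup_encoder`, `pseudo_q_from_pca`, the toy network) are illustrative assumptions, not the actual Neural ADMIXTURE implementation.

```python
import torch
import torch.nn as nn

def pseudo_q_from_pca(X_pca: torch.Tensor, anchors: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    """Soft pseudo-labels: softmax over negative distances to the K anchors in PCA space."""
    # X_pca: (n_samples, n_components), anchors: (K, n_components)
    dists = torch.cdist(X_pca, anchors)              # (n_samples, K)
    return torch.softmax(-dists / temperature, dim=1)

def warmup_encoder(encoder: nn.Module, X: torch.Tensor, X_pca: torch.Tensor,
                   anchors: torch.Tensor, epochs: int = 10, lr: float = 1e-3) -> None:
    """Supervised warmup: fit encoder(X) to the pseudo-Q labels before the main training loop."""
    targets = pseudo_q_from_pca(X_pca, anchors)
    opt = torch.optim.Adam(encoder.parameters(), lr=lr)
    loss_fn = nn.KLDivLoss(reduction="batchmean")    # expects log-probabilities as input
    for _ in range(epochs):
        opt.zero_grad()
        log_q = torch.log_softmax(encoder(X), dim=1)  # encoder outputs K logits per sample
        loss = loss_fn(log_q, targets)
        loss.backward()
        opt.step()

# Toy usage with random data (G genotypes, K=5 populations).
if __name__ == "__main__":
    n, g, k, d = 100, 1000, 5, 8
    X = torch.randint(0, 3, (n, g)).float()          # toy genotype matrix
    X_pca = torch.randn(n, d)                        # toy PCA projection of X
    anchors = torch.randn(k, d)                      # initial P columns projected into PCA space
    encoder = nn.Sequential(nn.Linear(g, 64), nn.ReLU(), nn.Linear(64, k))
    warmup_encoder(encoder, X, X_pca, anchors)
```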
Hi,
I have tested Neural ADMIXTURE on the 1000 Genomes Project data, but I am seeing some weird results. It is a simple dataset of the 2504 phase 3 individuals with 500,000 SNPs randomly sampled (MAF > 0.05) across all chromosomes. I see no issues when using standard ADMIXTURE or SCOPE, for example. The PCA plot output by Neural ADMIXTURE also looks fine. I have uploaded the admixture plots for K = 5, 6, 7, each with 10 runs using different seeds. Neural ADMIXTURE was run with all default parameter settings.
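As a reference for the site selection described above, here is a hedged sketch of filtering to MAF > 0.05 and randomly sampling 500,000 SNPs. The genotype layout (samples x SNPs, values in {0, 1, 2}) and the names (`sample_sites`, `G_toy`) are assumptions for illustration, not the exact pipeline used for this report.

```python
import numpy as np

def sample_sites(G: np.ndarray, n_sites: int = 500_000, maf_min: float = 0.05,
                 seed: int = 42) -> np.ndarray:
    """Return sorted column indices of randomly sampled SNPs passing the MAF filter."""
    freq = G.mean(axis=0) / 2.0                   # alternate-allele frequency per SNP
    maf = np.minimum(freq, 1.0 - freq)            # minor-allele frequency
    candidates = np.flatnonzero(maf > maf_min)
    rng = np.random.default_rng(seed)
    n_keep = min(n_sites, candidates.size)
    return np.sort(rng.choice(candidates, size=n_keep, replace=False))

# Example on a small toy matrix (the real data would have 2504 rows).
G_toy = np.random.default_rng(0).integers(0, 3, size=(100, 2_000))
idx = sample_sites(G_toy, n_sites=500)
G_sub = G_toy[:, idx]
```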
Environment
Command example