Getting visualizations from saved report taking very long #465

Open
gnomesanta opened this issue Oct 12, 2023 · 1 comment
Labels
question (General question about the software), under discussion (Issue is currently being discussed)

Comments

@gnomesanta

Environment details

  • SDMetrics version: 0.11.1
  • Python version: 3.10.6
  • Operating System: Ubuntu 22.04.1

Problem description

Hi, I already have a saved QualityReport after generating my synthetic data. The QualityReport pkl file is about 40MB.

I am trying to extract the visualization for Column Pair Trends, but the call takes a very long time to return. I am not sure whether this is normal or whether there is an issue with the method.

What I already tried

x = report.get_visualization('Column Pair Trends')
x.write_image('test1.png')
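
One way to narrow this down is to time the two calls separately, since building the figure and exporting it to PNG are distinct steps. The sketch below is only an illustration: `timed` is a hypothetical helper, not part of SDMetrics, and the commented usage assumes `report` is the already loaded QualityReport from the snippet above.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Hypothetical usage, assuming `report` is the loaded QualityReport:
# fig, t_build = timed(report.get_visualization, 'Column Pair Trends')
# _, t_export = timed(fig.write_image, 'test1.png')
# print(f'build: {t_build:.1f}s, export: {t_export:.1f}s')
```

Comparing the two timings would show whether the delay comes from `get_visualization` or from `write_image`.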

gnomesanta added the new and question labels on Oct 12, 2023
@npatki
Contributor

npatki commented Oct 12, 2023

Hi @gnomesanta, could you describe the dataset you're using in a little more detail? Is it single table or multi table? How many columns are there in each table?

The Column Pair Trends property evaluates the correlation across every pair of columns in a table. So if a table has a lot of columns, say 500, then you're looking at roughly 125K unique pairwise comparisons (500 × 499 / 2). This would explain the size of the report and the time taken for visualization.

(Rest assured that the report is not saving the actual data. Only the scores.)
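
For a sense of scale, here is a rough sketch of how the number of column pairs grows with the column count. `pair_counts` is a hypothetical helper written for illustration, not an SDMetrics function. It reports both the unique unordered pairs and the cell count of a full n × n grid, since either figure may be the relevant one depending on how the pairs are counted or drawn.

```python
from math import comb


def pair_counts(n_columns):
    """Return (unique unordered pairs, cells in a full n x n grid)."""
    unique_pairs = comb(n_columns, 2)   # n * (n - 1) / 2
    grid_cells = n_columns * n_columns  # every ordered pair, including self-pairs
    return unique_pairs, grid_cells


print(pair_counts(500))  # (124750, 250000)
```

Either way the growth is quadratic, so doubling the number of columns roughly quadruples the work.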

npatki added the under discussion label and removed the new label on Oct 12, 2023