PDL::Core::dog doesn't scale #421
Comments
For comparison, within |
Here's a plot of user time vs N, where the code run is This was run on an AMD Epyc 7763 w/ 128 cores & 638GB, using 100 parallel processes. Noise might be due to clocking of cores at different speeds (3244MHz vs. 2445MHz). Code to generate the data: |
I've attached a zip file with NYTProf output for N= 1_000, 10_000, and 100_000 |
My analysis of the 1e5 results: for 100k calls, |
This is really cool - how did you make it? |
For future reference, it's been added to the plot comment above. |
Thank you. That will be at least twice as valuable if you also include the rest, i.e. specifically how you turned those times into the plot shown? |
Check out the |
Thanks! This is the code, to avoid having to download things:
#! /usr/bin/env perl
use PDL;
use Path::Tiny;
use PDL::Graphics::PGPLOT;
my ( @n, @e, @u, @s );
# Each line of 'timelog' holds N, then elapsed, user, and system time.
for ( path('timelog')->lines ) {
    chomp;
    my ( $n, $e, $u, $s ) = split /\h+/;
    # Convert each [[h:]m:]s time string to hours, in place.
    for my $time ( $e, $u, $s ) {
        my @c = reverse split /:/, $time;
        $time = (pdl(@c) * 60**sequence(0+@c))->dsumover / 3600;
    }
    push @n, $n;
    push @e, $e;
    push @u, $u;
    push @s, $s;
}
my $n = PDL->new( @n );
my $e = PDL->new( @e );
my $u = PDL->new( @u );
my $s = PDL->new( @s );
# Uncomment one points/label_axes pair to choose which time to plot.
# points( $n, $e, { symbol => 2, symbolsize=>3});
# label_axes( 'N', 'Elapsed Time [hours]' );
points( $n, $u, { symbol => 2, symbolsize=>3});
label_axes( 'N', 'User Time [hours]' );
# points( $n, $s, { symbol => 2, symbolsize=>3});
# label_axes( 'N', 'System Time [hours]' ); |
Updated the plot in the above comment, just for kicks. |
Final update of the plot. Now the relationship looks really odd at higher N. Maybe there was process contention on the machine after all? I'll run it again with fewer processes and see if it changes. |
Multiple clear-ish lines is very confusing! |
@mohawk2 new results finished; see plot in comment w/ revised code. Much cleaner. |
Thank you. It does look like it might be O(n^2). Feel like making a log graph to see? :-) |
To be comparable, with the current code:
|
Using |
Without the C implementation of Within
PDL_START_CHILDLOOP(it)
    if (PDL_CHILDLOOP_THISCHILD(it) != trans) continue;
    /* stuff */
PDL_END_CHILDLOOP(it)
This will be because PDL uses this linked list to track an ndarray's child transforms:
typedef struct pdl_trans_children {
    pdl_trans *trans[PDL_NCHILDREN];
    struct pdl_trans_children *next;
} pdl_trans_children;
There's one inside the |
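To make the scaling argument concrete, here is a minimal sketch of why a linear scan over a chunked child list gives quadratic total work. It is not PDL's code: child_chunk, add_child, remove_child, total_steps, and the NCHILDREN value of 8 are all hypothetical stand-ins, and it assumes every add or remove walks the list from the head, as the child loop above suggests.

```c
#include <assert.h>
#include <stdlib.h>

#define NCHILDREN 8 /* assumed chunk size; stands in for PDL_NCHILDREN */

/* Hypothetical stand-in for pdl_trans_children: a chunked linked list. */
typedef struct child_chunk {
    void *slot[NCHILDREN];
    struct child_chunk *next;
} child_chunk;

/* Allocate an empty chunk with every slot set to NULL. */
static child_chunk *new_chunk(void) {
    child_chunk *c = malloc(sizeof *c);
    assert(c != NULL);
    for (int i = 0; i < NCHILDREN; i++) c->slot[i] = NULL;
    c->next = NULL;
    return c;
}

/* Store child in the first free slot; return how many slots were inspected. */
static long add_child(child_chunk *head, void *child) {
    long inspected = 0;
    for (child_chunk *c = head; ; c = c->next) {
        for (int i = 0; i < NCHILDREN; i++) {
            inspected++;
            if (c->slot[i] == NULL) { c->slot[i] = child; return inspected; }
        }
        /* Chunk is full: make sure there is a next one to move on to. */
        if (c->next == NULL) c->next = new_chunk();
    }
}

/* Find child by linear scan and clear its slot; return slots inspected. */
static long remove_child(child_chunk *head, void *child) {
    long inspected = 0;
    for (child_chunk *c = head; c != NULL; c = c->next) {
        for (int i = 0; i < NCHILDREN; i++) {
            inspected++;
            if (c->slot[i] == child) { c->slot[i] = NULL; return inspected; }
        }
    }
    return inspected;
}

/* Total slots inspected when n (> 0) children are added, then removed
   last-to-first, so each removal scans past all earlier children. */
long total_steps(long n) {
    child_chunk *head = new_chunk();
    char *ids = malloc((size_t)n); /* n distinct addresses act as children */
    assert(ids != NULL);
    long inspected = 0;
    for (long i = 0; i < n; i++) inspected += add_child(head, ids + i);
    for (long i = n - 1; i >= 0; i--) inspected += remove_child(head, ids + i);
    while (head != NULL) { child_chunk *next = head->next; free(head); head = next; }
    free(ids);
    return inspected;
}
```

In this sketch the k-th child sits k slots from the head, so total_steps(n) comes to n(n+1): total_steps(100) is 10100 and total_steps(200) is 40200. Doubling the number of children roughly quadruples the work, which is the shape the plots show.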
For this,
time perl -MPDL -e 'zeroes(X)->dog'
is being discussed. This is prompted by an observation from @djerius.
X=1e4,6 takes no appreciable time. X=6,1e4 takes around 5s. The current implementation of dog does a Perl map slicing over the highest dimension. This reveals two problems:
dog be a PP operation, because we still don't have a notation for a Par to be expressed in a way that would help, nor do we have a way to return many ndarrays
A partial solution to the second point would be to pass in an OtherPar as the input (with no Pars at all), and an OtherPar[o] pdl *retval[] where retval_count gets set automatically as with any OtherPar that's a [], but used for output. The typemap would turn that into an AVREF populated with new ndarray SVs. The input version of that would allow a PP cat operation as well.