Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Large Memory and Speed performance regression from 2.5.3 to 2.6.3 #122

Open
azrazalea-debtbook opened this issue Feb 21, 2024 · 6 comments

Comments

@azrazalea-debtbook
Copy link

azrazalea-debtbook commented Feb 21, 2024

We upgraded to rails 7.1.3 and at the same time I upgraded our version of creek from 2.5.3 to 2.6.3.

After deploying the new rails version, we noticed our memory usage going up significantly as well as our time to process spreadsheets.

I tested the two versions in irb isolation and saw no difference, with a slight improvement for 2.6.3. However, when in docker the difference is massive.

(See #122 (comment), my testing methodology wasn't great for local Mac and the difference is less but still large. This is not just docker)

[3] [app][development] pry(main)> Creek::VERSION
=> "2.5.3"
[4] [app][development] pry(main)> time = Time.now; creek = Creek::Book.new('performance-test.xlsx'); puts(Time.now - time);
8.105815712
[5] [app][development] pry(main)> time = Time.now; creek.sheets[0].simple_rows.map(&:inspect); puts(Time.now - time);
38.974233879
[6] [app][development] pry(main)>
VmPeak:	  709908 kB
VmSize:	  593128 kB
VmLck:	       0 kB
VmPin:	       0 kB
VmHWM:	  616644 kB
VmRSS:	  518568 kB
VmData:	  559256 kB
VmStk:	    8188 kB
VmExe:	       4 kB
VmLib:	   24516 kB
VmPTE:	    1372 kB
VmSwap:	       0 kB
[1] [app][development] pry(main)> Creek::VERSION
=> "2.6.3"
[2] [app][development] pry(main)> time = Time.now; creek = Creek::Book.new('performance-test.xlsx'); puts(Time.now - time);
8.186497337
[3] [app][development] pry(main)> time = Time.now; creek.sheets[0].simple_rows.map(&:inspect); puts(Time.now - time);
66.137462044
[4] [app][development] pry(main)>
VmPeak:	 3852528 kB
VmSize:	  603644 kB
VmLck:	       0 kB
VmPin:	       0 kB
VmHWM:	 3765544 kB
VmRSS:	  529308 kB
VmData:	  569772 kB
VmStk:	    8188 kB
VmExe:	       4 kB
VmLib:	   24516 kB
VmPTE:	    1520 kB
VmSwap:	       0 kB

As you can see in the above, the problem appears to be the row generator. Also, the peak memory usage went from 709,000 KB to 3,852,000 KB, 5 times more memory!

Here is the spreadsheet I used for testing (a subset of chicago's public crime data) https://drive.usercontent.google.com/download?id=1zWiCRYCS7Vs9EPZyOqcKr-eT59F7w3dm&export=download (give it a minute to start downloading, google has to virus scan it first and it takes forever)

@azrazalea-debtbook
Copy link
Author

More system information:

Ruby 3.2.2
Docker arm64 alpine linux on Mac OS X host via Docker Desktop

@azrazalea-debtbook
Copy link
Author

Related to #121 but more specific and I was able to find some data I can show you to reproduce it!

@azrazalea-debtbook azrazalea-debtbook changed the title [RAILS ONLY] Large Memory and Speed performance regression from 2.5.3 to 2.6.3 [Docker ONLY] Large Memory and Speed performance regression from 2.5.3 to 2.6.3 Feb 21, 2024
@azrazalea-debtbook
Copy link
Author

azrazalea-debtbook commented Feb 21, 2024

Found that this is actually docker not rails. I spun up a fresh rails instance and wasn't producing it locally. Then, I used the rails default dockerfile modified to strip it back closer to essentials.

This is on arm64 architecture but I have the same problem in our production amd64 architecture docker images too.

# syntax = docker/dockerfile:1

# Make sure RUBY_VERSION matches the Ruby version in .ruby-version and Gemfile
ARG RUBY_VERSION=3.2.2
FROM registry.docker.com/library/ruby:$RUBY_VERSION-slim as base

# Rails app lives here
WORKDIR /rails

# Set production environment
ENV BUNDLE_DEPLOYMENT="1" \
    BUNDLE_PATH="/usr/local/bundle" \
    BUNDLE_WITHOUT="development"


# Throw-away build stage to reduce size of final image
FROM base as build

# Install packages needed to build gems
RUN apt-get update -qq && \
    apt-get install --no-install-recommends -y build-essential git libvips pkg-config

# Copy application code
COPY . .

# Final stage for app image
FROM base

# Install packages needed for deployment
RUN apt-get update -qq && \
    apt-get install --no-install-recommends -y curl libsqlite3-0 libvips && \
    rm -rf /var/lib/apt/lists /var/cache/apt/archives

# Copy built artifacts: gems, application
COPY --from=build /usr/local/bundle /usr/local/bundle
COPY --from=build /rails /rails

# Run and own only the runtime files as a non-root user for security
RUN useradd rails --create-home --shell /bin/bash && \
    chown -R rails:rails db log storage tmp
USER rails:rails

RUN gem install creek

CMD ["bash"]

And built it docker build .

Then ran docker docker run -it <image sha>

Finally inside docker I ran irb:

rails@5058bc9c86b6:/rails$ irb

Inside irb I did:

irb(main):001:0> require 'creek'
=> true
irb(main):003:0> time = Time.now; creek = Creek::Book.new('./performance-test.xlsx'); puts(Time.now - time);
7.558300586
=> nil
irb(main):004:0> time = Time.now; creek.sheets[0].simple_rows.map(&:inspect); puts(Time.now - time);
52.017292079
=> nil

In a separate shell to the same docker container (docker exec -it <container SHA (NOT image sha> bash) I did:

rails@5058bc9c86b6:/rails$ ps -wef
UID        PID  PPID  C STIME TTY          TIME CMD
rails        1     0  0 17:07 pts/0    00:00:00 bash
rails       18     1 26 17:08 pts/0    00:01:00 irb
rails       20     0  0 17:12 pts/1    00:00:00 bash
rails       27    20  0 17:12 pts/1    00:00:00 ps -wef
rails@5058bc9c86b6:/rails$ grep -i vm /proc/18/status
VmPeak:	 3552164 kB
VmSize:	 3551392 kB
VmLck:	       0 kB
VmPin:	       0 kB
VmHWM:	 3408084 kB
VmRSS:	 3408084 kB
VmData:	 3462272 kB
VmStk:	    8188 kB
VmExe:	       4 kB
VmLib:	   12232 kB
VmPTE:	    6840 kB
VmSwap:	       0 kB

@azrazalea-debtbook
Copy link
Author

azrazalea-debtbook commented Feb 21, 2024

Note: I tried this in an amd_64 version of the same container and got the same result. 3GB RAM usage in 2.6.3, 800MB RAM in 2.5.3. Still on Mac OS X host though

@azrazalea-debtbook
Copy link
Author

azrazalea-debtbook commented Feb 21, 2024

Nevermind, I'm not used to Mac OS X and had trouble figuring out the peak memory usage.

Got gtime working though and amd64 Mac OS X Sonoma 14.3.1 ruby 3.2.2 shows a similar though smaller effect.

So this is not docker or rails specific, I just failed at figuring out peak memory usage on Mac and the speed didn't get way slower like it did in docker.

It is better locally at least. Only about 2.6x more memory not 5x.

lilithcarpenter@Liliths-MacBook-Pro tmp-rails % gtime -v irb
irb(main):001> require 'creek'
=> true
irb(main):002> Creek::VERSION
=> "2.6.3"
irb(main):003> time = Time.now; creek = Creek::Book.new('./performance-test.xlsx'); puts(Time.now - time);
6.727392
irb(main):004> time = Time.now; creek.sheets[0].simple_rows.map(&:inspect); puts(Time.now - time);
32.480844
irb(main):005>
	Command being timed: "irb"
	User time (seconds): 36.39
	System time (seconds): 2.42
	Percent of CPU this job got: 21%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 2:56.84
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 1691680
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 14
	Minor (reclaiming a frame) page faults: 654378
	Voluntary context switches: 1737
	Involuntary context switches: 20670
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 16384
	Exit status: 0
lilithcarpenter@Liliths-MacBook-Pro tmp-rails % gtime -v irb
irb(main):001> gem 'creek', '2.5.3'
=> true
irb(main):002> require 'creek'
=> true
irb(main):003> time = Time.now; creek = Creek::Book.new('./performance-test.xlsx'); puts(Time.now - time);
6.498211
irb(main):004> time = Time.now; creek.sheets[0].simple_rows.map(&:inspect); puts(Time.now - time);
27.670476
irb(main):005>
	Command being timed: "irb"
	User time (seconds): 34.16
	System time (seconds): 0.57
	Percent of CPU this job got: 53%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 1:05.09
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 647200
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 15
	Minor (reclaiming a frame) page faults: 44136
	Voluntary context switches: 3867
	Involuntary context switches: 1967
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 16384
	Exit status: 0

@azrazalea-debtbook azrazalea-debtbook changed the title [Docker ONLY] Large Memory and Speed performance regression from 2.5.3 to 2.6.3 Large Memory and Speed performance regression from 2.5.3 to 2.6.3 Feb 21, 2024
@azrazalea-debtbook
Copy link
Author

azrazalea-debtbook commented Feb 21, 2024

Identified commit 494ed05 as the culprit.

Version 2.6

irb(main):001> gem 'creek', '2.6.2'
=> true
irb(main):002> require 'creek'
=> true
irb(main):003> time = Time.now; creek = Creek::Book.new('performance-test.xlsx'); puts(Time.now - time);
6.848251
irb(main):004> time = Time.now; creek.sheets[0].simple_rows.map(&:inspect); puts(Time.now - time);
35.083837
irb(main):005>
	Command being timed: "irb"
	User time (seconds): 39.23
	System time (seconds): 2.26
	Percent of CPU this job got: 39%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 1:45.88
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 2344960
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 43
	Minor (reclaiming a frame) page faults: 632327
	Voluntary context switches: 797
	Involuntary context switches: 13858
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 16384
	Exit status: 0

Commit f8856ee

lilithcarpenter@Liliths-MacBook-Pro test-creek % gtime -v bundle exec irb
irb(main):001:0> require 'creek'
=> true
irb(main):002:0> time = Time.now; creek = Creek::Book.new('performance-test.xlsx'); puts(Time.now - time);
6.800597
=> nil
irb(main):003:0> time = Time.now; creek.sheets[0].simple_rows.map(&:inspect); puts(Time.now - time);
27.776927
=> nil
irb(main):004:0>
	Command being timed: "bundle exec irb"
	User time (seconds): 34.50
	System time (seconds): 0.51
	Percent of CPU this job got: 38%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 1:30.84
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 545728
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 29
	Minor (reclaiming a frame) page faults: 49358
	Voluntary context switches: 1595
	Involuntary context switches: 1773
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 16384
	Exit status: 0

Commit 494ed05

lilithcarpenter@Liliths-MacBook-Pro test-creek % gtime -v bundle exec irb
irb(main):001:0> require 'creek'
=> true
irb(main):002:0> time = Time.now; creek = Creek::Book.new('performance-test.xlsx'); puts(Time.now - time);
6.739267
=> nil
irb(main):003:0> time = Time.now; creek.sheets[0].simple_rows.map(&:inspect); puts(Time.now - time);
36.350612
=> nil
irb(main):004:0>
	Command being timed: "bundle exec irb"
	User time (seconds): 40.49
	System time (seconds): 2.09
	Percent of CPU this job got: 47%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 1:28.87
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 2022608
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 105
	Minor (reclaiming a frame) page faults: 624585
	Voluntary context switches: 747
	Involuntary context switches: 20804
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 16384
	Exit status: 0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant