Long import of +300k records #1509

Open
adrianjean opened this issue Sep 14, 2024 · 2 comments

adrianjean commented Sep 14, 2024
Description

I have a JSON file of 300,000+ records, each containing the following fields (a rough sample record is sketched after this list):

  • First name (plaintext)
  • Last Name (plaintext)
  • Address (plaintext)
  • Visitor ID (num)
  • Region (Category via Title string)
  • Preference ID (num)
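
For context, a single record looks roughly like the following (the key names are made up for illustration; the real keys in my file may differ):

{
    "firstName": "Jane",
    "lastName": "Doe",
    "address": "123 Example Street",
    "visitorId": 482913,
    "region": "Ontario",
    "preferenceId": 7
}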

Things I have done:

  • devMode is off.
  • /config/feed-me.php
<?php
return [
    '*' => [
        'compareContent' => true,
    ]
];
  • Type is Entry
  • The feed is set to Add Entry and Update Entry, using the Visitor ID as the unique identifier, since there are already a few thousand records in the system.
  • Hardware-wise, it's a DigitalOcean droplet with ample CPU and memory, running 20 other Craft CMS sites.
  • htop shows the CPU (4 cores) is barely being used (each core bouncing between 1-10%) and memory is just under half used.

Status

  • The import is running right now; I've made about 2% progress over the last 2 hours.

Questions:

  • Is this "normal" for this amount/type of data? I tried CSV before and it was about the same.
  • If it's not normal, what do you suggest I try to speed this import up?

Additional info

  • Craft version: Craft Pro 5.4.3
  • PHP version: 8.3.11
  • Database driver & version: MySQL 8.0.30
  • Plugins & versions:

AsyncQueue 4.0.0
Button Box 5.0.0
Campaign dev-develop
CKEditor 4.2.0
Code Field 5.0.0
Control Panel CSS 3.0.0
Cookies 5.0.0
Dashboard Begone 3.0.0
Expanded Singles 3.0.0
Feed Me 6.3.0
Field Manager 4.0.2
Formie 3.0.6
Icon Picker 3.0.1
Knock Knock 3.0.1
Neo 5.2.5
SendGrid 3.0.0
SEO 5.1.3
Single Cat 4.0.0
Sprig 3.5.0
Timber 2.0.2

adrianjean added the bug label Sep 14, 2024
adrianjean (Author) commented

Just to update this: I split my list into 20k chunks, and it's taking at least a day to import each chunk. The jobs often time out, so I've had to extend the timeout to 20 hours to get each chunk to process successfully.
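
For reference, here is a sketch of what raising the timeout looks like in /config/feed-me.php. I'm assuming the queueTtr and queueMaxRetry options described in the Feed Me docs still apply to 6.3.0, so treat the option names as unverified:

<?php
// Sketch only: queueTtr and queueMaxRetry are option names taken from the
// Feed Me docs and may not match this exact plugin version.
return [
    '*' => [
        'compareContent' => true,
        // Let each queue job run for up to 20 hours (value is in seconds)
        // before the queue treats it as failed.
        'queueTtr' => 72000,
        // How many times a failed or timed-out job is retried before giving up.
        'queueMaxRetry' => 3,
    ]
];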

If anyone has any suggestions on how to go about this better/faster, it would be much appreciated.

adrianjean (Author) commented

  • I have a DigitalOcean droplet with 32 GB of RAM and 4 vCPUs. Total CPU usage never really gets past 6%.
  • I'm running php-fpm with pool settings that are more than enough for the traffic on the server.
  • I'm using a DigitalOcean-hosted database, which I've also upgraded to 16 GB RAM / 6 vCPUs.

None of these changes seems to be making any noticeable improvement in the import speed.
I need to get this data in, and ultimately more data into the system, but doing it over days and weeks is not sustainable.

Is there anything else I can try?
