
Java/ Load big XER files: performance issue #506

Closed
alex-matatov opened this issue May 17, 2023 · 4 comments

Comments

@alex-matatov

Dear Jon,

First of all, thank you SO much for your work on the MPXJ library! It is great! We recently started using it, and so far so good, except for one case. When a P6 project file is huge (for example, we have a 73 MB XER), it takes more than 5 minutes just to read it:

var project = new UniversalProjectReader().read(content);  // 5..6 mins          

The cause is ProjectEntityContainer.getByUniqueID(). This method accounts for about 95% of the total execution time:

public T getByUniqueID(Integer id)
{
   if (m_uniqueIDMap.size() != size())
   {
      clearUniqueIDMap();
      for (T item : this)
      {
         m_uniqueIDMap.put(item.getUniqueID(), item);
      }
   }
   return m_uniqueIDMap.get(id);
}

As far as I understand, m_uniqueIDMap is used for caching tasks during parsing (which is fine), but it is cleared on each iteration and then refilled with all previously loaded tasks (which is not fine).
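The cost of this pattern can be demonstrated with a minimal, self-contained sketch (this is not MPXJ code; RebuildingCache and its method names are hypothetical). Invalidating the map after every mutation forces getByUniqueID() to rebuild it from scratch on the next lookup, so N add/invalidate/lookup cycles cost O(N²) map insertions, while a map that is simply kept up to date costs O(N):

```java
import java.util.HashMap;
import java.util.Map;

public class CacheDemo
{
   // Mimics the rebuild-on-lookup pattern described above (illustrative only).
   static class RebuildingCache
   {
      private final Map<Integer, String> m_cache = new HashMap<>();
      private final Map<Integer, String> m_items = new HashMap<>();

      void add(int id, String item)
      {
         m_items.put(id, item);
      }

      // analogous to clearUniqueIDMap() being called via invalidateCache()
      void invalidate()
      {
         m_cache.clear();
      }

      String get(int id)
      {
         if (m_cache.size() != m_items.size())
         {
            m_cache.clear();
            m_cache.putAll(m_items); // full rebuild: O(N) per lookup
         }
         return m_cache.get(id);
      }
   }

   public static void main(String[] args)
   {
      int n = 5000;

      RebuildingCache cache = new RebuildingCache();
      long start = System.nanoTime();
      for (int i = 0; i < n; i++)
      {
         cache.add(i, "task" + i);
         cache.invalidate();   // what each processFields() call effectively does
         cache.get(i);         // triggers a full rebuild on every call
      }
      long rebuildMs = (System.nanoTime() - start) / 1_000_000;

      // Same workload with a map that is maintained incrementally.
      Map<Integer, String> incremental = new HashMap<>();
      start = System.nanoTime();
      for (int i = 0; i < n; i++)
      {
         incremental.put(i, "task" + i); // map kept up to date on add
         incremental.get(i);             // O(1), nothing to rebuild
      }
      long incrementalMs = (System.nanoTime() - start) / 1_000_000;

      System.out.println("rebuild-on-lookup: " + rebuildMs + " ms");
      System.out.println("incremental:       " + incrementalMs + " ms");
   }
}
```

Even at only 5,000 entries the rebuilding variant performs roughly 12.5 million map insertions versus 5,000 for the incremental one, which is consistent with the minutes-versus-seconds difference reported for the 73 MB XER.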

PrimaveraReader.processTasks():

...
// first loop: load parent tasks (wbs)
for (Row row : wbs)
{
   // an empty Task is created and added to the list of tasks (TaskContainer)
   Task task = m_project.addTask();
   …

   // populate the Task with data from the file; clearUniqueIDMap() is called here
   // (AbstractFieldContainer.invalidateCache -> Task.invalidateCache).
   // Not a big deal yet; the problem comes in the second loop.
   processFields(m_wbsFields, row, task);
   …
}
…
// second loop: load tasks
for (Row row : tasks)
{
   Task task;
   Integer parentTaskID = row.getInteger("wbs_id");

   // the problem is here: even though we have already loaded N tasks,
   // m_uniqueIDMap is empty, and getTaskByUniqueID() re-populates the map again (see above)
   Task parentTask = m_project.getTaskByUniqueID(parentTaskID);
   …
   processFields(m_taskFields, row, task); // m_uniqueIDMap will be cleared here!

In order to fix this, I would propose the following changes (of course, it is up to you how best to do it):

  1. ProjectEntityContainer (ListWithCallbacks) could use only m_uniqueIDMap instead of m_list. I would say it is not necessary to duplicate data between two collections. I don’t think invalidateCache() would be necessary in this case.

  2. Change the logic around creating a Task (and the other project entities: Calendar, Resource, …).
    Ideally:
    Step 1: create the Task and fill it with data.
    Step 2: add it to the TaskContainer. With point #1, it would be added to m_uniqueIDMap (Map<UniqueID, Task/Resource/Calendar>); otherwise, it would be added to m_list (the current implementation) and to m_uniqueIDMap. Also, processFields() should not clear m_uniqueIDMap.

By the way, maybe this issue is somehow related to #266.

Cheers,
Alexander

@joniles
Owner

joniles commented May 17, 2023

@alex-matatov thanks for opening the issue, I will take a look as soon as I can. Would you be able to email me a large sample XER file so I can validate any changes I make? (Happy to sign an NDA if required.)

@alex-matatov
Author

alex-matatov commented May 17, 2023

I think you can reproduce it with any P6 file (both XML and XER). Basically, the problem with the cache clear operation is in ProjectEntityContainer.

By the way, I did a quick and dirty fix that avoids the cache clearing, and that XER was loaded in 4 seconds instead of 5 minutes.

@joniles
Owner

joniles commented May 22, 2023

@alex-matatov thanks again for opening the issue, I've merged some changes to address this and improve performance. They'll be available in the next MPXJ release.

@joniles joniles closed this as completed May 22, 2023
@alex-matatov
Author

Hi Jon,

Thank you SO much!

I've just checked "master", and indeed the 73 MB XER processing time was reduced to ~10 seconds. Good job :)

Cheers,
Alex
