
Java/ Load big XER files: performance issue #506

Closed
alex-matatov opened this issue May 17, 2023 · 4 comments

Comments

@alex-matatov

Dear Jon,

First of all, thank you SO much for your work on the MPXJ library! It is great! We recently started using it, and so far so good, except for one case. When a P6 project file is huge (for example, we have a 73 MB XER), it takes more than 5 minutes just to read it:

var project = new UniversalProjectReader().read(content);  // 5..6 mins          

The cause is ProjectEntityContainer.getByUniqueID(). This method accounts for about 95% of the total execution time:

public T getByUniqueID(Integer id)
{
   if (m_uniqueIDMap.size() != size())
   {
      clearUniqueIDMap();
      for (T item : this)
      {
         m_uniqueIDMap.put(item.getUniqueID(), item);
      }
   }
   return m_uniqueIDMap.get(id);
}

As far as I understand, m_uniqueIDMap is used for caching tasks during parsing (which is fine), but it is cleared on each iteration and then refilled with all previously loaded tasks (which is not fine).
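The cost of this pattern can be demonstrated with a minimal, self-contained sketch (this is not MPXJ code; RebuildingCache and its method names are hypothetical). Invalidating the map after every mutation forces getByUniqueID() to rebuild it from scratch on the next lookup, so N add/invalidate/lookup cycles cost O(N²) map insertions, while a map that is simply kept up to date costs O(N):

```java
import java.util.HashMap;
import java.util.Map;

public class CacheDemo
{
   // Mimics the rebuild-on-lookup pattern described above (illustrative only).
   static class RebuildingCache
   {
      private final Map<Integer, String> m_cache = new HashMap<>();
      private final Map<Integer, String> m_items = new HashMap<>();

      void add(int id, String item)
      {
         m_items.put(id, item);
      }

      // analogous to clearUniqueIDMap() being called via invalidateCache()
      void invalidate()
      {
         m_cache.clear();
      }

      String get(int id)
      {
         if (m_cache.size() != m_items.size())
         {
            m_cache.clear();
            m_cache.putAll(m_items); // full rebuild: O(N) per lookup
         }
         return m_cache.get(id);
      }
   }

   public static void main(String[] args)
   {
      int n = 5000;

      RebuildingCache cache = new RebuildingCache();
      long start = System.nanoTime();
      for (int i = 0; i < n; i++)
      {
         cache.add(i, "task" + i);
         cache.invalidate();   // what each processFields() call effectively does
         cache.get(i);         // triggers a full rebuild on every call
      }
      long rebuildMs = (System.nanoTime() - start) / 1_000_000;

      // Same workload with a map that is maintained incrementally.
      Map<Integer, String> incremental = new HashMap<>();
      start = System.nanoTime();
      for (int i = 0; i < n; i++)
      {
         incremental.put(i, "task" + i); // map kept up to date on add
         incremental.get(i);             // O(1), nothing to rebuild
      }
      long incrementalMs = (System.nanoTime() - start) / 1_000_000;

      System.out.println("rebuild-on-lookup: " + rebuildMs + " ms");
      System.out.println("incremental:       " + incrementalMs + " ms");
   }
}
```

Even at only 5,000 entries the rebuilding variant performs roughly 12.5 million map insertions versus 5,000 for the incremental one, which is consistent with the minutes-versus-seconds difference reported for the 73 MB XER.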

PrimaveraReader.processTasks():

...
// first loop: load parent tasks (wbs)
for (Row row : wbs)
{
   // an empty Task is created and added to the list of tasks (TaskContainer)
   Task task = m_project.addTask();
   …

   // populate the Task with data from the file; clearUniqueIDMap() is called here
   // (AbstractFieldContainer.invalidateCache -> Task.invalidateCache).
   // Not a big deal yet; the problem comes in the second loop.
   processFields(m_wbsFields, row, task);
   …
}
…
// second loop: load tasks
for (Row row : tasks)
{
   Task task;
   Integer parentTaskID = row.getInteger("wbs_id");

   // the problem is here: even though we have already loaded N tasks,
   // m_uniqueIDMap is empty, and getTaskByUniqueID() re-populates the map again (see above)
   Task parentTask = m_project.getTaskByUniqueID(parentTaskID);
   …
   processFields(m_taskFields, row, task); // m_uniqueIDMap will be cleared here!

In order to fix this, I would propose the following changes (of course, it is up to you how best to do it):

  1. ProjectEntityContainer (ListWithCallbacks) could use only m_uniqueIDMap instead of m_list. I would say it is not necessary to duplicate data between two collections. I don’t think invalidateCache() would be necessary in this case.

  2. Change the logic around creating a Task (and the other project entities: Calendar, Resource, …).
    Ideally:
    Step 1: create the Task and fill it with data.
    Step 2: add it to the TaskContainer. With point #1, it would be added to m_uniqueIDMap (Map<UniqueID, Task/Resource/Calendar>); otherwise, it would be added to m_list (the current implementation) and to m_uniqueIDMap. Also, processFields() should not clear m_uniqueIDMap.

By the way, maybe this issue is somehow related to #266.

Cheers,
Alexander

@joniles
Owner

joniles commented May 17, 2023

@alex-matatov thanks for opening the issue, I will take a look as soon as I can. Would you be able to email me a large sample XER file so I can validate any changes I make? (Happy to sign an NDA if required.)

@alex-matatov
Author

alex-matatov commented May 17, 2023

I think you can reproduce it with any P6 file (both XML and XER). Basically, the problem with the cache clear operation is in ProjectEntityContainer.

By the way, I did a quick and dirty fix that avoids the cache clearing, and that XER was loaded in 4 seconds instead of 5 minutes.

@joniles
Owner

joniles commented May 22, 2023

@alex-matatov thanks again for opening the issue, I've merged some changes to address this and improve performance. They'll be available in the next MPXJ release.

@joniles joniles closed this as completed May 22, 2023
@alex-matatov
Author

Hi Jon,

Thank you SO much!

I've just checked "master", and indeed the 73 MB XER processing time was reduced to ~10 seconds. Good job :)

Cheers,
Alex
