Slow XML import

Former Member
Hi,

We have a conversion mechanism running that converts legacy PostScript output to generic PDF with formatted XML (1 PDF = 1 XML, same naming).
On a daily basis, about 5000 files are generated and imported with an external file source job that uses the XML and XPath expressions for metadata.
It's all working nicely; however, the import into M-Files is so slo...o..w.....

Does anyone have experience speeding up this process, for example by increasing the number of files per import (say 500 instead of 100)?
The (virtual) hardware shouldn't be the issue, nor the SQL vault database.

We have a backlog of let's say a few million files, so any speed increase would be great.


PS: best wishes all!
  • Former Member
    That sounds about normal. We usually estimate import speed at ~1 document/second when we don't have details about the environment. The actual speed depends on many variables, such as the hardware specs, the network, the complexity of the metadata structure, and any event handlers or other scripts that need to run when a document is added to the vault.
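
To put the ~1 document/second figure next to the volumes mentioned in this thread (about 5000 files per day, a backlog of a few million), here is a minimal Python sketch of the arithmetic. The function names are made up for illustration, and the rate is only the rough estimate quoted above, not a measured value for this environment.

```python
def import_duration_seconds(file_count: int, docs_per_second: float = 1.0) -> float:
    """Seconds needed to import file_count files at a constant rate.

    The default rate of 1.0 is the rough estimate from the thread,
    not a measurement.
    """
    if file_count < 0 or docs_per_second <= 0:
        raise ValueError("file_count must be >= 0 and docs_per_second > 0")
    return file_count / docs_per_second


def format_duration(seconds: float) -> str:
    """Render a duration as days, hours and minutes (seconds are dropped)."""
    days, remainder = divmod(int(round(seconds)), 86_400)
    hours, remainder = divmod(remainder, 3_600)
    minutes = remainder // 60
    return f"{days}d {hours}h {minutes}m"


if __name__ == "__main__":
    # Daily volume from the thread.
    print(format_duration(import_duration_seconds(5_000)))      # 0d 1h 23m
    # One million files of backlog at the same rate.
    print(format_duration(import_duration_seconds(1_000_000)))  # 11d 13h 46m
```

At that rate the daily batch fits in under an hour and a half, but every million files of backlog costs roughly eleven and a half days of continuous importing.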

    The environment is not the issue. All data is local to the server, and there's nothing complex about importing plain-text properties. Server CPU doesn't go above 10% and memory is barely dented by the process.

    What would be the impact if we duplicated the import job into a second or third job and ran the imports in parallel? Is M-Files ready for concurrent threads in this manner, and would that increase import speed?
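
If parallel jobs turn out to be an option, each job would need its own source folder. The following is a minimal Python sketch of that preparation step only: it splits a backlog folder into one subfolder per job and keeps each PDF together with its same-named XML. The folder layout, the `job_N` names and the function names are assumptions for illustration; nothing here calls M-Files, and whether concurrent import jobs are safe or actually faster is not answered in this thread.

```python
import shutil
from pathlib import Path


def find_pairs(source: Path) -> list[tuple[Path, Path]]:
    """Return (pdf, xml) pairs that share a base name.

    PDFs without a matching XML are skipped and left in place.
    Matching on "*.pdf" is case-sensitive on most non-Windows systems.
    """
    pairs = []
    for pdf in sorted(source.glob("*.pdf")):
        xml = pdf.with_suffix(".xml")
        if xml.exists():
            pairs.append((pdf, xml))
    return pairs


def partition(source: Path, target_root: Path, job_count: int) -> list[int]:
    """Move pairs round-robin into target_root/job_1 .. job_N.

    Returns the number of pairs moved into each job folder.
    """
    if job_count < 1:
        raise ValueError("job_count must be at least 1")
    buckets = [target_root / f"job_{i + 1}" for i in range(job_count)]
    for bucket in buckets:
        bucket.mkdir(parents=True, exist_ok=True)

    counts = [0] * job_count
    for index, (pdf, xml) in enumerate(find_pairs(source)):
        slot = index % job_count
        # Move both files of a pair to the same folder so that a PDF
        # is never separated from its metadata XML.
        shutil.move(str(pdf), str(buckets[slot] / pdf.name))
        shutil.move(str(xml), str(buckets[slot] / xml.name))
        counts[slot] += 1
    return counts
```

A call such as `partition(Path("backlog"), Path("import"), 3)` would leave three folders of roughly equal size, each of which a separate import job could then be pointed at. Trying this on a small sample first would show whether throughput really improves before committing the full backlog.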