
M-Files as storage for frequent large documents

Hi there,

I was approached today with a requirement to store large technical reports containing picture snippets and summaries. These reports are usually 250-500 MB each and are in PDF format.
Another point is that these reports could easily be produced 1-10 times per month.
So if I take a middle case of 5 documents of 300 MB per month, that comes to 60 documents per year with a total size of around 17-18 GB.
I did some load tests with 200-250 MB files over the REST-based API and it was not particularly fast (obviously due to the size of the documents). I am currently not aware of the complexity of the metadata for that specific document type. This can of course influence upload speed as well.
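For reference, a minimal sketch of the kind of timing test I mean is below (Python with the requests library). The endpoint path and authentication header are placeholders, not the exact M-Files Web Service calls, so they would need to be checked against the MFWS documentation:

    # Rough timing of a large-file upload over HTTP (sketch only).
    # NOTE: the URL and auth header below are placeholders, not verified
    # M-Files Web Service endpoints - check the MFWS documentation.
    import os
    import time
    import requests

    VAULT_URL = "https://mfiles.example.com/REST"      # placeholder
    AUTH = {"X-Authentication": "<session token>"}     # placeholder

    def time_upload(path):
        """POST one file and return elapsed seconds and throughput in MB/s."""
        size_mb = os.path.getsize(path) / (1024 * 1024)
        start = time.monotonic()
        with open(path, "rb") as f:
            resp = requests.post(f"{VAULT_URL}/files", headers=AUTH,
                                 files={"file": f}, timeout=600)
        resp.raise_for_status()
        elapsed = time.monotonic() - start
        return elapsed, size_mb / elapsed

    if __name__ == "__main__":
        elapsed, rate = time_upload("report_300mb.pdf")
        print(f"upload took {elapsed:.1f} s ({rate:.1f} MB/s)")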
I am not aware of any document size limit in M-Files, but I am wondering whether M-Files is a proper solution for such a use case.
I am also worried because the vault will grow to be quite big, and this could affect other document types used in the system (50-100 document types are planned).

What are your experiences with large, frequently produced documents? I would appreciate any insights.

Dejan
  • Hi Dejan
    I have a customer who currently has more than 500 zip files sized 100-300 MB, and a few sized around 2 GB, in M-Files. At first we had an issue with too little free disk space on the server to handle the temporary file created while uploading. Once they increased the available space to an adequate level, it has worked just fine. The vault obviously has thousands of more commonly sized documents and receives over 100 new documents daily from automated import of files attached to emails. Those documents get related to other objects and some metadata is added manually, resulting in around 500 modified objects on a daily basis in an organization with about 30 employees. Most of the documents and other objects have 20-30 properties, and most of those are filled automatically.
    Obviously, you will need to size the server to handle those large documents just like you would in any other system. And obviously, it takes a little time to upload such files if the connection does not have enough bandwidth. But apart from that I would not be concerned about the size of the files or the metadata related to them. The danger lies not in the size but in the complexity that builds up when, over some years, you add more automated calculations and each time use the newest features available for that particular task.
    The company mentioned above has used M-Files for at least 6 years, from when they started out as a small firm developing a new product to now being a global player handling sales, procurement, production and service through subcontractors and vendors in many countries. When we first started out we had no idea where this was going to take us, and we have constantly added functionality and new features to the vault as they grew and added new functions to their organization. At this point we sometimes get surprised when a seemingly simple change in the vault structure suddenly creates unexpected consequences in features or functions built years ago. But file size has never been an issue as long as the server had the resources required to handle it.
    BR, Karl
  • Hi Karl,

    Thanks for the summary and insights. That is very encouraging.

    I am wondering if you could share the server setup (RAM, disk size, disk throughput, which I read is very important for fast storing), as I imagine the setup is key. I know about the general setup that M-Files recommends, which I suppose is good guidance. I would be very curious about the type of disks and the disk throughput, as documents would be saved to disk. In our case we would have a shared drive provided by NetApp which could grow, so we would be prepared for a growth scenario.

    Is the number of those 500 big zips constantly growing, or is it more or less fixed? 100 incoming documents per day is quite a solid number for standard documents, so that seems to be quite a solid throughput. I also wonder which search engine the client is using, dtSearch or IDOL. I suppose the indexing services are set up on a separate server so that the index can be rebuilt quickly as new documents arrive.

    Obviously users would not preview zip files. Do you have any experience with previewing big 300 MB files in preview mode? I did some testing and it freezes the client for some time on a VM with 16 GB of RAM (though I have to admit many other things are running in parallel; it is our testing VM). Since in our case we plan to upload PDFs, using preview mode would be a realistic scenario.
    I do agree with you about complexity: we currently have 2 effective document types with quite complex workflows. M-Files is amazing because it offers so much flexibility, but of course with that flexibility comes complexity as well. We are now trying to reduce the number of auto-calculated properties and also trying to re-use as much metadata as we can. Not an easy task when you have real scenarios already running on existing solutions. We would need to grow substantially (+50 document types, and then even more), so maintenance costs and maintenance best practices are points of concern for me as well.

    Nevertheless, what you have summarized sounds encouraging and has a lot of potential.

    Thanks again for your insights.

    Dejan


  • The number of large files is growing by approx. 2 per working day.

    None of the large files are associated with preview modules, so I can't say for sure how they would respond. However, it will take some time for the client to download the content and generate the preview, in particular if the whole file needs to be downloaded before the preview can be generated. On small files I get the impression that it takes more time to start the application than to generate the preview, but that would likely be the other way around for large files.

    The server is a virtual machine in MS Azure (not M-Files Cloud). Capacity is monitored and expanded as the need grows. The search engine is dtSearch. The database is SQL Server.
    The databases and temp files are placed on SSD; file storage is on HDD. I do not have details on disk throughput. Currently the server has 32 GB of RAM; it will need more in the near future. Free space on the file storage needs to be a fair bit larger than the largest file expected to be uploaded. We had a case at one time where the disk filled up with about 90 GB of temporary data while generating one of those large ZIP files, which was supposed to contain about 40 GB of data. The process could not complete and started over and over again, placing a heavy load on the server. You could actually watch in Explorer on the server how the free space on the drive filled up over a matter of minutes and then returned to about 90 GB. About 20 minutes later the process started over again for the next attempt. Amazingly the server did not halt, and users only noticed a slightly slower response while this was going on.
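    As a side note, the rule that free space must comfortably exceed the largest expected file is easy to keep an eye on automatically. A minimal sketch using only the Python standard library is below; the drive letter and threshold are assumptions you would adapt to your own environment:

        # Warn when free space on the file-storage volume drops below a threshold.
        # DRIVE and MIN_FREE_GB are placeholders for the actual environment.
        import shutil

        DRIVE = "E:\\"        # file storage volume (assumption)
        MIN_FREE_GB = 100     # should exceed the largest expected upload

        def check_free_space(drive=DRIVE, min_free_gb=MIN_FREE_GB):
            usage = shutil.disk_usage(drive)
            free_gb = usage.free / (1024 ** 3)
            if free_gb < min_free_gb:
                print(f"WARNING: only {free_gb:.0f} GB free on {drive}")
            else:
                print(f"OK: {free_gb:.0f} GB free on {drive}")

        if __name__ == "__main__":
            check_free_space()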
  • Thanks Karl! I really appreciate the level of detail.
    I have done some testing in Azure VMs as well, and performance decreased when we switched to HDD disks (we needed to save money). SSD is definitely the way to go; I would even prefer it for file storage, but as mentioned we would have NetApp shared drives, which should be fairly fast.
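    For what it is worth, a quick sequential-write check against the NetApp share would give a rough number to compare against the SSD results. A rough sketch is below; the target path and test size are placeholders, and zero-filled test data can overstate throughput if the storage compresses or deduplicates it:

        # Rough sequential-write benchmark for a storage path (sketch only).
        # TARGET and TEST_SIZE_MB are assumptions - point TARGET at the share to test.
        import os
        import time

        TARGET = r"\\netapp\mfiles_test\throughput.tmp"   # placeholder path
        TEST_SIZE_MB = 512
        CHUNK = b"\0" * (1024 * 1024)                      # 1 MB of zeros

        def write_benchmark():
            start = time.monotonic()
            with open(TARGET, "wb") as f:
                for _ in range(TEST_SIZE_MB):
                    f.write(CHUNK)
                f.flush()
                os.fsync(f.fileno())                       # force data to disk
            elapsed = time.monotonic() - start
            os.remove(TARGET)
            print(f"wrote {TEST_SIZE_MB} MB in {elapsed:.1f} s "
                  f"({TEST_SIZE_MB / elapsed:.0f} MB/s)")

        if __name__ == "__main__":
            write_benchmark()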
    I have also noticed that preview mode sometimes takes more time even for fairly small documents. But I have to admit that we convert to PDF for preview, as the Windows preview does not look nice for Word documents. I have also noticed that documents with a higher number of metadata placeholders take more time to preview as PDF. Unfortunately, not all placeholders get filled in, which is kind of odd. Users need to check out the document to get all placeholders filled in. The metadata behind them contains the values, but they do not get filled in during preview. Kind of strange. I wrote about this once, but no one could explain to me how the preview works and why it does not pick up and apply all metadata in the converted PDF.
    Sorry for the short detour about preview; it is a big topic for our users.
    Are the temp files for M-Files Server configured separately? I know about setting up the storage (file data location, secondary file data location).
    It is great to hear that even under such a heavy load, users did not experience a complete blackout.

    Thanks again Karl.

    Dejan
  • The temp files mentioned as being on SSD are those generated by the Windows server itself; they are not directly related to M-Files. It is something that the subcontractor responsible for the server has configured. I do not know exactly how or why, but I can see two SSD-based volumes on the server, one named Temporary Storage and one SQLVMLOG.

    Regarding the issue with previews not having all properties updated: first of all, the document needs to be updated before the PDF is generated. If you use PDF for preview of Word documents, you really need to ensure that those Word documents have been updated before they are saved to M-Files. There is a function in M-Files that can update properties inside the Word document, but I doubt very much that it will work if you use PDF for preview. You should discuss this in detail with M-Files if you want to depend on it.
    Recently I had a case where we had turned on Embedded Metadata Update for Word and Excel documents as described in www.m-files.com/.../enabling_metadata_field_updates_in_web_and_mobile.html. This should only affect users in Classic Web and mobile access. However, it turned out to block metadata updates on certain documents in Desktop access, apparently because the automatic metadata update collided with other automated functions. We were not able to change metadata from Desktop on those documents until we turned this feature off.

    One thing that is perhaps worth considering, if you need quick access to those large files from a location with poor bandwidth (I don't know whether that is the case), would be an on-premises server that can provide access over Gbit LAN connections. I am lucky to live in a place where the minimum WAN speed is 200 Mbps, but that is certainly not the case everywhere. With lower bandwidth comes slower access to large files. You could implement a hybrid setup.
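    To put some rough numbers on that, here is a back-of-envelope estimate of the transfer time for one 300 MB report at different link speeds (plain arithmetic, nothing M-Files specific):

        # Back-of-envelope transfer time for one 300 MB report at various link speeds.
        FILE_MB = 300

        for mbps in (10, 50, 200, 1000):          # WAN/LAN speeds in megabits/s
            seconds = FILE_MB * 8 / mbps          # megabytes -> megabits, then divide
            print(f"{mbps:>5} Mbps: ~{seconds:.0f} s per file")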
  • Hi Karl,
    Those are great points. Thank you very much.

    We will be on-premises on the intranet, so I hope it will work well.
    For preview, as you mentioned, the document needs to be saved before previewing. What users often expect is this: they change a value on the metadata card and want to preview the document again, but they still see the old value. It is hard to explain to people that they need to check out the document to see the new metadata there. That is the problem.
    What you mentioned about embedded metadata is interesting, but if it does not work well with the Desktop client, then again it is not very useful. I always thought that if we automate the PDF conversion, M-Files would pick up the latest metadata values, but unfortunately it does not work like that.

    Many thanks for the insights. That really helps.

    Dejan