Need to speed up metadata extraction

I have a project where the client wants to extract the metadata from one of their M-Files classes.  The vault currently holds 1.96 million files, and all I am trying to pull is the metadata for each file.  My tool was running much faster a few days ago, but yesterday it slowed dramatically: the estimate for pulling all the metadata went from about 3 days to 16 days.  So, my questions are: what causes the change in performance, and what are some best practices, and maybe COM functions, that would make this better?  They will be doing this multiple times a year.  I understand why the download takes some time, but I'm not sure whether there are better classes/objects for bulk-downloading metadata.

My metadata extraction uses the following code:

public string GetMetadataAsJson(ObjectVersion objectVersion, Vault vault, string filePath, string className)
{
    var metadataDict = new Dictionary<string, object>
    {
        ["MFilesFolderPath"] = filePath,
        ["ClassName"] = className
    };

    PropertyValues properties = vault.ObjectPropertyOperations.GetProperties(objectVersion.ObjVer);

    var propertyDefsCache = new Dictionary<int, string>();

    foreach (PropertyValue propertyValue in properties)
    {
        if (!propertyDefsCache.TryGetValue(propertyValue.PropertyDef, out string propertyDefName))
        {
            PropertyDef propertyDef = vault.PropertyDefOperations.GetPropertyDef(propertyValue.PropertyDef);
            propertyDefName = propertyDef.Name;
            propertyDefsCache[propertyValue.PropertyDef] = propertyDefName;
        }

        metadataDict[propertyDefName] = propertyValue.Value.DisplayValue;
    }

    return JsonConvert.SerializeObject(metadataDict, Formatting.Indented);
}

  • A couple of things:

    • I definitely agree with  : convert this to retrieve data in batches.  For reading data, batches in the hundreds should work.  It won't make it hundreds of times faster, but it'll make it faster.
    • Your current code creates a new propertyDefsCache on every call, so it is repopulated for each object.  That means tens of millions (at least, possibly hundreds of millions) of additional vault queries to get this data.  My suggestion would be to retrieve all of the property definitions once, up front, populate a static dictionary with them, and then use that cache inside your GetMetadataAsJson method.
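
Here is a rough sketch of what those two suggestions could look like together. Treat it as a starting point rather than a tested implementation. The MetadataExporter class and its method shape are made up for illustration. It also assumes that GetPropertiesOfMultipleObjects returns its results in the same order as the ObjVers collection passed in, which is worth verifying against your vault before relying on it. The caller is expected to split the 1.96 million objects into batches of a few hundred.

```csharp
using System.Collections.Generic;
using MFilesAPI;
using Newtonsoft.Json;

// Hypothetical wrapper class; not part of the M-Files API.
public class MetadataExporter
{
    private readonly Vault vault;

    // Property definition ID -> name, loaded once per run instead of once per object.
    private readonly Dictionary<int, string> propertyDefNames = new Dictionary<int, string>();

    public MetadataExporter(Vault vault)
    {
        this.vault = vault;

        // A single call retrieves every property definition in the vault.
        foreach (PropertyDef propertyDef in vault.PropertyDefOperations.GetPropertyDefs())
        {
            propertyDefNames[propertyDef.ID] = propertyDef.Name;
        }
    }

    // Returns one JSON string per object in the batch.
    // filePaths[i] is assumed to belong to batch[i].
    public List<string> GetMetadataAsJson(IList<ObjectVersion> batch, IList<string> filePaths, string className)
    {
        var objVers = new ObjVers();
        foreach (ObjectVersion objectVersion in batch)
        {
            objVers.Add(-1, objectVersion.ObjVer); // -1 appends to the collection
        }

        // One server round trip for the whole batch instead of one per object.
        PropertyValuesOfMultipleObjects allProperties =
            vault.ObjectPropertyOperations.GetPropertiesOfMultipleObjects(objVers);

        var results = new List<string>(batch.Count);
        int index = 0; // assumes results come back in input order
        foreach (PropertyValues properties in allProperties)
        {
            var metadataDict = new Dictionary<string, object>
            {
                ["MFilesFolderPath"] = filePaths[index],
                ["ClassName"] = className
            };

            foreach (PropertyValue propertyValue in properties)
            {
                // Fall back to the numeric ID if a definition is missing from the cache.
                string name = propertyDefNames.TryGetValue(propertyValue.PropertyDef, out string cached)
                    ? cached
                    : propertyValue.PropertyDef.ToString();
                metadataDict[name] = propertyValue.Value.DisplayValue;
            }

            results.Add(JsonConvert.SerializeObject(metadataDict, Formatting.Indented));
            index++;
        }

        return results;
    }
}
```

Create one MetadataExporter per run so the property definitions are fetched only once, then call GetMetadataAsJson for each batch of a few hundred object versions.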