The memory leaks I reported months back are finally fixed in this release. Now we actually have something usable.
Apparently there were two separate memory leaks: one in Java code and another in the native C code for the DJImport object. The Java memory leak was in the XML-DMS connector, which is what's used when you're pulling data from an XML data source.
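For anyone wondering why the native-side leak is the nastier of the two: the garbage collector only sees the small Java wrapper object, not the memory allocated on the C side, so nothing ever pushes it to clean up. Here's a minimal sketch of that failure mode in my own toy code (not Pervasive's), using direct buffers as a stand-in for C allocations:

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// A minimal sketch (not Pervasive's actual code) of the failure mode:
// each "lookup object" pins a chunk of off-heap memory that the garbage
// collector barely notices, because the Java-side wrapper is tiny.
// Retain 35,000 of them -- one per DJImport use in a run like ours --
// and you hold roughly 2 GB of native RAM.
// Run with -XX:MaxDirectMemorySize=3g to let it go the distance.
public class NativeLeakSketch {
    public static void main(String[] args) {
        List<ByteBuffer> retained = new ArrayList<>();
        for (int i = 0; i < 35_000; i++) {
            retained.add(ByteBuffer.allocateDirect(64 * 1024)); // ~64 KB each
        }
        // 35,000 x 64 KB ~= 2.2 GB held outside the Java heap; nothing short
        // of dropping the references (or exiting) gives it back to the OS.
        System.out.println("Holding " + retained.size() + " native buffers");
    }
}
```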
Basically, what all this means is that if you had a project prior to this release that iterated over many XML files to read data from them, and you were doing a couple of lookups with the DJImport object while the map processed each file, you would have run into trouble.
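To make the shape of the problem concrete, here's roughly what a job like ours looks like. XmlDmsReader and DjImportLookup are placeholders I invented for illustration, not Pervasive's real API; the structure is what matters:

```java
import java.io.File;
import java.util.List;

// Schematic of the job shape that hit both leaks: one map transformation
// per XML file, with several DJImport-style lookups inside each one.
public class XmlBatchJobSketch {

    // Stub for the XML-DMS side, where the Java-heap leak lived.
    static class XmlDmsReader {
        XmlDmsReader(File xml) { /* open and parse the XML file */ }
        List<String> records() { return List.of("record"); }
    }

    // Stub for a DJImport-backed lookup, where the native C leak lived.
    static class DjImportLookup {
        DjImportLookup(String key) { /* bind the lookup source */ }
        void resolve() { /* perform the lookup */ }
    }

    public static void main(String[] args) {
        File[] xmlFiles = new File("incoming").listFiles(
                (dir, name) -> name.endsWith(".xml"));   // thousands of files
        if (xmlFiles == null) return;

        for (File xml : xmlFiles) {
            XmlDmsReader reader = new XmlDmsReader(xml);
            for (String record : reader.records()) {
                // map logic transforming each record would go here
            }
            for (int i = 0; i < 7; i++) {       // 7 lookups per transformation
                new DjImportLookup(xml.getName()).resolve();
            }
        }
    }
}
```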
During development, we were processing about 5,000 files of roughly 250 KB each through the XML-DMS connector in a map. The map itself used the DJImport object 7 times per transformation, so our job was using the DJImport object 35,000 times in the main processing map and a few more times elsewhere.
The first run of the job takes about 70 to 80 minutes depending on resource conditions. Subsequent runs take about 30 to 40 minutes because we don't have the map re-process XML files it has already processed. Before the fix, after running the job just once we would see 2 GB or more of system RAM used up and not released. Running the job again would take up another 2 GB+, and so on, until the server ran out of RAM altogether. That is no longer the case with this release: after the job completes, the entire block of RAM the job was using is returned to the OS.
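If you want to verify this kind of fix yourself, one low-tech option is to watch the engine process's resident memory across runs. Here's a Linux-only sketch of my own (not something Pervasive ships; it assumes you pass the engine's PID as the argument):

```java
import java.nio.file.Files;
import java.nio.file.Paths;

// Samples a process's resident set size once a minute by reading
// /proc/<pid>/status. Before the fix, VmRSS kept climbing ~2 GB per
// job run and never came back down; after it, RSS should return to
// its baseline once each job completes.
public class RssWatcher {
    public static void main(String[] args) throws Exception {
        String pid = args[0];
        while (true) {
            for (String line : Files.readAllLines(
                    Paths.get("/proc/" + pid + "/status"))) {
                if (line.startsWith("VmRSS")) {
                    System.out.println(System.currentTimeMillis() + " " + line);
                }
            }
            Thread.sleep(60_000);  // sample once per minute
        }
    }
}
```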
There are still other bugs that I reported but received no feedback on, but at least we have something we can push to production now.
I can only surmise that there are few, or possibly no, other companies using this system as intensively, and in the same way, as we are; otherwise this issue likely would have been discovered sooner.
My experience with Pervasive's Data Integrator tools just proves once again that there are cases where one would be much better off using open-source software, as opposed to software where you are at the mercy of the vendor to fix critical bugs. You might find the vendor slow to fix a critical bug simply because their other customers don't use the software quite the way you do, even though you are only trying to get the software to do what the vendor advertises it can do.
If you're open to using an open-source ETL tool, you might want to check into this: http://www.jonathanlevin.co.uk/2008/03/open-source-etl-tools-vs-commerical-etl.html