Direct Connect Loads into OneStream

Kelley_M
New Contributor

When loading data into OneStream, our longest workflow takes ~40 minutes to load. We would like to try to kick off the 5 direct connect loads within this workflow to start simultaneously instead of sequentially. Have any of you organized your loads similarly and do you have tips or recommendations please?

1 ACCEPTED SOLUTION

MarcR
VIP

Hi Kelley, we actually just did this.
It is important that the loads sit in different workflows (workflow steps within one workflow don't run in parallel). You also need to make the load ID (the first item of your load file) dynamic.
OneStream advises a maximum of 3 parallel loads per server, so you could create two Data Management jobs pointed at two different servers, each with 3 threads (that's what we do).
We also use the direct load (if you don't expect transformation issues); it is also 20 to 30% faster with large data sets.
Beyond that, the transformation itself matters, so check the log after the load to see what takes the most time. If it is the parsing or the writing to the cube, then running in parallel will help you; if it is a complex transformation, then you can try to fix that in the SQL/source.

Marc Roest
OneStream consultant @Finext
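Marc's setup above can be sketched generically: fan the 5 loads out across two servers, capping each server at 3 concurrent loads. This is an illustrative Python sketch only; the server names, load names, and `run_load` body are hypothetical placeholders, not OneStream API calls.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle

SERVERS = ["DMServer1", "DMServer2"]            # hypothetical server names
LOADS = ["GL", "AP", "AR", "Payroll", "Sales"]  # the 5 direct connect loads
MAX_PER_SERVER = 3  # OneStream's advised cap on parallel loads per server

def run_load(load_name, server):
    # Placeholder for executing one Data Management load job.
    return f"{load_name} finished on {server}"

# Round-robin the 5 loads across the two servers; using one executor
# per server guarantees neither server exceeds the 3-thread cap.
assignments = list(zip(LOADS, cycle(SERVERS)))
executors = {s: ThreadPoolExecutor(max_workers=MAX_PER_SERVER) for s in SERVERS}
futures = [executors[srv].submit(run_load, name, srv) for name, srv in assignments]
results = [f.result() for f in futures]
for ex in executors.values():
    ex.shutdown()
```

With 5 loads and 2 servers, the round-robin puts 3 loads on the first server and 2 on the second, so the per-server cap is never hit.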

7 REPLIES

Kelley_M
New Contributor

Thank you Marc! I’ve copied this and sent it to my OneStream consultant.

Keep in mind that direct connects do not have drill-back capability, and the load is all-or-nothing. If you have level-2 data units in multiple workflows, those need to be sibling workflow profiles (similar to what is there currently).
Shameless plug: we do have a chapter on direct connects in our upcoming book.

Thank you Celvin. Do you need a “proofreader” for that chapter? LOL!

I meant direct load (not direct connect). It only keeps one table in Stage (only the summary is kept).

Clarke_Hair
Contributor

We have a scheduled BR that imports from 4 different ERPs into 50+ different WFs. Once finished, it marks the Import task as complete and sends out an email. I have seen this done two ways. One way was to take the base WFs and cycle through them, querying for the assigned entities. The other was separate BRs, each configured for an individual ERP, that would update those WFs with the assigned entities being pulled. We had to do it these ways because we update every hour in one case and every 4 hours in the other.
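The first approach Clarke describes (cycle through base workflows, querying assigned entities) could be sketched like this. Everything here is a hypothetical placeholder, not OneStream API; `get_assigned_entities` and `run_import` stand in for the real queries and load calls.

```python
# Hypothetical sketch of a scheduled rule that cycles through base
# workflow profiles and runs an import for each profile's assigned
# entities. All names are illustrative placeholders.
WORKFLOWS = {
    "Houston.Import": ["E100", "E110"],
    "Berlin.Import": ["E200"],
}

def get_assigned_entities(wf_name):
    # Stand-in for querying the workflow profile's entity assignments.
    return WORKFLOWS[wf_name]

def run_import(wf_name, entities):
    # Stand-in for the ERP pull and stage load for one workflow.
    return {"workflow": wf_name, "loaded": len(entities), "status": "Completed"}

results = [run_import(wf, get_assigned_entities(wf)) for wf in WORKFLOWS]
# A real rule would now mark each Import step complete and email a summary.
```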

FrankDK
Contributor

I would start by looking at which part of the load takes the longest. Is it the actual pulling of data from the source system, or the transformation rules, or something else? From there, start optimizing for performance. E.g., maybe you can have the source system aggregate data before you pull it. Maybe you have some transformation rules using the "Mask" option with BRs that can be optimized, etc.
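Frank's "measure first" advice can be sketched as a simple phase timer: wrap each phase, compare elapsed times, then optimize only the slowest one. The three phase functions below are stand-ins (the sleeps just simulate relative cost), not real extract or load code.

```python
import time

def timed(label, fn, *args):
    # Run fn and return its result plus (label, elapsed seconds).
    start = time.perf_counter()
    result = fn(*args)
    return result, (label, time.perf_counter() - start)

# Stand-ins for the three phases of a load; sleeps simulate relative cost.
def pull_from_source():
    time.sleep(0.05)
    return ["row"] * 1000

def transform(rows):
    time.sleep(0.02)
    return rows

def write_to_cube(rows):
    time.sleep(0.01)
    return len(rows)

rows, t_pull = timed("pull", pull_from_source)
rows, t_transform = timed("transform", transform, rows)
count, t_write = timed("write", write_to_cube, rows)
slowest = max([t_pull, t_transform, t_write], key=lambda p: p[1])[0]
```

If the slowest phase is the pull, look at the source system (e.g., pre-aggregation); if it is the transformation, look at the rules themselves.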