Forum Discussion

Kelley_M
New Contributor
3 years ago

Direct Connect Loads into OneStream

When loading data into OneStream, our longest workflow takes ~40 minutes to load. We would like to kick off the 5 direct connect loads within this workflow so they start simultaneously instead of sequentially. Have any of you organized your loads similarly, and do you have tips or recommendations, please?

  • Hi Kelley, we actually just did this.
    It is important that the workflow steps are in different workflows (different workflow steps within one workflow don’t run in parallel). You also need to make the load ID (the first item of your load file) dynamic.
    OneStream advises a maximum of 3 parallel loads per server, so you could create two DM jobs pointed at two different servers, each with 3 threads (that’s what we do).
    We also use the direct load (if you don’t expect transformation issues), which is also 20 to 30% faster with large data sets.
    Next to that, the transformation itself is important, so check the log after the load to see what takes the most time. If it is the parsing or the writing to the cube, then running in parallel will help you; if it is a complex transformation, then you can try to fix that in the SQL/source.
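    As a rough illustration (not our exact rule), a sketch of an extensibility BR that queues the Data Management sequences is below. The sequence names are placeholders, and BRApi.Utilities.StartDataMgmtSequence is assumed to queue each sequence asynchronously; verify the exact API and signature in your platform release.

      ' Minimal sketch, assuming five placeholder DM sequences (one per direct connect load)
      ' and an asynchronous StartDataMgmtSequence call -- confirm against your release's BRApi.
      ' (Standard OneStream BR template imports assumed.)
      Namespace OneStream.BusinessRule.Extender.ParallelDirectConnectLoads
          Public Class MainClass
              Public Function Main(ByVal si As SessionInfo, ByVal globals As BRGlobals, ByVal api As Object, ByVal args As ExtenderArgs) As Object
                  ' One DM sequence per direct connect load; each sequence targets its own
                  ' workflow, since steps inside a single workflow do not run in parallel.
                  Dim sequences As New List(Of String) From {"DC_Load_GL", "DC_Load_AP", "DC_Load_AR", "DC_Load_FA", "DC_Load_HR"}

                  For Each seqName As String In sequences
                      ' Assumed call: queues the sequence rather than waiting for it, so the
                      ' loads run side by side (split the sequences across two servers to
                      ' stay within ~3 parallel loads per server).
                      BRApi.Utilities.StartDataMgmtSequence(si, seqName, Nothing)
                  Next

                  Return Nothing
              End Function
          End Class
      End Namespace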

  • We have a scheduled BR that imports from 4 different ERPs into 50+ different WFs. Once finished, it marks the Import task as complete and sends out an email. I have seen this done two ways. One way was to take the base WFs and cycle through them, querying for assigned entities. The other was separate BRs that were each configured to an individual ERP and would update those WFs with the assigned entities being pulled. We had to do it these ways because we update every hour in one case and every 4 hours in the other.
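    For anyone wanting to build something similar, a heavily simplified sketch of the orchestration loop is below. The helper names (GetWorkflowsForErp, GetAssignedEntities, ImportWorkflow, MarkImportCompleteAndNotify) are hypothetical stand-ins for the actual BRApi workflow, status, and email calls, which vary by release; the point is only the structure.

      ' Hypothetical orchestration sketch; the helpers below stand in for real BRApi calls.
      Public Sub RunScheduledErpImports(ByVal si As SessionInfo, ByVal erpName As String)
          ' Hypothetical lookup of the base workflow profiles fed by this ERP.
          Dim baseWorkflows As List(Of String) = GetWorkflowsForErp(si, erpName)

          For Each wfName As String In baseWorkflows
              ' Hypothetical helper: entities assigned to this workflow profile.
              Dim assignedEntities As List(Of String) = GetAssignedEntities(si, wfName)

              ' Hypothetical helper: runs the import for those entities.
              ImportWorkflow(si, wfName, assignedEntities)

              ' Hypothetical helper: marks the Import task complete and sends the email.
              MarkImportCompleteAndNotify(si, wfName)
          Next
      End Sub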

  • Keep in mind that direct connects do not have drill-back capability, and it is also an all-or-nothing sort of load. If you have level-2 data units in multiple workflows, those need to be sibling workflow profiles (similar to what is there currently).
    Shameless plug: We do have a chapter on direct connects in our upcoming book.

    • Kelley_M
      New Contributor

      Thank you Celvin. Do you need a “proofreader” for that chapter? LOL!

      • I meant direct load (not direct connect). It keeps only one table in Stage (only the summary is kept).

  • I would start by looking at which part of the load takes the longest: the actual pulling of data from the source system, the transformation rules, or something else? From there, start optimizing for performance. E.g., maybe you can have the source system aggregate the data before pulling it, or maybe you have some transformation rules using the "Mask" option with BRs that can be optimized, etc.
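    As a small, invented example of pushing aggregation down to the source before pulling: instead of extracting line-level detail, the connector's source query could sum to the grain OneStream actually loads. The table and column names below are made up; substitute your ERP's schema.

      ' Invented example of a pre-aggregated source query for a connector rule.
      Dim sourceQuery As String =
          "SELECT CompanyCode, AccountCode, CostCenter, FiscalPeriod, SUM(Amount) AS Amount " &
          "FROM GL_JournalLines " &
          "WHERE FiscalYear = @FiscalYear " &
          "GROUP BY CompanyCode, AccountCode, CostCenter, FiscalPeriod"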