Direct Load Limits

Jorge_Haces
New Contributor II

Hi all,

Does anyone know about the limits of using the Direct Load method?

Like # of rows, memory, parallel loads, or something else.

Thanks in advance

3 REPLIES

JackLacava
Community Manager

Direct Load is actually more performant than a regular import and is recommended for larger jobs, so I wouldn't worry too much about problems. The only limits I know of are in error reporting: a maximum of 50,000 errors are stored, of which only 1,000 can actually be presented.

LeeB
Contributor II

You can only load one file per step in Direct Loads, and you can't load annotations.

TonyToniTone
Contributor II

The Direct Load method is faster than an Import, Validate, Load step, and the reasons it is faster can also be considered its limits. For example, the Direct Load process skips writing data to several Stage database tables, which saves time; so if you have a process that depends on data in those Stage tables, that's a limitation. Audits, history, and drill options are also impacted, so if you need any of those as a requirement, Direct Load is not the process for you.

From a benchmarking perspective, 540,278 data records loaded through 1 Workflow using Direct Load take ~7:22 minutes to reach the Cube (timings can vary wildly depending on hardware). Loading the same data records through 1 Workflow using Import, Validate, and Load takes ~2:00 minutes to Import and Validate plus ~6:10 minutes to load from Stage to the Cube, a total of ~8:10 minutes from start to finish vs. 7:22 minutes for Direct Load, so Direct Load is roughly a minute faster.

Regardless of the method used, the same principles apply when loading data through a Workflow: the more data records you need to import and load into a Cube, the more you should consider Workflow partitioning to break the data into smaller chunks and optimize the load by running the Workflows in parallel. In this scenario, 540,278 data records are executed through 1 Workflow; if we split that across 5 Workflows of ~108,000 records each, we can run those 5 Workflows in parallel on smaller subsets and finish faster. A rough sketch of that arithmetic is below.
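This isn't OneStream code, just a minimal Python sketch of the arithmetic above, using the record count and timings quoted in this thread; the 5-way split and per-Workflow size are illustrative assumptions only.

```python
# Illustrative arithmetic only -- the record count and timings come from the
# benchmark quoted above and are not guaranteed for any other environment.

TOTAL_RECORDS = 540_278

def to_seconds(minutes: int, seconds: int) -> int:
    """Convert an m:ss timing into seconds."""
    return minutes * 60 + seconds

# Direct Load: one step straight to the Cube (~7:22).
direct_load = to_seconds(7, 22)

# Import/Validate/Load: Import + Validate (~2:00), then Stage -> Cube (~6:10).
import_validate = to_seconds(2, 0)
stage_to_cube = to_seconds(6, 10)
classic_total = import_validate + stage_to_cube   # ~8:10

print(f"Direct Load total:        {direct_load} s")
print(f"Import/Validate/Load:     {classic_total} s")
print(f"Direct Load saves about:  {classic_total - direct_load} s")  # ~48 s

# Hypothetical partitioning: split the same records across 5 parallel Workflows.
WORKFLOWS = 5
per_workflow = TOTAL_RECORDS // WORKFLOWS
print(f"~{per_workflow:,} records per Workflow across {WORKFLOWS} Workflows")
```

Running it just confirms the numbers in the post: ~490 s vs. ~442 s end to end, and roughly 108,000 records per Workflow if you partition five ways.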