What is a Data Unit?

PFugereCSO
Contributor
4 years ago

To understand how OneStream performs many of its tasks and operations, including logic execution, consolidation, and data caching, it is critical to understand the concept of Data Units. A Data Unit is the unit of work for loading, clearing, calculating, storing, and locking data within the OneStream XF multidimensional engine. All the data in a Data Unit shares some common Point of View (POV) information. OneStream supports three levels of Data Unit granularity. As this relates to design, I will address only Level 1 (for the other levels, please refer to the OneStream XF Design and Reference Guide).

Data Unit – Level 1

This is the largest unit of work within the system and is usually thought of as a combination of Entity, Scenario, and Time, because users of financial analytic systems typically think about clearing, loading, calculating, and locking data by Entity, Scenario, and Time.

Dimensions of the Level 1 Data Unit Within a Cube

  • Consolidation.
  • Entity.
  • Scenario.
  • Parent.
  • Time.

These Level 1 Dimensions define the Data Unit, which consists of the stored data records for a given combination of their Members. When you reference any such combination, a “Data Unit” is created in the server’s memory. The server dynamically calculates the parent Members of Account, Flow, and the User-Defined Dimensions and generates a small Cube of this data. The larger the Data Unit, the greater the strain placed on the system.

You can estimate the maximum size of a Data Unit by multiplying the number of Members in the Account and User-Defined Dimensions, which gives all possible intersections. Thus, an application with many Accounts (for example, 10,000) and large Custom Dimensions (Custom 1 has 10,000 Members, Custom 2 has 7,500 Members, and so on) can produce exceptionally large Data Units. You will then need to evaluate the data to determine how many records are actually stored in each Data Unit. Evaluate this for both base and parent-level Entities, as the system handles them the same way.
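The estimate described above is plain multiplication, and a short Python sketch makes the scale easier to see. The member counts come from the example in the text; the stored-record count and the function names are hypothetical and are not part of any OneStream API.

```python
from math import prod


def potential_intersections(member_counts):
    """Upper bound on cells in one Data Unit: the product of the member counts."""
    return prod(member_counts.values())


def density(stored_records, potential):
    """Share of the potential intersections that actually hold a stored record."""
    return stored_records / potential


# Member counts taken from the example in the text.
member_counts = {"Account": 10_000, "Custom1": 10_000, "Custom2": 7_500}
potential = potential_intersections(member_counts)

# Hypothetical number of stored records found when evaluating the real data.
stored_records = 1_200_000

print(f"{potential:,} potential intersections")
print(f"density: {density(stored_records, potential):.8%}")
```

The potential figure is only a ceiling. The stored-record count, which you have to measure in the application itself, is what actually drives the work the server does.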

I like to explain the Data Unit as a page in a Workbook, because it is easier to picture.

 

Figure 1

 

Here is an example of a Data Unit, with each record shown as a single row in a Spreadsheet. Each record has a Member for every loadable Dimension and a data value. None of the parent Members appear in the rows. If you wrote a rule to loop over each record in this Data Unit (each row above), the rule would run only six times. Thinking this way helps keep the rules in OneStream ‘data-driven’: the volume of data dictates what rules run and when. This can be a very efficient way to design an application.
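As a rough illustration of the data-driven idea, the Python sketch below models a Data Unit as a list of six stored records and applies a rule once per record. The record layout, the Member names, and the amounts are invented for illustration; they are not the rows from the figure, and this is not how OneStream business rules are written.

```python
from typing import NamedTuple


class Record(NamedTuple):
    account: str
    flow: str
    ud1: str
    amount: float


# Six made-up stored, base-level records standing in for the rows of a Data Unit.
data_unit = [
    Record("Sales", "None", "ProductA", 1000.0),
    Record("Sales", "None", "ProductB", 750.0),
    Record("COGS", "None", "ProductA", 400.0),
    Record("COGS", "None", "ProductB", 300.0),
    Record("Rent", "None", "None", 120.0),
    Record("Payroll", "None", "None", 500.0),
]


def run_rule(records, rule):
    """Apply the rule once per stored record and return the results."""
    return [rule(record) for record in records]


# The rule runs only for cells that hold data, never for empty intersections.
results = run_rule(data_unit, lambda r: r._replace(amount=r.amount * 2))
print(len(results), "rule executions")
```

The number of executions follows the number of stored records, not the number of possible intersections, which is the point of a data-driven design.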

You can’t beat multiplicative math here. Adding one Dimension can create millions or more intersections of data. Even adding a Dimension with only four Members to the existing six records could increase that Data Unit from 6 records to 24.
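The 6-to-24 arithmetic can be checked with a small Python sketch that pairs every existing record with every Member of the new Dimension. This is the worst case, where each record holds data for all four new Members; the record and Member names are placeholders.

```python
from itertools import product

# Placeholder names for the six existing records and the four new Members.
existing_records = [f"record_{i}" for i in range(1, 7)]
new_dimension_members = ["M1", "M2", "M3", "M4"]

# Worst case: every existing record is populated for every new Member.
worst_case = list(product(existing_records, new_dimension_members))

print(len(existing_records), "records become up to", len(worst_case))
```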

The size and number of Data Units are what you are trying to manage. A Cube with exceptionally large User-Defined Dimensions – populated with a lot of data – will have large Data Units. A Cube with everything pushed into the Entity Dimension will have much smaller, but many more, Data Units. If the processor is spending all its time creating and managing these Data Units, because they are either big or numerous, it does not have capacity for anything else.

OneStream XF treats a zero as data, so it is strongly recommended to avoid loading or calculating cells with ‘hard-coded’ zero values. Dense Account or Custom Dimensions degrade performance because the application server must process and aggregate many more records. I would also be careful with allocation rules: while they will not populate the database with a lot of zeros, they could populate it with near-zero data. I define near-zero data as data that is not zero but is numerically insignificant. If a bad rule creates thousands of cells holding fractions of a penny, those cells do not increase the accuracy of the financial data, but they can slow the system down.
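One way to picture near-zero data is to classify the output of an allocation by magnitude. The Python sketch below does that; the half-cent cutoff, the function name, and the sample values are assumptions chosen for illustration, and a real threshold would depend on the currency and reporting precision.

```python
from collections import Counter

# Assumed cutoff: anything smaller than half a cent counts as near-zero.
NEAR_ZERO_THRESHOLD = 0.005


def classify(value, threshold=NEAR_ZERO_THRESHOLD):
    """Label a cell value as 'zero', 'near-zero', or 'significant'."""
    if value == 0:
        return "zero"
    if abs(value) < threshold:
        return "near-zero"
    return "significant"


# Made-up allocation results, including a few fractions of a penny.
allocated = [1250.00, 0.0004, -0.0021, 0.0, 310.75, 0.003]
summary = Counter(classify(v) for v in allocated)

print(dict(summary))
```

A rule could apply the same kind of test before writing a cell, so that insignificant amounts are never stored in the first place.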

 

 

Figure 2

 

OneStream provides detail on the zero data in the Data Unit Statistics, which are available from a custom Report or by right-clicking on a cell in a grid (see Figure 2). Monitor the number of zeros closely; if it spikes significantly or rises above 10% of the data, identify the source of the zeros and resolve it.
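The 10% guideline is simple to express as a check. The Python sketch below computes the share of zeros in a list of cell values and flags it when it exceeds the limit. The sample values and function names are hypothetical; in practice the counts would come from the Data Unit Statistics rather than from a Python list.

```python
def zero_share(values):
    """Fraction of cells whose value is exactly zero (0.0 for an empty list)."""
    if not values:
        return 0.0
    return sum(1 for v in values if v == 0) / len(values)


def needs_attention(values, limit=0.10):
    """True when zeros make up more than the limit (10% by default)."""
    return zero_share(values) > limit


# Made-up Data Unit contents: 17 non-zero cells and 3 zeros.
cells = [100.0] * 17 + [0.0] * 3

print(f"zero share: {zero_share(cells):.0%}")
print("investigate:", needs_attention(cells))
```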

It is important to note that the period is part of the Data Unit. So, if you loaded data in one month and did not load data in the subsequent months, the system will generate either a year-to-date or periodic zero for those months. While this is not real data, you will see that number if you loop over the cells of the Data Unit in your rules. Stored calculations also add cells of data that could require processing.

(Excerpt from Data Units, OneStream Design and Reference Guide, OneStream Software, 2016)

Updated 2 years ago
Version 3.0