
BLUEPRINT BULLETIN

ARCHITECT FACTORY

Release Date: 12/08/2020 (Updated)

Prepared by: Peter Fugere
Severity: Medium

Data Units

Purpose: The purpose of this bulletin is to explain how to build effective data units. The example application uses only a few entities and large user-defined dimensions, with data volumes that create large data units.

Key Recommendations:

OneStream XF manages the in-memory 18-dimension model through a concept called data units. Data units allow for highly efficient processing and viewing of data using application server caching. Each data unit comprises a fixed point from each of the page dimensions and all members of the data unit dimensions. These dimensions are listed in the following table:

Page dimensions        Data unit dimensions
---------------        --------------------
Scenario               Account
Time                   IC
Entity/Parent          UD1 – 8
Cons                   Period
                       View
                       Origin
                       Flow

The data unit consists of the stored data records for the above combination of populated dimensional intersections, not all possible intersections. Typical applications could have trillions of potential intersections, but the data is typically not very dense.
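To put numbers on that sparsity, the product of member counts gives the potential intersections, while the loaded record count gives the actual density. The sketch below uses entirely hypothetical member counts chosen only to illustrate the arithmetic, not figures from any real application:

# Hypothetical member counts, used only to illustrate the sparsity arithmetic.
members = {
    "Account": 2_000,
    "Flow": 50,
    "Origin": 5,
    "IC": 300,
    "UD1-8": 10_000,   # combined populated combinations across the eight UD dimensions
    "Period": 12,
}

potential = 1
for count in members.values():
    potential *= count            # potential intersections in one data unit

loaded_records = 1_000_000        # stored (populated) records in the data unit
density = loaded_records / potential

print(f"Potential intersections: {potential:,}")     # ~18 trillion for these counts
print(f"Loaded records:          {loaded_records:,}")
print(f"Density:                 {density:.8%}")     # a tiny fraction of one percent

Even with modest member counts, the potential space runs into the trillions of cells, which is why the data unit stores only populated records.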

 

Dimensions in the data unit should be sparse dimensions. To maximize performance, OneStream uses a RAM-based model for processing on the application server. When the page dimensions are called, a data unit is created in memory, and its size is determined by the data populated for those intersections. The size of the data units and the number of data units created determine the performance of the application. A poorly performing application could have only a few entities but thousands of user-defined members; it will also perform poorly with thousands of entities and only a few user-defined members.

The actual data unit limit can vary based on other factors that affect the creation of these data units, such as the number of parents, dynamic calculations, system limitations like processor speed and available memory, and missing data values. Applications with data units over 2-4 million records should be reviewed to see whether efficiencies can be gained by adjusting the data unit.

 

The data unit is evaluated by the actual data populated in the dimensions of the model. Data units should be measured in the following ways, each giving an idea of the model's impact for the data set (a measurement sketch follows the list):

 

1. Input-level data for one year: Measured by extracting all base-level entities and all base-level accounts, for all base periods, for the densest Year/Scenario combination. This excludes calculated values.

2. Calculated base-level data for one year: Perform the selections in step 1, but include calculated data.

3. Consolidated base-level data for one year: Perform the selections in step 2 with one important distinction: select only the primary top entity in the application. This selection measures the densest data set in the application.

4. Consolidated base-level data for one period: Perform the selections in step 3, but select only the last period in the year. This is the simplest way to measure the maximum number of records in a data unit by counting all unique combinations, and it establishes whether rules cause data explosion and how large the largest data unit is.
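One way to take these measurements is to count unique populated intersections in a data extract and group them by data unit key. The sketch below is an illustration only: the CSV layout, the column names, and the Amount column are assumptions about an export you would prepare, not a fixed OneStream format.

import csv
from collections import Counter

# Assumed data unit key columns in the extract; adjust to match your export layout.
DATA_UNIT_KEY = ("Entity", "Parent", "Cons", "Scenario", "Time")

def data_unit_record_counts(path):
    """Count stored records per data unit key in an exported data file."""
    counts = Counter()
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            if row.get("Amount", "").strip() in ("", "0"):
                continue  # ignore empty cells and loaded zeros
            key = tuple(row[col] for col in DATA_UNIT_KEY)
            counts[key] += 1
    return counts

counts = data_unit_record_counts("base_level_extract.csv")
for key, records in counts.most_common(10):
    flag = "  <-- review" if records > 2_000_000 else ""
    print(f"{key}: {records:,} records{flag}")

Running the same count against each of the four extracts described above gives the input, calculated, consolidated-year, and consolidated-period sizes, and flags any data unit approaching the 2-4 million record range mentioned earlier.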

 

Effect of multiple periods on the data unit size

Weekly and daily applications are more sensitive to data unit size. A weekly application has 53 base periods, roughly 4.4 times the size of the monthly dimension (53 ÷ 12). The same dimensions that yield 1,000,000 values in a twelve-period data unit would yield about 4,400,000 in a weekly data unit. A daily application would be even more extreme at those data levels.

 

Zeros and near-zero data in the application

Older applications, or ones poorly designed for data integration, can contain large numbers of loaded zeros. Near-zero data is data that is close to zero but effectively useless; it is often produced by poorly written rules, particularly allocation rules. Neither kind of value helps reporting, but both affect performance, because data units are driven by populated intersections. Zeros and near-zero values should be removed.
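As an illustration of removing zeros before they reach the application, a staging or load file can be filtered so rows with zero or near-zero amounts never load. The snippet below is a minimal sketch; the file names, the Amount column, and the 0.005 threshold are assumptions to adapt to the application's scale and precision.

import csv

THRESHOLD = 0.005  # treat anything smaller than this as near zero (assumed cutoff)

with open("staging_load.csv", newline="") as src, \
     open("staging_load_clean.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        try:
            value = float(row["Amount"])
        except (KeyError, ValueError):
            continue  # drop rows without a usable numeric amount
        if abs(value) >= THRESHOLD:
            writer.writerow(row)  # keep only values that matter for reporting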

 

 

Version history
Last update: 09-22-2021 01:29 PM