Forum Discussion

dbeavon's avatar
dbeavon
Contributor
3 years ago

FDX methods (like FdxExecuteCubeView) ... Why are they faster?

Recently I heard in these forums that FDX methods (like FdxExecuteCubeView) should be used from a BR whenever data-retrieval performance is critical.

Can someone help me understand why they are faster?

We've already built quite a lot of client-integration software using cube views.  The cube views are executed via the REST API; we retrieve the data with "api/DataProvider/GetAdoDataSetForCubeViewCommand", roughly as in the sketch below.
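For reference, our calls look roughly like the following. This is a hedged sketch: the endpoint path is the one named above, but the JSON payload field names and the helper are placeholders, not the documented contract.

```vb
' Rough shape of a REST-based cube view extract. The endpoint path is
' the one discussed in this thread; the JSON body is a placeholder,
' not the documented contract - adapt it to your OneStream version.
Imports System.Net.Http
Imports System.Text
Imports System.Threading.Tasks

Module CubeViewRestSketch
    Async Function RunCubeViewAsync(client As HttpClient) As Task(Of String)
        ' Hypothetical request body; the real field names may differ.
        Dim payload As String = "{""CubeViewName"": ""MyCubeView""}"
        Dim content As New StringContent(payload, Encoding.UTF8, "application/json")
        Dim resp As HttpResponseMessage =
            Await client.PostAsync("api/DataProvider/GetAdoDataSetForCubeViewCommand", content)
        resp.EnsureSuccessStatusCode()
        ' The response carries the serialized ADO data set.
        Return Await resp.Content.ReadAsStringAsync()
    End Function
End Module
```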

Now that I have discovered the "FDX" methods, it puts me in a troublesome position: I realize that a ton of our integration software should probably be rewritten to use those FDX methods, and the rewrite may take some time.  Why won't OneStream just extend the "normal" cube view interface of the REST API and add an option to run it in FDX mode?  That would be similar to the "turbo" buttons they used to put on the tower of a PC to make it run faster (for those users who had a preference between a fast computer and a slow one).

It would be helpful to understand the differences between these two (fast and slow) mechanisms for retrieving data from cube views.

 

  • To add to Peter's points, the FDX methods were specifically created to programmatically extract data from the appropriate OneStream tables or other source systems in the most efficient way.  The FDX connectors were primarily created in conjunction with the BI Blend process, but they can be used for any process that requires data to be extracted from OneStream.

    FDX (Fast Data Extract) BRApis allow a variety of options for connecting to Data Sources for BI Blend.  A key differentiator between the FDX BRApis and other collection methods is support for parallel processing, in-memory processing, and management of the Time dimension.

    FDX BRApis provide functionality to build Connectors that extract data from:

    • Cube Views: Extract data through a Cube View definition.  Ideal for defining data definitions through a Cube View, including Dynamic Calc results.
    • Across Cube Data Units: Extract Cube data to a BI Blend target table through defined Data Unit filters.
    • Stage Workflow Imports: Leverage existing Stage data, for example reporting on existing "attribute" records contained in Stage, or simply enhanced dashboard reporting on Stage data.
    • Source Systems / Data Warehouses: A performance-oriented solution for connecting to source systems.

    Performance is gained through the BRApis' ability to process in parallel.  For example, extracting data by Cube Data Unit will process all the Data Units defined in the filter in parallel.  Second, the FDX BRApis do not generate a ".CSV" file the way the Data Management "Export Data" or "Export File" steps do; the results of the export are managed during the BI Blend in-memory processing.

    In cases of very large data sets in which multiple periods are loaded, processing can be slow because each period is reflected as a separate data record.  The FDX BRApis offer TimePivot solutions that pivot the Time records into columns to create a matrix data layout.  The Data Source can associate each of the periods with an "Attribute Value" field within the Integration settings, and each record is then treated as a collection of 12 periods during processing.  The individual methods are listed below, followed by a short usage sketch.

    • FdxExecuteCubeView: Extracts data defined through a Cube View.  Any data presented in the Cube View will be extracted, including Dynamic Calc results.
    • FdxExecuteCubeViewTimePivot: Extracts Cube View data with all Time periods generated as columns, which can be assigned as Attribute Value members in the Data Source.
    • FdxExecuteDataUnit: Cube data extract that pulls data from Data Unit members.
    • FdxExecuteDataUnitTimePivot: The same Data Unit extract, with all Time periods generated as columns that can be assigned as Attribute Value members in the Data Source.
    • FdxExecuteStageTargetTimePivot: Extracts an existing Workflow's Stage data, with all Time periods generated as columns that can be assigned as Attribute Value members in the Data Source.
    • FdxExecuteWarehouseTimePivot: Extracts data from an external source system.
    • FdxGetCubeViewOrDataUnitColumnList: Connector BRApi used to return field names.
    • FdxGetStageTargetColumnList: Connector BRApi used to return field names.
    • FdxGetWarehouseColumnList: Connector BRApi used to return field names.
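
    As a rough illustration, a Connector or Extender rule might call one of these as below. This is a minimal sketch: the Cube View name, member filters, and the exact parameter list are assumptions, so verify the FdxExecuteCubeView signature in the BRApi reference for your release.

    ```vb
    ' Minimal sketch of an FDX extract from a Business Rule. All names,
    ' filters, and the exact parameter order are assumptions - check the
    ' BRApi reference for your release.
    Dim cubeViewName As String = "MyFdxCubeView"
    Dim entityFilter As String = "E#TotalEntities.Base"
    Dim scenarioFilter As String = "S#Actual"
    Dim timeFilter As String = "T#2023.Base"
    Dim parallelQueryCount As Integer = 8   ' the parallel processing noted above

    ' Returns a plain ADO.NET DataTable held in memory - no .CSV on disk.
    Dim dt As DataTable = BRApi.Import.Data.FdxExecuteCubeView(si, cubeViewName,
        "Entity", entityFilter, "Scenario", scenarioFilter, timeFilter,
        True, parallelQueryCount, False)

    BRApi.ErrorLog.LogMessage(si, $"FDX returned {dt.Rows.Count} rows")
    ```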
    • PeterFu's avatar
      PeterFu
      Contributor II

      Thanks for the update Tony!

       

      Peter

    • dbeavon's avatar
      dbeavon
      Contributor

      Thanks Tony.

      Sorry to tag on another question.  One issue that came up when swapping over to FDX for cube views is a concern about memory usage.  Is there any additional responsibility that our BR may have insofar as memory is concerned?  Will the owner of the BR be responsible for releasing memory used by the FDX, or possibly flushing a cache?  Sometimes .NET APIs provide deliberate Dispose() methods for this purpose, if the release of memory needs to happen proactively (along the lines of the sketch below).
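
      To illustrate, something like the pattern below. This is standard ADO.NET cleanup, not a documented FDX requirement, and GetFdxResult() is just a placeholder for whichever Fdx* call produced the table.

      ```vb
      ' Hypothetical defensive cleanup - standard ADO.NET, not a documented
      ' FDX requirement. GetFdxResult() is a placeholder for the FDX call.
      ' DataTable implements IDisposable (via MarshalByValueComponent),
      ' so it can be released eagerly if that turns out to matter.
      Dim dt As DataTable = GetFdxResult()
      Try
          ' ... consume the rows here ...
      Finally
          dt.Clear()     ' drop the row data
          dt.Dispose()   ' release the component proactively
      End Try
      ```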

      I was working in our pre-prod environment the other day (I was the only one in there) and was running a massive number of FDX operations.  Memory usage continually increased until it reached the capacity of the server, and we eventually needed to restart IIS because the dashboard adapters started failing and returning the word "null".  If this memory in IIS were related to legitimate "caching", I would expect usage to level out after repeatedly executing the same FDX cube views.  We are talking about 16 GB of RAM on the server, and each cube view would conceivably use 1 GB at the very most.

      I am preparing to deploy my work to production, where this will become someone else's problem. 😉  But I just want to make sure that I've done my own due diligence, to limit the need for technical support tickets with OneStream in the future.  I'm pretty good at working with .NET memory dumps, so if you can give me an idea of which particular FDX class instances should be the focus of my attention, I should be able to do more to test for possible memory leaks.

      We are using 6.5 in pre-prod and 6.3 in prod.

       

    • cons1's avatar
      cons1
      New Contributor III

      Hi Tony, 

      I recently modified my parameters to generate a range of dates (From and To parameters) instead of inputting only one month per extract.  I discovered that this requires a lot of memory: the files are significant, and the process takes time and gets stuck.  Is this always the case?  That is why one of my client's setups only extracts one month at a time.  I wonder if this is okay in Production, since they have more memory; I am just concerned that it will misbehave in prod once deployed.

      • TonyToniTone's avatar
        TonyToniTone
        Contributor II

        Since you changed the parameters to go beyond one period of data, did you reconstruct the way the data table is configured by using one of the TimePivot options, such as FdxExecuteCubeViewTimePivot?   That reduces the number of data records in the table, and each data record gets a supporting column for the values of the various periods.  It should help reduce the amount of memory needed, since each record updates a value in a period column instead of creating an entirely new record row in the table for each additional period.
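
        A hedged sketch of the TimePivot call shape follows. The method name comes from the list earlier in the thread; the argument list and the generated period column names are assumptions, so check the BRApi reference for your version.

        ```vb
        ' Sketch of the TimePivot variant: one row per intersection, with
        ' the periods pivoted into columns. Argument order and the period
        ' column naming are assumptions - check the BRApi reference.
        Dim dt As DataTable = BRApi.Import.Data.FdxExecuteCubeViewTimePivot(si,
            "MyFdxCubeView", "Entity", "E#TotalEntities.Base",
            "Scenario", "S#Actual", "T#2023.Base", True, 8, False)

        ' Roughly: a 1.2M-row single-period layout collapses toward ~100K
        ' rows carrying 12 period columns each, which is where the memory
        ' savings come from.
        ```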

  • PeterFu's avatar
    PeterFu
    Contributor II

    Hi,

     

    This is not my area of expertise, but I found something from the AnalyticBlend training that I attended (print screen from the training not reproduced here).  It looks like the FDX methods are BRApis that can run in parallel, and that parallelism is what improves the performance.

    Peter

    • dbeavon's avatar
      dbeavon
      Contributor

      I noticed that as well.  I think that is part of it.

      But I think there is more to it than just parallelization.  I am already running cube views in parallel via the REST API, and even with that parallelization I still don't get the same overall efficiency as the FDX methods.

      Additionally, the data from the FDX methods seems a bit more "low-level" or "raw" in nature.  It returns only member names (not descriptions), and for whatever reason it annotates the current/selected member of all dimensions (UDs) even when those dimensions do not appear in the definition of the cube view rows or columns.
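
      A quick way to see that raw shape is to dump the returned columns; the loop below is plain ADO.NET plus the standard error-log BRApi, with GetFdxResult() standing in for the FDX call.

      ```vb
      ' Inspect the "raw" FDX result by walking its columns.
      ' GetFdxResult() is a placeholder for whichever Fdx* call ran.
      Dim dt As DataTable = GetFdxResult()
      For Each col As DataColumn In dt.Columns
          ' Expect member-name columns for every dimension (including the
          ' UDs), with no description columns - the low-level shape above.
          BRApi.ErrorLog.LogMessage(si, $"{col.ColumnName} ({col.DataType.Name})")
      Next
      ```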

       

  • To address your questions, I would recommend submitting a ticket with Technical Support to go through your situation.  In general, you should not be responsible for clearing memory; the garbage collection in IIS and the Memory Manager should clear memory as execution of the function completes.

    The assumption that a Cube View rendering will not exceed 1 GB is not entirely correct.  It all depends on the amount of data pulled into cache and then inserted into a data table.  Each data record in the database takes 3,200 bytes, so if the result of the Cube View definition is 1 million data records, you will use 3.2 GB of memory just for the data records, not including the additional memory for the new data table object.  Every Cube View can be wildly different, so I can't give you a definitive answer on any of this.

    It would be a good idea to meet with Technical Support to run through all of this.  The memory not releasing could be a concern, but we won't know unless Technical Support gets involved to troubleshoot.  OneStream has a lot of memory-intensive processes, Cube Views being one of them.  16 GB on a server is OK for general development but not optimal for a Production environment; 16 GB can be consumed very quickly depending on the process, data volumes, number of large data tables in memory, etc.
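
    As a sanity check on that arithmetic (the ~3,200 bytes per record figure is the approximation quoted above, not a published spec):

    ```vb
    ' Back-of-envelope sizing from the figures quoted in this thread.
    Dim recordCount As Long = 1000000
    Dim bytesPerRecord As Long = 3200
    Dim totalBytes As Long = recordCount * bytesPerRecord
    ' 3,200,000,000 bytes = 3.2 GB (decimal), before counting the
    ' overhead of the DataTable object that holds the records.
    Console.WriteLine($"{totalBytes / 1000000000.0:F1} GB")
    ```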

    • dbeavon's avatar
      dbeavon
      Contributor

      Thanks TonyToniTone 

      In the past the tech support team was not terribly interested in having our memory dumps.  Moreover, a 16 GB memory dump that demonstrates a problem is pretty unwieldy, even after you manage to download it and open it in a tool like WinDbg.  I was just going to watch counters and take some smaller memory dumps after running a few cube views and triggering garbage collection.

       

      Ideally it would be possible to confirm a memory problem before it reaches 3 or 5 GB.  If necessary I could work on creating an artificial repro with tiny cube views.  I just wanted to know what to look for, to determine whether things were "leaking" or just "caching".  (I know that this distinction can sometimes depend on lots of factors.)

      Based on your response, it sounds like you have no knowledge of FDX-specific memory issues.  I will hold off on assuming that there is a problem for now...  The only reason for my suspicion is the behavior I noticed, and the requirement for IIS to be reset every night.  Yes, our production servers do have quite a lot more RAM (32 or 64 GB, I think).