Forum Discussion

fc's avatar
fc
New Contributor III
5 months ago

Import data using multiple workflow profles

Hi everyone,

I have an issue in understanding how to implement multiple import steps within a workflow.

My client will load data through a classing import step throughout all the months of the year. In addition to that, they asked to set up 3 additional import steps (P13, P14, P15) to be used in December that will serve as "adjustments" to the previous data loaded on the cube.

At each import step, however, the data will be loaded in full (with eventual adjustments) and should overwrite the data loaded through previous import steps (and this is the reason why we don't use forms but we use multiple import steps).

For example, when P13 is loaded, it should overwrite the data loaded through the Import step, and when P14 is loaded it should overwrite the data loaded through P13, and so on.

The problem now is that since we are using a different workflow profile at each step, the system does not automatically overwrite the data loaded through the previous workflow profile, and therefore the data is just added on top of the data that was loaded in previous steps.

Does anyone knows whether there is a way to overwrite data from previous loads when a new import step is used?

An idea I thought about was to delete the data on the cube through sql while executing the BR Connector rule but I don't know what tables in the db hold the data that I want to overwrite.

 

Thanks in advance for any help, and if something is not clear please ask, I'll be happy to clarify!

  • Did you try setting the overlapping siblings to False and then processing the workflows all the way through again?  The behavior you describe is what I thought happens when that setting is True, in that it joins the sibling import profiles together when loading to the cube, regardless of from which load you are executing.  

    I thought setting that to False would then mean that each load would be treated individually and not as a join, so the second import would replace the first import to cube.

  • T_Kress's avatar
    T_Kress
    Contributor III

    I believe the "Replace" option upon the "Load and Transform" step is based off source ID.  If you define both loads with the same source ID at the data source level, the "Replace" may work.  You could test.  See here:

     

     

     

  • fc's avatar
    fc
    New Contributor III

    Hi T_Kress,

    thank you for the answer.

    There are still things that are not clear to me. 

    The DataSource that I use is the same for all the 4 load steps. Its settings are the following:

    Since I use the same DataSource for all the load steps, I would expect them to have the same SourceID, or am I wrong?
    I struggle to understand where is this SouceID defined and what's its purpose.

    Thanks again for the help,

    • T_Kress's avatar
      T_Kress
      Contributor III

      Yes, the source ID is defined at the data source level so if they are all using the same data source then that is not the issue.

      You could try setting the "Can Load Overlapping Siblings" to False.  If these PX imports are the only one under a particular parent, setting this to false may work for you.  Again, test.

      Lastly, you could consider using WF channels to control what is or is not cleared with each import load.

       

  • fc's avatar
    fc
    New Contributor III

    Hi T_Kress,

    I run some tests and I managed to isolate the issue.

    It seems like the data within the cube is correctly overwritten when new data is loaded through the next input step. What happens instead is that when the Import step is executed:

     

    the data is fed to the StageSourceData table:

    then the user proceeds with loading the data in the cube.

    When the Import step of the subsequent workflow profile is executed (P13), again the data is loaded on the StageSourceData, but with a different wfKey. When the user loads the data into the cube, the system not only loads the data related to the current wf profile (P13), but also the data related to the previous Import step, therefore duplicating the data that is loaded in the cube. So it basically joins the staging data loaded through different wf profiles and loads them as if it all came from the current wf profile.

    A potential solution could be to create a rule to delete the data from the StageSourceData table that was imported through previous wf profiles as soon as a new one is loaded (so when P13 is loaded, we automatically delete the data imported through the previous Import step). This approach makes sure the correct data is loaded on the cube, but it makes the previous loads unauditable, which is a prerequisite for the client.

     Do you have any suggestion to solve this issue?
    Thank you again for the help!

    • T_Kress's avatar
      T_Kress
      Contributor III

      Did you try setting the overlapping siblings to False and then processing the workflows all the way through again?  The behavior you describe is what I thought happens when that setting is True, in that it joins the sibling import profiles together when loading to the cube, regardless of from which load you are executing.  

      I thought setting that to False would then mean that each load would be treated individually and not as a join, so the second import would replace the first import to cube.

  • fc's avatar
    fc
    New Contributor III

    that works as expected, thanks a lot for the help!