Problem with Harvest Directory

YahyaOS

Hi everyone,

I'm having a problem with the Harvest directory. I have two batches running at the same time. Each one retrieves files from an SFTP server and puts them in the Harvest directory so they can be loaded into the workflow (WF) corresponding to each file. To do this, I use the following line of code:
Dim batchInfo As WorkflowBatchFileCollection = BRApi.Utilities.ExecuteFileHarvestBatch(si, fixedScenario, systemTime, valTransform, valIntersect, loadCube, processCube, confirm, autoCertify, False)

But this line does not filter the files present in Harvest.

Is there a way to filter by file name inside Harvest, since that folder seems to be shared by all batches and mandatory?

Thank you for your help


RobbSalzmann

My understanding of batch processing is that all files available will be processed by the batch.  The way to "filter" is to put the filter upstream of this (in your case, the SFTP process), allowing only the files you wish to process to land in the harvest folder.
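A minimal sketch of such an upstream filter, written as language-neutral Python rather than as a OneStream business rule. It assumes the SFTP step first downloads into a separate staging folder (`download_dir`) and that each system's files share a recognizable name pattern such as `SYSA_*`; the folder layout, the pattern, and the function name are all hypothetical. Only files that match are moved into the harvest folder.

```python
import fnmatch
import shutil
from pathlib import Path


def stage_matching_files(download_dir: Path, harvest_dir: Path, pattern: str) -> list[str]:
    """Move only files whose names match `pattern` from download_dir into harvest_dir.

    Non-matching files stay behind in download_dir, so this batch never sees them.
    Returns the names of the files that were moved.
    """
    harvest_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    # sorted() materializes the listing first, so moving files does not disturb the loop.
    for path in sorted(download_dir.iterdir()):
        if path.is_file() and fnmatch.fnmatchcase(path.name, pattern):
            shutil.move(str(path), str(harvest_dir / path.name))
            moved.append(path.name)
    return moved
```

This only separates the systems if their harvests do not overlap in time: if both batches run at once, each harvest still sees whatever the other has already staged.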


Thank you for your response.

My problem is that I have two Data Management sequences that run at the same time, and each one retrieves one type of file from an SFTP server.
We already filter by file name upstream, before the files arrive in Harvest. But once both Data Management sequences are launched and each retrieves its files, all the files end up in Harvest, so sometimes a file belonging to one task is processed by the other, and vice versa.

I need a solution to avoid that.

Thank you in advance

Right now there is no way to have specific batch executions harvest from different folders. It would be great to have an additional parameter for a harvest subfolder, so this is a good one for IdeaStream.

In the meantime, the workaround I can think of is a semaphore solution:

- have different batch subfolders, one for each system (you can download to another parent folder, such as Contents)

- have two DM sequences, one per system

- have one Extensibility Rule (ER) that performs the semaphore logic:

1) get the current DM sequence name

2) check whether the other DM sequence's task is running

3) if it is, skip execution (red light)

4) if it is not (green light), move the files from the current source system's subfolder

5) execute the harvest
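The five steps can be sketched as below. This is plain Python, not OneStream API code: `is_running` is a hypothetical stand-in for however the rule checks the other DM sequence's task status, `run_harvest` stands in for the `ExecuteFileHarvestBatch` call, and the sequence and subfolder names in `SEQUENCES` are placeholders.

```python
import shutil
from pathlib import Path
from typing import Callable

# Hypothetical configuration: each sequence's partner and its own staging subfolder.
SEQUENCES = {
    "DM_SystemA": {"other": "DM_SystemB", "subfolder": "SystemA"},
    "DM_SystemB": {"other": "DM_SystemA", "subfolder": "SystemB"},
}


def run_with_semaphore(
    current_seq: str,
    staging_root: Path,
    harvest_dir: Path,
    is_running: Callable[[str], bool],
    run_harvest: Callable[[], None],
) -> bool:
    """Return True if this sequence ran its harvest, False if it was skipped."""
    # Step 1: the current sequence name selects its partner and its own subfolder.
    cfg = SEQUENCES[current_seq]

    # Steps 2-3: red light if the other sequence is still running.
    if is_running(cfg["other"]):
        return False

    # Step 4: green light, so move this system's files into the harvest folder.
    harvest_dir.mkdir(parents=True, exist_ok=True)
    for path in list((staging_root / cfg["subfolder"]).iterdir()):
        if path.is_file():
            shutil.move(str(path), str(harvest_dir / path.name))

    # Step 5: run the harvest batch.
    run_harvest()
    return True
```

Because the check in step 2 and the file move in step 4 are separate actions, two sequences that start at exactly the same instant could both see a green light. If that timing is possible, an atomic lock (for example a lock file created with an exclusive-create flag) is safer.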

This way you make sure each sequence processes only its own files. If you want more parallelism, you could move the batch harvest execution to an ER in a separate DM job and have the semaphore ER execute that DM job with the Queue API method.

You can also have a single DM job that receives the system as a parameter and updates the task description to include the system. You can do this with the API.

YahyaOS

Thank you for your valuable help.

Do you mean creating two subfolders inside Harvest?

Don't you think steps 2 and 3 will impact the performance of the batches?

Step 4: move the files to the parent folder (Harvest), right?

Best regards,

Yahya

Hi YahyaOS,
Creating subfolders is unnecessary. Just manage the SFTP process so that it only moves the files used by the current batch process.

Run your processes synchronously: only allow one process to run at a time.
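A sketch of the one-at-a-time approach in plain Python. It assumes each system's files are first downloaded into their own staging subfolder and that the harvest step (`run_harvest`, a hypothetical stand-in for the batch call) consumes the files it loads, so Harvest is empty again before the next system is staged.

```python
import shutil
from pathlib import Path
from typing import Callable


def run_one_at_a_time(
    systems: list[str],
    staging_root: Path,
    harvest_dir: Path,
    run_harvest: Callable[[str], None],
) -> None:
    """Stage and harvest one system at a time so Harvest never mixes systems."""
    harvest_dir.mkdir(parents=True, exist_ok=True)
    for system in systems:
        # Move only this system's files into Harvest before its batch runs.
        for path in sorted((staging_root / system).iterdir()):
            if path.is_file():
                shutil.move(str(path), str(harvest_dir / path.name))
        # Assumed to load and remove everything currently in Harvest.
        run_harvest(system)
```

Serializing the runs trades some throughput for the guarantee that no filtering inside Harvest is ever needed.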

If you can't manage the processes centrally, use Francisco's semaphore method to make each process aware of any other running process. A semaphore is a flag or signal to other processes. In this case, it is a flag telling any batch that wants to start that another batch is already running and that it should wait until that batch is done.

This can be done by having the running process create a file somewhere; any other process that starts first checks for the existence of that file. If it exists, the process goes into a loop, repeatedly checking for the file. When the file disappears, the polling process creates the same file and then runs its SFTP transfer and batch. When it finishes, it deletes the semaphore file it created.
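The file-based semaphore described above can be sketched in Python as a context manager. One deliberate change from the description: instead of checking for the file and then creating it as two separate steps, it creates the file with `os.O_CREAT | os.O_EXCL`, which fails if the file already exists, so two processes cannot both pass the check at the same moment. The lock path, poll interval, and timeout are assumptions, not values from the thread.

```python
import os
import time
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def semaphore_file(lock_path: Path, poll_seconds: float = 5.0, timeout_seconds: float = 3600.0):
    """Wait until lock_path can be created, hold it while the body runs, then delete it."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        try:
            # O_CREAT | O_EXCL: check-and-create in one step; raises if the file exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            # Another process holds the semaphore: keep polling until the deadline.
            if time.monotonic() >= deadline:
                raise TimeoutError(f"{lock_path} still present after {timeout_seconds} s")
            time.sleep(poll_seconds)
    try:
        # Record the owner's process id to help diagnose a leftover lock file.
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))
        yield
    finally:
        # Release the semaphore even if the SFTP transfer or batch raised an error.
        lock_path.unlink(missing_ok=True)
```

Wrapping the SFTP download and the harvest call in `with semaphore_file(lock_path):` gives the create, run, delete sequence described above. A process that is killed outright never reaches the `finally` block, so its lock file would remain and block the others until someone removes it; the timeout at least turns that into an error instead of an endless wait.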
