Forum Discussion

photon's avatar
photon
Contributor
24 hours ago
Solved

importing a file with variable columns

We have been tasked with importing a file that has a large number of variable columns. For the sake of easy explanation, let's say the first five columns are standard (time, entity, ud1, ud2, ud3) but the file could have from 50 to 150 additional columns, one for each account. If there is no data for the account, there is no column for it. New accounts could appear in the future without warning.

No, we don't have the ability to change the format of the report. (Oh, how I wish.)

I have thought up several ways of making this work but each is fraught with its own type of peril.

  1. Create a new, custom table dynamically to stage the data. Parse the column names from the file. Use the column name list to run a new SQL query to unpivot.
  2. Parse the file in-memory to manually unpivot by parsing each data column and adding rows to a datatable, then returing the full data table.
  3. Maintain a list of the columns we care about the most, parse the file in advance and save the column name/position maps to parameters/a lookup table. Use up every possible attribute/value field in a data source to stage to BI Blend and try to unpivot from there. Hope they never need more "important" columns than OS can handle. (This is similar to option 1 but we're not stuck dropping/creating a custom table ourselves and we have more consistent column names.)
  4. Write a manual file-parser that creates a new, sane text file and then imports that instead. (Seems wasteful. If I can get it this far, I can probably just do it in-memory, ie, option 2.)
  5. Some other, better idea that I haven't thought of yet.
  • I would use approach 1 or 2.  The deciding factor is the size of the file, number of records.

    Since you know the first n columns are fixed to specific dimensions, and the remaining columns are specific to accounts, assuming column header = account name, this is pretty easy to pivot, something like this:

    Function PivotAccounts(inputTable As DataTable) As DataTable
        ' Prepare output table
        Dim outputTable As New DataTable()
        outputTable.Columns.Add("time", GetType(String))
        outputTable.Columns.Add("entity", GetType(String))
        outputTable.Columns.Add("ud1", GetType(String))
        outputTable.Columns.Add("ud2", GetType(String))
        outputTable.Columns.Add("ud3", GetType(String))
        outputTable.Columns.Add("account", GetType(String))
        outputTable.Columns.Add("amount", GetType(Decimal))
    
        ' Validate column count
        If inputTable.Columns.Count < 7 Then
            Throw New InvalidOperationException("Input table must have at least 7 columns.")
        End If
    
        ' Fixed column indexes
        Dim fixedColCount As Integer = 5
        Dim lastColIndex As Integer = inputTable.Columns.Count - 1
    
        ' Iterate through each row
        For Each row As DataRow In inputTable.Rows
            ' Extract fixed values once per row
            Dim timeVal As String = row(0).ToString()
            Dim entityVal As String = row(1).ToString()
            Dim ud1Val As String = row(2).ToString()
            Dim ud2Val As String = row(3).ToString()
            Dim ud3Val As String = row(4).ToString()
    
            ' Iterate over the account columns in the current row
            For i As Integer = fixedColCount To lastColIndex - 1
                Dim accountName As String = inputTable.Columns(i).ColumnName
                Dim rawValue As Object = row(i)
    
                If Not IsDBNull(rawValue) AndAlso Decimal.TryParse(rawValue.ToString(), Nothing) Then
                    Dim amountVal As Decimal = Convert.ToDecimal(rawValue)
    
                    ' create a record for each account's amount
                    If amountVal <> 0D Then
                        Dim outputRow As DataRow = outputTable.NewRow()
                        outputRow("time") = timeVal
                        outputRow("entity") = entityVal
                        outputRow("ud1") = ud1Val
                        outputRow("ud2") = ud2Val
                        outputRow("ud3") = ud3Val
                        outputRow("account") = accountName
                        outputRow("amount") = amountVal
                        outputTable.Rows.Add(outputRow)
                    End If
                End If
            Next
        Next
    
        Return outputTable
    End Function

     

2 Replies

  • RobbSalzmann's avatar
    RobbSalzmann
    Valued Contributor II

    I would use approach 1 or 2.  The deciding factor is the size of the file, number of records.

    Since you know the first n columns are fixed to specific dimensions, and the remaining columns are specific to accounts, assuming column header = account name, this is pretty easy to pivot, something like this:

    Function PivotAccounts(inputTable As DataTable) As DataTable
        ' Prepare output table
        Dim outputTable As New DataTable()
        outputTable.Columns.Add("time", GetType(String))
        outputTable.Columns.Add("entity", GetType(String))
        outputTable.Columns.Add("ud1", GetType(String))
        outputTable.Columns.Add("ud2", GetType(String))
        outputTable.Columns.Add("ud3", GetType(String))
        outputTable.Columns.Add("account", GetType(String))
        outputTable.Columns.Add("amount", GetType(Decimal))
    
        ' Validate column count
        If inputTable.Columns.Count < 7 Then
            Throw New InvalidOperationException("Input table must have at least 7 columns.")
        End If
    
        ' Fixed column indexes
        Dim fixedColCount As Integer = 5
        Dim lastColIndex As Integer = inputTable.Columns.Count - 1
    
        ' Iterate through each row
        For Each row As DataRow In inputTable.Rows
            ' Extract fixed values once per row
            Dim timeVal As String = row(0).ToString()
            Dim entityVal As String = row(1).ToString()
            Dim ud1Val As String = row(2).ToString()
            Dim ud2Val As String = row(3).ToString()
            Dim ud3Val As String = row(4).ToString()
    
            ' Iterate over the account columns in the current row
            For i As Integer = fixedColCount To lastColIndex - 1
                Dim accountName As String = inputTable.Columns(i).ColumnName
                Dim rawValue As Object = row(i)
    
                If Not IsDBNull(rawValue) AndAlso Decimal.TryParse(rawValue.ToString(), Nothing) Then
                    Dim amountVal As Decimal = Convert.ToDecimal(rawValue)
    
                    ' create a record for each account's amount
                    If amountVal <> 0D Then
                        Dim outputRow As DataRow = outputTable.NewRow()
                        outputRow("time") = timeVal
                        outputRow("entity") = entityVal
                        outputRow("ud1") = ud1Val
                        outputRow("ud2") = ud2Val
                        outputRow("ud3") = ud3Val
                        outputRow("account") = accountName
                        outputRow("amount") = amountVal
                        outputTable.Rows.Add(outputRow)
                    End If
                End If
            Next
        Next
    
        Return outputTable
    End Function

     

    • photon's avatar
      photon
      Contributor

      Yep, this is basically what I had in mind and the solution I was leaning towards but I guess I was secretly hoping for #5 and that there was some "yes, there's actually a way to handle this already" that would lead to a nice, supportable solution.

      Thanks for the pseudocode though! There are a couple little syntax things you use in there that I assume are better practice than some of my own. Always learning.