importing a file with variable columns
We have been tasked with importing a file that has a large number of variable columns. For the sake of easy explanation, let's say the first five columns are standard (time, entity, ud1, ud2, ud3) but the file could have from 50 to 150 additional columns, one for each account. If there is no data for the account, there is no column for it. New accounts could appear in the future without warning.
No, we don't have the ability to change the format of the report. (Oh, how I wish.)
I have thought up several ways of making this work but each is fraught with its own type of peril.
- Create a new, custom table dynamically to stage the data. Parse the column names from the file. Use the column name list to run a new SQL query to unpivot.
- Parse the file in-memory to manually unpivot by parsing each data column and adding rows to a datatable, then returing the full data table.
- Maintain a list of the columns we care about the most, parse the file in advance and save the column name/position maps to parameters/a lookup table. Use up every possible attribute/value field in a data source to stage to BI Blend and try to unpivot from there. Hope they never need more "important" columns than OS can handle. (This is similar to option 1 but we're not stuck dropping/creating a custom table ourselves and we have more consistent column names.)
- Write a manual file-parser that creates a new, sane text file and then imports that instead. (Seems wasteful. If I can get it this far, I can probably just do it in-memory, ie, option 2.)
- Some other, better idea that I haven't thought of yet.
I would use approach 1 or 2. The deciding factor is the size of the file, number of records.
Since you know the first n columns are fixed to specific dimensions, and the remaining columns are specific to accounts, assuming column header = account name, this is pretty easy to pivot, something like this:Function PivotAccounts(inputTable As DataTable) As DataTable ' Prepare output table Dim outputTable As New DataTable() outputTable.Columns.Add("time", GetType(String)) outputTable.Columns.Add("entity", GetType(String)) outputTable.Columns.Add("ud1", GetType(String)) outputTable.Columns.Add("ud2", GetType(String)) outputTable.Columns.Add("ud3", GetType(String)) outputTable.Columns.Add("account", GetType(String)) outputTable.Columns.Add("amount", GetType(Decimal)) ' Validate column count If inputTable.Columns.Count < 7 Then Throw New InvalidOperationException("Input table must have at least 7 columns.") End If ' Fixed column indexes Dim fixedColCount As Integer = 5 Dim lastColIndex As Integer = inputTable.Columns.Count - 1 ' Iterate through each row For Each row As DataRow In inputTable.Rows ' Extract fixed values once per row Dim timeVal As String = row(0).ToString() Dim entityVal As String = row(1).ToString() Dim ud1Val As String = row(2).ToString() Dim ud2Val As String = row(3).ToString() Dim ud3Val As String = row(4).ToString() ' Iterate over the account columns in the current row For i As Integer = fixedColCount To lastColIndex - 1 Dim accountName As String = inputTable.Columns(i).ColumnName Dim rawValue As Object = row(i) If Not IsDBNull(rawValue) AndAlso Decimal.TryParse(rawValue.ToString(), Nothing) Then Dim amountVal As Decimal = Convert.ToDecimal(rawValue) ' create a record for each account's amount If amountVal <> 0D Then Dim outputRow As DataRow = outputTable.NewRow() outputRow("time") = timeVal outputRow("entity") = entityVal outputRow("ud1") = ud1Val outputRow("ud2") = ud2Val outputRow("ud3") = ud3Val outputRow("account") = accountName outputRow("amount") = amountVal outputTable.Rows.Add(outputRow) End If End If Next Next Return outputTable End Function