EN VI

Powershell - In power shell, I'm trying to convert some plain text input to a csv format?

2024-03-13 01:30:07
How to Powershell - In power shell, I'm trying to convert some plain text input to a csv format

I'm trying to create a csv file with data taken from a PDF file. Why? The bank has changed their web page and export of statements is in PDF only. My finance application requires the data in the form of a CSV file. Because of security, I do not want to use any cloud service where I upload my pdf and they will convert it to a csv. In PowerShell, I can get a text file as shown below after I cut/paste data into a text file:

The header should look like: "Date","Description"."Money out"."Money in"."Borrowings" My current input file (called vertical file) looks like below:

Date,
Description,
Money out,
Money in,
Borrowings,
01 Feb 2024,
CDN Bank 1,
1601.87,
3490.43 CR,
01 Feb 2024,
Life Ins,
372.23,
3118.20 CR,
01 Feb 2024,
Interac e-Transfer Receive,
250.00,
3368.20 CR,
01 Feb 2024,
Water company,
33.84,
3334.36 CR,

I need it to look like: Note: The date 01 Feb 2024 should be the start of each line.

"Date","Description","Money out","Money in","Borrowings"
"01 Feb 2024","CDN Bank 1","1601.87","3490.43 CR"
"01 Feb 2024","Life Ins"."372.23","3118.20 CR"
"01 Feb 2024","Interac e-Transfer Receive","250.00","3368.20 CR"
"01 Feb 2024","Water company","33.84","3334.36 CR"

I copy/pasted data from the PDF file into a Notepad++ file and ran the following code to delete data I did not need. The code to delete data I did not need is: Where-Object { $_ -notmatch 'James|CDNBnk|number|continued' This is to get rid of any line that contains James,CDNBnk,Number and continued as that data is not relevant to the data I need. And Yes, I need to figure out an extra blank line in each row because there is always a blank beneath money in OR money out. So, you are correct. If the header has 5 elements then so should the data.

 ``# Specify the input and output file paths
 $inputFile = "C:\Users\J\Downloads\input\VerticalFile.txt"
 $outputFile = "C:\Users\J\Downloads\Output\File.txt"

# Read the content of the input file
$lines = Get-Content $inputFile

# Append a comma to the end of each line and filter out the unwanted lines
$modifiedLines = $lines | ForEach-Object { $_ -replace '[,$]', '' } | ForEach-Object { $_ + ","    
} | Where-Object { $_ -notmatch 'James|CDNBnk|number|continued' }

# Write the modified content to the output file
$modifiedLines | Set-Content -Path $outputFile

# Display a message indicating successful completion
Write-Host "Commas added to each line and unwanted lines removed. Output written to 
$outputFile".`
`

The results are shown as the vertical file above.

The second read of the file needed to create each line that started with a data (01 Feb 2024)

The code I tried and fails with: "Exception calling "Substring" with "2" argument(s): "Index and length must refer to a location within the string."

``# Specify the input and output file paths
$inputFile = "C:\Users\J\Downloads\Output\Filea.txt"
$outputFile = "C:\Users\J\Downloads\Output\File2.txt"

# Read the content of the input file
$lines = Get-Content $inputFile

# Initialize an empty array to store the combined lines and create the header
$combinedLines = @("")

# Define an array of shortform months
$months = "Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"

# Loop through each line in the input file
foreach ($line in $lines) {
# Split the line by comma and store the fields in an array
$fields = $line -split ","
# Check if the first field contains a shortform month
if ($months -contains $fields[0].Substring(0,5)) {
# If yes, add the line to the combined lines array
$combinedLines += $line
} else {
# If not, append the line to the last element of the combined lines array
$combinedLines[-1] += $line
 }
}


# Write the combined lines to the output file
$combinedLines | Set-Content -Path $outputFile

# Display a message indicating successful completion
Write-Host "Lines combined by date. Output written to $outputFile"` 

The output of the above code is $combinedLines which is written to the Outputfile:

PS C:\WINDOWS\system32> $combinedLines Description,Money out,Money in,Borrowings,01 Feb 2024,CDN Bank 1,1601.87,3490.43 CR,01 Feb 2024,Life Ins,372.23,3118.20 CR,01 Feb 2024,Interac e-Transfer Receive,250.00,3368.20 CR,01 Feb 2024,Water company,33.84,3334.36 CR,

Solution:

Try following

$inputFilename = 'c:\temp\test.txt'
$outputFilename = 'c:\temp\test.csv'
$headerRowsCount = 5
$dataRows = 4
$reader = [System.IO.StreamReader]::New($inputFilename)

$headerRows = @()
$rowCount = 0;
$table = [System.Collections.ArrayList]::new()
while(($line = $reader.ReadLine()) -ne $null)
{
   if($rowCount -lt $headerRowsCount)
   {
      $headerRows += $line.Trim(@(" ", ","))
   }
   else `
   {
      $index = $itemCount % $dataRows
      if($rowCount -eq $headerRowsCount) {$itemCount = 0}
      if(($index) -eq 0) {$newRow = [pscustomobject]@{} }
      $newRow | Add-Member -NotePropertyName $headerRows[$index] -NotePropertyValue $line.Trim(@(" ", ","))

      $itemCount++
      if($index -eq 0) {$table.Add($newRow) | out-null}
   }
   $rowCount++
}
$table

$table | Export-Csv -Path $outputFilename -NoTypeInformation
Answer

Login


Forgot Your Password?

Create Account


Lost your password? Please enter your email address. You will receive a link to create a new password.

Reset Password

Back to login