
Python - Searching for and extracting dictionary values from txt files?

2024-03-12 18:00:09

I am stuck extracting specific data from a txt file.

I have a txt file that contains some information.

E.g.

Company Name GmbH, Teststraße 24 , 01000 Sampleort Customer Nr. 11111111 Invoice Nr. 22222

Invoice Adress Company Name 2 mbH, Test2straße 11, 01001 Sample2ort Order number. 555555 Order Date 01.01.1999

The files do not all share the same structure. Some contain Invoice Nr., others contain Inv. Number: 44444, and I want to catch all of these variants. I think I can do this by creating a dictionary like:

`values_dict = {'Customer Number': ['customer nr.', 'customer number', 'cus. nr.', ...],
'Order Number': ['order number', 'Order nr', ...]}`

How can I extract specific values from the txt files using that dict?

I am expecting output like:

Order Number: 555555
Customer Number: 11111111
Invoice Number: 22222
Order Date: 01.01.1999

Company Information: Company Name GmbH, Teststraße 24 , 01000 Sampleort
Invoice Company Information: Company Name 2 mbH, Test2straße 11, 01001 Sample2ort

Solution:

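The re module in Python provides support for regular expressions, allowing you to search, match, or split strings based on specified patterns. One approach is to map each output key to a list of label variations and search the text for each variation in turn.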
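As a quick illustration of the building blocks, here is a minimal sketch of `re.search` with a capture group, `re.IGNORECASE`, and `re.escape`. The sample strings and the `label` variable are made up for demonstration only. `re.escape` is worth knowing because a label such as `Nr.` contains a dot, which in a regular expression would otherwise match any character.

```python
import re

line = "Customer Nr. 11111111"
label = "customer nr."  # hypothetical label variation, lower-cased on purpose

# re.escape makes the dot in the label literal instead of "any character".
strict = re.escape(label) + r"\s*(\S+)"
loose = label + r"\s*(\S+)"  # unescaped: the dot matches any character

match = re.search(strict, line, re.IGNORECASE)
value = match.group(1) if match else None
print(value)

# The unescaped pattern also accepts a label that merely looks similar.
typo_line = "Customer NrX 11111111"
loose_hit = re.search(loose, typo_line, re.IGNORECASE) is not None
strict_hit = re.search(strict, typo_line, re.IGNORECASE) is not None
print(loose_hit, strict_hit)
```

The capture group `(\S+)` takes only the next run of non-whitespace characters, which suits numbers and dates but not values that contain spaces.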

import re

# Your dictionary of keys with their possible variations
values_dict = {
    'Customer Number': ['Customer Nr.', 'customer number', 'cus. nr.', 'Customer Number'],
    'Order Number': ['Order number.', 'Order nr', 'order number', 'Order Number'],
    'Invoice Number': ['Invoice Nr.', 'Inv. Number:', 'invoice number', 'Invoice Number'],
    'Order Date': ['Order Date'],
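    # Only the text after the label is captured, so 'Company Name' itself is not part of this value
    'Company Information': ['Company Name'],
    # Match on the 'Invoice Adress' label (spelled as in the files) so the company name is kept
    'Invoice Company Information': ['Invoice Adress']  # Add more variations as needed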
}

# Function to extract information based on the dictionary
def extract_information(text, values_dict):
    results = {}

    for key, variations in values_dict.items():
        for variation in variations:
            # Escape the label so characters such as '.' are matched literally
            pattern = rf"{re.escape(variation)}:?\s*(.*)"
            match = re.search(pattern, text, re.IGNORECASE)
            if match:
                results[key] = match.group(1).strip()
                break 

    return results

text = """Company Name GmbH, Teststraße 24 , 01000 Sampleort 
Customer Nr. 11111111 
Invoice Nr. 22222

Invoice Adress Company Name 2 mbH, Test2straße 11, 01001 Sample2ort
Order number. 555555 
Order Date 01.01.1999"""

info = extract_information(text, values_dict)

for key, value in info.items():
    print(f"{key}: {value}")
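The pattern above captures everything up to the end of the line, so it relies on each field being on its own line. If the fields sit on a single line, as in the example in the question, one possible variant is to stop the capture at the next known label. The sketch below assumes that every label that can follow a value is listed in the dictionary; the names `labels`, `extract_fields`, and `sample` are hypothetical and chosen for this illustration.

```python
import re

# Label variations per output key (same idea as values_dict, trimmed).
labels = {
    "Customer Number": ["Customer Nr.", "Customer Number", "Cus. Nr."],
    "Invoice Number": ["Invoice Nr.", "Inv. Number:", "Invoice Number"],
    "Invoice Company Information": ["Invoice Adress"],
    "Order Number": ["Order number.", "Order Nr", "Order Number"],
    "Order Date": ["Order Date"],
}

def extract_fields(text, labels):
    # Alternation of every known label, used to detect where a value ends.
    any_label = "|".join(
        re.escape(v) for variations in labels.values() for v in variations
    )
    results = {}
    for key, variations in labels.items():
        # Try longer labels first so "Order number." wins over "Order Number".
        for variation in sorted(variations, key=len, reverse=True):
            # Capture lazily up to the next known label or the end of the text.
            pattern = (
                rf"{re.escape(variation)}:?\s*(.*?)\s*(?=(?:{any_label})|$)"
            )
            match = re.search(pattern, text, re.IGNORECASE | re.DOTALL)
            if match:
                results[key] = match.group(1)
                break
    return results

sample = (
    "Company Name GmbH, Teststraße 24 , 01000 Sampleort "
    "Customer Nr. 11111111 Invoice Nr. 22222 "
    "Invoice Adress Company Name 2 mbH, Test2straße 11, 01001 Sample2ort "
    "Order number. 555555 Order Date 01.01.1999"
)
fields = extract_fields(sample, labels)
for key, value in fields.items():
    print(f"{key}: {value}")
```

The leading company line has no label in front of it, so this sketch does not extract it. A value that itself contains one of the labels would be cut short, which is the trade-off of using the label list as the delimiter.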
