I have some descriptions, along with a tag for each token in each description. The tags specify the type of each token. I want to extract the address from each description, that is, all tokens tagged <street>, <city>, or <state>. How can I achieve this in PySpark?
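+-------------------------------------------+--------------------------------------------------------------------------+
|description                                |tags                                                                      |
+-------------------------------------------+--------------------------------------------------------------------------+
|"aci*credit one bank, n"                   |<vendor_name> <vendor_name> <vendor_name> <vendor_name>                   |
|odot dmv2u 503-9455400 or 06/30            |<vendor_name> <vendor_name> <phone_number> <state> <trans_date>           |
|# 7-eleven 41066 5050 hunter rd ooltewah tn|<other> <vendor_name> <store_id> <street> <street> <street> <city> <state>|
+-------------------------------------------+--------------------------------------------------------------------------+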
The output I am looking for is:
NULL
OR
5050 hunter rd ooltewah tn
Any token whose tag is not an address tag should not be included.
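
One possible approach, as a minimal sketch rather than a definitive solution: split both columns on whitespace, pair each token with its tag, and keep only the tokens whose tag is an address tag. The sketch assumes the columns are named description and tags as in the table, that whitespace tokenization of the description lines up one-to-one with the tags, and that the address tags are <street>, <city>, and <state>. The names ADDRESS_TAGS, extract_address, and with_address are hypothetical, chosen for illustration only.

```python
# Assumed set of tags that count as part of an address.
ADDRESS_TAGS = {"<street>", "<city>", "<state>"}


def extract_address(description, tags):
    """Return the address-tagged tokens joined by spaces, or None if there are none."""
    if description is None or tags is None:
        return None
    tokens = description.split()
    tag_list = tags.split()
    # Assumption: tokens and tags align one-to-one; if they do not, give up on the row.
    if len(tokens) != len(tag_list):
        return None
    kept = [tok for tok, tag in zip(tokens, tag_list) if tag in ADDRESS_TAGS]
    # None becomes NULL in the resulting Spark column.
    return " ".join(kept) if kept else None


def with_address(df):
    """Add an `address` column to a DataFrame that has `description` and `tags` columns."""
    # Imported inside the function so extract_address can be tried without Spark installed.
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    extract_address_udf = F.udf(extract_address, StringType())
    return df.withColumn(
        "address", extract_address_udf(F.col("description"), F.col("tags"))
    )
```

Calling with_address(df).show(truncate=False) on the sample data would add the new column. With this logic the second row yields "or", because that token is tagged <state>. A Python UDF is used here for readability; on large data, built-in array functions may perform better.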