How to grab two words separated by comma using regex in C#?

2024-03-11 22:00:12
I am trying to write a regex that captures a word, then whitespace, then another word, as in the examples below:

string ToSearch = "Hello Steve steve, Martin martin, Scott scott to ASP.Net Program";
string ToSearch2 = "Hello Steve steve, Martin martin, Scott scott, Harry harry, Peter peter, Ron ron to ASP.Net Program";
string ToSearch3 = "Hello Steve steve, Martin martin, Scott scott, Harry harry, Peter peter, Ron ron, Will will, smith Smith to ASP.Net Program";
string ToSearch4 = "Hello Steve Harvey, Martin Lawrence, Scott Hardy, Harry Potter, Peter Pan, Ron Wesley, Will Smith, mr Smith to ASP.Net Program";

I want to get a list of matches such as Steve Harvey, Martin Lawrence, etc., but I can't find the right pattern.

Here are the patterns I have tried:

//string pattern = @"[a-zA-Z]+(,[\s]+[\D]+)";
//string pattern = @"\b[a-zA-Z]+(?:,[\s]+[a-zA-Z]+)*\b"; // Working
//string pattern = @"\b[a-zA-Z]+[\s]+[a-zA-Z]+(?:[\s]+[a-zA-Z]+)*\b";
//string pattern = @"\b[a-zA-Z]+(?:[\s]+[a-zA-Z]+)*(?:,[\s]+[a-zA-Z]+(?:[\s]+[a-zA-Z]+)*)*\b";
//string pattern = @"\b[a-zA-Z]+(?:\s+[a-zA-Z]+)*(?:,\s*[a-zA-Z]+(?:\s+[a-zA-Z]+)*)*\b";
//string pattern = @"(?<!\w)(?:[A-Za-z]+(?: [A-Za-z]+)*)(?!\w)";
//string pattern = @"\b[a-zA-Z]+([\s]+[a-zA-Z]*(?:,[\s]+[a-zA-Z]+))+[\s]+[a-zA-Z]*\b";
//string pattern = @"\b[\D]+\s+[\D]*\b";
//string pattern = @"\b([A-Z][a-z]*)\s*([a-z]+)(?=\s*,|\s|$)"; // Best
//string pattern = @"\b([\D]*)\s*([\D]+)(?=\s*,|\s|$)"; // Best
//string pattern = @"\b(\w+)\s+\1\b";
//string pattern = @"\b(\w+)\s+\b(\1|\1\w+)";
//string pattern = @"\b(\w+)\b.*?(?i:\1)\b";
//string pattern = @"\b([A-Za-z]*\s*[A-Za-z])\s+([a-zA-Z]+)";
//string pattern = @"\b[A-Za-z]*\s*([A-Za-z]+)(?=\s*,|\s|$)";
//string pattern = @"\b([A-Za-z])+\s+([A-Za-z]+)(?=\s*,)"; // Almost
//string pattern = @"\b([A-Za-z]+)\s+([A-Za-z]+)(?=\s*,|\s*$)";
//string pattern = @"\b[a-zA-Z]+[\s]+[a-zA-Z]+(?:,[\s]+[a-zA-Z]+)*\b";
//string pattern = @"\b[a-zA-Z]+(?:,[\s]+[a-zA-Z]+)+(?:,[\s]+[a-zA-Z]+)*";
//string pattern = @"\b[a-zA-Z]+(?:,[\s]+[a-zA-Z])+(?:[\s]+[a-zA-Z]+)*\b";
//string pattern = @"\b[a-zA-Z]+(?:,[\s]+[0-9a-zA-Z])?(?:,[\s]+[a-zA-Z]+(?:,[\s]+[0-9a-zA-Z])+)*\b";

What am I doing wrong? Also, can you point me to a website, blog, or video tutorial that explains regex in C# in detail?

Skeleton code for the regex:

Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
System.Diagnostics.Debug.WriteLine($"Regex Result: {regex.IsMatch(ToSearch)}");

Assert.IsTrue(regex.IsMatch(ToSearch));

foreach(Match m in regex.Matches(ToSearch)) 
{
    System.Diagnostics.Debug.WriteLine($"Regex Result: {m.Value}");
    System.Diagnostics.Debug.WriteLine($"Regex Result contains space: {m.Value.Contains(" ")}");
}

Edit 1:

Here are the results I get with the pattern string pattern = @"(\w+\s+\w+(?=,))|((?<=,\s*\w+\s+\w+))";

Debug Trace:
Regex Result: True
Regex Result: Steve steve
Regex Result: Martin martin
Regex Result: 
Regex Result: 
Regex Result: 
Regex Result: 
Regex Result: 
Regex Result: 
Regex Result2: True
Regex Result2: Steve steve
Regex Result2: Martin martin
Regex Result2: 
Regex Result2: Scott scott
Regex Result2: 
Regex Result2: Harry harry
Regex Result2: 
Regex Result2: Peter peter
Regex Result2: 
Regex Result2: 
Regex Result2: 
Regex Result2: 
Regex Result3: True
Regex Result3: Steve steve
Regex Result3: Martin martin
Regex Result3: 
Regex Result3: Scott scott
Regex Result3: 
Regex Result3: Harry harry
Regex Result3: 
Regex Result3: Peter peter
Regex Result3: 
Regex Result3: Ron ron
Regex Result3: 
Regex Result3: Will will
Regex Result3: 
Regex Result3: 
Regex Result3: 
Regex Result3: 
Regex Result3: 
Regex Result3: 
Regex Result4: True
Regex Result4: Steve Harvey
Regex Result4: Martin Lawrence
Regex Result4: 
Regex Result4: Scott Hardy
Regex Result4: 
Regex Result4: Harry Potter
Regex Result4: 
Regex Result4: Peter Pan
Regex Result4: 
Regex Result4: Ron Wesley
Regex Result4: 
Regex Result4: Will Smith
Regex Result4: 
Regex Result4: 
Regex Result4: 
Regex Result4: 
Regex Result4: 
Regex Result4: 
Test assertions passed successfully.

Here is my test code:

        string ToSearch = "Hello Steve steve, Martin martin, Scott scott to ASP.Net Program";
        //string ToSearch = "Hello Steve steve, Martin martin, Scott scott, Harry harry, Peter peter, Ron ron to ASP.Net Ptogram";

        Regex regex = new Regex(pattern);
        System.Diagnostics.Debug.WriteLine($"Regex Result: {regex.IsMatch(ToSearch)}");

        Assert.IsTrue(regex.IsMatch(ToSearch));

        foreach(Match m in regex.Matches(ToSearch)) 
        {
            System.Diagnostics.Debug.WriteLine($"Regex Result: {m.Value}");
            //System.Diagnostics.Debug.WriteLine($"Regex Result contains space: {m.Value.Contains(" ")}");
        }

        string ToSearch2 = "Hello Steve steve, Martin martin, Scott scott, Harry harry, Peter peter, Ron ron to ASP.Net Program";

        System.Diagnostics.Debug.WriteLine($"Regex Result2: {regex.IsMatch(ToSearch2)}");

        Assert.IsTrue(regex.IsMatch(ToSearch2));

        foreach (Match m in regex.Matches(ToSearch2))
        {
            System.Diagnostics.Debug.WriteLine($"Regex Result2: {m.Value}");
            //System.Diagnostics.Debug.WriteLine($"Regex Result contains space2: {m.Value.Contains(" ")}");
        }

        string ToSearch3 = "Hello Steve steve, Martin martin, Scott scott, Harry harry, Peter peter, Ron ron, Will will, smith Smith to ASP.Net Program";

        System.Diagnostics.Debug.WriteLine($"Regex Result3: {regex.IsMatch(ToSearch3)}");

        Assert.IsTrue(regex.IsMatch(ToSearch3));

        foreach (Match m in regex.Matches(ToSearch3))
        {
            System.Diagnostics.Debug.WriteLine($"Regex Result3: {m.Value}");
            //System.Diagnostics.Debug.WriteLine($"Regex Result contains space3: {m.Value.Contains(" ")}");
        }

        string ToSearch4 = "Hello Steve Harvey, Martin Lawrence, Scott Hardy, Harry Potter, Peter Pan, Ron Wesley, Will Smith, mr Smith to ASP.Net Program";

        System.Diagnostics.Debug.WriteLine($"Regex Result4: {regex.IsMatch(ToSearch4)}");

        Assert.IsTrue(regex.IsMatch(ToSearch4));

        foreach (Match m in regex.Matches(ToSearch4))
        {
            System.Diagnostics.Debug.WriteLine($"Regex Result4: {m.Value}");
            //System.Diagnostics.Debug.WriteLine($"Regex Result contains space2: {m.Value.Contains(" ")}");
        }

        // Print a success message to the trace
        System.Diagnostics.Debug.WriteLine("Test assertions passed successfully.");

Solution:

The first pair of words is not preceded by a comma, whereas the last pair is not followed by one. Therefore, we have to distinguish two cases, joined by OR (regex |):

\w+\s+\w+(?=,)|(?<=,\s*)\w+\s+\w+

We have

  • pos(?=exp) is a positive lookahead: it matches pos only where it is immediately followed by exp. Here, \w+\s+\w+(?=,) matches two words followed by a comma.
  • (?<=exp)pos is a positive lookbehind: it matches pos only where it is immediately preceded by exp. Here, (?<=,\s*)\w+\s+\w+ matches two words preceded by a comma and optional whitespace.
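
To see what each case contributes, the two alternatives can be run separately and then combined. The following is only a sketch, assuming C# 9 or later for top-level statements; the helper Find is a hypothetical name introduced here for illustration and is not part of the original answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the original answer): all match values for a pattern.
string[] Find(string pattern, string input) =>
    Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

string input = "Hello Steve steve, Martin martin, Scott scott to ASP.Net Program";

// Case 1 alone: two words followed by a comma.
string[] followedByComma = Find(@"\w+\s+\w+(?=,)", input);

// Case 2 alone: two words preceded by a comma and optional whitespace.
string[] precededByComma = Find(@"(?<=,\s*)\w+\s+\w+", input);

// Both cases joined with |. The engine resumes scanning after each match,
// so "Martin martin", which satisfies both cases, is reported only once.
string[] combined = Find(@"\w+\s+\w+(?=,)|(?<=,\s*)\w+\s+\w+", input);

Console.WriteLine("case 1:   " + string.Join(" | ", followedByComma));
Console.WriteLine("case 2:   " + string.Join(" | ", precededByComma));
Console.WriteLine("combined: " + string.Join(" | ", combined));
```

Case 1 alone misses the last pair and case 2 alone misses the first pair, which is why both alternatives are needed.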

Note that in the lookarounds pos(?=exp) and (?<=exp)pos, only pos is matched; exp (i.e., the commas and leading spaces) will not be part of the result.
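
This also appears to explain the empty results in the question's Edit 1: there, the second alternative places the whole word pair inside the lookbehind, so nothing is left outside it to consume and the match is zero-width. The sketch below compares the two patterns on the first sample string; CountEmpty is a hypothetical helper introduced here for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "Hello Steve steve, Martin martin, Scott scott to ASP.Net Program";

// Pattern from the question's Edit 1: the second alternative is only a lookbehind,
// so it consumes no characters and can only produce empty matches.
var questionPattern = new Regex(@"(\w+\s+\w+(?=,))|((?<=,\s*\w+\s+\w+))");

// Pattern from the answer: the word pair sits outside the lookbehind and is consumed.
var answerPattern = new Regex(@"\w+\s+\w+(?=,)|(?<=,\s*)\w+\s+\w+");

// Hypothetical helper: counts matches whose value is the empty string.
int CountEmpty(Regex regex, string text) =>
    regex.Matches(text).Cast<Match>().Count(m => m.Length == 0);

int questionTotal = questionPattern.Matches(input).Count;
int questionEmpty = CountEmpty(questionPattern, input);
int answerTotal = answerPattern.Matches(input).Count;
int answerEmpty = CountEmpty(answerPattern, input);

Console.WriteLine($"question pattern: {questionTotal} matches, {questionEmpty} empty");
Console.WriteLine($"answer pattern:   {answerTotal} matches, {answerEmpty} empty");
```

The counts for the question's pattern should line up with the "Regex Result:" lines of the debug trace, where two word pairs are followed by a run of empty entries.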

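using System;
using System.Text.RegularExpressions;

// The collection expression [...] requires C# 12; use new[] { ... } on older versions.
string[] examples = [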
    "Hello Steve steve, Martin martin, Scott scott to ASP.Net Program",
    "Hello Steve steve, Martin martin, Scott scott, Harry harry, Peter peter, Ron ron to ASP.Net Program",
    "Hello Steve steve, Martin martin, Scott scott, Harry harry, Peter peter, Ron ron, Will will, smith Smith to ASP.Net Program",
    "Hello Steve Harvey, Martin Lawrence, Scott Hardy, Harry Potter, Peter Pan, Ron Wesley, Will Smith, mr Smith to ASP.Net Program"
];
var regex = new Regex(@"\w+\s+\w+(?=,)|(?<=,\s*)\w+\s+\w+");
foreach (string example in examples) {
    Console.WriteLine("==> " + example);
    foreach (Match match in regex.Matches(example)) {
        Console.WriteLine("    " + match.Value);
    }
}
Console.ReadKey();

prints:

==> Hello Steve steve, Martin martin, Scott scott to ASP.Net Program
    Steve steve
    Martin martin
    Scott scott
==> Hello Steve steve, Martin martin, Scott scott, Harry harry, Peter peter, Ron ron to ASP.Net Program
    Steve steve
    Martin martin
    Scott scott
    Harry harry
    Peter peter
    Ron ron
==> Hello Steve steve, Martin martin, Scott scott, Harry harry, Peter peter, Ron ron, Will will, smith Smith to ASP.Net Program
    Steve steve
    Martin martin
    Scott scott
    Harry harry
    Peter peter
    Ron ron
    Will will
    smith Smith
==> Hello Steve Harvey, Martin Lawrence, Scott Hardy, Harry Potter, Peter Pan, Ron Wesley, Will Smith, mr Smith to ASP.Net Program
    Steve Harvey
    Martin Lawrence
    Scott Hardy
    Harry Potter
    Peter Pan
    Ron Wesley
    Will Smith
    mr Smith
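
One caveat: \w matches digits and underscores as well as letters, so a pair such as "Room 42" would also be captured. If only letters should count as words, as the [a-zA-Z] classes in the question suggest, a letters-only variant might look like the sketch below. The input string is made up for illustration, and the variant is an assumption about the desired behavior, not part of the original answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Made-up input for illustration; the second word of "Room 42" consists of digits.
string input = "See Room 42, John Smith, Jane Doe for details";

// Pattern from the answer: \w accepts letters, digits, and underscores.
string[] withWordChars = Regex.Matches(input, @"\w+\s+\w+(?=,)|(?<=,\s*)\w+\s+\w+")
    .Cast<Match>().Select(m => m.Value).ToArray();

// Assumed variant: the same two cases, restricted to ASCII letters.
string[] lettersOnly = Regex.Matches(input,
        @"[A-Za-z]+\s+[A-Za-z]+(?=,)|(?<=,\s*)[A-Za-z]+\s+[A-Za-z]+")
    .Cast<Match>().Select(m => m.Value).ToArray();

Console.WriteLine(string.Join(" | ", withWordChars));
Console.WriteLine(string.Join(" | ", lettersOnly));
```

For names with non-ASCII letters, \p{L} could be used in place of [A-Za-z].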