
Python - Using BeautifulSoup to parse HTML - Print works but Return does not?

2024-03-11 22:00:05

Why does print() show all the text under these tags, but return does not?

This is the function I am using:

def parse_html(data):
    ls = []
    htmlParse = BeautifulSoup(data, 'html.parser')
    for para in htmlParse.find_all(['script', 'head', 'title', 'meta', '[document]', 'p', 'body', 'a', "form", "input", "button", "style"]): 
        ls.append(para.text.strip())
        return ls
Text = '<!DOCTYPE html><html><head>    <meta charset="utf-8">    <meta http-equiv="X-UA-Compatible" content="IE=edge">    <meta name="viewport" content="width=device-width, initial-scale=1">    <title>FlexPortalen - Log ind</title>    <link rel="stylesheet" href="/Content/bootstrap.css" />    <link rel="stylesheet" href="/Content/bootstrap-theme.min.css" />    <link rel="stylesheet" href="/login.css" />    <!--[if lt IE 9]>      <script src="https://oss.maxcdn.com/html5shiv/3.7.3/html5shiv.min.js"></script>      <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>    <![endif]--></head><body>    <div class="container">        <div class="login-box">            <form method="post">                <input name="__RequestVerificationToken" type="hidden" value="w4YgqRKtcaPFQn6ncaavNgPVb5rLp0CtbylMJ3zYYa2fTGoAfkJ97araAO5i4Nbwo0wERIboCQssguo0UviOaM3HvECpjfuokKcq4rt_ADM1" />                <h2 class="text-center login-heading">FlexPortalen</h2>                <div class="form-group">                    <input type="text" class="form-control input-lg" name="username" id="username" placeholder="Brugernavn...    " />                </div>                <div class="form-group">                        <input type="password" class="form-control input-lg" name="password" id="password" placeholder="Adgangskode..." />                </div>                <div class="checkbox text-center">                    <label>                        <input type="checkbox" name="rememberMe" id="rememberMe"  /> Husk mig?                    </label>                </div>                                <p class="text-center">                    <button type="submit" class="btn btn-primary btn-lg">Log ind</button>                </p>            </form>        </div>    </div>    <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.12.4/jquery.min.js"></script>    <script src="/Scripts/bootstrap.min.js"></script></body></html>'

If I print each item instead of returning, I get:

FlexPortalen - Log ind
FlexPortalen  
Husk mig?                 
Log ind

But when I return the list, I only get:

['FlexPortalen - Log ind']
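The same difference can be reproduced without BeautifulSoup. This minimal sketch assumes a plain list of strings (the hypothetical `texts` below) in place of the parsed tags. All function names here are made up for illustration:

```python
def collect_with_early_return(texts):
    ls = []
    for text in texts:
        ls.append(text)
        return ls  # indented inside the loop: the function exits here on the first pass


def collect_after_loop(texts):
    ls = []
    for text in texts:
        ls.append(text)
    return ls  # dedented: runs only after every item has been appended


def print_inside_loop(texts):
    for text in texts:
        print(text)  # print() does not leave the function, so it fires on every pass


# Hypothetical stand-in for the strings produced by para.text.strip()
texts = ["FlexPortalen - Log ind", "FlexPortalen", "Husk mig?", "Log ind"]

print_inside_loop(texts)                 # prints all four lines
print(collect_with_early_return(texts))  # ['FlexPortalen - Log ind']
print(collect_after_loop(texts))         # ['FlexPortalen - Log ind', 'FlexPortalen', 'Husk mig?', 'Log ind']
```

One side effect of the early return: with an empty input the loop body never runs, so `collect_with_early_return([])` falls off the end and returns `None` instead of `[]`.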

Solution:

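Check the indentation of your return statement. Because it sits inside the for loop, the function exits during the first iteration, so ls contains only the first item. To return the list with all the collected text, dedent return so it runs after the loop has finished: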

from bs4 import BeautifulSoup

def parse_html(data):
    ls = []
    htmlParse = BeautifulSoup(data, 'html.parser')
    for para in htmlParse.find_all(['script', 'head', 'title', 'meta', '[document]', 'p', 'body', 'a', "form", "input", "button", "style"]): 
        ls.append(para.text.strip())
    # return is at function level, so it runs only once the loop has finished
    return ls
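If you want each text handed back as soon as it is found, the way print() shows it on every pass, one option is a generator. Unlike return, yield does not end the function. The sketch below assumes a plain list (the hypothetical `raw`) stands in for the results of htmlParse.find_all(...). `iter_texts` is also a made-up name:

```python
from typing import Iterable, Iterator


def iter_texts(texts: Iterable[str]) -> Iterator[str]:
    for text in texts:
        # yield hands back one value and pauses; the loop resumes on the next request
        yield text.strip()


# Hypothetical input standing in for the tag texts
raw = ["  FlexPortalen - Log ind ", "FlexPortalen  ", " Husk mig?  ", "Log ind"]

for text in iter_texts(raw):
    print(text)

# list() drains the generator, producing a list like the one the fixed parse_html returns
collected = list(iter_texts(raw))
print(collected)  # ['FlexPortalen - Log ind', 'FlexPortalen', 'Husk mig?', 'Log ind']
```

For this question the dedented return is the simpler fix. A generator only pays off if the caller wants to process items one at a time.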