
Python - read_html without Beautifulsoup?

2024-03-12 05:30:04

Based on my previous question: optimal web parsing

I got a nice solution there, but suppose I am explaining this part to my economics students, who don't know much about web parsing. So I decided to use read_html directly and got the following result:

import pandas as pd

url = "https://www.geostat.ge/ka/modules/categories/26/samomkhmareblo-fasebis-indeksi-inflatsia"
data = pd.read_html(url, encoding="utf-8")[0]  # first table found on the page
# data.drop(0, axis=0, inplace=True)
# data = data.droplevel(level=0, axis=1)
print(data)


                                        0       1   ...      11      12
0                                        NaN  2012.0  ...  2022.0  2023.0
1  საშუალო წლიური წინა წლის საშუალო წლიურთან    99.1  ...   111.9   102.5
2            დეკემბერი წინა წლის დეკემბერთან    98.6  ...   109.8   100.4

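As you can see, there is an extra level of numeric column labels (0, 1, ..., 12) on top of the years, and the years themselves end up in the first data row. How can I handle this case?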

Solution:

Try:

import pandas as pd

data = pd.DataFrame(
    pd.read_html(
        "https://www.geostat.ge/ka/modules/categories/26/samomkhmareblo-fasebis-indeksi-inflatsia",
        encoding="utf-8",
    )[0]
)

header = data.loc[[0], :]  # first row of the dataframe will be the new header
data = data.loc[1:, :]  # the rest will be new data

# Rename columns: the first gets an empty label, the rest become integer years.
data.columns = ["", *header.loc[0, 1:].astype(int)]

print(data)

Prints:

                                              2012   2013   2014   2015   2016   2017   2018   2019   2020   2021   2022   2023
1  საშუალო წლიური წინა წლის საშუალო წლიურთან  99.1   99.5  103.1  104.0  102.1  106.0  102.6  104.9  105.2  109.6  111.9  102.5
2            დეკემბერი წინა წლის დეკემბერთან  98.6  102.4  102.0  104.9  101.8  106.7  101.5  107.0  102.4  113.9  109.8  100.4
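
For students who want to try the header fix without hitting the website, the same idea can be sketched offline. This is only an illustration under assumptions: the small `raw` frame mimics the shape of the scraped table with English stand-in labels, and the helper `promote_first_row` and the column name `"indicator"` are hypothetical names, not part of pandas or of the answer above.

```python
import numpy as np
import pandas as pd


def promote_first_row(df: pd.DataFrame, label_name: str = "indicator") -> pd.DataFrame:
    """Use the first row (the years) as column labels and drop it from the data.

    Hypothetical helper for illustration; it assumes the first column holds
    row labels and the remaining cells of row 0 hold years.
    """
    years = df.iloc[0, 1:].astype(int).tolist()  # 2012.0 -> 2012, ...
    out = df.iloc[1:].copy()                     # everything below the year row
    out.columns = [label_name, *years]           # readable column labels
    return out.reset_index(drop=True)            # restart row numbering at 0


# Stand-in for the scraped table: years sit in row 0, not in the header.
raw = pd.DataFrame(
    [
        [np.nan, 2012.0, 2013.0],
        ["annual average vs. previous year", 99.1, 99.5],
        ["December vs. previous December", 98.6, 102.4],
    ]
)

tidy = promote_first_row(raw)
print(tidy)
```

Naming the first column, instead of leaving it as an empty string, may make later steps such as `tidy.set_index("indicator")` easier to explain in class.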