Python - How to scrape and store multiple tables from a wiki page?

2024-03-11 07:00:04

I am attempting to pull data from three specific tables on the Survivor wiki pages: the contestant, season summary, and voting history tables. It works fine for the contestant table, but for the season summary and voting history tables pandas reports that it cannot find a matching table. My end goal is to combine all of them into one DataFrame for cleaning and manipulation.

My code that works for the contestant table but not the others looks like this:

import pandas as pd

list_of_seasons = ['41', '42', '43', '44', '45', '46']
season_start = 41
contestants = {}
season_summary = {}
voting_history = {}

for i in list_of_seasons :
    contestants[i] = pd.read_html('https://en.wikipedia.org/wiki/Survivor_' + str(season_start), match='contestants')
    season_summary[i] = pd.read_html('https://en.wikipedia.org/wiki/Survivor_' + str(season_start), match='season summary')
    voting_history[i] = pd.read_html('https://en.wikipedia.org/wiki/Survivor_' + str(season_start), match='voting history')
    season_start = season_start + 1

print(contestants['45'])
print(season_summary['45'])
print(voting_history['45'])

The error I get is:

Traceback (most recent call last):
  File "c:\Users\bsjes\Documents\Code\Personal Projects\Survivor Data Grabber\SurvivorWikiRipper_0.2.py", line 13, in <module>
    season_summary[i] = pd.read_html('https://en.wikipedia.org/wiki/Survivor_' + str(season_start), match='season summary')        
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^        
  File "C:\Users\bsjes\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\io\html.py", line 1246, in read_html       
    return _parse(
           ^^^^^^^
  File "C:\Users\bsjes\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\io\html.py", line 1009, in _parse
    raise retained
  File "C:\Users\bsjes\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\io\html.py", line 989, in _parse
    tables = p.parse_tables()
             ^^^^^^^^^^^^^^^^
  File "C:\Users\bsjes\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\io\html.py", line 249, in parse_tables     
    tables = self._parse_tables(self._build_doc(), self.match, self.attrs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\bsjes\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\io\html.py", line 622, in _parse_tables    
    raise ValueError(f"No tables found matching pattern {repr(match.pattern)}")
ValueError: No tables found matching pattern 'season summary'

What should I be doing differently? Do I need to learn a different package instead?

Solution:

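The match argument of pd.read_html is tested against the text inside each table, not against the section headings above it. 'contestants' presumably succeeds because that word appears somewhere in the table's own text (for example in its caption), while 'season summary' and 'voting history' appear only in headings, so no table matches. Looking at the wiki pages, the tables sit at the same positions on every page: the contestants table is the second <table> (index 1), the season summary is the third (index 2), and the voting history is at index 4. So you can read all the tables and pick them by index.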
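The behaviour of match can be checked offline on a small page. The sketch below uses hand-written HTML (an assumption standing in for the wiki markup, not a copy of it) and assumes the lxml parser is installed, which is why flavor="lxml" is passed explicitly. It shows a word from a table caption matching, and a word that only appears in a heading failing.

```python
from io import StringIO

import pandas as pd

# Hand-written stand-in for a wiki page (hypothetical markup, not the real page).
page = """
<h2>Season summary</h2>
<table>
  <caption>Challenge winners and eliminations by episode</caption>
  <tr><th>Episode</th><th>Eliminated</th></tr>
  <tr><td>1</td><td>Hannah</td></tr>
</table>
<h2>Contestants</h2>
<table>
  <caption>List of contestants</caption>
  <tr><th>Contestant</th><th>Age</th></tr>
  <tr><td>Hannah Rose</td><td>33</td></tr>
</table>
"""

# "contestants" occurs inside the second table (in its caption), so it matches.
found = pd.read_html(StringIO(page), match="contestants", flavor="lxml")
print(len(found), list(found[0].columns))

# "Season summary" occurs only in an <h2> outside any table, so nothing matches.
error_message = None
try:
    pd.read_html(StringIO(page), match="Season summary", flavor="lxml")
except ValueError as err:
    error_message = str(err)
print(error_message)
```

On this toy page the first call returns a single table and the second raises a ValueError like the one in the question, which is why selecting tables by position is the simpler route here.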

You can try:

import pandas as pd

contestants = {}
season_summary = {}
voting_history = {}

for season_start in range(41, 47):  # seasons 41 through 46; keys are ints here
    u = f"https://en.wikipedia.org/wiki/Survivor_{season_start}"

    # without match, read_html returns every <table> on the page in page order
    tables = pd.read_html(u)
    contestants[season_start] = tables[1]
    season_summary[season_start] = tables[2]
    voting_history[season_start] = tables[4]

print(contestants[45])
print(season_summary[45])
print(voting_history[45])

Prints:

                       Contestant Age                          From    Tribe                                                      Finish        
                       Contestant Age                          From Original Switched     None    Merged                       Placement     Day
0                     Hannah Rose  33           Baltimore, Maryland     Lulu      NaN      NaN       NaN                   1st voted out   Day 3
1                  Brandon Donlon  26      Sicklerville, New Jersey     Lulu      NaN      NaN       NaN                   2nd voted out   Day 5
2               Sabiyah Broderick  28  Jacksonville, North Carolina     Lulu      NaN      NaN       NaN                   3rd voted out   Day 7
3                    Sean Edwards  35                   Provo, Utah     Lulu     Reba      NaN       NaN                   4th voted out   Day 9
4          Brandon "Brando" Meyer  23           Seattle, Washington     Belo     Belo      NaN       NaN                   5th voted out  Day 11
5   Janani "J. Maya" Krishnan-Jha  24       Los Angeles, California     Reba     Reba  None[a]       NaN                   6th voted out  Day 13
6           Nicholas "Sifu" Alsup  30            O'Fallon, Illinois     Reba     Reba  None[a]  Dakuwaqa                   7th voted out  Day 14
7                 Kaleb Gebrewold  29   Vancouver, British Columbia     Lulu     Lulu  None[a]  Dakuwaqa   8th voted out 1st jury member  Day 14
8               Kellie Nalbandian  29       New York City, New York     Belo     Lulu  None[a]  Dakuwaqa   9th voted out 2nd jury member  Day 16
9                Kendra McQuarrie  30   Steamboat Springs, Colorado     Belo     Belo  None[a]  Dakuwaqa  10th voted out 3rd jury member  Day 17
10    Bruce Perreault Survivor 44  47         Warwick, Rhode Island     Belo     Lulu  None[a]  Dakuwaqa  11th voted out 4th jury member  Day 19
11                  Emily Flippen  28              Laurel, Maryland     Lulu     Belo  None[a]  Dakuwaqa  12th voted out 5th jury member  Day 21
12                    Drew Basile  23    Philadelphia, Pennsylvania     Reba     Belo  None[a]  Dakuwaqa  13th voted out 6th jury member  Day 23
13                    Julie Alley  49          Brentwood, Tennessee     Reba     Reba  None[a]  Dakuwaqa  14th voted out 7th jury member  Day 24
14                  Katurah Topps  35            Brooklyn, New York     Belo     Lulu  None[a]  Dakuwaqa      Eliminated 8th jury member  Day 25
15                    Jake O'Kane  26         Boston, Massachusetts     Belo     Lulu  None[a]  Dakuwaqa                   2nd runner-up  Day 26
16                 Austin Li Coon  26             Chicago, Illinois     Reba     Belo  None[a]  Dakuwaqa                       Runner-up  Day 26
17                 Dee Valladares  26                Miami, Florida     Reba     Reba  None[a]  Dakuwaqa                   Sole Survivor  Day 26
   Episode                                                                                                        Challenge winner(s)                                                                    Eliminated         
       No.                               Title            Air date                                                             Reward                                                           Immunity      Tribe   Player
0        1             "We Can Do Hard Things"  September 27, 2023                                                               Reba                                                               Belo       Lulu   Hannah
1        1             "We Can Do Hard Things"  September 27, 2023                                                               Reba                                                               Reba       Lulu   Hannah
2        2  "Brought a Bazooka to a Tea Party"     October 4, 2023                                                               Reba                                                               Reba       Lulu  Brandon
3        2  "Brought a Bazooka to a Tea Party"     October 4, 2023                                                               Belo                                                               Belo       Lulu  Brandon
4        3                "No Man Left Behind"    October 11, 2023                                                               Lulu                                                               Reba       Lulu  Sabiyah
5        3                "No Man Left Behind"    October 11, 2023                                                               Reba                                                               Belo       Lulu  Sabiyah
6        4                  "Music to My Ears"    October 18, 2023                                                                NaN                                                               Lulu       Reba     Sean
7        4                  "Music to My Ears"    October 18, 2023                                                                NaN                                                               Belo       Reba     Sean
8        5       "I Don't Want to Be the Worm"    October 25, 2023                                                               Reba                                                               Reba       Belo   Brando
9        5       "I Don't Want to Be the Worm"    October 25, 2023                                                               Lulu                                                               Lulu       Belo   Brando
10       6  "I'm Not Batman, I'm the Canadian"    November 1, 2023  Austin, Bruce, Drew, Julie, Kendra, Sifu [Katurah] (Blue Team)[a]  Austin, Bruce, Drew, Julie, Kendra, Sifu [Katurah] (Blue Team)[a]        NaN  J. Maya
11       7             "The Thorn in My Thumb"    November 8, 2023            Dee [Austin, Jake, Julie, Kaleb, Katurah] (Red Team)[b]                                                 Kellie (Blue Team)   Dakuwaqa     Sifu
12       7             "The Thorn in My Thumb"    November 8, 2023            Dee [Austin, Jake, Julie, Kaleb, Katurah] (Red Team)[b]                                                     Dee (Red Team)   Dakuwaqa    Kaleb
13       8   "Following a Dead Horse to Water"   November 15, 2023                                                   Survivor Auction                                                              Bruce   Dakuwaqa   Kellie
14       9                 "Sword of Damocles"   November 22, 2023                                 Bruce, Julie, Kendra (Yellow Team)                                                              Bruce   Dakuwaqa   Kendra
15      10             "How Am I the Mobster?"   November 29, 2023                                        Emily [Dee, Julie, Katurah]                                                             Austin   Dakuwaqa    Bruce
16      11     "This Game Rips Your Heart Out"    December 6, 2023                                             Drew [Austin, Jake][c]                                                               Drew   Dakuwaqa    Emily
17      12  "The Ex-Girlfriend at the Wedding"   December 13, 2023                                              Austin [Dee, Katurah]                                                                Dee   Dakuwaqa     Drew
18      13         "Living the Survivor Dream"   December 20, 2023                                                   Austin [Jake][d]                                                             Austin   Dakuwaqa    Julie
19      13         "Living the Survivor Dream"   December 20, 2023                                                                NaN                                                       Dee [Austin]   Dakuwaqa  Katurah
   Unnamed: 0_level_0 Original tribes                   Switched tribes            No tribes          Merged tribe                                                                                     Unnamed: 17_level_0
              Episode               1        2        3               4          5         6      6.1            7       7.1         8          9        10        11        12          13       13.1 Unnamed: 17_level_1
0                 Day               3        5        7               9         11        13       13        14[a]     14[a]        16         17        19        21        23          24         25                 NaN
1               Tribe            Lulu     Lulu     Lulu            Reba       Belo       NaN      NaN     Dakuwaqa  Dakuwaqa  Dakuwaqa   Dakuwaqa  Dakuwaqa  Dakuwaqa  Dakuwaqa    Dakuwaqa   Dakuwaqa                 NaN
2          Eliminated          Hannah  Brandon  Sabiyah            Sean     Brando       NaN  J. Maya         Sifu     Kaleb    Kellie     Kendra     Bruce     Emily      Drew       Julie    Katurah                 NaN
3               Votes          5–0[b]      3–0      2–1           3–1–1        3–2    0–0[c]     10–1          5–1       4–2       5–3        6–1     4–3–1    1–0[d]       4–2  2–1–1–0[e]    None[f]                 NaN
4               Voter            Vote     Vote     Vote            Vote       Vote      Vote     Vote         Vote      Vote      Vote       Vote      Vote      Vote      Vote        Vote  Challenge                 NaN
5                 Dee             NaN      NaN      NaN            Sifu        NaN     Kaleb  J. Maya          NaN     Kaleb    Kellie     Kendra      Jake     Julie      Drew     Katurah  Immune[f]                 NaN
6              Austin             NaN      NaN      NaN             NaN  Brando[g]   None[h]  None[h]          NaN     Kaleb    Kellie  Kendra[i]      Jake     Julie     Julie       Julie   Saved[f]                 NaN
7                Jake             NaN      NaN      NaN             NaN        NaN     Kaleb  J. Maya          NaN     Julie   None[j]     Kendra     Bruce     Julie      Drew         Dee     Won[f]                 NaN
8             Katurah             NaN      NaN      NaN             NaN        NaN     Kaleb  J. Maya          NaN     Kaleb      Jake    None[i]     Bruce     Julie      Drew       Julie    Lost[f]                 NaN
9               Julie             NaN      NaN      NaN            Sean        NaN     Kaleb  J. Maya          NaN     Kaleb    Kellie     Kendra     Bruce     Emily      Drew        Jake        NaN                 NaN
10               Drew             NaN      NaN      NaN             NaN     Brando     Kaleb  J. Maya         Sifu       NaN    Kellie     Kendra      Jake     Julie     Julie         NaN        NaN                 NaN
11              Emily          Hannah  Brandon  Sabiyah             NaN     Brando     Kaleb  J. Maya         Sifu       NaN    Kellie    None[i]     Bruce     Julie       NaN         NaN        NaN                 NaN
12              Bruce             NaN      NaN      NaN             NaN        NaN     Kaleb  J. Maya         Sifu       NaN   None[k]     Kendra     Julie       NaN       NaN         NaN        NaN                 NaN
13             Kendra             NaN      NaN      NaN             NaN       Drew     Kaleb  J. Maya         Sifu       NaN      Jake       Jake       NaN       NaN       NaN         NaN        NaN                 NaN
14             Kellie             NaN      NaN      NaN             NaN        NaN     Kaleb  J. Maya         Sifu       NaN      Jake        NaN       NaN       NaN       NaN         NaN        NaN                 NaN
15              Kaleb          Hannah  Brandon  Sabiyah             NaN        NaN   None[j]  None[j]          NaN     Julie       NaN        NaN       NaN       NaN       NaN         NaN        NaN                 NaN
16               Sifu             NaN      NaN      NaN            Sean        NaN     Kaleb  J. Maya        Bruce       NaN       NaN        NaN       NaN       NaN       NaN         NaN        NaN                 NaN
17            J. Maya             NaN      NaN      NaN            Sean        NaN     Kaleb    Emily          NaN       NaN       NaN        NaN       NaN       NaN       NaN         NaN        NaN                 NaN
18             Brando             NaN      NaN      NaN             NaN       Drew       NaN      NaN          NaN       NaN       NaN        NaN       NaN       NaN       NaN         NaN        NaN                 NaN
19               Sean          Hannah  Brandon    Kaleb             Dee        NaN       NaN      NaN          NaN       NaN       NaN        NaN       NaN       NaN       NaN         NaN        NaN                 NaN
20            Sabiyah          Hannah  None[l]  None[h]             NaN        NaN       NaN      NaN          NaN       NaN       NaN        NaN       NaN       NaN       NaN         NaN        NaN                 NaN
21            Brandon          Hannah  None[l]      NaN             NaN        NaN       NaN      NaN          NaN       NaN       NaN        NaN       NaN       NaN       NaN         NaN        NaN                 NaN
22             Hannah         None[b]      NaN      NaN             NaN        NaN       NaN      NaN          NaN       NaN       NaN        NaN       NaN       NaN       NaN         NaN        NaN                 NaN
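For the stated end goal of one DataFrame, one possible approach is to flatten the two-level column headers and stack the per-season tables with a season column. This is only a sketch: flatten_columns and combine_seasons are hypothetical helper names, and the demo data below is made up so the example runs without network access. With real pages you would pass the contestants dictionary built above instead.

```python
import pandas as pd


def flatten_columns(df):
    """Join multi-level column labels into single strings, dropping repeats."""
    out = df.copy()
    if isinstance(out.columns, pd.MultiIndex):
        out.columns = [
            " ".join(dict.fromkeys(str(part) for part in col))
            for col in out.columns
        ]
    return out


def combine_seasons(tables_by_season):
    """Stack one table per season into a single frame with a 'season' column."""
    frames = [
        flatten_columns(df).assign(season=season)
        for season, df in tables_by_season.items()
    ]
    return pd.concat(frames, ignore_index=True)


# Made-up stand-in data shaped like the two-level contestants header.
header = pd.MultiIndex.from_tuples(
    [("Contestant", "Contestant"), ("Tribe", "Original"), ("Finish", "Day")]
)
demo = {
    1: pd.DataFrame(
        [["Player A", "Red", "Day 3"], ["Player B", "Blue", "Day 5"]],
        columns=header,
    ),
    2: pd.DataFrame([["Player C", "Green", "Day 3"]], columns=header),
}

combined = combine_seasons(demo)
print(combined)
```

pd.concat aligns frames by column name, so if a season's table has a column the others lack, the missing cells are filled with NaN rather than raising an error. The voting history tables are laid out with one column per episode, so they would likely need reshaping before they can be stacked this way.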