Web Scraping Pokémon Data with Ruby and Nokogiri
Author: Iván González (@dreamingechoes)
Web scraping is a useful method to extract information from websites when there’s no API available, or simply for fun—like grabbing random content from the web.
For my next post, I needed data about Pokémon from the first Game Boy game. One of the best sources of this information is Pokémon Database, but since it doesn’t offer an API (or at least I couldn’t find one), I used web scraping to get the data I needed.
Since this year marks Pokémon’s 20th Anniversary, I thought some of you might find this interesting—or at least entertaining! 😃
Nokogiri's own documentation covers the library in full, but in essence, I’m doing two main things:
- Fetching HTML content from a URL using open-uri.
- Searching for specific elements with CSS selectors using the css and at_css methods in Nokogiri.
The Script
Here’s a simple script to scrape Pokémon data:
#!/usr/bin/env ruby
require 'nokogiri'
require 'open-uri'

base_url = "http://pokemondb.net"
url_index = "#{base_url}/pokedex/game/firered-leafgreen"

# URI.open is needed on Ruby 3.0+, where Kernel#open no longer accepts URLs.
index = Nokogiri::HTML(URI.open(url_index))

# Each .infocard-tall element on the index page represents one Pokémon.
index.css(".infocard-tall").each do |item|
  begin
    name = item.at_css(".ent-name").text
    puts "Fetching #{name} info..."
    url_detail = "#{base_url}#{item.at_css(".ent-name")[:href]}"

    # Start from nil so the ensure block can still print partial results.
    number = kind = species = height = weight = abilities = nil
    pokemon_detail = Nokogiri::HTML(URI.open(url_detail))

    # Each value lives in the first <td> of the matching .vitals-table row.
    number = pokemon_detail.at_css(".vitals-table tr:contains('National')").at_css("td").text
    kind = pokemon_detail.at_css(".vitals-table tr:contains('Type')").at_css("td").text.split(" ").join(", ")
    species = pokemon_detail.at_css(".vitals-table tr:contains('Species')").at_css("td").text
    height = pokemon_detail.at_css(".vitals-table tr:contains('Height')").at_css("td").text
    weight = pokemon_detail.at_css(".vitals-table tr:contains('Weight')").at_css("td").text
    # Take the text here too; interpolating the node itself would print raw HTML.
    abilities = pokemon_detail.at_css(".vitals-table tr:contains('Abilities')").at_css("td").text
  rescue
    puts "Something went wrong with #{name} :("
  ensure
    puts "Pokemon info"
    puts "Name: #{name}, number: #{number}, kind: #{kind}, species: #{species}, height: #{height}, weight: #{weight}, abilities: #{abilities}"
  end
end
How to Use It
You can tweak this script to extract data from other Pokémon games by changing the index URL (the url_index variable) and adjusting the CSS selectors to find the information you need.
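As a rough sketch of that tweak, the game-specific part of the address can be passed in as a parameter. The index_url method name is hypothetical, and the only slug known to work from this post is firered-leafgreen; any other slug would have to be checked against the site's own URLs.

```ruby
# Hypothetical helper: builds the Pokédex index URL for a given game slug.
# The path layout is taken from the script above; other slugs are unverified.
def index_url(game_slug, base_url = "http://pokemondb.net")
  "#{base_url}/pokedex/game/#{game_slug}"
end

puts index_url("firered-leafgreen")
```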
To run the script:
- Save it as a .rb file.
- Make it executable: chmod +x script.rb
- Run it: ./script.rb
Where Can I Get the Code?
You can find the full script on my GitHub Gist. Feel free to copy, modify, and experiment with it! 😃