This is part 2 of this series. Go here for part 1.

Requirements

In the previous post, I mentioned that I want to create this API using Go, so I’ll need a web framework, and since I’m planning to return JSON when calls are made, I’ll need a NoSQL database solution.

Also, since I don’t want to write out all the stats and information for all of WOIN’s data myself, I’ll need to get it from somewhere.

Some Quick Background on Me

I’m a hobbyist-level programmer at best. I’ve mainly learned what I needed, on the fly mind you, out of interest or to solve a specific problem for work. If you look at my code and wonder, “why would he do this?” you now know why. Also, my knowledge of Go when starting this project consisted of completing the Go track on Code Academy and writing a small app to text me if the freezer in my garage loses power. Nothing fancy, and nothing overly complex.

The Database Problem

The problem is I don’t know much about databases beyond the basics: the difference between relational databases and NoSQL, what you’d store in each, and some basic querying. A DBA I am not, and the idea of setting one up locally just to get this off the ground made me sad. So off to the interwebs I went, hoping to find a solution. I looked at the managed service offerings from Couchbase and MongoDB, and while it looks like Couchbase has some better features at scale, Mongo has a free-forever tier, and, well, pricing wins out here. Setup was easy, and their desktop UI makes it easy to create collections and query them.

Setup also allows you to choose which cloud provider you want your Mongo instance in. I chose AWS because once the API is working, I plan to build and run it on an AWS Lightsail instance. Why Lightsail? It’s a VPS in the same region as my database, and it comes with some decent features and a friendly price tag of $3.50/month, with the first 3 months free.

Sure, I could use one of the Raspberry Pis I have lying around, but I like the idea of the API and DB running in the same AWS region. It also means I don’t have to maintain the Pi, open ports for incoming traffic at home, or deal with power outages, internet outages, etc., and $3.50/month is a budget-friendly price point.
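
With the database and the language both settled, here’s a rough sketch of what connecting to the managed Mongo cluster from Go looks like with the official mongo-driver package. This is just for orientation, not code from the project yet; the connection string and the “woin”/“origins” names are placeholders of my own.

package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
	// Placeholder connection string; the managed service generates the real one for your cluster.
	uri := "mongodb+srv://user:password@cluster.example.mongodb.net"

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	client, err := mongo.Connect(ctx, options.Client().ApplyURI(uri))
	if err != nil {
		log.Fatal(err)
	}
	defer client.Disconnect(ctx)

	// Ping to confirm the connection works before doing anything else.
	if err := client.Ping(ctx, nil); err != nil {
		log.Fatal(err)
	}

	// "woin" and "origins" are placeholder database and collection names.
	coll := client.Database("woin").Collection("origins")
	fmt.Println("connected, got a handle to collection:", coll.Name())
}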

Sourcing WOIN Data

To my knowledge, there is no preexisting API, nor are there lists of this data available as files in some format somewhere. I searched the interwebs, and even the official EN Publishing site and associated forums, and managed to find some web pages with tables of data relating to species, origins, careers, career/racial exploits, and universal exploits.

They are just HTML tables on a page, though, and trying to copypasta these will get you data, but it’s not pretty. Take the origins page, for example. Certain fields are nil rather than having a 0 value, and there are rows in the first column resembling ***foo*** that serve as headers for the following row(s), which could use some cleanup as well. Not the end of the world, and it can likely be fixed in bulk with a find and replace. So, how do we get the data?

Well, I know enough about Python to know it can do web scraping, so naturally I googled and settled on Selenium. Then I spent some time RTFMing and figured I could make it happen pretty easily. Now I only needed a webdriver and some Python code.

I scraped these into CSV files because they’re easy to edit (that find/replace I mentioned earlier), and CSV is one of the formats you can import into MongoDB.

from selenium import webdriver
import csv

# Selenium 3-style setup; point executable_path at your local chromedriver
driver = webdriver.Chrome(executable_path='/path/to/chromedriver')
driver.implicitly_wait(10)  # wait up to 10 seconds for elements to appear
driver.maximize_window()    # make sure the whole table renders

urls = ['']  # one page of tables per URL

for url in urls:
    driver.get(url)

    # Scrape data: grab every table row and split its text into fields
    t_rows = driver.find_elements_by_tag_name('tr')  # 'tr' is HTML for table row
    row_data = [row.text for row in t_rows]
    row_splits = [text.split() for text in row_data]

    # Append the rows to a CSV (I ran this per page, writing each page to its own file)
    with open('foo.csv', 'a', newline='') as f:
        line_writer = csv.writer(f)
        line_writer.writerows(row_splits)

driver.quit()

Each page was scraped into its own file and uploaded to Google Drive. Then began the longest and most boring part of this project: cleaning up the data. The find and replace wasn’t too hard; it was the text formatting in some cells, where text would wrap unnecessarily, that needed to be cleaned by hand.
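
I did those bulk fixes with find and replace in an editor, but for anyone who’d rather script it, a quick pass with Go’s encoding/csv along these lines would cover the nil-to-0 swap and the ***foo*** header rows mentioned earlier. The file names here are made up, and dropping the header rows outright is just one way to handle them.

package main

import (
	"encoding/csv"
	"log"
	"os"
	"strings"
)

func main() {
	// Hypothetical file names standing in for one scraped page and its cleaned copy.
	in, err := os.Open("origins_raw.csv")
	if err != nil {
		log.Fatal(err)
	}
	defer in.Close()

	out, err := os.Create("origins_clean.csv")
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()

	reader := csv.NewReader(in)
	reader.FieldsPerRecord = -1 // the scraped rows don't all have the same width
	writer := csv.NewWriter(out)
	defer writer.Flush()

	records, err := reader.ReadAll()
	if err != nil {
		log.Fatal(err)
	}

	for _, row := range records {
		// Skip the ***foo*** rows that act as headers for the rows below them.
		if len(row) > 0 && strings.HasPrefix(row[0], "***") {
			continue
		}
		// Swap nil cells for a 0 value.
		for i, cell := range row {
			if cell == "nil" {
				row[i] = "0"
			}
		}
		if err := writer.Write(row); err != nil {
			log.Fatal(err)
		}
	}
}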

Some things, like careers or exploits, also have prerequisites listed in the rule books that aren’t included in these tables. I added those where I could, as I don’t have all the rule books, and in the case of Judge Dredd and the Worlds of 2000 A.D., it’s no longer available. I’m hoping to buy what I’m missing soon and, for the books that are no longer sold, hopefully track down the missing information some other way.

With the data now scraped, cleaned, and formatted, I simply created a collection for each in Mongo and imported them. Easy peasy.

Picking a Go Web Framework

When I originally thought of this project, I planned to write it in Python, as that’s what I have the most experience with, and I at least knew of Flask and that it would work fine here.

I didn’t have that existing knowledge with Go, so off to Google I went, and lo and behold, there’s a simple tutorial on the official Go website on making a RESTful API with Go and Gin.

The tutorial is easy to follow and nicely covers the C and R of a CRUD app, but just like with my DB choice, I wanted to see what else might be out there in terms of a web framework.
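
To give you an idea, here’s a stripped-down sketch in the spirit of that tutorial, with its example data swapped for a made-up Origin type. The /origins route and the struct fields are placeholders of my own, not the real schema.

package main

import (
	"net/http"

	"github.com/gin-gonic/gin"
)

// Origin is a placeholder struct standing in for one of the WOIN record types.
type Origin struct {
	Name        string `json:"name"`
	Description string `json:"description"`
}

// In-memory stand-in for the Mongo collection while sketching the routes.
var origins = []Origin{{Name: "placeholder", Description: "example record"}}

func main() {
	router := gin.Default()

	// R: return the full list as JSON.
	router.GET("/origins", func(c *gin.Context) {
		c.IndentedJSON(http.StatusOK, origins)
	})

	// C: bind a JSON body and append it to the list.
	router.POST("/origins", func(c *gin.Context) {
		var newOrigin Origin
		if err := c.BindJSON(&newOrigin); err != nil {
			return
		}
		origins = append(origins, newOrigin)
		c.IndentedJSON(http.StatusCreated, newOrigin)
	})

	router.Run("localhost:8080")
}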

I ended up finding Echo, which has some nice features I’d like to implement in the future, and it boasts some nice benchmarks when compared to Gin. Its docs are nicely laid out and have some great examples too, including one for CRUD.
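
For comparison, here’s roughly what the read route from the Gin sketch above looks like in Echo; again, the route and the Origin type are placeholders. The most visible difference is that Echo handlers return an error, which its centralized error handling turns into a response.

package main

import (
	"net/http"

	"github.com/labstack/echo/v4"
)

// Origin is the same placeholder struct used in the Gin sketch.
type Origin struct {
	Name        string `json:"name"`
	Description string `json:"description"`
}

var origins = []Origin{{Name: "placeholder", Description: "example record"}}

func main() {
	e := echo.New()

	// R: Echo handlers return an error instead of writing it themselves.
	e.GET("/origins", func(c echo.Context) error {
		return c.JSON(http.StatusOK, origins)
	})

	e.Logger.Fatal(e.Start(":8080"))
}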

Conclusion

I got my data, my DB, and my web framework. In the next installment, I’ll start building the API. See you then.