Importing JSON Data

Data is spread across the web in different formats and an analyst needs to parse this data to build a local datastore. JSON is one of the most popular data exchange formats on the web today and many API services will return data in this format. Hence, it is important to know how to read and process JSON data.

In this article, we will begin putting together a Fantasy Soccer team for the English Premier League. The inspiration comes from this article by Bill Mill, who has analyzed the data using Python. We will be analyzing the dataset in R.

Quick View

This is a data sample of player data from the Fantasy League. This is the raw data we will be using for our analysis.

Setting Up An R JSON Environment

Using CRAN packages is one of the best ways to read JSON in R. Here, we will install the jsonlite package.

install.packages("jsonlite")

Once installed, we load the package, which attaches it to the search path so that its functions can be called directly.

library(jsonlite)

Reading JSON

The jsonlite package provides the fromJSON function, which parses JSON from a string, a file, or a URL. We will use it to read the JSON returned by the web API.

url <- "http://fantasy.premierleague.com/web/api/elements/"
names(fromJSON(paste0(url,1))) # Concatenate URL and player id to fetch player data

This code reads the JSON data and prints the names of its fields, as shown:

[Figure: Column names from API read call]
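
If the endpoint is unreachable, the same call can be tried offline, because fromJSON also accepts a literal JSON string. The following is a minimal sketch: the field names are taken from the list used later in this article, but the values are invented for illustration.

```r
library(jsonlite)

# A hand-written record standing in for one API response (values are made up)
sample_json <- '{"id": 1, "first_name": "Example", "second_name": "Player", "total_points": 42}'

player <- fromJSON(sample_json)  # a JSON object becomes a named list
field_names <- names(player)     # the same names() call used on the API response
print(field_names)
```

The real response would simply have more fields than this four-field stand-in.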

TIP: paste0 is shorthand for paste with sep = "". It concatenates strings with no separator, whereas paste inserts a single space by default.

url <- "http://fantasy.premierleague.com/web/api/elements/"

## Both paste commands produce same output
paste(url, 1, sep = "")
paste0(url, 1)
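
It may also help to see why the separator matters and how paste0 handles several ids at once. This is a small sketch using the same url value; the names with_space and player_urls are introduced here only for illustration.

```r
url <- "http://fantasy.premierleague.com/web/api/elements/"

# paste() inserts a space by default, which would break the URL
with_space <- paste(url, 1)

# paste0() is vectorized: one call builds a URL for each player id
player_urls <- paste0(url, 1:3)
print(player_urls)
```

Because paste0 is vectorized, a single call can prepare every URL needed before any request is made.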

Each player record in this dataset has 59 fields. To see the record as indented, human-readable JSON, run the following:

url <- "http://fantasy.premierleague.com/web/api/elements/"
toJSON(fromJSON(paste0(url, 1)), pretty = TRUE)
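
The round trip between fromJSON and toJSON can be checked without the API. The sketch below uses a made-up record, and it adds the auto_unbox argument, which is not used in the article's own call: by default toJSON writes every R vector as a JSON array, so a single value such as 1 would appear as [1].

```r
library(jsonlite)

# Made-up record; a real API response has many more fields
player <- fromJSON('{"id": 1, "second_name": "Player", "total_points": 42}')

# auto_unbox = TRUE writes length-one vectors as scalars (1 rather than [1])
pretty_json <- toJSON(player, pretty = TRUE, auto_unbox = TRUE)
print(pretty_json)

# Parsing the pretty-printed text gives back the same list
round_trip <- fromJSON(pretty_json)
```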

We’ve identified that there are 567 players in the player database. Now, we want to build a local R dataset using this API. To do this, we run the following code:

## List of relevant fields we are interested in

relevantFields <- c("points_per_game","total_points","type_name",
 "team_name","team_code","team_id",
 "id","status","first_name","second_name",
 "now_cost","value_form","team",
 "ep_next","minutes","goals_scored",
 "assists","clean_sheets","goals_conceded",
 "own_goals","penalties_saved","penalties_missed",
 "yellow_cards","red_cards","saves",
 "bonus","bps","ea_index",
 "value_season","selected_by")

# Fetch one player's record and keep only the relevant fields.
# Returns NULL if the request fails, so that failed ids can be skipped.
fetchData <- function(i) {
  jsondata <- tryCatch(fromJSON(paste0(url, i)), error = function(e) NULL)
  if (is.null(jsondata)) return(NULL)
  jsondata[names(jsondata) %in% relevantFields]
}

allplayerdata <- lapply(1:567, fetchData)
allplayerdata <- Filter(Negate(is.null), allplayerdata) # drop failed requests

# Convert each player's list to a one-row data frame and stack the rows
allplayerdata <- do.call(rbind, lapply(allplayerdata,
                                       data.frame,
                                       stringsAsFactors = FALSE))

This code fetches each player's record from the web API, keeps only the relevant fields, and then stacks the per-player rows into the data frame allplayerdata. Requests that fail are skipped. If you wish to track the code's performance, you can save all the code in an R script (extension: .R) and wrap the fetch in system.time() to see how long it takes.
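
The fetch, filter, and combine steps can be rehearsed offline with system.time(). The sketch below assumes three hypothetical JSON strings in place of API responses (one deliberately invalid) and a shortened field list; parseOne, responses, and keep are names introduced only for this illustration.

```r
library(jsonlite)

# Hypothetical stand-ins for three API responses (values are made up)
responses <- c(
  '{"id": 1, "second_name": "Alpha", "total_points": 10, "photo": "a.jpg"}',
  '{"id": 2, "second_name": "Beta", "total_points": 25, "photo": "b.jpg"}',
  'not valid json'
)
keep <- c("id", "second_name", "total_points")

# Parse one response and keep only the wanted fields; NULL on failure
parseOne <- function(txt) {
  rec <- tryCatch(fromJSON(txt), error = function(e) NULL)
  if (is.null(rec)) return(NULL)
  rec[names(rec) %in% keep]
}

# system.time() evaluates the block and reports how long it took
timing <- system.time({
  rows <- Filter(Negate(is.null), lapply(responses, parseOne))
  players <- do.call(rbind, lapply(rows, data.frame, stringsAsFactors = FALSE))
})
print(players)
print(timing["elapsed"])
```

The invalid third response is dropped rather than stopping the run, which is the behaviour you would want when some player ids no longer resolve.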

In the next articles in this series, we will cover data reshaping, visualizations and linear optimization modeling.

Key Takeaways

  1. JSON is a popular data exchange format on the web.
  2. We used the fromJSON function in the jsonlite package to read JSON data.
  3. paste0 concatenates strings with no separator (sep = "").
  4. The rbind function, called through do.call, stacks the per-player rows into a single dataframe.

2 Comments on "Importing JSON Data"

jack10063
Guest
2 years 8 months ago

Nice Post! Can I somehow use this API to get data for mini-league teams as well?

Dr. Duru
Guest
18 days 4 hours ago

Looks like the JSON data no longer exists. I also had to install a package named “curl”.
