
R - Split a String in a Data Frame Column and Keep a Piece as a New Variable
I’ve been having trouble figuring out where to begin with this data blog, so I think I’ll start with something pretty simple but ultimately very valuable - splitting a column of values in an R data frame and creating a new variable out of one piece of the split, for every row in your dataset. I use this all the time to create a new variable whose values are a subset of another variable. This might be a niche piece of code, but I looooooove it :)
Dataset:
Data Frame: NBA
Function
df$variable2 <- sapply(strsplit(as.character(df$variable1), " "),"[", 1)
Let’s break down the pieces to this nifty little trick, from the inside out:
as.character(df$variable1)
strsplit only works on character vectors, so we convert the variable that we are splitting to character first, in case it is not one already (a factor, for example).
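To see why the conversion matters, here is a minimal sketch. The `teams` factor is made up for illustration and is not the dataset from this post.

```r
# Hypothetical column stored as a factor (made up for illustration)
teams <- factor(c("Boston Celtics", "Chicago Bulls"))

# strsplit() rejects non-character input, so this call raises an error;
# tryCatch() turns that error into the string "error" so the script keeps going
result <- tryCatch(strsplit(teams, " "), error = function(e) "error")

# Converting to character first makes the split work
pieces <- strsplit(as.character(teams), " ")
```

If the column is already character, as.character() simply returns it unchanged, so it is harmless to leave in.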
strsplit(…, " ")
This splits each value in a variable on a delimiter, which is great. Say you have a variable1 with the value “Tyler is awesome”. Using the strsplit function (and splitting on a space “ ”), you would end up with “Tyler” “is” “awesome”. There’s nothing wrong with this, but strsplit returns a list with one element per value, and each element holds all of the pieces from that value. You can’t assign that straight to a data frame column, because for this one value you get three pieces instead of one. This isn’t what we are trying to do here - especially if you have a large data frame with lots of different values in variable1. We do still want to split the variable though, which is why this is an important piece of the trick.
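A quick sketch of what strsplit hands back, using the sentence from above (the variable names here are just for illustration):

```r
value <- "Tyler is awesome"

# strsplit() always returns a list, even for a single input string
pieces <- strsplit(value, " ")
# pieces[[1]] is the character vector "Tyler" "is" "awesome"

# With several inputs you get one list element per input
several <- strsplit(c("Tyler is awesome", "R is fun"), " ")
# several[[2]] is "R" "is" "fun"
```

So the split itself works; what is still missing is a way to pull one piece out of every list element.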
sapply(…(…),"[", 1)
This is where the magic happens. The sapply() function takes a list, vector or data frame as input, applies a function to each element, and returns a vector or matrix when the results can be simplified that way. The apply family in general is primarily used to avoid explicit loop constructs, which in our case is quite helpful, as we have many rows of data that we want to run the same function on.
The piece “[”, 1 is where we tell R to retain just one piece of the split: “[” is the FUN argument for sapply (R’s subsetting operator, used as a function) and the 1 is passed along to it as the index. The “1” tells R that we want the first piece of the split - we could change that to 2, 3, etc. depending on what we want to keep.
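Here is a small sketch of just the sapply step, run on a hand-made list standing in for the output of strsplit (the values are made up):

```r
# A list shaped like what strsplit() returns (made-up values)
pieces <- list(c("Tyler", "is", "awesome"), c("R", "rocks"))

# "[" with index 1 keeps the first piece of every element;
# this is equivalent to sapply(pieces, function(x) x[1])
first <- sapply(pieces, "[", 1)

# Asking for a piece that does not exist gives NA for that element
third <- sapply(pieces, "[", 3)
```

Here `first` is "Tyler" "R", and `third` is "awesome" NA because the second element only has two pieces.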
It’s probably best to see it in action though, as some of these intricate details can get complicated, even for me.
Example
Alright so based on the dataset above, let’s say we wanted to split the variable “NBA_Teams” and store the city that each team is from in a new column, called “Cities”. Here’s the code we would use to do that:
NBA$Cities <- sapply(strsplit(as.character(NBA$NBA_Teams), " "),"[", 1)
If we wanted to just keep the mascot portion of each team (let’s call that new variable “Mascot”), we would simply change the “1” to a “2” at the end of the function:
NBA$Mascot <- sapply(strsplit(as.character(NBA$NBA_Teams), " "),"[", 2)
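To try both lines end to end, here is a self-contained sketch. The three-row NBA data frame below is a stand-in I made up, assuming the real one has an NBA_Teams column of “City Mascot” strings.

```r
# Stand-in data frame (assumed shape: one "City Mascot" string per row)
NBA <- data.frame(
  NBA_Teams = c("Boston Celtics", "Chicago Bulls", "Miami Heat"),
  stringsAsFactors = FALSE
)

# Split once, then keep the first and second pieces as new columns
parts <- strsplit(as.character(NBA$NBA_Teams), " ")
NBA$Cities <- sapply(parts, "[", 1)
NBA$Mascot <- sapply(parts, "[", 2)

print(NBA)
```

One thing to watch for: this only lines up when every city is a single word. A value like “Los Angeles Lakers” would give “Los” as the first piece and “Angeles” as the second.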
So again, instead of just splitting a single value into smaller chunks, we can split an entire column of values based on any delimiter that we want (in the above example we split on a space, but we could split on the letter “t” if we wanted to). No for loops necessary!
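A few other delimiters, sketched with made-up values. One detail worth knowing is that strsplit treats the delimiter as a regular expression by default, so a character like “.” needs fixed = TRUE to be taken literally.

```r
# Splitting on a dash (made-up values)
codes <- c("BOS-Celtics", "CHI-Bulls")
abbrev <- sapply(strsplit(codes, "-"), "[", 1)

# Splitting on the letter "t", as mentioned above
before_t <- sapply(strsplit("Boston Celtics", "t"), "[", 1)

# "." is a regex wildcard, so use fixed = TRUE to split on a literal dot
files <- c("report.csv", "data.txt")
extension <- sapply(strsplit(files, ".", fixed = TRUE), "[", 2)
```

Here `abbrev` is "BOS" "CHI", `before_t` is "Bos", and `extension` is "csv" "txt".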
Thanks for reading!