Do a Full Outer Join in ArcGIS
Typically when joining tables in ArcGIS, you either do a left join or an inner join. A left join is when the result of joining Table A to Table B is all rows from Table A, and Table B where it joined, or “Keep all records” in the image below. An inner join is when the results are only rows where Table A and B match; this would be the “Keep only matching records” setting in the image below. But there are a few other types of joins, that many SQL and database people are familiar with, but many GIS folks may not be familiar with, namely right joins and outer joins. I’m going to walk you through doing a full outer join in ArcGIS.
A right join is when you join Table A to Table B, and the results are all rows in Table B and where Table A joined to Table B. Technically you can do a right join in ArcMap, you just have to reverse the tables so it becomes a left join.
A full outer join is when you merge Table A and Table B, and the results include all rows from both tables, but where Table A and B match, they are joined.
Here’s a SQL join diagram to illustrate:
In ArcMap you can do a full outer join, but it’s not straightforward. Here are the steps:
First, take your 2 tables and consider the columns you want to be included in the final product, because this process will merge the field names. Turn off excess fields, and recalculate fields or change data types if necessary.
Let’s do an example to illustrate this process. Say we are joining a group of people’s locations from YearA and YearB together. Some names are the same in both years, but some are not, but we want them altogether in one point feature. If each table has similar fields but slight differences, standardize them first. Let’s use an Age field for example. Say in YearA Age is a string, but in YearB it was changed to Long Integer. When we go through the following process, these two fields won’t merge into one because their data types aren’t the same, so if you want Age to be included, you will need to recalculate one year so that it matches the other. Once you’re happy with your data, go to the next step.
Start by joining YearA to YearB by the person’s name field. The result of that join will give you all records in YearB, and the records in YearA that match YearB’s name. That means you’re still missing the rest of YearA. Take the YearB with the YearA join on it, and export this as a new feature. Let’s call it YearB_YearA_joined_only.
That new feature currently has all of the columns from both tables, which might be a little excessive. Go to its layer properties, and turn off the duplicate fields.
Now you need to find all records in YearA that didn’t match to YearB. You can do this by taking the YearB_YearA_joined_only feature, join it to YearA, and select where YearA did not join.
I would do this in the Select by Attributes window after the join by selecting where the YearB_YearA_joined_only feature’s name Is Null. Ensure the result of that query is giving you the rows you need. Once it is, leave those rows selected, and remove the join. Now you should just have YearA rows, and rows where they are not joined to YearB selected. Export the selected rows as a new feature, call it something like YearA_nomatch.
Use the Append tool. In the Append tool options, select the YearA_nomatch feature, to append to the YearB_YearA_joined_only feature as the target layer. In the next option between NO_TEST and TEST, choose NO_TEST. See the Append image after #5 for more context. This will allow you to use the field map window below to match up fields/columns, giving you more control of the process. If they don’t match, the columns won’t be appended. See ESRI’s definition below for more detail:
Schema Type (optional)
Specifies if the schema (field definitions) of the input datasets must match the schema of the target dataset in order for data to be appended.
TEST —Input dataset schema (field definitions) must match the schema of the target dataset. An error will be returned if the schemas do not match.
NO_TEST —Input dataset schema (field definitions) do not have to match that of the target dataset. Any fields from the input datasets that do not match the fields of the target dataset will not be mapped to the target dataset unless the mapping is explicitly set in the Field Map control.
Click the plus sign to see which columns from the YearA_nomatch feature will be matched to the target layer (YearB_YearA_joined_only). This is why I suggested standardizing fields and data types first, so that it’s easier to use the field map. Now run the Append tool. It is important to remember you are appending data to an existing feature, not creating a new feature! Whatever you do is permanently appended to the target layer. If you think you might screw something up in this process, you may want to make a copy of YearB_YearA_joined_only first, and use Append on that.
Check out your results. Note that if the resulting data had different ways of formatting the same information, like if there was a Race column and YearA used “African-American” but YearB used “Black”, a simple field recalculation could fix results like those if necessary.
At this point, you should have your spatial full outer join. All the features will be joined together, whether the row is joined between both years or otherwise between the two input features.
The reason I wrote about this was because I learned about SQL and SAS joining in my statistical computing course, and then when I was presented with a full outer join to do in GIS, I realized it wasn’t as simple as SQL or SAS. I also realized there weren’t a lot of resources on this issue throughout the handful of big GIS forums, and at least now I’ve given you a way that will work.
If you know a better way, or have more pointers on this issue, I would love to hear from you. Thanks and I hope this might help someone!