SQL Querying for Beginners: INNER and RIGHT JOINs

Photo by Sunder Muthukumaran on Unsplash. Thanks, Sunder!IntroductionJOINs are one of the most common operations used in SQL among data scientists and data analysts. They’re also among the most devastating when misapplied. An incorrect JOIN can cause m…


This content originally appeared on Level Up Coding - Medium and was authored by Dakota Smith

Photo by Sunder Muthukumaran on Unsplash. Thanks, Sunder!

Introduction

JOINs are one of the most common operations used in SQL among data scientists and data analysts. They’re also among the most devastating when misapplied. An incorrect JOIN can cause massive duplication, loss of relevant records, and just plain wrong data.

In the first essay of this series, we discussed LEFT JOINs. A quick recap of the fundamentals:

  1. JOINs are how you combine two or more tables into one.
  2. JOINs work by identifying what tables have in common with each other. They do this using primary and foreign keys.
  3. A primary key is the unique identifier for each record in a table. No two records can have the same primary key value. A primary key can’t be NULL.
  4. A foreign key in one table is the primary key in some other table. Foreign keys aren’t restrained by the same limitations as primary keys (e.g., duplicate and NULL values).
  5. A LEFT JOIN brings data from a table “on the right” into a table “on the left”. All records from the left table remain. Records from the right table are brought over only when a match is found on the keys.

The above bullet points aren’t meant to be comprehensive definitions, but a reminder of the concepts explored in the previous essay. If they don’t make sense, go ahead and read that first—it’s not long and there are examples to aid in the explanations!

INNER JOINs

Most of the time when using JOINs, you’re going to use either a LEFT JOIN or an INNER JOIN. Though they both perform similar tasks—combining data from different tables—their results are slightly different.

Where a LEFT JOIN brings over data from the “right” table, leaving behind any records that didn’t find a match in the left table according to the keys, an INNER JOIN leaves behind the records from both tables that didn’t find a match.

Let’s use the same tables from the example in the previous essay: employees and offices.

  • The employees table contains data on five different employees at a fictional company.
  • The offices table contains data on the fictional company’s four offices.

And the code used to LEFT JOIN these tables together:

SELECT *
FROM employees
LEFT JOIN offices
ON employees.office_code = offices.code
;

Since the order of the tables determines whether a table is on the left or the right, the above code states that the employees table is the “left” table. When we LEFT JOIN these tables, every record from the employees table stays. Data from the offices table, on the other hand, is brought over wherever offices.code = employees.office_code.

The result is a list of all employees plus the available information on the offices each person works at:

Now let’s look at the tables pre-JOIN. If you look closely, you’ll notice that no employee has an office_code = 4. This code corresponds to the Chicago office, but because there was no match on that key, the result of LEFT JOIN didn’t bring over any data related to the Chicago office.

You might notice as well in the employees table that the office_code field for Paola Ferrante is empty (or NULL). However, her record is still in the table resulting from our LEFT JOIN because that data came from the left table.

In an INNER JOIN, neither table gets such privilege. There must be a match between the keys for a record to be returned in an INNER JOIN.

In our example, the keys are employees.office_code and offices.code. If an employee is missing an office code, they won’t be in the final result. Similarly, if an office code is missing in the employees table—such as in the Chicago office’s case—then data from that office won’t be in the final result either.

The code for an INNER JOIN is similar to a LEFT JOIN. The difference is only one word:

SELECT *
FROM employees
INNER JOIN offices
ON employees.office_code = offices.code
;

The resulting table, for the reasons mentioned above, doesn’t include Paola Ferrante’s record and also contains no data on the Chicago office:

Additionally, with INNER JOINs, the order of the tables doesn’t matter. The result is exclusive to records that match on the keys, so “right” and “left” don’t make a difference. The only difference—which admittedly at a glance looks significant—is that the order of the columns are swapped:

But the data is otherwise the same.

A brief note before moving on:

In the offices table, there’s no value in the phone_number field for the Seattle office (code = 2). This missing value (otherwise known as a NULL value¹) didn’t have an effect on the JOIN. The ON statement, which defined the keys, didn’t tell the JOIN to consider offices.phone_number. It only said to find a match on offices.code = employees.offices_code, which it did.

When to use an INNER JOIN over a LEFT JOIN?

The underlying schema behind our employees and offices table lends itself logically to LEFT JOINs over INNER JOINs. This is because the offices table is essentially a details table (otherwise sometimes known as a dimension table). The primary use of the offices table would be to bring over details (other data) related to employees at the company.

So when would you use an INNER JOIN over a LEFT JOIN?

Generally, the answer is: Any time you only want records that match on the designated keys. But, to me, this begs the question: Why would you want that?

I’m going to give just one example, but theoretically there are an infinite number of answers to this question. For now, it’s less important that you memorize as many as you can and more important that you understand why just one of them makes sense.

Imagine that our employees and offices tables are much bigger. Instead of 5 employees, there are 5,000. And instead of 4 offices, there are 400—some open and running, some not, and some that are in the works, so to speak. Maybe management knows they’re going to open an office in Dubai, but are still deciding between a few different places.

Imagine as well that there are hundreds of employees, who, like Paolo Ferrante, don’t have an office_code. They work from home.

What if you were asked for a list of employees who are currently working in one of the offices? You couldn’t LEFT JOIN offices onto employees—that would give you a list of all employees regardless of where they worked. (You could introduce a WHERE statement that filters out the work-from-home employees, but then you’re adding an extra line. Brevity is clarity in code.)

Similarly, you couldn’t LEFT JOIN employees onto offices. This would necessarily return all offices, including the ones that aren’t currently in operation.

The simplest solution would be to use an INNER JOIN. By doing so, you eliminate any employees that don’t have an office_code and any offices that don’t have employees in them.

We did exactly this in our INNER JOIN earlier. Paola Ferrante, who presumably works from home, wasn’t included in the result. And the Chicago office—which, according to our data, might well be an empty room with a working telephone somewhere—was also left out.

RIGHT JOINs

The controversial JOIN

I wasn’t sure if I was going to write about RIGHT JOINs. They can be controversial, and because of this, tend to be confusing. But I think they’re worth mentioning.

For one, RIGHT JOINs aren’t any more conceptually difficult than LEFT JOINs. Second, it’s important to understand that like any other language, SQL has standard conventions that are (most often reasonably) treated as law.

A RIGHT JOIN is essentially the mirror image of a LEFT JOIN. If a LEFT JOIN brings data into the table on the left—while retaining all records from that table—a RIGHT JOIN brings data into the table on the right while retaining all records from that table.

(It’s important to remember that which table is “on the left” vs. “on the right” is determined by the order of the JOIN. As a reminder: the first table, immediately following FROM in your query, is the left table. The second table, which follows JOIN, is the right table.)

Let’s look at a RIGHT JOIN using our employees and offices tables. Note that in the code, I’m swapping the order of the tables so that employees is on the right. Like before, I want the records of all employees to be in the result.

SELECT *
FROM offices
RIGHT JOIN employees
ON employees.office_code = offices.code
;

At a glance, this table might look like something we haven’t seen yet. But if I were to simply change the order of the columns in the SELECT statement (without doing anything to the JOINs), it starts looking a lot more familiar:

SELECT employees.*, offices.*
FROM offices
RIGHT JOIN employees
ON employees.office_code = offices.code
;

It’s the same result as the table created by our original LEFT JOIN. Any LEFT JOIN can be rewritten to be a RIGHT JOIN, and vice versa. The order of the tables (and the columns if you want it to look exactly the same) need only be reversed.

So…should I use LEFT JOINs or RIGHT JOINs?

The short and somewhat unsatisfactory answer is: It depends. In the English-speaking world, company best practices will usually dictate that you use LEFT JOINs over RIGHT for readability’s sake. (The exception being long queries involving numerous JOINs, in which case a quick and easy RIGHT JOIN might be preferable to rewriting the entire FROM statement in your code.)

Part of the reason for this is the way we think. English speakers, for example, tend to process information from left to right. As an English speaker, I much preferred the LEFT JOIN of our employees and offices table to the mirror-imaged RIGHT JOIN, because in the latter I had to switch the order of the columns to make sense of what I was seeing—because I process from left to right.

The ubiquity of this preference (again, specifically in the English-speaking world) has led to the belief that RIGHT JOINs must be bad practice, or at best messy syntax, when in fact they’re functionally no different from LEFT JOINs. There is an argument to be made for readability, however—if your code is unnecessarily confusing to your co-workers or teammates, there’s a higher chance that something will be misunderstood and potentially misapplied.

My opinion? Know your audience. If you’re unsure but want to go ahead and use a RIGHT JOIN anyway, consider adding a comment in your code that points it out and explains your reasoning.

Photo by RUT MIIT on Unsplash

LEFT JOINs and INNER JOINs are the two most common JOINs you’ll use when querying as a data analyst or scientist. I hope this essay and the last have been helpful in better understanding how they work.

Shoutout to Jeanique Rahmlow for the editorial help, and to the Dataxp Discord community for supporting the development of enthusiastic data professionals.

Have a suggestion for what SQL topic I should explain next? Leave it in the comments! And as always, feel free to connect with me on LinkedIn.

¹In SQL, you’d actually see the word NULL in the output. As SQL counts spaces as characters, a value that looks blank most likely has a space in it. NULL values, then, should be thought of as truly empty.


SQL Querying for Beginners: INNER and RIGHT JOINs was originally published in Level Up Coding on Medium, where people are continuing the conversation by highlighting and responding to this story.


This content originally appeared on Level Up Coding - Medium and was authored by Dakota Smith


Print Share Comment Cite Upload Translate Updates
APA

Dakota Smith | Sciencx (2022-04-18T01:45:20+00:00) SQL Querying for Beginners: INNER and RIGHT JOINs. Retrieved from https://www.scien.cx/2022/04/18/sql-querying-for-beginners-inner-and-right-joins/

MLA
" » SQL Querying for Beginners: INNER and RIGHT JOINs." Dakota Smith | Sciencx - Monday April 18, 2022, https://www.scien.cx/2022/04/18/sql-querying-for-beginners-inner-and-right-joins/
HARVARD
Dakota Smith | Sciencx Monday April 18, 2022 » SQL Querying for Beginners: INNER and RIGHT JOINs., viewed ,<https://www.scien.cx/2022/04/18/sql-querying-for-beginners-inner-and-right-joins/>
VANCOUVER
Dakota Smith | Sciencx - » SQL Querying for Beginners: INNER and RIGHT JOINs. [Internet]. [Accessed ]. Available from: https://www.scien.cx/2022/04/18/sql-querying-for-beginners-inner-and-right-joins/
CHICAGO
" » SQL Querying for Beginners: INNER and RIGHT JOINs." Dakota Smith | Sciencx - Accessed . https://www.scien.cx/2022/04/18/sql-querying-for-beginners-inner-and-right-joins/
IEEE
" » SQL Querying for Beginners: INNER and RIGHT JOINs." Dakota Smith | Sciencx [Online]. Available: https://www.scien.cx/2022/04/18/sql-querying-for-beginners-inner-and-right-joins/. [Accessed: ]
rf:citation
» SQL Querying for Beginners: INNER and RIGHT JOINs | Dakota Smith | Sciencx | https://www.scien.cx/2022/04/18/sql-querying-for-beginners-inner-and-right-joins/ |

Please log in to upload a file.




There are no updates yet.
Click the Upload button above to add an update.

You must be logged in to translate posts. Please log in or register.