Last week we made a fairly quiet (too quiet, in fact) announcement of our plan to slowly and carefully deprecate the path-based access model that is used to specify the address of an object in an S3 bucket. I spent some time talking to the S3 team to get a better understanding of the situation before writing this blog post. Here’s what I learned…
We launched S3 in early 2006. Jeff Bezos’ original spec for S3 was very succinct – he wanted malloc (a key memory allocation function for C programs) for the Internet. From that starting point, S3 has grown to the point where it now stores many trillions of objects and processes millions of requests per second for them. Over the intervening 13 years, we have added many new storage options, features, and security controls to S3.
Old vs. New
S3 currently supports two different addressing models: path-style and virtual-hosted style. Let’s take a quick look at each one. The path-style model looks like either this (the global S3 endpoint):
Or this (one of the regional S3 endpoints):
In this example, jeffbarr-public are bucket names; /jeffbarr-public/classic_amazon_door_desk.png are object keys.
Even though the objects are owned by distinct AWS accounts and are in different S3 buckets (and possibly in distinct AWS regions), both of them are in the DNS subdomain s3.amazonaws.com. Hold that thought while we look at the equivalent virtual-hosted style references (although you might think of these as “new,” they have been around since at least 2010):
These URLs reference the same objects, but the objects are now in distinct DNS subdomains (jeffbarr-public.s3.amazonaws.com, respectively). The difference is subtle, but very important. When you use a URL to reference an object, DNS resolution is used to map the subdomain name to an IP address. With the path-style model, the subdomain is always s3.amazonaws.com or one of the regional endpoints; with the virtual-hosted style, the subdomain is specific to the bucket. This additional degree of endpoint specificity is the key that opens the door to many important improvements to S3.
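To make the difference between the two styles concrete, here is a minimal Python sketch that moves the bucket name from the path into the hostname. It is only an illustration: the function name, the bucket example-bucket, and the key images/photo.png are made up for this example, and the code assumes that its input really is a path-style URL with the bucket as the first path segment.

```python
from urllib.parse import urlsplit, urlunsplit


def path_style_to_virtual_hosted(url: str) -> str:
    """Rewrite a path-style S3 URL so the bucket becomes part of the hostname.

    Assumes the input is path-style: the first path segment is the bucket
    name and everything after it is the object key.
    """
    parts = urlsplit(url)
    bucket, _, key = parts.path.lstrip("/").partition("/")
    if not bucket:
        raise ValueError("no bucket name found in path: " + url)
    # Prefix the endpoint with the bucket name, giving a bucket-specific subdomain.
    host = bucket + "." + parts.netloc
    return urlunsplit((parts.scheme, host, "/" + key, parts.query, parts.fragment))


# Hypothetical bucket and key, used purely for illustration.
print(path_style_to_virtual_hosted(
    "https://s3.amazonaws.com/example-bucket/images/photo.png"
))
```

The only thing that changes is where the bucket name lives. Once it is part of the hostname, DNS can resolve each bucket separately, which is the property the rest of this post relies on.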
Out with the Old
In response to feedback on the original deprecation plan that we announced last week, we are making an important change. Here’s the executive summary:
Original Plan – Support for the path-style model ends on September 30, 2020.
Revised Plan – Support for the path-style model continues for buckets created on or before September 30, 2020. Buckets created after that date must be referenced using the virtual-hosted model.
We are moving to virtual-hosted references for two reasons:
First, anticipating a world with billions of buckets homed in many dozens of regions, routing all incoming requests directly to a small set of endpoints makes less and less sense over time. DNS resolution, scaling, security, and traffic management (including DDoS protection) are more challenging with this centralized model. The virtual-hosted model reduces the area of impact (which we call the “blast radius” internally) when problems arise; this helps us to increase availability and performance.
Second, the team has a lot of powerful features in the works, many of which depend on the use of unique, virtual-hosted style subdomains. Moving to this model will allow you to benefit from these new features as soon as they are announced. For example, we are planning to deprecate some of the oldest security ciphers and versions (details to come later). The deprecation process is easier and smoother (for you and for us) if you are using virtual-hosted references.
In With the New
As just one example of what becomes possible when using virtual-hosted references, we are thinking about providing you with increased control over the security configuration (including ciphers and cipher versions) for each bucket. If you have ideas of your own, feel free to get in touch.
Here are some things to know about our plans:
Identifying Path-Style References – You can use S3 Access Logs (look for the Host Header field) and AWS CloudTrail Data Events (look for the host element of the requestParameters entry) to identify the applications that are making path-style requests.
Programmatic Access – If your application accesses S3 using one of the AWS SDKs, you don’t need to do anything other than ensure that your SDK is current. The SDKs already use virtual-hosted references to S3, unless the bucket name contains one or more “.” characters.
Bucket Names with Dots – It is important to note that bucket names with “.” characters are perfectly valid for website hosting and other use cases. However, there are some known issues with TLS and with SSL certificates. We are hard at work on a plan to support virtual-host requests to these buckets, and will share the details well ahead of September 30, 2020.
Non-Routable Names – Some characters that are valid in the path component of a URL are not valid as part of a domain name. Also, paths are case-sensitive, but domain and subdomain names are not. We’ve been enforcing more stringent rules for new bucket names since last year. If you have data in a bucket with a non-routable name and you want to switch to virtual-host requests, you can use the new S3 Batch Operations feature to move the data. However, if this is not a viable option, please reach out to AWS Developer Support.
Documentation – We are planning to update the S3 Documentation to encourage all developers to build applications that use virtual-host requests. The Virtual Hosting documentation is a good starting point.
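As a rough illustration of two of the checks described above, the Python sketch below classifies a Host value taken from your logs and tests whether a bucket name can be used as a subdomain. Both function names and both patterns are assumptions made for this example: the endpoint pattern covers only the global endpoint and regional endpoints written as s3.<region> or s3-<region>, and the naming check is a simplified stand-in (one lowercase DNS label of 3 to 63 characters, with no dots) rather than the authoritative S3 naming rules. Extracting the Host value from an access log or CloudTrail record is left out.

```python
import re

# Assumed endpoint shapes: s3.amazonaws.com, s3.<region>.amazonaws.com,
# and s3-<region>.amazonaws.com. Other endpoint forms are not covered.
_PATH_STYLE_HOST = re.compile(r"s3([.-][a-z0-9-]+)?\.amazonaws\.com")

# Simplified rule: a single lowercase DNS label, 3 to 63 characters, no dots.
_DNS_LABEL = re.compile(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")


def is_path_style_host(host: str) -> bool:
    """True if the Host value names a bare S3 endpoint with no bucket in it."""
    return _PATH_STYLE_HOST.fullmatch(host.lower()) is not None


def is_virtual_host_friendly(bucket: str) -> bool:
    """True if the bucket name works as a single subdomain label."""
    return _DNS_LABEL.fullmatch(bucket) is not None


# Hypothetical values, used purely for illustration.
for host in ("s3.amazonaws.com", "example-bucket.s3.amazonaws.com"):
    print(host, "path-style" if is_path_style_host(host) else "virtual-hosted")
for name in ("example-bucket", "my.dotted.bucket", "My_Bucket"):
    print(name, is_virtual_host_friendly(name))
```

A request whose Host value is a bare endpoint carries its bucket name in the path, so the application that sent it is a candidate for migration. A bucket name that fails the second check falls into the “dots” or “non-routable” cases described above.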
We’re Here to Help
The S3 team has been working with some of our customers to help them migrate, and the team is ready to work with many more.
Our goal is to make this deprecation smooth and uneventful, and we want to help minimize any costs you may incur! Please do not hesitate to reach out to us if you have questions, challenges, or concerns.
PS – Stay tuned for more information on tools and other resources.