Yelp S3 Access Logs: How Yelp Scaled S3 Logging Efficiently

Yelp S3 Access Logs: How the Company Scaled Object-Level Logging Without Exploding Costs

Yelp's S3 access logging pipeline shows how object-level logging can work at scale without runaway storage bills or slow analytics. Yelp published a detailed engineering blueprint explaining how its teams rebuilt Amazon S3 server access logging to operate efficiently at that scale.

As traffic grew, the original approach stopped working, so Yelp had to rethink how it stored, processed, and queried object-level access data across its infrastructure.

Why Yelp S3 Access Logs Became a Scaling Problem

Initially, Yelp relied on raw S3 server-access logs stored as plain text. Over time, however, traffic growth pushed the system beyond its limits.

As usage increased, engineers quickly ran into three major issues:

Log volume reached terabytes per day
Storage costs climbed sharply
Queries became slow and inefficient

As a result, object-level logging started to look impractical at scale.

How Yelp Rebuilt Its S3 Access Logs Pipeline

To address these problems, Yelp redesigned the entire logging pipeline.

The system still collects raw access logs from S3. However, instead of keeping them untouched, Yelp now compacts them on a regular schedule into fewer, larger files.

Most importantly, the pipeline converts plaintext logs into Parquet format, which significantly improves compression and query performance.

Because of this change, Yelp:

Reduced storage usage by around 85%
Decreased object count by more than 99.99%

As a result, engineers can query access data faster while spending far less on storage.
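
To make the plaintext-to-columnar step more concrete, here is a minimal sketch of the parsing half: turning one raw S3 server access log line into typed fields that a columnar writer could then store as Parquet. It covers only the leading fields of the log format, the sample line is made up, and the function names are illustrative rather than anything from Yelp's pipeline.

```python
import re
from datetime import datetime

# Leading fields of an S3 server access log record, in their documented order.
# Simplified: later fields (referrer, user agent, and so on) are ignored.
LOG_PATTERN = re.compile(
    r'(?P<bucket_owner>\S+) (?P<bucket>\S+) \[(?P<time>[^\]]+)\] '
    r'(?P<remote_ip>\S+) (?P<requester>\S+) (?P<request_id>\S+) '
    r'(?P<operation>\S+) (?P<key>\S+) (?:"(?P<request_uri>[^"]*)"|-) '
    r'(?P<http_status>\S+) (?P<error_code>\S+) (?P<bytes_sent>\S+)'
)


def _to_int(value):
    """S3 writes '-' for missing numeric fields; map those to None."""
    return int(value) if value.isdigit() else None


def parse_line(line):
    """Return a dict of typed fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["http_status"] = _to_int(record["http_status"])
    record["bytes_sent"] = _to_int(record["bytes_sent"])
    return record


# Hypothetical sample line, shaped like a real record but with invented values.
SAMPLE = (
    'examplebucketowner example-bucket [06/Feb/2019:00:00:38 +0000] 192.0.2.3 '
    'arn:aws:sts::123456789012:assumed-role/example-role/session-1 '
    '3E57427F3EXAMPLE REST.GET.OBJECT photos/cat.jpg '
    '"GET /example-bucket/photos/cat.jpg HTTP/1.1" 200 - 2048'
)
record = parse_line(SAMPLE)
```

Typed columns such as the timestamp and status code are what let a columnar format compress well and let query engines skip data they do not need.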

Yelp S3 Access Logs Architecture at Scale

Behind the scenes, several AWS services support the system.

For example:

AWS Glue Data Catalog manages schemas across multiple AWS accounts
Batch jobs and Lambda functions handle ingestion and compaction
Partition projection lets Amazon Athena work out partition locations from table settings rather than catalog lookups, so queries scan only the data they need
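
As a rough illustration of the partition projection item above, the sketch below builds the table properties Athena reads for a date-projected partition column. The property names are Athena's own, but the column name `day`, the date layout, and the bucket path are assumptions made for this example, not details from Yelp's setup.

```python
def projection_properties(location_prefix, start_date):
    """Build Athena table properties for a hypothetical `day` partition column."""
    return {
        "projection.enabled": "true",
        "projection.day.type": "date",
        "projection.day.format": "yyyy/MM/dd",
        # NOW keeps the range open-ended, so no partitions are registered by hand.
        "projection.day.range": f"{start_date},NOW",
        "projection.day.interval": "1",
        "projection.day.interval.unit": "DAYS",
        # Athena substitutes ${day} to derive each partition's S3 location.
        "storage.location.template": f"{location_prefix.rstrip('/')}/${{day}}/",
    }


def to_tblproperties(props):
    """Render the properties as a TBLPROPERTIES clause for a CREATE TABLE statement."""
    pairs = ",\n  ".join(f"'{key}' = '{value}'" for key, value in props.items())
    return f"TBLPROPERTIES (\n  {pairs}\n)"


props = projection_properties("s3://example-bucket/compacted-logs/", "2024/01/01")
clause = to_tblproperties(props)
```

With properties like these, a query that filters on the partition column only touches the matching prefixes.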

In addition, the pipeline tolerates delayed or duplicate log delivery. Engineers deliberately designed inserts to remain idempotent, which prevents dataset corruption.
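
One way to picture idempotent inserts is a merge keyed on a stable record identity, so replaying a batch leaves the dataset unchanged. The sketch below is an assumption about how that could look, using the request ID plus timestamp as the key; Yelp's write-up may use a different mechanism.

```python
def record_key(record):
    """Identity of a log record for deduplication (an assumed choice of key)."""
    return (record["request_id"], record["time"])


def merge_records(existing, incoming):
    """Merge a batch into a partition; re-applying the same batch is a no-op."""
    merged = {record_key(r): r for r in existing}
    for r in incoming:
        # setdefault keeps the first copy, so duplicates and replays are ignored.
        merged.setdefault(record_key(r), r)
    return sorted(merged.values(), key=lambda r: (r["time"], r["request_id"]))


# Hypothetical records: the batch repeats A1 and adds a late-arriving B2.
existing = [{"request_id": "A1", "time": "2024-01-01T00:00:01Z"}]
batch = [
    {"request_id": "A1", "time": "2024-01-01T00:00:01Z"},
    {"request_id": "B2", "time": "2024-01-01T00:00:00Z"},
]
once = merge_records(existing, batch)
twice = merge_records(once, batch)
```

Because the result after one application equals the result after two, a retried or duplicated delivery cannot inflate the dataset.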

After archiving completes, the system tags old raw log objects for lifecycle expiration. Consequently, storage overhead continues to shrink over time.
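
The tag-then-expire step can be sketched as two small payloads: a tag set applied to each archived raw object, and a lifecycle rule that expires anything carrying that tag. The tag name, value, and retention period here are hypothetical; the dictionaries follow the shapes that boto3's `put_object_tagging` and `put_bucket_lifecycle_configuration` accept.

```python
def archived_tagging(tag_key, tag_value):
    """Tagging payload for put_object_tagging(Bucket=..., Key=..., Tagging=...)."""
    return {"TagSet": [{"Key": tag_key, "Value": tag_value}]}


def expiration_rule(tag_key, tag_value, days):
    """Lifecycle rule that expires only objects carrying the given tag."""
    return {
        "ID": f"expire-{tag_key}-{tag_value}",
        "Filter": {"Tag": {"Key": tag_key, "Value": tag_value}},
        "Status": "Enabled",
        "Expiration": {"Days": days},
    }


# Hypothetical tag and retention period, chosen only for illustration.
tagging = archived_tagging("archived", "true")
lifecycle = {"Rules": [expiration_rule("archived", "true", days=7)]}
# With boto3 this would be passed as:
#   s3.put_bucket_lifecycle_configuration(
#       Bucket="example-log-bucket", LifecycleConfiguration=lifecycle)
```

Filtering the rule by tag means untagged raw logs, which have not yet been compacted, are never expired by it.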

What Yelp S3 Access Logs Enable Operationally

Thanks to the new pipeline, Yelp unlocked several operational benefits.

Debugging and Security

For instance, engineers can quickly verify whether a specific object was accessed or denied at a given time.
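
A lookup like this boils down to filtering records by object key and time window, then splitting them by status code. The following sketch works on in-memory records with assumed field names (`key`, `time`, `http_status`); in practice the same filter would more likely run as a query against the compacted tables.

```python
from datetime import datetime, timedelta, timezone


def find_access(records, key, around, window=timedelta(minutes=5)):
    """Split requests for one key near a point in time into allowed and denied."""
    hits = [
        r for r in records
        if r["key"] == key and abs(r["time"] - around) <= window
    ]
    return {
        "allowed": [r for r in hits if 200 <= r["http_status"] < 300],
        # S3 reports access-denied requests with HTTP status 403.
        "denied": [r for r in hits if r["http_status"] == 403],
    }


# Hypothetical records for illustration.
t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
records = [
    {"key": "reports/q1.csv", "time": t0, "http_status": 200},
    {"key": "reports/q1.csv", "time": t0 + timedelta(minutes=2), "http_status": 403},
    {"key": "reports/q1.csv", "time": t0 + timedelta(hours=3), "http_status": 200},
    {"key": "other.txt", "time": t0, "http_status": 200},
]
result = find_access(records, "reports/q1.csv", around=t0)
```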

Cost Attribution

Moreover, teams aggregate API usage by IAM role. As a result, they clearly see which services generate the most traffic.
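
Aggregation by role can be sketched by pulling the role name out of the requester field and counting requests per role and operation. The example assumes the requester is an assumed-role ARN of the usual `arn:aws:sts::<account>:assumed-role/<role>/<session>` form; the records and role names are invented.

```python
from collections import Counter


def role_of(requester):
    """Extract the role name from an assumed-role ARN, else return the value as-is."""
    parts = requester.split(":", 5)
    if len(parts) == 6 and parts[5].startswith("assumed-role/"):
        return parts[5].split("/")[1]
    return requester


def usage_by_role(records):
    """Count requests per (role, operation) pair."""
    return Counter((role_of(r["requester"]), r["operation"]) for r in records)


# Hypothetical records: two sessions of the same role plus an anonymous request.
records = [
    {"requester": "arn:aws:sts::123456789012:assumed-role/photo-service/s1",
     "operation": "REST.GET.OBJECT"},
    {"requester": "arn:aws:sts::123456789012:assumed-role/photo-service/s2",
     "operation": "REST.GET.OBJECT"},
    {"requester": "-", "operation": "REST.GET.OBJECT"},
]
usage = usage_by_role(records)
```

Dropping the session name is what lets requests from many short-lived sessions roll up to a single service.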

Data Hygiene

Finally, by combining access logs with S3 inventory data, engineers safely identify and delete objects that have not been accessed for long periods.
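
Conceptually this is a set difference: everything listed in the inventory, minus everything the access logs show as recently read. The sketch below assumes simple in-memory records with `key`, `time`, and `last_modified` fields and a one-year idle threshold, all chosen for illustration. Keys in real access logs may be URL-encoded and would need normalizing before such a join.

```python
from datetime import datetime, timedelta, timezone


def stale_keys(inventory, access_records, now, max_idle=timedelta(days=365)):
    """Return inventory keys with no logged access since the cutoff."""
    cutoff = now - max_idle
    recently_read = {r["key"] for r in access_records if r["time"] >= cutoff}
    return sorted(
        item["key"]
        for item in inventory
        # Skip objects written after the cutoff: they are too new to judge.
        if item["key"] not in recently_read and item["last_modified"] < cutoff
    )


# Hypothetical inventory and access records.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
old = now - timedelta(days=800)
inventory = [
    {"key": "a.bin", "last_modified": old},
    {"key": "b.bin", "last_modified": old},
    {"key": "c.bin", "last_modified": now - timedelta(days=10)},
]
access_records = [{"key": "a.bin", "time": now - timedelta(days=30)}]
candidates = stale_keys(inventory, access_records, now)
```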

Why Yelp S3 Access Logs Matter for the Industry

For years, many organizations avoided object-level S3 logging due to cost and complexity concerns. However, Yelp’s work challenges that assumption.

In practice, Yelp's pipeline demonstrates that companies can operate object-level logging efficiently and at scale.

Therefore, as cloud governance, auditing, and compliance requirements grow, this architecture offers a practical reference for teams facing similar challenges.

Similar Approaches Inspired by Yelp S3 Access Logs

Not surprisingly, several platforms now follow similar design patterns.

For example, Upsolver offers managed ingestion and transformation of S3 access logs into analytics-ready formats. Likewise, AWS publishes reference architectures that use Glue and Athena to achieve similar results.

Meanwhile, query engines such as Presto, Trino, and Druid perform especially well when teams store logs in columnar formats like Parquet or ORC. In contrast, organizations that need near-real-time visibility often push logs into OpenSearch, trading some storage efficiency for speed.

Final Thoughts

Ultimately, Yelp's blueprint shows that scale does not have to lead to runaway costs. With the right design choices, massive volumes of raw S3 access logs become a fast, affordable, and operationally valuable resource.
