So, you’re in Redshift running queries and doing your typical data wrangling—but it’s taking forever. If you’ve ever waited minutes or even hours for your queries to compute, you’re not alone.
In this article, we’ll walk you through some of the most frequently encountered reasons for slow Amazon Redshift queries and show you how to speed them up.
Why Your Amazon Redshift Queries are Slow
1. There’s not enough space in your Redshift cluster.
Company growth is great. Yet, as your company’s data volume grows over time, your team has to write more and more queries. Unfortunately, that might lead to you running out of space in your cluster.
Check your maximum storage capacity to see whether space constraints are the culprit of your slow-running Redshift queries. The rule of thumb is to not exceed 80% of your cluster storage capacity.
If you’ve exceeded 80%e, resize your cluster. Elastic resize is the fastest method to resize a cluster—it adds or removes nodes on an existing cluster and automatically redistributes the data to the new nodes.
For more information about how to resize your clusters in Amazon Redshift, check out this step-by-step guide.
2. You’re running inefficient Amazon Redshift queries.
Running inefficient queries is a common cause for slow query times. For instance, a query that scans all your data would be very inefficient and not the best use of your time.
To mitigate this, consider the following tips as suggested by Redshift experts:
- When possible, avoid using the UNION set operator.
- Be as specific as possible with multiple levels of conditionals.
- Use INNER joins over LEFT joins because LEFT and RIGHT outer joins retain values from one of the joined tables when no match is found in the other table.
3. You’re running multiple Redshift queries.
As powerful as Redshift is, it still has computing limitations—the obvious being the number of queries you can run at once. Amazon Redshift ETL processes can also extend query time.
To avoid slowing down your Redshift queries, stagger your query schedule throughout the day or at times you know your cluster will be least active. Redshift users can use Redshift’s Workload Management to speed up their queries.
Alternatively, if you’re a Kloudio user, you can schedule your data sync times for your Redshift connection.
Final Thoughts on Running Redshift Queries
These three tips cover the most common reasons your Redshift queries may be running slower than desired, but there are certainly more to consider. Here are a couple of others:
- Data sort order: Amazon Redshift stores table data on a disk, sorted according to a table’s sort keys. The query optimizer and the query processor use the information about where the data is located to reduce the number of blocks that need to be scanned and thereby improve query speed.
- The number of nodes, processors, or slices: More nodes mean more processors and more slices, which enables your queries to process faster by running portions of the query concurrently across the slices. However, more nodes also mean greater expense, so you need to find the balance of cost and performance that is appropriate for your business.
We hope these quick tips will help you speed up your Redshift queries. As always, Kloudio can help kickstart your data analysis by connecting to Amazon Redshift. Create your free Kloudio account to see how our platform can speed up your Redshift queries.