
    Article

    AMAZON REDSHIFT

    A fast, fully managed, petabyte-scale data warehouse solution that lets you analyse large volumes of data cost-effectively.

    Amazon Redshift is a cloud data warehouse that makes it easy to gain new insights from your data. It is a fully managed service, so you do not carry the burden of updating, patching or backing up your database: Amazon takes care of these duties.

    BENEFITS

    Interoperability

    Amazon Redshift lets you query files in open formats such as CSV, Parquet, ORC, JSON and Avro using standard SQL, and integrates with other AWS and third-party services. Using federated queries, you can query live data across one or more Amazon RDS databases without having to move it.
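
As a minimal sketch of how a federated query source might be registered, the helper below builds a `CREATE EXTERNAL SCHEMA ... FROM POSTGRES` statement for an RDS PostgreSQL database. The function name, schema names, endpoint and ARNs are hypothetical placeholders, and the snippet only assembles the SQL text; it does not connect to any database.

```python
def federated_schema_sql(local_schema, rds_database, rds_schema,
                         endpoint, iam_role_arn, secret_arn, port=5432):
    """Return a CREATE EXTERNAL SCHEMA statement for an RDS PostgreSQL source.

    Values are interpolated as-is, so pass only trusted identifiers and ARNs.
    """
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {local_schema}\n"
        f"FROM POSTGRES\n"
        f"DATABASE '{rds_database}' SCHEMA '{rds_schema}'\n"
        f"URI '{endpoint}' PORT {port}\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"SECRET_ARN '{secret_arn}';"
    )


# Hypothetical placeholder values, not real resources.
statement = federated_schema_sql(
    local_schema="rds_sales",
    rds_database="sales",
    rds_schema="public",
    endpoint="example-db.abc123.eu-west-1.rds.amazonaws.com",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleFederatedRole",
    secret_arn="arn:aws:secretsmanager:eu-west-1:123456789012:secret:example",
)
print(statement)
```

Once such a schema exists, tables in the RDS database can be referenced as `rds_sales.<table>` and joined with local Redshift tables in ordinary SQL.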

    Redshift ML lets data and machine learning engineers create and train Amazon SageMaker models on data stored in Amazon Redshift, and then use those models directly when querying the database.
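
To illustrate the Redshift ML workflow, the sketch below assembles a `CREATE MODEL` statement and a follow-up query that calls the generated prediction function. The model, table, column and bucket names are assumptions made for illustration only; the code builds SQL strings and does not train anything.

```python
def create_model_sql(model, training_query, target, function,
                     iam_role_arn, s3_bucket):
    """Return a CREATE MODEL statement that trains on the given query."""
    return (
        f"CREATE MODEL {model}\n"
        f"FROM ({training_query})\n"
        f"TARGET {target}\n"
        f"FUNCTION {function}\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"SETTINGS (S3_BUCKET '{s3_bucket}');"
    )


def prediction_sql(function, feature_columns, table):
    """Return a SELECT that applies the trained model's SQL function."""
    features = ", ".join(feature_columns)
    return f"SELECT {features}, {function}({features}) AS prediction FROM {table};"


# Hypothetical names: a churn model trained on a 'customers' table.
print(create_model_sql(
    model="customer_churn",
    training_query="SELECT age, monthly_spend, churned FROM customers",
    target="churned",
    function="predict_churn",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftMLRole",
    s3_bucket="example-redshift-ml-bucket",
))
print(prediction_sql("predict_churn", ["age", "monthly_spend"], "customers"))
```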

    Performance

    Amazon Redshift evolved from PostgreSQL and is now an industry-leading solution for performance and flexibility. Thanks to AQUA (Advanced Query Accelerator) for Amazon Redshift, a distributed and hardware-accelerated cache, queries can run up to 10x faster than on other enterprise cloud data warehouses, according to AWS. Materialized views and result caching enable you to achieve faster query performance when using Business Intelligence (BI) tools or when developing Extract, Transform and Load (ETL) workflows. The database engine uses machine learning algorithms to deliver high throughput even during concurrent activity. The Short Query Accelerator (SQA) estimates the computational effort required by each query and prioritizes short-running queries.

    Scalability

    Amazon Redshift scales almost without limits. You can set up managed storage so that storage grows automatically when needed, up to 8 PB of compressed data. Using the console or a few API calls, you can also scale the number of nodes in a cluster in or out within minutes.
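
One of the API calls mentioned above is `resize_cluster`. The sketch below only prepares the request parameters, so it runs without AWS credentials; the helper name and the cluster identifier are hypothetical, and sending the request with boto3 is shown as a comment rather than executed.

```python
def resize_request(cluster_id, number_of_nodes, classic=False):
    """Build keyword arguments for the Redshift ResizeCluster API call.

    classic=False asks for an elastic resize, which is usually the faster option.
    """
    if number_of_nodes < 1:
        raise ValueError("number_of_nodes must be at least 1")
    return {
        "ClusterIdentifier": cluster_id,
        "NumberOfNodes": number_of_nodes,
        "Classic": classic,
    }


# Hypothetical cluster name.
params = resize_request("example-cluster", 4)
print(params)

# With boto3 installed and credentials configured, the request could be sent as:
#   import boto3
#   boto3.client("redshift").resize_cluster(**params)
```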

    Thanks to Concurrency Scaling, you can run thousands of concurrent queries, regardless of whether they target data stored in the warehouse or directly in your Amazon S3 data lake.

    Data sharing enables you to share live data across different Redshift clusters. It gives you and your organization high-performance access to data in any Redshift cluster you own.
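
A data share is set up with a handful of SQL statements on the producer cluster and one on the consumer cluster. The sketch below generates them in order; the share, schema, table and database names and the namespace identifiers are hypothetical placeholders.

```python
def producer_statements(share, schema, tables, consumer_namespace):
    """Return the SQL a producer cluster runs to publish tables in a data share."""
    statements = [
        f"CREATE DATASHARE {share};",
        f"ALTER DATASHARE {share} ADD SCHEMA {schema};",
    ]
    statements += [
        f"ALTER DATASHARE {share} ADD TABLE {schema}.{table};" for table in tables
    ]
    statements.append(
        f"GRANT USAGE ON DATASHARE {share} TO NAMESPACE '{consumer_namespace}';"
    )
    return statements


def consumer_statement(database, share, producer_namespace):
    """Return the SQL a consumer cluster runs to mount the share as a database."""
    return (
        f"CREATE DATABASE {database} FROM DATASHARE {share} "
        f"OF NAMESPACE '{producer_namespace}';"
    )


# Hypothetical identifiers; real namespace values are cluster-specific GUIDs.
for sql in producer_statements("sales_share", "public", ["orders", "customers"],
                               "consumer-namespace-guid"):
    print(sql)
print(consumer_statement("sales_db", "sales_share", "producer-namespace-guid"))
```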

    Security

    With a couple of parameters, you can enable encryption both at rest (AES-256) and in transit (SSL) without worrying about key management, which Amazon Redshift handles itself. Once encryption at rest is enabled, both the data written to disk and the backups are encrypted. In line with the principle of least privilege, Amazon Redshift lets you set up fine-grained access control policies at row and column level.
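
Column-level and row-level access control are both expressed in SQL. The sketch below generates a column-level `GRANT` and the statements for a row-level security policy; the table, column, role and policy names are assumptions for illustration, and the predicate is just one possible rule.

```python
def column_grant_sql(table, columns, grantee):
    """Return a GRANT that exposes only the listed columns to the grantee."""
    if not columns:
        raise ValueError("at least one column is required")
    return f"GRANT SELECT ({', '.join(columns)}) ON {table} TO {grantee};"


def row_policy_sql(policy, table, column, column_type, predicate, role):
    """Return the statements that create, attach and enable a row-level policy."""
    return [
        f"CREATE RLS POLICY {policy} WITH ({column} {column_type}) USING ({predicate});",
        f"ATTACH RLS POLICY {policy} ON {table} TO ROLE {role};",
        f"ALTER TABLE {table} ROW LEVEL SECURITY ON;",
    ]


# Hypothetical example: analysts see only two columns and only EU rows.
print(column_grant_sql("public.orders", ["order_id", "amount"], "ROLE analyst"))
for sql in row_policy_sql("eu_only", "public.orders", "region", "VARCHAR(8)",
                          "region = 'EU'", "analyst"):
    print(sql)
```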

    The tight integration between Amazon Redshift and AWS CloudTrail allows detailed monitoring of every Redshift API call. All operations executed against the database, such as SQL queries, connection attempts or changes to the data warehouse, are logged in system tables and can be queried or exported to a secured Amazon S3 bucket.

    STORM REPLY BEST PRACTICES

    As Storm Reply, we have developed a set of best practices for Amazon Redshift, drawing on our experience in the design, implementation and maintenance of large data warehouse solutions for our enterprise customers.

    Security and Fault Tolerance

    Data protection in Amazon Redshift can be enabled for data both in transit and at rest. The service integrates with AWS Certificate Manager to support SSL connections for in-transit encryption, while at-rest encryption can be managed either client-side or server-side. Server-side encryption is fully managed by AWS: your data is encrypted before it is written to disk. This is the advised choice for most use cases because it relieves the user of the burden of key management.

    Amazon Redshift continuously backs up your data to Amazon S3 and provides tools to restore snapshots in any AZ in case of failure. Some cases require a lower RPO/RTO; for such scenarios you can integrate Amazon Redshift with Amazon Kinesis and Amazon Route 53 to deploy parallel clusters and achieve automatic failover.

    Tailoring the infrastructure to your needs

    Amazon Redshift offers a wide range of hardware options for the nodes of your cluster. Choosing the right node type and size is crucial for both performance and pricing. We carefully analyse your data and use case to choose the node type that best fits your needs. RA3 nodes provide high-speed caching, managed storage (which lets you scale compute separately from storage) and high-bandwidth networking, whereas DC2 nodes are optimised for compute-intensive workloads on the data warehouse. DS2 nodes are storage-optimised and are the best fit for workloads with medium compute demand and a huge amount of data.

    Data partitioning and Query performance

    Amazon Redshift uses a compressed columnar data structure that already provides high query throughput, and a few techniques help you get the most out of it.
    For example, materialized views can significantly boost query performance by precomputing frequent queries used during ELT processes or by BI tools. Concurrency scaling allows your Amazon Redshift cluster to add capacity dynamically in response to the incoming workload; if your spikes follow a predictable schedule, you can use the elastic resize scheduler feature instead.
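
As a small sketch of the materialized view technique, the helpers below build the statement that precomputes a frequent aggregation and the statement that refreshes it on demand. The view name and the underlying query are hypothetical; only the SQL text is generated.

```python
def materialized_view_sql(name, query, auto_refresh=True):
    """Return a CREATE MATERIALIZED VIEW statement for the given query."""
    refresh = "YES" if auto_refresh else "NO"
    # Drop any trailing semicolon so the statement is terminated exactly once.
    body = query.strip().rstrip(";")
    return f"CREATE MATERIALIZED VIEW {name}\nAUTO REFRESH {refresh}\nAS {body};"


def refresh_sql(name):
    """Return the statement that refreshes the view manually."""
    return f"REFRESH MATERIALIZED VIEW {name};"


# Hypothetical daily revenue aggregation used by a BI dashboard.
print(materialized_view_sql(
    "daily_revenue",
    "SELECT order_date, SUM(amount) AS revenue FROM orders GROUP BY order_date;",
))
print(refresh_sql("daily_revenue"))
```

A BI tool can then read from `daily_revenue` instead of re-running the aggregation over the base table on every dashboard load.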

    The data inside your nodes can be distributed and sorted by marking columns of your tables as distribution key or sort key. A table has a single distribution key, and a well-chosen one enables optimized JOIN queries, whereas selecting the right sort key(s) improves the performance of SELECT queries that filter or order on those columns.
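
The sketch below shows how distribution and sort keys appear in a table definition. It generates a `CREATE TABLE` statement with `DISTKEY` and a compound `SORTKEY`; the table layout is a hypothetical example, and the choice of `customer_id` as join column and `order_date` as filter column is an assumption about the workload.

```python
def create_table_sql(table, columns, dist_key, sort_keys):
    """Return a CREATE TABLE statement with a distribution key and sort key(s).

    columns maps column names to SQL types; key columns must be defined in it.
    """
    missing = [c for c in [dist_key, *sort_keys] if c not in columns]
    if missing:
        raise ValueError(f"unknown key column(s): {', '.join(missing)}")
    column_defs = ",\n".join(f"    {name} {sql_type}"
                             for name, sql_type in columns.items())
    return (
        f"CREATE TABLE {table} (\n{column_defs}\n)\n"
        f"DISTSTYLE KEY\n"
        f"DISTKEY ({dist_key})\n"
        f"COMPOUND SORTKEY ({', '.join(sort_keys)});"
    )


# Hypothetical orders table: joined on customer_id, filtered by order_date.
print(create_table_sql(
    "orders",
    {"order_id": "BIGINT", "customer_id": "BIGINT",
     "order_date": "DATE", "amount": "DECIMAL(12,2)"},
    dist_key="customer_id",
    sort_keys=["order_date"],
))
```

Distributing two tables on the same join column keeps matching rows on the same node, which is what lets the JOIN avoid moving data between nodes.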

    www.storm.reply.com

    Reply © 2022