This tutorial walks you through using Amazon Athena to query data. You’ll create a table based on sample data stored in Amazon Simple Storage Service, query the table, and check the results of the query.
The tutorial is using live resources, so you are charged for the queries that you run. You aren’t charged for the sample datasets that you use, but if you upload your own data files to Amazon S3, charges do apply.
Step 1: Create a Database
- Open the Athena console.
- If this is your first time visiting the Athena console, you’ll go to a Getting Started page. Choose Get Started to open the Query Editor. If it isn’t your first time, the Athena Query Editor opens.
- In the Athena Query Editor, you see a query pane with an example query. Start typing your query anywhere in the query pane.
- To create a database named
mydatabase, enter the following CREATE DATABASE statement, and then choose Run Query:
CREATE DATABASE mydatabase
- Confirm that the catalog display refreshes and
mydatabaseappears in the DATABASE list in the Catalog dashboard on the left side.
Step 2: Create a Table
Now that you have a database, you’re ready to create a table that’s based on the sample data file. You define columns that map to the data, specify how the data is delimited, and provide the location in Amazon S3 for the file.
To create a table
- Make sure that
mydatabaseis selected for DATABASE and then choose New Query.
- In the query pane, enter the following CREATE TABLE statement, and then choose Run Query:
You can query data in regions other than the region where you run Athena. Standard inter-region data transfer rates for Amazon S3 apply in addition to standard Athena charges. To reduce data transfer charges, replace myregionin
s3://athena-examples-myregion/path/to/data/with the region identifier where you run Athena, for example,
CREATE EXTERNAL TABLE IF NOT EXISTS cloudfront_logs ( `Date` DATE, Time STRING, Location STRING, Bytes INT, RequestIP STRING, Method STRING, Host STRING, Uri STRING, Status INT, Referrer STRING, os STRING, Browser STRING, BrowserVersion STRING ) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' WITH SERDEPROPERTIES ( "input.regex" = "^(?!#)([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+[^\(]+[\(]([^\;]+).*\%20([^\/]+)[\/](.*)$" ) LOCATION 's3://athena-examples-myregion/cloudfront/plaintext/';
table cloudfront_logsis created and appears in the Catalog dashboard for your database.
Step 3: Query Data
Now that you have the
cloudfront_logs table created in Athena based on the data in Amazon S3, you can run queries on the table and see the results in Athena.
To run a query