Using BigQuery In Command Line
Examine a table
BigQuery offers a number of sample tables that you can run queries against. In this lab, you’ll run queries against the shakespeare
table, which contains an entry for every word in every play.
To examine the schema of the Shakespeare table in the samples dataset, run:
bq show bigquery-public-data:samples.shakespeare
In this command you’re doing the following:
bq
to invoke the BigQuery command line toolshow
is the action- then you’re listing the name of the
project:public dataset.table
in BigQuery that you want to see.
Run the help command
When you include a command name with the help commands, you get information about that specific command. For example, the following call to bq help
retrieves information about the query
command.
bq help query
To see a list of all of the commands bq
uses, run just bq help
.
Run a query
Now you’ll run a query to see how many times the substring “raisin” appears in Shakespeare’s works.
To run a query, run the command bq query "[SQL_STATEMENT]"
.
- Escape any quotation marks inside the [SQL_STATEMENT] with a \ mark, or
- Use a different quotation mark type than the surrounding marks (“versus”).
Run the following standard SQL query in Cloud Shell to count the number of times that the substring “raisin” appears in all of Shakespeare’s works:
bq query --use_legacy_sql=false \
'SELECT
word,
SUM(word_count) AS count
FROM
`bigquery-public-data`.samples.shakespeare
WHERE
word LIKE "%raisin%"
GROUP BY
word'
In this command:
--use_legacy_sql=false
makes standard SQL the default query syntax
Create a new table
- Use the
bq ls
command to list any existing datasets in your project:bq ls
You will be brought back to the command line since there aren’t any datasets in your project yet.
- Run
bq ls
and thebigquery-public-data
Project ID to list the datasets in that specific project, followed by a colon (:).bq ls bigquery-public-data:
- Use the
bq mk
command to create a new dataset namedbabynames
in your project:bq mk babynames
- Run
bq ls
to confirm that the dataset now appears as part of your project:bq ls
Upload the dataset
- Before you can build the table, you need to add the dataset to your project. The custom data file you’ll use contains approximately 7 MB of data about popular baby names, provided by the US Social Security Administration.Run this command to add the baby names zip file to your project, using the URL for the data file:
wget http://www.ssa.gov/OACT/babynames/names.zip
- List the file:
ls
You can see the name of the file added to your project.
- Now unzip the file:
unzip names.zip
- That’s a pretty big list of text files! List the files again:
ls
The
bq load
command creates or updates a table and loads data in a single step.You will use the bq load command to load your source file into a new table called names2010 in the babynames dataset you just created. By default, this runs synchronously, and will take a few seconds to complete.
- The
bq load
arguments you’ll be running are:datasetID: babynames tableID: names2010 source: yob2010.txt schema: name:string,gender:string,count:integer
- Create your table:
bq load babynames.names2010 yob2010.txt name:string,gender:string,count:integer
- Run
bq ls
andbabynames
to confirm that the table now appears in your dataset:bq ls babynames
- Run
bq show
and yourdataset.table
to see the schema:bq show babynames.names2010
Run queries
- Run the following command to return the top 5 most popular girls names:
bq query "SELECT name,count FROM babynames.names2010 WHERE gender = 'F' ORDER BY count DESC LIMIT 5"
- Run the following command to see the top 5 most unusual boys names.
bq query "SELECT name,count FROM babynames.names2010 WHERE gender = 'M' ORDER BY count ASC LIMIT 5"
Tag:Google Cloud