Google BigQuery
Google BigQuery is a data warehouse for running analytic SQL queries. It automatically scales to query datasets of up to petabytes in size.
Links
Overview
Use the BigQuery web UI,
bq
command-line
tool, or
APIs to make SQL
queries.
How it works
BigQuery is the cloud offering for the Dremel service (see paper). Data is stored by column in a compressed format called Capacitor.
To run a SQL query, the query engine scans all rows in the table. The query uses many parallel workers to scan the compressed data directly. The query scans only the columns and partitions it needs.
Pricing
BigQuery pricing is somewhat unique.
Queries
You are charged per query based on the amount of data the query needed to access. The project running the query is charged, not the project that stores the data (unless of course these are the same). See the query pricing table for details.
You can query 1 TB of data for free per month. You can try BigQuery without a credit card. You don’t need to start a free trial to use it.
Storage
Data is charged per GB per month, but prorated per MB, per second. See the storage pricing table. Data which is not modified for 90 days is charged a lower long-term storage rate
You can use up to 10 GB of storage for free per month.
Public Data
Google hosts many public datasets on BigQuery. You can query these tables directly or join them to your own data.
Contact the public data team at Google if you think your dataset would be a good fit for the program: bq-public-data AT google.com.
Hosting your own public dataset
Since queries are charged to the project running the queries not the one storing the data, you can make a popular dataset but only get charged for storage and the queries you run yourself.
See this tweet for how to make a dataset public:
- Go to the dataset on the BigQuery web UI.
- Click the down arrow next to the dataset name.
- Select Share dataset.
- Select All authenticated users (meaning anyone with a Google account and cloud project).
- Ensure the View permissions are set.
- Click the Add button.
- Click the Save changes button.
External data sources
You can use BigQuery to run SQL queries against data stored outside of BigQuery datasets by using the external data sources. For example:
- Query files in Cloud Storage of several different data formats.
- Query a Bigtable database. This supports key range SQL queries so that a full table scan is not always needed.
- Query files stored in Google Drive.