Google BigQuery is a data warehouse for running analytic SQL queries. It automatically scales to query datasets of up to petabytes in size.
How it works
To run a SQL query, the query engine scans every row in the table, but it reads only the columns and partitions the query needs. Many parallel workers scan the compressed data directly.
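As a rough illustration of why selecting fewer columns and partitions reduces the data scanned, here is a toy model in Python. It is only a sketch: the table layout, column names, and byte sizes are invented for the example and do not reflect BigQuery's actual storage format.

```python
# Toy model of a columnar, partitioned table: bytes stored per (partition, column).
# All partition names, column names, and sizes are made up for illustration.
TABLE = {
    "2024-01-01": {"user_id": 400, "url": 2_000, "latency_ms": 400},
    "2024-01-02": {"user_id": 500, "url": 2_500, "latency_ms": 500},
}


def bytes_scanned(table, columns, partitions=None):
    """Sum the bytes for only the columns and partitions a query references."""
    # With no partition filter, every partition is read.
    selected = table if partitions is None else {p: table[p] for p in partitions}
    return sum(cols[c] for cols in selected.values() for c in columns)


# Reading every column of every partition.
full = bytes_scanned(TABLE, ["user_id", "url", "latency_ms"])
# Reading one column from one partition.
narrow = bytes_scanned(TABLE, ["latency_ms"], partitions=["2024-01-02"])
print(full, narrow)
```

In this model, every row in the selected partitions is still read, but the narrow query touches far fewer bytes than the full one.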
BigQuery pricing is unusual.
You are charged per query based on the amount of data the query needed to access. The project running the query is charged, not the project that stores the data (unless of course these are the same). See the query pricing table for details.
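To make the per-query charge concrete, the sketch below converts a number of bytes processed into an estimated cost. It assumes a flat rate per TiB and ignores any free allowance or minimum charge. The function name is hypothetical, and the rate passed in is a placeholder, not a quoted price; take the real figure from the query pricing table.

```python
TIB = 2 ** 40  # bytes in one tebibyte


def estimate_query_cost(bytes_processed: int, price_per_tib: float) -> float:
    """Estimate a query's cost, assuming a flat price per TiB of data processed.

    price_per_tib is an assumption supplied by the caller; look up the
    current rate in the pricing table rather than relying on a hard-coded value.
    """
    if bytes_processed < 0:
        raise ValueError("bytes_processed must be non-negative")
    return bytes_processed / TIB * price_per_tib


# Example with a placeholder rate of 5.0 currency units per TiB.
print(estimate_query_cost(250 * 2 ** 30, price_per_tib=5.0))
```

The Python client library (`google-cloud-bigquery`) can report the bytes a query would process without running it, by setting `dry_run=True` on `QueryJobConfig` and reading `total_bytes_processed` from the resulting job. That value could be passed to a helper like this one.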
You can use up to 10 GB of storage for free per month.
Google hosts many public datasets on BigQuery. You can query these tables directly or join them to your own data.
Contact the public data team at Google if you think your dataset would be a good fit for the program: bq-public-data AT google.com.
Hosting your own public dataset
Because queries are charged to the project that runs them, not the project that stores the data, you can host a popular dataset and pay only for storage and for the queries you run yourself.
See this tweet for how to make a dataset public:
- Go to the dataset on the BigQuery web UI.
- Click the down arrow next to the dataset name.
- Select Share dataset.
- Select All authenticated users (meaning anyone with a Google account and cloud project).
- Ensure the View permissions are set.
- Click the Add button.
- Click the Save changes button.
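The same sharing change can also be expressed programmatically. The sketch below works on a plain dictionary shaped like the dataset resource in the BigQuery REST API, where the `access` list holds entries such as `{"role": "READER", "specialGroup": "allAuthenticatedUsers"}`. The function name `make_public` and the sample dataset are hypothetical, and the code only builds the updated resource; sending it to the API (for example with a dataset update call) is left out.

```python
# Access entry granting read access to all authenticated users,
# in the shape used by the dataset resource's "access" list.
PUBLIC_READER = {"role": "READER", "specialGroup": "allAuthenticatedUsers"}


def make_public(dataset_resource: dict) -> dict:
    """Return a copy of the dataset resource with public read access added.

    The input is not modified, and the entry is not added twice.
    """
    access = list(dataset_resource.get("access", []))
    if PUBLIC_READER not in access:
        access.append(dict(PUBLIC_READER))
    return {**dataset_resource, "access": access}


# Hypothetical dataset resource with a single owner entry.
dataset = {
    "datasetReference": {"projectId": "my-project", "datasetId": "my_dataset"},
    "access": [{"role": "OWNER", "userByEmail": "owner@example.com"}],
}
shared = make_public(dataset)
print(shared["access"])
```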
External data sources
You can use BigQuery to run SQL queries against data stored outside of BigQuery datasets by using external data sources. For example: