The BigQuery client library for Python v0.28 includes some significant changes to how previous client libraries were designed in v0.27 and earlier. These changes can be summarized as follows:
- Query and view operations default to the standard SQL dialect.
- Client functions related to jobs, like running queries, immediately start the job.
- Functions to create, get, update, delete datasets and tables moved to the client class.
This topic provides details on the changes that you need to make to your Python code for the BigQuery client libraries so that you can use the latest version of the Python client library.
Running previous versions of the client library
You are not required to upgrade your Python client library to the latest version. However, new functionality in the BigQuery API is only supported in the v0.28 and later versions.
If you want to continue using a previous version of the Python client library
and you do not want to migrate your code, specify the version of
the Python client library that is used by your app. To specify a specific
library version, edit the requirements.txt file as shown in the following
example:
google-cloud-bigquery==0.27
Running the latest version of the client library.
To install the latest version of the Python client library, use the pip
command.
pip install --upgrade google-cloud-bigquery
For more detailed instructions, see BigQuery client libraries.
Importing the library and creating a client
Importing the Python client library and creating a client object is the same in previous and newer versions of the library.
from google.cloud import bigquery
client = bigquery.Client()
Query code changes
Querying data with the standard SQL dialect
Changes in v0.28 and later include:
- Standard SQL is the default SQL dialect.
- Use the
QueryJobConfigclass to configure the query job. client.query()makes an API request to immediately start the query.- A job ID is optional. If one is not supplied, the client library generates one for you.
The following sample shows how to run a query.
Previous versions of the client libraries:
client = bigquery.Client()
query_job = client.run_async_query(str(uuid.uuid4()), query)
# Use standard SQL syntax.
query_job.use_legacy_sql = False
# Set a destination table.
dest_dataset = client.dataset(dest_dataset_id)
dest_table = dest_dataset.table(dest_table_id)
query_job.destination = dest_table
# Allow the results table to be overwritten.
query_job.write_disposition = 'WRITE_TRUNCATE'
query_job.begin()
query_job.result() # Wait for query to finish.
rows = query_job.query_results().fetch_data()
for row in rows:
print(row)
In version 0.25.0 or earlier of the google-cloud-bigquery library, instead
of job.result(), the following code was required to wait for the job objects
to finish:
while True:
job.reload() # Refreshes the state via a GET request.
if job.state == 'DONE':
if job.error_result:
raise RuntimeError(job.errors)
return
time.sleep(1)
In version 0.25.0 or earlier of the google-cloud-bigquery library, instead
of job.query_results().fetch_data(), the following code was used to get the
resulting rows:
rows = query_job.results().fetch_data()
Latest version of the client library:
The Python client library now uses standard SQL by default.
Python
Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.
For more samples of running queries with the latest Python client library version, see:
Downloading query results as a pandas dataframe
The following sample shows how to run a query and download the results as a pandas dataframe.
Previous versions of the client libraries:
Previous versions of the client libraries did not support downloading results to a pandas dataframe.
Latest version of the client library:
Python
Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.
Querying data with the legacy SQL dialect
The following sample shows how to run a query using the legacy SQL dialect.
Previous versions of the client libraries:
Previous versions of the client libraries defaulted to legacy SQL syntax. For information on how to configure and run a query, see the query sample.
Latest version of the client library:
The client library defaults to standard SQL syntax. Set use_legacy_sql to true
to use legacy SQL. For information on how to configure and run a query, see the
query sample.
Querying data synchronously
In v0.28 and later, the Client.query() method is recommended because one is
able to access statistics and other properties of the query in the QueryJob.
Previous versions of the client libraries:
query_results = client.run_sync_query(query)
query_results.use_legacy_sql = False
query_results.run()
# The query might not complete in a single request. To account for a
# long-running query, force the query results to reload until the query
# is complete.
while not query_results.complete:
query_iterator = query_results.fetch_data()
try:
six.next(iter(query_iterator))
except StopIteration:
pass
rows = query_results.fetch_data()
for row in rows:
print(row)
Latest version of the client library:
Python
Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.
Table code changes
Table references
Use a TableReference object to refer to a table without additional properties
and a Table to refer to a full table resource. Several properties which
formerly used the Table class now use the TableReference class in v0.28 and
later. For example:
QueryJob.destinationis now aTableReference.client.dataset('mydataset').table('mytable')now returns aTableReference.
For an example that uses both the TableReference and Table classes, see
How to create a table.
Loading data from a local file
The following sample shows how to load a local file into a BigQuery table.
Previous versions of the client libraries:
dataset = client.dataset(dataset_name)
table = dataset.table(table_name)
# Reload the table to get the schema.
table.reload()
with open(source_file_name, 'rb') as source_file:
# This example uses CSV, but you can use other formats.
# See https://cloud.google.com/bigquery/loading-data
job = table.upload_from_file(
source_file, source_format='text/csv')
# Wait for the load job to complete.
while True:
job.reload()
if job.state == 'DONE':
if job.error_result:
raise RuntimeError(job.errors)
return
time.sleep(1)
Latest version of the client library:
Python
Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.
For more details, see Loading data from a local data source.
Loading data from a pandas dataframe
The following sample shows how to upload a pandas dataframe to a BigQuery table.
Previous versions of the client libraries:
Previous versions of the client libraries did not support uploading data from a pandas dataframe.
Latest version of the client library:
Python
Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.
Loading data from Cloud Storage
The following sample shows how to load a file from Cloud Storage into a BigQuery table.
Previous versions of the client libraries:
dataset = client.dataset(dataset_name)
table = dataset.table(table_name)
job_id = str(uuid.uuid4())
job = client.load_table_from_storage(
job_id, table, 'gs://bucket_name/object_name')
job.begin()
# Wait for the load job to complete.
while True:
job.reload()
if job.state == 'DONE':
if job.error_result:
raise RuntimeError(job.errors)
return
time.sleep(1)
Latest version of the client library:
Python
Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.
For more details, see Loading data from Cloud Storage.
Extracting a table to Cloud Storage
The following sample shows how to extract a table to Cloud Storage.
Previous versions of the client libraries:
dataset = client.dataset(dataset_name)
table = dataset.table(table_name)
job_id = str(uuid.uuid4())
job = client.extract_table_to_storage(
job_id, table, 'gs://bucket_name/object_name')
job.begin()
# Wait for the job to complete.
while True:
job.reload()
if job.state == 'DONE':
if job.error_result:
raise RuntimeError(job.errors)
return
time.sleep(1)
Latest version of the client library:
Python
Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.
For more details, see Exporting table data.
Copying a table
The following sample shows how to copy a table to another table.
Previous versions of the client libraries:
dataset = client.dataset(dataset_name)
table = dataset.table(table_name)
destination_table = dataset.table(new_table_name)
job_id = str(uuid.uuid4())
job = client.copy_table(job_id, destination_table, table)
job.create_disposition = (
google.cloud.bigquery.job.CreateDisposition.CREATE_IF_NEEDED)
job.begin()
# Wait for the job to complete.
while True:
job.reload()
if job.state == 'DONE':
if job.error_result:
raise RuntimeError(job.errors)
return
time.sleep(1)
Latest version of the client library:
Python
Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.
For more details, see Copying a table.
Streaming data into a table
The following sample shows how to write rows to a table's streaming buffer.
Previous versions of the client libraries:
dataset = client.dataset(dataset_name)
table = dataset.table(table_name)
# Reload the table to get the schema.
table.reload()
rows = [('values', 'in', 'same', 'order', 'as', 'schema')]
errors = table.insert_data(rows)
if not errors:
print('Loaded 1 row into {}:{}'.format(dataset_name, table_name))
else:
do_something_with(errors)
Latest version of the client library:
Python
Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.
For more details, see Streaming data into BigQuery.
Listing tables
The following sample shows how to list tables in a dataset.
Previous versions of the client libraries:
dataset = client.dataset(dataset_name)
for table in dataset.list_tables():
print(table.name)
Latest version of the client library:
Python
Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.
For more details, see Listing tables.
Get a table
The following sample shows how to get a table.
Previous versions of the client libraries:
dataset = client.dataset(dataset_name)
table = dataset.table(table_name)
table.reload()
Latest version of the client library:
Python
Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.
For more details, see Getting information about tables.
Check that a table exists
The BigQuery API does not provide a native exists method. Instead, get the
table resource and check if that request results in a 404 error. Previously, the
client library provided the exists() helper to perform this check. The
exists() helper allowed some inefficient use cases such as calling exists()
before trying to get the full resource. As a result, the exists() helper was
removed from the client library.
The following sample shows how to check whether a table exists.
Previous versions of the client libraries:
dataset = client.dataset(dataset_name)
table = dataset.table(table_name)
if table.exists():
# do something
else:
# do something else
Latest version of the client library:
Python
Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.
Create a table
The following sample shows how to create a table.
Previous versions of the client libraries:
dataset = client.dataset(dataset_name)
table = dataset.table(table_name)
table.create()
Latest version of the client library:
Python
Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.
For more details, see Creating a table.
Update a table
The following sample shows how to update a table.
Previous versions of the client libraries:
dataset = client.dataset(dataset_name)
table = dataset.table(table_name)
table.patch(description='new description')
Note that previous versions of the library do not check versions of the table resource via the etag property, so a read-modify-write is unsafe.
Latest version of the client library:
Python
Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.
For more details, see Updating table properties.
Delete a table
The following sample shows how to delete a table.
Previous versions of the client libraries:
dataset = client.dataset(dataset_name)
table = dataset.table(table_name)
table.delete()
Latest version of the client library:
Python
Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.
For more details, see Deleting a table.
Dataset code changes
Dataset references
Use a DatasetReference object to refer to a dataset without additional
properties and a Dataset to refer to a full dataset resource. Some methods
that formerly used the Dataset class now use the DatasetReference class in
v0.28 and later. For example:
client.dataset('mydataset')now returns aDatasetReference.
For an example that uses both the DatasetReference and Dataset classes, see
How to create a dataset.
Listing datasets
The following sample shows how to list datasets in a project.
Previous versions of the client libraries:
for dataset in client.list_datasets():
print(dataset.name)
Latest version of the client library:
Python
Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.
For more details, see Listing datasets.
Get a dataset
The following sample shows how to get a dataset.
Previous versions of the client libraries:
dataset = client.dataset(dataset_name)
dataset.reload()
Latest version of the client library:
Python
Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.
For more details, see Getting information about datasets.
Check that a dataset exists
The BigQuery API does not provide a native exists method.
Instead, get the dataset resource and check if that request results in a 404
error. Previously, the client library provided the exists() helper to perform
this check. The exists() helper allowed some inefficient use cases such as
calling exists() before trying to get the full resource. As a result, the
exists() helper was removed from the client library.
The following sample shows how to check whether a dataset exists.
Previous versions of the client libraries:
dataset = client.dataset(dataset_name)
if dataset.exists():
# do something
else:
# do something else
Latest version of the client library:
Python
Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.
Create a dataset
The following sample shows how to create a dataset.
Previous versions of the client libraries:
dataset = client.dataset(dataset_name)
dataset.create()
Latest version of the client library:
Python
Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.
For more details, see Creating datasets.
Update a dataset
The following sample shows how to update a dataset.
Previous versions of the client libraries:
dataset = client.dataset(dataset_name)
dataset.patch(description='new description')
Note that previous versions of the library do not check versions of the dataset resource via the etag property, so a read-modify-write is unsafe.
Latest version of the client library:
Python
Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.
For more details, see Updating dataset properties.
Delete a dataset
The following sample shows how to delete a dataset.
Previous versions of the client libraries:
dataset = client.dataset(dataset_name)
dataset.delete()
Latest version of the client library:
Python
Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.
For more details, see Deleting datasets.