Migrating from Python Client Library v0.27

The BigQuery client library for Python v0.28 includes significant design changes compared with v0.27 and earlier. These changes can be summarized as follows:

Query and view operations default to the standard SQL dialect.
Client functions related to jobs, like running queries, immediately start the job.
Functions to create, get, update, and delete datasets and tables moved to the client class.

This topic describes the changes that you need to make to your Python code so that you can use the latest version of the BigQuery Python client library.

Running previous versions of the client library

You are not required to upgrade your Python client library to the latest version. However, new functionality in the BigQuery API is only supported in the v0.28 and later versions.

If you want to continue using a previous version of the Python client library and you do not want to migrate your code, pin the version of the library that your app uses. To pin a library version, edit the requirements.txt file as shown in the following example:

google-cloud-bigquery==0.27
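
To confirm which version is actually installed in an environment before deciding whether to migrate, the Python standard library can report it. The following is a minimal sketch and not part of the client library: the helper names installed_version and uses_new_api are hypothetical, and the parsing assumes a plain "major.minor[.patch]" version string.

```python
from importlib import metadata


def installed_version(package="google-cloud-bigquery"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def uses_new_api(version):
    """Return True for v0.28 or later (assumes 'major.minor[.patch]')."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (0, 28)


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("google-cloud-bigquery is not installed.")
    elif uses_new_api(version):
        print("Version {} uses the v0.28+ interface.".format(version))
    else:
        print("Version {} uses the pre-0.28 interface.".format(version))
```

The comparison uses only the major and minor components, which is enough to tell the two interfaces apart.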

Running the latest version of the client library

To install the latest version of the Python client library, use the pip command.

pip install --upgrade google-cloud-bigquery

For more detailed instructions, see BigQuery client libraries.

Importing the library and creating a client

Importing the Python client library and creating a client object is the same in previous and newer versions of the library.

from google.cloud import bigquery

client = bigquery.Client()

Query code changes

Querying data with the standard SQL dialect

Changes in v0.28 and later include:

Standard SQL is the default SQL dialect.
Use the QueryJobConfig class to configure the query job.
client.query() makes an API request to immediately start the query.
A job ID is optional. If one is not supplied, the client library generates one for you.

The following sample shows how to run a query.

Previous versions of the client libraries:

client = bigquery.Client()
query_job = client.run_async_query(str(uuid.uuid4()), query)

# Use standard SQL syntax.
query_job.use_legacy_sql = False

# Set a destination table.
dest_dataset = client.dataset(dest_dataset_id)
dest_table = dest_dataset.table(dest_table_id)
query_job.destination = dest_table

# Allow the results table to be overwritten.
query_job.write_disposition = 'WRITE_TRUNCATE'

query_job.begin()
query_job.result()  # Wait for query to finish.

rows = query_job.query_results().fetch_data()
for row in rows:
    print(row)

In version 0.25.0 or earlier of the google-cloud-bigquery library, instead of job.result(), the following code was required to wait for the job objects to finish:

while True:
    job.reload()  # Refreshes the state via a GET request.
    if job.state == 'DONE':
        if job.error_result:
            raise RuntimeError(job.errors)
        break
    time.sleep(1)
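
Because the same polling loop is needed wherever a job has to finish, it can be kept in one helper. The following sketch is illustrative only: wait_for_job and its poll_interval and timeout parameters are hypothetical names, and the helper relies only on the reload(), state, error_result, and errors members used in the loop above.

```python
import time


def wait_for_job(job, poll_interval=1.0, timeout=None):
    """Poll a job object until its state is 'DONE' and return it.

    Raises RuntimeError if the job finished with an error, and
    TimeoutError if `timeout` seconds pass before the job is done.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        job.reload()  # Refreshes the state via a GET request.
        if job.state == 'DONE':
            if job.error_result:
                raise RuntimeError(job.errors)
            return job
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError(
                "Job did not finish within {} seconds.".format(timeout))
        time.sleep(poll_interval)
```

In v0.26 and later this helper is unnecessary, because job.result() performs the wait.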

In version 0.25.0 or earlier of the google-cloud-bigquery library, instead of job.query_results().fetch_data(), the following code was used to get the resulting rows:

rows = query_job.results().fetch_data()

Latest version of the client library:

The Python client library now uses standard SQL by default.

Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.

from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

query = """
    SELECT name, SUM(number) as total_people
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    WHERE state = 'TX'
    GROUP BY name, state
    ORDER BY total_people DESC
    LIMIT 20
"""
query_job = client.query(query)  # Make an API request.

print("The query data:")
for row in query_job:
    # Row values can be accessed by field name or index.
    print("name={}, count={}".format(row[0], row["total_people"]))
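
The previous-version sample also sets a destination table and a write disposition. In v0.28 and later, those options move from the job object to a QueryJobConfig. The following is a minimal sketch of that translation rather than an official sample: query_to_table is a hypothetical helper name, and the destination is built with client.dataset(...).table(...) as in the other samples in this topic.

```python
def query_to_table(client, query, dest_dataset_id, dest_table_id, job_id=None):
    """Run `query`, overwrite the destination table, and return the rows."""
    # Deferred import: the library is only needed when the helper is called.
    from google.cloud import bigquery

    job_config = bigquery.QueryJobConfig()

    # Set a destination table. In v0.28 and later,
    # client.dataset(...).table(...) returns a TableReference.
    job_config.destination = client.dataset(dest_dataset_id).table(dest_table_id)

    # Allow the results table to be overwritten.
    job_config.write_disposition = "WRITE_TRUNCATE"

    # client.query() starts the job immediately, so no begin() call is needed.
    # If job_id is None, the client library generates one.
    query_job = client.query(query, job_config=job_config, job_id=job_id)
    return query_job.result()  # Wait for the query to finish.
```

Standard SQL is already the default, so the use_legacy_sql = False line from the older sample has no counterpart here.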

For more samples of running queries with the latest Python client library version, see:

Downloading query results as a pandas dataframe

The following sample shows how to run a query and download the results as a pandas dataframe.

Previous versions of the client libraries:

Previous versions of the client libraries did not support downloading results to a pandas dataframe.

Latest version of the client library:

# from google.cloud import bigquery
# client = bigquery.Client()

sql = """
    SELECT name, SUM(number) as count
    FROM `bigquery-public-data.usa_names.usa_1910_current`
    GROUP BY name
    ORDER BY count DESC
    LIMIT 10
"""

df = client.query(sql).to_dataframe()

Querying data with the legacy SQL dialect

The following sample shows how to run a query using the legacy SQL dialect.

Previous versions of the client libraries:

Previous versions of the client libraries defaulted to legacy SQL syntax. For information on how to configure and run a query, see the query sample.

Latest version of the client library:

The client library defaults to standard SQL syntax. To use legacy SQL, set the use_legacy_sql property of the QueryJobConfig to True. For information on how to configure and run a query, see the query sample.
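
As a minimal sketch of opting in to legacy SQL with the latest client library (run_legacy_sql_query is a hypothetical helper name, not part of the library):

```python
def run_legacy_sql_query(client, query):
    """Run `query` with the legacy SQL dialect and return the result rows."""
    # Deferred import: the library is only needed when the helper is called.
    from google.cloud import bigquery

    job_config = bigquery.QueryJobConfig()
    # Standard SQL is the default in v0.28 and later, so opt in to legacy SQL.
    job_config.use_legacy_sql = True

    query_job = client.query(query, job_config=job_config)  # Make an API request.
    return query_job.result()  # Wait for the query to finish.
```

A query passed to this helper must use legacy SQL table syntax, for example [bigquery-public-data:usa_names.usa_1910_2013] instead of the backtick-quoted form used in the standard SQL samples.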

Querying data synchronously

In v0.28 and later, the Client.query() method is recommended because the QueryJob that it returns provides access to statistics and other properties of the query.

Previous versions of the client libraries:

query_results = client.run_sync_query(query)
query_results.use_legacy_sql = False

query_results.run()

# The query might not complete in a single request. To account for a
# long-running query, force the query results to reload until the query
# is complete.
while not query_results.complete:
    query_iterator = query_results.fetch_data()
    try:
        six.next(iter(query_iterator))
    except StopIteration:
        pass

rows = query_results.fetch_data()
for row in rows:
    print(row)

Latest version of the client library:

from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

query = """
    SELECT name, SUM(number) as total_people
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    WHERE state = 'TX'
    GROUP BY name, state
    ORDER BY total_people DESC
    LIMIT 20
"""
query_job = client.query(query)  # Make an API request.

print("The query data:")
for row in query_job:
    # Row values can be accessed by field name or index.
    print("name={}, count={}".format(row[0], row["total_people"]))

Table code changes

Table references

Use a TableReference object to refer to a table without additional properties and a Table to refer to a full table resource. Several properties which formerly used the Table class now use the TableReference class in v0.28 and later. For example:

QueryJob.destination is now a TableReference.
client.dataset('mydataset').table('mytable') now returns a TableReference.

For an example that uses both the TableReference and Table classes, see How to create a table.

Loading data from a local file

The following sample shows how to load a local file into a BigQuery table.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
table = dataset.table(table_name)

# Reload the table to get the schema.
table.reload()

with open(source_file_name, 'rb') as source_file:
    # This example uses CSV, but you can use other formats.
    # See https://cloud.google.com/bigquery/loading-data
    job = table.upload_from_file(
        source_file, source_format='text/csv')

# Wait for the load job to complete.
while True:
    job.reload()
    if job.state == 'DONE':
        if job.error_result:
            raise RuntimeError(job.errors)
        break
    time.sleep(1)

Latest version of the client library:

# from google.cloud import bigquery
# client = bigquery.Client()
# filename = '/path/to/file.csv'
# dataset_id = 'my_dataset'
# table_id = 'my_table'

dataset_ref = client.dataset(dataset_id)
table_ref = dataset_ref.table(table_id)
job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.CSV
job_config.skip_leading_rows = 1
job_config.autodetect = True

with open(filename, "rb") as source_file:
    job = client.load_table_from_file(source_file, table_ref, job_config=job_config)

job.result()  # Waits for table load to complete.

print("Loaded {} rows into {}:{}.".format(job.output_rows, dataset_id, table_id))

For more details, see Loading data from a local data source.

Loading data from a pandas dataframe

The following sample shows how to upload a pandas dataframe to a BigQuery table.

Previous versions of the client libraries:

Previous versions of the client libraries did not support uploading data from a pandas dataframe.

Latest version of the client library:

from google.cloud import bigquery

import pandas

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to create.
# table_id = "your-project.your_dataset.your_table_name"

records = [
    {"title": u"The Meaning of Life", "release_year": 1983},
    {"title": u"Monty Python and the Holy Grail", "release_year": 1975},
    {"title": u"Life of Brian", "release_year": 1979},
    {"title": u"And Now for Something Completely Different", "release_year": 1971},
]
dataframe = pandas.DataFrame(
    records,
    # In the loaded table, the column order reflects the order of the
    # columns in the DataFrame.
    columns=["title", "release_year"],
    # Optionally, set a named index, which can also be written to the
    # BigQuery table.
    index=pandas.Index(
        [u"Q24980", u"Q25043", u"Q24953", u"Q16403"], name="wikidata_id"
    ),
)
job_config = bigquery.LoadJobConfig(
    # Specify a (partial) schema. All columns are always written to the
    # table. The schema is used to assist in data type definitions.
    schema=[
        # Specify the type of columns whose type cannot be auto-detected. For
        # example the "title" column uses pandas dtype "object", so its
        # data type is ambiguous.
        bigquery.SchemaField("title", bigquery.enums.SqlTypeNames.STRING),
        # Indexes are written if included in the schema by name.
        bigquery.SchemaField("wikidata_id", bigquery.enums.SqlTypeNames.STRING),
    ],
    # Optionally, set the write disposition. BigQuery appends loaded rows
    # to an existing table by default, but with WRITE_TRUNCATE write
    # disposition it replaces the table with the loaded data.
    write_disposition="WRITE_TRUNCATE",
)

job = client.load_table_from_dataframe(
    dataframe, table_id, job_config=job_config
)  # Make an API request.
job.result()  # Wait for the job to complete.

table = client.get_table(table_id)  # Make an API request.
print(
    "Loaded {} rows and {} columns to {}".format(
        table.num_rows, len(table.schema), table_id
    )
)

Loading data from Cloud Storage

The following sample shows how to load a file from Cloud Storage into a BigQuery table.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
table = dataset.table(table_name)
job_id = str(uuid.uuid4())

job = client.load_table_from_storage(
    job_id, table, 'gs://bucket_name/object_name')
job.begin()

# Wait for the load job to complete.
while True:
    job.reload()
    if job.state == 'DONE':
        if job.error_result:
            raise RuntimeError(job.errors)
        break
    time.sleep(1)

Latest version of the client library:

# from google.cloud import bigquery
# client = bigquery.Client()
# dataset_id = 'my_dataset'

dataset_ref = client.dataset(dataset_id)
job_config = bigquery.LoadJobConfig()
job_config.schema = [
    bigquery.SchemaField("name", "STRING"),
    bigquery.SchemaField("post_abbr", "STRING"),
]
job_config.skip_leading_rows = 1
# The source format defaults to CSV, so the line below is optional.
job_config.source_format = bigquery.SourceFormat.CSV
uri = "gs://cloud-samples-data/bigquery/us-states/us-states.csv"

load_job = client.load_table_from_uri(
    uri, dataset_ref.table("us_states"), job_config=job_config
)  # API request
print("Starting job {}".format(load_job.job_id))

load_job.result()  # Waits for table load to complete.
print("Job finished.")

destination_table = client.get_table(dataset_ref.table("us_states"))
print("Loaded {} rows.".format(destination_table.num_rows))

For more details, see Loading data from Cloud Storage.

Extracting a table to Cloud Storage

The following sample shows how to extract a table to Cloud Storage.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
table = dataset.table(table_name)
job_id = str(uuid.uuid4())

job = client.extract_table_to_storage(
    job_id, table, 'gs://bucket_name/object_name')
job.begin()

# Wait for the job to complete.
while True:
    job.reload()
    if job.state == 'DONE':
        if job.error_result:
            raise RuntimeError(job.errors)
        break
    time.sleep(1)

Latest version of the client library:

# from google.cloud import bigquery
# client = bigquery.Client()
# bucket_name = 'my-bucket'
project = "bigquery-public-data"
dataset_id = "samples"
table_id = "shakespeare"

destination_uri = "gs://{}/{}".format(bucket_name, "shakespeare.csv")
dataset_ref = client.dataset(dataset_id, project=project)
table_ref = dataset_ref.table(table_id)

extract_job = client.extract_table(
    table_ref,
    destination_uri,
    # Location must match that of the source table.
    location="US",
)  # API request
extract_job.result()  # Waits for job to complete.

print(
    "Exported {}:{}.{} to {}".format(project, dataset_id, table_id, destination_uri)
)

For more details, see Exporting table data.

Copying a table

The following sample shows how to copy a table to another table.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
table = dataset.table(table_name)
destination_table = dataset.table(new_table_name)

job_id = str(uuid.uuid4())
job = client.copy_table(job_id, destination_table, table)

job.create_disposition = (
        google.cloud.bigquery.job.CreateDisposition.CREATE_IF_NEEDED)
job.begin()

# Wait for the job to complete.
while True:
    job.reload()
    if job.state == 'DONE':
        if job.error_result:
            raise RuntimeError(job.errors)
        break
    time.sleep(1)

Latest version of the client library:

from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set source_table_id to the ID of the original table.
# source_table_id = "your-project.source_dataset.source_table"

# TODO(developer): Set destination_table_id to the ID of the destination table.
# destination_table_id = "your-project.destination_dataset.destination_table"

job = client.copy_table(source_table_id, destination_table_id)
job.result()  # Wait for the job to complete.

print("A copy of the table was created.")

For more details, see Copying a table.

Streaming data into a table

The following sample shows how to write rows to a table's streaming buffer.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
table = dataset.table(table_name)

# Reload the table to get the schema.
table.reload()

rows = [('values', 'in', 'same', 'order', 'as', 'schema')]
errors = table.insert_data(rows)

if not errors:
    print('Loaded 1 row into {}:{}'.format(dataset_name, table_name))
else:
    do_something_with(errors)

Latest version of the client library:

from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to insert rows into.
# table_id = "your-project.your_dataset.your_table"

table = client.get_table(table_id)  # Make an API request.
rows_to_insert = [(u"Phred Phlyntstone", 32), (u"Wylma Phlyntstone", 29)]

errors = client.insert_rows(table, rows_to_insert)  # Make an API request.
if errors == []:
    print("New rows have been added.")

For more details, see Streaming data into BigQuery.

Listing tables

The following sample shows how to list tables in a dataset.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
for table in dataset.list_tables():
    print(table.name)

Latest version of the client library:

from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set dataset_id to the ID of the dataset that contains
#                  the tables you are listing.
# dataset_id = 'your-project.your_dataset'

tables = client.list_tables(dataset_id)  # Make an API request.

print("Tables contained in '{}':".format(dataset_id))
for table in tables:
    print("{}.{}.{}".format(table.project, table.dataset_id, table.table_id))

For more details, see Listing tables.

Get a table

The following sample shows how to get a table.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
table = dataset.table(table_name)
table.reload()

Latest version of the client library:

from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to fetch.
# table_id = 'your-project.your_dataset.your_table'

table = client.get_table(table_id)  # Make an API request.

# View table properties
print(
    "Got table '{}.{}.{}'.".format(table.project, table.dataset_id, table.table_id)
)
print("Table schema: {}".format(table.schema))
print("Table description: {}".format(table.description))
print("Table has {} rows".format(table.num_rows))

For more details, see Getting information about tables.

Check that a table exists

The BigQuery API does not provide a native exists method. Instead, get the table resource and check if that request results in a 404 error. Previously, the client library provided the exists() helper to perform this check. The exists() helper allowed some inefficient use cases such as calling exists() before trying to get the full resource. As a result, the exists() helper was removed from the client library.

The following sample shows how to check whether a table exists.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
table = dataset.table(table_name)
if table.exists():
    # do something
    pass
else:
    # do something else
    pass

Latest version of the client library:

from google.cloud import bigquery
from google.cloud.exceptions import NotFound

client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to determine existence.
# table_id = "your-project.your_dataset.your_table"

try:
    client.get_table(table_id)  # Make an API request.
    print("Table {} already exists.".format(table_id))
except NotFound:
    print("Table {} is not found.".format(table_id))
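
Because the existence check and the fetch are the same request, a caller that needs the table afterwards can keep the returned resource instead of requesting it a second time. The following is a minimal sketch; get_table_or_none is a hypothetical helper name, not part of the client library.

```python
def get_table_or_none(client, table_id):
    """Return the table resource, or None if the API reports it as not found."""
    # Deferred import: the library is only needed when the helper is called.
    from google.cloud.exceptions import NotFound

    try:
        return client.get_table(table_id)  # Make an API request.
    except NotFound:
        return None
```

A caller can then test the result against None and use the table directly, which avoids the check-then-get pattern that the removed exists() helper encouraged.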

Create a table

The following sample shows how to create a table.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
table = dataset.table(table_name)
table.create()

Latest version of the client library:

from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to create.
# table_id = "your-project.your_dataset.your_table_name"

schema = [
    bigquery.SchemaField("full_name", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("age", "INTEGER", mode="REQUIRED"),
]

table = bigquery.Table(table_id, schema=schema)
table = client.create_table(table)  # Make an API request.
print(
    "Created table {}.{}.{}".format(table.project, table.dataset_id, table.table_id)
)

For more details, see Creating a table.

Update a table

The following sample shows how to update a table.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
table = dataset.table(table_name)
table.patch(description='new description')

Note that previous versions of the library do not check versions of the table resource via the etag property, so a read-modify-write is unsafe.

Latest version of the client library:

# from google.cloud import bigquery
# client = bigquery.Client()
# table_ref = client.dataset('my_dataset').table('my_table')
# table = client.get_table(table_ref)  # API request

assert table.description == "Original description."
table.description = "Updated description."

table = client.update_table(table, ["description"])  # API request

assert table.description == "Updated description."

For more details, see Updating table properties.

Delete a table

The following sample shows how to delete a table.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
table = dataset.table(table_name)
table.delete()

Latest version of the client library:

from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to delete.
# table_id = 'your-project.your_dataset.your_table'

# If the table does not exist, delete_table raises
# google.api_core.exceptions.NotFound unless not_found_ok is True.
client.delete_table(table_id, not_found_ok=True)  # Make an API request.
print("Deleted table '{}'.".format(table_id))

For more details, see Deleting a table.

Dataset code changes

Dataset references

Use a DatasetReference object to refer to a dataset without additional properties and a Dataset to refer to a full dataset resource. Some methods that formerly used the Dataset class now use the DatasetReference class in v0.28 and later. For example:

client.dataset('mydataset') now returns a DatasetReference.

For an example that uses both the DatasetReference and Dataset classes, see How to create a dataset.

Listing datasets

The following sample shows how to list datasets in a project.

Previous versions of the client libraries:

for dataset in client.list_datasets():
    print(dataset.name)

Latest version of the client library:

from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

datasets = list(client.list_datasets())  # Make an API request.
project = client.project

if datasets:
    print("Datasets in project {}:".format(project))
    for dataset in datasets:
        print("\t{}".format(dataset.dataset_id))
else:
    print("{} project does not contain any datasets.".format(project))

For more details, see Listing datasets.

Get a dataset

The following sample shows how to get a dataset.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
dataset.reload()

Latest version of the client library:

from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set dataset_id to the ID of the dataset to fetch.
# dataset_id = 'your-project.your_dataset'

dataset = client.get_dataset(dataset_id)  # Make an API request.

full_dataset_id = "{}.{}".format(dataset.project, dataset.dataset_id)
friendly_name = dataset.friendly_name
print(
    "Got dataset '{}' with friendly_name '{}'.".format(
        full_dataset_id, friendly_name
    )
)

# View dataset properties.
print("Description: {}".format(dataset.description))
print("Labels:")
labels = dataset.labels
if labels:
    for label, value in labels.items():
        print("\t{}: {}".format(label, value))
else:
    print("\tDataset has no labels defined.")

# View tables in dataset.
print("Tables:")
tables = list(client.list_tables(dataset))  # Make one or more API requests.
if tables:
    for table in tables:
        print("\t{}".format(table.table_id))
else:
    print("\tThis dataset does not contain any tables.")

For more details, see Getting information about datasets.

Check that a dataset exists

The BigQuery API does not provide a native exists method. Instead, get the dataset resource and check if that request results in a 404 error. Previously, the client library provided the exists() helper to perform this check. The exists() helper allowed some inefficient use cases such as calling exists() before trying to get the full resource. As a result, the exists() helper was removed from the client library.

The following sample shows how to check whether a dataset exists.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
if dataset.exists():
    # do something
    pass
else:
    # do something else
    pass

Latest version of the client library:

from google.cloud import bigquery
from google.cloud.exceptions import NotFound

client = bigquery.Client()

# TODO(developer): Set dataset_id to the ID of the dataset to determine existence.
# dataset_id = "your-project.your_dataset"

try:
    client.get_dataset(dataset_id)  # Make an API request.
    print("Dataset {} already exists".format(dataset_id))
except NotFound:
    print("Dataset {} is not found".format(dataset_id))

Create a dataset

The following sample shows how to create a dataset.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
dataset.create()

Latest version of the client library:

from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set dataset_id to the ID of the dataset to create.
# dataset_id = "{}.your_dataset".format(client.project)

# Construct a full Dataset object to send to the API.
dataset = bigquery.Dataset(dataset_id)

# TODO(developer): Specify the geographic location where the dataset should reside.
dataset.location = "US"

# Send the dataset to the API for creation.
# Raises google.api_core.exceptions.Conflict if the Dataset already
# exists within the project.
dataset = client.create_dataset(dataset)  # Make an API request.
print("Created dataset {}.{}".format(client.project, dataset.dataset_id))

For more details, see Creating datasets.

Update a dataset

The following sample shows how to update a dataset.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
dataset.patch(description='new description')

Note that previous versions of the library do not check versions of the dataset resource via the etag property, so a read-modify-write is unsafe.

Latest version of the client library:

from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set dataset_id to the ID of the dataset to fetch.
# dataset_id = 'your-project.your_dataset'

dataset = client.get_dataset(dataset_id)  # Make an API request.
dataset.description = "Updated description."
dataset = client.update_dataset(dataset, ["description"])  # Make an API request.

full_dataset_id = "{}.{}".format(dataset.project, dataset.dataset_id)
print(
    "Updated dataset '{}' with description '{}'.".format(
        full_dataset_id, dataset.description
    )
)

For more details, see Updating dataset properties.

Delete a dataset

The following sample shows how to delete a dataset.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
dataset.delete()

Latest version of the client library:

from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set dataset_id to the ID of the dataset to delete.
# dataset_id = 'your-project.your_dataset'

# Use the delete_contents parameter to delete a dataset and its contents.
# Use the not_found_ok parameter to not receive an error if the dataset has already been deleted.
client.delete_dataset(
    dataset_id, delete_contents=True, not_found_ok=True
)  # Make an API request.

print("Deleted dataset '{}'.".format(dataset_id))

For more details, see Deleting datasets.