What to do if you accidentally delete an entire bigquery dataset

Sun, Nov 22, 2020 2-minute read

A quick note on a particular feature of bigquery that I did not see anywhere in the documentation. If you accidentally delete a single table, you can use a feature called timetravel to restore the state of that table from a previous point in time. The way to do this:

1
bq cp my_data_set.my_deleted_table@1234567890 my_data_set.my_restored_table

If you have deleted the entire dataset however, the above method will fail. Interestingly, if you simply recreate a dataset with the same name as the one that was deleted, timetravel on the tables that lived in that dataset will work again. In other words, if you had accidentally deleted the dataset called “my_data_set” simply recreate it first:

1
bq mk my_data_set

and then perform the previous timetravel operation for every table that was in your deleted dataset:

1
bq cp my_data_set.my_deleted_table@1234567890 my_data_set.my_restored_table

and you should be all set. One thing to note is that if the dataset in question had a very large number of tables, there really are no good tools for obtaining bigquery metadata so you can figure out the list of tables that previously existed. As far as I can tell, the best you can do is go through your stackdriver logs, or other logs to look for all references to tables in that dataset. A similar problem applies to figuring out the latest available timestamp for a table. Bigquery’s bq command has the following syntactic sugar to denote the oldest available snapshot of a table:

1
bq cp my_data_set.my_deleted_table@0 my_data_set.my_restored_table

The 0 timestamp references the earliest available snapshot for that table. When dealing with a table that was deleted, however, no equivalent syntactic sugar exists to reference the most recent available snapshot for that table.