Firebase has been providing great built-in features and documentation for its users. When it comes to backups, though, it is slightly lacking in features and tools.
What is Backup and Recovery?
Backup and recovery describes the process of creating and storing copies of data that can be used to protect organizations against data loss. This is sometimes referred to as operational recovery. Recovery from a backup typically involves restoring the data to the original location, or to an alternate location where it can be used in place of the lost or damaged data.
What are the different ways in which one can create a backup of Cloud Firestore?
1. Manual Approach
2. Automated Approach
You must be clear on the following points before beginning:
1. You are creating backups for Cloud Firestore, not for the Realtime Database. The two sound similar and are easy to confuse during implementation.
2. Your project must be on the Blaze plan to create backups.
Manual Approach
1. You will have to create a new bucket corresponding to your project and give permissions to your bucket.
2. While creating the bucket, make sure your location supports import/export of data. You can select Multi-Region with US.
3. Make sure you choose the correct bucket type depending on your requirement. I would recommend Nearline for storing backups. You can read more about the storage classes and their pricing before deciding.
4. You can set up Google Cloud Shell for executing the forthcoming commands:
gcloud config set project PROJECT_NAME
5. You can export your data manually to the respective bucket with:
gcloud alpha firestore export gs://BUCKET_NAME
6. Your data export will begin and you can check its status here. The export may take a few minutes depending on your Firestore data. If you face any issues related to permissions, you can set them in the IAM section of the GCP dashboard.
7. Once the data export is complete you can import the data into your Cloud Firestore from the backup in the respective bucket. Once you select the import option, you will have to select the particular backup entry from the bucket which you want to import. The import may take a few minutes depending on your Firestore data.
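If you prefer the command line over the console, the import can also be triggered from Cloud Shell. A minimal sketch, where BUCKET_NAME and EXPORT_PREFIX are placeholders for your bucket and the timestamped folder the export created:

```shell
# List the backup entries available in the bucket.
gsutil ls gs://BUCKET_NAME

# Import a specific export. Replace EXPORT_PREFIX with the folder name
# printed by the listing above (a timestamped directory).
gcloud alpha firestore import gs://BUCKET_NAME/EXPORT_PREFIX/
```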
Automated Approach
I had high hopes for Firebase here, but unfortunately, at this time you will have to use Cloud Functions and Cloud Scheduler. The basic idea is that we write a cloud function which creates and stores the backup, and we then schedule the function calls with the help of Cloud Scheduler.
If you are new to Cloud Functions, you can read about them here before proceeding to the function.
1. You will have to create a new bucket corresponding to your project and give permissions to your bucket.
2. While creating the bucket, make sure your location supports import/export of data. You can select Multi-Region with US.
3. Make sure you choose the correct bucket type depending on your requirement. I would recommend Nearline for storing backups. You can read more about the storage classes and their pricing before deciding.
4. Now we must make sure that our cloud function can run data exports by granting our service account (email@example.com) the Datastore Import Export Admin role. This can be done using the GCP IAM interface or from the command line:
gcloud projects add-iam-policy-binding yourproject \
    --member serviceAccount:firstname.lastname@example.org \
    --role roles/datastore.importExportAdmin
5. Once the bucket is set we can move on to the cloud function.
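A minimal sketch of the function, along the lines of the official Firebase scheduled-export example. BUCKET_NAME is a placeholder for the bucket you created earlier, and the firebase-functions and @google-cloud/firestore packages are assumed to be installed in your functions directory:

```javascript
// index.js — sketch of a scheduled Firestore export function.
const functions = require('firebase-functions');
const firestore = require('@google-cloud/firestore');

const client = new firestore.v1.FirestoreAdminClient();

// Replace BUCKET_NAME with the bucket you created earlier.
const bucket = 'gs://BUCKET_NAME';

exports.scheduledFirestoreExport = functions.pubsub
  // Run the export once every 24 hours.
  .schedule('every 24 hours')
  .onRun((context) => {
    const projectId = process.env.GCP_PROJECT || process.env.GCLOUD_PROJECT;
    const databaseName = client.databasePath(projectId, '(default)');

    return client
      .exportDocuments({
        name: databaseName,
        outputUriPrefix: bucket,
        // An empty array exports all collections.
        collectionIds: [],
      })
      .then((responses) => {
        // The admin API returns a long-running operation; log its name
        // so the export can be tracked in the GCP console.
        console.log(`Export operation: ${responses[0].name}`);
      })
      .catch((err) => {
        console.error(err);
        throw new Error('Export operation failed');
      });
  });
```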
6. After the cloud function is ready, you can deploy it using the following command:
firebase deploy --only functions:scheduledFirestoreExport
The cloud function has now been deployed successfully. The following are some of the variants you can use to define the schedule and frequency of the backup function. You can explore more here.
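As a sketch, the schedule string accepts either App Engine cron-style English or unix-cron syntax, and a time zone can be attached; the function names and bodies below are illustrative:

```javascript
const functions = require('firebase-functions');

// Every 24 hours (App Engine cron-style English):
exports.dailyBackup = functions.pubsub
  .schedule('every 24 hours')
  .onRun(() => { /* export logic */ });

// Every Monday at 09:00 in a specific time zone:
exports.weeklyBackup = functions.pubsub
  .schedule('every monday 09:00')
  .timeZone('America/New_York')
  .onRun(() => { /* export logic */ });

// At 02:30 every day, using unix-cron syntax:
exports.nightlyBackup = functions.pubsub
  .schedule('30 2 * * *')
  .onRun(() => { /* export logic */ });
```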
You must decide on a few points before finally implementing the backup. The backup will probably be used only when there is a data issue or an operational issue. It is therefore highly recommended to monitor your backups and verify the backup data by importing it periodically, so that there is no confusion during a system crash. The amount of data and system usage will help you decide on the frequency of backup creation and the bucket type.
There are four types of storage classes/buckets which you can choose from:
Standard Storage is used for data that is frequently accessed and/or stored only for short periods of time.
Nearline Storage is a low-cost option for infrequently accessed data. It offers lower at-rest costs in exchange for lower availability, a 30-day minimum storage duration, and a cost for data access. It is suggested for use cases where data is accessed about once per month on average.
Coldline Storage is similar to Nearline, but offers even lower at-rest costs, again in exchange for lower availability, a 90-day minimum storage duration, and a higher cost for data access.
Archive Storage is the lowest-cost, highly durable storage service for data archiving, online backup, and disaster recovery. It has no availability SLA, though the typical availability is comparable to Nearline Storage and Coldline Storage. Archive Storage also has higher costs for data access and operations, as well as a 365-day minimum storage duration.
Backup deletion is also important for the overall scope and will help you keep the bill in check. Please note that the minimum storage duration has nothing to do with automatic deletion of objects.
If you would like to delete objects based on conditions such as the age of an object, you can set an Object Lifecycle policy on the target bucket. You may find an example of how to delete live versions of objects older than 30 days here.
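One way to set such a policy from the command line is with gsutil and a small lifecycle configuration file; a sketch, where the 30-day age condition and BUCKET_NAME are illustrative:

```shell
# lifecycle.json: delete live object versions older than 30 days.
cat > lifecycle.json <<'EOF'
{
  "rule": [
    {
      "action": {"type": "Delete"},
      "condition": {"age": 30, "isLive": true}
    }
  ]
}
EOF

# Apply the policy to the backup bucket (replace BUCKET_NAME).
gsutil lifecycle set lifecycle.json gs://BUCKET_NAME
```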
Currently this is the only approach to backing up data. If you want to reduce costs further, you can select specific collections to be backed up.
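For example, the export command accepts a list of collection IDs; a sketch with hypothetical collection names users and orders:

```shell
# Export only the 'users' and 'orders' collections (hypothetical names)
# instead of the entire database.
gcloud alpha firestore export gs://BUCKET_NAME \
  --collection-ids='users','orders'
```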
I hope this article helps you create a robust backup methodology that protects you during operational crashes and avoids data loss. It is important to import and verify that the backup system is working perfectly every fortnight to avoid any undesirable situations.