This article provides details about our implementation of Google Cloud Storage (GCS), how to access it, and how to perform common tasks.
Google Cloud Storage FAQ
Google Cloud Storage is an object storage utility, much like our HCS product, with storage locations worldwide. With GCS, you will enjoy the peace of mind of low latency and 99.999999999% annual durability. You can find more details about the availability of the GCS storage classes here.
More information about Google Cloud Storage can be found here.
There are three tools we recommend for accessing your GCS data, but you can use any tool compatible with Amazon S3. The first two, gsutil and rclone, are CLI tools, while Cyberduck is a GUI tool.
- gsutil - Installation instructions can be found here.
- rclone - Installation instructions can be found here.
- Cyberduck - Installation instructions can be found here.
You can also make use of Google Storage Client Libraries to programmatically access and modify your data.
You can log into StrikeTracker to view and manage your GCS buckets, service accounts to access GCS, and associated service account credentials. If you choose to opt into migration assistance, buckets will be created for you and data populated from your existing storage. This information can be found in "How do I create a GCS bucket?" below.
Yes, you may choose a location that is best suited to your needs.
Currently we offer the following locations.
- europe-west4
- us-east4
- us-west4
By default, GCS does pass through a Cache-Control header: GCS will assign all objects a Cache-Control: public, max-age=3600 header. If you would like to change this header, there are a couple of options available, which are outlined below.
- Use StrikeTracker to set the Cache Expiration Method to Relative to Ingest. This will apply to all newly ingested objects.
- Set the Cache-Control metadata value on objects. This is done on an object-by-object basis, so it's recommended for objects that should have varying Cache-Control headers.
These options are discussed further in "How do I configure the Cache-Control header?" below.
No, you will need to access GCS with gsutil, rclone, Cyberduck, or a similar tool.
Yes. You can find more information about GCS and S3 interoperability here.
Yes, you will be able to use the GCS API with either an API key file or HMAC keys. To use the JSON API, you will need to download an API key file from StrikeTracker and use that to authenticate. Below is a link to Google's JSON API documentation.
When using the API with HMAC keys, you will only be able to use the XML API, not the JSON API. Below is a link to Google's XML API documentation.
You can find more information about downloading an API key file and generating HMAC keys in "How do I create a GCS user (service account)?" below.
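As an illustration, one way to call the JSON API is to exchange the downloaded key file for an OAuth2 access token with the Cloud SDK and pass that token to curl. This is a minimal sketch, assuming the Cloud SDK is installed; the file name key.json and the <BUCKET_ID> placeholder are examples, so substitute your own values.

# Authenticate with the service account key file downloaded from StrikeTracker
gcloud auth activate-service-account --key-file=key.json
# Obtain a short-lived OAuth2 access token for that account
TOKEN=$(gcloud auth print-access-token)
# List the objects in a bucket via the JSON API
curl -H "Authorization: Bearer ${TOKEN}" \
"https://storage.googleapis.com/storage/v1/b/<BUCKET_ID>/o"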
Each service account created will have full access to all of your GCS buckets and cannot be restricted to individual buckets or directories. These service accounts will also be the only way to authenticate to GCS.
You will not be able to manage your GCS bucket contents within StrikeTracker. The only way to manage your GCS files will be with gsutil, rclone, Cyberduck, or similar tools.
How do I access my GCS data?
Before you can access your data, you will first need to create a service account (user), if one is not created already.
GCS does not have users, but you can create a service account from StrikeTracker that will permit access to your GCS buckets.
To create a service account, you will need to go to the menu and select "Object Storage".
From there, you will want to navigate to "Service Accounts" and click "Add Service Account".
Now that you have a service account created, you can click on that service account to either generate a key file or HMAC key, depending on how you access your content. If you are using rclone or the JSON API, you will want to download an API key file.
If you use gsutil, Cyberduck, or the XML API, you will need to generate an HMAC key.
We will discuss authenticating these tools in more detail below.
Creating a GCS bucket can be done within StrikeTracker. To create a new bucket you will need to click the menu and select "Object Storage".
From here you will want to select "Object Storage" from the left-hand menu (if you're not already there), and click "Add Bucket".
Here, you'll have the option to name your bucket and select the region you wish to use for that bucket.
Bucket names only support lowercase letters, numbers, and hyphens (a-z, 0-9, -) and must start and end with a letter or number.
Once your bucket is created, you can click on it and view more details about the bucket, as well as modify it from private to public.
You will need to configure gsutil to access your bucket using your service account and your HMAC key. You can find installation instructions for gsutil here if you have not yet installed this tool.
Once gsutil is installed, you will want to run the following command and follow the instructions to grant access to your service account.
gsutil config -a
By default, gsutil will try to authenticate with OAuth2 credentials from the Cloud SDK, but this is not supported by StackPath. You will need to run the following command to ensure the HMAC credentials are used to authenticate.
gcloud config set pass_credentials_to_gsutil false
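After you enter your HMAC access key and secret at the prompts, gsutil stores them in the boto configuration file (~/.boto by default). As a rough sketch, the relevant section looks like the following, with placeholder values:

[Credentials]
gs_access_key_id = <YOUR_HMAC_ACCESS_KEY>
gs_secret_access_key = <YOUR_HMAC_SECRET>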
You will need to configure rclone to access your bucket using the key file for your service account. You can find downloads and installation instructions for rclone here.
Below are instructions to configure a common GCS profile using rclone.
- First, run rclone config
- Enter n to create a new config
- Use the defaults for most variables, except for the following:
| Variable | Value |
|---|---|
| type | google cloud storage |
| service_account_file | Location of the key file that you downloaded from StrikeTracker |
| bucket_policy_only | true |
| location | Select the value shown in the "REGION" column in your bucket list in StrikeTracker |
| storage_class | REGIONAL |
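For reference, the resulting remote in your rclone configuration file (typically ~/.config/rclone/rclone.conf) might look something like the sketch below; the remote name gcs, the key file path, and the location value are placeholders for your own values.

[gcs]
type = google cloud storage
service_account_file = /path/to/striketracker-key.json
bucket_policy_only = true
location = us-east4
storage_class = REGIONAL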
Cyberduck provides a GUI interface to navigate, upload, and delete your data. When using Cyberduck, you will need to use HMAC keys to connect. Please see "How do I create a GCS user (service account)?" above for instructions to generate an HMAC key.
- Click "Open Connection"
- Select "Amazon S3" for the service
- Update the "Server" to the
storage.googleapis.com
- The "Access Key ID" and "Secret Access Key" are the values provided when your HMAC key was generated in StrikeTracker.
Once logged in, you will see all of your GCS buckets. You can double click the bucket you wish to view or modify to enter into it.
Google provides client libraries for C++, C#, Go, Java, Node.js, PHP, Python, and Ruby that allow you to access and modify your data and complete other tasks. More information is available in their Client Libraries documentation.
How do I complete common tasks in GCS?
In this section, we will cover common tasks for managing your GCS content, including listing files, uploading files, deleting files, etc.
You will be able to create service accounts and interact directly with the GCS API to upload data, modify permissions, etc. To create a service account, please see "How do I create a GCS user (service account)?" above.
Listing files in your GCS bucket can be done using gsutil, rclone, or by logging into the bucket via Cyberduck. In this section, we will cover listing files via gsutil and rclone.
gsutil
List all contents in the top-level directory. This will list all files and directories in the top-level directory, but not the files in the subdirectories. The <BUCKET_ID> is the Bucket Name seen in StrikeTracker in the Bucket Details.
gsutil ls gs://<BUCKET_ID>
List all contents in a subdirectory
gsutil ls gs://<BUCKET_ID>/<SUBDIRECTORY>/
Wildcard match subdirectories and list all content in them (i.e. /example1/, /example2/, /example3/, etc.)
gsutil ls gs://<BUCKET_ID>/example*/
You may need to use double quotes around gs://<BUCKET_ID>/example*/
List all contents using a wildcard (i.e. all .txt files)
gsutil ls gs://<BUCKET_ID>/*.txt
You may need to use double quotes around gs://<BUCKET_ID>/*.txt
Recursively list all contents in the top-level directory. This will list the top-level objects and buckets, then the objects and buckets under gs://<BUCKET_ID>/example1, then those under gs://<BUCKET_ID>/example2, etc.
gsutil ls -r gs://<BUCKET_ID>
To recursively list all contents in the top-level directory or a subdirectory in a flat list format, we can use the following:
gsutil ls -r gs://<BUCKET_ID>/**
gsutil ls -r gs://<BUCKET_ID>/<SUBDIRECTORY>/**
You may need to use double quotes around gs://<BUCKET_ID>/**
Print the object size, creation time stamp, and name of the object
gsutil ls -l gs://<BUCKET_ID>/<FILENAME>
This can also be done with a wildcard to match a pattern (i.e. *.txt)
gsutil ls -l gs://<BUCKET_ID>/*.txt
You may need to use double quotes around gs://<BUCKET_ID>/*.txt
Using the "-L" flag, you can print additional details about objects and buckets.
gsutil ls -L gs://<BUCKET_ID>/<SUBDIRECTORY>
rclone
There are several commands you can use to list content in your GCS buckets using rclone. The <BUCKET_ID> is the Bucket Name seen in StrikeTracker in the Bucket Details.
- ls - list size and path of objects only
- lsl - list modification time, size, and path of objects only
- lsd - list directories only
- lsf - list objects and directories in easy-to-parse format
- lsjson - list objects and directories in JSON format
By default, "ls" and "lsl" are recursive, but you can use the "--max-depth 1" flag to stop the recursion. The "lsd", "lsf", and "lsjson" commands are not recursive by default, but you can use the "-R" flag to make them list recursively.
Below are some example commands
rclone ls <GCS_name_in_rclone_config>:<BUCKET_ID>
rclone ls --max-depth 1 <GCS_name_in_rclone_config>:<BUCKET_ID>/<SUBDIRECTORY>
rclone lsl <GCS_name_in_rclone_config>:<BUCKET_ID>/<SUBDIRECTORY>
rclone lsd <GCS_name_in_rclone_config>:<BUCKET_ID>
rclone lsf -R <GCS_name_in_rclone_config>:<BUCKET_ID>
rclone lsjson <GCS_name_in_rclone_config>:<BUCKET_ID>/<SUBDIRECTORY>
To upload files to your GCS bucket, you can use gsutil, rclone, or Cyberduck.
gsutil
Using gsutil, the cp and rsync commands can be used to copy/sync content to your GCS bucket. You can find more details about using cp here. The <BUCKET_ID> is the Bucket Name seen in StrikeTracker in the Bucket Details.
gsutil cp /path/to/local/files gs://<BUCKET_ID>/
You can find more details about using rsync here.
gsutil rsync /path/to/local/files gs://<BUCKET_ID>/
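If your local path contains subdirectories, you may want a recursive, parallel sync; the -r and -m flags shown below are standard gsutil options, and the paths are placeholders:

# Recursively sync a local directory tree to the bucket, with parallel transfers
gsutil -m rsync -r /path/to/local/files gs://<BUCKET_ID>/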
rclone
See above on how to configure GCS in rclone. Once configured, you can transfer data from your local machine/storage server using the copy or sync commands. The <BUCKET_ID> is the Bucket Name seen in StrikeTracker in the Bucket Details.
You can find more information about using the copy command here.
rclone copy \
/path/to/file \
<destination_GCS_name_in_rclone_config>:<GCS_bucket> -P
You can find more information about using the sync command here.
rclone sync \
/path/to/file \
<destination_GCS_name_in_rclone_config>:<GCS_bucket> -P
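Keep in mind that sync makes the destination match the source, which can delete files at the destination that are not present locally. If in doubt, preview the transfer first with the --dry-run flag (the paths below are the same placeholders used above):

# Show what would be transferred or deleted without making any changes
rclone sync --dry-run \
/path/to/file \
<destination_GCS_name_in_rclone_config>:<GCS_bucket>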
Deleting files from your GCS bucket can be done through gsutil or rclone.
Deleting files with gsutil is done using the rm command. More details on deleting files from GCS via gsutil can be found here. The <BUCKET_ID> is the Bucket Name seen in StrikeTracker in the Bucket Details.
Delete individual files
gsutil rm gs://<BUCKET_ID>/<FILENAME>
Delete all files in a directory, but not in subdirectories (i.e. delete only files in /example1/, but not in /example1/subdirectory1/)
gsutil rm gs://<BUCKET_ID>/example1/*
You may need to use double quotes around gs://<BUCKET_ID>/example1/*
Delete all files in a directory AND all subdirectories
gsutil rm gs://<BUCKET_ID>/example1/**
gsutil rm -r gs://<BUCKET_ID>/example1
You may need to use double quotes around gs://<BUCKET_ID>/example1/**
If you have a large amount of content to be deleted, you can use the "-m" flag to perform parallel deletions
gsutil -m rm -r gs://<BUCKET_ID>/example1
You can also delete content from a list if you have a large amount of specific content to remove
cat file_list | gsutil -m rm -I
The list must be formatted with the GCS URLs and wildcards of GCS URLs. For example:
gs://<BUCKET_ID>/file
gs://<BUCKET_ID>/example/file
gs://<BUCKET_ID>/example/*
There are a couple of commands you can use to delete content in your GCS buckets using rclone. Since these commands could result in data loss, we recommend running a dry-run using the --dry-run or --interactive / -i flags first.
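As a sketch, the delete and purge commands cover the common cases; the remote, bucket, and subdirectory names below are placeholders:

# Preview what would be removed, then delete the files under a path
rclone delete --dry-run <GCS_name_in_rclone_config>:<BUCKET_ID>/<SUBDIRECTORY>
rclone delete <GCS_name_in_rclone_config>:<BUCKET_ID>/<SUBDIRECTORY>
# Remove a path and all of its contents, including the directory itself
rclone purge <GCS_name_in_rclone_config>:<BUCKET_ID>/<SUBDIRECTORY>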
To configure metadata for your GCS buckets, you will need to create a .json file locally containing the metadata you wish to apply to your bucket. Once your .json file is created, you can use either gsutil or the REST API to apply the metadata to your bucket. More details about using metadata with GCS can be found here.
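As an illustration only, a minimal metadata file that sets a bucket label (the file name metadata.json and the label values are hypothetical examples) could be applied with the same PATCH pattern shown for lifecycle rules below:

metadata.json
{
"labels": {
"environment": "production"
}
}

curl -X PATCH --data-binary @metadata.json \
-H "Authorization: Bearer OAUTH2_TOKEN" \
-H "Content-Type: application/json" \
"https://storage.googleapis.com/storage/v1/b/<BUCKET_ID>"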
The object lifecycle policy in GCS allows you to delete files automatically once conditions are met, much like HCS offers with metadata. Please note that this policy applies to the entire bucket, not to individual files.
When configuring the lifecycle of the bucket, there are several sets of rules that can be used as criteria to meet before modifying or deleting files.
- Age
- CreatedBefore
- CustomTimeBefore
- DaysSinceCustomTime
- DaysSinceNoncurrentTime
- IsLive
- MatchesStorageClass
- NoncurrentTimeBefore
- NumberOfNewerVersions
You can configure object lifecycles via gsutil and the REST API. In both instances, you need to create a .json file locally containing the lifecycle rules before applying the configuration to the bucket. An example is provided below, which automatically deletes files after 30 days.
lifecycle.json
{
"lifecycle": {
"rule": [
{
"action": {
"type": "Delete"
},
"condition": {
"age": 30,
"isLive": true
}
}
]
}
}
Once your .json file is created, you can apply the configuration. In this example, we named the file "lifecycle.json"
gsutil
The <BUCKET_ID> is the Bucket Name seen in StrikeTracker in the Bucket Details.
gsutil lifecycle set lifecycle.json gs://<BUCKET_ID>
REST API
The <BUCKET_ID> is the Bucket Name seen in StrikeTracker in the Bucket Details.
curl -X PATCH --data-binary @lifecycle.json \
-H "Authorization: Bearer OAUTH2_TOKEN" \
-H "Content-Type: application/json" \
"https://storage.googleapis.com/storage/v1/b/<BUCKET_ID>?fields=lifecycle"
CDN Configuration Recommendations and Important Notes
Here, we will cover some of our CDN configuration recommendations and important notes once you have migrated to GCS.
The Cache-Control header can be customized using StrikeTracker or metadata. If your dataset should have the same Cache-Control values, you can use StrikeTracker to configure the header. However, if your dataset requires varying Cache-Control values for your files, you will need to utilize metadata on GCS to configure the values using gsutil.
StrikeTracker
To configure the Cache-Control in StrikeTracker, you will need to navigate to the site editor for the site you wish to update. Once there, go to Cache → Cache Settings → CDN Caching. There you will want to change the Cache Expiration Method to "Relative to ingest" and configure the CDN TTL to the number of seconds you wish to cache your content.
gsutil
Using gsutil, you can use the setmeta command to set headers for specific buckets and files, including the Cache-Control header. The <BUCKET_ID> is the Bucket Name seen in StrikeTracker in the Bucket Details. Below is an example to set the Cache-Control header for your bucket:
gsutil setmeta -r -h "Cache-control:public, max-age=3600" \
gs://<BUCKET_ID>
You can also configure headers on a file-by-file basis, or by file extension. Below is an example of each:
gsutil setmeta -r -h "Cache-control:public, max-age=3600" \
gs://<BUCKET_ID>/object
gsutil setmeta -r -h "Cache-control:public, max-age=3600" \
gs://<BUCKET_ID>/*.html
You can find more information and examples for setting headers on your GCS bucket using metadata here.
Once a bucket is created, you can click on that bucket and modify the permissions so that the bucket is either private, or public under the "Visibility" policy. By default, all new GCS buckets will be private.
First, you will need to add your GCS bucket to your origins in StrikeTracker. This can be done by going to Origins → Add Origin.
When adding your GCS bucket as an origin, you will want to use the External origin type and your endpoint URL as the hostname (i.e. mybucket.storage.googleapis.com). The endpoint URL can be found in the bucket details within StrikeTracker.
Now that the origin is configured, you can update your site to pull from that GCS bucket. To do this, navigate to the site editor for the site you wish to pull from GCS and go to Origin → Origin Selection. Here you will need to update the Origin Connections to your GCS bucket as the primary origin.
Additionally, if it's not already configured, you should use HTTPS as the origin pull protocol.
If your bucket is configured with public permissions, you have successfully added your GCS bucket as an origin. However, if your bucket is private, which is the default behavior, you will need to configure the AWS Signed OriginPull V4 policy. This will sign URLs during origin pulls so the content can be pulled and cached on our CDN. This policy can be configured by going to your site editor and navigating to Origin → Uncategorized → AWS Signed OriginPull V4. Below is the configuration template you will need to use to allow access from the CDN to GCS to pull and cache content.
You will also need to add the following signed headers in this policy:
- host
- x-amz-content-sha256
- x-amz-date
You can find Google documentation on the V4 signing here.
When making your GCS bucket private, it will lose the default Cache-Control headers. If you wish to use Cache-Control headers, you will need to configure them in StrikeTracker or via gsutil with metadata. Please refer to "How do I configure the Cache-Control header?" above.