Overview
Raw access logs are available for customers who would like granular details about end-user requests for CDN content. Every hit to the CDN is reported in the raw logs. Raw logs are a premium service that is enabled at the Account Services level and then enabled for each site within the account. The logs enable custom reporting as well as visibility into details such as the User-Agent, requestor IP address, status code, and path to the content delivered.
Raw Access Log FAQ
By default, raw logs are not enabled. You will need to contact your account manager to enable raw logs. If you are not sure if raw logs are enabled, you can reach out to your account manager or support at support@highwinds.com.
To access your raw logs, you will need to use a Google Cloud Storage (GCS) compatible tool, such as gsutil, rclone, or Cyberduck. You will also need to have a service account created in StrikeTracker. You can find details about creating a service account in "How do I create a service account to access my raw access logs?" below.
Yes, you will be able to use the GCS API using HMAC keys. When using the API with HMAC keys, you will only be able to use the XML API, not the JSON API. Below is a link to Google's XML API documentation.
By default, the raw log retention is 45 days unless our support team has worked with you to modify that retention.
When listing your raw logs, you will see a series of folders containing the raw logs. Logs are nested in directories representing dates. The directory naming convention is [product]/YYYY/MM/DD/, where [product] is either "cds" (access logs) or "cdi" (origin pull logs) and the dates are in the GMT timezone.
Example:
cds/2020/08/26/
Within each daily folder, you will see one or more logs with the naming convention of:
[product]_YYYYMMDD-HHMMSS-#.log.gz
Note that the HHMMSS portion is in the GMT timezone. For example:
cds_20200826-171745-1.log.gz
The delivery timestamp shown within the log is also GMT and represents when the delivery request is complete (end-user disconnects or full file is delivered).
Multiple separate logs may be generated for each time period, in which case many logs may appear within each directory. Logs are moved to the GCS bucket every 15 minutes or when individual log size reaches 5MB, whichever is first.
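Once you have downloaded a log file (downloading is covered later in this article), you can decompress and preview it from the command line. This is a minimal sketch that assumes the example file above is in your current directory and that gunzip and head are available on your machine:
# Decompress the gzipped log and show the first five entries
gunzip -c cds_20200826-171745-1.log.gz | head -n 5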
Below are the columns you will find in your raw logs, as well as a description of each.
# | Field | Log Field | Format | Optional | Description |
---|---|---|---|---|---|
1 | Request Date | date | YYYY-MM-DD | n | The date the request was made to the CDN. |
2 | Request Time | time | HH:MM:SS | n | The time represented in GMT when the request was made to the CDN. |
3 | HTTP Method | cs-method | string | n | The HTTP method used for the request (HEAD, GET). |
4 | Client IP | c-ip | a.b.c.d | n | The IP address of the user agent making the request. |
5 | HTTP Version | cs-version | string | n | Either HTTP or HTTPS, depending on how the content was requested from the CDN. |
6 | Referrer | cs-referrer | URL | Y | The referrer header sent by the user agent. A referrer is not a required header, so this field may not always contain a URL. |
7 | User-Agent String | cs-user-agent | string | Y | The string sent by the user agent to identify itself. |
8 | File Size | filesize | bytes | n | Size of the asset being delivered. |
9 | Request Size | cs-bytes | bytes | n | The total size of the request header. |
10 | Response Bytes Sent | sc-bytes | bytes | n | Total bytes in the response to the client. |
11 | Web Server IP | s-ip | a.b.c.d [port] | n | The IP address of the edge server - we default to the Anycast IP. |
12 | Response Seconds | time-taken | floating-point number | n | The number of seconds (accurate to the millisecond) taken to deliver the asset. |
13 | HTTP Status Code | sc-status | number | n | The HTTP status code returned to the user agent. This number should be a valid HTTP status code. |
14 | Query String | cs-uri-query | string | Y | The query string parameters provided in the request. |
15 | Path | cs-uri-stem | URI | n | The full URI of the request. |
16 | Range Bytes | x-byte-range | range | Y | For range requests, you'll see the range requested and delivered to the client in the form bytes=<range requested>. |
17 | Comment | comment | string | Y | Server-side notes. |
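Because each log line lists the fields in the column order above, standard command-line tools can be used for quick ad hoc reporting. As a rough sketch, assuming the fields are whitespace-delimited and that multi-word values such as the User-Agent are quoted or encoded as a single field, the following counts requests per client IP (column 4) in a downloaded log:
# Count requests per client IP address (field 4) in a decompressed log
gunzip -c cds_20200826-171745-1.log.gz | awk '{print $4}' | sort | uniq -c | sort -rn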
Accessing Your Raw Access Logs
Your raw access logs are stored in GCS, so you will access them using a GCS bucket name and a service account. When a site has raw access logs enabled, StrikeTracker uses a single bucket for the entire account and creates a folder for each site in that bucket. The raw log bucket name will be similar to the following:
sp-cdn-logs-<ACCOUNT_HASH>
To find the raw log buckets for your sites, you will need to click on the menu icon and navigate to "Object Storage".
On this page, you will see "Object Storage" on the left of the page. This is where all of your GCS buckets, including your raw logs, will be listed.
If you click on the raw log bucket, you will see the details for that bucket, including the endpoint URL.
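Putting the bucket name and the directory conventions described earlier together, the full path to an individual log object will look similar to the following (the account hash, site hash, and filename here are placeholders):
gs://sp-cdn-logs-<ACCOUNT_HASH>/<site_hash>/cds/2020/08/26/cds_20200826-171745-1.log.gz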
You can create a service account in StrikeTracker by clicking on the menu icon and navigating to "Object Storage".
On this page, you will also see your GCS bucket if you use this service.
To add a service account, click on "Service Accounts", then "Add Service Account".
Once your raw log service account is created, you can click on that service account to view its details and keys or generate new keys.
For your raw logs, you will want to add a key and/or create and save an HMAC key and secret, depending on the tool you plan to use. If you wish to use rclone, you will need to add a key (a downloadable key file); if you use gsutil or Cyberduck, you will need to create an HMAC key.
To add a key, simply click "Add key" while on the "Service Account Details" tab. This will download a file that you can move to the directory of your choosing; be sure to note its location on your local machine, as you will need it if you are using rclone.
To create an HMAC key, click the "HMAC Keys" tab, and then "Add Key".
You will then see a pop-up that contains the Access ID and Secret for that HMAC key. Make sure to save these somewhere safe as you will not be able to retrieve the secret once that pop-up is closed.
This service account will have access to all GCS buckets in your account.
To access your raw logs you will need to use a GCS compatible tool, such as gsutil, rclone, or Cyberduck. Depending on which tool you use, you will either need to use the key file you downloaded or the HMAC key and secret you saved.
First, you will need to configure gsutil to access your raw logs using your service account. You can find installation instructions for gsutil here.
Once gsutil is installed, run the following command and follow the prompts, entering the HMAC key and secret that you saved for your raw log service account.
gsutil config -a
By default, gsutil will try to authenticate with OAuth2 credentials from the Cloud SDK, but this is not supported by StackPath. You will need to run the following command to ensure the HMAC credentials are used to authenticate.
gcloud config set pass_credentials_to_gsutil false
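For reference, gsutil config -a writes the HMAC credentials to a boto configuration file (typically ~/.boto). The relevant section will look roughly like the following, with the placeholders replaced by the Access ID and Secret from StrikeTracker:
# Example ~/.boto credentials section (values are placeholders)
[Credentials]
gs_access_key_id = <HMAC_ACCESS_ID>
gs_secret_access_key = <HMAC_SECRET>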
You can find downloads and installation instructions for rclone here.
Below are instructions to configure a common GCS profile using rclone.
- First, run rclone config
- Enter n to create a new config
- Use the defaults for most variables, except for the following:
Variable | Value |
---|---|
type | google cloud storage |
service_account_file | path to the key file that you downloaded from StrikeTracker. See "How do I create a service account to access my raw access logs?" for details on downloading a key file. |
location | select the value you see in the "REGION" column in your bucket list in StrikeTracker |
storage_class | REGIONAL |
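For reference, the resulting entry in your rclone configuration file (rclone.conf) will look roughly like the following. The remote name, key file path, and location shown here are placeholders; the name you choose is the <raw_log_name_in_rclone> used in the commands later in this article, and the location should match the "REGION" column for your bucket in StrikeTracker:
# Example rclone.conf entry (remote name, path, and location are placeholders)
[<raw_log_name_in_rclone>]
type = google cloud storage
service_account_file = /path/to/downloaded-key.json
location = us-central1
storage_class = REGIONAL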
Cyberduck provides a GUI to navigate and download your raw logs. When using Cyberduck, you will need to use HMAC keys to connect. Please see "How do I create a service account to access my raw access logs?" above for instructions to generate an HMAC key.
- Click "Open Connection"
- Select "Amazon S3" for the service
- Update the "Server" to the
storage.googleapis.com
- The "Access Key ID" and "Secret Access Key" are the values provided when your HMAC key was generated in StrikeTracker.
Once logged in, you will see all of your GCS buckets. You can double-click the bucket containing the raw logs for the site you wish to view or download.
When using gsutil, you will only need to use the bucket name, not the endpoint URL.
Below are some examples of listing files and directories in your raw log GCS bucket using gsutil. The <BUCKET_ID> is the Bucket Name seen in StrikeTracker within the Bucket Details.
List all contents in the top-level directory. This will list all files and directories in the top-level directory, but not the files in the subdirectories.
gsutil ls gs://<BUCKET_ID>
List all contents in a subdirectory
gsutil ls gs://<BUCKET_ID>/<SUBDIRECTORY>/
Use a wildcard to match subdirectories and list all contents in them (e.g., /example1/, /example2/, /example3/, etc.)
gsutil ls gs://<BUCKET_ID>/example*/
You may need to use double quotes around gs://<BUCKET_ID>/example*/
List all contents matching a wildcard (e.g., all .txt files)
gsutil ls gs://<BUCKET_ID>/*.txt
You may need to use double quotes around gs://<BUCKET_ID>/*.txt
Recursively list all contents in the top-level directory. This will list the top-level objects and subdirectories, then the objects under gs://<BUCKET_ID>/example1, then those under gs://<BUCKET_ID>/example2, etc.
gsutil ls -r gs://<BUCKET_ID>
To recursively list all contents in the top-level directory or a subdirectory in a flat list format, you can use the following:
gsutil ls -r gs://<BUCKET_ID>/**
gsutil ls -r gs://<BUCKET_ID>/<SUBDIRECTORY>/**
You may need to use double quotes around gs://<BUCKET_ID>/**
Print the object size, creation time stamp, and name of the object
gsutil ls -l gs://<BUCKET_ID>/<FILENAME>
This can also be done with a wildcard to match a pattern (e.g., *.txt)
gsutil ls -l gs://<BUCKET_ID>/*.txt
You may need to use double quotes around gs://<BUCKET_ID>/*.txt
Using the "-L" flag, you can print additional details about objects and buckets.
gsutil ls -L gs://<BUCKET_ID>/<SUBDIRECTORY>
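Combining the directory conventions described earlier with a wildcard makes it easy to narrow a listing to a particular day or hour. As an example (the site hash and date here are placeholders), the following lists the access logs written during the 17:00 GMT hour on 2020-08-26:
gsutil ls "gs://<BUCKET_ID>/<site_hash>/cds/2020/08/26/cds_20200826-17*"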
There are several commands you can use to list content in your GCS buckets using rclone. The <BUCKET_ID> is the Bucket Name seen in StrikeTracker within the Bucket Details.
- ls - list size and path of objects only
- lsl - list modification time, size, and path of objects only
- lsd - list directories only
- lsf - list objects and directories in an easy to parse format
- lsjson - list objects and directories in JSON format
By default, "ls" and "lsl" are recursive, but you can use the "--max-depth 1" flag to stop the recursion. The "lsd", "lsf", and "lsjson" commands are not recursive by default, but you can use the "-R" flag to make them list recursively.
Below are some example commands
rclone ls <raw_log_name_in_rclone>:<BUCKET_ID>
rclone ls --max-depth 1 <raw_log_name_in_rclone>:<BUCKET_ID>/<SUBDIRECTORY>
rclone lsl <raw_log_name_in_rclone>:<BUCKET_ID>/<SUBDIRECTORY>
rclone lsd <raw_log_name_in_rclone>:<BUCKET_ID>
rclone lsf -R <raw_log_name_in_rclone>:<BUCKET_ID>
rclone lsjson <raw_log_name_in_rclone>:<BUCKET_ID>/<SUBDIRECTORY>
To list your raw log files in Cyberduck, you will need to open a connection using the instructions for Cyberduck in "How do I access my raw logs?" and double click on the raw log bucket you wish to view.
Using gsutil, rclone, or Cyberduck, you can download your log files to your local machine or storage server. For each of these tools, we will provide recommended commands and examples to synchronize your raw logs so you do not download duplicate log files.
To download logs with gsutil, we recommend using the rsync command. Using the rsync command will allow you to download your log files while skipping the files that already exist in the directory you sync to on your local machine. You can find more information about the rsync command here. The <BUCKET_ID> is the Bucket Name seen in StrikeTracker within the Bucket Details.
Here is an example command you can use to download your logs.
gsutil rsync -r \
gs://<BUCKET_ID>/<site_hash>/<cds_or_cdi>/YYYY/MM/DD/ \
/local/machine/path
If you have many log files you wish to download, you can use the -m flag for a parallel sync.
gsutil -m rsync -r \
gs://<BUCKET_ID>/<site_hash>/<cds_or_cdi>/YYYY/MM/DD/ \
/local/machine/path
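If you want to keep a local copy up to date automatically, you can wrap the rsync command in a small script and run it on a schedule (for example, via cron). This is a minimal sketch under a few assumptions: the bucket ID, site hash, and local path are placeholders, GNU date is available, and you want to pull the previous GMT day's access logs.
#!/bin/bash
# Sync yesterday's (GMT) access logs into a matching local directory.
# <BUCKET_ID> and <site_hash> are placeholders; adjust paths as needed.
DAY=$(date -u -d "yesterday" +%Y/%m/%d)
mkdir -p "/local/machine/path/cds/${DAY}"
gsutil -m rsync -r \
  "gs://<BUCKET_ID>/<site_hash>/cds/${DAY}/" \
  "/local/machine/path/cds/${DAY}"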
There are several commands you can use to download your raw logs using rclone, but we recommend using the copy command. This command downloads only the raw logs that do not already exist in your local directory, so there are no duplicates. You can find more details about the copy command here. Below is an example of the copy command.
rclone copy \
<raw_log_name_in_rclone>:<BUCKET_ID>/<site_hash>/<cds_or_cdi>/YYYY/MM/DD/ \
/path/to/destination/ -P
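If you have a large number of log files to transfer, rclone's --transfers flag increases the number of files copied in parallel (the default is 4), similar to the -m flag in gsutil. For example, using the same placeholder paths as above:
rclone copy --transfers 8 -P \
<raw_log_name_in_rclone>:<BUCKET_ID>/<site_hash>/<cds_or_cdi>/YYYY/MM/DD/ \
/path/to/destination/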
As Cyberduck is a GUI client, downloading files is straightforward. You will want to double-click on your raw access log bucket, double-click the site hash, then select either "cds" or "cdi", if applicable. The "cds" folder will contain the access logs from the CDN to the client, and the "cdi" folder will contain the access logs from your origin to the CDN.
If you want to download many log files, we recommend using the "Synchronize" option. This can be found by right-clicking on anything in the bucket, selecting "Synchronize...", then choosing the folders or files you wish to download. Make sure to select "Download" from the dropdown menu when selecting the folders and files to download.
This method will synchronize all files from the folders/files you select to the location you have selected locally. If any of the files already exist locally, they will not be downloaded.