Configuration
Overview
The Data Logger uses a config.json file to control various aspects of its operation. This file is deployed within the container image, so changing any run parameter requires a redeployment.
Configuration for Local Development
To test the Data Logger locally, set the following features as shown:
{
  "features": {
    "repository_collection": true,
    "dependabot_collection": true,
    "secret_scanning_collection": true,
    "show_log_locally": true,
    "write_to_s3": false // This MUST be false or else the live data will be overwritten
  },
  "settings": {
    ... // Other settings as required
  }
}
The collection features can be toggled on or off to control which data is collected. If you are only changing a single collection type, you can set the others to false to speed up local testing.
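As a minimal sketch of how these flags might be read and checked before a local run, the Python below parses the features section and refuses to continue while write_to_s3 is enabled. The function names (parse_features, load_features, assert_safe_for_local_run) are hypothetical and not taken from the Data Logger's code. Python's standard json module rejects // comments, so the sketch assumes the real config.json contains none.

```python
import json
from pathlib import Path

# Feature keys as they appear in the example configuration.
FEATURE_KEYS = (
    "repository_collection",
    "dependabot_collection",
    "secret_scanning_collection",
    "show_log_locally",
    "write_to_s3",
)


def parse_features(text: str) -> dict:
    """Parse config text and check that every expected flag is a boolean."""
    features = json.loads(text)["features"]
    for key in FEATURE_KEYS:
        if not isinstance(features.get(key), bool):
            raise ValueError(f"features.{key} must be true or false")
    return features


def load_features(path: str = "config.json") -> dict:
    """Read the features section from a config file on disk."""
    return parse_features(Path(path).read_text(encoding="utf-8"))


def assert_safe_for_local_run(features: dict) -> None:
    """Hypothetical guard: stop a local run that would write to S3."""
    if features["write_to_s3"]:
        raise RuntimeError("write_to_s3 is true; set it to false before testing locally")
```

Calling assert_safe_for_local_run(load_features()) at the start of a local test run would catch a config that still points at the live data.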
Configuration Options
Features
The features section lets developers toggle various aspects of the Data Logger's operation to suit their needs.
{
  "features": {
    "repository_collection": true,
    "dependabot_collection": true,
    "secret_scanning_collection": true,
    "show_log_locally": true,
    "write_to_s3": true
  },
  "settings": {
    ... // Other settings as required
  }
}
Repository, Dependabot, and Secret Scanning Collection
These features control whether the Data Logger collects data for the respective types. If a feature is set to true, the Data Logger will collect data for that type; otherwise it will skip the collection process for that type.
When deploying to AWS, all of these features should be set to true to ensure that the Data Logger collects all necessary data for the dashboard.
Setting these features to false can help speed up local testing and debugging, as it reduces the amount of data being collected and processed.
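The toggling described here can be sketched as a simple gate around each collection step. This is an illustration under assumptions, not the Data Logger's actual structure: enabled_collections, run_collections, and the collectors mapping are hypothetical names, and each collector is assumed to be a zero-argument callable.

```python
from typing import Callable, Dict, List

# The three collection flags from the features section.
COLLECTION_FLAGS = (
    "repository_collection",
    "dependabot_collection",
    "secret_scanning_collection",
)


def enabled_collections(features: Dict[str, bool]) -> List[str]:
    """Return the collection flags that are switched on, in a fixed order."""
    return [flag for flag in COLLECTION_FLAGS if features.get(flag, False)]


def run_collections(
    features: Dict[str, bool],
    collectors: Dict[str, Callable[[], object]],
) -> Dict[str, object]:
    """Run only the collectors whose flag is true; disabled ones are never called."""
    return {flag: collectors[flag]() for flag in enabled_collections(features)}
```

With only dependabot_collection set to true, the repository and secret scanning collectors are skipped entirely, which is where the local speed-up comes from.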
Show Log Locally
This feature controls whether the Data Logger outputs logs to a local text file. When set to true, the Data Logger writes logs to a file in the local directory, which is useful for debugging and testing; otherwise no local log file is written. This lets developers see the Data Logger's output as they would in the CloudWatch logs in AWS.
Write to S3
This feature controls whether the Data Logger writes the collected data to AWS S3. When set to true, the Data Logger will write the collected data to the specified S3 bucket in JSON format. If set to false, the Data Logger will instead write the data to a local file for developers to inspect. This is particularly useful for debugging and testing purposes, as it allows developers to see the data that would be written to S3 without actually modifying the live data.
When developing locally, this must be set to false to prevent overwriting the live data in S3. If for some reason you need to write to S3 while developing locally (e.g. to test the data with the dashboard), you should ensure that other team members are aware and that the data is not critical, as it will overwrite the existing data in S3.
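The switch between S3 and a local file can be sketched as follows. The function write_output, its upload callable, and the local_path parameter are assumptions made for illustration; the source does not say how the Data Logger performs the upload or what the local file is called.

```python
import json
from pathlib import Path
from typing import Callable


def write_output(
    data: object,
    write_to_s3: bool,
    upload: Callable[[str], None],
    local_path: str,
) -> str:
    """Serialise data to JSON and send it to S3 or to a local file.

    `upload` is a hypothetical callable that receives the JSON text and
    stores it in S3; it is only invoked when write_to_s3 is true.
    Returns "s3" or "local" to show which destination was used.
    """
    body = json.dumps(data, indent=2)
    if write_to_s3:
        upload(body)
        return "s3"
    Path(local_path).write_text(body, encoding="utf-8")
    return "local"
```

Passing the uploader in as a callable keeps the local path free of any AWS dependency. In a deployed setting it could wrap an S3 client call such as boto3's put_object, although whether the Data Logger uses boto3 is not stated here.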
Settings
The settings section contains parameters that control the behaviour of the Data Logger, including when checks are considered to be breaches of policy. These settings are unlikely to need changing unless ONS' GitHub Usage Policy changes, but they are included here for completeness.
{
  "features": {
    ... // Feature settings as above
  },
  "settings": {
    "thread_count": 20,
    "dependabot_thresholds": {
      "critical": 5,
      "high": 15,
      "medium": 60,
      "low": 90
    },
    "secret_scanning_threshold": 5,
    "inactivity_threshold": 1,
    "signed_commit_number": 15
  }
}
Thread Count
This setting controls the number of threads used by the Data Logger when collecting data from the GitHub API. Increasing this number can speed up data collection, especially for large organisations with many repositories. However, it also increases the load on the GitHub API, so it should be set to a reasonable value to avoid hitting rate limits.
For more information on how threading is used in the Data Logger, see the Threading page.
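To illustrate how a thread count of this kind typically bounds concurrent API calls, here is a minimal sketch using Python's standard ThreadPoolExecutor. It is not the Data Logger's implementation (the Threading page covers that); collect_in_parallel and fetch are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def collect_in_parallel(
    items: Iterable[T],
    fetch: Callable[[T], R],
    thread_count: int = 20,
) -> List[R]:
    """Apply fetch to every item using at most thread_count worker threads.

    Results are returned in the same order as the input items.
    """
    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        return list(pool.map(fetch, items))
```

A larger thread_count lets more requests run at once, which is why raising it speeds up collection while also increasing pressure on the API's rate limits.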
Dependabot Thresholds
These thresholds control how many days an alert must be open before it is considered to be a breach of policy. The thresholds are set for each severity level of Dependabot alerts and are derived from ONS' GitHub Usage Policy.
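A threshold check of this kind might look like the sketch below. It assumes an alert breaches policy once it has been open for more than the threshold number of days; whether the real check uses a strict or inclusive comparison is not stated in this page. The function name is_dependabot_breach is hypothetical.

```python
from datetime import date
from typing import Dict

# Default values copied from the example settings.
DEPENDABOT_THRESHOLDS: Dict[str, int] = {
    "critical": 5,
    "high": 15,
    "medium": 60,
    "low": 90,
}


def is_dependabot_breach(
    severity: str,
    opened_on: date,
    today: date,
    thresholds: Dict[str, int] = DEPENDABOT_THRESHOLDS,
) -> bool:
    """Return True if the alert has been open longer than its severity allows.

    Assumption: "longer than" means strictly more days than the threshold.
    """
    days_open = (today - opened_on).days
    return days_open > thresholds[severity]
```

Under this assumption, a critical alert opened six days ago is a breach, while a high alert of the same age is not.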
Secret Scanning Threshold
This threshold controls how many days a secret scanning alert must be open before it is considered to be a breach of policy. This is derived from ONS' GitHub Usage Policy.
Inactivity Threshold
This threshold controls how many years a repository can go without updates before it is considered inactive. This is derived from ONS' GitHub Usage Policy.
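As a rough sketch of an inactivity check, the Python below compares a repository's last update date against a cutoff a whole number of years in the past. How the Data Logger measures a year, and which timestamp it treats as the last update, are assumptions here; years_ago and is_inactive are hypothetical names.

```python
from datetime import date


def years_ago(today: date, years: int) -> date:
    """Return the calendar date the given number of years before today."""
    try:
        return today.replace(year=today.year - years)
    except ValueError:
        # 29 February with a non-leap target year: fall back to 28 February.
        return today.replace(year=today.year - years, day=28)


def is_inactive(last_updated: date, today: date, inactivity_threshold: int = 1) -> bool:
    """Return True if the last update is older than the threshold in years."""
    return last_updated < years_ago(today, inactivity_threshold)
```

With the default threshold of 1, a repository last updated more than a calendar year before the run date would be flagged as inactive.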
Signed Commit Number
This setting controls how many commits are checked when applying the signed commit policy check. This value has been agreed between stakeholders.