Run av scanning from ClamD server #2256

Merged: 41 commits, Feb 15, 2024

Commits
1e81494
try to add a new vm for clamavd
mojotalantikite Feb 1, 2024
dc8058c
add vpcid to sg
mojotalantikite Feb 1, 2024
743bb45
add perm boundary
mojotalantikite Feb 1, 2024
aa3669c
fix path for iam
mojotalantikite Feb 2, 2024
9ec6aca
add a network interface
mojotalantikite Feb 2, 2024
9459329
fixing typo
mojotalantikite Feb 2, 2024
c2f09ac
Merge branch 'main' into mt-build-clamav-server
mojotalantikite Feb 5, 2024
5bb18de
add name to vm
mojotalantikite Feb 5, 2024
9cf4a72
only make this server in dev val prod
mojotalantikite Feb 5, 2024
86fc9fc
bump and test
mojotalantikite Feb 5, 2024
2d61fc2
bump ami
mojotalantikite Feb 5, 2024
a818d41
user data clamav latest LTS
mojotalantikite Feb 5, 2024
6cee323
fix up clamav configs
mojotalantikite Feb 7, 2024
399e8a6
add all keys to user data
mojotalantikite Feb 8, 2024
7beae94
Readme
mojotalantikite Feb 8, 2024
58e3fec
Merge branch 'mt-build-clamav-server' into mt-scan-from-lambda
mojotalantikite Feb 8, 2024
3be571c
add clamd.conf to the layer
mojotalantikite Feb 8, 2024
c2fa0c3
bump lambda layer
mojotalantikite Feb 8, 2024
a639d25
fix path name
mojotalantikite Feb 8, 2024
63a308a
do the scan remote
mojotalantikite Feb 9, 2024
814d621
drop the wait just to see
mojotalantikite Feb 9, 2024
cec7e76
let's do it live!
mojotalantikite Feb 9, 2024
11d7551
some is better than none here I guess
mojotalantikite Feb 10, 2024
53643c1
add route53 internal DNS
mojotalantikite Feb 12, 2024
be82dfe
add test dns server
mojotalantikite Feb 12, 2024
1742375
thread the vpcid
mojotalantikite Feb 12, 2024
7ae55a4
cleanup apt usage
mojotalantikite Feb 13, 2024
9617094
revert
mojotalantikite Feb 13, 2024
68c231c
swap order. clean
mojotalantikite Feb 13, 2024
d57f3e1
confirmation. remove comment.
mojotalantikite Feb 13, 2024
ef91850
use clamdscan for real
mojotalantikite Feb 13, 2024
19d5057
differentiate clamdscan from clamscan
mojotalantikite Feb 13, 2024
fc51858
add route53 to gh-oidc for ci
mojotalantikite Feb 13, 2024
ccb8f09
no need to fetch defs in this world
mojotalantikite Feb 13, 2024
8198b4b
fix the userdata script up a bit
mojotalantikite Feb 13, 2024
2765110
return faster for positive virus scan
mojotalantikite Feb 14, 2024
cbe6c8d
drop default sg
mojotalantikite Feb 14, 2024
a975a96
github-oidc isn't running here
mojotalantikite Feb 14, 2024
1f69719
revert
mojotalantikite Feb 14, 2024
0cf25c4
Merge branch 'main' into mt-scan-from-lambda
mojotalantikite Feb 15, 2024
1990611
restart if failed to start at boot
mojotalantikite Feb 15, 2024
8 changes: 4 additions & 4 deletions services/app-web/src/s3/s3Amplify.ts
@@ -99,16 +99,16 @@ function newAmplifyS3Client(bucketConfig: S3BucketConfigType): S3ClientT {
         },
         /*
             Poll for scanning completion
-            - We start polling after 20s, which is the estimated time it takes scanning to start to resolve.
-            - In total, each file could be up to 40 sec in a loading state (20s wait for scanning + 8s of retries + extra time for uploading and scanning api requests to resolve)
+            - We start polling after 3s, which is the estimated time it takes scanning to start to resolve.
+            - We then retry with an exponential backoff for up to 15s. Most scans take < 1s, plus some additional time for tagging and response, so by 15s a file should be tagged.
             - While the file is scanning, returns 403. When scanning is complete, the resource returns 200
         */
         scanFile: async (
             filename: string,
             bucket: BucketShortName
         ): Promise<void | S3Error> => {
             try {
-                await waitFor(20000)
+                await waitFor(3000)
                 try {
                     await retryWithBackoff(async () => {
                         await Storage.get(filename, {
@@ -213,7 +213,7 @@ const waitFor = (delay = 1000) =>
 const retryWithBackoff = async (
     fn: () => Promise<void | S3Error>,
     retryCount = 0,
-    maxRetries = 6,
+    maxRetries = 4,
     err: null | S3Error = null
 ): Promise<void | S3Error> => {
     if (retryCount > maxRetries) {
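
The updated polling comment describes a 3s initial wait followed by exponential backoff for up to 15s. Below is a minimal sketch of how such a schedule could add up, assuming a 1s base delay that doubles on each retry. The base delay and the helper names `backoffDelays` and `retryWithBackoffSketch` are hypothetical; the body of the real `retryWithBackoff` is not shown in this diff.

```typescript
// Hypothetical helper: the delay before each retry, assuming the delay doubles
// every time starting from baseMs. The real base delay is not shown in the diff.
function backoffDelays(maxRetries: number, baseMs = 1000): number[] {
    const delays: number[] = []
    for (let i = 0; i < maxRetries; i++) {
        delays.push(baseMs * Math.pow(2, i))
    }
    return delays
}

// Hypothetical retry loop: call fn and, on failure, wait for the next delay
// before trying again. Rethrows the last error once retries are exhausted.
async function retryWithBackoffSketch<T>(
    fn: () => Promise<T>,
    maxRetries = 4,
    baseMs = 1000
): Promise<T> {
    const delays = backoffDelays(maxRetries, baseMs)
    let lastError: unknown
    for (let attempt = 0; attempt <= maxRetries; attempt++) {
        try {
            return await fn()
        } catch (err) {
            lastError = err
            if (attempt < maxRetries) {
                const delayMs = delays[attempt]
                await new Promise<void>((resolve) => {
                    setTimeout(() => resolve(), delayMs)
                })
            }
        }
    }
    throw lastError
}

// Total time spent waiting between attempts under the assumed schedule.
const totalWaitMs = backoffDelays(4).reduce((sum, ms) => sum + ms, 0)
```

Under these assumptions, four retries wait 1s + 2s + 4s + 8s, which is 15s in total and lines up with the figure in the new comment.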
26 changes: 26 additions & 0 deletions services/uploads/README.md
@@ -16,6 +16,32 @@ In addition to live scanning all uploaded files, we also have a pair of lambdas

 The avAuditUploads lambda pulls a list of every file in the uploads bucket and then invokes avAuditFiles for each of them in chunks of 20 or so. For any files that are found to be INFECTED, it then grabs the current s3 tags for them and verifies that they are tagged accordingly. If not, it will re-tag the file to be INFECTED, preventing further download.

+## ClamAV Daemon
+
+We have an EC2 instance, created in dev, val, and prod, that runs an always-on ClamAV daemon accepting incoming virus scan requests on port 3310. The motivation is that we are working towards having our AV scanning lambdas use the always-on ClamAV daemon rather than running ClamAV inside the lambda itself, because ClamAV has a high startup cost of around 29 seconds in our testing. This means that every virus scan currently takes at least 29 seconds for the user. With an always-on instance we can reduce that to something closer to the actual time of the virus scan (usually < 1s).
+
+The server only accepts connections from the default security group and from anything else that is placed in the ClamAV security group. This allows our AV scanning lambdas to call out to the instance while restricting all other traffic. However, all of our engineers have SSH public keys on the instance in case they need access to the machine via SSH for any reason.
+
+### Accessing the VM
+
+Similar to the [Postgres jumpbox](../postgres/README.md), we use the `authorized_keys` file to give access to this VM, and you'll need to add your IP to the VM's security group:
+
+1. Determine your public-facing IP address. An easy way to do this is to `curl https://ifconfig.me/`
+2. Locate the EC2 instance in the AWS console. Click into it and go to Security > Security groups.
+3. There should be two security groups attached to the instance, the default and the ClamAV one. Select the ClamAV security group.
+4. On the `Inbound rules` tab select `Edit inbound rules`
+5. Add a rule for `ssh` with the `source` set to your public IP address from step 1 with `/32` appended to it (e.g. `1.2.3.4/32`)
+6. Save the rule
+
+#### SSH to the instance
+
+You should now be able to ssh to the instance.
+
+1. Locate the Public IPv4 address of the instance. This can be found by clicking into the VM on the `Instances` section of the EC2 console.
+2. `ssh ubuntu@public-ip`
+
+You should be using public key auth to ssh. If you need to point to your private key, use `ssh -i ~/.ssh/${yourkeyfile} ubuntu@public-ip`
+
 ## Significant dependencies

 - serverless-s3-upload
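
For context on what a scan request to port 3310 can look like on the wire: the clamd man page describes an `INSTREAM` command in which the client sends the command, then the file as length-prefixed chunks, then a zero-length chunk to end the stream. The sketch below builds that framing for illustration only. The function names are hypothetical, the 8192-byte chunk size is an arbitrary choice, and the lambdas in this PR shell out to `clamdscan --stream` rather than speaking the protocol themselves.

```typescript
// Convert an ASCII string to bytes without relying on TextEncoder.
function asciiBytes(text: string): Uint8Array {
    const bytes = new Uint8Array(text.length)
    for (let i = 0; i < text.length; i++) {
        bytes[i] = text.charCodeAt(i)
    }
    return bytes
}

// One INSTREAM chunk: a 4-byte unsigned length in network byte order
// (big-endian), followed by the chunk data itself.
function frameChunk(data: Uint8Array): Uint8Array {
    const framed = new Uint8Array(4 + data.length)
    new DataView(framed.buffer).setUint32(0, data.length, false)
    framed.set(data, 4)
    return framed
}

// Hypothetical helper: everything a client would write to the socket for one
// file. The "z" prefix means the command is terminated by a NUL byte.
function buildInstreamFrames(file: Uint8Array, chunkSize = 8192): Uint8Array[] {
    if (chunkSize <= 0) {
        throw new Error('chunkSize must be positive')
    }
    const frames: Uint8Array[] = [asciiBytes('zINSTREAM\0')]
    for (let offset = 0; offset < file.length; offset += chunkSize) {
        frames.push(frameChunk(file.subarray(offset, offset + chunkSize)))
    }
    // A zero-length chunk tells clamd the stream is complete.
    frames.push(frameChunk(new Uint8Array(0)))
    return frames
}
```

Because the file contents travel over the socket, the daemon never needs filesystem access to the lambda's `/tmp`, which is why a remote clamd can scan files that only exist inside the lambda.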
156 changes: 145 additions & 11 deletions services/uploads/serverless.yml
@@ -52,20 +52,19 @@ provider:
 custom:
   region: ${aws:region}
   reactAppOtelCollectorUrl: ${env:REACT_APP_OTEL_COLLECTOR_URL, ssm:/configuration/react_app_otel_collector_url}
+  authorizedKeys: ${file(../postgres/scripts/authorized_keys)}
   webpack:
     webpackConfig: webpack.config.js
     packager: yarn
     packagerOptions:
       lockFile: ../../yarn.lock
-  scripts:
-    hooks:
-      # This script is run locally when running 'serverless deploy'
-      package:initialize: |
-        set -e
-        curl -L --output lambda_layer.zip https://github.com/CMSgov/lambda-clamav-layer/releases/download/0.7/lambda_layer.zip
-      deploy:finalize: |
-        rm lambda_layer.zip
-        serverless invoke --stage ${sls:stage} --function avDownloadDefinitions -t Event
+  vpcId: ${ssm:/configuration/${sls:stage}/vpc/id, ssm:/configuration/default/vpc/id}
+  sgId: ${ssm:/configuration/${sls:stage}/vpc/sg/id, ssm:/configuration/default/vpc/sg/id}
+  privateSubnets:
+    - ${ssm:/configuration/${sls:stage}/vpc/subnets/private/a/id, ssm:/configuration/default/vpc/subnets/private/a/id}
+    - ${ssm:/configuration/${sls:stage}/vpc/subnets/private/b/id, ssm:/configuration/default/vpc/subnets/private/b/id}
+    - ${ssm:/configuration/${sls:stage}/vpc/subnets/private/c/id, ssm:/configuration/default/vpc/subnets/private/c/id}
+  publicSubnetA: ${ssm:/configuration/${sls:stage}/vpc/subnets/public/a/id, ssm:/configuration/default/vpc/subnets/public/a/id}
   serverless-offline-ssm:
     stages:
       - local
@@ -96,8 +95,7 @@ custom:

 layers:
   clamAv:
-    package:
-      artifact: lambda_layer.zip
+    path: lambda-layers-clamav

 functions:
   avScan:
@@ -109,6 +107,9 @@ functions:
     layers:
       - !Ref ClamAvLambdaLayer
       - arn:aws:lambda:us-east-1:901920570463:layer:aws-otel-nodejs-amd64-ver-1-18-1:1
+    vpc:
+      securityGroupIds: ${self:custom.sgId}

Contributor:

is the clamAV image inside our security group that can talk to the db for instance? I don't see why it would need to be?

Contributor (Author):

You know, there were reasons when I was setting this up, but I can't recall. We can probably drop this as the av scanning lambda doesn't need to be in the VPC with our other lambdas and DB. I'll fix it up before merging!

Contributor (Author):

Oh actually, I remember now. If the lambda is not in this VPC with the server, then all the communicating traffic with the server is going to route over the public internet, which means we have to pay that egress cost. We can use VPC gateway endpoints, but we don't have access to create VPC related things in our account (CMS Cloud team has to do it for us). So I just dropped it in our VPC.

I think we can drop the default security group though, which will keep it away from the Aurora instance.

+      subnetIds: ${self:custom.privateSubnets}

Contributor:

I think you can remove the avDownloadDefinitions lambda, too

Contributor (Author):

I wrote a ticket to do this cleanup as follow on work: https://jiraent.cms.gov/browse/MCR-3954

I started deleting the now unneeded code paths and then realized that there's a decent amount in there and the tests are going to have to get fixed up a bit. Figured I'd land this and then go through and delete/cleanup the tests.

     environment:
       stage: ${sls:stage}
       CLAMAV_BUCKET_NAME: !Ref ClamDefsBucket
@@ -171,6 +172,13 @@ functions:
       REACT_APP_OTEL_COLLECTOR_URL: ${self:custom.reactAppOtelCollectorUrl}

 resources:
+  Conditions:
+    IsDevValProd: !Or
+      - !Equals ['${sls:stage}', 'main']
+      - !Equals ['${sls:stage}', 'val']
+      - !Equals ['${sls:stage}', 'prod']
+      - !Equals ['${sls:stage}', 'mtscanfromlambda']
+
   Resources:
     DocumentUploadsBucket:
       Type: AWS::S3::Bucket
@@ -364,6 +372,132 @@ resources:
                 - !Sub ${QAUploadsBucket.Arn}/*
               Sid: DenyUnencryptedConnections

+    ClamAVSecurityGroup:
+      Type: AWS::EC2::SecurityGroup
+      Properties:
+        GroupDescription: Security group for ClamAV daemon
+        VpcId: ${self:custom.vpcId}
+        SecurityGroupIngress:
+          - IpProtocol: tcp
+            FromPort: 3310
+            ToPort: 3310
+            SourceSecurityGroupId: ${self:custom.sgId}
+
+    ClamAVInstanceProfile:
+      Type: AWS::IAM::InstanceProfile
+      Properties:
+        Path: '/delegatedadmin/developer/'
+        Roles:
+          - !Ref ClamAVInstanceRole
+
+    ClamAVInstanceRole:
+      Type: AWS::IAM::Role
+      Properties:
+        Path: '/delegatedadmin/developer/'
+        PermissionsBoundary: !Sub 'arn:aws:iam::${AWS::AccountId}:policy/cms-cloud-admin/ct-ado-poweruser-permissions-boundary-policy'
+        RoleName: !Sub 'clamavdVm-${sls:stage}-ServiceRole'
+        AssumeRolePolicyDocument:
+          Version: '2012-10-17'
+          Statement:
+            - Effect: Allow
+              Principal:
+                Service: ec2.amazonaws.com
+              Action: sts:AssumeRole
+        Policies:
+          - PolicyName: ClamAVInstancePolicy
+            PolicyDocument:
+              Version: '2012-10-17'
+              Statement:
+                - Effect: Allow
+                  Action:
+                    - logs:CreateLogGroup
+                    - logs:CreateLogStream
+                    - logs:PutLogEvents
+                  Resource: '*'
+
+    ClamAVInstance:
+      Type: AWS::EC2::Instance
+      Condition: IsDevValProd
+      Properties:
+        InstanceType: t3.medium
+        ImageId: ami-0c7217cdde317cfec # Ubuntu 22.04 LTS
+        IamInstanceProfile: !Ref ClamAVInstanceProfile
+        NetworkInterfaces:
+          - AssociatePublicIpAddress: true
+            DeviceIndex: '0'
+            GroupSet:
+              - !Ref ClamAVSecurityGroup
+            SubnetId: !Sub ${self:custom.publicSubnetA}
+        Tags:
+          - Key: Name
+            Value: clamavd-${sls:stage}
+          - Key: mcr-vmuse
+            Value: clamavd
+        UserData:
+          Fn::Base64: !Sub |
+            #!/bin/bash
+            apt-get update
+            apt-get install -y clamav clamav-daemon
+
+            echo '${self:custom.authorizedKeys}' > /home/ubuntu/.ssh/authorized_keys
+            chown ubuntu:ubuntu /home/ubuntu/.ssh/authorized_keys
+            chmod 600 /home/ubuntu/.ssh/authorized_keys
+
+            # Write to the clamd.conf
+            echo "TCPSocket 3310" >> /etc/clamav/clamd.conf
+            echo "TCPAddr 0.0.0.0" >> /etc/clamav/clamd.conf
+
+            # Create a systemd service override to delay the start
+            cat <<EOF > /etc/systemd/system/clamav-daemon.service.d/override.conf
+            [Unit]
+            After=network.target
+            EOF
+
+            # Create a systemd service override to delay the start and set restart limits
+            cat <<EOF > /etc/systemd/system/clamav-daemon.service.d/override.conf
+            [Unit]
+            After=network.target
+            StartLimitIntervalSec=1h
+            StartLimitBurst=5
+            EOF
+
+            # Fix the systemctl setting
+            sed -i 's/^StandardOutput=syslog/StandardOutput=journal/' /lib/systemd/system/clamav-daemon.service
+
+            # Reload systemd to apply the changes
+            systemctl daemon-reload
+
+            # Start clamd and get defs
+            systemctl enable clamav-daemon
+            systemctl enable clamav-freshclam
+            systemctl start clamav-daemon
+            systemctl start clamav-freshclam
+
+            # Confirm we're up
+            systemctl status clamav-daemon
+            systemctl status clamav-freshclam
+
+    MCRInternalZone:
+      Type: AWS::Route53::HostedZone
+      Condition: IsDevValProd
+      Properties:
+        Name: mc-review.local
+        VPCs:
+          - VPCId: ${self:custom.vpcId}
+            VPCRegion: !Ref AWS::Region
+
+    ClamAVRecordSet:
+      Type: AWS::Route53::RecordSet
+      Condition: IsDevValProd
+      DependsOn: ClamAVInstance
+      Properties:
+        HostedZoneId: !Ref MCRInternalZone
+        Name: clamav.mc-review.local
+        Type: A
+        ResourceRecords:
+          - !GetAtt ClamAVInstance.PrivateIp
+        TTL: '300'
+
   Outputs:
     DocumentUploadsBucketName:
       Value: !Ref DocumentUploadsBucket
1 change: 1 addition & 0 deletions services/uploads/src/avLayer/build/build.sh
@@ -51,6 +51,7 @@ cp /tmp/build/usr/local/bin/clamscan /tmp/build/usr/local/bin/clamdscan /tmp/bui
 cp -R /tmp/build/usr/lib64/* lib/.
 cp -R /tmp/build/usr/local/lib64/* lib/.
 cp freshclam.conf bin/freshclam.conf
+cp clamd.conf bin/clamd.conf

 zip -r9 lambda_layer.zip bin
 zip -r9 lambda_layer.zip lib
12 changes: 12 additions & 0 deletions services/uploads/src/avLayer/build/clamd.conf
@@ -0,0 +1,12 @@
+# hostname and port of the remote ClamAV daemon
+TCPAddr clamav.mc-review.local
+TCPSocket 3310
+
+# Enable verbose logging
+LogVerbose yes
+
+# Path to the log file
+LogFile /var/log/clamd.log
+
+# Set the maximum number of concurrent threads for scanning
+MaxThreads 10
17 changes: 17 additions & 0 deletions services/uploads/src/deps/clamAV/clamAV.ts
@@ -21,6 +21,9 @@ interface ClamAVConfig {
     pathToFreshclam: string
     pathToConfig: string
     pathToDefintions: string
+
+    pathToClamdScan: string
+    pathToClamdConfig: string
 }

 function NewClamAV(config: Partial<ClamAVConfig>, s3Client: S3UploadsClient) {
@@ -36,6 +39,9 @@ function NewClamAV(config: Partial<ClamAVConfig>, s3Client: S3UploadsClient) {
         pathToFreshclam: config.pathToFreshclam || '/opt/bin/freshclam',
         pathToConfig: config.pathToConfig || '/opt/bin/freshclam.conf',
         pathToDefintions: config.pathToDefintions || '/tmp',
+
+        pathToClamdScan: config.pathToClamdScan || '/opt/bin/clamdscan',
+        pathToClamdConfig: config.pathToClamdConfig || '/opt/bin/clamd.conf',
     }

     return {
@@ -177,13 +183,24 @@ function scanForInfectedFiles(
     try {
         console.info('Executing clamav')

+        // use clamdscan to connect to our clamavd server
+        const avResult = spawnSync(config.pathToClamdScan, [
+            '--stdout',
+            '-v',
+            `--config-file=${config.pathToClamdConfig}`,
+            '--stream',
+            pathToScan,
+        ])
+
+        /*
         const avResult = spawnSync(config.pathToClamav, [
             '--stdout',
             '-v',
             '-d',
             config.pathToDefintions,
             pathToScan,
         ])
+        */

         console.info('stderror', avResult.stderr && avResult.stderr.toString())
         console.info('stdout', avResult.stdout && avResult.stdout.toString())
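
The hunk above logs stdout and stderr, but the rest of the result handling is collapsed in this view. As a sketch of how the `spawnSync` result could be interpreted, the code below assumes the exit codes documented in the `clamdscan` man page (0 for no virus found, 1 for virus found, 2 for an error). The `ScanVerdict` type and both function names are hypothetical and are not part of `clamAV.ts`.

```typescript
// Hypothetical verdict type; the real code may represent results differently.
type ScanVerdict = 'CLEAN' | 'INFECTED' | 'ERROR'

// Map a clamdscan exit status to a verdict. spawnSync reports a null status
// when the process was killed by a signal, so that is treated as an error.
function interpretClamdscanStatus(status: number | null): ScanVerdict {
    switch (status) {
        case 0:
            return 'CLEAN'
        case 1:
            return 'INFECTED'
        default:
            return 'ERROR'
    }
}

// Build the same argument list the diff passes to clamdscan.
function clamdscanArgs(configPath: string, pathToScan: string): string[] {
    return [
        '--stdout',
        '-v',
        `--config-file=${configPath}`,
        '--stream',
        pathToScan,
    ]
}
```

Treating every status other than 0 and 1 as an error keeps an unreachable daemon from being mistaken for a clean scan.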
8 changes: 0 additions & 8 deletions services/uploads/src/lib/scanFiles.ts
@@ -12,14 +12,6 @@ export async function scanFiles(
     bucket: string,
     scanDir: string
 ): Promise<string[] | Error> {
-    // fetch definition files
-    console.info('Download AV Definitions')
-    const defsRes = await clamAV.downloadAVDefinitions()
-    if (defsRes) {
-        console.error('failed to fetch definitions')
-        return defsRes
-    }
-
     // clamScan wants files to be top level in the scanned directory, so we map each key to a UUID
     const filemap: { [filename: string]: string } = {}
     for (const key of keys) {
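
The remaining context lines describe flattening S3 keys into top-level filenames before scanning. The diff does not show the loop body, so the following is only a sketch of one way that mapping could work. The direction of the map (generated filename to original key), the `flattenKeys` name, and the injected `makeId` generator are all assumptions.

```typescript
// Hypothetical helper: give every S3 key a unique flat filename so nested or
// duplicate basenames cannot collide inside the scan directory.
function flattenKeys(
    keys: string[],
    makeId: () => string
): { [filename: string]: string } {
    const filemap: { [filename: string]: string } = {}
    for (const key of keys) {
        filemap[makeId()] = key
    }
    return filemap
}

// Usage with a deterministic id generator, for illustration only.
let nextExampleId = 0
const exampleFilemap = flattenKeys(
    ['uploads/a/report.pdf', 'uploads/b/report.pdf'],
    () => `file-${nextExampleId++}`
)
```

In real code `makeId` would typically be a UUID generator, so two keys with the same basename never overwrite each other in the scan directory.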