BioGrids on AWS

BioGrids supports cloud computing on Amazon Web Services, providing scalable resources for bioinformatics workflows.

Benefits

  • Scalable computing: Scale compute capacity up or down on demand
  • Cost efficiency: Pay only for resources used
  • Integration: Works with S3, EFS, and other AWS services
  • Collaboration: Easy sharing of computational environments

Compute Optimized

  • c5.large to c5.24xlarge: General bioinformatics workflows
  • c6i instances: Newer-generation Intel-based compute

Memory Optimized

  • r5.large to r5.24xlarge: Large genome assemblies
  • x1e instances: In-memory databases, large datasets

GPU Instances

  • p3 instances: Deep learning, AI/ML workloads
  • g4dn instances: Graphics workstations, visualization

Quick Start

1. Launch EC2 Instance

# Launch with AWS CLI
aws ec2 run-instances \
    --image-id ami-0abcdef1234567890 \
    --instance-type c5.2xlarge \
    --key-name your-key-pair
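
The launch command returns before the instance is ready for SSH. As a minimal sketch, the helper below waits for the running state and then looks up the public address. The function name instance_public_ip is hypothetical; the aws ec2 wait and describe-instances subcommands are standard AWS CLI.

```bash
# Hypothetical helper: wait until an instance is running, then print its public IP.
# Prints "None" if the instance has no public address (for example, in a private subnet).
instance_public_ip() (
    instance_id=$1
    # Blocks until the instance reaches the "running" state (or the waiter times out).
    aws ec2 wait instance-running --instance-ids "$instance_id" || exit 1
    aws ec2 describe-instances \
        --instance-ids "$instance_id" \
        --query 'Reservations[0].Instances[0].PublicIpAddress' \
        --output text
)
```

The instance ID can be captured at launch time by adding --query 'Instances[0].InstanceId' --output text to the run-instances command.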

2. Install BioGrids

# Connect to instance
ssh -i your-key.pem ec2-user@instance-ip

# Download, extract, and activate (replace site_name, username, and key with your own values)
curl -LO https://biogrids.org/downloads/latest/biogrids-cli_linux.tar.gz
tar -zxf biogrids-cli_linux.tar.gz
./biogrids-cli activate site_name username key

3. Configure Storage

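# Format and mount an EBS volume for data
# WARNING: mkfs erases the volume; run it only on a new, empty volume.
# On Nitro-based instances such as c5, the device may appear as /dev/nvme1n1
# instead of /dev/xvdf; check with lsblk.
sudo mkfs -t ext4 /dev/xvdf
sudo mkdir -p /data
sudo mount /dev/xvdf /data

# Mount EFS for shared storage (requires amazon-efs-utils)
sudo mkdir -p /shared
sudo mount -t efs fs-12345678.efs.region.amazonaws.com:/ /shared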
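
Mounts made with the mount command do not persist across reboots. The sketch below prints /etc/fstab entries that would make them permanent. The helper names ebs_fstab_entry and efs_fstab_entry are hypothetical, the UUID is a placeholder, and the EFS entry assumes amazon-efs-utils is installed.

```bash
# Hypothetical helpers that print /etc/fstab entries; they do not modify the system.

# Print an fstab line for an ext4 EBS volume identified by UUID.
# "nofail" lets the instance boot even if the volume is not attached.
ebs_fstab_entry() {
    # $1 = filesystem UUID (from `sudo blkid`), $2 = mount point
    printf 'UUID=%s %s ext4 defaults,nofail 0 2\n' "$1" "$2"
}

# Print an fstab line for an EFS file system (uses the amazon-efs-utils mount helper).
# "_netdev" delays mounting until the network is available.
efs_fstab_entry() {
    # $1 = EFS file system ID, $2 = mount point
    printf '%s:/ %s efs _netdev,tls 0 0\n' "$1" "$2"
}

# Example with placeholder values:
ebs_fstab_entry "00000000-0000-0000-0000-000000000000" /data
efs_fstab_entry "fs-12345678" /shared
```

Append the printed lines to /etc/fstab (for example with sudo tee -a /etc/fstab), then run sudo mount -a to confirm they parse before rebooting.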

S3 Integration

Data Management

# Configure AWS credentials
aws configure

# Sync data to/from S3
aws s3 sync s3://my-bucket/datasets/ /data/input/
aws s3 sync /data/results/ s3://my-bucket/output/
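
A sync can overwrite or transfer more than intended, so it can help to preview it first. The sketch below wraps aws s3 sync in a hypothetical helper, sync_results, that defaults to a dry run; --dryrun and --exclude are standard sync options, and the "*.tmp" pattern is only an illustrative assumption.

```bash
# Hypothetical wrapper around `aws s3 sync`.
# Usage: sync_results SRC DEST [apply]
# Without "apply" it only lists what would be transferred.
sync_results() (
    src=$1
    dest=$2
    mode=${3:-preview}
    if [ "$mode" = "apply" ]; then
        # Real transfer; temporary files are skipped.
        aws s3 sync "$src" "$dest" --exclude "*.tmp"
    else
        # Preview only: prints the planned operations without performing them.
        aws s3 sync "$src" "$dest" --exclude "*.tmp" --dryrun
    fi
)
```

For example, sync_results /data/results/ s3://my-bucket/output/ shows the plan, and adding apply as a third argument performs the upload.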

Direct S3 Access

# Stream data directly from S3
aws s3 cp s3://my-bucket/genome.fasta - | blast-workflow

# Recursive upload (the CLI transfers multiple files concurrently)
aws s3 cp /data/results/ s3://my-bucket/results/ --recursive
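
How many transfers run at once is controlled by the CLI's S3 settings. The sketch below adjusts two of them; the helper name tune_s3_transfers and the default values are assumptions, and suitable values depend on instance size and network bandwidth.

```bash
# Hypothetical helper: raise S3 transfer concurrency for the default CLI profile.
# $1 = max concurrent requests (the CLI default is 10), $2 = multipart chunk size.
tune_s3_transfers() (
    aws configure set default.s3.max_concurrent_requests "${1:-20}" &&
    aws configure set default.s3.multipart_chunksize "${2:-64MB}"
)
```

The settings are written to the AWS CLI config file and apply to later aws s3 cp and aws s3 sync commands.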

Container Integration

AWS Batch

# Register job definition
aws batch register-job-definition \
    --job-definition-name biogrids-analysis \
    --type container \
    --container-properties '{
        "image": "biogrids/analysis:latest",
        "vcpus": 4,
        "memory": 8192
    }'

# Submit job
aws batch submit-job \
    --job-name genome-analysis \
    --job-queue biogrids-queue \
    --job-definition biogrids-analysis
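
submit-job returns as soon as the job is queued, so a workflow script usually has to poll for completion. A minimal sketch, assuming the job ID was captured from submit-job (for example with --query jobId --output text); the helper name wait_for_batch_job is hypothetical.

```bash
# Hypothetical helper: poll an AWS Batch job until it reaches a terminal state.
# Returns 0 on SUCCEEDED, 1 on FAILED, on an unknown job ID, or on a CLI error.
wait_for_batch_job() (
    job_id=$1
    interval=${2:-30}   # seconds between polls
    while :; do
        status=$(aws batch describe-jobs --jobs "$job_id" \
            --query 'jobs[0].status' --output text) || exit 1
        echo "job $job_id: $status"
        case "$status" in
            SUCCEEDED) exit 0 ;;
            FAILED)    exit 1 ;;
            # Text output renders a missing job as "None".
            None)      echo "job $job_id not found" >&2; exit 1 ;;
        esac
        sleep "$interval"
    done
)
```

Because the function returns a non-zero status on failure, it can gate later steps, such as syncing results to S3 only after the job succeeds.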

ECS/Fargate

# Task definition
family: biogrids-task
networkMode: awsvpc
requiresCompatibilities: ["FARGATE"]
cpu: "2048"
memory: "4096"
containerDefinitions:
  - name: biogrids-container
    image: biogrids/suite:latest
    essential: true
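
The task definition above still has to be registered and run. The sketch below assumes it is saved as task-definition.yaml, that AWS CLI v2 is in use (for --cli-input-yaml), and that an ECS cluster already exists; the helper name run_biogrids_task and its arguments are hypothetical.

```bash
# Hypothetical helper: register the task definition and start one Fargate task.
# $1 = ECS cluster name, $2 = subnet ID, $3 = security group ID
run_biogrids_task() (
    cluster=$1
    subnet=$2
    security_group=$3
    # Assumes the YAML above is saved as task-definition.yaml in the current directory.
    aws ecs register-task-definition \
        --cli-input-yaml file://task-definition.yaml || exit 1
    # awsvpc network mode requires an explicit network configuration.
    aws ecs run-task \
        --cluster "$cluster" \
        --launch-type FARGATE \
        --task-definition biogrids-task \
        --network-configuration \
        "awsvpcConfiguration={subnets=[$subnet],securityGroups=[$security_group],assignPublicIp=DISABLED}"
)
```

With assignPublicIp=DISABLED, the subnet needs a NAT gateway or suitable VPC endpoints so the task can pull its container image.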

Cost Optimization

Spot Instances

# Request Spot Instances with a maximum price of $0.50 per instance-hour
aws ec2 request-spot-instances \
    --spot-price "0.50" \
    --instance-count 5 \
    --launch-specification '{
        "ImageId": "ami-0abcdef1234567890",
        "InstanceType": "c5.2xlarge"
    }'
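
Before choosing a maximum price, it can help to look at what Spot capacity currently costs. A minimal sketch using describe-spot-price-history; the helper name recent_spot_prices is hypothetical, and passing the current time as --start-time is assumed to return the price currently in effect in each Availability Zone.

```bash
# Hypothetical helper: print current Spot prices per Availability Zone.
# $1 = instance type (defaults to the c5.2xlarge used above)
recent_spot_prices() (
    instance_type=${1:-c5.2xlarge}
    aws ec2 describe-spot-price-history \
        --instance-types "$instance_type" \
        --product-descriptions "Linux/UNIX" \
        --start-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
        --query 'SpotPriceHistory[].[AvailabilityZone,SpotPrice]' \
        --output text
)
```

Each output line pairs an Availability Zone with its price in USD per hour, which can be compared against the --spot-price value.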

Reserved Instances

  • Purchase Reserved Instances for predictable workloads
  • Use Savings Plans for flexible compute commitments
  • Monitor costs with CloudWatch billing alarms
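
A billing alarm of the kind mentioned in the last item can be sketched as follows. It assumes billing alerts are enabled for the account and that an SNS topic for notifications already exists; the helper name create_billing_alarm and the alarm name are hypothetical. Billing metrics are published in us-east-1, so the region is fixed.

```bash
# Hypothetical helper: alarm when estimated monthly charges exceed a threshold.
# $1 = threshold in USD, $2 = ARN of an existing SNS topic to notify
create_billing_alarm() (
    threshold_usd=$1
    sns_topic_arn=$2
    aws cloudwatch put-metric-alarm \
        --region us-east-1 \
        --alarm-name "biogrids-billing-${threshold_usd}-usd" \
        --namespace AWS/Billing \
        --metric-name EstimatedCharges \
        --dimensions Name=Currency,Value=USD \
        --statistic Maximum \
        --period 21600 \
        --evaluation-periods 1 \
        --threshold "$threshold_usd" \
        --comparison-operator GreaterThanThreshold \
        --alarm-actions "$sns_topic_arn"
)
```

The 21600-second period (six hours) reflects that estimated charges are updated only a few times per day.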

Security Best Practices

IAM Policies

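{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::my-bucket/*"
    },
    {
      "Effect": "Allow",
      "Action": "batch:SubmitJob",
      "Resource": [
        "arn:aws:batch:region:account-id:job-queue/biogrids-queue",
        "arn:aws:batch:region:account-id:job-definition/biogrids-analysis:*"
      ]
    }
  ]
}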

VPC Configuration

  • Use private subnets for compute instances
  • Configure security groups to restrict access
  • Enable VPC Flow Logs for monitoring
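
Two of the items above map directly onto AWS CLI calls. The sketch below assumes an existing security group, VPC, and S3 bucket for the logs; the helper names restrict_ssh and enable_flow_logs are hypothetical.

```bash
# Hypothetical helper: allow SSH only from one CIDR range.
# $1 = security group ID, $2 = allowed CIDR (for example, an office network)
restrict_ssh() (
    aws ec2 authorize-security-group-ingress \
        --group-id "$1" \
        --protocol tcp \
        --port 22 \
        --cidr "$2"
)

# Hypothetical helper: send VPC Flow Logs for all traffic to an S3 location.
# $1 = VPC ID, $2 = S3 ARN of the destination bucket or prefix
enable_flow_logs() (
    aws ec2 create-flow-logs \
        --resource-type VPC \
        --resource-ids "$1" \
        --traffic-type ALL \
        --log-destination-type s3 \
        --log-destination "$2"
)
```

Adding a rule does not remove existing ones, so any broader SSH rule (such as 0.0.0.0/0) still has to be revoked separately.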

Monitoring and Logging

CloudWatch Integration

# Install CloudWatch agent
sudo yum install -y amazon-cloudwatch-agent

# Create log group
aws logs create-log-group --log-group-name /biogrids/application

Custom Metrics

Track BioGrids-specific performance metrics through CloudWatch custom metrics and dashboards.
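
As one possible illustration, the sketch below times a workflow step and publishes its duration as a custom metric. The helper name timed_step, the BioGrids/Workflows namespace, and the StepDuration metric name are assumptions, not names defined by BioGrids.

```bash
# Hypothetical helper: run a command, then publish its wall-clock duration
# to CloudWatch as a custom metric. Returns the command's own exit status.
# Usage: timed_step STEP_NAME COMMAND [ARGS...]
timed_step() (
    step_name=$1
    shift
    start=$(date +%s)
    rc=0
    "$@" || rc=$?
    end=$(date +%s)
    aws cloudwatch put-metric-data \
        --namespace "BioGrids/Workflows" \
        --metric-name StepDuration \
        --dimensions "Step=$step_name" \
        --unit Seconds \
        --value "$((end - start))"
    exit "$rc"
)
```

For example, timed_step sync aws s3 sync s3://my-bucket/datasets/ /data/input/ records how long the download took, and the metric can then be graphed on a CloudWatch dashboard.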

Getting Started

  1. Plan deployment: Choose instance types and storage requirements
  2. Set up infrastructure: Launch instances and configure networking
  3. Install BioGrids: Follow standard installation procedures
  4. Configure integrations: Set up S3 access and monitoring
  5. Run workflows: Deploy your bioinformatics pipelines

Support

  • Documentation: Complete AWS deployment guides
  • Templates: CloudFormation templates for common deployments
  • Support: help@biogrids.org for AWS-specific questions

For detailed deployment assistance, contact support with your specific AWS requirements.