Thursday, 11 June 2015

Moving SQL server to AWS

Why?

Just a bit of background first - why move a SQL server to AWS, rather than using the Amazon RDS service?
If it is possible for you, I'd recommend using the Amazon RDS service, which comes in an MS SQL flavour as well, rather than managing your own SQL server in AWS.  However, this will not always be possible.  My client made extensive use of SQL Server features, including replication to an Enterprise edition SQL server used for BI, running both Reporting Services (SSRS) and Analysis Services (SSAS).  As neither replication, nor Reporting and Analysis Services, are currently supported in RDS, we had to host our own SQL server.

What was being moved

A number of applications were hosted on the same server as the SQL Server 2008 instance.  These included web services providing data connectivity for a number of web sites, as well as the back office ERP system, responsible for all processing of orders, stock management, sales, customer support, etc.
In addition, there was extensive integration with partner systems as well as with an internally hosted Sage instance, including automated supplier payments.  Most of this used SQL Server Integration Services (SSIS).
Most of the applications were .NET web applications and were being moved to Elastic Beanstalk.  The SSIS packages would be hosted on a separate SQL server in AWS - this would require some manual configuration of several of the supporting services and extensions after the initial build and configuration.
Here I concentrate on building a SQL server.

Part 1: Building a SQL server in AWS

Amazon provides machine images with SQL Server Standard where licencing is included in the running costs.  You could alternatively bring your own licence (under certain conditions), but this did not provide the flexibility we were looking for.
I am not a specialist DBA, but I believe the solution I put together provides a very reasonable SQL server installation, taking care of many of the standard configuration tasks that infrastructure engineers are probably not exposed to on a regular basis.  Parts of the solution may be unnecessary, and some are constrained by limitations of older versions of Windows Server.  An attempt was made to make it work with the Amazon AMI images of all versions of SQL Server from 2008 SP3 to 2014, but images change, and the template and scripts changed with time, so nothing is guaranteed (use at own risk - i.e. test the build!).  You will find errors, and some things are unnecessary, but generally most of what's in there is / was required for an image somewhere.

The parts that are used

A CloudFormation template is used to build the server.  While the template includes the majority of the configuration, there are a few additional scripts: a PowerShell script, New-SQLServer.ps1, that is used to kick off the build, and five files that are downloaded by the template and run - they could have been embedded in the template, but downloading them from S3 seemed a better solution.

New-SQLServer.ps1

Please note that, as shared here, the script expects a large number of parameters - you specify all of them when calling the script, but many might as well have your own defaults set (I had defaults for all except ComputerName).
Essentially, the script just does some checking on the parameters, finds the current AMI image for SQL server, and then calls New-CFNStack.
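The core of that flow can be sketched as follows - a minimal version using the AWS Tools for PowerShell, with the stack name, template URL, bucket and parameter keys as illustrative assumptions (your template's parameter names will differ):

```powershell
# Sketch of the core of New-SQLServer.ps1 (names are illustrative).
# Assumes the AWS Tools for PowerShell are installed and credentials are loaded.
Import-Module AWSPowerShell

# Find the current Amazon-provided Windows + SQL Server Standard AMI.
$ami = Get-EC2ImageByName -Name "WINDOWS_2012R2_SQL_SERVER_STANDARD_2014"

# Pass the build details through to the CloudFormation template as parameters.
$parameters = @(
    @{ ParameterKey = "AmiId";        ParameterValue = $ami.ImageId },
    @{ ParameterKey = "ComputerName"; ParameterValue = "SQL01" },
    @{ ParameterKey = "InstanceType"; ParameterValue = "m3.xlarge" }
)

# Kick off the build; CAPABILITY_IAM is needed because the template creates a policy.
New-CFNStack -StackName "sql-SQL01" `
             -TemplateURL "https://s3.amazonaws.com/my-scripts-bucket/sqlserver.template" `
             -Parameter $parameters `
             -Capability CAPABILITY_IAM
```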
Just a few notes on security - the "DomainAdmin" should NOT be a domain admin, but a domain user with just sufficient rights to create computers in the required domain or even just the specific OU.
The user running the script needs to have AWS credentials loaded, with admin rights on AWS (you can lock this down to some extent, but it does require a lot), or must assume a role that has the required rights.
Your Windows user has to have the rights to add the computer to the AD group.  This is not strictly required, but if you have Read Only Domain Controllers in AWS, and they have been set to cache credentials for only one group of computers, you want the new server to be in that group.
Also note that the credentials specified (including domain user and password), will be visible to anybody with access to view CloudFormation stacks.
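One mitigation worth considering (an addition on my part, not something the original template necessarily did): CloudFormation can mask a parameter's value in the console and API responses with NoEcho, although anyone who can update the stack can still recover it. A minimal parameter declaration would look like:

```json
{
  "Parameters": {
    "DomainUserPassword": {
      "Type": "String",
      "NoEcho": "true",
      "Description": "Password for the domain-join user (masked in console output)"
    }
  }
}
```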

The template

The template is where most of the work happens.  Rather than having all of the scripts included in the template (which is how it started), or run by the New-SQLServer script, some are downloaded from S3 to the local file system and executed from there.  I'll get to those files, but it means you have to have an S3 bucket set up with the correct scripts in it.  In addition, backups are stored in a different S3 bucket.
In addition to setting up the server, the template creates a policy that allows access to the two S3 buckets required, creates D: (data) and E: (backup) drives, and assigns the specified security group.  It's possible to create a security group during the creation, but it made more sense for me to have a single security group assigned to all SQL servers.
Note, the template won't work as it is - it contains mappings to specific subnets (you'll presumably want backend subnets) that will have to exist and match your environment.  Also note that (through pure laziness), I have hard coded the locale and time settings.
Essentially, the steps the template goes through on the server instance are the following:
1) Fix some basic issues (e.g. with firewalls), download files required later, and set the system locale and time zone, then restart.
2) Rename the computer, then restart.
3) Join the domain, then restart.
4) Change the SQL collation, and add the Agent user to a few local groups to avoid issues later.
5) Do the SQL configuration.  This runs under SQLPS, as importing the SQLPS module isn't supported on older platforms.  Essentially, this script does the following:
- Set mixed mode authentication
- Calculate and set max server memory for SQL
- Turn on backup compression where supported
- Change the SQL Server name to match the computer name
- Set default file locations to the D: drive
- Change the service account for SQL server
- Enable TCP (not enabled by default on express)
- Change service account for SQL Server agent, enable and start agent
- Add tempdb files to any ephemeral drives you have
- Configure DB Mail and enable for SQL Server Agent
- Set up Ola Hallengren's fantastic maintenance solution - you'll have to have it downloaded, and set some basic parameters, like saving to E:\SQLBackups, how long to keep backups on the server, etc.
- Schedule all the maintenance tasks - the schedule I use is as recommended.
- Create a scheduled task to upload backups to S3 on a daily basis.
6) Update Windows (it's strange how many contortions you have to go through to get PowerShell to do an update).  As images are regularly updated, this is normally a small update, but it is also one of the reasons you have to test!
7) Ensure remote management is running.
8) Tell Amazon you're done.
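To give a flavour of step 5, here is a hedged sketch of two of the items (max server memory and mixed mode authentication), runnable under SQLPS. The 2 GB OS reservation and the floor value are my assumptions, not the original script's exact numbers:

```powershell
# Illustrative fragment of the SQL configuration step (step 5), run under SQLPS.
# The memory reservation (2 GB for the OS) and 1 GB floor are assumptions.

# Calculate max server memory: total physical RAM minus an OS reservation.
$totalMB  = [int]((Get-WmiObject Win32_ComputerSystem).TotalPhysicalMemory / 1MB)
$sqlMaxMB = [Math]::Max($totalMB - 2048, 1024)

$sql = @"
EXEC sp_configure 'show advanced options', 1; RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', $sqlMaxMB; RECONFIGURE;
EXEC sp_configure 'backup compression default', 1; RECONFIGURE;
"@
Invoke-Sqlcmd -ServerInstance "." -Query $sql

# Mixed mode authentication via SMO; takes effect after a service restart.
$srv = New-Object Microsoft.SqlServer.Management.Smo.Server(".")
$srv.Settings.LoginMode = [Microsoft.SqlServer.Management.Smo.ServerLoginMode]::Mixed
$srv.Alter()
```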

Backups

I mentioned the scheduled job above.  The script triggered by the schedule looks for backups in the locations where the maintenance solution puts them, then uploads them to the defined S3 bucket.  It has three targets: Daily, Weekly and Monthly.  Set up lifecycle configuration for these three targets.  I have mine set up to keep daily backups for 15 days, weekly backups for 32 days, and monthly backups for 32 days before moving them to Glacier.
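The shape of that upload script can be sketched like this - the bucket name, the one-day window, and the rule deciding which backups count as Weekly or Monthly are all illustrative assumptions, not the original Backup-Upload.ps1:

```powershell
# Hedged sketch of the daily backup upload; bucket name and the
# Daily/Weekly/Monthly classification rule are assumptions.
Import-Module AWSPowerShell

$backupRoot = "E:\SQLBackups"
$bucket     = "my-sql-backups"

# Upload backups written in the last day; file the 1st of the month under
# Monthly, Sundays under Weekly, and everything else under Daily.
Get-ChildItem -Path $backupRoot -Recurse -Filter *.bak |
    Where-Object { $_.LastWriteTime -gt (Get-Date).AddDays(-1) } |
    ForEach-Object {
        $target = if ($_.LastWriteTime.Day -eq 1) { "Monthly" }
                  elseif ($_.LastWriteTime.DayOfWeek -eq "Sunday") { "Weekly" }
                  else { "Daily" }
        Write-S3Object -BucketName $bucket -File $_.FullName `
                       -Key "$target/$($env:COMPUTERNAME)/$($_.Name)"
    }
```

The three top-level prefixes (Daily/, Weekly/, Monthly/) are what the S3 lifecycle rules then key on.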

Files needed

The template downloads five files from a specified S3 bucket.
- SQL-Configmail.ps1 includes five additional parameters, like the SMTP server and credentials.  These could have been template parameters, but I decided it's as easy to customise the script and fetch it from S3 instead.
- MaintenanceSolution.sql - download the newest version of Ola Hallengren's solution and customise it if required.  Remember to change the backup location to E:\SQLBackups.
- SQL-ScheduleMaintenance.sql - a reasonable starting point for scheduling backup and maintenance jobs.
- Backup-Upload.ps1 - again, customise this file. 
- UploadBackups.xml - The template uses schtasks as a workaround for older versions of Windows Server, and this is the xml file that it wants as input.  Ideally create your own manually (or with PowerShell) then run schtasks /Query /XML /TN taskname.
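The create-then-export round trip for that last file looks roughly like this (task name, script path and schedule are illustrative):

```powershell
# Create the reference task once by hand (names and times are illustrative).
schtasks /Create /TN "UploadBackups" `
         /TR "powershell.exe -File C:\scripts\Backup-Upload.ps1" `
         /SC DAILY /ST 03:00 /RU SYSTEM

# Export it to the XML format that schtasks /Create /XML accepts on new builds.
schtasks /Query /XML /TN "UploadBackups" | Out-File UploadBackups.xml -Encoding ascii
```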

What's missing?

If this were all there was to configuring SQL server, there'd be lots of DBAs out of work.  I hope this provides the basics - for me this provided the base configuration that was then (with more scripts) used to mirror the live database before failing over to this server.  I built a second server to act as mirror and a third (Express) to act as witness.  A fourth server hosted all SSIS jobs and acted as distributor.
All credentials were copied from the correct servers and pretty much everything else was scripted as well.
As a bonus, it was possible to automate a new test server by building it, downloading the most recent backup and anonymising the database.  It's also possible to script a rolling server upgrade (using mirroring), although halfway through you need to pause and update connection strings in all the applications.
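The pause point in that rolling upgrade is the mirroring failover itself, which is a one-liner per database run on the current principal - a sketch, with database and instance names purely illustrative:

```powershell
# Hedged sketch of the failover step in a rolling upgrade: on the current
# principal, fail each mirrored database over to the (already upgraded) mirror.
foreach ($db in @("Orders", "Customers")) {
    Invoke-Sqlcmd -ServerInstance "SQL01" `
                  -Query "ALTER DATABASE [$db] SET PARTNER FAILOVER;"
}
# After failover, update application connection strings to point at the new
# principal (or include Failover Partner=... in the connection strings upfront).
```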

I would appreciate any feedback.