Improving my OpenVPN Ansible Playbook

I had a working OpenVPN configuration, but it wasn’t the best it could be. I went through the OpenVPN 2.3 manpage (community.openvpn.net/openvpn/wiki/Openvpn23ManPage) to find particularly interesting options.

For most of the changes I had to find examples and more information through Googling; blog.g3rt.nl/openvpn-security-tips.html is of particular note for popping up very often.

Improving TLS Security

  1. Added auth SHA256 so the HMACs on individual packets use SHA256 instead of SHA1.

  2. Added tls-version-min 1.2 to drop SSLv3 + TLS 1.0 support. This breaks older clients (2.3.2+ is required), but those updated versions have been out for a while.

  3. Restricted the allowed tls-cipher list to a subset of Mozilla’s modern cipher list, plus DHE for older clients. ECDSA support is included for when ECDSA keys can be used. I’m uncertain of the usefulness of the ECDHE ciphers: both my client and server support them, yet the RSA cipher that’s 3rd in the list is still used. I’m continuing to investigate this.

The last 2 changes are gated by the openvpn_use_modern_tls variable, which defaults to true.
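Taken together, the server-side directives end up looking roughly like the sketch below (the cipher list here is illustrative, not the literal list from the playbook):

# Packet HMACs use SHA256 instead of the SHA1 default
auth SHA256
# Drop SSLv3/TLS 1.0; both ends need OpenVPN 2.3.2+
tls-version-min 1.2
# Illustrative subset: ECDHE (ECDSA and RSA) first, DHE-RSA as the fallback for older clients
tls-cipher TLS-ECDHE-ECDSA-WITH-AES-256-GCM-SHA384:TLS-ECDHE-RSA-WITH-AES-256-GCM-SHA384:TLS-DHE-RSA-WITH-AES-256-GCM-SHA384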

  1. New keys are 2048 bit by default, downgraded from 4096 bit. This is based on Mozilla’s SSL guidance, combined with the expectation of being able to use ECDSA keys in a later revision of this playbook.

  2. As part of the move to 2048 bit keys, the 4096 bit DH parameters are no longer distributed. They were originally distributed because generating them took ~75 minutes; the new 2048 bit parameters take considerably less time to generate.

Adding Cert Validations

OpenVPN has at least two kinds of certificate validation available: (Extended) Key Usage checks, and certificate content validation.

EKU

Previously only the client verified that the server cert had the correct usage; now the verification is bi-directional. (More about OpenVPN’s EKU checks: 1 & 2.)
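This kind of check is typically done with OpenVPN’s remote-cert-tls option on both sides; a sketch:

# client config: only accept a server cert carrying the TLS Web Server EKU
remote-cert-tls server
# server config: only accept client certs carrying the TLS Web Client EKU
remote-cert-tls client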

Certificate content

Added the ability to verify the common name that is part of each certificate. This required changing the common names that each certificate is generated with, which means that the ability to wipe out the existing keys was added as well.

The server verifies client names by looking at the common name prefix using verify-x509-name ... name-prefix, while the client checks the exact name provided by the server.
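As a sketch, with hypothetical common names (client certs issued as client-<name>, the server cert as server.vpn.example.com):

# server config: accept any client whose CN starts with the prefix
verify-x509-name "client-" name-prefix
# client config: require the server cert's CN to match exactly
verify-x509-name "server.vpn.example.com" name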

Again, both these changes are gated by a variable (openvpn_verify_cn). Because this requires rather large client changes, it is off by default.

Wiping out & reinstalling

Added the ability to wipe out & reinstall OpenVPN. Currently it leaves firewall rules behind, but other than that everything is removed.

Use ansible-playbook -v openvpn.yml --extra-vars="openvpn_uninstall=true" --tags uninstall to just run the uninstall portion.

Connect over IPv6

Previously, you had to explicitly use udp6 or tcp6 to use IPv6. OpenVPN isn’t dual-stacked if you use plain udp/tcp: if the server has an AAAA record and your device has a functional IPv6 connection, the client may pick the IPv6 stack on its own and then be unable to connect.

Since this playbook only targets Linux, which supports IPv4 connections on IPv6 sockets (github.com/OpenVPN/openvpn/blob/master/README.IPv6#L50), the server config is now IPv6 by default, by means of using {{openvpn_proto}}6.
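In the config template that amounts to a single Jinja2 line, roughly (assuming openvpn_proto is udp or tcp):

# a udp6/tcp6 socket on Linux accepts IPv4 connections as well
proto {{openvpn_proto}}6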

Hat tip to T-Mobile for revealing this problem with my setup.

To-do

  1. Add revoked cert check

  2. Generate elliptic curve keys instead of RSA keys. However, as noted above, ECDHE ciphers don’t appear to be supported, so I’m not sure if OpenVPN will support EC keys.

  3. Add IPv6 within tunnel support (Possibly waiting for OpenVPN 2.4.0, since major changes are happening there)

This SO question seems to be my exact situation.

Both this SO question and another source are possibly related as well.

Tried splitting the assigned /64 subnet with:

ip -6 addr del 2607:5600:ae:ae::42/64 dev venet0
ip -6 addr add 2607:5600:ae:ae::42/65 dev venet0
  4. Investigate using openssl ca instead of openssl x509 (the next version of easyrsa uses ca)


Using Amazon S3 + CloudFront + Certificate Manager to get seamless static HTTPS support

TL;DR: This post documents the process I took to get S3 to return redirects for requests over HTTP + HTTPS to a given domain.

I’m trying to trim down the number of domains and subdomains that I host on my server, since I’m trying a new policy of moving servers every few months in an attempt to make sure I automate everything I can.

One of the things that I’ve done was to start consolidating static files under a single subdomain, and use 301 redirects in nginx to point to the new location. Thanks to Ansible, rolling out the redirect config is a matter of adding a new domain + target pair to a .yml file and running it against a server.

But it’d still be nice if I didn’t have as many moving parts, which meant I looked at ways to get an external provider to host this for me. I decided to give S3’s static website hosting a try to see if it supported everything I wanted it to do. In this case, I want to return either a redirect to a fixed URL, or a redirect to a different domain but the same filename. Essentially, my nginx redirect configs are either return 301 kyle.io/$request_uri or return 301 kyle.io/fixed-location.

For the purposes of this post, let’s assume I have the domain tw.kyle.io that redirects to my Twitter profile.
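As a sketch, one of those nginx server blocks looks roughly like this (simplified; listen/ssl directives omitted):

server {
    server_name tw.kyle.io;
    # either keep the request path and just swap the host...
    # return 301 https://kyle.io$request_uri;
    # ...or send everything to one fixed URL
    return 301 https://twitter.com/lightweavr;
}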

Creating the S3 Bucket

I knew that S3 could do static site hosting – but the docs seem to indicate that while redirecting to a different domain is possible, it will still use the same path to the file. So this would work for the different domain, same file name case, but not the fixed URL case.

I found my solution in an example of redirecting on an HTTP 404 error – but it could be adjusted to redirect all the time by removing the Condition elements.
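For reference, the 404-only variant of such a rule would look roughly like this (a sketch; the unconditional rules I actually used follow below):

{
  "RoutingRules": [
    {
      "Condition": {
        "HttpErrorCodeReturnedEquals": "404"
      },
      "Redirect": {
        "HostName": "twitter.com",
        "Protocol": "https",
        "ReplaceKeyWith": "lightweavr"
      }
    }
  ]
}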

So let’s create the bucket with the AWS CLI: aws s3api create-bucket --bucket tw-kyle-io --create-bucket-configuration LocationConstraint=us-west-2

Then I had to apply the redirection rules. The CLI uses JSON to specify the redirect rules, so to make things simple, I dumped the config into a file, website.json:

{
  "IndexDocument": {
    "Suffix": "redirect.html"
  },
  "RoutingRules": [
    {
      "Redirect": {
        "HostName": "twitter.com",
        "Protocol": "https",
        "ReplaceKeyWith": "lightweavr"
      }
    }
  ]
}

and then called it through the CLI: aws s3api put-bucket-website --bucket tw-kyle-io --website-configuration file://website.json. Note that the use of IndexDocument is misleading: I never actually created a file in S3, but the S3 API requires that something be specified for that key.

People who have created a static website in S3 might be saying that I used a bad bucket name, because now I can’t use CNAMEs with the bucket. Well, I have the benefit of writing this after discovering that HTTPS requests to the bucket don’t work anyway: S3 doesn’t support HTTPS on the website endpoint. CloudFront doesn’t have restrictions on bucket naming, especially since we’re going to use the website endpoint so we can use the redirection rules. (Hat tip to a StackOverflow question and another one as well.)

Amazon Certificate Manager

So I ended up using CloudFront. But first, I had to create a certificate with ACM. Because I’m creating a subdomain cert, I had to use the CLI – the console only allows you to create a *.domain.com or domain.com cert.

aws acm request-certificate --domain-name tw.kyle.io --domain-validation-options DomainName=tw.kyle.io,ValidationDomain=kyle.io

I had to approve the certificate creation, which required waiting for the email. If you try to create the CloudFront distribution before the cert is issued, you get an error about the cert not existing: A client error (InvalidViewerCertificate) occurred when calling the CreateDistribution operation: The specified SSL certificate doesn't exist, isn't valid, or doesn't include a valid certificate chain.

CloudFront Magic

Then it was time to configure the CloudFront distribution. That isn’t trivial: there are a bunch of options, and it wouldn’t surprise me if I screwed something up. To get the options below, I created a distribution through the AWS console, then compared that against a generated template (aws cloudfront create-distribution --generate-cli-skeleton).

I put the following in a file named cf.json:

{
    "DistributionConfig": {
        "CallerReference": "tw-kyle-io-20160402",
        "Aliases": {
            "Quantity": 1,
            "Items": [
                "tw.kyle.io"
            ]
        },
        "DefaultRootObject": "",
        "Origins": {
            "Quantity": 1,
            "Items": [
                {
                    "DomainName": "tw-kyle-io.s3-website-us-west-2.amazonaws.com",
                    "Id": "tw.kyle.io-redirect",
                    "CustomOriginConfig": {
                        "HTTPPort": 80,
                        "HTTPSPort": 443,
                        "OriginProtocolPolicy": "http-only",
                        "OriginSslProtocols": {
                            "Quantity": 1,
                            "Items": [
                                "TLSv1.2"
                            ]
                        }
                    }
                }
            ]
        },
        "DefaultCacheBehavior": {
            "TargetOriginId": "tw.kyle.io-redirect",
            "ForwardedValues": {
                "QueryString": false,
                "Cookies": {
                    "Forward": "none"
                },
                "Headers": {
                    "Quantity": 0
                }
            },
            "TrustedSigners": {
                "Enabled": false,
                "Quantity": 0
            },
            "ViewerProtocolPolicy": "allow-all",
            "MinTTL": 86400,
            "AllowedMethods": {
                "Quantity": 2,
                "Items": [
                  "HEAD",
                  "GET"
                ],
                "CachedMethods": {
                    "Quantity": 2,
                    "Items": [
                      "HEAD",
                      "GET"
                    ]
                }
            },
            "SmoothStreaming": false,
            "DefaultTTL": 86400,
            "MaxTTL": 31536000,
            "Compress": true
        },
        "Comment": "Redirect for tw.kyle.io",
        "Logging": {
            "Enabled": false,
            "IncludeCookies": false,
            "Bucket": "",
            "Prefix": ""
        },
        "PriceClass": "PriceClass_100",
        "Enabled": true,
        "ViewerCertificate": {
            "ACMCertificateArn": "arn:aws:acm:us-east-1:111122223333:certificate/3f1f4661-f01b-4eef-ae0d-123412341234",
            "SSLSupportMethod": "sni-only",
            "MinimumProtocolVersion": "TLSv1",
            "Certificate": "arn:aws:acm:us-east-1:111122223333:certificate/3f1f4661-f01b-4eef-ae0d-123412341234",
            "CertificateSource": "acm"
        }
    }
}

and then ran the command aws cloudfront create-distribution --cli-input-json file://cf.json.

Testing it

The first thing to test was simply curling the newly created distribution, after waiting ages for it to actually be created. (I assume CloudFront is just continually working through a queue of distribution changes to each of the 40+ POPs, so it’s actually quite awesome.)

$ curl -v d28h66yisj403x.cloudfront.net
* Rebuilt URL to: d28h66yisj403x.cloudfront.net/
>clipped<
< HTTP/1.1 301 Moved Permanently
< Location: twitter.com/lightweavr
< Server: AmazonS3
< X-Cache: Miss from cloudfront
* Connection #0 to host d28h66yisj403x.cloudfront.net left intact
$ curl -v d28h66yisj403x.cloudfront.net
* Rebuilt URL to: d28h66yisj403x.cloudfront.net/
>clipped<
< HTTP/1.1 301 Moved Permanently
< Location: twitter.com/lightweavr
< Server: AmazonS3
< Age: 11
< X-Cache: Hit from cloudfront

Would you look at that, both the HTTP & HTTPS connections worked!

So I changed the CNAME in CloudFlare to point to the new distribution, and tried again, this time with tw.kyle.io.

$ curl -v tw.kyle.io
* Rebuilt URL to: tw.kyle.io/
>clipped<
< HTTP/1.1 301 Moved Permanently
< Location: twitter.com/lightweavr
< Age: 577
< X-Cache: Hit from cloudfront
< Via: 1.1 f360bbb3d1999b5324e1d7ae31da1d7e.cloudfront.net (CloudFront)
< X-Amz-Cf-Id: kl8eZMDzH2BB7T3owENtjFkS2xtfcwqoOsZ4-SNxLY8LMdupbXrp9Q==
< X-Content-Type-Options: nosniff
< Server: cloudflare-nginx
< CF-RAY: 28d9413b3943302a-YYZ

Because I use CloudFlare’s caching layer, parts of the response are overwritten by CloudFlare (notably the Server header, among others). But we can still see that the request ultimately hit CloudFront. The request was still redirected, so everything looks good.

In Closing

The main thing I didn’t like about this setup? The obtuse documentation. I had a far better grasp of everything by working through the console first, then applying that to the CLI. But there are still tiny things.

  1. I hit the ancient ‘Conflicting Conditional Operation’ issue about S3 buckets being quick to be deleted from the console, but not actually deleted in the backend. So when I mistakenly thought that --region us-west-2 would be enough to get the S3 bucket to be created in us-west-2, it took ~1 hour to become useable again.

  2. Passing --region us-west-2 to aws s3api create-bucket will create a S3 bucket in the default region, us-east-1. You have to specifically pass --create-bucket-configuration LocationConstraint=us-west-2 to get it created in a specific region.

  3. There’s aws s3 and aws s3api. Why aren’t the s3api operations merged into s3? No other service has an api suffix.

  4. Having to use trial and error to find out which parameters are required and which aren’t for different operations. The CloudFront create-distribution command was the worst offender of the commands I ran, just from the sheer number of parameters. I’m hoping the documentation improves before it comes out of beta.

  5. Weird UI bugs/features. Main one I noticed was the ACM certificate selector not being selectable unless a cert exists… and the refresh button is included in that. So the very first time I created a CloudFront distribution, I needed to refresh the page to be able to select the cert, losing my settings.

Now for the important question: Now that I’ve set it up, will I keep on using it? The simple answer is that I’m not sure. It’s nice that it’s offloaded to another provider that keeps it going without my intervention, and that after setting it up it won’t change. But at the same time, it’s another monthly expense. I’ve already put money into a server, and the extra load of a redirection config is negligible thanks to Ansible. There is some overhead with creating & managing Let’s Encrypt certs, particularly with the rate-limiting, and these redirects are entire subdomains, but I just set up an Ansible Let’s Encrypt playbook to run weekly, and the certs will be kept up to date.

It’s going to come down to how much I get billed for this.


Using the Ansible Slurp module

I recently discovered the slurp module within Ansible when I was attempting to find new modules in Ansible 2.0. It is particularly interesting for me since I’ve been doing a bunch of stuff involving the contents of files on remote nodes for my OpenVPN playbook. So I decided to try using it in one of my latest playbooks and see how much better it is than doing command: cat <file>.

Using it

My use case for slurp was checking if a newly bootstrapped host was Fedora 22, and upgrading it to Fedora 23 if it was. The problem in this case was that recent versions of Fedora don’t come with Python 2, so we can’t use fact gathering to find the version of Fedora (and need to install Python 2 before we do anything).

The suggested method was to install python using the raw command, and then run the setup module to make the facts available.

But I was going to reboot the node right after the install in any case, and I didn’t feel like running the full setup module, so this was a perfect place to try the slurp module.

Using it is simple – there’s only one parameter: src, the file you want to get the contents of.

Similarly, using the results is also simple, with one exception: The content of the file is base64 encoded, so it must be decoded before use. Thankfully, Ansible/Jinja2 provides the b64decode filter to easily get the contents into a usable form.

My final playbook ended up looking something like this:

gather_facts: no
tasks:
    - name: install packages for ansible support
      raw: dnf -y -e0 -d0 install python python-dnf
    - name: Check for Fedora 22
      slurp:
        src: /etc/fedora-release
      register: fedora
    - name: Upgrade to Fedora 23
      command: dnf -y -e0 -d0 --releasever 23 distro-sync
      when: '"Fedora release 22" in fedora.content|b64decode'

Functionally, it’s pretty much identical to using the old command: cat <file>, register, and when: xyz in cmd.stdout style to get & use the contents of files. All of those elements are still there, just renamed at most; register is still being used unmodified.
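For comparison, a sketch of roughly what the old style would have looked like here (not from the original playbook):

    - name: Check for Fedora 22
      command: cat /etc/fedora-release
      register: fedora
      changed_when: false  # otherwise the cat reports a change on every run
    - name: Upgrade to Fedora 23
      command: dnf -y -e0 -d0 --releasever 23 distro-sync
      when: '"Fedora release 22" in fedora.stdout'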

The fact that I’m using a dedicated module for it though makes my playbook look a lot more Ansible-ish, which is something I like. (And the fact I don’t need to have a changed_when entry is a strong plus for code cleanliness.)


Backing up & restoring Jenkins

I’m moving my Jenkins instance to a new server, which means backing it up & restoring it.

Backup

The nice thing about it is that it’s almost entirely self-contained in /var/lib/jenkins, which means I really only have 1 directory to backup.

I’m using duply to back the folder up – but it’s 1.9GB in size. So to save space & bandwidth, I’m going to exclude certain files. This is the content of my /etc/duply/jenkins/exclude file:

**/*.rpm
**/plugins/*/
**/plugins/*.jpi
**/plugins/*.bak
**/workspace
**/.jenkins/war

The main thing I’m excluding is the build artifacts – because I’m building RPMs, the SRPMS are rather large (nginx-pagespeed SRPMs weigh in at 110+MB), so I exclude all files ending in .rpm.

Next, I’m excluding most of the stuff in the plugins folder. My reasoning is that the plugins themselves are downloadable. However, Jenkins disables plugins or pins them to the currently installed version by creating empty files of the form <plugin>.jpi.disabled or <plugin>.jpi.pinned, and I want those settings to carry over. Unfortunately, trying + **/plugins/*.jpi.pinned showed that everything else got removed from the backup. I’m assuming this is because using an inclusive rule changed the default include to a default exclude.

In any case, I end up explicitly excluding things I don’t care about, which is probably good if something that I might need ever ends up in the plugins folder.

I also exclude workspace because everything can be recreated by building from specific git commits if need be. The job information is logged in jobs/, so I can easily find past commits even though the workspace itself no longer exists.

Finally, I also exclude the jenkins war folder. I believe that this is an unpacked version of the .war file that gets installed to /usr/lib/jenkins. It seems to get created when I start jenkins itself.

With just these 6 excludes, I’ve dropped the backup archive size down to <5 MB, which is a big win.

Normally I’d just take a live backup while Jenkins is running, but in this case, since I’m moving servers, I completely shut down Jenkins before taking a final backup with duply jenkins full.

Restoring

For the restore, I first installed Jenkins using a Jenkins playbook from Ansible Galaxy. It’s fairly barebones, but it works well – and I don’t need to spend time developing my own playbook. I also installed duply, and I manually installed an updated version of duplicity for CentOS 7 from koji to get the latest fixes.

Once I got duply set back up, I restored all the files to a new folder with duply jenkins restore /root/jenkins. I restored it to a separate folder because duply appears to remove the entire destination folder if it exists, and I wanted to merge the two folders.

After the restore was complete, I ran rsync -rtl --remove-source-files /root/jenkins/ /var/lib/jenkins to merge the restored data into the newly installed Jenkins instance.

At this point, everything should have worked, except I was unable to login. After spending some time fruitlessly searching Google, I ran chown -R jenkins:jenkins /var/lib/jenkins, as the rsync didn’t preserve the file owner when it created the new files. Luckily enough, that fixed the problem, and I could now login.
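Put together, the restore boiled down to something like this (a sketch of the steps described above, using /root/jenkins as the scratch directory):

duply jenkins restore /root/jenkins
# merge into the freshly installed instance instead of replacing it outright
rsync -rtl --remove-source-files /root/jenkins/ /var/lib/jenkins
# rsync -rtl doesn't preserve ownership, so fix it up
chown -R jenkins:jenkins /var/lib/jenkins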

I then spent a few hours working all this into an Ansible playbook so future moves are much easier.


Ansible: Using register with with_items

The motivation for this came from trying to run a command that depended on whether or not a previous command succeeded.

In this case, I was trying to make the creation of duply profiles idempotent. Duply will exit with an error if you attempt to create a profile that already exists, and I didn’t want that to interrupt my playbook.

My first thought was “stat the folder & check if it exists”, which got me this:

- name: Check if profile exists
  stat: >
    path={{duply_path}}/{{item.key}}/
  with_dict: "{{folders}}"
  register: profile

- name: Create duply profiles
  command: duply {{item.0}} create
  with_together:
    - "{{folders}}"
    - profile.results
  when: item.1.stat.isdir is not defined

I semi-abused the with_together operator to link both my dict of folders & the results of the stat on each of the folders.

It turns out each element in the result from the stat call also contains a copy of the item it was called with, which meant my code could be reduced to:

- name: Check if profile exists
  stat: >
    path={{duply_path}}/{{item.key}}/
  with_dict: "{{folders}}"
  register: profile

- name: Create duply profiles
  command: duply {{item.item.key}} create
  with_items: profile.results
  when: item.stat.isdir is not defined

The key difference is that I used {{item.item}} to get access to the original item that was passed to the stat task.

Of course, it turns out that I don’t need the extra stat call, since I can just use the creates parameter to the command task, making my code even shorter:

- name: Create duply profiles
  command: duply {{item.key}} create creates={{duply_path}}/{{item.key}}
  with_dict: "{{folders}}"

I ended up dropping the whole register bit, but I wanted to note this down for the future, because it’s fairly interesting behaviour.


Checking a SSL certificate’s expiry date with Python

Before I found the --keep-until-expiring option in the Let’s Encrypt command line client, I was thinking I’d have to parse the cert, extract the expiry date, then check it against the current date before returning True or False.
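A rough sketch of that approach, assuming pyOpenSSL is available and using a hypothetical cert path:

from datetime import datetime
from OpenSSL import crypto

def cert_is_current(path="/etc/letsencrypt/live/example.com/cert.pem"):
    """Return True if the certificate at `path` has not expired yet."""
    with open(path, "rb") as f:
        cert = crypto.load_certificate(crypto.FILETYPE_PEM, f.read())
    # get_notAfter() returns ASN.1 GENERALIZEDTIME, e.g. b'20160601120000Z'
    not_after = datetime.strptime(cert.get_notAfter().decode("ascii"), "%Y%m%d%H%M%SZ")
    return datetime.utcnow() < not_after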

Thankfully I found the much easier option, but I decided to post the code I wrote to read the date just in case I need something like it in the future. Read the rest of this entry »


Upgrading to Fedora 23 on OpenVZ

TL;DR: Run dnf --releasever 23 distro-sync instead of dnf system-upgrade on OpenVZ systems.

I run Fedora on my servers almost exclusively. This means I usually fall behind in upgrading to the latest release, leading me to wonder why I don’t just go with the latest version of CentOS.

Then I have lovely cases where CentOS gets horribly outdated, and I remember why I like Fedora with its latest and greatest. (Yes I do like shiny things, thank you very much)

My servers are mostly OpenVZ based, for the simple fact that OpenVZ powered VPSes are rather cheap for what you get, especially since I don’t need high performance. There’s just one bad thing about being OpenVZ based: I have no control over the kernel/boot sequence. The vast majority of the time, this isn’t an issue. Sadly, using dnf system-upgrade is one of the times when it is an issue.

Fedora 22 brought in a new way to upgrade your system – dnf system-upgrade. I’ve used it on my laptop, it’s pretty good compared to fedup and past solutions. However, the one thing that rarely failed me in the past was using the yum distro-sync functionality. (The only time I’ve had an issue with it was when the upgrade was stopped midway, but that’s another story.)

Read the rest of this entry »


Let’s not Encrypt on CentOS5

TL;DR: Let’s Encrypt requires a newer version of OpenSSL than CentOS 5 has installed. Unless you want to mess around with compiling OpenSSL yourself, don’t try it.

When your friend will upgrade his CentOS 5 system “someday”

Read the rest of this entry »


Let’s Encrypt ALL THE THINGS

Got my first domain using a cert from Let’s Encrypt in under ~10 minutes, including setting up Let’s Encrypt itself. Yes, this is rather game changing.

Now to write ansible playbooks around it, and figure out how to get it working for proxied domains automatically.

Read the rest of this entry »


On interviews and looking back at the future

I don’t think I’ve published a post on my job searches as part of my time in the SoftEng program at UWaterloo, even though I’ve got a few in my drafts.

Since I’ve had my last interview (and Jobmine has closed & matched), here’s some fun stats without naming companies:

  • Favourite interviewer line: “So our alphabet ends with Zee… actually wait, you go to Waterloo, right? So Zed.”
  • Favourite interviewer compliment: “If we hired based on domain name, you’d be our first choice!”
  • Most common uncommon interview question: “What’s your favourite TF2 class, and why?”
  • Most common common interview question: “Tell me about yourself!” (Still don’t have a fixed answer for this)
  • Most unexpectedly large number of interviews from a small-ish company: 4
  • Most unexpectedly small number of interviews from a large company: 1 (+ code test)
  • Weirdest application experience: Rejected on Jobmine, requested on LinkedIn (larger company, left hand doesn’t know what the right is doing)
  • Weirdest process experience: “Thanks for your interest in the Winter 2016 Internship position! We’d like to schedule your interview ASAP!” *interview* “Thanks for your interest in the Fall 2016 Internship position! We’d like to schedule your interview ASAP!”
  • Most annoying experience: “We’ll definitely consider you for the SRE position since you explicitly requested it!” “Here is the interview schedule for your SWE interviews”
  • # of code problems in phone/skype interviews left unfinished: 3
  • Best code writing platform: Coderpad
  • Worst code writing platform: Google Docs (“Let me just create some indentation groups for you”)
  • # of technologies to investigate: 6 (Stemcell, Consul, Elasticsearch, Kibana, Fluentd, Grafana)

Read the rest of this entry »
