Or: How I spent an afternoon doing a deep dive into the RPM spec and solving a problem for myself
tl;dr – Nginx Mainline packages are being built for Fedora & CentOS at copr.fedoraproject.org/coprs/kyl191/nginx-mainline/
My webserver’s running nginx 1.4.7, a version that hasn’t gotten non-bugfix attention since March 2013, according to the changelog. Oddly enough for a Fedora package, the version in koji is the stable branch – something that makes sense for CentOS/EPEL since that’s a long-term support release, but not for Fedora. Doubly annoying was the fact that the packages for Fedora 20 (because Fedora 21 isn’t supported on OpenVZ yet) were one step behind the official stable – 1.4 instead of 1.6.
If I wanted mainline on Fedora, I was going to have to build it from source.
So I wrote a bunch about VPSes a few months ago, and what I thought my future looked like with them. Well, a bunch has changed since then, and more will change going forward, so let’s go:
Full out cloud hosting
Still good for buy-as-you-need systems, still not right for my usecase. Other than reading about price drops in The Register/on Twitter, I’ll be passing over them.
Haven’t seen too much movement on this. I’m going to lump RamNode (www.ramnode.com) into the Cloudy VPSes section, simply because they’re at 5 locations. Sheer scale. Also, their prices are approaching Digital Ocean levels. I’d still go with DO though – RamNode’s cheaper plans are OpenVZ-based, not KVM like DO’s.
This is probably the most significant change – instead of migrating to a Digital Ocean droplet/Vultr node, I decided to go with Crissic.
I haven’t actually had my VPS with them go down in ~1 year (yet), and the weird SSH issues resolved themselves around December, so I decided to bite the bullet and extend my existing VPS plan with them. (Or should I say him – all the support tickets have been signed Skylar, so it’s looking like a one-man operation.) Crissic consistently comes in the top 5 in the LowEndTalk forum provider poll, so that was additional validation.
With Nginx/PHP-FPM/MariaDB all going, I’m hovering around 300MB of RAM used. So I have a bunch of headroom, which is good.
I also picked up a bunch of NAT-ed VPSes – MegaVZ is offering five 512MB nodes spread around the world for 20€, so I got them as OpenVPN endpoints. They’re likely going to be pressed into service as build nodes for a bunch of projects (moving Jenkins off my Crissic VPS), but those plans are still up in the air. We shall see…
The most interesting thing I found was Server Bidding. Hetzner (a massive German DC company) auctions off servers that were set up for customers who have since cancelled their service. Admittedly, most of their cheaper stuff is consumer grade (i7s instead of Xeons), but I can’t really complain. There’s an i7-3770 with 32GB of RAM and 2x 3TB drives going for €30.25/USD$34 right now. And prices only go down over time (until someone takes it).
That mix of price/specs is pretty much unmatchable. KVM servers are ~$7/month for 1GB of RAM (and prices scale roughly linearly), so I’d be looking at $28/month for 4GB of RAM if I were to go the normal route. And that’s totally ignoring the 2x 3TB HDDs (admittedly, I’d want these in RAID 1, so effectively only 3TB).
- Tasks do not like having the remote_user changed mid-playbook if you specify an SSH password
- Specifically, having an ‘ansible’ user created as the first task, then using that for everything in the rest of the playbook doesn’t work because ansible will always attempt to use the declared password for the newly created user, which promptly fails
- Solution: Separate playbooks!
- I kind of like the idea of having a new server group defined for the initial playbook
- People debate the usefulness of a separate config account, since it’s effectively root + key-based login. They have a point, since I can get the same security with disabling password-based auth.
- If your inventory file is executable (+x), ansible will attempt to execute it, even if it is the standard inventory list (.INI-format plain text)
- Particularly annoying for me because I’m running ansible in a VirtualBox-powered VM and using shared folders. Files in a shared folder show up with permissions of 777 inside the Linux VM, and those can’t be changed
- Solution: Python script that dumps host list to JSON
- The Ansible docs aren’t the greatest – groups of groups are possible in the static inventory list, but in a dynamic list you need to declare an empty list of hosts, thanks to a naive assumption in the ansible core
- I’m liking the look of the Ansible for DevOps book
- Ansible will automatically import GPG keys during a yum install if the matching GPG key hasn’t been imported yet (Seen after installing EPEL, then installing a package from there)
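The "Python script that dumps host list to JSON" might look roughly like the sketch below. It is not the actual script. It assumes Ansible's dynamic inventory contract: the script is run with `--list` (or `--host <name>`) and prints JSON to stdout. All host and group names are hypothetical placeholders. The parent group carries an explicit empty `hosts` list next to `children`, following the note above about groups of groups.

```python
#!/usr/bin/env python3
"""Minimal Ansible dynamic inventory sketch; hosts and groups are hypothetical."""
import json
import sys

# Hypothetical hosts; substitute your own servers.
GROUPS = {
    "webservers": ["web1.example.com"],
    "vpn": ["vpn1.example.com", "vpn2.example.com"],
}


def build_inventory():
    """Return the structure Ansible reads from `--list`."""
    inventory = {name: {"hosts": list(hosts)} for name, hosts in GROUPS.items()}
    # Group of groups: declare an explicit (empty) host list next to "children".
    inventory["allservers"] = {"hosts": [], "children": sorted(GROUPS)}
    # Empty hostvars under _meta, so Ansible doesn't call --host once per host.
    inventory["_meta"] = {"hostvars": {}}
    return inventory


def main(argv):
    if len(argv) == 2 and argv[1] == "--list":
        print(json.dumps(build_inventory(), indent=2))
    elif len(argv) == 3 and argv[1] == "--host":
        print(json.dumps({}))  # no per-host variables in this sketch
    else:
        sys.stderr.write("usage: inventory.py --list | --host <hostname>\n")
        return 1
    return 0


if __name__ == "__main__" and len(sys.argv) > 1:
    # Ansible always invokes the script with --list or --host <name>.
    sys.exit(main(sys.argv))
```

Since Ansible is meant to execute this file, the 777 permissions forced by the shared folder stop being a problem.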
The VPS market is really really interesting to watch. Maybe it’s just me, but the idea of being able to get a year of a decent system for the price of a pizza is fascinating – and somewhat dangerous to my wallet. At my peak, I had 4 VPSes running at the same time – and each of them doing practically nothing. So I ran TF2 servers off the ones that could support it until the prepaid year was over.
It’s just so… attractive. My ‘own’ servers, without the expense of needing to buy pizza boxes that sound like hurricanes and are apparently good places for cat puke. And now I find myself needing to get one to power everything in the near future, because I’ve been unsatisfied with Dreamhost for about 2 years, and they finally pushed me over the edge recently. So I decided to take notes, with the goal of migrating everything of mine to a VPS in December.
My use case can be summed up simply as an always-on, low-traffic webserver hosting a personal blog (WordPress, while I look at Ghost), a MySQL database and a Jenkins build server (them Lightroom plugins!). Somewhat price-sensitive – I’m not made of money, but at the same time I’m not going to sweat the price of a coffee at Starbucks. Bonus points for extra space for hosting friends’ sites though. They can pay me back by buying me a drink or something.
Cloud hosting is hard to quantify – off-hand, the big three are Amazon, Google and Azure. But I’m going to expand my definition to cloud = anything on-demand with per-usage charges. Which opens it up.
The big 3 are pretty much unsuitable. Amazon’s still aiming toward companies that need to spin instances up and down – and it quickly prices itself out compared with the alternatives. Even looking at the heavy-utilization reserved instances, that’s $51+ upfront per year, plus hourly fees, for the smallest system (t2.micro with 1GB of RAM & a throttled CPU). But wait, that’s actually $4.25 + $2.16 = $6.41 per month (assuming the current $0.003/hour charge stays the same), which isn’t so bad.
Except I had a wonderful manager in the past who pointed out that it’s not the cost of the compute power that was the killer when it came to building a company on EC2, it’s the cost of the outbound bandwidth. Data out is still an extra fee. Oh, and disk space is charged per GB as well. The wonders of all those options!
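The $6.41 figure above is the annual upfront payment spread over 12 months, plus the hourly charge over roughly 720 hours (30 days). Here is a small sketch of that arithmetic, using the prices quoted in this post rather than current AWS pricing. Like the headline figure, it leaves out outbound data and per-GB disk.

```python
HOURS_PER_MONTH = 720  # 30 days x 24 hours, the approximation used above


def reserved_monthly_cost(upfront_per_year, hourly_rate, hours=HOURS_PER_MONTH):
    """Upfront fee amortised per month, plus hourly fees for an always-on box."""
    return upfront_per_year / 12 + hourly_rate * hours


# t2.micro heavy-utilization reserved instance, prices as quoted in the post
monthly = reserved_monthly_cost(upfront_per_year=51.0, hourly_rate=0.003)
print(f"${monthly:.2f} per month")  # $6.41 per month
```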
Google Compute Engine and Microsoft Azure are pretty much the same. Google wants $6.05 a month for their lowest system (0.6GB of RAM & 1 “Virtual Core”, likely aka a throttled CPU much like Amazon’s); Microsoft just states an estimated price of $13/month for an A0 instance, with 1 core and 0.75GB of RAM. Here’s a nice difference though: it comes with 20GB of disk.
So the big 3 are out for me (I write this with the benefit of having already looked at other places). Let’s look at other providers, shall we?
Cloudy VPSes
So named because they are charge-per-use VPSes, but all the ancillary stuff is included in the price, like traditional VPSes.
I haven’t done a really extensive search for this type of provider. The biggest known one is DigitalOcean, and I found Vultr on Twitter. And then I found Atlantic.Net somewhere else. (I think a Hacker News thread on DigitalOcean, amusingly.)
The biggest thing for me recently was the sudden addition of Atlantic.net’s lowest-end system (www.atlantic.net/vps/vps-hosting/) – $0.99/month for 256MB of RAM, 10GB of disk and 1TB of transfer is pretty much unheard of. Making the ultra-low price point generally available is entirely new, and it’s going to be interesting to see if anyone in the cloudy space matches it – it’s actually cheaper than traditional VPS providers for a small always-on box. It’s definitely on my list to watch (and possibly get, as a target for things like automatic MySQL backups).
DigitalOcean would have my vote right off the bat for that sweet sweet $100 in credit they’re giving to students (www.digitalocean.com/company/blog/were-participating-in-githubs-student-developer-program/), except for the fact it’s for new users only. Which is rather annoying.
All 3 providers appear to be about the same:
- Vultr appears to be trying to undercut DigitalOcean in the low end – More RAM/less space/same price for the cheapest box, same RAM/less space/cheaper for the next level up.
- Atlantic.Net also appears to be trying to undercut DigitalOcean, just with more space and bandwidth instead of RAM. And literally cents cheaper! ($4.97 vs $5? Mhmm…)
- Meanwhile, DigitalOcean coasts along on brand recognition and testimonials of friends.
So cloudy VPSes are far better for the single always-on server usecase. How about traditional VPSes?
Traditional VPSes and I have an interesting history. Prior to finding LowEndBox, VPSes were super expensive compared to shared hosting (e.g. www.1and1.com/vps-hosting) – I still have a client/friend hanging onto his $50/month 1and1 VPS that I cringe about.
That pretty much changed when I found LowEndBox. At the ultra-low price point (like Atlantic.Net’s), LowEndBox has had $10/year offers in the past, primarily for systems with less RAM (but 256MB has happened – lowendbox.com/blog/crissic-solutions-10year-256mb-openvz-vps-and-more-in-jacksonville-florida/).
The downside of traditional VPSes is that the better prices mean prepaying annually. No pay-as-you-go here.
- Virpus.com for 2 Xen VPSes in 2012 – I think they are one of the few who are doing Xen as opposed to OpenVZ or even KVM
- Hostigation in 2013 – for a personal project that never actually happened
- WeLoveServers in September 2013 – basic server for $19/year
- Crissic in early 2014 – ostensibly better than WLS, for $2/month; I hosted a client site on it
Looking at the Virpus site, they’ve dropped their prices from $10/month to $5/month. I can’t remember what I paid for Hostigation, but I rarely even logged into that VPS.
WLS was at the very least decent. I used it mainly as an OpenVPN endpoint.
Crissic is good, but overly touchy on limits. I regularly get my SSH connection closed despite keepalives being set, and I can’t figure out why.
Overselling is a thing that happens though – CPU usage in particular is really limited, and suspensions happen. So I’m leery of moving my stuff to a traditional VPS provider without being able to test it. Especially if I just paid for a year of hosting.
New to me: RAM heavy VPSes
VPSDime (https://vpsdime.com/index.php) is a new-to-me provider: 6GB of RAM for $7 a month. Unfortunately, that’s offset by only 30GB of storage and 2TB of bandwidth, vs WeLoveServers’ 6GB/80GB/10TB for $19/month. Then again, I don’t really use that much storage or bandwidth.
Bonus: Storage VPSes!
The main reason I kept DreamHost for 2 years after I got annoyed with them was that their AUP allowed me to use RAW images on my photography site, and finding space for 500GB of photos was not cheap anywhere else. With the introduction of their new DreamObjects service (read: S3 clone), that changed, and my last reason for keeping them disappeared.
VPSDime has interesting storage VPS offers. 500GB of disk for $7 is $2 more than Amazon Glacier (not accounting for the fact that I’m unlikely to get to use all 500GB), but offset that against the fact that it’s online (no retrieval times) and I don’t have to pay retrieval fees. Maybe I’ll get to ditch one of my external drives…
Runner up: BuyVM – more established, but about double the price for the same amount of space
I’m inclined to see what Vultr’s 1GB/$7-a-month plan is like for hosting my personal stuff. I’m also thinking of getting a 1GB WLS VPS as an OpenVPN endpoint/development server. Atlantic.Net’s cheapest service is a possible alternative, but an extra 58 cents per month for WLS isn’t going to break the bank, and I have the advantage of not needing to give Atlantic.Net my credit card details. In the worst case I can walk away from WLS without worrying about my credit card being charged.
Yay for backups and Ansible making it possible to deploy a new server in minutes, once I get the environment set up just right.
The SD Card slot is unfortunately on the PCI bus, so it doesn’t show up as a bootable device.
Have a /boot partition on an internal drive, point that at the SD card. Reclaimed ~900MB from Lenovo’s system restore partition to make a /boot partition. GRUB was added to the internal drive.
As suggested in ubuntuforums.org/showthread.php?t=986126&page=3&p=11915401#post11915401, edit the initramfs image to add SD card support. In Fedora, this was done using dracut’s kernel modules option in /etc/dracut.conf, so any kernel updates would automatically get the appropriate modules added.
Sticking point – Fedora 17 seemed to include the drivers by default. Fedora 20 didn’t, so systemd would die during boot.
Main issue was naively assuming that grub was using the correct initramfs that was built. I installed Fedora, then did a yum update kernel to generate the initramfs image. But that didn’t work, so I tried various other things, not realising that the bugged initramfs image was still there, and because it was newer, GRUB was automatically using that instead of the initramfs that I was testing.
Other issue was SELinux – the contexts of files no longer matched (oddly enough). Not sure how/why this happened, but it ended up that I couldn’t log in – the GDM login screen never came up, and logging into a console just dumped me right back to the blank console login. Had to boot into single user mode (append “single” to the kernel boot line in GRUB), and found out it was SELinux when I got AVC denial errors about /bin/sh not having permission to execute. setenforce 0 fixed this.
cd /run/media/liveuser/ba7fca87-462e-4e8e-91bc-494f352f5293/
mount -o bind /dev dev
mount -o bind /sys sys
mount -o bind /proc proc
mount -o bind /tmp tmp
mount -o bind /run/media/liveuser/b887b758-1bed-48f9-9ee9-4c78af34b487 boot   # the bootable drive
chroot .
vi /etc/dracut.conf
cd /boot
# pass the installed kernel's version explicitly: inside the chroot, `uname -r` reports the live image's kernel
dracut --force initramfs-3.11.10-301.fc20.x86_64.img 3.11.10-301.fc20.x86_64
superuser.com/a/368640/35094 appears to suggest that I could boot grub on the internal drive and hand off control to the grub on the SD Card, which means that I can get SD Cards for different distros and not have to worry about keeping copies of the various vmlinuz/initramfses on the hard drive /boot. Which would be the ideal situation.
Posted by Sysadmin on July 20, 2014
Or how I spent 3 cents on Digital Ocean to play MvM with my friends for 2 hours.
Maybe the MvM servers were having issues, but 4 different people trying to create a game didn’t work (or at least TF2 kept on saying ‘connection error’ for everyone).
So I decided to try and spin up a server, like I used to do on EC2, except this time I used the $10 of credit from Digital Ocean that I got when signing up, simply because Digital Ocean seemed a lot easier to use than Amazon’s Spot Instances.
I used Digital Ocean’s 1GB/1CPU node ($0.015/hour) in the Singapore location. No complaints about slowness/ping issues from the people in Singapore/Japan, but my ping in Ontario was ~330ms.
One line command, untested:
# steamcmd is 32-bit, so 64-bit CentOS also needs glibc.i686 and libstdc++.i686
sudo yum -y install wget screen less vim glibc.i686 libstdc++.i686 && sudo service iptables stop && \
wget media.steampowered.com/client/steamcmd_linux.tar.gz && tar -xvzf steamcmd_linux.tar.gz && mkdir tf2 && \
./steamcmd.sh +force_install_dir tf2 +login anonymous +app_update 232250 +quit && \
mkdir -p ~/.steam/sdk32 && cp linux32/steamclient.so ~/.steam/sdk32/steamclient.so && \
cd tf2 && printf 'hostname "famiry MvM"\nrcon_password tehfamiry\n' > tf/cfg/server.cfg && \
./srcds_run -game tf -maxplayers 32 -console +map mvm_decoy
tf2cfg.info/server.php – TF2 server config generator
wiki.teamfortress.com/wiki/Dedicated_server_configuration – list of what each config option does
developer.valvesoftware.com/wiki/SteamCMD#Linux – Valve’s official guide on running TF2 servers.
pastebin.com/hcpMpmaZ & github.com/dgibbs64/linuxgameservers/blob/master/TeamFortress2/tf2server – fancy systems for automating setup & running of TF2 servers
Why Digital Ocean: In terms of money, 3 cents a week isn’t going to kill me. But I still have some Amazon credit, so it’d be nice to use that up first.
Looking at the prices (as of Jul 20), Digital Ocean is definitely cheaper than Amazon – the cheapest instance that looks like it would work is the t2.small, and that’s 4 cents an hour (I’m pretty sure a t2.micro instance won’t be good enough). More interestingly, an m3.medium instance is ~10 cents an hour.
The m3.medium instance is interesting because it has a spot instance option – and the price when I checked it was 1.01 cents/hour. However, I’m pretty sure the OS install + the TF2 files would be larger than the 4GB of storage assigned to the instance, so I’d also need an additional EBS volume, say ~5GB. Those are $0.12/GB/month, so 5GB for ~2 hours costs well under a cent – pretty much negligible.
However, there is one final cost: data transfer out. Above 1GB, Amazon will charge 19 cents, even for a partial GB. Assuming I play 4 weekends a month, I’m pretty sure the server will send more than 1GB of data (exceeding the free data transfer tier), so I’d be charged the 19 cents. Averaging this out over 4 weekends, I’d get charged ~5 cents a weekend. Thus, even with the actual compute cost being lower, EC2 would still cost me more than double what Digital Ocean does.
But this is still only 7 cents a weekend.
The $10 of credit from Digital Ocean will last me approximately 6 years of weekend playing, assuming no price changes. I also have ~$15 of Amazon credit, good for roughly another 4 years at 7 cents a weekend, so combined it looks like I’ll get 10 years of TF2 playing in – and I’m pretty sure we’ll have moved onto a new game long before then.
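To double-check the per-weekend comparison and how long the credit lasts, here is the arithmetic as a sketch. It assumes one 2-hour session per weekend, 52 weekends a year, the July 2014 prices quoted above, 720 hours in a month for prorating EBS, and the 19-cent data charge spread over 4 weekends.

```python
SESSION_HOURS = 2
WEEKENDS_PER_YEAR = 52
HOURS_PER_MONTH = 720  # used to prorate the monthly EBS price

# Digital Ocean 1GB droplet, billed hourly
do_per_weekend = 0.015 * SESSION_HOURS

# EC2 m3.medium spot instance + 5GB EBS volume + outbound data
ec2_compute = 0.0101 * SESSION_HOURS
ec2_storage = 5 * 0.12 * SESSION_HOURS / HOURS_PER_MONTH
ec2_data_out = 0.19 / 4  # one partial GB per month, spread over 4 weekends
ec2_per_weekend = ec2_compute + ec2_storage + ec2_data_out

do_years = 10 / do_per_weekend / WEEKENDS_PER_YEAR    # $10 of DO credit
ec2_years = 15 / ec2_per_weekend / WEEKENDS_PER_YEAR  # ~$15 of AWS credit

print(f"DO:  {do_per_weekend * 100:.1f} cents/weekend, {do_years:.1f} years of credit")
print(f"EC2: {ec2_per_weekend * 100:.1f} cents/weekend, {ec2_years:.1f} years of credit")
print(f"Combined: {do_years + ec2_years:.1f} years")
```

This works out to about 6.4 years on the Digital Ocean credit and about 4.2 years on the AWS credit, which is where the roughly 10 years comes from.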
Posted by Sysadmin on April 21, 2014
A follow-up to kyl191.net/2012/08/rebuilding-a-partition-table-after-failed-resize/
Almost two years on,
Correcting errors in the Master File Table (MFT) mirror.
Correcting errors in the master file table's (MFT) BITMAP attribute.
Correcting errors in the Volume Bitmap.
Windows has made corrections to the file system.
My drive is back! And seemingly OK! I’m celebrating by setting up a Python script to recursively run through my current photo backup and the drive and compare file checksums.
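A checksum comparison like the one described could look something like this sketch. It is not the actual script. It assumes SHA-256 via Python's hashlib, matches files by their path relative to each root, and reports files that are missing, extra, or have differing contents. The two directories are placeholders passed on the command line.

```python
import hashlib
import os
import sys


def file_sha256(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large RAW files don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            digests[os.path.relpath(full_path, root)] = file_sha256(full_path)
    return digests


def diff_digests(backup, recovered):
    """Return sorted lists of (missing, extra, mismatched) relative paths."""
    missing = sorted(backup.keys() - recovered.keys())
    extra = sorted(recovered.keys() - backup.keys())
    mismatched = sorted(
        path for path in backup.keys() & recovered.keys()
        if backup[path] != recovered[path]
    )
    return missing, extra, mismatched


if __name__ == "__main__":
    if len(sys.argv) != 3:
        sys.stderr.write("usage: compare_trees.py BACKUP_DIR RECOVERED_DIR\n")
    else:
        results = diff_digests(tree_digests(sys.argv[1]), tree_digests(sys.argv[2]))
        for label, paths in zip(("missing", "extra", "mismatched"), results):
            for path in paths:
                print(f"{label}: {path}")
```

Keeping the comparison in a pure function over two dicts makes it easy to check separately from the slow part, which is hashing a few hundred GB of photos.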
How I did it
The key thing was to isolate the drive and not use it. If I hadn’t done that, it would have been utterly unrecoverable.
I also remembered the layout of the drive: exactly 500GB on the first partition, and the rest of the disk as an NTFS partition. The first partition was a member of a RAID5 set, which had LVM set up on top of it.
I had used GParted to extend the NTFS partition backwards, to get the extra 500GB. However, this failed for… some reason. I’m still not too clear on why.
TestDisk wasn’t successful – it identified the LVM partition, then promptly skipped everything between the LVM header and where the LVM header said the partition ended. Which meant it skipped to the middle of the drive, since the RAID5 set was 1TB in size. And thus TestDisk refused to restore the partition, because it doesn’t make sense that a 2TB drive has 2 partitions which take up more than that.
The harddisk (2000 GB / 1863 GiB) seems too small! (< 3463 GB / 3226 GiB)
Results
Linux LVM      0 65 2      130541 139 30   2097145856   LVM2, 1073 GB / 999 GiB
HPFS - NTFS    65270 246 1  243201 13 12   2858446848   NTFS found using backup sector, blocksize=4096, 1463 GB / 1363 GiB
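The sizes TestDisk reports follow from the sector counts at 512 bytes per sector. This sketch uses only the numbers from the output above and shows why TestDisk balked: the two candidate partitions add up to more than the 2000 GB disk.

```python
SECTOR_BYTES = 512
DISK_GB = 2000  # "The harddisk (2000 GB / 1863 GiB)"

# Partition sizes in sectors, copied from the TestDisk results above
PARTITION_SECTORS = {
    "Linux LVM": 2097145856,
    "HPFS - NTFS": 2858446848,
}

sizes = {}
for name, sectors in PARTITION_SECTORS.items():
    size_bytes = sectors * SECTOR_BYTES
    # Truncating (not rounding) matches the figures TestDisk prints for these entries
    sizes[name] = (int(size_bytes / 10**9), int(size_bytes / 2**30))
    print(f"{name}: {sizes[name][0]} GB / {sizes[name][1]} GiB")

total_gb = sum(PARTITION_SECTORS.values()) * SECTOR_BYTES / 10**9
print(f"Both partitions: {int(total_gb)} GB on a {DISK_GB} GB disk")
```

It prints 1073 GB / 999 GiB and 1463 GB / 1363 GiB, the same as TestDisk. Together the two partitions come to 2537 GB, more than the whole disk.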
Having tried a bunch of methods over 1.5+ years and failed each time, I decided to finally go all the way: wipe out the (screwed up) partition table and recreate it. I didn’t know the original commands run to create the partition setup, so I ended up booting into a Fedora 13 Live image and using the ‘Install to Hard Disk’ option, selecting the disk as the install target. I was worried because it wouldn’t allow me to create a partition without also formatting it (kind of makes sense…), so I terminated the install process after seeing “Formatting /dev/sdd1 as ext4” – in other words, the first partition was being formatted.
I then turned to fdisk to create the partition, selecting the defaults, which should have extended the partition to the end of the disk. However, there was some disagreement on what constituted the end of the disk, leaving me with ~2MB of unallocated space. When I created the partition in Windows, it went all the way to the end of the disk. This meant I ended up with a sector count mismatch (along the lines of “Error: size boot_sector 2858446848 > partition 2858446017”).
So I had a semi-working drive, just with a number screwed up. And what edits numbers on a drive? A hex editor, that’s what. So it was off to edit the NTFS boot sector, and the MBR. I had correct looking numbers from TestDisk’s analysis, so I plugged those in, and since I had the hex editor opened, I wiped out the LVM header at the same time.
Turned out wiping the LVM header was an excellent idea, because TestDisk then found the NTFS boot sector, and allowed me to restore it:
HPFS - NTFS 65270 246 1 243201 13 13 2858446849 NTFS, blocksize=4096, 1463 GB / 1363 GiB
After that, the disk still wouldn’t mount in Windows, but chkdsk at least picked it up. After letting chkdsk run overnight, I got my drive from August 2012 back, with (as far as I can tell) no data loss whatsoever.
That’s worth an awwyeah.
tl;dr: Scheduling tasks is hard
- We assume everything will go well, but we’re actually crap at estimating
- We confuse progress with effort
- Because we’re crap at estimating, managers are also crap at estimating
- Progress is poorly monitored
- If we fall behind, natural reaction is to add more people
Three stages of creation: Idea, implementation, interaction
Ideas are easy, implementation is harder (and interaction is from the end user). But our ideas are flawed.
We approach a task as a monolithic chunk, but in reality it’s many small pieces.
If we use probability and say that we have a 5% chance of issues, we would budget 5% because it’s one monolithic thing.
But the real situation is that each of the small tasks has a 5% probability of being delayed. Thus, the chance that at least one of n tasks slips is 1 - 0.95^n, which grows quickly with n.
Oh, and our ideas being flawed? Yeah… it’s virtually certain that the 5% will be used.
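The point about many small pieces is easier to see with numbers. If each of n independent subtasks has a 5% chance of slipping, the chance that at least one slips is 1 - 0.95^n. The independence and the flat 5% are simplifying assumptions for illustration, not figures from the book.

```python
def chance_of_any_delay(n_tasks, p_delay=0.05):
    """Probability that at least one of n independent subtasks is delayed."""
    return 1 - (1 - p_delay) ** n_tasks


for n in (1, 5, 10, 20, 50):
    print(f"{n:>2} subtasks: {chance_of_any_delay(n):.0%} chance of at least one delay")
```

At 10 subtasks it's already about 40%, and by 50 it's over 90%. So budgeting 5% for the whole job is a bet that's almost always lost.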
Progress vs effort:
Wherein the man-month fallacy is discussed. It comes down to:
- Adding people works only if they’re completely independent and autonomous. With no interaction, each person can be assigned a smaller portion, so the work gets done faster
- If you can’t split up a task, it’s going to take a fixed amount of time. Example in this case is childbearing – you can’t shard the child across multiple mothers. (“Unpartitionable task”)
- Partitionable w/ some communication overhead – pretty much the best you can do in Software. You incur a penalty when adding new people (training time!) and a communication overhead (making sure everyone knows what is happening)
- Partitionable w/ complex interactions – Significantly greater communication overhead
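Brooks puts a number on that communication overhead. If every part of the work has to be coordinated with every other part, the effort grows as n(n-1)/2 for n people. Here is a quick sketch of how fast that grows:

```python
def communication_channels(people):
    """Pairwise communication paths when everyone must coordinate with everyone."""
    return people * (people - 1) // 2


for team_size in (2, 3, 5, 10, 20):
    print(f"{team_size:>2} people -> {communication_channels(team_size):>3} channels")
```

Doubling a team from 10 to 20 people takes it from 45 channels to 190, roughly four times the coordination for twice the hands.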
Testing is usually ignored/the common victim of schedule slippage. But it’s frequently the time most needed because you’re finding & fixing issues with your ideas.
Recommended time division is 1/3 planning, 1/6 coding, 1/4 unit tests, 1/4 integration tests & live tests (I modified the naming)
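As a worked example, here is that split applied to a hypothetical 12-week project (the 12 weeks are only an illustration). Using fractions keeps the shares exact and confirms they add up to the whole schedule.

```python
from fractions import Fraction

# Brooks' rule-of-thumb split, with the naming used above
SPLIT = {
    "planning": Fraction(1, 3),
    "coding": Fraction(1, 6),
    "unit tests": Fraction(1, 4),
    "integration & live tests": Fraction(1, 4),
}
assert sum(SPLIT.values()) == 1  # the shares cover the whole schedule

total_weeks = 12  # hypothetical project length
for phase, share in SPLIT.items():
    print(f"{phase}: {share * total_weeks} weeks")
```

Coding gets only 2 of the 12 weeks, while testing gets 6.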
Without check-ins, if the schedule slips, people only find out towards the end of the schedule, when the product is almost due. This is bad because a) people are preparing for the new thing, and b) the business has invested in getting code out that day (purchasing & spinning up new servers, etc.)
When a schedule slips, we can either extend the schedule, or force stuff to be done to the original timeframe (crunch time!). Like a cook turning up the heat under an omelette, devs could increase intensity, but that rarely works out well
It’s common to schedule to an end-user’s desired date, rather than going on historical figures.
Managers need to push back against schedules done this way, going for at least somewhat data-based hunches instead of wishes
Fixing a schedule:
Going back to progress vs effort.
You have a delayed job. You could add more people, but that rarely turns out well. The overhead of training new people & communication takes its toll, and you end up taking more time than if you had just stuck with the original team
The recommended approach is to reschedule, adding sufficient time to make sure that testing can be done.
It comes down to this: the maximum number of people depends on the number of independent subtasks. You will need time, but you can get away with fewer people.
The central idea, that one can’t substitute people for months, is pretty true. I’ve read things that say it takes around 3 months to get fully integrated into a job, and I’ve found that to be largely true for me. (It’s one of my goals for future co-op terms to try and get that lowered.)
The concept of partitioning tasks makes sense, and again it comes back to services. If services are small, 1 or 2 person teams could easily take care of things, and with minimal overhead. When teams start spiraling larger, you have to be better at breaking things down into small tasks, so you can assign them easily, and integrate them (hopefully) easily. It seems a bit random, but source control helps a lot here.
Estimation is tricky for me, and will continue to be, since it only comes from experience – various ‘rules of thumb’ that I’ve heard include: take the time that you think you’ll need, double it, then double it again.
But it’s a knock-on effect – I estimate badly, tell my manager, he has bad data, so he estimates wrongly as well… I’ve heard stories that managers pad estimates. That makes sense, especially for new people. I know estimates of my tasks have been wildly off. Things that I thought would take days end up taking a morning. Other things, like changing a URL, expose a recently broken dependency, and then you have to fix that entire thing… yeah. A 5-minute fix became an afternoon+. One thing which I’ll try to start doing is noting down how long I expect things to take me, and then comparing that at the end to see whether or not I was accurate. Right now it’s a very handwavey “Oh I took longer/shorter than I wanted, hopefully I’ll remember that next time!”
Which, sadly, I usually don’t.
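A low-effort way to actually do the estimate-vs-actual comparison is to log both numbers for each task. From that log you can compute a personal correction factor to use instead of a generic "double it twice" rule. The sketch below is hypothetical: the task names and hours are made-up sample data. Using the median ratio is one choice among several; it keeps a single blown-up "5-minute fix" from dominating.

```python
from statistics import median

# Hypothetical log entries: (task, estimated hours, actual hours)
ESTIMATE_LOG = [
    ("add config flag", 8.0, 3.0),
    ("change a URL", 0.1, 4.0),
    ("write migration script", 4.0, 6.0),
    ("fix flaky test", 2.0, 2.5),
]


def correction_factor(log):
    """Median of actual/estimate ratios: multiply new gut estimates by this."""
    return median(actual / estimate for _task, estimate, actual in log)


factor = correction_factor(ESTIMATE_LOG)
print(f"Multiply gut estimates by {factor:.2f}")
```

Writing the estimate down before starting is the important part. The arithmetic afterwards is trivial, but it replaces "hopefully I'll remember that next time" with a number.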
Summary of the chapter:
Growing a program
A standalone product, running on the dev’s environment, is cheap.
It gets expensive if:
- You make it generic, such that it can be extended by other people. This means you have to document it, test it (unit tests!), and maintain it.
- You make it part of a larger system. For example, cross-platformness. You have to put effort into system integration.
Each of those steps takes ~3x the effort of creating the original program. Therefore, creating a final product (doing both) takes ~9x the effort. Suddenly, it doesn’t look simple anymore.
Why Software
1. Sheer joy of making things. Especially things that you make yourself.
2. Joy of making things for other people
3. Fascination at how everything works together
4. Joy of always learning
5. Joy at working in an insanely flexible medium – a creative person, but the product of the creativity has a purpose. (Unlike poetry, for example)
In summary, programming is fun because it scratches an itch to design and make something, and that itch is surprisingly common among people.
Why not Software
1. You have to be perfect. People introduce bugs. You are a person. Therefore you aren’t perfect, and a paradox occurs, which resolves with the program being less than perfect.
2. Other people tend to dictate the function/objective of the program – leaving the writer with authority insufficient for his responsibility. In other words, you can’t order people around, even though you need stuff from them. This is particularly true for infrastructure people, who are handed programs that don’t necessarily work well and are expected to make them run.
3. Designing things is fun. Bug fixing isn’t. (This is the actual work part.) Particularly where each successive bug tends to take longer to find & isolate than the last one.
4. And when you’re done, frequently what you’ve made is ready to be superseded by a new, better program. Luckily, that shiny new thing is usually also in gestation, so your program gets put into service. Also, it’s natural – tech is always moving on, unlike your specs, which are generally frozen at a fixed point. The challenge is finding solutions to problems within budget and on time.
So I got ahold of the much-talked-about Mythical Man-Month book of essays on Software Engineering… and I’ve decided to read an essay a night, and muse about it, after writing a summary of the chapter (read: taking notes about the book, so I’m not just passively reading).
I agree with pretty much everything – and I’ll cover points in order of where they appear in the essay.
Growing a Program: The extra work done in getting systems integrated is pretty accurate. I think that’s driving a lot of the move towards offering everything as services instead of one monolithic thing. Moving to using a service means a lot of stuff is abstracted away for you – you can ignore the internal workings (more so than using a library, which you have to keep track of) in the hope that stuff works as advertised. So you save some time on the integration side of things by reducing the amount of surface area you have to integrate with.
However, the fleshing out of the program – writing tests and documenting everything, is harder to avoid. A lot of the boilerplate stuff is automated away by IDEs (auto generating test stubs, for example), but there’s still work that needs to be done to make stuff into a proper dependable system – and that’s really the stuff that’s separating the small scale, internal software from public scale.
Admittedly, that’s a bit of a tautology. But I think a lot of the growing is just forced by not wanting to keep on fixing bugs in the same code. By having a test against it, you know whether or not at the very least, the expected behaviour occurs.
Why software: I chose software over hardware in Uni because it’s so much more flexible (#5). I like making things (#1), especially those things which help people (#2). I do a mental happy dance every time someone posts a nice comment on the Lightroom Plugin page on deviantArt. Though the happiness of understanding how things fit together (#3) is more of “Ha! Got <complicated API> to actually work!” And #4 is more frantically Googling so as to not look like an idiot to the rest of my team.
Why not Software: Uh… yeah. #1 & #3 – damn bugs. See the 6+2 stages of debugging. Sadly true, especially the moment of realisation, followed by “how did that ever work?” But fixing bugs is satisfying, particularly a new bug that you’ve never seen before. #2 – That’s, well, the nature of work when you’re not at the top. The authority/responsibility trade-off is real. I like to think I’ve worked around it at Twitter by following Twitter’s “Bias towards action” guideline – I have submitted fixes for other projects, gotten reviews and submitted code. Much more efficient than filing a bug and saying “BTW, you’re blocking me right now.” And #4 – That goes along with the learning new stuff thing. Also, it’s probably a good thing that a new version will come along soon – you get closer to what the user wants by iterating. If you’ve stopped iterating, the product is either a) perfect, or b) cost/benefit analysis says it’s not worth updating, so you run it in pure maintenance mode.
Or c) you just really don’t care about it anymore. Which is really just a variation on b.