Fixing a mangled NTFS partition: success

A follow-up from an earlier post, almost two years on:

Correcting errors in the Master File Table (MFT) mirror.
Correcting errors in the master file table's (MFT) BITMAP attribute.
Correcting errors in the Volume Bitmap.
Windows has made corrections to the file system.

My drive is back! And seemingly OK! I’m celebrating by setting up a Python script to recursively run through my current photo backup and the drive and compare file checksums.

How I did it

The key thing was to isolate the drive and not use it. If I hadn’t done that, it would have been utterly unrecoverable.

I also remembered the layout of the drive: exactly 500GB on the first partition, and the rest of the disk as an NTFS partition. The first partition was a member of a RAID5 set, which had LVM set up on top of it.

I had used GParted to extend the NTFS partition backwards, to get the extra 500GB. However, this failed for… some reason. I’m not too clear.

TestDisk wasn’t successful – it identified the LVM partition, then promptly skipped everything between the LVM header and where the LVM header said the partition ended. That meant it skipped to the middle of the drive, since the RAID5 set was 1TB in size. And so TestDisk refused to restore the partition, because it doesn’t make sense for a 2TB drive to have 2 partitions which take up more than that.

The harddisk (2000 GB / 1863 GiB) seems too small! (< 3463 GB / 3226 GiB)
     Linux LVM                0  65  2 130541 139 30 2097145856
     LVM2, 1073 GB / 999 GiB
     HPFS - NTFS          65270 246  1 243201  13 12 2858446848
     NTFS found using backup sector, blocksize=4096, 1463 GB / 1363 GiB

Having tried a bunch of methods over the past 1.5+ years and failed each time, I decided to finally go all the way: wipe out the (screwed up) partition table and recreate it. I didn’t know the original commands run to create the partition setup, so I ended up booting into a Fedora 13 Live image and choosing the ‘Install to Hard Disk’ option, selecting the disk as the install target. I was worried because it wouldn’t allow me to create a partition without also formatting it (kind of makes sense…), so I terminated the install process after seeing “Formatting /dev/sdd1 as ext4” – in other words, while the first partition was being formatted.

I then turned to fdisk to create the partition, selecting the defaults, which should have extended the partition to the end of the disk. However, there was some disagreement on what constituted the end of the disk, leaving me with ~2MB of unallocated space. When I had created the partition in Windows, it went all the way to the end of the disk. What this meant was that I ended up with a sector count mismatch (along the lines of “Error: size boot_sector 2858446848 > partition 2858446017”).

So I had a semi-working drive, just with a number screwed up. And what edits numbers on a drive? A hex editor, that’s what. So it was off to edit the NTFS boot sector, and the MBR. I had correct looking numbers from TestDisk’s analysis, so I plugged those in, and since I had the hex editor opened, I wiped out the LVM header at the same time.

Turned out wiping the LVM header was an excellent idea, because TestDisk then found the NTFS boot sector, and allowed me to restore it:

     HPFS - NTFS          65270 246  1 243201  13 13 2858446849
     NTFS, blocksize=4096, 1463 GB / 1363 GiB

After that, the disk still wouldn’t mount in Windows, but chkdsk at least picked it up. After letting chkdsk run overnight, I got my drive from August 2012 back, with (as far as I can tell) no data loss whatsoever.

That’s worth an awwyeah.



Musings on the Mythical Man-Month Chapter 2

tl;dr: Scheduling tasks is hard

  1. We assume everything will go well, but we’re actually crap at estimating
  2. We confuse progress with effort
  3. Because we’re crap at estimating, managers are also crap at estimating
  4. Progress is poorly monitored
  5. If we fall behind, the natural reaction is to add more people

Overly optimistic:

Three stages of creation: Idea, implementation, interaction

Ideas are easy, implementation is harder (and interaction is from the end user). But our ideas are flawed.

We approach a task as a monolithic chunk, but in reality it’s many small pieces.

If we use probability and say that we have a 5% chance of issues, we would budget for 5%, because we treat it as one monolithic thing.

But the real situation is that each of the n small tasks has its own 5% chance of being delayed. The chance that none of them slips is 0.95^n, so the chance of at least one delay is 1 − 0.95^n – which heads towards certainty as n grows.
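The arithmetic, for the skeptical:

```python
# Each of n subtasks independently has probability p of slipping.
# The schedule holds only if none of them slip: (1 - p)^n.
def p_any_delay(n, p=0.05):
    return 1 - (1 - p) ** n

print(round(p_any_delay(1), 3))   # 0.05
print(round(p_any_delay(20), 3))  # 0.641 – a 20-piece task is more likely late than not
```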

Oh, and our ideas being flawed? Yeah… it’s a virtual certainty that the 5% will be used.

Progress vs effort:

Wherein the man-month fallacy is discussed. It comes down to:

  1. Adding people works only if they’re completely independent and autonomous. No interaction means each person can take a smaller portion, which equates to being done faster
  2. If you can’t split up a task, it’s going to take a fixed amount of time. The example in this case is childbearing – you can’t shard the child across multiple mothers. (“Unpartitionable task”)
  3. Partitionable w/ some communication overhead – pretty much the best you can do in Software. You incur a penalty when adding new people (training time!) and a communication overhead (making sure everyone knows what is happening)
  4. Partitionable w/ complex interactions – Significantly greater communication overhead
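The overhead in those last two cases grows quickly: with complex interactions, everyone potentially talks to everyone, and Brooks notes that pairwise communication paths scale as n(n−1)/2:

```python
def channels(n):
    # Number of pairwise communication paths among n people: n(n-1)/2
    return n * (n - 1) // 2

for n in (3, 5, 10):
    print(n, channels(n))  # 3 -> 3, 5 -> 10, 10 -> 45
```

Doubling the team roughly quadruples the coordination surface, which is the whole point.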

Estimation issues:

Testing is usually ignored/the common victim of schedule slippage. But it’s frequently the time most needed because you’re finding & fixing issues with your ideas.

Recommended time division is 1/3 planning, 1/6 coding, 1/4 unit tests, and 1/4 integration & live tests (I modified the naming).

Without check-ins, if the schedule slips, people only find out towards the end of the schedule, when the product is almost due. This is bad because a) people are preparing for the new thing, and b) the business has invested in getting code out that day (purchasing & spinning up new servers, etc.)
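For what it’s worth, those fractions do add up to a whole schedule:

```python
from fractions import Fraction

# Brooks's split: planning, coding, unit tests, integration & live tests
split = [Fraction(1, 3), Fraction(1, 6), Fraction(1, 4), Fraction(1, 4)]
assert sum(split) == 1  # the four slices cover the entire schedule
print(split[1])  # coding is only 1/6 of the total
```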

Better estimates:

When a schedule slips, we can either extend the schedule, or force things to be done in the original timeframe (crunch time!). Like cooking an omelette on a higher flame, devs can increase intensity, but that rarely works out well.

It’s common to schedule to an end-user’s desired date, rather than going on historical figures.

Managers need to push back against schedules done this way, going instead for at least somewhat data-based hunches rather than wishes.

Fixing a schedule:

Going back to progress vs effort.

You have a delayed job. You could add more people, but that rarely turns out well. The overhead of training new people & communication takes its toll, and you end up taking more time than if you had just stuck with the original team.

The recommended thing is to reschedule, and add sufficient time to make sure that testing can be done.

It comes down to this: the maximum number of people depends on the number of independent subtasks. You will need time, but you can get away with fewer people.

The central idea, that one can’t substitute people for months, is pretty true. I’ve read things that say it takes around 3 months to get fully integrated into a job, and I’ve found that to be largely true for me. (It’s one of my goals for future co-op terms to try and get that lowered.)

The concept of partitioning tasks makes sense, and again it comes back to services. If services are small, 1 or 2 person teams could easily take care of things, and with minimal overhead. When teams start spiraling larger, you have to be better at breaking things down into small tasks, so you can assign them easily, and integrate them (hopefully) easily. It seems a bit random, but source control helps a lot here.

Estimation is tricky for me, and will continue to be, since it only comes from experience – various ‘rules of hand’ that I’ve heard include take the time that you’ll think you’ll need, double it, then double it again.

But it’s a knock-on effect – I estimate badly, tell my manager, he has bad data, so he estimates wrongly as well… I’ve heard stories that managers pad estimates. That makes sense, especially for new people. I know estimates of my tasks have been wildly off. Things that I thought would take days end up taking a morning. Other things, like changing a URL, expose a recently broken dependency, and then you have to fix that entire thing… yeah, a 5-minute fix became an afternoon+. One thing which I’ll try to start doing is noting down how long I expect things to take me, and then comparing at the end to see whether or not I was accurate. Right now it’s a very handwavey “Oh, I took longer/shorter than I wanted, hopefully I’ll remember that next time!”

Which, sadly, I usually don’t.


Musings on The Mythical Man-Month Chapter 1

Summary of the chapter:

Growing a program

A standalone product, running on the dev’s environment, is cheap.

It gets expensive if:

  1. You make it generic, such that it can be extended by other people. This means you have to document it, test it (unit tests!), and maintain it.
  2. You make it part of a larger system. For example, cross-platformness. You have to put effort into system integration.

Each of those tasks takes ~3x effort of creating the original program. Therefore, creating a final product takes about ~9x the effort. Suddenly, it doesn’t look simple anymore.

Why Software

  1. Sheer joy of making things. Especially things that you make yourself.
  2. Joy of making things for other people
  3. Fascination at how everything works together
  4. Joy of always learning
  5. Joy at working in an insanely flexible medium – you get to be a creative person, but the product of that creativity has a practical purpose. (Unlike poetry, for example)

In summary, programming is fun because it scratches an itch to design and make something, and that itch is surprisingly common among people.

Why not Software

  1. You have to be perfect. People introduce bugs. You are a person. Therefore you aren’t perfect, and a paradox occurs, which resolves in the program being less than perfect.
  2. Other people tend to dictate the function/objective of the program – leaving the writer with authority insufficient for his responsibility. In other words, you can’t order people around, even though you need stuff from them. This is particularly true for infrastructure people, who are handed programs that don’t necessarily work well and are expected to make them run.
  3. Designing things is fun. Bug fixing isn’t. (This is the actual work part.) Particularly where each successive bug tends to take longer to find & isolate than the last one.
  4. And when you’re done, frequently what you’ve made is ready to be superseded by a new better program. Luckily, that shiny new thing is usually also in gestation, so your program gets put into service. Also, it’s natural – tech is always moving on, unlike your specs, which are generally frozen at a fixed point. The challenge is finding solutions to problems within budget and on time.

So I got ahold of the much talked about Mythical Man-Month book of essays on Software Engineering… and I’ve decided to read an essay a night, and muse about it, after writing a summary of the chapter (read: taking notes about the book, so I’m not just passively reading).

I agree with pretty much everything – and I’ll cover points in order of where they appear in the essay.

Growing a Program: The extra work done in getting systems integrated is pretty accurate. I think that’s driving a lot of the move towards offering everything as services instead of one monolithic thing. Moving to using a service means a lot of stuff is abstracted away for you – you can ignore the internal workings (more so than using a library, which you have to keep track of) in the hope that stuff works as advertised. So you save some time on the integration side of things by reducing the amount of surface area you have to integrate with.

However, the fleshing out of the program – writing tests and documenting everything, is harder to avoid. A lot of the boilerplate stuff is automated away by IDEs (auto generating test stubs, for example), but there’s still work that needs to be done to make stuff into a proper dependable system – and that’s really the stuff that’s separating the small scale, internal software from public scale.

Admittedly, that’s a bit of a tautology. But I think a lot of the growing is just forced by not wanting to keep on fixing bugs in the same code. By having a test against it, you know whether or not at the very least, the expected behaviour occurs.

Why software: I chose software over hardware in Uni because it’s so much more flexible (#5). I like making things (#1), especially those things which help people (#2). I do a mental happy dance every time someone posts a nice comment on Lightroom Plugin page on deviantArt. Though the happiness of understanding how things fit together (#3) is more of “Ha! Got <complicated API> to actually work!” And #4 is more frantically Googling so as to not look like an idiot to the rest of my team.

Why not Software: Uh… yeah. #1 & #3 – damn bugs. See the 6+2 stages of debugging. Sadly true, especially the moment of realisation, followed by how did that ever work. But fixing bugs is satisfying, particularly a new bug that you’ve never seen before. #2 – That’s, well, the nature of work when you’re not at the top. The authority/responsibility trade-off is real. I like to think I’ve worked around it at Twitter by following Twitter’s “Bias towards action” guideline – I have submitted fixes for other projects, gotten reviews and submitted code. Much more efficient than filing a bug and saying “BTW, you’re blocking me right now.” And #4 – That goes along with the learning new stuff thing. Also, it’s probably a good thing that a new version will come along soon – you get closer to what the user wants by iterating. If you’ve stopped iterating, the product is either a) perfect, or b) the cost/benefit analysis says it’s not worth updating, so run it in pure maintenance mode.

Or c) you just really don’t care about it anymore. Which is really just a variation on b.


CyanogenMod 11 Nightly on a GSM Galaxy Nexus

I have a Galaxy Nexus, and have been pretty happy with it.  Then Google said it wouldn’t update the Galaxy Nexus to 4.4, and 4.4 is all swanky and new and most importantly, supports low memory devices better. So I want it. And guess what? CyanogenMod is pushing out 4.4 in CM11.

The real push for me to flash it is the fact that it has started dying on me – most annoyingly, the screen lit up to maximum brightness (while I was using it), then turned off. I suspect the screen just went dead, because I mashed the power button and tapped the blank screen and it vibrated at my touch, but the screen would only come back on after I pulled the battery and replaced it. Happening once is OK, I’ll chalk it up to cosmic ray events/random glitches. Having it happen 3 times in a single night is a bit much.

So it’s off to CyanogenMod!

First off, backups: I found a full backup method in an XDA thread. Reading through the thread pointed out that SMSes weren’t backed up, so SMS backups were handled first, via SMS Backup+.

Then I tried to get ADB set up. First roadblock: the Google USB Driver wasn’t in the SDK that I downloaded – I had to use the SDK Manager to download it. Unfortunately, that still didn’t work, so I had to install the Samsung driver instead. I ended up with a “Samsung Android ADB Interface” entry in the Device Manager, so that was good.

However, adb seemed to have completely crashed, and it was weirdly resistant to being ended through Task Manager – adb.exe, java.exe and powershell.exe all stubbornly refused to close. This, annoyingly enough, was because Avast was restricting access to it. I disabled the Avast shields and added the SDK folder as exempt from scanning.

One adb pull /sdcard/ and I had a backup of the contents of the internal SD card. Since that covered the shared stuff (I’m assuming that’s what the SD card contents count as), the next step was adb backup -all -apk -f backup.ab. However, Google Music threw this for a loop, because the backup includes the cached music, so the final size of the backup was 5.11GB… which is excessively large.

The next step was actually unlocking the bootloader. fastboot required a different driver, so I ended up using the “Universal Driver” from XDA. After running fastboot oem unlock, I booted back into stock 4.2 (I hadn’t flashed CM11 yet) and tried to restore my data using adb restore. It worked through ~half the apps, then died on restoring the Google Play Store. Instead of retrying the restore, I forged ahead and flashed CM11.

I ended up using the instructions in CM’s HowTo install CM on maguro. I flashed the recovery with ClockWorkMod (discovering later that I could have actually used ClockWorkMod Touch instead), and then flashed PlayfulGod’s fork of CM11 for the Galaxy Nexus followed by flashing GApps Standard. (I specifically chose GApps standard because I wanted to get the camera APK integrated by default)

Now that I had CM11 installed, I then tried to restore my data. Which is where the weird part hit – The restore ran for a second or two, and then said “Restore completed”. After retrying a few times and getting the exact same thing, I hooked up adb logcat to monitor the restore and see what was happening.

Now I’m not too sure what was wrong, but BackupManagerService was consistently reporting “Incorrect password”. I tracked down the source and eventually decided that either I was truly using the wrong password (which I deemed unlikely, because the failed restore had partially worked), or the comparison of the calculated key checksum with the stored key checksum was failing. I’d heard stuff about Android’s encryption being weak, so they might have changed something, but a quick glance over the 4.2.1 and 4.4.2 BackupManagerService files showed no obvious difference.

Whatever the case, my full adb backup was well and truly dead. So I just reverted to reloading my apps from the Play Store (ran through the list in My Apps) and waiting for the backup manager to restore whatever data Google had backed up. After installing everything, I restored Whatsapp from the database I backed up, and ran SMS Backup+ to restore all my SMS messages/call log from the backup.

So I had pretty much everything back where I wanted it; the next thing on the list was to install a new kernel that would give me more RAM (since apparently TI ION maps a whole bunch of stuff into the RAM range, so the full 1GB wasn’t usable). That was a download and a reboot into recovery mode to flash it, but straightforward since I’d already been poking around CWM.

And that was pretty much that. A shiny new (clean) install of Android 4.4.2 on an older phone, that actually revitalized it.

Future stuff:

  1. Use ClockWorkMod Touch instead of the standard ClockWorkMod (since my volume up button is broken, and scrolling through the list is tedious)
  2. Install the Genie widget to get the stock weather/news widget back
  3. In case I ever need it, recovering data from the internal memory
  4. List of potentially useful GNex resources



Getting Django running on CentOS 6

Trying to follow this guide:

The EPEL 6 RPMs only seem to install PostgreSQL 8.4 at this time, not 9.3 (strange, since 8.4 isn’t supported anymore, so… I have many questions)

Official 9.3 installation go!

But WTF1: it was installed to /usr/pgsql-9.3, which wasn’t in the search path, so all the createdb etc. commands didn’t work.

Had to su postgres, cd /usr/pgsql-9.3/bin, ./createdb trailstest, ./createuser -P -s trails

And I only discovered it was in /usr/pgsql-9.3 because I did rpm -ql postgresql93-server

yum install virtualenv step went fine

yum install libpq-dev python-dev was wrong, yum install postgresql93-devel python-devel was the correct command

pip install psycopg2 failed because pg_config wasn’t in $PATH – surprise surprise. export PATH=$PATH:/usr/pgsql-9.3/bin fixed that… (Also, I discovered that there’s no spacing around the = for a reason! Bash syntax!)

And then I got a gcc not found message. My bad, though isn’t it a dependency of python-devel? A yum install gcc later, and we’re good…

And now it’s password authentication. I created a user, but it’s not working… even root is failing…

>Pizza Interlude<

So, I had “Peer authentication failed” messages, and “Ident authentication failed” messages. I fixed the issue: with everything removed from the connection settings, Postgres defaults to connecting over the local UNIX socket using the username the client is running as, so that user has to exist in postgres. If I want a specific username/password combination to work, I’d have to edit pg_hba.conf to enable any other authentication scheme.
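For reference, getting a username/password login working would mean a pg_hba.conf line roughly like this – using the trailstest database and trails user created earlier, with md5 being the password method on 9.3:

```
# TYPE  DATABASE    USER    ADDRESS         METHOD
host    trailstest  trails     md5
```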

Trying to get DNS working… the new domain isn’t resolving on the UWaterloo DNS servers, but it’s showing up on other servers. Tried to get dig installed, but I was getting yum timeout errors. Realised belatedly that I had blocked outgoing HTTP connections. Re-ran the iptables rule additions with dport and sport swapped for input and output. Now everything is working, dig shows the domain resolving, so I’m just going to wait a while for UW to realise that a new subdomain exists…

Gunicorn is complaining about missing Django project files when I try to run it and bind to port 8001. Mkay, noted. The Django dev server (python runserver) gives me a working page, and I can get to it, so Django is set up! Wohoo! Time to get some images with GPS encoded in their EXIF up…


Django + Nginx resources

For SE Hack Day: looks the best (along with and are Django + FastCGI because postgres


CSS animations/transitions

Posted a bunch of stuff I came across to the WaterlUX group page, figure I might as well document them here too: (Lovely lovely annotated source at – Memories of Overused Powerpoint animations spring to mind… But I can see using the attention getters & fadeIn/Outs on a webpage.

Also, for icons without retrieving icon/font files (i.e. all in JS)




‘Solving’ SQL injection in Java

So during the summer I worked on a large enterprisey Java program. (Singleton pattern ahoy!)

One of the annoying things (besides massive code duplication) was it used database queries that naively appended user input (particularly search queries) onto selects.

And from my web background, I knew that SQL injection makes wiping the table trivial. Or even dropping the database.

So I wanted to convert as much stuff to a prepared statement as possible.

I started off with this:

Statement stmt = conn.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY);
ResultSet rs = stmt.executeQuery("select * from view where date = ' "+ inputBox.toString() + " ' ");
if (rs.first()) {

Trial and error got me this:

PreparedStatement test = conn.prepareStatement("select * from viewCalendar where date = ?");
// java.sql.Date has no String constructor – valueOf() parses a yyyy-mm-dd string
test.setDate(1, java.sql.Date.valueOf(inputBox.toString()));
ResultSet rs = test.executeQuery();
if ( {

The hardest part was replacing the functionality of rs.first() – the rest of the code requires a ResultSet that’s started at the first row, but the ResultSet returned by the prepared statement wasn’t. But the Java API docs had the solution – next() “moves the cursor forward one row from its current position. A ResultSet cursor is initially positioned before the first row; the first call to the method next makes the first row the current row.”

Doing this also makes the code cleaner – instead of having

if (rs.first()) {
  do {
    // bunch of stuff
  } while (;
}

The equivalent becomes

while ( {
    // Bunch of stuff
}

which I consider a whole lot cleaner & easier to read.

So I got my fix – my naive selects were now using preparedStatements (also, possibly JDBC/MSSQL execution path caching?), hence the solved bit in the title.

However, searches are still an issue. The way multiple criteria were combined resulted in a variable number of parameters, and since PreparedStatement doesn’t support a variable number of parameters, I didn’t see an alternative to assembling a string. Even if I modified the logic to insert NULLs, that would just cause the database to return no rows, because the columns don’t have NULL in the rows that I would want.

Hence the quotes around solved. :(


And a bonus: When assembling a variable parameter where string, don’t use

if (where.isEmpty()) where = string;
else where += " or " + string;

For one or two parameters it’s OK. But for 20+ parameters, you’re going to have a stupid number of if/else blocks. I used an ArrayList<String> (variable length!), and then a method buildWhere(ArrayList<String>) that builds the string parameter by parameter, with ORs in the appropriate places.
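The Java helper itself isn’t reproduced here; the core idea – accumulate the clauses in a list, then join them once – sketched in Python for brevity (the name mirrors my buildWhere, everything else is illustrative):

```python
def build_where(criteria):
    # Join non-empty criteria with OR; an empty list yields an empty WHERE body
    return " or ".join(c for c in criteria if c)

print(build_where(["date = ?", "name = ?"]))  # date = ? or name = ?
```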



I broke… Java?

A... C error. In Java.  Huh.


Something from my summer job. I found it horribly amusing.


And then I fixed it.



Recovering a broken F18 installation

For some reason (possibly a broken F17 upgrade which moved /lib around?) there were a bunch of empty files in /lib, so F18 refused to boot, with error messages like “/lib/ file too short”

I discovered yum doesn’t like chroots: I kept getting a message about build_time_vars missing. (Spoiler: this was actually Python!)

I also discovered that yum doesn’t like running from an x86_64 arch when the install arch is i686. The solution was to use setarch, as mentioned here:
Essentially, spawn a new bash shell with the arch set to i686, so yum will pick everything up as i686 and (re)install the correct files.

Final command (with the borked system drive mounted as /media): yum --installroot=/media --skip-broken --releasever=19 distro-sync

Fingers crossed it works, though it can’t get any worse now…
And nope, it didn’t work. --skip-broken seems to have sent it into (circular) dependency hell.

Next thing to try is booting an F18 image and copying over /lib/*. Fingers crossed it’ll fix the lib-systemd missing problem.

Yay! No need to boot the F18 image and copy over stuff!

It was a bugged fedup upgrade from F18 to F19. yum had installed a bunch of packages and left everything in an inconsistent state – e.g. Python 2.7.5 was ‘installed’, but the build_time_vars stuff was never added/created as it normally would be. Manually redownloading python and python-libs for F18 (python-2.7.3), removing Python 2.7.5 (rpm -e python-2.7.5 python-libs-2.7.5) and then installing Python 2.7.3 got yum up and running again.

However, rpm -e $(rpm -qa|grep fc19) also removed glibc… which is needed for practically everything. (It should have been rpm -e $(yum list installed|grep fc19|awk '{print $1}').) So I rebooted into the live system and tried downloading glibc for fc19 – since everything wanted GLIBC_2.17, and F18 only had glibc-2.16.

Whoops. I ended up with another inconsistent system. glibc was the only thing that was from F19, and because it was a dependency for just about everything, I couldn’t erase it, and yum wouldn’t let me downgrade it either.

So I decided to just do the distro-sync over it. And one distro-sync later, I discovered that I *really* should have mounted /boot (since it lived on a different partition) before installing a new kernel/initrd. So: reboot into the live CD, copy the files over from the root’s /boot (bloody shadowing!) into the proper /boot, mount the proper /boot into the live CD’s /boot (since doing it in a chroot made grub2-mkconfig complain), and run grub2-mkconfig -o /boot/grub2/grub.cfg.

Other than it mysteriously picking up fedup (I thought that had been disappeared!), it seems to have booted F19… at least, it got part way there and froze. But it’s 11:40 at night, I have a flight in 9 hours, and I kind of need to sleep, so I’m off for now…


And correcting the root=/dev/mapper/live-rw to the proper path (/dev/mapper/vg_lantea-lv_root) got it to at least boot past the root switch, but now services are failing to start, and I don’t know why… Possibly SELinux trouble, so I’m going to try putting enforcing=0 on the boot command line…

And it kind of worked, in that it gets to the login screen at least, though it won’t let me login…

