

Blog analysis for 2011: 173,363 words so far; also, using the Rails console to work with WordPress

| analysis, blogging, geek, review

How many posts did I post per month, not including this or future posts? (See the geek appendix below to find out how I got to the point of being able to run code snippets like this:)

posts = WpBlogPost.published.posts.year(2011)
posts.count(:id, :group => 'month(post_date)').sort { |a,b| a[0].to_i <=> b[0].to_i }

Result: [["1", 32], ["2", 34], ["3", 33], ["4", 33], ["5", 34], ["6", 39], ["7", 33], ["8", 33], ["9", 31], ["10", 33], ["11", 31], ["12", 8]]

This is a straightforward SQL query to write, but ActiveRecord and scopes make it more fun, and I can easily slice the data in different ways. Because I’ve connected Rails with my WordPress data, I can use all sorts of other gems. For example, Lingua::EN::Readability can give me text statistics. It’s not a gem, but it’s easy to install with the provided install.rb. Code tends to throw off my word count, so let’s get rid of HTML tags and anything in pre tags, then calculate some text statistics:

include ActionView::Helpers::SanitizeHelper
require 'lingua/en/readability'
# Needs lots of memory =)
# Drop <pre> blocks (code samples) first, then strip the remaining HTML tags from each post
post_bodies = posts.map { |x| strip_tags(x.post_content.gsub(/<pre.+?<\/pre>/m, '')) }
all_text = post_bodies.join("\n").downcase
report = Lingua::EN::Readability.new(all_text)

Number of words in 2011: 173,363
Flesch reading ease: 65.3
Gunning Fog index: 11.0
Flesch-Kincaid grade level: 8.4

According to this, my writing should be readable by high school seniors, although they’ll probably have to be geeks in order to be interested in the first place.
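For reference, all three scores are simple formulas over word, sentence, and syllable counts. Here is a minimal sketch using the standard published formulas. The method names and the sample counts are made up for illustration, and this is not necessarily how the Readability library counts sentences or syllables internally.

```ruby
# Standard readability formulas. The method names are illustrative,
# not part of Lingua::EN::Readability.

# Higher is easier to read; 60-70 is roughly "plain English".
def flesch_reading_ease(words, sentences, syllables)
  206.835 - 1.015 * (words.to_f / sentences) - 84.6 * (syllables.to_f / words)
end

# Approximate US school grade level.
def flesch_kincaid_grade(words, sentences, syllables)
  0.39 * (words.to_f / sentences) + 11.8 * (syllables.to_f / words) - 15.59
end

# complex_words = words with three or more syllables.
def gunning_fog(words, sentences, complex_words)
  0.4 * ((words.to_f / sentences) + 100.0 * complex_words / words)
end

# Hypothetical sample: 100 words, 5 sentences, 150 syllables, 10 complex words
puts flesch_reading_ease(100, 5, 150).round(1)
puts flesch_kincaid_grade(100, 5, 150).round(1)
puts gunning_fog(100, 5, 10).round(1)
```

Longer sentences and longer words both push the grade level up, which is why stripping code out of the text before scoring matters.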

The Readability library has other handy functions, like occurrences for finding out how frequently a word shows up in your text.

I 4375 #4 – It’s a personal blog, after all
you 1926 #9 – Not so bad
my 1555
time 933
people 897
work 710
W- 200
presentations 190
J- 133
Drupal 111
Rails 97
Emacs 77
zucchini 23 Oh, the summer of all that zucchini…

I want to get better at clear, specific descriptions. That means avoiding adjectives like ‘nice’ and hedging words like ‘really’.

really 227 Hmm, I can cut down on this
maybe 211 This one too
probably 211 Down with hedging!
awesome 88 I overuse this, but it’s a fun word
nice 15 The war on generic adjectives continues.

Let’s look at feelings:

happy / happiness / wonderful 107
busy 33
worried / anxious / worry 30
tired 20
excited / exciting 21
delighted 4
suck 4
sad 2

I recently used the N-Gram gem to analyze the text of Homestar reviews looking for recurring phrases. I suspected that one of the contractors we were considering had salted his reviews, and unusual recurring phrases or spikes in frequency might be a tip-off. I can use the same technique to identify any pet phrases of mine.

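require 'csv'
# Assumed: Ruby's standard CSV library for the output file, and the N-Gram gem's
# NGram.new(data, :n => sizes) constructor fed with the all_text string built earlier.
csv = CSV.open('ngrams.csv', 'w')
n_gram = NGram.new([all_text], :n => [2, 3])
csv << ["NGRAM 2"]
# Each entry is a [phrase, count] pair, written in ascending order of count
n_gram.ngrams_of_all_data[2].sort { |a,b| a[1] <=> b[1] }.each { |a| csv << a }
csv << ["NGRAM 3"]
n_gram.ngrams_of_all_data[3].sort { |a,b| a[1] <=> b[1] }.each { |a| csv << a }
csv.close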

The ten most common 3-word phrases on my blog tend to be related to planning and explaining. It figures. I can stop saying “a lot of”, though.

Phrase Frequency
i want to 158
a lot of 126
so that i 94
be able to 86
that i can 76
you want to 74
one of the 68
that you can 63
in order to 55
i need to 55
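If you don't have an n-gram library handy, counting phrases like these takes only a few lines of plain Ruby. This is a rough sketch, not the gem's implementation: `ngram_counts` and `top_phrases` are names I made up, and the tokenizer is deliberately naive (it ignores sentence boundaries, so a phrase can span the end of one sentence and the start of the next).

```ruby
# Count every run of n consecutive words in the text.
def ngram_counts(text, n)
  # Keep letters, digits, apostrophes (straight and curly), and hyphens
  words = text.downcase.scan(/[[:alnum:]'’-]+/)
  counts = Hash.new(0)
  words.each_cons(n) { |gram| counts[gram.join(' ')] += 1 }
  counts
end

# Most frequent phrases first; ties are broken alphabetically.
def top_phrases(text, n, limit = 10)
  ngram_counts(text, n).sort_by { |phrase, count| [-count, phrase] }.first(limit)
end

sample = "I want to write. I want to draw. I need to sleep."
top_phrases(sample, 3, 3).each { |phrase, count| puts "#{phrase} #{count}" }
```

Passing `n = 2` gives the two-word phrases, and filtering the result with something like `select { |phrase, _| phrase.start_with?("i’m ") }` narrows it to a particular opening word.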

Some frequent two-word phrases:

i can 425
you can 408

Two-word phrases starting with “I’m…”

i’m going 52
i’m not 29
i’m looking 25
i’m working 24
i’m learning 23
i’m sure 16
i’m thinking 15
i’m glad 14
i’m getting 12

I wonder what other questions I might ask with this data…

Geek appendix: Using the Rails Console to work with WordPress data

The Rails console is awesome. You can do all sorts of things with it, like poke around your data objects or run scripts. With a little hacking, you can even use it as a smarter interface to other databases.

For example, I decided to get rid of all the syntax formatting that Org-mode tried to do with my blog posts when I published them to WordPress. Fortunately, this was the only use of span tags in my post content, so I could zap them all with a regular expression… if I could confidently do regular expressions in the MySQL console.

In the past, I might have written a Perl script to go through my database. If desperate, I might have even written a Perl script to do a regular expression replacement on my database dump file.

Rails to the rescue! I decided that since I was likely to want to use data from my WordPress blog in my Rails-based self-tracking system anyway, I might as well connect the two.

I found some code that created ActiveRecord models for WordPress posts and comments, and I modified it to connect to a different database. I added some scopes for easier queries, too.

class WpBlogPost < ActiveRecord::Base
  # Use the "wordpress" entry in config/database.yml instead of the app's own database
  establish_connection Rails.configuration.database_configuration["wordpress"]

  set_table_name "wp_posts"
  set_primary_key "ID"

  has_many :comments, :class_name => "WpBlogComment", :foreign_key => "comment_post_ID"

  # Look up a post from the pieces of its year/month/day/title permalink
  def self.find_by_permalink(year, month, day, title)
    find(:first,
         :conditions => ["YEAR(post_date) = ? AND MONTH(post_date) = ? AND DAYOFMONTH(post_date) = ? AND post_name = ?",
                         year.to_i, month.to_i, day.to_i, title])
  end

  scope :posts, where("post_type='post'")
  scope :published, where("post_status='publish'")
  scope :year, lambda { |year| where("year(post_date)=?", year) }
end

class WpBlogComment < ActiveRecord::Base
  establish_connection Rails.configuration.database_configuration["wordpress"]

  # if wordpress tables live in a different database (i.e. 'wordpress') change the following
  # line to set_table_name "wordpress.wp_comments"
  # don't forget to give the db user permissions to access the wordpress db
  set_table_name "wp_comments"
  set_primary_key "comment_ID"

  belongs_to :post, :class_name => "WpBlogPost", :foreign_key => "comment_post_ID"

  validates_presence_of :comment_post_ID, :comment_author, :comment_content, :comment_author_email

  # Refuse new comments on posts whose comments are closed
  def validate_on_create
    if WpBlogPost.find(comment_post_ID).comment_status != 'open'
      errors.add_to_base('Sorry, comments are closed for this post')
    end
  end
end


I specified the database configuration in config/database.yml, and granted my user access to the tables:

wordpress:
  adapter: mysql
  encoding: utf8
  database: wordpress_database_goes_here
  username: rails_username_goes_here

After I rigged that up, I could then run this little bit of code in Rails console to clean up all those entries.

WpBlogPost.where('post_content LIKE ?', '%<span style="color:%').each do |p|
  # Remove the opening colour spans first, then the closing tags
  s = p.post_content.gsub(/<span style="color:[^>]+>/, '')
  s.gsub!('</span>', '')
  p.update_attributes(:post_content => s)
end

Cleaning up subscripts (accidental use of underscore without escaping):

WpBlogPost.where('post_content LIKE ?', '%<sub>%').each do |p|
  # Turn each accidental <sub> back into the underscore that caused it
  s = p.post_content.gsub(/<sub>/, '_')
  s.gsub!('</sub>', '')
  p.update_attributes(:post_content => s)
end
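
Before running replacements like these against real posts, it's worth trying the regular expressions on a sample string in plain Ruby. A small sketch follows; the helper names `strip_colour_spans` and `restore_underscores` and the sample strings are made up for illustration, and the patterns are the same ones used in the console snippets.

```ruby
# Remove colour-only span tags, keeping the text inside them.
def strip_colour_spans(html)
  html.gsub(/<span style="color:[^>]+>/, '').gsub('</span>', '')
end

# Undo an accidental subscript: <sub>x</sub> becomes _x.
def restore_underscores(html)
  html.gsub('<sub>', '_').gsub('</sub>', '')
end

puts strip_colour_spans('<span style="color: #a020f0;">def</span> hello')
puts restore_underscores('my<sub>variable</sub> = 1')
```

Note that the span cleanup strips every closing span tag, so it is only safe when colour spans are the only spans in the content, as was the case here.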

Now I can use all sorts of other ActiveRecord goodness when generating my statistics, like the code above.

Tracking and organizing my clothes: substituting mathematics for fashion sense

| analysis, clothing, geek, organization, photography, quantified, rails

Thumbnails of clothes

Inspired by my sister’s photo-assisted organization of her shoes, I decided to tackle my wardrobe. Taking an inventory would make it easier to simplify, replace, or supplement my clothes. Analyzing colour would help me substitute mathematics for a sense of style. Combining the images with the clothes log I’ve been keeping would make it easier to see patterns and maybe do some interesting visualizations. Geek time!

I took pictures of all my clothes against a convenient white wall. I corrected the images using Bibble 5 Pro and renamed the files to match my clothes-tracking database, creating new records as needed. AutoHotkey and Colorette made the task of choosing representative colours much less tedious than it would’ve been otherwise. After I created a spreadsheet of IDs, representative colours, and tags, I imported the data into my Rails-based personal dashboard, programming in new functionality along the way. (Emacs keyboard macros + Rails console = quick and easy data munging.) I used Acts as Taggable On for additional structure.

It turns out that the math for complementary and triadic colour schemes is easy when you convert RGB to HSL (hue, saturation, lightness). I used the Color gem for my RGB-HSL conversions, then calculated the complementary and triadic colours by adding or subtracting degrees as needed (180 for complementary, +/- 120 for triadic).
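
To show the idea without depending on any gem, here is a self-contained sketch of the hue rotation. It uses the textbook RGB/HSL conversion formulas rather than the Color gem's API, and all the method names are my own, so treat it as an illustration of the math and not as the code behind the dashboard.

```ruby
# Convert 0-255 RGB to [hue in degrees, saturation 0-1, lightness 0-1].
def rgb_to_hsl(r, g, b)
  r, g, b = [r, g, b].map { |v| v / 255.0 }
  max, min = [r, g, b].max, [r, g, b].min
  l = (max + min) / 2.0
  return [0.0, 0.0, l] if max == min # grey: hue is undefined, so use 0

  d = max - min
  s = d / (1 - (2 * l - 1).abs)
  h = if max == r then ((g - b) / d) % 6
      elsif max == g then (b - r) / d + 2
      else (r - g) / d + 4
      end
  [h * 60.0, s, l]
end

# Convert hue (degrees), saturation, and lightness back to 0-255 RGB.
def hsl_to_rgb(h, s, l)
  c = (1 - (2 * l - 1).abs) * s
  x = c * (1 - ((h / 60.0) % 2 - 1).abs)
  m = l - c / 2.0
  r, g, b = case (h / 60.0).floor % 6
            when 0 then [c, x, 0]
            when 1 then [x, c, 0]
            when 2 then [0, c, x]
            when 3 then [0, x, c]
            when 4 then [x, 0, c]
            else        [c, 0, x]
            end
  [r, g, b].map { |v| ((v + m) * 255).round }
end

# Rotate the hue and keep saturation and lightness unchanged.
def rotate_hue(rgb, degrees)
  h, s, l = rgb_to_hsl(*rgb)
  hsl_to_rgb((h + degrees) % 360, s, l)
end

def complementary(rgb)
  rotate_hue(rgb, 180)
end

def triadic(rgb)
  [rotate_hue(rgb, 120), rotate_hue(rgb, -120)]
end

p complementary([255, 0, 0])
p triadic([255, 0, 0])
```

Because the result is rounded back to whole RGB values, converting back and forth can drift by a unit or so per channel.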

Here’s what the detailed view looks like now:


And the clothing log:


Clothing summary, sorted by frequency (30 days of data as of writing)



  • White balance and exposure are a little off in some shots. I tweaked some representative colours to account for that. It would be neat to get that all sorted out, and maybe drop out the background too. It’s fine the way it is. =)
  • Matches are suggested based on tags, and are not yet sorted by colour. Sorting by colour or some kind of relevance factor would be extra cool.
  • Sorting by hue can be tricky. Maybe there’s a better way to do this…
  • My colour combinations don’t quite agree with other color scheme calculators I’ve tried. They’re in the right neighbourhood, at least. Rounding errors?
  • I’ll keep an eye out for accessories that match triadic colours for the clothes I most frequently wear.
  • Quick stats: 28 casual tops, 15 skirts, 12 office-type tops, 8 pairs of pants, 5 pairs of slacks – yes, there’s definitely room to trim. It would be interesting to visualize this further. Graph theory can help me figure out if there are clothing combinations that will help me simplify my wardrobe, and it might be fun to plot colours and perhaps usage. Hmm…

Other resources:

Quantified: How I spent seven weeks

| analysis, geek, quantified

At the other Quantified Self Toronto meeting, I promised to get back into time tracking and to share my results. I’ve got seven full weeks of data from August 6 to September 23, and I can start exploring a few interesting angles.

Influenced by the OECD time study, I’ve categorized my time into sleep, work, unpaid work, personal care, and discretionary time. Sleep and work are self-explanatory. Unpaid work covers the routine things I could theoretically pay someone else to do: chores, cooking, and so on. I also include travel and commute time. Personal care involves daily routines. Discretionary time includes connecting with other people, responding to mail, exploring personal interests, and other things I choose to do.

I slept an average of 8.2 hours a day. I’ve been trying a different pattern: stay up until I feel sleepy, and wake up at around the same time. This gets me mostly in sync with my night-owl husband W-, who gets by on less sleep than I do. (Maybe it’s because he drinks coffee and I don’t.) Lately, I’ve been working on being in bed by 11, and sometimes even earlier.

Staying up means getting more discretionary time, as my wake-up times generally don’t shift unless my phone’s powered off or I sleep through my alarm. (Happened twice, fortunately with no consequences.) I think it has to do with lots of sunlight in the morning – it makes it much easier to get up. Sunrise will get later and later, though, so I’ll need to adapt.

More usefully, staying up later means creating the possibility of chunks of focused time, which is great for things like playing around with the Arduino or working on personal code. For some interests, a four-hour chunk may be better than two two-hour chunks. Setting up for woodworking or sewing can take time, for example, so it might be better to batch things.

Did I take advantage of those chunks of time? Here’s what the numbers say:

Chunk length | Times in 49 days | Typical activities
4-5-hour chunk | 3 | working on personal projects (2), electronics (1)
3-hour chunk | 5 | volunteering (4), blogging (1)
2-hour chunk | 21 | writing (6), personal projects (5), electronics (3), drawing (2), piano (1), relaxing (1), volunteering (1), learning (1), reading (1)
1-hour chunk | 41 | writing (10), personal projects (7), drawing (7), relaxing (6), other (3), reading (3), volunteering (2), piano (1), learning (1), sewing (1)
Less than 1 hour | 153 | writing (42), drawing (26), personal projects (21), relaxing (21), reading (14), other (9), piano (8), learning (6), delegating (2), Latin (2), volunteering (1), gardening (1)

This tells me that freeing up a 4-hour chunk isn’t super-important, and that I can squeeze a lot of activities into the nooks and crannies of a regular sort of day.
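
A tally like this can be built by bucketing each logged activity by its whole-hour length. Here is a rough sketch with made-up entries; the `[activity, hours]` log format, the bucket boundaries, and the method names are assumptions for illustration, not the actual tracking system.

```ruby
# Name the bucket for a block of time, based on its length in whole hours.
def chunk_label(hours)
  whole = hours.floor
  return 'Less than 1 hour' if whole < 1
  return '4+ hour chunk' if whole >= 4
  "#{whole}-hour chunk"
end

# entries: array of [activity, hours] pairs.
# Returns { bucket label => { activity => number of blocks } }.
def summarize_chunks(entries)
  summary = Hash.new { |hash, key| hash[key] = Hash.new(0) }
  entries.each { |activity, hours| summary[chunk_label(hours)][activity] += 1 }
  summary
end

# Hypothetical log entries
log = [['writing', 0.5], ['writing', 2.25], ['electronics', 4.5],
       ['drawing', 0.75], ['writing', 0.25]]

summarize_chunks(log).each do |label, activities|
  total = activities.values.sum
  detail = activities.map { |name, count| "#{name} (#{count})" }.join(', ')
  puts "#{label}: #{total} - #{detail}"
end
```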

Sleep: When I stayed up late, I felt like the discretionary time was occasionally of lower quality. It’s not quite about being tired, more like not being as excited. Maybe being up early gives you a certain smugness and feeling of control. Maybe it’s about momentum. I can see if I can move my chunks of time earlier in the morning (downside: less ambient socialization), or if I can tweak my afternoon momentum (start work a little earlier, use a nap or household routines to transition from work, then rock on).

Tracking time affects how I spend my day. It’s like the way tracking expenses can influence what you choose to spend on. (I track practically all my expenses – tracking’s great for making better decisions.) Mostly, tracking time encourages me to keep work within limits, because I know I’ve only got so many discretionary hours to spend on my own interests.

I tend to work about 40 hours a week, sometimes a little more. This doesn’t mean that I watch the clock, waiting for the seconds to tick by. If I’m in the zone, I’ll code until I come to a good place to stop. I’ve been tweaking my non-billable work to focus on the things I can make the most difference in. For example, I maintain a Lotus Connections toolkit to help people make community newsletters and get metrics. I tend to focus on small, quick fixes that help many people. Anything bigger than that gets added to my list, and I encourage people to find someone who can work with the source code if they need it sooner. I also nudge people to send happy-notes to my manager, as he needs to provide air cover for these sorts of things whenever there’s a heavy focus on utilization.

Limiting my work hours also means that I focus more on work when I’m at work. I’ve planned the projects based on how much time I think I’ll need to finish the work, and I don’t want to get into a last-minute scramble at the end. Although my estimates factor in a reasonable buffer for meetings and other interruptions, I still don’t want to waste that margin. Result so far: pretty happy clients. My manager is happy too, as my estimates aren’t over-optimistic. (In fact, I tend to turn things around quickly, but that’s more of a bonus.) It also helps that I know I’ll have discretionary time for exploring other interests.

Our routines fit our life well. There aren’t any big gaps where I could significantly improve things for a small investment of time or money. I’m working on misplacing things less often. We’re going to experiment with scaling up. I’ve considered outsourcing or getting assistance with food preparation, but I still have to crunch the numbers on whether the increase in discretionary time makes up for the increase in our food budget. There’s no point in doing it if I’m going to waste the time, but maybe it compares well with delegating or postponing other things I want to do.

12-Aug 19-Aug 26-Aug 2-Sep 9-Sep 16-Sep 23-Sep Total Percentage of total time
UW – Cooking 6.4 4.7 1.5 7.4 3.1 4.1 1.2 28.4 2%
UW – Tidying 2.5 5.0 3.8 3.7 5.7 6.3 3.6 30.5 3%
UW – Travel 0.8 0.6 1.4 5.2 2.8 10.8 1%
P – Eating 5.0 6.1 2.0 5.0 2.8 1.7 2.1 24.5 2%
Unpaid work total 8.9 10.5 5.3 11.7 10.2 15.5 7.6 69.7 6%
P – Exercise 5.9 2.5 12.2 6.2 5.6 2.7 5.5 40.5 3%
P – Prep 0.0 0.0 0%
P – Routines 7.7 7.9 8.2 6.1 6.3 11.0 8.7 55.9 5%
Personal care 18.6 16.4 22.4 17.2 14.8 15.3 16.3 120.9 10%

My “discretionary time” allowance stays pretty consistent. It turns out that I have roughly 4.6 hours of discretionary time during weekdays and 9.3 hours of discretionary time during weekends. What I choose to spend that time on tells me about my changing interests. For example, I’ve been shifting time from Latin and piano to electronics and drawing. I’m pretty happy with that decision, although I’m thinking I might shift some time back to Latin so that I don’t lose too much to forgetting. We’ve been volunteering a lot, so we’ll see how that works out.

Discretionary time:

12-Aug 19-Aug 26-Aug 2-Sep 9-Sep 16-Sep 23-Sep Total Percentage of discretionary time
D – Break 0.7 2.4 1.9 2.4 2.0 3.4 6.0 18.8 6%
D – Delegating 0.6 0.1 0.7 0%
D – Drawing 4.1 4.3 10.0 2.0 3.2 1.9 0.7 26.2 8%
D – Electronics 2.0 2.0 1%
D – Gardening 0.2 0.2 0%
D – Latin 1.4 0.5 1.9 1%
D – Learning 0.2 1.2 9.5 10.8 3%
D – Other 4.9 4.9 2.5 12.2 4%
D – Personal 3.9 13.5 12.3 0.8 12.8 43.2 14%
D – Piano 6.6 2.6 9.2 3%
D – Reading 0.7 3.4 0.1 5.5 2.9 0.3 13.1 4%
D – Sewing 1.6 1.6 1%
D – Shopping 1.1 2.0 2.5 3.4 11.9 20.9 7%
D – Social 11.5 11.2 7.8 9.0 19.2 12.1 4.7 75.5 24%
D – Volunteering 6.3 8.0 3.8 3.5 3.7 25.4 8%
D – Writing 8.1 6.6 5.0 11.4 7.2 11.4 0.8 50.4 16%
Discretionary time total 39.5 46.4 47.0 43.3 52.0 43.0 40.9 312.2

How can I make this even better?

  • Plan the projects I want to focus on, list the next actions, and see how much of my discretionary time is used for making tangible progress towards long-term goals. It’s like the way I analyze my expenses based on short-term goals and long-term goals.
  • Shift wake-up a little earlier so that I can experiment with two smaller chunks of time instead of just one evening chunk.
  • Experiment with greater delegation.
  • Experiment with finer-grained tracking using notes.
  • Continue adding to my life dashboard (currently tracking time and clothes).

2011-09-02 Fri 19:45

Thinking about getting better at decisions

| analysis, decision

I like analyzing my decisions. Writing about the alternatives I consider helps me think about them more deeply. Reviewing my decisions helps me learn even more. Sharing the decisions and the thought processes behind them helps me help other people.

How can I get even better at tracking and sharing my decisions? I want to get even better at remembering my reasons for decisions (useful during moments of doubt), revisiting my assumptions, and writing down additional benefits or costs.

I’ve posted the occasional decision review, but I think I’d benefit from something more structured than my blog. Maybe it’s time to resurrect some kind of a personal wiki system.

I’d like to have a system for logging and regularly reviewing decisions. I might prototype this using a large Org-mode outline file. I already use Org to write about the decisions I want to make or the decisions I’ve recently made. I can go back and write about other decisions I’ve made, and I can start structuring the file. Using Org Mode will make it easy to organize decision notes into an outline, integrate it with my task and calendar reminders, view table-based summaries, and publish snippets to my blog. If I get into the habit of scheduling reviews and thinking of questions that I might ask myself during a decision review, then I can learn even more from the decisions I make.

Decision: Write about decisions with more structure in Org and with regular reviews

Expected costs: Writing time (the software is free), occasional social risks of publishing decision notes

Expected benefits: Even more confidence in decision-making, ability to help more people with similar decisions, interesting records, fewer moments of doubt (very few already, but just in case!), deeper analysis

Alternatives considered:

  • Don’t write about decisions: Right.
  • Write only about major decisions: Small decisions are useful, too!
  • Keep decision notes just as blog posts: Hard to review over time.

Next review: In three months (2011-12-11 Sun)

  • How many decisions have I written about?
  • How many decisions have I reviewed?
  • How many notes have I published?
  • How have I used my notes to help improve my decision-making?

Decision: Not getting an Ontario Science Centre family membership

| analysis, decision, life

From Sept. 5: We had fun at the Ontario Science Centre. I like science centres. I have lots of great memories of going to science centres and playing around with exhibits. We’ve decided not to buy a family membership for now, though – we’ll just buy tickets as we go. Here’s what I’ve been thinking:

Cost of family membership: $120/year

Break-even point: at least two visits per year

Exhibits I liked today:

  • Stereoscopic photographs: I always like these. I think depth perception is fascinating.
  • Reptiles (special exhibition): The snake-necked turtle (Chelodina mccordi) was really cool. I also liked the exhibit showing how the fangs of snakes hinge when they close their jaws.
  • Scents: Of the five scents they had (leather, laundry, flowers, earth, vanilla), it turns out that I like the smell of clean laundry the most. So domestic!
  • Oil pumps: Mechanics and hydraulics, yay
  • The globe: I hadn’t realized China was so mountainous. I enjoyed seeing the continental shelves and looking at the underwater contours, too.
  • Paper airplanes: The paper supplies were all gone, so I picked up other people’s planes and refolded them or just threw them. I liked how they had hoops and a target if you wanted to try stunt or precision flying.

A number of new exhibits joined most of the old stalwarts. I was looking for some of the exhibits I remembered, but I couldn’t find them. That’s okay! =)


  • Equipment and exhibits make it easier to explore scientific principles (ex: pumps, levers, sound, etc.)
  • Multisensory experience / scale helps in understanding (ex: anatomy, geology, and so on)
  • Special exhibits provide additional reasons to return
  • Volunteers share their interest in science
  • Exhibits prompt you to explore things you might not have sought out by yourself (ham radio, etc.)
  • Exhibits validate interest (paper airplanes can be cool!)
  • Can use exhibits to support classroom learning:


  • Busy-ness and noise can be overwhelming
  • Tends to encourage shallow explorations/entertainment instead of deeper engagement. Hard to slow down and get deeper into something because of background noise, consideration for other people, and distractions from other exhibits
  • Pricey


J-’s grade 8 curriculum topics:

Events that might be interesting:

Back to the decision. We’ll probably not pay for a family membership immediately. We’ll reconsider this if we find ourselves going again within a year, and if we foresee a third trip within the year. If we end up going twice in a year, then our total cost is roughly the same with or without a membership – no loss there. If we go three times, then we’ll end up paying more in total, but that’s okay because the membership will cover additional months during which we might make a fourth visit.
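
The reasoning above can be sketched as a small cost comparison. The $120 membership price is from the post. The per-visit ticket cost of $60 is an assumption, inferred only from the stated two-visit break-even point, and the method names are made up for illustration.

```ruby
MEMBERSHIP_COST = 120.0    # from the post
ASSUMED_VISIT_COST = 60.0  # hypothetical: implied by a two-visit break-even

# Number of visits at which a membership costs no more than tickets.
def break_even_visits(membership, visit_cost)
  (membership.to_f / visit_cost).ceil
end

# Total cost when buying tickets every time.
def pay_as_you_go(visits, visit_cost)
  visits * visit_cost
end

# Total cost when buying tickets for the first few visits,
# then switching to a membership for the remaining ones.
def deferred_membership(visits, visit_cost, membership, ticket_visits)
  cost = [visits, ticket_visits].min * visit_cost
  cost += membership if visits > ticket_visits
  cost
end

puts break_even_visits(MEMBERSHIP_COST, ASSUMED_VISIT_COST)
(1..4).each do |visits|
  tickets = pay_as_you_go(visits, ASSUMED_VISIT_COST)
  deferred = deferred_membership(visits, ASSUMED_VISIT_COST, MEMBERSHIP_COST, 1)
  puts "#{visits} visits: tickets #{tickets}, tickets then membership #{deferred}"
end
```

Under these assumptions, waiting costs at most one extra ticket compared with buying the membership up front, which is the price of finding out how often we actually go.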

We’re going to put off getting a membership until we determine what frequency we’d like to go. We’ve had a family membership to the Ontario Science Centre in the past, and we made excellent use of it including trips to museums with reciprocal agreements. With lots of things changing this year, we’re going to hold off on that commitment to avoid the “I’m going to pay for the gym so that I get encouraged to use it” effect. We like science, and there are many, many ways to explore it.

Also – Is it odd that I recognize Ontario Science Centre exhibits described in other museums? I was reading Andy in Oman’s blog post about the OSC donations and I vividly remembered most of the exhibits mentioned. Including that land-like-a-cat one, which I tried many times. (Cat-related! ;) ) I had hoped to try it today with my Vibram toe shoes, but it’s probably abroad. What can I say? I like science centres. =)

Some of my favourite exhibits from other science centres:

  • The giant soap bubble exhibit from the Exploratorium
  • Tactile Dome (Exploratorium)
  • Catenary arch building blocks
  • Foucault’s pendulum traced with sand
  • Newton’s cradle
  • Kinetic sculptures
  • Rock polishing and panning for gold at Science North (ah, the stories)

I think one of the things I loved about growing up with the science museum in Manila was that there were often few visitors there. Looking back, I can wish now that it was better patronized, but I remember really appreciating the freedom. I got to spend all the time I wanted building catenary arches, playing with the magnets and iron filings, clapping into that big echo tunnel, or confusing my mind with perspective tricks. Most of the science museums I’ve been to have been crowded, which is a great thing, but which can be overwhelming. Maybe going on a weekday will help. Winter, perhaps? We’ll see.

I felt today’s trip was worth the time, money, and opportunity cost. It might have been even better if we slowed down, got deeper into a few exhibits, and maybe tried more of the timed shows. I tend to like mechanical exhibits more than exhibits that focus on screen display or video.

And yes, I still want to spend at least a week in the Smithsonian. ;)

Notes on transcription with and without a foot pedal

| analysis, decision, kaizen, review

I finally sat down and transcribed the interview on discovering yourself through blogging, where Holly Tse puts up with my firehose braindump of things I’ve learned. It’s an hour of audio, more than 53,500 letters, and about 9,500 actual words. The words per minute measurement uses a standard of five characters per “word”. This means I clocked in at roughly 180 wpm (53,500 characters ÷ 5 ÷ 60 minutes ≈ 178).

I like reading much more than I like listening, and a transcript makes it much easier for me to search and review what I said. After considering the options, I ended up transcribing the interview myself. I even built my own foot pedal. ;) So, here’s what I’ve learned.

I started off by trying to use ExpressScribe and Dragon NaturallySpeaking for automatic transcription. It looks like I’ll need to do a lot of training to get this ready for transcription. The fully-automated transcript was useless. I tried slowing the recording down and speaking it into Dragon NaturallySpeaking (somewhat like simultaneous translation?). This was marginally better, but still required a lot of editing.

I gave up on dictation (temporarily) and typed the text into Emacs, using keyboard shortcuts to control rewind/stop/play in ExpressScribe.

Type Typing without a foot pedal, 50% speed
Length 15 audio minutes
Duration 60 minutes of work
Factor audio minutes x 4
Characters 14137 (~ 2800 words @ 5 characters/word)
Typing WPM ~50wpm (90 wpm input, 56% efficiency)

I took a second look at the outsourced transcription options. CastingWords had raised prices since I last checked it. Now there wasn’t much of a gap between CastingWords and TranscriptDivas, another transcription company I’d considered. With TranscriptDivas, transcribing an hour of audio would have cost around CAD 83 + tax, but I’d get it in three days.

Type Transcription company
Cost CAD 83 + tax = ~CAD 95 / audio hour

Before I signed up for the service, though, I thought I’d give transcription another try – particularly as I was curious about my DIY foot pedal.

I told myself I’d do another 15 audio minutes so that I could see what it’s like to transcribe with my foot pedal. I ended up doing the whole thing. I used ExpressScribe to play back the audio at 50% speed, and I set the following global shortcuts for my foot pedal: center-press was rewind, left was stop, and right was play. I ended up using rewind more than anything else, so it worked out wonderfully.

Type Typing with DIY foot pedal, 50% speed
Length 45 audio minutes
Duration 120 minutes of work
Factor audio minutes x 2.6
Characters 39400 (~ 7880 words)
Typing WPM ~65wpm (90 wpm input, 72% efficiency)

Discovery: Listening to myself at 50% makes it unfamiliar enough to not make me twitchy, although it can’t do anything about me being sing-song and too “like, really”. That might be improved through practice.

90wpm input was pretty okay. Faster, and I found myself pressing rewind more often so that I could re-hear speech while catching up.

Assuming sending it out to a transcription company would have cost CAD 95/audio hour and transcribing the entire thing myself would have taken 3 hours (including breaks), doing it myself results in a decent CAD 30/work hour of after-tax savings. Not bad, even though doing it myself meant I procrastinated it for two weeks. It might be cheaper if I hire a transcriptionist through oDesk or similar services. With infrequent transcription needs, though, I’d probably spend more than two hours on screening, hiring, and delegating.
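
The figures in the tables and the savings estimate come from a few simple ratios. Here is a minimal sketch using the numbers from this post; the method names are made up, and the five-characters-per-word convention is the one described earlier.

```ruby
CHARS_PER_WORD = 5.0 # standard "word" length for WPM measurements

# Effective typing speed over the whole work session.
def typing_wpm(characters, work_minutes)
  characters / CHARS_PER_WORD / work_minutes
end

# Minutes of work needed per minute of audio.
def effort_factor(work_minutes, audio_minutes)
  work_minutes.to_f / audio_minutes
end

# Money not spent on outsourcing, per hour of my own work.
def savings_per_work_hour(cost_per_audio_hour, audio_hours, work_hours)
  cost_per_audio_hour.to_f * audio_hours / work_hours
end

puts typing_wpm(14137, 60).round            # without a foot pedal
puts typing_wpm(39400, 120).round           # with the DIY foot pedal
puts effort_factor(60, 15)                  # without a foot pedal
puts effort_factor(120, 45).round(1)        # with the DIY foot pedal
puts savings_per_work_hour(95, 1, 3).round  # CAD per work hour
```

The results land close to the rounded figures quoted above, and the drop in effort factor is the clearest measure of what the foot pedal bought.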

Hacking together an Arduino foot pedal was definitely a win. Transcribing with it was okay, but not my favourite activity. I might send work to a transcription company if there’s enough value in a shorter turnaround, because it took me two weeks to get around to doing this one. Good to know!

2011-08-31 Wed 21:45

Thinking about outsourcing transcription or doing it myself

| analysis, decision, kaizen, speaking

I like reading much more than I like listening to someone talk, and much, much more than listening to myself talk. Text can be quickly read and shared. Audio isn’t very searchable. Besides, I still need to work on breathing between sentences and avoiding the temptation to let a sentence run on and on because another cool idea has occurred to me. Perhaps that’s what I’d focus on next, if I ever resume Toastmasters; my prepared speeches can be nice and tight, but my ad-libbed ones wander. More pausing needed.

So. Transcription. I could do it myself. I type quickly. Unfortunately, I speak quite a bit faster than I type, so I usually need to slow it down to 50% and rewind occasionally. ExpressScribe keyboard shortcuts are handy. I’ve remapped rewind to Ctrl-H so that I don’t need to take my fingers off the home row. But there’s still the argh factor of listening to myself. This is useful for reminding me to breathe, yes, but it only takes five minutes for me to get that point. ;) The other night, it took me an hour to get through fifteen minutes, which is slower than I expected. An hour-long podcast interview should take about four hours of work, then.

I could use transcription as an excuse to train Dragon NaturallySpeaking 11, the dictation software I’d bought for this very purpose but haven’t used as much as I thought I would. It recognizes many words, but I have a lot of training to do before I get it up to speed, and I still need to edit. This would be a time investment for uncertain rewards. I still need to time how long it takes me to dictate and edit a segment.

Foot pedals would be neat, particularly if I could reprogram them for other convenient shortcuts. Three-button pedals cost from $50-$130, not including shipping. In addition to using it to stop, play, and rewind recordings, I’d love to use it for scrolling webpages or pressing modifier keys. I often work with two laptops, so it’s tempting. (And then there’s the idea of learning how to build my own human interface device using the Arduino… ) – UPDATE: I’ve built one using the Arduino! I can’t wait to try it out.

In terms of trading money for time, I’ve been thinking about trying Casting Words, which is an Amazon Mechanical Turk-based business that slices up submitted files into short chunks. Freelancers work on transcribing these chunks, which are then reassembled and edited. The budget option costs USD 0.75 per audio minute, which means an hour-long interview will cost about USD 45 to transcribe. That option doesn’t have a guaranteed turnaround, though, so I could be waiting for weeks. In addition, I tend to talk quickly, so that might trigger a “Difficult Audio” surcharge of another USD 0.75 per minute, or about USD 90 per audio hour.

For better quality at a higher price, I could work with other transcription companies. For example, Transcript Divas will transcribe audio for CAD 1.39/minute, and they guarantee a 3-day turnaround (total for 1 hour: CAD 83.40). Production Transcripts charges USD 2.05/minute for phone interviews.

I could hire a contractor through oDesk or similar services. One of the benefits of hiring someone is that he or she can become familiar with my voice and way of speaking. Pricing is based on effort instead of a flat rate per audio minute, and it can vary quite a bit. One of my virtual assistants took 14 hours to transcribe three recordings that came to 162 minutes total. At $5.56 per work hour, that came to $0.48 per audio minute, or $28 per audio hour. oDesk contractors are usually okay with an as-needed basis, which is good because I’ve scaled down my talks a lot. (I enjoy writing more!)

So here are the options:

  • Type it myself: 4 hours of discretionary time
  • Dictation: Unknown hours of discretionary time, possible training improvements for Dragon NaturallySpeaking
  • Foot pedals: Probably down to 3.5 hours / audio hour, but requires a little money; hackability
  • Casting Words: USD 90 per audio hour, unknown timeframe
  • Transcript Divas: CAD 84 per audio hour, 3-day turnaround
  • Contractor: Can be around USD 30 per audio hour, depending on contractor
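
The per-audio-hour prices in this list can be reproduced from the rates quoted above. A small sketch follows; the method names are made up, and the amounts stay in whatever currency each service quotes, so the USD and CAD figures are not directly comparable.

```ruby
# Services that charge per audio minute, with an optional per-minute surcharge.
def flat_rate_per_audio_hour(rate_per_minute, surcharge_per_minute = 0.0)
  (rate_per_minute + surcharge_per_minute) * 60
end

# Contractors paid per hour of work: total pay spread over the audio length.
def hourly_rate_per_audio_hour(work_hours, hourly_rate, audio_minutes)
  work_hours * hourly_rate / audio_minutes * 60
end

puts flat_rate_per_audio_hour(0.75, 0.75)              # Casting Words budget + difficult audio (USD)
puts flat_rate_per_audio_hour(1.39).round(2)           # Transcript Divas (CAD)
puts hourly_rate_per_audio_hour(14, 5.56, 162).round(2) # past contractor experience (USD)
```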

I’m going to go with dictating into Dragon NaturallySpeaking because I need to train it before I can get a sense of how good it is. It takes advantage of something I already own and am underusing. Who knows, if I can get the hang of this, I might use it to control more functionality. We’ll see!