Double Major

I’m back in school, as you’ve probably already gathered from my microblogging.  I’m finishing up a double major in Computer Science and Equity Studies at the University of Toronto, and if all goes according to plan I’ll be graduating in May 2011.

While this may sound like a strange combination, it makes perfect sense to me – I’m interested in equity issues within the STEM fields, especially computer science.

It turns out the combination of fields comes in handy in unexpected ways sometimes.  After proofreading a paper I wrote for a Women and Gender Studies class, my friend Valerie suggested that some quantitative data might help support one of my assertions.  In my paper I argued that while early feminist scholarship on sexual harassment failed at intersectionality, more recent scholarship has embraced it.  To support this, I wanted to compare the number of citations for Catharine MacKinnon’s Sexual Harassment of Working Women: A Case of Sex Discrimination to the number for Kimberlé Crenshaw’s Demarginalizing the Intersection of Race and Sex: A Black Feminist Critique of Antidiscrimination Doctrine, Feminist Theory and Antiracist Politics.  These are both profoundly influential works, but I wanted to quantify their relative influence on scholarly work.

So I did what any self-respecting CS student would do – I wrote a script to scrape Google Scholar for citation numbers over time and made a graph comparing the two 🙂

For your edification, here’s scholargraph.pl:

# (c) 2010 Leigh Honeywell
# Licensed under the Simplified BSD License, reuse as you will!

use strict;
use warnings;
use LWP::UserAgent;

# set up LWP user agent; pretend to be Firefox 4 just to be cheeky
my $lua = LWP::UserAgent->new(
    keep_alive => 1,
    timeout    => 180,
    agent =>
"Mozilla/5.0 (Windows NT 6.1; rv:2.0b7pre) Gecko/20100921 Firefox/4.0b7pre"
);

# edit in your citation numbers from google scholar and the appropriate
# date ranges for what you're trying to do
my $crenshaw = getCites( "10759548619514288444", "1977", "2010" );
my $mackinnon = getCites( "2195253368518808933", "1977", "2010" );

sub getCites {
    my ( $cite, $startyear, $endyear ) = @_;

    for my $year ($startyear .. $endyear) {

        #construct the query URL using the above data
        my $post =
          $lua->get( "http://scholar.google.com/scholar?cites="
              . $cite
              . "&as_ylo="
              . $year
              . "&as_yhi="
              . $year );

        # scrape the returned page for the number of results
        if ( $post->content =~ m#of (?:about )?(\d*)</b># ) {
            print $cite . "," . $year . "," . $1 . "\n";
        }
        elsif ( $post->content =~ m#did not match any articles# ) {
            print $cite . "," . $year . ",no results\n";
        }
        else {
            # some kinda error happened, most likely google caught me!
            print $cite . "," . $year . ",error\n";
        }
        # don't kill google's servers
        sleep(5);
    }
    return 0;
}
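
The script prints one cite,year,count line per query, which still has to be lined up by year before it can be graphed.  Here is a minimal sketch of that step, in Python rather than Perl.  It is not the code I actually used; the function name pivot_citations, the labels mapping, and the sample counts are all made up for illustration.

```python
import csv
import io
from collections import defaultdict


def pivot_citations(csv_text, labels):
    """Turn 'cite_id,year,count' lines into {year: {label: count}}.

    `labels` maps a Google Scholar cluster ID to a readable name
    (hypothetical helper argument, not part of scholargraph.pl).
    "no results" becomes 0; any other non-numeric count becomes None.
    """
    table = defaultdict(dict)
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        cite_id, year, count = row
        name = labels.get(cite_id, cite_id)
        if count.isdigit():
            value = int(count)
        elif count == "no results":
            value = 0
        else:
            value = None  # e.g. "error": the count is unknown, not zero
        table[int(year)][name] = value
    return dict(table)


if __name__ == "__main__":
    # Invented numbers, only to show the shape of the data.
    sample = (
        "10759548619514288444,1989,3\n"
        "10759548619514288444,1990,12\n"
        "2195253368518808933,1989,20\n"
        "2195253368518808933,1990,no results\n"
    )
    labels = {
        "10759548619514288444": "crenshaw",
        "2195253368518808933": "mackinnon",
    }
    for year, counts in sorted(pivot_citations(sample, labels).items()):
        print(year, counts.get("crenshaw"), counts.get("mackinnon"))
```

Keeping "error" rows as None rather than 0 means a blocked request does not show up on the graph as a year with no citations.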

Oh and if you’re curious, Crenshaw’s paper was cited far more than MacKinnon’s, pretty much as soon as it was published. Intersectionality FTW!

And as these things always go, of course I spent the evening working on this only to find that there’s a Perl module as well.

Vulnerability Disclosure for Open Source projects

These are the notes and some links for a brief talk I gave a few weeks ago to my classmates in the summer CS project class I’m taking at U of T.  We’re working on the Basie and Markus projects.  Both are web apps; Basie is a software project management app built on Django, and Markus is a CS-specific marking / grading app built on Rails.

The debate over full disclosure goes back hundreds of years in the locksmithing world.  Locksmiths were historically very secretive about weaknesses in their products; interestingly, they still are – here’s a note on the subject from a few years ago.

There’s nuance and detail to the recent history of disclosure practices, which Wikipedia covers well, but it’s fair to say that today there are three broad categories of practice:

  • silent patching (no disclosure) – this is a bad idea for fairly obvious reasons, except (some argue) in edge cases like the Linux kernel (the “every kernel bug is a security bug” argument) (one discussion of this, another)
  • partial disclosure, where one issues the patch before explaining full details of the vulnerability
  • full disclosure, where vulnerability details (and sometimes exploit code) are released at the same time as the patch is issued

Aside from how much is being disclosed, there’s the question of responsible disclosure on the part of security researchers, which in a nutshell is the idea of giving software vendors a set amount of time to respond to security issues before going public with them.

How to Screw Up Disclosure

  • don’t give credit in your vulnerability advisories
  • don’t even bother publishing advisories (silent patching)
  • be unresponsive
  • demand excessive, unreasonable timeframes for patching (this is of course subjective)
  • make people sign NDAs (!)
  • threaten to sue people

The last two aren’t generally screwups committed by Open Source projects, of course 🙂

How to do it right – best practices

  • have a clear security contact on your site, no more than a click away from the homepage, and easily googlable with the string “$projectname security”
  • have a gpg key posted, with a good web of trust, for that contact
  • have email to that contact go to an email list with a clear process for dealing with it so that you don’t drop the ball, or have it filed into the bugtracker automagically (in a private bug!!11)
  • have an announce-only security mailing list for your users, and post issues to it ASAP when they come out!  An RSS feed works too.  Do both!
  • ensure that someone in your project monitors lists such as full-disclosure and bugtraq for issues in your project, upstream frameworks, and your infrastructure.  For just monitoring your project, a Google Alert works well too: “project name + bug or vulnerability or security”.  People sometimes announce vulns without notifying the project at all; you want to catch these.
  • if the project ends up getting abandoned at some point in the future, at the very least post a warning that it’s deprecated and unmaintained even for security issues, and possibly take down the code.

Specific Issues for web apps

  • you may have a widely deployed base of users.  An auto-update system such as WordPress’s is awesome for getting them to $%^$&&* patch!
  • the framework you’re building on may have (security) bugs too.
  • your code may be customized by users, which makes them lazy about patching – a good plugin architecture can help mitigate this.

CSC491 – Second Milestone

Not quite as far along as I want to be, but definitely getting there.  Refreshed my rpm and general sysadminning memories in the process.  Still a lot to get done to have anything interesting…

A bit of background is in order to understand what I’ve been up to.  I’ve been working this week on getting the hang of working with the Planet-Lab infrastructure, and can mostly find my way around it manually now.  I haven’t figured out how to automate the interactions with it in the way that will be needed for this project, but it’s a start.

Planet-Lab is a network of computers around the world which researchers can obtain access to (eventually).  As a user, one gets a “slice”, which as far as I can tell is just a project-specific username.  The user can assign virtual machines on the “nodes”, which are the actual machines.  Users have limited root access on the nodes, and can install software, set up cron jobs (scheduled tasks), and run scripts.

So where has this gotten me? Well, read on….


CSC491 Capstone Design Class Notes and Status

For CSC491, the Capstone Design Project class I’m taking at the University of Toronto, I’m working with a project called InfoTrace.  The Citizen Lab, who run the project, are interested in global network reachability, particularly under adverse conditions such as DDoS attacks, BGP prefix hijacking, movement of server resources, etc.

Here’s what I’ve accomplished so far:

  • Tracked down U of T’s Principal Investigators for the Planet-Lab network and asked for access for the project
  • Set up this blog
  • Set up a GitHub account
  • Found some similar research
  • Read up on BGP
  • Explored several tools for doing traceroutes and related network tracing: hping3, nmap’s --traceroute, 0trace, and scapy.

A few links promised to my classmates, which are interesting on their own:

Miles Thibault is working on a business plan for a “Wikimovies” web site.  I think he’d get a lot out of some Long Tail reading: Chris Anderson’s original article, and Kevin Kelly’s riff on it titled “1,000 True Fans”.

Denis Pankratov and Jennifer Ruttan are working on a really nifty-looking project to do accurate indoor localization with CDMA (that “other” cell phone protocol); they may be interested in Ian Goldberg’s paper on “Three Protocols for Location Privacy” from last year’s Privacy Enhancing Technologies symposium.

My goals, which were originally for the next two weeks but have been pushed back by only one as I’ve fallen a bit behind on the “getting stuff up and running” side of things, are:

  • Coming up with a database schema for storing connectivity information.
  • Getting a basic web interface up and running in Django.

I’m working on these first rather than the network underpinnings as we don’t yet have access to the Planet-Lab infrastructure, so the constraints there aren’t entirely clear.  The front-end stuff will likely run on a server at Citizen Lab, so I can get that up and running right away.

-Leigh