Posts tagged with “programming”
File reading performance in Python
There are a few ways to read a file in Python, and this page compares the relative performance of some of them. I am working on a project right now that involves reading large amounts of data from text files, so I repeated the analysis on Python 2.6.6, the version currently shipping with Ubuntu 10.10. I ran three implementations against a file with 1 million lines.
My test script is available here, and the functions I tested are below. Here are my results:
| Function | Time (sec) | Lines read per sec |
|---|---|---|
| fileread1 | 0.1695 | 5,899,280 |
| fileread2 | 1.6387 | 610,236 |
| fileread3 | 0.1278 | 7,823,156 |
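import fileinput

def fileread1():
    # readlines() with no size hint loads the whole file on the first call;
    # the second call returns an empty list, which ends the loop.
    file = open("test.txt")
    while 1:
        line = file.readlines()
        if not line:
            break
        pass
    file.close()

def fileread2():
    # Iterate line by line through the fileinput module.
    for l in fileinput.input("test.txt"):
        pass

def fileread3():
    # Iterate directly over the file object.
    file = open("test.txt")
    for l in file:
        pass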
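If you want to repeat this kind of measurement yourself, here is a minimal sketch of a timing harness. It is not my original test script: it targets Python 3 rather than 2.6.6, and the helper names (`make_test_file`, `time_reader`, `count_lines_iter`, `count_lines_readlines`) are made up for illustration. It only covers the two fastest approaches from the table.

```python
import os
import tempfile
import time


def count_lines_iter(path):
    """Iterate over the file object directly (the fileread3 approach)."""
    count = 0
    with open(path) as handle:
        for _ in handle:
            count += 1
    return count


def count_lines_readlines(path):
    """Load every line at once with readlines() (the fileread1 approach)."""
    with open(path) as handle:
        return len(handle.readlines())


def time_reader(reader, path):
    """Return (line_count, elapsed_seconds) for a single run of reader(path)."""
    start = time.perf_counter()
    lines = reader(path)
    return lines, time.perf_counter() - start


def make_test_file(num_lines):
    """Write num_lines short lines to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as handle:
        for i in range(num_lines):
            handle.write("line %d\n" % i)
    return path


if __name__ == "__main__":
    test_path = make_test_file(1000000)
    try:
        for reader in (count_lines_readlines, count_lines_iter):
            lines, elapsed = time_reader(reader, test_path)
            # Guard against a zero reading from a coarse clock.
            rate = lines / elapsed if elapsed > 0 else float("inf")
            print("%s: %.4f sec, %.0f lines/sec" % (reader.__name__, elapsed, rate))
    finally:
        os.remove(test_path)
```

A single run per function is noisy, so treat the numbers as rough; repeating each measurement a few times and taking the best would be more trustworthy.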
On making a minified Click Modular Router driver for OpenWRT
My group is using the Click Modular Router for a project. We’ve written several custom elements for our configuration, and we’re trying to run it on space-constrained devices: Ubiquiti NanoStation M5s, which have only 4MB of ROM. Thus, we need to build a minified version of Click that includes only the elements we actually use. You can do this by passing a series of elements to click-mkmindriver in the form “-E
cat *.click | tr ' ' '\n' | tr '(' '\n' | egrep "^[A-Z]" | grep "[a-z]$" | sort | uniq | sed "s/^/ -E /g "
Note that this assumes that all your Click element names begin with an upper-case letter and end with a lower-case one, since that is what the two grep filters match. Fortunately, it’s simple to remove false positives.
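If the pipeline is hard to follow, here is a rough Python 3 sketch of the same filtering steps. The function names `element_names` and `element_flags` are hypothetical, and the output is joined into one string rather than printed one flag per line, so treat it as an illustration of the logic rather than a drop-in replacement.

```python
import re


def element_names(config_text):
    """Return sorted, de-duplicated tokens that look like Click element names."""
    # Mirror the two tr steps: break the text on spaces, newlines and "(".
    tokens = re.split(r"[ \n(]", config_text)
    # Mirror the two greps: keep tokens that start with A-Z and end with a-z.
    names = set(
        token for token in tokens
        if re.match(r"[A-Z]", token) and re.search(r"[a-z]$", token)
    )
    # Mirror sort | uniq (Python sorts by code point; sort may use your locale).
    return sorted(names)


def element_flags(config_text):
    """Format the names as a single string of -E arguments."""
    return "".join(" -E " + name for name in element_names(config_text))


if __name__ == "__main__":
    sample = "src :: FromDevice(eth0) -> Queue(100) -> ToDevice(eth1);"
    print(element_flags(sample))
```

Like the shell version, this skips a name that is immediately followed by punctuation such as a semicolon, because the token then no longer ends in a lower-case letter.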
Instant Runoff Voting: A bash implementation
I love shell scripts. If I have something simple to write that a normal person would try to do in a scripting language like Perl or Python, I try to do it with bash. And, as much as possible, I try to avoid using fancy stuff like variables, sed, or awk.
I didn’t succeed in the latter goal this time, but I did succeed in implementing Instant Runoff Voting as a bash script. It takes as input a file called irv.txt, which lists each ballot (ranked set of choices) on a separate line as a comma-delimited ordered list.
Here’s a sample input file, and here’s the script itself. It could be much simpler and probably cleaner, but I tried to make it clear what was going on at each step (and demonstrate the power of piped commands).
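For comparison, here is a minimal Python 3 sketch of the Instant Runoff counting rule applied to the same input format (one comma-delimited ranked ballot per line). It is not a translation of my bash script. The function names are made up, and the tie-break (eliminate the alphabetically first of the candidates tied for last place) is an assumption I picked to keep the example deterministic.

```python
from collections import Counter


def parse_ballots(lines):
    """Turn lines like 'A,B,C' into ranked lists, skipping blank lines."""
    ballots = []
    for line in lines:
        choices = [choice.strip() for choice in line.split(",") if choice.strip()]
        if choices:
            ballots.append(choices)
    return ballots


def instant_runoff(ballots):
    """Return the winning candidate, or None if no ballots can be counted."""
    candidates = set(choice for ballot in ballots for choice in ballot)
    while candidates:
        # Count each ballot for its highest-ranked candidate still in the race.
        tally = Counter()
        for ballot in ballots:
            for choice in ballot:
                if choice in candidates:
                    tally[choice] += 1
                    break
        total = sum(tally.values())
        if total == 0:
            return None
        leader, votes = tally.most_common(1)[0]
        # A strict majority of the ballots still being counted wins.
        if votes * 2 > total:
            return leader
        # Otherwise eliminate the candidate with the fewest votes.
        # Assumed tie-break: the alphabetically first of those tied for last.
        fewest = min(tally.get(candidate, 0) for candidate in candidates)
        tied = sorted(c for c in candidates if tally.get(c, 0) == fewest)
        candidates.discard(tied[0])
    return None


if __name__ == "__main__":
    sample = ["A,B,C", "B,A,C", "C,B,A", "B,C,A", "A,C,B"]
    print(instant_runoff(parse_ballots(sample)))
```

To run it against a real input file, pass the open file object for irv.txt to `parse_ballots`, since iterating over a file yields its lines.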
Using the Facebook API to retrieve mutual friends
A few nights ago I had a (bad) idea for a tool that leveraged the Facebook API. I’ll spare you the details, but my tool needed to retrieve the list of mutual friends for each of the logged in user’s friends. This proved to be a bit trickier to figure out than I had hoped for as a developer new to the Facebook API, so here’s a quick little PHP script that shows how I went about solving this problem.
I wound up using the REST API’s friends.getMutualFriends query. This code sample uses the new Graph API to retrieve a list of your friends, then displays the profile picture of any friend with whom you have more than ten mutual friends. Note that it makes one REST call per friend, so it will take a while to run for users with a large number of friends.
$facebook = new Facebook(array(
    'appId' => '<your app id>',
    'secret' => '<your secret id>',
    'cookie' => true,
));
$session = $facebook->getSession();
$me = $facebook->api('/me'); // Graph API call, retrieves the logged-in user's profile (needed for $me['id'] below)
$my_friends = $facebook->api('/me/friends'); // Graph API call, retrieves own friend list
foreach ($my_friends['data'] as $person) {
    $friend_uid = $person['id'];
    // Old REST API call. Gets the mutual friends (source must be logged in user).
    $param = array('method' => 'friends.getMutualFriends',
                   'source_uid' => $me['id'],
                   'target_uid' => $friend_uid,
                   'callback' => '' );
    $res = $facebook->api($param);
    if (count($res) > 10) {
        echo "<a href=\"http://www.facebook.com/profile.php?id=".$friend_uid."\">"
            ."<img src=\"https://graph.facebook.com/".$friend_uid."/picture\"></a>: "
            .count($res)." friends in common<br>";
    }
}
Unfortunately, this does not provide a way to retrieve the full friend list of an arbitrary friend of the logged-in user. As far as I can tell, this is not possible using any of the Facebook APIs. If you know of a way, please leave a note in the comments!
One-liner to extract a list of link addresses from an HTML file
I’m moving my research group’s website to a new server and making some updates at the same time. One of the main things I need to do is make sure links are going to work after the transition. Here is a little one-line shell “script” (if you can call it that) that will extract link addresses from an HTML web page:
wget -q -O - http://www.google.com | tr " " "\n" | grep "href" | cut -f2 -d"\""
wget fetches the file and outputs its content to stdout. tr replaces all spaces with newlines, grep filters out every line that doesn’t contain an “href”, and finally cut displays everything between the first pair of double-quotes.
If you want to use a file you have on your local machine, you can use this variant instead:
tr " " "\n" < [file_name.html] | grep "href" | cut -f2 -d"\""
Obligatory disclaimer: HTML is NOT a regular language and in general cannot be parsed with regexes or simple text splitting, as is done here. This is not guaranteed to work.
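If you need something more robust than the one-liner, a real HTML parser is the safer route. Here is a minimal sketch using Python 3’s standard-library `html.parser` module. The class and function names (`LinkExtractor`, `extract_links`) are my own, and like the shell version it collects the href of any tag, not just anchors.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the value of every href attribute seen in the document."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        for name, value in attrs:
            if name == "href" and value is not None:
                self.links.append(value)


def extract_links(html_text):
    """Return the href values found in html_text, in document order."""
    parser = LinkExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = '<p><a href="http://example.com/a">A</a> <a class="x" href=\'/b\'>B</a></p>'
    for link in extract_links(page):
        print(link)
```

Unlike the `tr`/`grep`/`cut` pipeline, this does not care about attribute order, single versus double quotes, or spaces inside the URL.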