kouk

a sense of self

Category: tech

sed memo: output boilerplate around each line in a file

Given three files xx0{0,1,2}, the following sed script first outputs the contents of xx00 and then, for each line of the input, outputs xx01, the line itself and xx02. The substitution on the last line of the script is optional: it can be expanded to include more commands or removed completely.

1 r xx00
1 !r xx02
$ !r xx01
x
1 d
s/foo/baz/

The filenames are the default output filenames of the csplit command, which could be helpful in scenarios related to the one above.

Also it would be easy to add a footer to the above script, but I did not need it personally at this time.
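For comparison, here is a minimal Python 3 sketch of the output described above. It mirrors the intended result rather than the sed mechanics, and the function name wrap_lines and the sample strings are made up for illustration; they stand in for the contents of xx00, xx01 and xx02.

```python
def wrap_lines(lines, header, before, after):
    """Return header once, then before + line + after for every input line.

    All arguments are strings that keep their trailing newlines, the way
    reading a file or iterating over a file object returns them.
    """
    parts = [header]
    for line in lines:
        parts.append(before + line + after)
    return "".join(parts)


# Stand-ins for the contents of xx00, xx01 and xx02 (hypothetical values).
result = wrap_lines(["one\n", "two\n"], "HEADER\n", "<item>\n", "</item>\n")
print(result, end="")
```

With real files, header, before and after would simply be the results of reading xx00, xx01 and xx02.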

how to insert any unicode character over VNC

I’ve been using GRNET’s ViMA service a lot lately, and sometimes, while using a vm’s console via VNC, I need to input a Unicode character into a file. Neither the VNC viewer applet that is provided nor any other vncviewer I’ve tried seems able to input these characters directly from the keyboard. Here’s how I do it:

  1. Install Vim on the vm with apt-get, yum or whatever

  2. Locally find the Unicode code point of the desired character:

    echo "Ψ" | iconv -f utf-8 -t iso8859-1 --unicode-subst="<U+%04X>"

    In this case it prints <U+03A8>

  3. Open the file you want to input into on the vm with Vim.
  4. In insert mode, type Control-V, u, and the four hexadecimal digits (i.e. 0, 3, A and 8).
  5. Voilà!
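If iconv is not at hand, the code point lookup in step 2 can also be sketched in Python 3. The helper name codepoint is an assumption made for this example, not part of any tool mentioned above.

```python
def codepoint(char):
    """Format a single character as U+XXXX, padded to at least four hex digits."""
    return "U+{:04X}".format(ord(char))


print(codepoint("Ψ"))  # prints U+03A8
```

Characters outside the four-digit range produce more than four hex digits; in Vim those are entered with Control-V followed by a capital U and eight hex digits instead.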

a way to track office-suite documents with VCS?

Nice to hear Sofia’s happy with Dropbox as a solution for finding the latest version of her dissertation. I was thinking about what she could do if she had to track older or alternative versions of her dissertation, perhaps even offline, instead of only the latest version in Dropbox. Of course I thought of Mercurial, which I use for my SCM needs. The problem with Mercurial is that it is good at working with text files, not binary files like those of popular office suites. So I searched a bit and found a possible solution (although I haven’t implemented and tested it yet):

David Heffelfinger posted about OpenOffice.org Document Version Control With Mercurial, in which he writes about using the flat version of the OpenOffice.org ODT format. According to him this solution is not satisfactory because, even in the flat format, a single letter change will change all sorts of metadata in other parts of the file. This makes it hard to tell the important changes between two versions from the inconsequential ones. Instead of going this way, he is currently using the oodiff hack from the Mercurial website.

Obviously a hack would not be an acceptable solution for Sofia to adopt, unless she desperately needed the ability to track changes and could live with using OpenOffice instead of Word (or whatever she’s using). But a comment on David’s post refers to an interesting tool called Beyond Compare, which seems to be able to generate differences both for Microsoft Office XML files and OpenOffice ODT files! They claim integration with popular DVCSs. I wonder how easy it would be to integrate Beyond Compare into, say, TortoiseHg. Has anyone done this? Maybe I will try it sometime soon.

Any other suggestions for tracking office suite documents with a DVCS?
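One untested idea, sketched here in Python 3: since an .odt file is a zip archive whose body lives in content.xml, the text can be pulled out and diffed or tracked as plain text, leaving the metadata noise behind. The function name odt_to_text is hypothetical, and the regex-based tag stripping is a deliberate simplification that ignores tables, footnotes and empty paragraphs.

```python
import html
import re
import zipfile


def odt_to_text(path):
    """Return the text of an .odt file, one paragraph or heading per line."""
    # An .odt file is a zip archive; the document body lives in content.xml.
    with zipfile.ZipFile(path) as archive:
        xml = archive.read("content.xml").decode("utf-8")
    # Turn the end of each paragraph or heading into a newline,
    # then drop every remaining tag.
    xml = re.sub(r"</text:(?:p|h)>", "\n", xml)
    text = re.sub(r"<[^>]+>", "", xml)
    # Decode entities such as &amp; back into plain characters.
    return html.unescape(text)
```

The resulting text file could be committed next to the document, so that an ordinary text diff shows what actually changed in the wording.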

akadimias vs solonos vs google maps

Ok, I’ve heard about map producers incorporating mistakes in their data to identify copies, but this goes too far:



What? Can’t spot the mistake? And you call yourself an Athenian? :-)

Anyway, according to Google Maps (or more accurately, according to Tele Atlas), there are two Akadimias streets in the center of Athens.

Actually, the north-eastern street that, according to Google, is named “Akadimias” is called “Solonos”, and it is also a very important street to know if you live in Athens. Searching for Solonos athens brings up plenty of places on that street that correctly report their address as Solonos str. But you can’t find the street itself. Here’s how it should be:

Google Map correction of akadimias street


using forward slash in unix filenames

Ok, I was automating the creation of some ogg files based on titles appearing in a text file. Doing it myself, with mplayer and oggenc, I hadn’t taken into account many of the caveats that preexisting ripping/encoding tools take care of, like the problem of forward slashes in filenames stored on a unix filesystem. But the good thing with doing it yourself is that you don’t have to automatically adopt the presuppositions that appear in preexisting tools.

One such presupposition is that you have to replace special characters in filenames, like quotes or forward slashes, so that they can be accessible to shell users. Well I have to say that shells and file managers are pretty advanced these days and you don’t really have to replace anything anymore. The only exception is forward slashes, which absolutely cannot appear in a filename on a unix system. Most tools replace the slash with an underscore character. I find that kind of lame.

In Unicode, the forward slash is the character ‘SOLIDUS’ (U+002F), although according to Wikipedia (http://en.wikipedia.org/wiki/Slash_(punctuation)), calling the slash a ‘solidus’ “contradicts long-established English typesetting terminology”.

So, since the slash has to be replaced in filenames, we can keep its visual nature by using the typographic solidus character as a replacement, which in Unicode is called ‘FRACTION SLASH’ (U+2044). Here is the difference between the two (in the typeface your browser is using):

Slash (U+002F)   Solidus (U+2044)
/                ⁄

In Vim you can input the solidus by typing Ctrl+V, u, 2044 in insert mode.
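In a script, the replacement could look something like the following Python 3 sketch. The helper name safe_filename and the sample title are invented for the example.

```python
FRACTION_SLASH = "\u2044"  # the typographic solidus, U+2044


def safe_filename(title):
    """Replace every forward slash (U+002F) with the fraction slash (U+2044)."""
    return title.replace("/", FRACTION_SLASH)


# Hypothetical title taken from a track list.
print(safe_filename("AC/DC - Live.ogg"))
```

The result still looks like it contains a slash, but it can be stored as a single filename because U+2044 is not a path separator.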

Mass/Bulk Follow with py-twitter

While trying to maintain the Greek Liberals twitter account, specifically trying to follow a bunch of people that I also follow, I didn’t manage to find a satisfactory solution to automatically follow a specific list of screen names.

Zac Bowling’s bulk following tool wasn’t working for me (I got an authentication error). The only other promising solution is Flashtweet, but they insist on loading the list of new people to follow from another twitter account’s follow list. At first that might not seem like a problem, since I could simply load my personal twitter account friends in Flashtweet and check the people I wanted to follow. But I already had a hand-picked list of people that I wanted the GreekLiberals twitter account to follow. I created this list by going through the CSV file that Tweeple exports, where I can browse all the details of a user instead of just the twitter name like in Flashtweet’s interface. I don’t want to start mindlessly following people who, for starters, might not even speak Greek or be otherwise interested in dialogue with the Greek Liberals.

So I decided to write a py-twitter script to do the importing. Here it is:

import twitter
import simplejson
from urllib2 import HTTPError

class MassFollower(twitter.Api):
   ''' Follow a list of people '''

   def AllFriends(self):
      ''' Collect all friends, one page of up to 100 users at a time '''
      friends = []
      page = 1
      while True:
         # see http://code.google.com/p/python-twitter/issues/detail?id=20
         batch = self.GetFriends(page=page)
         friends.extend(batch)
         # a short or empty page means there is nothing left to fetch
         if len(batch) < 100:
            break
         page += 1
      return friends

   def MassFollow(self,names):
      # skip the people we already follow
      for name in set(names)-set([f.screen_name for f in self.AllFriends()]):
         if not self.GetUser(name).protected:
            try:
               self.CreateFriendship(name)
            except HTTPError, e:
               print "got error with code",e.code
               # see http://code.google.com/p/python-twitter/issues/detail?id=33
               if e.code == 401 or e.code == 403:
                  data=simplejson.loads(e.read())
                  print "got data", data
                  # data is a dict parsed from JSON, so test for the key
                  if 'error' in data:
                     print "can't follow", name, "because", data['error']
                     continue
               raise

if __name__ == "__main__":
   import sys
   import getpass
   u=sys.argv[1]
   # one screen name per line; skip blank lines
   names=[ line.strip() for line in open(sys.argv[2],'r') if line.strip() ]
   p=getpass.getpass("Twitter password: ")
   try:
      api = MassFollower(username=u,password=p)
      api.MassFollow(names)
   except IOError, e:
      # an HTTPError may carry both 'code' and 'reason', so test 'code' first
      if hasattr(e, 'code'):
         if e.code==400:
            print "Probably struck rate limit, try again later."
         else:
            print 'The server couldn\'t fulfill the request: code',e.code
      elif hasattr(e, 'reason'):
         print "Can't contact twitter:", e.reason
      else:
         raise

Kudos to Niklas Saers, whose own py-twitter utility, lasttweet.py, tipped me off to the page parameter of the GetFriends method. At the time of this writing, an observant reader will notice that, according to the docs linked above, there is no parameter named page. This is why the script currently depends on a version of py-twitter newer than 0.5, which is the current stable release. So basically you need to get an svn checkout and install it manually on your system.
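The paging idea itself can be tried without a twitter account. The following Python 3 sketch assumes, like the script above, that a page holds at most 100 users; fetch_all, fake_get_friends and the fake user names are all made up for illustration and are not part of py-twitter.

```python
def fetch_all(fetch_page, page_size=100):
    """Collect items from a paged source until a short page signals the end."""
    items = []
    page = 1
    while True:
        batch = fetch_page(page)
        items.extend(batch)
        # Fewer items than a full page (including none) means we are done.
        if len(batch) < page_size:
            return items
        page += 1


# Hypothetical stand-in for a call such as GetFriends(page=n): 250 fake friends.
FAKE_FRIENDS = ["user{}".format(i) for i in range(250)]


def fake_get_friends(page):
    start = (page - 1) * 100
    return FAKE_FRIENDS[start:start + 100]


print(len(fetch_all(fake_get_friends)))  # prints 250
```

Stopping on a short page also covers the case where the total is an exact multiple of 100: the next page comes back empty and the loop ends.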

Apart from being my first use of the twitter API, it is also the first semi-useful thing I’ve written in Python, so I would be grateful for criticism or comments regarding the code above.

Ada Lovelace Day

Ada Lovelace


In order to celebrate Ada Lovelace Day, after having signed the pledge started by Suw Charman-Anderson, I decided to dedicate a post to the Geek Girls. Geek Girls, lovely and interesting. Geek Girls like the kind I studied with at university. Geek Girls like the kind that will attend the 8th GGD. Here’s to you!

The green cloud

I subscribe to Bill St. Arnaud’s CA*net 4 News mailing list (he’s got a blog too). Bill works at CANARIE Inc., which operates CA*net, the world’s first national optical Internet research and education network (est. 1998), in Canada. At first I subscribed to learn more about different models of fiber ownership, specifically home-owned fiber, although now his interests seem to have shifted more towards SOA, cloud computing and zero-carbon IT services. Which is also interesting, of course. Judging from his Blogger profile, he has a host of other interests, such as e-democracy.

His usual thesis is that IT can go zero-carbon by locating virtualization and cloud commodity services at places like this. Impending regulations like cap and trade are thought to provide economic incentives for this (there have been many criticisms of these systems btw). Today Bill sent out an article, which appeared on Slashdot as well, that refers to a UC Berkeley research paper called “Above the Clouds: A Berkeley View of Cloud Computing” (pdf), which corroborates his thesis in economic terms. The paper’s main task is listing the main obstacles to cloud computing success, which, according to the researchers, are:

  • Availability of service
  • Data lock-in
  • Data confidentiality and auditability
  • Data transfer bottlenecks
  • Performance unpredictability
  • Scalable storage
  • Bugs in large distributed systems (author’s note: anyone working with grids knows how true this is)
  • Scaling quickly
  • Reputation fate sharing
  • Software licensing

As Bill says in his posting, reliability is a problem for zero or low carbon emission cloud services, since cloud computing must be geographically distributed to be reliable, but you can’t find low-emission, low-cost energy everywhere.

Here are some figures about energy production cost that Bill cites from the paper:

Price of kilowatt-hours of electricity by region

Price per kWh   Region       Possible reasons why
3.6¢            Idaho        Hydroelectric power; not sent long distance
10.0¢           California   Electricity transmitted long distance over the grid; limited transmission lines in Bay Area; no coal-fired electricity allowed in California
18.0¢           Hawaii       Must ship fuel to generate electricity
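
To get a feel for what these prices mean, here is a small Python 3 sketch that turns them into a yearly electricity bill. The constant 1 MW load is a hypothetical figure picked only for illustration; it does not come from the paper or from Bill’s posting.

```python
# Prices in US cents per kWh, as quoted in the table above.
PRICES = {"Idaho": 3.6, "California": 10.0, "Hawaii": 18.0}

HOURS_PER_YEAR = 24 * 365
LOAD_KW = 1000  # hypothetical 1 MW facility, chosen only for illustration


def annual_cost_usd(cents_per_kwh, load_kw=LOAD_KW):
    """Yearly electricity bill in dollars for a constant load."""
    return cents_per_kwh / 100.0 * load_kw * HOURS_PER_YEAR


for region, price in PRICES.items():
    print("{:<12}${:>12,.0f}".format(region, annual_cost_usd(price)))
```

Whatever load is assumed, the ratio stays the same: the same facility costs five times as much to power in Hawaii as in Idaho.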