Code

Rants, musings and stuff I have to say regarding programming code.

Name generator

10 Feb. 2010

Here is a Python snippet that will fetch a Census name file, parse it and generate a list of fake names that look realistic. I needed 2500 names to test a system I'm building and this was handy so I'm sharing.

#!/usr/bin/env python

import urllib
import re

print 'Fetching names from the Census...'
names_list_url = 'http://www.census.gov/genealogy/names/dist.all.last'
names_data = urllib.urlopen(names_list_url).readlines()

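print 'Processing Census data'
# Pair each surname with the one from the previous line; 'John' seeds the first pair.
previous_name = 'John'
for line in names_data:
    # The name is the first run of word characters; the remaining columns are ignored.
    m = re.findall(r'(\w+).*', line)
    if len(m) == 1:
        name = m[0]
        print '%s, %s' % (name.capitalize(), previous_name.capitalize())
        previous_name = name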
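
For readers on Python 3, the parsing and pairing step can be sketched without touching the network. This is a minimal sketch, not the original script: SAMPLE_LINES and pair_names() are names made up for illustration, the sample rows only imitate a "name first, other columns after" layout with placeholder numbers, and re.match() is swapped in for re.findall() to grab the first word.

```python
import re

# Hypothetical sample rows: a name followed by placeholder columns.
SAMPLE_LINES = [
    "SMITH 1.0 1.0 1\n",
    "JOHNSON 0.5 1.5 2\n",
    "\n",  # blank lines are skipped
    "WILLIAMS 0.5 2.0 3\n",
]

def pair_names(lines, first='John'):
    """Return 'Name, Previous' strings, pairing each name with the prior one."""
    previous_name = first
    pairs = []
    for line in lines:
        m = re.match(r'(\w+)', line)
        if m is None:
            continue
        name = m.group(1)
        pairs.append('%s, %s' % (name.capitalize(), previous_name.capitalize()))
        previous_name = name
    return pairs

for entry in pair_names(SAMPLE_LINES):
    print(entry)
```

Returning a list instead of printing inside the loop makes it easy to slice off exactly as many names as a test needs.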



Programming fonts

17 Jan. 2010

Top 10 Programming Fonts

I came across an interesting blog post discussing different programming fonts. My current one, Andale Mono, comes in at #9. I wasn't impressed enough by any of them to want to switch from Andale Mono, but then I saw Inconsolata at #1.

Dang. That font is pretty sweet.

I found these instructions useful for installing it. I used the TTF, which I have just added to my customize git repo.

Edit: I am sticking with Andale Mono inside Emacs but really like Inconsolata as my fixed-width browser font.



Diffing in Emacs

08 Jan. 2010

Diffing in Emacs can be surprisingly sweet. Here is my screen.

I am using diff-mode- inside Carbon Emacs along with the modifications shown below to use my colors.

10c10
< ;; Last-Updated: Sat Aug  1 15:15:22 2009 (-0700)
---
> ;; Last-Updated: Sat Dec 27 10:19:33 2008 (-0800)
12c12
< ;;     Update #: 647
---
> ;;     Update #: 646
15c15
< ;; Compatibility: GNU Emacs: 21.x, 22.x, 23.x
---
> ;; Compatibility: GNU Emacs 21.x, GNU Emacs 22.x
116c116
<   '((t (:foreground "Blue" :background "DarkSeaGreen1")))
---
>   '((t (:foreground "Blue" :background "Green")))
135c135
<   '((t (:foreground "PaleGoldenrod" :background "DarkGreen")))
---
>   '((t (:foreground "PaleGoldenrod" :background "Green")))
141c141
<   '((t (:foreground "PaleGoldenrod" :background "DarkMagenta")))
---
>   '((t (:foreground "PaleGoldenrod" :background "DarkRed")))
148c148
<  '(diff-added ((t (:foreground "DarkGreen"))) 'now)
---
>  '(diff-added ((t (:foreground "Green"))) 'now)
150,151c150,151
<  '(diff-context ((t (:foreground "Black"))) 'now)
<  '(diff-file-header ((t (:foreground "Red" :background "White"))) 'now)
---
>  '(diff-context ((t (:foreground "grey50"))) 'now)
>  '(diff-file-header ((t (:foreground "Red" :background "gray15"))) 'now)
153,154c153,154
<  '(diff-header ((t (:foreground "Red"))) 'now)
<  '(diff-hunk-header ((t (:foreground "White" :background "Salmon"))) 'now)
---
>  '(diff-header ((t (:foreground "Red" :background "gray15"))) 'now)
>  '(diff-hunk-header ((t (:foreground "White" :background "gray15"))) 'now)
157c157,159
<  '(diff-removed ((t (:foreground "DarkMagenta"))) 'now)
---
>  '(diff-removed ((t (:foreground "DarkRed"))) 'now)
>  '(diff-indicator-added-face ((t (:foreground "Green"))) 'now)
>  '(diff-indicator-removed-face ((t (:foreground "Red"))) 'now)
168c170
<     ("^\\*\\*\\* .+ \\*\\*\\*\\*". diff-file1-hunk-header-face) ;context
---
> 	("^\\*\\*\\* .+ \\*\\*\\*\\*". diff-file1-hunk-header-face) ;context
171c173
<     ("^---$" . diff-hunk-header-face)   ;normal
---
> 	("^---$" . diff-hunk-header-face)   ;normal
183c185,186
<     ("^[^-=+*!<>#].*\n" (0 diff-context-face))))
---
>     ("^[^-=+*!<>#].*\n" (0 diff-context-face))
> 	))



Pyango View

17 Dec. 2009

I have released a new tool called Pyango View.

Its story is here, on the wikitrans blog.



pyfacebook

14 Nov. 2009

API implementations can be frustrating


The pyfacebook module by Samuel Cormier-Iijima is awesome. I've worked with multiple web APIs now and found that their logic is rarely too hard to understand, but if you want some error checking before you query the remote source, you need real functions to call.

I have come across many implementations where you're given a tool that can talk to a URL and pass some arguments, one of which is a function name. The URL representing the remote function is called and the arguments are sent as POST data. But it's up to the programmer to know all the functions available, because they aren't defined anywhere in these API implementations. Amazon's current Python implementation of an API for handling Mechanical Turk is exactly one of these incomplete implementations. The programmer must also be sure they're not calling functions that don't exist, and they won't find out until the server tells them so. You usually end up with code like the following when you're working with one of these abstract implementations.

# query server for hit count
query_mt('get_hit_count', server_key, secret_key)

The issue, simply, is that the programmer did not have time to do a proper implementation and instead gave a barebones framework. If they had time, they'd have done the right thing. Right?!

Pyfacebook is different


Pyfacebook solves this in a superb way. It uses an IDL (interface description language) to describe the interface and then dynamically generates objects encapsulating the behavior described in that language. The generated objects are then given access to functions that handle the data transport and response processing. This technique is often found in RPC systems, where it opens up system communication through an abstract and easily adapted interface. Basically, the systems agree on some function names taking certain types of arguments, and the rest is just communicating the values of the query.

If you check the pyfacebook code itself, you'll see the IDL start with the declaration of METHODS (line 116 at the time of this writing). We see a dictionary of dictionaries. The outermost dictionary's keys represent namespaces, like photos-related functions or admin-related functions. The next layer of dictionaries, one per namespace, represents the functions offered by that namespace, and inside each of those is a list of tuples representing the argument list.

The tuples consist of the argument name, the type, and any flags describing the variable. It's common to see ('pid', int, []) or ('page_ids', 'list', ['optional']). This entire structure is iterated over in __generate_proxies(), where the transformation from IDL to actual functions takes place. This function reads the language definition, generates Python code from the list contents, calls eval() on the code and instantiates the objects from the generated code. Dynamic languages for the win!

The actual generated code looks something like what's below. Notice that every function essentially calls the proxy itself with its own name and a dictionary of the arguments that have been processed, and returns the result.
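
To make the shape of that IDL concrete, here is a minimal Python 3 sketch of turning such a table into callable methods. It is not pyfacebook's code: the METHODS contents, make_method() and generate_proxies() are made up for illustration, and it builds the functions with closures and type() rather than generating source text and evaluating it. The types in the tuples are carried along but not checked here.

```python
# Hypothetical miniature IDL:
# namespace -> method name -> list of (argument name, type, flags).
METHODS = {
    'stream': {
        'addLike': [('uid', int, ['optional']), ('post_id', int, ['optional'])],
        'remove': [('post_id', int, []), ('uid', int, ['optional'])],
    },
}

def make_method(namespace, method_name, arg_specs):
    """Build one method from its argument specification."""
    known = [name for name, _type, _flags in arg_specs]

    def method(self, **kwargs):
        # Reject arguments the IDL does not declare, before any remote call.
        unknown = sorted(set(kwargs) - set(known))
        if unknown:
            raise TypeError('unexpected arguments: %s' % ', '.join(unknown))
        args = {}
        for name, _type, flags in arg_specs:
            if name in kwargs:
                args[name] = kwargs[name]
            elif 'optional' not in flags:
                raise TypeError('%s() requires %s' % (method_name, name))
        # A real proxy would hand this to the transport layer instead.
        return ('%s.%s' % (namespace, method_name), args)

    method.__name__ = method_name
    return method

def generate_proxies(idl):
    """Create one proxy class per namespace, e.g. 'stream' -> StreamProxy."""
    proxies = {}
    for namespace, methods in idl.items():
        attrs = dict((name, make_method(namespace, name, specs))
                     for name, specs in methods.items())
        proxies[namespace] = type(namespace.capitalize() + 'Proxy', (object,), attrs)
    return proxies

stream = generate_proxies(METHODS)['stream']()
print(stream.remove(post_id=42))
```

The payoff is the one described in the post: a missing required argument or a misspelled one fails locally with a TypeError instead of after a round trip to the server.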

def addLike(self, uid=None, post_id=None):
    """
    Facebook API call. See http://developers.facebook.com/documentation.php?v=1.0&method=stream.addLike
    """
    args = {}
    if uid is not None: args['uid'] = uid
    if post_id is not None: args['post_id'] = post_id
    return self('addLike', args)

def removeLike(self, uid=None, post_id=None):
    """
    Facebook API call. See http://developers.facebook.com/documentation.php?v=1.0&method=stream.removeLike
    """
    args = {}
    if uid is not None: args['uid'] = uid
    if post_id is not None: args['post_id'] = post_id
    return self('removeLike', args)

def remove(self, post_id, uid=None):
    """
    Facebook API call. See http://developers.facebook.com/documentation.php?v=1.0&method=stream.remove
    """
    args = {}
    args['post_id'] = post_id
    if uid is not None: args['uid'] = uid
    return self('remove', args)

def addComment(self, post_id, comment, uid=None):
    """
    Facebook API call. See http://developers.facebook.com/documentation.php?v=1.0&method=stream.addComment
    """
    args = {}
    args['post_id'] = post_id
    args['comment'] = comment
    if uid is not None: args['uid'] = uid
    return self('addComment', args)

Python lets programmers implement a method, __call__(), which gets invoked when the object itself is called as a function. If you look at the definition of the Proxy class you see an __init__() function and a __call__() function. For those who don't know Python, __init__() is the closest thing you have to a constructor. The proxy is instantiated such that it points to a _client object, and this object is an instance of the Facebook class. __call__() calls _client as a function with some arguments, one of which is the remote method we're calling. The Facebook class is defined below the IDL and the proxy objects. It consists mainly of some communication and session-oriented functions. When the Facebook class is called as a function (first the proxy, then the Facebook class), we see Facebook's __call__() turn the arguments into POST variables and query the API's URL for an answer. The Facebook URL itself is http://api.facebook.com/restserver.php and the method we're calling is one of the POST arguments ('method', specifically).
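
The two-step call chain is easier to see in miniature. The sketch below is an illustration in Python 3, not pyfacebook's source: FakeClient is a made-up stand-in for the Facebook class that returns the POST variables instead of sending them, and prefixing the method with the namespace name is an assumption based on the method=stream.addLike URLs in the docstrings above.

```python
class FakeClient(object):
    """Hypothetical stand-in for the Facebook class; no network involved."""

    def __call__(self, method, args=None):
        # Fold the remote method name into the POST variables.
        post_vars = dict(args or {})
        post_vars['method'] = method
        return post_vars


class Proxy(object):
    """Callable object that forwards to its client, which is callable too."""

    def __init__(self, client, name):
        self._client = client
        self._name = name

    def __call__(self, method, args=None):
        # First hop: the proxy. Second hop: the client.
        return self._client('%s.%s' % (self._name, method), args)


stream = Proxy(FakeClient(), 'stream')
print(stream('addLike', {'post_id': 42}))
```

A generated method such as addLike() only has to end with return self('addLike', args) for the whole chain to fire.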

The IDL functions default to basically being a list of arguments with certain types mapped to certain functions, but sometimes more code is needed than simply sending some data to a URL. Pyfacebook stores session data that comes back from some authentication-based calls, as is done in AuthProxy. Another example is PhotosProxy, which defines an upload function that handles encoding the binary data for transmission. The proxy objects are first generated dynamically according to the IDL's first layer of dictionary keys (e.g. 'photos'). Then the proxy class is created with the name PhotosProxy and added to the global namespace. If there are overrides, you see a redeclaration of the class, and it inherits from (itself?!) a class of the same name. I really dig this logic as it shows how you can use an IDL to make assumptions about typical behavior and then override the atypical cases with ease.

The Facebook class itself is largely just an auth implementation, which includes session handling for the user.
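
The "inherits from itself" trick works because Python evaluates the base class expression before it rebinds the name. A minimal Python 3 sketch, with made-up method bodies (the real upload code handles encoding and is not reproduced here):

```python
# Step 1: a class generated dynamically and bound to a module-level name.
def _generated_get(self, pid):
    return ('photos.get', {'pid': pid})

PhotosProxy = type('PhotosProxy', (object,), {'get': _generated_get})

# Step 2: the override. The base expression PhotosProxy still refers to the
# generated class at this point, so the new class inherits from it and then
# takes over the name.
class PhotosProxy(PhotosProxy):
    def upload(self, image_bytes):
        # Hypothetical hand-written method; encoding is omitted.
        return ('photos.upload', {'size': len(image_bytes)})

photos = PhotosProxy()
print(photos.get(7))
print(photos.upload(b'abc'))
```

Callers only ever see one PhotosProxy, which has both the generated methods and the hand-written one.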

Neat stuff for sure!



Pyjamas

28 Oct. 2009

My friend Malcolm brought Pyjamas to my attention today. Pyjamas is a Python-based implementation of Google's Web Toolkit.

Here is the Pyjamas book. It doesn't seem complete though. Pyjamas itself is only at a 0.6 release too.

Seems interesting! I will blog about it if I end up using it for any projects.



Goopytrans

09 Oct. 2009

I posted another Python module for a web API to GitHub tonight. This one handles talking to Google for language translations. It is called Goopytrans (GOOgle PYthon TRANSlator) and is available here.

Goopytrans supports translating a single body of text.

>>> goopytrans.translate('bonjour', source='fr', target='en')
'hello'

Multiple bodies of text.

>>> goopytrans.translate_list(('bonjour','merci'), source='fr', target='en')
['hello', 'thank you']

And language detection.

>>> goopytrans.detect('bonjour')
{'isReliable': False, 'confidence': 0.12033016000000001, 'language': 'fr'}



Wikipydia

07 Oct. 2009

I have begun work on a Python module for interfacing with the Wikipedia API. This is a 0.1 release and covers the gist of what I need for wikitrans. The module is available here.

Finding possible matches for an article name works like:


>>> import wikipydia
>>> articles_found = wikipydia.opensearch('Johns Hopkins University')
>>> for name in articles_found[1]:
...     print name
... 
Johns Hopkins University
Johns Hopkins University Press
Johns Hopkins University School of Medicine
Johns Hopkins University Hospital
Johns Hopkins University Applied Physics Laboratory
Johns Hopkins University SAIS
Johns Hopkins University School of Education
Johns Hopkins University in Popular Culture
Johns Hopkins University Carey Business School

Finding language alternatives for articles:


>>> lang_dict = wikipydia.query_language_links('Johns Hopkins University')
>>> for lang in lang_dict:
...     print '%s :: %s' % (lang, lang_dict[lang])
... 
el :: Πανεπιστήμιο Τζονς Χόπκινς
eo :: Johns Hopkins Universitato
de :: Johns Hopkins University
fr :: Université Johns-Hopkins
da :: Johns Hopkins University
fa :: دانشگاه جانز هاپکینز
ar :: جامعة جونز هوبكينز
cs :: Johns Hopkins University
fi :: Johns Hopkinsin yliopisto
es :: Universidad Johns Hopkins

It's also possible to fetch the actual text in either rendered form or wikimarkup:


>>> wikimarkup_text = wikipydia.query_text_raw('Dennis')
>>> print wikimarkup_text
'''Dennis''' or '''Denis''' is a male [[given name|first name]] derived from
the [[Greco-Roman]] name [[Dionysius]] meaning "servant of
[[Dionysus]]", the [[Thracian]] god of [[wine]], which is ultimately derived
from the Greek Dios (Διός, "of [[Zeus]]") combined with [[Nysa|Nysos or
Nysa]] (Νυσα), where the young god was raised.
...

>>> rendered_text = wikipydia.query_text_rendered('Dennis')
>>> print rendered_text
<p><b>Dennis</b> or <b>Denis</b> is a male <a href="/wiki/Given_name"
title="Given name">first name</a> derived from the <a
href="/wiki/Greco-Roman" title="Greco-Roman" class="mw-redirect">
Greco-Roman</a> name <a href="/wiki/Dionysius" title="Dionysius">
Dionysius</a> meaning "servant of <a href="/wiki/Dionysus"
title="Dionysus">Dionysus</a>", the <a href="/wiki/Thracian" title="Thracian"
class="mw-redirect">Thracian</a> god of <a href="/wiki/Wine" title="Wine">
wine</a>, which is ultimately derived from the Greek Dios (Διός, "of <a
href="/wiki/Zeus" title="Zeus">Zeus</a>") combined with <a
href="/wiki/Nysa" title="Nysa">Nysos or Nysa</a> (Νυσα), where the young
god was raised.</p>
...



Asking Google to run a simple language translation

27 Sep. 2009

This is Python code for querying Google for a language translation.

#!/usr/bin/env python

import urllib
import simplejson
import nltk.data
 
api_url = "http://ajax.googleapis.com/ajax/services/language/translate"
 
def translate(text, src='', to='fr'):
    params = {'langpair': '%s|%s' % (src, to),
              'v': '1.0'}
    target_text = ''
    # Split the text into sentences and translate each one in its own request.
    sent_detector = nltk.data.load('tokenizers/punkt/english.pickle')
    for sentence in sent_detector.tokenize(text.strip()):
        params['q'] = sentence
        # Passing data makes urlopen send the parameters as a POST request.
        resp = simplejson.load(urllib.urlopen(api_url,
                                              data=urllib.urlencode(params)))
        # If the response has no translatedText, the lookup error propagates.
        target_text += resp['responseData']['translatedText'] + " "
    return target_text
 
if __name__=='__main__':
    text = """
    Hello Chris. Hello Delip. I am still in my pajamas this morning and
    haven't had ANY coffee yet. But I should be heading out the door with
    my lady in a few minutes to find breakfast (and coffee). I can't decide
    if I want eggs benedict or a toasted bagel with cream cheese and
    salmon. Such is life!
    """
    print "EN :: %s" % (text)
    translated_text = translate(text)
    print "FR :: %s" % (translated_text)
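
For anyone curious what actually goes over the wire, the POST body for one sentence can be built and inspected without making a request. This is a small Python 3 sketch of only that step: build_post_body() is a made-up helper, and urllib.parse.urlencode() stands in for the Python 2 urllib.urlencode() used above.

```python
from urllib.parse import urlencode

def build_post_body(sentence, src='', to='fr'):
    """Return the form-encoded body for a single sentence (hypothetical helper)."""
    params = {
        'langpair': '%s|%s' % (src, to),  # source|target language pair
        'v': '1.0',
        'q': sentence,
    }
    return urlencode(params)

print(build_post_body('Hello Chris.'))
```

The pipe in the langpair value is percent-encoded and spaces become plus signs, which is why the script lets urlencode() do the work instead of building the string by hand.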
