Here is a Python snippet that will fetch a Census name file, parse it and generate a list of fake names that look realistic. I needed 2500 names to test a system I'm building and this was handy so I'm sharing.
#!/usr/bin/env python
import urllib
import re

print 'Fetching names from the Census...'
names_list_url = 'http://www.census.gov/genealogy/names/dist.all.last'
names_data = urllib.urlopen(names_list_url).readlines()

print 'Processing Census data'
# Pair each surname with the one before it, so the previous surname
# doubles as a first name.
previous_name = 'John'
for line in names_data:
    # The name is the first word on each line.
    m = re.findall(r'(\w+).*', line)
    if len(m) == 1:
        name = m[0]
        print '%s, %s' % (name.capitalize(), previous_name.capitalize())
        previous_name = name
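For anyone on Python 3, the pairing logic can be sketched without the network fetch. This is only a rough sketch under some assumptions: `make_fake_names`, its `limit` argument and the sample lines are made up for illustration, and it assumes every useful line starts with a surname followed by some numeric columns, which is what the regex in the script implies.

```python
import re


def make_fake_names(lines, limit=None, first_seed="John"):
    """Pair each surname with the previous one, giving 'Last, First' strings.

    `lines` is any iterable of text lines whose first token is a name.
    `limit` (hypothetical convenience argument) caps how many names come back.
    """
    names = []
    previous = first_seed
    for line in lines:
        match = re.match(r"\s*(\w+)", line)
        if match is None:
            continue  # skip blank or malformed lines
        name = match.group(1)
        names.append("%s, %s" % (name.capitalize(), previous.capitalize()))
        previous = name
        if limit is not None and len(names) >= limit:
            break
    return names


# Made-up sample lines: a name followed by a few numeric columns.
sample = [
    "SMITH          1.006  1.006      1",
    "JOHNSON        0.810  1.816      2",
    "",
    "WILLIAMS       0.699  2.515      3",
]

for fake in make_fake_names(sample, limit=2500):
    print(fake)
```

Passing `limit=2500` is how I'd cap the list at the number of names I needed; feeding it the lines of the downloaded file instead of `sample` should behave the same way as the script above.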
[ Permanent link ]
I came across an interesting blog post discussing different programming fonts. My current one, Andale Mono, comes in at #9. I wasn't impressed enough by most of them to want to switch from Andale Mono, but then I saw Inconsolata at #1.
Dang. That font is pretty sweet.
I found these instructions useful for installing it. I used the ttf which I have just added to my customize git repo.
Edit: I am sticking with Andale Mono inside Emacs but really like Inconsolata as my fixed width browser font.
[ Permanent link ]
Diffing in Emacs can be surprisingly sweet. Here is my screen.
I am using diff-mode- inside Carbon Emacs along with the modifications shown below to use my colors.
10c10
< ;; Last-Updated: Sat Aug 1 15:15:22 2009 (-0700)
---
> ;; Last-Updated: Sat Dec 27 10:19:33 2008 (-0800)
12c12
< ;; Update #: 647
---
> ;; Update #: 646
15c15
< ;; Compatibility: GNU Emacs: 21.x, 22.x, 23.x
---
> ;; Compatibility: GNU Emacs 21.x, GNU Emacs 22.x
116c116
< '((t (:foreground "Blue" :background "DarkSeaGreen1")))
---
> '((t (:foreground "Blue" :background "Green")))
135c135
< '((t (:foreground "PaleGoldenrod" :background "DarkGreen")))
---
> '((t (:foreground "PaleGoldenrod" :background "Green")))
141c141
< '((t (:foreground "PaleGoldenrod" :background "DarkMagenta")))
---
> '((t (:foreground "PaleGoldenrod" :background "DarkRed")))
148c148
< '(diff-added ((t (:foreground "DarkGreen"))) 'now)
---
> '(diff-added ((t (:foreground "Green"))) 'now)
150,151c150,151
< '(diff-context ((t (:foreground "Black"))) 'now)
< '(diff-file-header ((t (:foreground "Red" :background "White"))) 'now)
---
> '(diff-context ((t (:foreground "grey50"))) 'now)
> '(diff-file-header ((t (:foreground "Red" :background "gray15"))) 'now)
153,154c153,154
< '(diff-header ((t (:foreground "Red"))) 'now)
< '(diff-hunk-header ((t (:foreground "White" :background "Salmon"))) 'now)
---
> '(diff-header ((t (:foreground "Red" :background "gray15"))) 'now)
> '(diff-hunk-header ((t (:foreground "White" :background "gray15"))) 'now)
157c157,159
< '(diff-removed ((t (:foreground "DarkMagenta"))) 'now)
---
> '(diff-removed ((t (:foreground "DarkRed"))) 'now)
> '(diff-indicator-added-face ((t (:foreground "Green"))) 'now)
> '(diff-indicator-removed-face ((t (:foreground "Red"))) 'now)
168c170
< ("^\\*\\*\\* .+ \\*\\*\\*\\*". diff-file1-hunk-header-face) ;context
---
> ("^\\*\\*\\* .+ \\*\\*\\*\\*". diff-file1-hunk-header-face) ;context
171c173
< ("^---$" . diff-hunk-header-face) ;normal
---
> ("^---$" . diff-hunk-header-face) ;normal
183c185,186
< ("^[^-=+*!<>#].*\n" (0 diff-context-face))))
---
> ("^[^-=+*!<>#].*\n" (0 diff-context-face))
> ))
[ Permanent link ]
I have released a new tool called Pyango View.
Its story is here, on the wikitrans blog.
[ Permanent link ]
The pyfacebook module by Samuel Cormier-Iijima is awesome. I've worked with multiple web APIs now and found that their logic is rarely too hard to understand, but if you want some error checking before you query the remote source you need to use real functions.
I have come across many implementations where you're given a tool that can communicate with a URL and pass some arguments, one of which is a function name. From there the URL representing the remote function is called and the arguments are sent as POST data. But it's up to the programmer to know all the functions available, because they're not listed anywhere in these API implementations. Amazon's current Python implementation of an API for handling Mechanical Turk is exactly one of these incomplete implementations. The programmer must also be sure they're not attempting to call functions that don't exist, and they won't find out until the server tells them so. You usually end up with code that looks like the snippet below when you're working with one of these abstract implementations.
# query server for hit count
query_mt('get_hit_count', server_key, secret_key)
The issue, simply, is that the programmer did not have time to do a proper implementation and instead gave a barebones framework. If they had time, they'd have done the right thing. Right?!
Pyfacebook solves this in a superb way. It uses an IDL to describe the interface and then dynamically generates objects encapsulating the behavior described in that language. Those objects are then extended with access to functions that handle the data transport and response processing. This technique is often found in RPC, which leverages its ability to open up system communication through an abstract and easily adapted interface. Basically, the systems agree that certain function names take certain types of arguments, and the rest is just communicating the values of the query.
If you check the pyfacebook code itself, you'll see the IDL start with the declaration of METHODS (line 116 at the time of this writing). We see a dictionary of dictionaries. The keys of the outermost dictionary represent namespaces, like photos-related functions or admin-related functions. The next layer of dictionaries, one per namespace, represents the functions offered by that namespace, and inside each of those is a list of tuples representing the argument list.
The tuples consist of the argument name, the type, and any flags describing the variable. It's common to see ('pid', int, []) or ('page_ids','list',['optional']). This entire list is iterated over in __generate_proxies(), where the transformation from IDL to actual functions takes place. This function reads the language definition and generates Python code from the list contents, calls eval() on the code and instantiates the objects from the generated code. Dynamic languages for the win!
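To make the IDL idea concrete, here is a minimal Python 3 sketch of the same shape of data turned into proxy classes. It is not pyfacebook's code: the `METHODS` entries are illustrative, `make_method`, `generate_proxies` and `fake_client` are names I made up, and it builds the functions with closures and `type()` rather than generating source text and evaluating it.

```python
# A tiny IDL in the shape described above:
# namespace -> method -> list of (name, type, flags) tuples.
# These entries are illustrative, not copied from pyfacebook.
METHODS = {
    "stream": {
        "addLike": [("uid", int, ["optional"]), ("post_id", int, ["optional"])],
        "remove": [("post_id", int, []), ("uid", int, ["optional"])],
    },
}


class Proxy(object):
    """Base class: calling the proxy forwards (method, args) to the client."""

    def __init__(self, client, namespace):
        self._client = client
        self._namespace = namespace

    def __call__(self, method, args):
        return self._client("%s.%s" % (self._namespace, method), args)


def make_method(method_name, spec):
    """Build one Python function from an IDL entry (closure, no eval)."""
    known = [name for name, _type, _flags in spec]
    required = [name for name, _type, flags in spec if "optional" not in flags]

    def method(self, **kwargs):
        unknown = set(kwargs) - set(known)
        if unknown:
            raise TypeError("unexpected arguments: %s" % sorted(unknown))
        missing = [name for name in required if name not in kwargs]
        if missing:
            raise TypeError("missing arguments: %s" % missing)
        # Only send arguments that were actually given a value.
        args = dict((k, v) for k, v in kwargs.items() if v is not None)
        return self(method_name, args)

    method.__name__ = method_name
    return method


def generate_proxies(idl):
    """Return {class name: proxy class}, one class per namespace."""
    proxies = {}
    for namespace, methods in idl.items():
        attrs = dict((name, make_method(name, spec))
                     for name, spec in methods.items())
        class_name = "%sProxy" % namespace.title()
        proxies[class_name] = type(class_name, (Proxy,), attrs)
    return proxies


def fake_client(method, args):
    """Stand-in for the real transport: echo what would be sent."""
    return (method, args)


StreamProxy = generate_proxies(METHODS)["StreamProxy"]
stream = StreamProxy(fake_client, "stream")
print(stream.remove(post_id=42))
```

The payoff is that a missing or misspelled argument fails locally with a TypeError before anything goes over the wire, which is the error checking the bare "pass a function name to a URL" style can't give you.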
The actual generated code looks something like what's below. Notice that every function essentially returns its own name and a dictionary of arguments that have been processed.
def addLike(self, uid=None, post_id=None):
    """
    Facebook API call. See http://developers.facebook.com/documentation.php?v=1.0&method=stream.addLike
    """
    args = {}
    if uid is not None: args['uid'] = uid
    if post_id is not None: args['post_id'] = post_id
    return self('addLike', args)

def removeLike(self, uid=None, post_id=None):
    """
    Facebook API call. See http://developers.facebook.com/documentation.php?v=1.0&method=stream.removeLike
    """
    args = {}
    if uid is not None: args['uid'] = uid
    if post_id is not None: args['post_id'] = post_id
    return self('removeLike', args)

def remove(self, post_id, uid=None):
    """
    Facebook API call. See http://developers.facebook.com/documentation.php?v=1.0&method=stream.remove
    """
    args = {}
    args['post_id'] = post_id
    if uid is not None: args['uid'] = uid
    return self('remove', args)

def addComment(self, post_id, comment, uid=None):
    """
    Facebook API call. See http://developers.facebook.com/documentation.php?v=1.0&method=stream.addComment
    """
    args = {}
    args['post_id'] = post_id
    args['comment'] = comment
    if uid is not None: args['uid'] = uid
    return self('addComment', args)
Python lets programmers implement a special method, __call__(), which gets called when the object itself is called as a function. If you look at the definition of the Proxy class you see an __init__() function and a __call__() function. For those who don't know Python, __init__() is the closest thing you have to a constructor. The proxy is instantiated such that it points to a _client object, and this object is an instance of the Facebook class. __call__() calls _client as a function with some arguments, one of which is the remote method we're calling. The Facebook class is defined further down the file, below the IDL and the proxy objects. It consists mainly of some communication- and session-oriented functions. When the Facebook class is called as a function (first the proxy, then the Facebook class) we see Facebook's __call__() turn the arguments into POST variables and query the API's URL for an answer. The Facebook URL itself is http://api.facebook.com/restserver.php and the method we're calling is one of the POST arguments ('method', specifically).
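The two-step call chain is easier to see in a stripped-down form. This is a hedged Python 3 sketch, not the pyfacebook source: `Client` is a hypothetical stand-in for the Facebook class, the `api_key` field is just an example of something a client might add, and nothing is actually sent over the network.

```python
class Client(object):
    """Hypothetical stand-in for the Facebook class."""

    def __init__(self, api_key):
        self.api_key = api_key
        self.sent = []  # records what would have been POSTed

    def __call__(self, method, args=None):
        # Turn the arguments into POST variables; the remote method name
        # travels as one of them.
        post_vars = dict(args or {})
        post_vars["method"] = method
        post_vars["api_key"] = self.api_key
        self.sent.append(post_vars)  # a real client would POST these to the API URL
        return post_vars


class Proxy(object):
    """Callable object that forwards to the client it points at."""

    def __init__(self, client, name):
        self._client = client
        self._name = name

    def __call__(self, method, args=None):
        # First hop: the proxy is called, and calls the client in turn.
        return self._client("%s.%s" % (self._name, method), args)


client = Client(api_key="example-key")
photos = Proxy(client, "photos")
result = photos("get", {"pid": 7})
print(result["method"])
```

Calling `photos(...)` runs Proxy.__call__(), which calls `client(...)`, which runs Client.__call__(). Neither object needs a conventionally named method like send() for that to work.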
The IDL functions default to basically being a list of arguments with certain types mapped to certain functions, but sometimes more code is needed than simply sending some data to a URL. Pyfacebook stores session data that comes back from some authentication-based calls, as is done in AuthProxy. Another example is how PhotosProxy has an upload function defined which handles encoding the binary data for transmission. The proxy objects are first generated dynamically according to the IDL's first layer of dictionary keys (e.g. 'photos'). Then the proxy object is created with the name PhotosProxy and added to the global namespace. If there are overrides you see a redeclaration of the object, and it inherits from (itself?!) an object of the same name. I really dig this logic as it shows how you can use an IDL to make assumptions about typical behavior and then override the atypical cases with ease.
The Facebook class itself is largely just an auth implementation which includes session handling for the user.
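The "inherits from itself" trick works because Python looks up the base class name before it binds the new class to that name. Here is a small Python 3 sketch of it under my own assumptions: the `_get` helper is made up, and the base64 encoding in `upload` is only a placeholder for "encode the binary data", not what pyfacebook actually does.

```python
import base64


class Proxy(object):
    def __init__(self, client, name):
        self._client = client
        self._name = name

    def __call__(self, method, args=None):
        return self._client("%s.%s" % (self._name, method), args or {})


def _get(self, pid):
    # The kind of plain pass-through function an IDL entry would produce.
    return self("get", {"pid": pid})


# Step 1: the dynamically generated class, as if built from the IDL.
PhotosProxy = type("PhotosProxy", (Proxy,), {"get": _get})


# Step 2: the override. The base list is evaluated first, so this subclasses
# the generated class and only then rebinds the name PhotosProxy.
class PhotosProxy(PhotosProxy):
    def upload(self, data):
        # Placeholder encoding step for binary data (an assumption).
        encoded = base64.b64encode(data).decode("ascii")
        return self("upload", {"data": encoded})


photos = PhotosProxy(lambda method, args: (method, args), "photos")
print(photos.get(7))
print(photos.upload(b"abc"))
```

After step 2 the name PhotosProxy refers to the subclass, which still has the generated `get` plus the hand-written `upload`.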
Neat stuff for sure!
[ Permanent link ]
My friend Malcolm brought Pyjamas to my attention today. Pyjamas is a Python-based implementation of Google Web Toolkit.
Here is the Pyjamas book. It doesn't seem complete though. Pyjamas itself is only at a 0.6 release too.
Seems interesting! I will blog about it if I end up using it for any projects.
[ Permanent link ]
I posted another Python module for a web API to GitHub tonight. This one handles talking to Google for language translations. It is called Goopytrans (GOOgle PYthon TRANSlator).
Goopytrans supports translating a single body of text.
>>> goopytrans.translate('bonjour', source='fr', target='en')
'hello'
Multiple bodies of text.
>>> goopytrans.translate_list(('bonjour','merci'), source='fr', target='en')
['hello', 'thank you']
And language detection.
>>> goopytrans.detect('bonjour')
{'isReliable': False, 'confidence': 0.12033016000000001, 'language': 'fr'}
[ Permanent link ]
I have begun work on a Python module for interfacing with the Wikipedia API. This is a 0.1 release and covers the gist of what I need for wikitrans. The module is available here.
Finding possible matches for an article name works like:
>>> import wikipydia
>>> articles_found = wikipydia.opensearch('Johns Hopkins University')
>>> for name in articles_found[1]:
...     print name
...
Johns Hopkins University
Johns Hopkins University Press
Johns Hopkins University School of Medicine
Johns Hopkins University Hospital
Johns Hopkins University Applied Physics Laboratory
Johns Hopkins University SAIS
Johns Hopkins University School of Education
Johns Hopkins University in Popular Culture
Johns Hopkins University Carey Business School
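For the curious, the request and response behind a search like this can be sketched in a few lines of Python 3. I'm assuming here that the lookup goes through MediaWiki's opensearch action; `build_opensearch_url` and `parse_opensearch` are made-up names and not wikipydia functions, and the sample body is hand-written rather than a real response.

```python
import json
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def build_opensearch_url(query, limit=10):
    """Build the GET URL for MediaWiki's opensearch action."""
    params = {
        "action": "opensearch",
        "search": query,
        "limit": limit,
        "format": "json",
    }
    return "%s?%s" % (API_URL, urlencode(params))


def parse_opensearch(body):
    """opensearch answers with a JSON array: [query, [titles], ...]."""
    data = json.loads(body)
    return data[1]


# Hand-written sample body in the shape opensearch returns.
sample_body = json.dumps([
    "Johns Hopkins University",
    ["Johns Hopkins University", "Johns Hopkins University Press"],
])

print(build_opensearch_url("Johns Hopkins University"))
for title in parse_opensearch(sample_body):
    print(title)
```

That second element of the array is why the example above loops over articles_found[1].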
Finding language alternatives for articles:
>>> lang_dict = wikipydia.query_language_links('Johns Hopkins University')
>>> for lang in lang_dict:
...     print '%s :: %s' % (lang, lang_dict[lang])
...
el :: Πανεπιστήμιο Τζονς Χόπκινς
eo :: Johns Hopkins Universitato
de :: Johns Hopkins University
fr :: Université Johns-Hopkins
da :: Johns Hopkins University
fa :: دانشگاه جانز هاپکینز
ar :: جامعة جونز هوبكينز
cs :: Johns Hopkins University
fi :: Johns Hopkinsin yliopisto
es :: Universidad Johns Hopkins
It's also possible to fetch the actual text in either rendered form or wikimarkup:
>>> wikimarkup_text = wikipydia.query_text_raw('Dennis')
>>> print wikimarkup_text
'''Dennis''' or '''Denis''' is a male [[given name|first name]] derived from
the [[Greco-Roman]] name [[Dionysius]] meaning "servant of
[[Dionysus]]", the [[Thracian]] god of [[wine]], which is ultimately derived
from the Greek Dios (Διός, "of [[Zeus]]") combined with [[Nysa|Nysos or
Nysa]] (Νυσα), where the young god was raised.
...
>>> rendered_text = wikipydia.query_text_rendered('Dennis')
>>> print rendered_text
<p><b>Dennis</b> or <b>Denis</b> is a male <a href="/wiki/Given_name"
title="Given name">first name</a> derived from the <a
href="/wiki/Greco-Roman" title="Greco-Roman" class="mw-redirect">
Greco-Roman</a> name <a href="/wiki/Dionysius" title="Dionysius">
Dionysius</a> meaning "servant of <a href="/wiki/Dionysus"
title="Dionysus">Dionysus</a>", the <a href="/wiki/Thracian" title="Thracian"
class="mw-redirect">Thracian</a> god of <a href="/wiki/Wine" title="Wine">
wine</a>, which is ultimately derived from the Greek Dios (Διός, "of <a
href="/wiki/Zeus" title="Zeus">Zeus</a>") combined with <a
href="/wiki/Nysa" title="Nysa">Nysos or Nysa</a> (Νυσα), where the young
god was raised.</p>
...
[ Permanent link ]
This is Python code for querying Google for a language translation:
#!/usr/bin/env python
import urllib
import simplejson
import nltk.data

api_url = "http://ajax.googleapis.com/ajax/services/language/translate"

def translate(text, src='', to='fr'):
    # An empty src leaves the source language unspecified in langpair.
    params = {'langpair': '%s|%s' % (src, to),
              'v': '1.0'}
    target_text = ''
    # Split the text into sentences and send one sentence per request.
    sent_detector = nltk.data.load('tokenizers/punkt/english.pickle')
    for sentence in sent_detector.tokenize(text.strip()):
        params['q'] = sentence
        resp = simplejson.load(urllib.urlopen(api_url,
                                              data=urllib.urlencode(params)))
        target_text += resp['responseData']['translatedText'] + " "
    return target_text

if __name__ == '__main__':
    text = """
Hello Chris. Hello Delip. I am still in my pajamas this morning and
haven't had ANY coffee yet. But I should be heading out the door with
my lady in a few minutes to find breakfast (and coffee). I can't decide
if I want eggs benedict or a toasted bagel with cream cheese and
salmon. Such is life!
"""
    print "EN :: %s" % (text)
    translated_text = translate(text)
    print "FR :: %s" % (translated_text)
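The split, translate, join flow in that script can be tried out without any network access. This is a rough Python 3 sketch under stated assumptions: it swaps the nltk sentence tokenizer for a naive regex splitter, and `translate_sentence` is a hypothetical callable standing in for the HTTP request, so `fake_translate` only tags each sentence.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def translate_text(text, translate_sentence):
    """Translate sentence by sentence and join the results with spaces."""
    return " ".join(translate_sentence(s) for s in split_sentences(text))


def fake_translate(sentence):
    # Stand-in for the remote call: just mark the sentence.
    return "[fr] " + sentence


print(translate_text("Hello Chris. Hello Delip. Such is life!", fake_translate))
```

Keeping the per-sentence call behind a plain function argument makes it easy to swap in whatever translation backend is available, or a stub like this one for testing.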
[ Permanent link ]