Wednesday, April 18, 2007

Development of the C Language

The Development of the C Language, by Dennis M. Ritchie

From the abstract:

"The C programming language was devised in the early 1970s as a system implementation language for the nascent Unix operating system. Derived from the typeless language BCPL, it evolved a type structure; created on a tiny machine as a tool to improve a meager programming environment, it has become one of the dominant languages of today. This paper studies its evolution."

Wednesday, March 28, 2007

Exporting selected mails from Evolution

Had an impromptu need to export different selections of emails from Evolution onto the web.

A brief (read: impatient) googling did not reveal any pre-existing solutions, so a quick hack was in order.

Digging around the ~/.evolution directory, I noted that local Evolution folders are stored in ~/.evolution/mail/local/<folder_name> as flat files which look suspiciously like the mbox format. That looks like a good start.
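
A quick way to test the mbox hunch is to point Python's standard mailbox module at a folder file and see whether it parses into messages. The sketch below is only a sanity check under that assumption; the function name summarize_mbox is made up for illustration, and the hypermail_temp path simply follows the folder layout described above.

```python
import mailbox
import os


def summarize_mbox(path):
    """Return (message_count, subjects) for the mbox file at `path`."""
    # create=False: raise instead of silently creating an empty mailbox
    # when the path is wrong.
    box = mailbox.mbox(path, create=False)
    try:
        subjects = [msg.get("Subject", "(no subject)") for msg in box]
    finally:
        box.close()
    return len(subjects), subjects


if __name__ == "__main__":
    # Assumed location, based on the folder layout described in the post.
    folder = os.path.expanduser("~/.evolution/mail/local/hypermail_temp")
    if os.path.exists(folder):
        count, subjects = summarize_mbox(folder)
        print("%d messages" % count)
        for subject in subjects:
            print(" -", subject)
    else:
        print("No such folder file:", folder)
```

If the file parses and the subjects look familiar, it is very likely plain mbox and can be fed to any mbox-aware tool.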

Searching for "mbox to HTML", I was led to hypermail.

Hypermail is a program that takes a file of mail messages in UNIX mailbox format and generates a set of cross-referenced HTML documents. Each file that is created represents a separate message in the mail archive and contains links to other articles, so that the entire archive can be browsed in a number of ways by following links. Archives generated by Hypermail can be incrementally updated, and Hypermail is set by default to only update archives when changes are detected.

A quick solution was now obvious:

1. Install hypermail
2. Copy selected emails to a new[1] local folder, say "hypermail_temp"
3. Run hypermail
cat ~/.evolution/mail/local/hypermail_temp | hypermail -i -l "Title of Export" -d output_dir

The output produced by hypermail was exactly what I needed: sortable by thread, author, date, subject, and attachment (Example output).

[1] Note: mails deleted from within Evolution (which are sent to trash) will still remain in the flat file and would thus also appear in the output of hypermail. Therefore, if running hypermail on an existing folder, you might want to Expunge (ctrl-e) the folder beforehand to ensure deleted mails do not appear.


Ideally, cooking up a tool based on the Eplugin system would have been really neat. Alas, investing too much time on a one-off requirement is just not practical. Maybe next time.

Wednesday, February 28, 2007

WebExport: It's Alive!!

The ResourceSubscription and ResourcePublisher classes from twisted.web.distrib made life a lot easier.

WebExport clients publish their directories via ResourcePublisher, and connect to the server to announce their existence. Upon successful registration, the server attaches a ResourceSubscription to the web root on a random path, and informs the client of where the web view is published.


[lsc@home ~]$ webexport ./web/files
View exported to http://external.domain:8888/17785

When the client disconnects, the ResourceSubscription is destroyed and the path removed from the web root.
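
The register/unregister bookkeeping described above can be sketched without any Twisted at all. This is not the actual WebExport code; the ExportRegistry class, its method names, and the five-digit path range are all assumptions made for illustration.

```python
import random


class ExportRegistry:
    """Maps random URL path segments to connected clients (illustrative only)."""

    def __init__(self, low=10000, high=99999):
        self._low = low
        self._high = high
        self._views = {}  # path segment -> client identifier

    def register(self, client_id):
        """Pick an unused random path for `client_id` and return it."""
        if len(self._views) > self._high - self._low:
            raise RuntimeError("no free paths left")
        while True:
            path = str(random.randint(self._low, self._high))
            if path not in self._views:
                self._views[path] = client_id
                return path

    def unregister(self, client_id):
        """Drop every path owned by `client_id`, as on disconnect."""
        for path in [p for p, c in self._views.items() if c == client_id]:
            del self._views[path]

    def lookup(self, path):
        """Return the client serving `path`, or None if it is not published."""
        return self._views.get(path)


if __name__ == "__main__":
    registry = ExportRegistry()
    path = registry.register("lsc@home")
    print("View exported to http://external.domain:8888/%s" % path)
    registry.unregister("lsc@home")  # client went away, path disappears
```

In the real server the dictionary value would be the resource object attached to the web root rather than a plain string.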

The WebExport server is started by root, gets daemonised, and sheds its privileges to nobody:nobody. It listens on two ports, one for web requests and the other for webexport clients.

Not bad for half a day's work :)
Lines of code: ~120 :p

More info to come. Time to rush for the last bus home!!!

Tuesday, February 27, 2007

WebExport: Using Twisted

Jay directed my attention to Twisted.web.

Twisted is a networking engine written in Python, supporting numerous protocols. It contains a web server, numerous chat clients, chat servers, mail servers, and more. Twisted is made up of a number of sub-projects which can be accessed individually through the twisted projects index.

Twisted Web is a web application server written in pure Python, with APIs at multiple levels of abstraction to facilitate different kinds of web programming. ... [It] provides a simple, stable resource publishing API, on top of an HTTP/1.0 server implementation with some HTTP/1.1 features.

Publishing an arbitrary web directory was a breeze using the preconfigured web server. I just had to create a web.tap file, and start the server:
[lsc@home ~]$ mktap web --path /path/to/web/dir
[lsc@home ~]$ twistd -f web.tap



While this does not yet meet the requirements of my intended project, Twisted has caught my attention.

Glancing through the HOWTOs, I was thrilled by the rich set of APIs that made network application programming a breeze (really?). Apart from the core which provides utilities for asynchronous network programming, authentication, DB interfaces, unit testing, etc, there's a host of sub projects which provide convenient protocol implementations for Web, SSH, Mail, IRC/IM, DNS, and such.

# Within a few minutes, I had my first 'server' running
from twisted.internet import protocol, reactor
from twisted.protocols import basic
from sys import stdout

class ParrotProtocol(basic.LineReceiver):
    # LineReceiver deals in bytes, hence the b"..." literals
    def connectionMade(self):
        stdout.write("Polly found a friend\r\n")
        self.sendLine(b"Polly wants a cracker")

    def lineReceived(self, line):
        if line == b"cracker":
            self.sendLine(b"Thank you!")
            # Say goodbye here: nothing can be sent once the connection is lost
            self.sendLine(b"Bye bye!")
            self.transport.loseConnection()
        else:
            self.sendLine(b"Aark!! " + line)

    def connectionLost(self, reason):
        stdout.write("Polly lost a friend\r\n")
#/end class ParrotProtocol

f = protocol.Factory()
f.protocol = ParrotProtocol
reactor.listenTCP(8888, f)
reactor.run()

Nice. This thingamagic seems like something worth investing some time in.

I've earmarked several packages/modules which might come in handy:
* twisted.python.usage
* twisted.cred
* twisted.spread
* twisted.web


[update]
Twisted by name, twisted by nature.

The naming conventions used in some of the packages are a little OTT.
Eg. In the spreadable (distributed) computing package -- twisted.spread
banana... jelly... flavors... InsecureJelly... Unjellyable...
Amusing? For a while. Meaningful? Hmmmm...

Documentation is very extensive, which is great. However, there were sections of the tutorials which took quantum leaps rather than steps. Perhaps it's just due to my limited experience with Python and proper program design, but I had trouble progressing from step 3 to step 4 without first groping in the dark (Enlightenment was found in the twisted.python.components tutorial).

Stumbled upon similar speedbumps in later steps, where things magically appeared, warranting further scouring for the associated documentation.

Patience...
[/update]

[update2]
Useful article
[/update2]

Friday, February 23, 2007

Possible mini project: WebExport

Here's a mini project I'm thinking of. Fruits of another one of those "wouldn't it be nice if I could do this?" moments.

The scenario:
I have a bunch of HTML files on a machine within a LAN which I would like to temporarily expose to the world. Instead of having to go through the hassle of obtaining webspace on an external web server, copying all the files over, and later removing them again, I would like to be able to do something like:


[lsc@lan ~]$ webexport -d /path/to/html/dir
Reading config file ~/.webexport ...
Exporting view to webexport server ...

View available at http://external.web.domain/webexport/93123
Access password: caffeinOverdose

(Enter Ctrl-d to end session)


The config file would define where to access the WebExport server, as well as the user's authentication details (username/password? private key? no authentication?).

Access password would be an application generated word/phrase which external viewers would have to enter in order to view the exported files. A --password [pwd] option to define my own password, or --no-password to disable authentication would be nice.
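
A rough sketch of how those options might fit together, using argparse and the secrets module from the standard library. Nothing here is settled design: the WORDS list, the function names, and the two-word phrase format are placeholders invented for this example.

```python
import argparse
import secrets

# Hypothetical word list; a real tool would likely load a much larger one.
WORDS = ["caffeine", "overdose", "parrot", "cracker", "twisted", "banana"]


def build_parser():
    """Command-line options as described in the post."""
    parser = argparse.ArgumentParser(prog="webexport")
    parser.add_argument("-d", dest="directory", help="directory to export")
    # --password and --no-password contradict each other, so only one is allowed.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--password", help="use this access password")
    group.add_argument("--no-password", action="store_true",
                       help="disable the access password")
    return parser


def access_password(args, words=WORDS):
    """Return the password to require, or None if authentication is off."""
    if args.no_password:
        return None
    if args.password is not None:
        return args.password
    # No explicit choice: generate a two-word phrase.
    return secrets.choice(words) + secrets.choice(words).capitalize()


if __name__ == "__main__":
    args = build_parser().parse_args()
    password = access_password(args)
    if password is None:
        print("Access password: (disabled)")
    else:
        print("Access password: %s" % password)
```

With a list this short the generated phrase is easy to guess, so a real implementation would want far more words (or random characters) per phrase.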

An additional feature would be to be able to export pages hosted on an internal webserver. So, if I want pages hosted in http://10.0.0.88/project1/prototype to be exported:

[lsc@lan ~]$ webexport -s http://10.0.0.88/project1/prototype
Reading config file ~/.webexport ...
Exporting view to webexport server ...

View available at http://external.web.domain/webexport/55699

(Enter Ctrl-d to end session)


Proper path translations would have to be done so that:
http://10.0.0.88/project1/prototype/admin/a.html
would be accessible via:
http://external.web.domain/webexport/55699/admin/a.html
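
The URL mapping itself is simple prefix swapping. Here is a minimal sketch with urllib.parse; translate_url and its parameter names are invented for this example, and it only covers request URLs (links embedded in the served HTML would need separate treatment).

```python
from urllib.parse import urlsplit


def translate_url(internal_url, internal_base, external_base):
    """Map a URL under `internal_base` to the same path under `external_base`."""
    src = urlsplit(internal_url)
    base = urlsplit(internal_base)
    if (src.scheme, src.netloc) != (base.scheme, base.netloc):
        raise ValueError("URL is not on the exported server")
    base_path = base.path.rstrip("/")
    # Require an exact match or a match on a path boundary, so that
    # /project1/prototype2 is not mistaken for /project1/prototype.
    if src.path != base_path and not src.path.startswith(base_path + "/"):
        raise ValueError("URL is outside the exported path")
    result = external_base.rstrip("/") + src.path[len(base_path):]
    if src.query:
        result += "?" + src.query
    return result


if __name__ == "__main__":
    print(translate_url(
        "http://10.0.0.88/project1/prototype/admin/a.html",
        "http://10.0.0.88/project1/prototype",
        "http://external.web.domain/webexport/55699",
    ))
```

The reverse direction (external request to internal URL) is the same function with the two base arguments swapped.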

Why do I even need this?
There are several analysis tools we provide which generate a whole bunch of HTML files as output. Users wanting to use these tools would have to do so via ssh into an internal server (we don't distribute the tools due to licensing limitations).

Typically, users would have to scp/sftp the result files back to their local machine for viewing. Not a very friendly solution, especially if repeated analysis is required after the results have been reviewed.

Currently, I've included a wrapper around the tools, which attempts to X-forward a browser instance onto the user machine, or reverts to displaying it in links if X-forwarding is not possible. It works... sort of... but it's just too much of a hack. Hence the idea of WebExport.

Possible approach:
I've only just recently picked up Python, so if I were to give WebExport (WE) a go, I'll probably pythonify it. Best way to learn French is to live in France. Python Land, here I come!

A quick googling reveals that there are several modules and frameworks that might come in handy:
* Python's built-in BaseHTTPServer module
* CherryPy
* Medusa

Yes, I'm thinking of a client-server setup. The WE-server could act as a dynamic web proxy supporting many WE-clients, and WE-clients could be light-weight HTTP servers which register themselves with the WE-server.

Alternatively, the WE-server could be a webserver which queries for data from the WE-clients connected to it. Some form of caching could be incorporated to improve performance.

In the first incarnation, the WE-server, whatever the implementation, should probably not be connected directly to the WWW, but rather served via Apache using ProxyPass. That ought to minimise security concerns, and sit on the current server setup without much modification.

Some useful links here.


Hmmm...
Is there already something out there that does this? Or is there a better approach???

Wednesday, February 21, 2007

Triggering function call on MPI process termination

Was flipping through the MPI-2 Standards when I stumbled upon this:

"There are times in which it would be convenient to have actions happen when an MPI process finishes. For example, a routine may do initializations that are useful until the MPI job (or that part of the job that being terminated in the case of dynamically created processes) is finished. This can be accomplished in MPI-2 by attaching an attribute to MPI_COMM_SELF with a callback function."


Looked like good fun. Here's a quick bit of code to see it in action.

#include <stdio.h>
#include <mpi.h>

/* callback function triggered when Communicator is freed */
int comm_self_delete_fn(MPI_Comm comm, int keyval,
                        void *attr_val, void *extra_state) {

    printf("[%d] In callback function\n", *((int *)attr_val));
    return MPI_SUCCESS;
}

int main(int argc, char **argv) {

    int onquit;
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Create attribute with callback function */
    MPI_Comm_create_keyval(MPI_COMM_NULL_COPY_FN,
                           comm_self_delete_fn, &onquit, (void *)0);

    /* Assign attribute to MPI_COMM_SELF with rank as value */
    MPI_Comm_set_attr(MPI_COMM_SELF, onquit, &rank);
    MPI_Comm_free_keyval(&onquit);

    printf("[%d] Before finalize\n", rank);
    MPI_Finalize();
    printf("[%d] After finalize\n", rank);

    return 0;
}


Notes:

  • MPI_COMM_SELF is used (instead of, say, MPI_COMM_WORLD) because it is the first object freed during MPI_Finalize(), before any other parts of MPI are affected.

  • MPI-2 introduces attribute caching functions for Windows and Datatypes (previously only available for Communicators): MPI_{Comm,Win,Type}_create_keyval and MPI_{Comm,Win,Type}_set_attr.

  • The COPY callback function is used when a Communicator (or window/datatype) is duplicated.