Friday, February 23, 2007

Possible mini project: WebExport

Here's a mini project I'm thinking of. Fruits of another one of those "wouldn't it be nice if I could do this?" moments.

The scenario:
I have a bunch of HTML files on a machine within a LAN which I would like to temporarily expose to the world. Instead of going through the hassle of obtaining webspace on an external web server, copying all the files over, and later removing them again, I would like to be able to do something like:


[lsc@lan ~]$ webexport -d /path/to/html/dir
Reading config file ~/.webexport ...
Exporting view to webexport server ...

View available at http://external.web.domain/webexport/93123
Access password: caffeinOverdose

(Enter Ctrl-d to end session)


The config file would define where to access the WebExport server, as well as the user's authentication details (username/password? private key? no authentication?).
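
As a rough sketch of what reading such a config could look like, assuming an INI-style file: the section names, keys and the load_config helper below are all hypothetical, since no format has been decided yet.

```python
import configparser

# Hypothetical contents of ~/.webexport; the layout is an assumption.
SAMPLE = """
[server]
url = http://external.web.domain/webexport

[auth]
method = password
username = lsc
"""

def load_config(text):
    """Parse INI-style config text into a plain dict of settings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        # Where the WebExport server lives (required).
        "server_url": parser.get("server", "url"),
        # Authentication is optional; default to none if the section is absent.
        "auth_method": parser.get("auth", "method", fallback="none"),
        "username": parser.get("auth", "username", fallback=None),
    }
```

Calling load_config(SAMPLE) gives a dict with the server URL and the chosen authentication method; a file with no [auth] section falls back to no authentication.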

The access password would be an application-generated word/phrase which external viewers would have to enter in order to view the exported files. A --password [pwd] option to define my own password, or --no-password to disable authentication, would be nice.
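
A minimal sketch of how the password could be chosen, assuming a small built-in word list. The WORDS list and both function names are made up for illustration; only the --password and --no-password options come from the description above.

```python
import secrets

# Hypothetical word list; a real tool would likely load a much larger one.
WORDS = ["caffeine", "overdose", "mango", "tundra", "pixel", "walnut", "harbor", "quartz"]

def generate_password(num_words=2):
    """Return a camelCase phrase built from randomly chosen words."""
    chosen = [secrets.choice(WORDS) for _ in range(num_words)]
    return chosen[0] + "".join(w.capitalize() for w in chosen[1:])

def resolve_password(password=None, no_password=False):
    """Mirror the proposed --password / --no-password options."""
    if no_password:
        return None        # authentication disabled
    if password is not None:
        return password    # user-supplied password wins
    return generate_password()
```

With neither option given, resolve_password() returns a generated phrase; resolve_password(no_password=True) returns None, which the caller could treat as "no authentication".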

An additional feature would be the ability to export pages hosted on an internal webserver. So, if I want pages hosted at http://10.0.0.88/project1/prototype to be exported:

[lsc@lan ~]$ webexport -s http://10.0.0.88/project1/prototype
Reading config file ~/.webexport ...
Exporting view to webexport server ...

View available at http://external.web.domain/webexport/55699

(Enter Ctrl-d to end session)


Proper path translations would have to be done so that:
http://10.0.0.88/project1/prototype/admin/a.html
would be accessible via:
http://external.web.domain/webexport/55699/admin/a.html
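
The URL mapping above can be sketched as a simple prefix swap. The make_translator name is hypothetical, and this only covers request URLs; absolute links inside the HTML itself would presumably need rewriting too, which is not attempted here.

```python
def make_translator(internal_base, external_base):
    """Return a function mapping URLs under internal_base to external_base."""
    internal_base = internal_base.rstrip("/")
    external_base = external_base.rstrip("/")

    def translate(url):
        if url == internal_base:
            return external_base
        # Require a "/" after the prefix so that e.g. /prototype2 is not matched.
        if url.startswith(internal_base + "/"):
            return external_base + url[len(internal_base):]
        raise ValueError("URL is outside the exported tree: " + url)

    return translate

translate = make_translator(
    "http://10.0.0.88/project1/prototype",
    "http://external.web.domain/webexport/55699",
)
```

Here translate("http://10.0.0.88/project1/prototype/admin/a.html") yields the external URL shown above, while anything outside the exported tree raises ValueError rather than leaking other internal pages.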

Why do I even need this?
There are several analysis tools we provide which generate a whole bunch of HTML files as output. Users wanting to use these tools would have to do so via ssh into an internal server (we don't distribute the tools due to licensing limitations).

Typically, users would have to scp/sftp the result files back to their local machine for viewing. Not a very friendly solution, especially if repeated analysis is required after the results have been reviewed.

Currently, I've included a wrapper around the tools, which attempts to X-forward a browser instance onto the user's machine, or falls back to displaying the results in links if X-forwarding is not possible. It works... sort of... but it's just too much of a hack. Hence the idea of WebExport.

Possible approach:
I've only just recently picked up Python, so if I were to give WebExport (WE) a go, I'd probably pythonify it. The best way to learn French is to live in France. Python Land, here I come!

A quick googling reveals that there are several modules and frameworks that might come in handy:
* Python's built-in BaseHTTPServer module
* CherryPy
* Medusa

Yes, I'm thinking of a client-server setup. The WE-server could act as a dynamic web proxy supporting many WE-clients, and the WE-clients could be lightweight HTTP servers which register themselves with the WE-server.
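
A minimal sketch of the WE-client half, assuming Python 3.7 or later, where BaseHTTPServer has become part of http.server. The start_client_server name is hypothetical, and registering with the WE-server is left out because that protocol is still undefined.

```python
import functools
import threading
from http.server import HTTPServer, SimpleHTTPRequestHandler

def start_client_server(directory, host="127.0.0.1", port=0):
    """Serve `directory` over HTTP in a background thread.

    port=0 asks the OS for any free port; the chosen port can be read
    back from server.server_address[1].
    """
    # Bind the handler to the exported directory (parameter added in Python 3.7).
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    server = HTTPServer((host, port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

Something like srv = start_client_server("/path/to/html/dir") would start serving, and srv.shutdown() would end the session. Binding to 127.0.0.1 by default keeps the files off the network until a WE-server connection is deliberately set up.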

Alternatively, the WE-server could be a webserver which queries the WE-clients connected to it for data. Some form of caching could be incorporated to improve performance.

In the first incarnation, the WE-server, whatever the implementation, should probably not be connected directly to the WWW, but rather served via Apache using ProxyPass. That ought to minimise security concerns, and let it sit on the current server setup without much modification.

Some useful links here.


Hmmm...
Is there already something out there that does this? Or is there a better approach???

2 comments:

Anonymous said...

Have a look at twisted, especially the mktap feature.

http://twistedmatrix.com/projects/web/documentation/howto/using-twistedweb.html

You could easily serve content from an arbitrary directory by running the command

> mktap web --path /dir/to/share

Hope this helps

shawnchin said...

looks good. Digesting the howtos now...