Opportunities of benign-neglect

wednesday, february 24th, 2010 7:37am

Cathy Marshall of Microsoft Research gave a keynote at the wonderful code4lib 2010 conference that provided a useful nudge to my thinking about repository layers.

I've suggested elsewhere that university libraries contemplating a repository should consider developing policies around repository 'layers'. This notion involves both an inner long-term, high-guarantee archival layer -- and an outer services-oriented work-space layer. Reasons for the archival layer are obvious. Perhaps less so are reasons for and benefits of the work-space layer: it fulfills a library mission to further scholarly work; it strengthens the library's position as a central part of the academic campus community; it creates opportunities for valuable work to be moved easily into the archival layer.

Though my Library doesn't conceptualize our repository in this way, it's compelling enough that I think about this layered approach regularly. Given some exciting video initiatives at my University, much of my recent work-space layer thinking has focused on how to avoid the possibility of having precious library disk space overwhelmed with hypothetical services-layer low(er) quality materials. Strategies I've considered to deal with this concern are combinations of limiting the size of an entity's (person/department) work-space, and/or limiting the number of years items may remain in the work-space. Given my strong belief in providing useful and friendly user-services, in this 'limiting' scenario, we would provide terrific charts and notifications which would allow work-space users to easily monitor their usage of this temporal, useful space -- and provide tools and Library staff assistance to easily move appropriate items into the archival layer.

But regardless of the intention to have this work-space be used productively, the more control we give users over their Library work-space, the more likely it is that a significant portion of this work-space would fill up with materials that exist simply because it's more of a hassle to delete things than it is to neglect them -- one of Marshall's key points.

While Marshall specifically noted the problems of benign-neglect as a user-strategy for handling materials, she also noted that benign neglect offers opportunities. This was the nudge. I'm finding this notion of opportunities fascinating to reflect upon; it offers new realms for thinking about interesting services that could be built for this work-space layer.

The simple accretion of data from benign neglect suggests the now-common mining strategy associated with usage-data, popularized by amazon: "you may also be interested in this". An acquaintance recently told me about 'mallet', software that can mine texts to discern topics. It would be a worthy experiment to use such a tool to offer repository users an optional discovery service based on their text-based work-space materials.
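To make the "you may also be interested in this" idea a bit more concrete, here is a minimal Python sketch of co-occurrence mining over work-space contents. It is only an illustration under assumptions: the function names and the shape of the `workspaces` dictionary (user id mapped to a list of item ids) are hypothetical, and it has nothing to do with how mallet or amazon actually work.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(workspaces):
    """Count how often each pair of items sits in the same work-space.

    `workspaces` is a hypothetical structure: {user_id: [item_id, ...]}.
    """
    pair_counts = Counter()
    for items in workspaces.values():
        # sorted(set(...)) so each pair is counted once per work-space,
        # always in the same (a, b) order
        for a, b in combinations(sorted(set(items)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def also_interested_in(item, pair_counts, limit=3):
    """Return up to `limit` items most often held alongside `item`."""
    related = Counter()
    for (a, b), count in pair_counts.items():
        if a == item:
            related[b] += count
        elif b == item:
            related[a] += count
    return [other for other, _ in related.most_common(limit)]
```

Because the suggestions come only from what already sits in people's work-spaces, neglected material still contributes to the counts, which is the opportunity in question.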

Two additions to Apple's iPhoto application in the last year or so suggest other possibilities. 'Faces' scans a user's iPhoto library, using pattern-recognition routines to create groupings of people. 'Places' scans the library and extracts geo-location coordinates if available, and, if I recall correctly, timestamp data, to create a map view over time of photo-locations.

Other scans could be run on work-space data, looking for patterns of government data-sets or citations. And combinations of embedded metadata such as geo-location and mime-type and date could be gathered, so that if, for example, a pattern of images taken at a certain location on a certain date was detected, not only could auto-grouping of those items be presented, but external sources such as flickr could be queried as well, offering the user the ability to see other external views of this 'event'.
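As a rough sketch of the auto-grouping idea, the Python below groups items that share roughly the same location on the same calendar day. The item dictionaries, their field names (`name`, `lat`, `lon`, `taken`), and the rounding precision are all assumptions made for illustration; extracting that metadata from real files is a separate job not shown here.

```python
from collections import defaultdict
from datetime import datetime


def group_by_place_and_day(items, precision=2):
    """Group items whose coordinates round to the same cell on the same day.

    Each item is a hypothetical dict with 'name', 'lat', 'lon', and 'taken'
    (a datetime). Items missing any of the metadata are skipped.
    """
    groups = defaultdict(list)
    for item in items:
        if item.get("lat") is None or item.get("lon") is None or item.get("taken") is None:
            continue
        key = (
            round(item["lat"], precision),
            round(item["lon"], precision),
            item["taken"].date(),
        )
        groups[key].append(item["name"])
    # A single item is not a pattern; keep only groups of two or more
    return {key: names for key, names in groups.items() if len(names) > 1}
```

Each resulting key (place cell plus date) could then be handed to an external search, such as a flickr query, to look for other views of the same 'event'.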

Many of these scan/mining ideas would also be useful to apply to the repository as a whole. Such scans could offer both automated randomized general-discovery displays, as well as offer researchers additional focused discovery-views to permitted items. But to the extent that such services enhance the quality of users' work-space experience, it might help to keep the materials in the work-space more relevant: using benign-neglect to minimize benign-neglect.

Birth tips from a 50 year-old guy

sunday, january 24th, 2010 8:23pm

That title doesn't quite sound right, does it.

As a father of two, I've had occasion, over the years, to offer my 'three tips' to first-time moms-to-be, and have received positive feedback. Recently I've been thinking about birth due to recent family and friend births, and figured I'd write them up...

Childbirth classes

Think of tips & techniques as tools, not rules.

I highly recommend childbirth classes; I really cannot say enough positive things about them -- even refresher ones. However, my advice for moms (and partners): view the tips and techniques taught as 'possible tools', not 'rules to live up to'. Specific moms may or may not find specific tips and techniques useful. I note this because I've occasionally talked to moms who have been disappointed with their 'performance' during birth -- often because they had an expectation from a class of how the birth 'should have' progressed. One mom told me she was embarrassed at how loud she was, thinking that if she had performed the class's breathing and relaxation techniques 'better', she wouldn't have needed to yell. My thought: if yelling or grunting works for a particular woman -- that is fine, whether or not it is in the standard technique playbook.

Visitors

Pick a close friend or family member to not visit until after four weeks have elapsed.

This is mostly for the first kid. It comes from a midwife, and was dramatically confirmed by our experience with our firstborn. She told us that after the child is born, friends and family will, understandably, want to come and visit and help out. By the time a month has elapsed, everyone has come and gone, and that is when the effects of sleeplessness can become more pronounced, and help is most appreciated/needed. We planned to have one of our good friends fly in around five weeks after the birth of our first child. Our baby was colicky until about six weeks, and our friend's simple willingness to do laundry, to organize pizza deliveries, and to simply watch the baby while we both took a short walk was deeply, deeply appreciated.

Breastfeeding

It's not innate! Learn, and ask for advice.

I suspect that in times when multiple generations of families lived together, or for those women who have had a baby after a bunch of their women friends have, this fact would not be the surprise that it was to us. However, we were among the first of our set of friends to have children, and so hadn't had the experience of extensive conversations.

I had naively assumed that breastfeeding would be some sort of natural, somewhat instinctive process, but our midwife encouraged us to go to a breastfeeding class. The instructor basically noted that given that our society is no longer made up of multi-generational families living together, many are not aware that the process can be difficult for mom and baby to get used to. In particular, she noted that because first-time moms are understandably concerned about the baby, it's easy for an unconstructive cycle to quickly develop: mom is worried that baby isn't getting enough milk -> feedings thus become more stressful -> feedings thus become more difficult -> mom worries more. The instructor showed the (clothed) moms helpful holding techniques, gave information about dealing with sore nipples (and noted how commonly that condition occurs among first-time moms especially), and gave a hotline number for information and support and even a home-visit for a bit of coaching. Breastfeeding mostly went smoothly for mom & baby, but during a few difficult periods, the information and the normalization of the problem from that class was invaluable. (I trust that it's not even necessary here to spend any time noting why breastfeeding is a Good Thing.)

By the time we had our second baby a few years later, I noticed, in our brief hospital stay, much more information being offered about breastfeeding tips and techniques. I hope this trend has continued. Congrats to all moms-to-be and dads-to-be out there!

Fedora / Shibboleth authorization solution

saturday, january 16th, 2010 8:06am

I don't work directly on Fedora (the repository software), but am very familiar with it due to my work with a programmer who does, and because I've worked on a django front-end for ingestion of items into fedora, as well as fedora-apis. My role in fedora work is more akin to the 'corner-man' in boxing. Together the boxer and I strategize about the opponent, his defenses and threats to our plans, and devise approaches to deal with evolving challenges. We cross ourselves, the fedora programmer goes into the ring, and between rounds I provide moral support, bandage wounds, and, because of my distance from actual battle, sometimes have useful ideas for the next round. This analogy's negativity toward the software is appropriate; to use another: We've bought a car that, in hindsight, I wouldn't recommend to others, but that we're committed to getting some terrific mileage out of.

So, it's been a tough fight, but our boxer is quick, has impressive endurance, and we believe we'll come out on top.

Fedora authorization is one round in which we think we've scored well.

Fedora comes bundled with an authorization piece called XACML. I don't know if it's due to xacml, or fedora's implementation of it, but from what I gather indirectly, it's terrifying enough that few use it, and it is, in fact, scheduled to be augmented in a future release with a new Great Hope: FeSL.

But if you want to go into production now, what to do? The dearth of published authentication/authorization 'live' solutions is why, as I understand it, so many fedora installations are either completely open (all objects public), or completely locked down for internal use.

We've assumed we would use some sort of wrapper around fedora, to authenticate against Shibboleth, with which our university is slowly moving forward. Shib's lack of logout capability, and the resultant assumption that users will happily quit their browser to logout, would seem quaintly amusing if it weren't true -- but that is another topic entirely, and single sign-on is certainly convenient. Not long ago we began to tackle how, specifically, to implement shib/fedora authorization.

Recently someone described to me an authorization approach the muradora folk took. I haven't looked at any documentation myself, but I was told that they wrote a servlet filter that takes a submitted name and password, and passes it to a non-centralized custom ldap server that exists only for the purpose of allowing fedora's built-in ldap-xacml code to handle authentication. (For those unfamiliar: a java servlet filter acts as a front layer of a java webapp; incoming requests and outgoing responses must pass through it, and it can modify them.)

A few of us heard this and had divergent reactions. It sounded like a hack, which caused some to dismiss the approach. Personally, being quite partial to hacks that work around monolithic software obstacles, I thought the hack smacked of ingenious creativity and was worth further examination. I was indulged; the result: our corner has devised an approach that initial testing indicates will work well.

First, some necessary background info...

  • Our University shib implementation is integrated with Grouper. I think grouper is, or at least historically has been, a separate project from shibboleth, but they work together brilliantly. Upon shib-login, a list of the groups to which the user belongs is accessible to the server via the shib 'is-member-of' header field.

  • Our implementation of fedora item ingestion involves creating a METS record that contains a bunch of item-info -- including a rights segment. The rights segment contains a series of entries, each one listing an identity (a shib is-member-of group) and a permission. Example (content, not format): identity='chemistry-department' & permission='view_item'

  • The mets record is handed to an ingester that converts the mets xml to FOXML, then fedora grabs the object (we're using the 'managed' option at the moment), and the java messaging built into fedora fires off a message to a listener that indexes (via Solr) parts of the foxml record, including the rights information.

So, our approach: create a fedora servlet filter that reads the shib groups/identities, then does a solr search to see if the object being requested has a 'view' permission for any of the identities in the request's shib is-member-of header. If so, the request is allowed through; if not, it is blocked. If no shib-identity is found, the servlet filter will only yield objects with 'public' view_item permissions.
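The decision the filter makes can be sketched in a few lines. The real thing would be a java servlet filter doing a solr lookup; what follows is only a Python illustration of the allow/block logic, with assumptions labeled: the function names are hypothetical, the rights are passed in as plain (identity, permission) pairs rather than fetched from solr, the is-member-of header is assumed to be a semicolon-delimited string, and logged-in users are assumed to see 'public' objects too.

```python
def parse_groups(header_value):
    """Split an is-member-of header value into a set of group names.

    Assumption: multiple groups arrive as one ';'-delimited string.
    """
    if not header_value:
        return set()
    return {group.strip() for group in header_value.split(";") if group.strip()}


def may_view(object_rights, header_value):
    """Decide whether a request may see an object.

    `object_rights` is a list of (identity, permission) pairs, standing in
    for what the solr index would return for the requested object.
    """
    viewers = {
        identity
        for identity, permission in object_rights
        if permission == "view_item"
    }
    # Assumption: 'public' objects are visible with or without a shib identity
    if "public" in viewers:
        return True
    # Otherwise at least one of the requester's groups must be a viewer
    return bool(viewers & parse_groups(header_value))
```

For example, with rights of `[("chemistry-department", "view_item")]`, a request whose header contains `chemistry-department` would pass, while a request with no shib header at all would be blocked.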

The beauty of this is that fedora-access can be fully open to the internet while still allowing authorized access for those objects that require it. Further, this solution offers reasonable hope that it will survive fedora upgrades, since the servlet, though a part of the fedora webapp, is somewhat of a separate layer in front of the app. Further, by adding more granular permissions (at the moment permissions are at the object-level; they could be at the data-stream level) -- or simply by a bit of extra programming in the servlet-filter -- we could allow, say, the public to access low and medium-resolution images, but allow, say, faculty to access high-resolution images.

I'll keep this paragraph updated... Our intrepid programmer has figured out where to insert the custom servlet filter, has worked with our systems person to hook up an initial apache/tomcat connection so as to allow the shib installation on apache to pass its headers through to tomcat, and confirmed the filter's detection of the shib identity header information. A nice side-effect of installing shib on apache rather than tomcat directly is that we can allow programmatic access to port 8080.

(some technical info and some code: here)

The bell has rung; the next round begins. We cross our fingers, and the programmer heads into the ring once again.

the wave and the repository

wednesday, november 11th, 2009 8:54pm

I've been playing with Google Wave recently and am deeply impressed.

I would not be surprised if within a few years, a year for many, waves will largely replace emails. Not just for youth, whose primary forms of non-voice digital communication are sms-texts or facebook-posts, but also for those of us for whom email is currently an absolutely essential daily form of communication.

I believe it will be that significant.


My mind-wheels have been spinning, envisioning what this new form of communication could impact.

Library digital repository

At the Access 2007 conference I saw an inspiring talk by Mark Leggott of the University of Prince Edward Island. He spoke about the Virtual Research Environment that his group had created, which successfully addressed a thorny issue: Libraries which had expended significant resources to build digital repository systems were having a terrible time getting campus entities to contribute content.

What was so compelling about Leggott's approach was his team's shift in perspective from expecting users to meet Library requirements -- to the Library meeting users' needs. I was still fairly new to the Library world at that time, but my sense was that institutions had built their digital repository to meet Library needs for thorough meta-data -- without much regard to user-experience or needs. The result: the new digital repository felt irrelevant to campus users. With onerous submission processes and requirements, little material was submitted. Leggott's team instead focused directly on useful services to key campus constituents such as faculty -- allowing them to more easily do the work they already did. As I recall, one simple example was that his group provided storage space for research data-sets -- but I believe the offer wasn't just for final data-sets, but data-sets under active development, revision, and analysis. The result: campus digital work worthy of inclusion into the repository was already within the UPEI Library system, which made final repository ingestion of appropriate material architecturally easy.

Layering

Now that I have one work-foot in the digital repository world, I've been wondering about an issue stemming from my recollection of the Leggott team's approach: how to design a compelling suite of storage and easy-to-use services, without the Library repository being filled with vacation and pet pictures?

My thought: the Library could develop clear, simple policies for layers of repository-usage. The outer layer would be more flexible, more transactional, and would require a lower-level of quality metadata for submitted items -- tags would be fine. We already have a mission to support transactional, often non-archival work: supporting research. For items to be accepted into the inner more archival layer, more and higher quality metadata would be required, in exchange for the guarantee of permanence, multiple-channel data exposure, and format data-migration. The benefits of this layered approach: the Library can play an increased central role in the creative work of the campus, and ensure access to quality data from across campus that would flow into the repository.

From this layering idea arises the question of how to architecturally separate the layers. Brainstorming, I've imagined that campus users could be allotted x00 GB of outer-layer 'work' storage-space -- with more inner-layer repository-space just a click away for items deemed appropriate. Or the outer-layer work space could be limited by time-frame: all files in the outer-layer workspace could have, say, a two-year lifespan, with a nice status-report system so no one would lose work unexpectedly. That'd help encourage worthy materials to be migrated into the inner layer.
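Both brainstormed limits (a storage allotment and a lifespan) are simple to check mechanically, which is what would make a friendly status report feasible. Here is a minimal Python sketch; the two-year lifespan comes from the paragraph above, while the 90-day warning window, the field names, and the function name are assumptions for illustration only.

```python
from datetime import date, timedelta

LIFESPAN = timedelta(days=2 * 365)   # the 'say, two-year' lifespan
WARNING_WINDOW = timedelta(days=90)  # assumed notice period before expiry


def workspace_status(files, quota_bytes, today):
    """Summarize a work-space against a quota and a lifespan.

    Each file is a hypothetical dict with 'name', 'size' (bytes), and
    'added' (a date).
    """
    used = sum(f["size"] for f in files)
    expired, expiring_soon = [], []
    for f in files:
        remaining = (f["added"] + LIFESPAN) - today
        if remaining < timedelta(0):
            expired.append(f["name"])
        elif remaining <= WARNING_WINDOW:
            expiring_soon.append(f["name"])
    return {
        "used": used,
        "over_quota": used > quota_bytes,
        "expiring_soon": expiring_soon,
        "expired": expired,
    }
```

The `expiring_soon` list is the piece that would feed notifications, so that a user gets a prompt to move worthy items into the inner layer before anything disappears.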

Leveraging

Recently, my thinking is shifting in a different direction. I still like the idea of an outer transactional work-layer, and an inner repository layer with richer metadata and higher archival guarantees. But I question whether we need to build all the outer-layer services. An alternate approach would be to facilitate the use of existing third-party services and tools, and build Library services, plugins, and widgets to streamline the ingestion of appropriate items from those external third-party work-layer services and tools.

My recent experimentation with Google Wave was a catalyst for this shift, especially given its collaborative strengths, and its ability to easily handle files and images via drag & drop.

Vision

One of the use-cases we've envisioned for use of our digital repository is a professor organizing images for a class presentation. Imagine the professor is working with teaching assistants (TAs) to refine the presentation and associated points. With Google Wave, the professor could set up a wave to prepare for the class session. She could invite her two TAs into the wave; each could simply drag pictures into the wave, tagging them. The professor could also set up a bullet-point list in the wave, encourage the TAs to contribute to the bullet-list, and note issues for them to research in preparation for the class session.

Imagine if, when the session-material preparation is complete, the professor could then apply a Library repository-gadget to the wave which would, after a campus authentication process, ingest all the pictures and associated titles (and, optionally, the wave itself), and redirect the prof to a repository web-page to enter a bit more metadata. Upon adding this extra information, the data would be officially ingested into the repository. Because Google Wave is an open-source project, the Library or campus IT folk could, if desired, install a wave server to facilitate branding and make it all the easier for Library services to be integrated seamlessly into users' work flow.

Google wave comes with an Extensions Gallery that provides inspiration for imagining the varied kinds of services that can be applied to a wave, and tutorials abound on how to program extensions. The same approach could be applied to flickr and facebook: Library programmers could build widgets and mini-apps to enable users to use friendly tools and services they're already comfortable with -- but to still be able to shift their works easily into the official repository. It's part of the idea of meeting users where they are, as opposed to requiring that they come to us. That this approach offers new and exciting realms for Library programmers is just delicious gravy.

mod_wsgi on os x

sunday, march 1st, 2009 5:20am

[This installation was so easy it may not seem worth the notes and output below. But over the years I've found it surprisingly useful to be able to refer back to notes such as this, and helpful to read others' detailed installation reports.]




Goal & motivation

  • I'll be developing more and more web-apps that call services, and unless I call them all via ajax -- which might actually be a good idea -- I'll bump into the development server's single-threaded limitation. And even if I do call the services via ajax I'll likely run into problems if I'm calling more than one service simultaneously -- although the ability of an ajax call to fail gracefully via a timeout and try its resource again would be a cool thing to learn how to implement.

  • Thinking about this off and on for the last few months, I recently came across a posting by the god of apache-python integration, Graham Dumpleton, about using mod_wsgi for development. This, on top of a slow solaris installation using mod_python (the slowness probably has more to do with the box than mod_python), gave me the impetus to delve into this.

  • Finally, gsf's encouragement to check out mod_wsgi at code4lib 2009 made me bite the bullet.
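On the 'fail gracefully via a timeout and try again' idea from the first bullet: the same pattern can be sketched on the server side in Python rather than in browser ajax code. This is a minimal illustration in modern Python 3 (not the 2.5.1 used in these notes); the function name and its defaults are assumptions, and `fetch` stands for any zero-argument callable that performs the actual service call with its own timeout.

```python
import time


def call_with_retries(fetch, attempts=3, delay=0.5):
    """Call fetch() until it succeeds or the attempts run out.

    Returns fetch()'s result, or None if every attempt timed out or failed
    with a network-style error, so the caller can degrade gracefully.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except OSError:
            # TimeoutError and urllib's URLError are both OSError subclasses
            if attempt < attempts:
                time.sleep(delay)  # brief pause before trying again
    return None
```

A caller might pass something like `lambda: urllib.request.urlopen(url, timeout=2).read()` as `fetch`, and show a 'service unavailable' note when `None` comes back instead of letting the whole page fail.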

Sources of info

Starting setup

  • python version 2.5.1, via:

    birkinbox:~ birkin$ 
    birkinbox:~ birkin$ python
    Python 2.5.1 (r251:54863, Jan 13 2009, 10:26:13) 
    [GCC 4.0.1 (Apple Inc. build 5465)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>>
    
  • apache version 2.2.9, via phpinfo():

    apache2handler
    
    Apache Version  Apache/2.2.9 (Unix) mod_ssl/2.2.9 OpenSSL/0.9.7l DAV/2 PHP/5.2.6
    Apache API Version  20051115
    Server Administrator  you@example.com
    Hostname:Port ::1:0
    User/Group  www(70)/70
    Max Requests  Per Child: 0 - Keep Alive: on - Max Per Connection: 100
    Timeouts  Connection: 300 - Keep-Alive: 5
    Virtual Server  No
    Server Root /usr
    Loaded Modules  core prefork http_core mod_so mod_authn_file mod_authn_dbm mod_authn_anon mod_authn_dbd mod_authn_default mod_authz_host mod_authz_groupfile mod_authz_user mod_authz_dbm mod_authz_owner mod_authz_default mod_auth_basic mod_auth_digest mod_cache mod_disk_cache mod_mem_cache mod_dbd mod_dumpio mod_ext_filter mod_include mod_filter mod_deflate mod_log_config mod_log_forensic mod_logio mod_env mod_mime_magic mod_cern_meta mod_expires mod_headers mod_ident mod_usertrack mod_setenvif mod_version mod_proxy mod_proxy_connect mod_proxy_ftp mod_proxy_http mod_proxy_ajp mod_proxy_balancer mod_ssl mod_mime mod_dav mod_status mod_autoindex mod_asis mod_info mod_cgi mod_dav_fs mod_vhost_alias mod_negotiation mod_dir mod_imagemap mod_actions mod_speling mod_userdir mod_alias mod_rewrite mod_bonjour2 mod_php5
    
  • Requirement: Docs say "The GNU C compiler from the MacOS X Developer Toolkit bundle is required." -- I should be fine; developer stuff is installed.

Plan

Install

  • Got source code from the specified download page.

  • Selected single version listed (2.3)

  • Unstuff

    birkinbox:~ birkin$ 
    birkinbox:~ birkin$ mv /Users/birkin/Downloads/mod_wsgi-2.3.tar.gz /Developer_3rd/mod_wsgi-2.3.tar.gz
    birkinbox:~ birkin$ 
    birkinbox:~ birkin$ cd /Developer_3rd/
    birkinbox:Developer_3rd birkin$ 
    birkinbox:Developer_3rd birkin$ /usr/bin/tar xvfz ./mod_wsgi-2.3.tar.gz
    mod_wsgi-2.3/
    mod_wsgi-2.3/configure
    mod_wsgi-2.3/configure.ac
    mod_wsgi-2.3/LICENCE
    mod_wsgi-2.3/Makefile-1.X.in
    mod_wsgi-2.3/Makefile-2.X.in
    mod_wsgi-2.3/mod_wsgi.c
    mod_wsgi-2.3/README
    birkinbox:Developer_3rd birkin$
    
  • Configure

    birkinbox:Developer_3rd birkin$ 
    birkinbox:Developer_3rd birkin$ cd ./mod_wsgi-2.3/
    birkinbox:mod_wsgi-2.3 birkin$ 
    birkinbox:mod_wsgi-2.3 birkin$ ls -alF
    total 952
    drwxr-xr-x@  9 birkin  admin     306 Aug 23  2008 ./
    drwxr-xr-x  20 birkin  admin     680 Mar  2 08:16 ../
    -rw-r--r--@  1 birkin  admin   11358 Jun 23  2007 LICENCE
    -rw-r--r--@  1 birkin  admin    1195 Dec 13  2007 Makefile-1.X.in
    -rw-r--r--@  1 birkin  admin    1247 Dec 13  2007 Makefile-2.X.in
    -rw-r--r--@  1 birkin  admin   16440 Mar 13  2008 README
    -rwxr-xr-x@  1 birkin  admin   78314 Dec 21  2007 configure*
    -rw-r--r--@  1 birkin  admin    4151 Jan 24  2008 configure.ac
    -rw-r--r--@  1 birkin  admin  352904 Aug 23  2008 mod_wsgi.c
    birkinbox:mod_wsgi-2.3 birkin$ 
    birkinbox:mod_wsgi-2.3 birkin$ ./configure 
    checking for apxs2... no
    checking for apxs... /usr/sbin/apxs
    checking Apache version... 2.2.9
    checking for python... /usr/bin/python
    configure: creating ./config.status
    config.status: creating Makefile
    birkinbox:mod_wsgi-2.3 birkin$ 
    birkinbox:mod_wsgi-2.3 birkin$ ls -alF
    total 1032
    drwxr-xr-x@ 13 birkin  admin     442 Mar  2 10:38 ./
    drwxr-xr-x  20 birkin  admin     680 Mar  2 08:16 ../
    -rw-r--r--@  1 birkin  admin   11358 Jun 23  2007 LICENCE
    -rw-r--r--   1 birkin  admin    1559 Mar  2 10:38 Makefile
    -rw-r--r--@  1 birkin  admin    1195 Dec 13  2007 Makefile-1.X.in
    -rw-r--r--@  1 birkin  admin    1247 Dec 13  2007 Makefile-2.X.in
    lrwxr-xr-x   1 birkin  admin      15 Mar  2 10:38 Makefile.in@ -> Makefile-2.X.in
    -rw-r--r--@  1 birkin  admin   16440 Mar 13  2008 README
    -rw-r--r--   1 birkin  admin    4474 Mar  2 10:38 config.log
    -rwxr-xr-x   1 birkin  admin   20621 Mar  2 10:38 config.status*
    -rwxr-xr-x@  1 birkin  admin   78314 Dec 21  2007 configure*
    -rw-r--r--@  1 birkin  admin    4151 Jan 24  2008 configure.ac
    -rw-r--r--@  1 birkin  admin  352904 Aug 23  2008 mod_wsgi.c
    birkinbox:mod_wsgi-2.3 birkin$
    

    So far so good.

  • Build

    birkinbox:mod_wsgi-2.3 birkin$ 
    birkinbox:mod_wsgi-2.3 birkin$ 
    birkinbox:mod_wsgi-2.3 birkin$ make
    /usr/sbin/apxs -c -I/System/Library/Frameworks/Python.framework/Versions/2.5/include/python2.5 -DNDEBUG -DMACOSX -DENABLE_DTRACE  -Wc,'-arch ppc7400' -Wc,'-arch ppc64' -Wc,'-arch i386' -Wc,'-arch x86_64' mod_wsgi.c -arch ppc7400 -arch ppc64 -arch i386 -arch x86_64 -Wl,-F/System/Library/Frameworks -framework Python -u _PyMac_Error -framework Python -ldl
    /usr/share/apr-1/build-1/libtool --silent --mode=compile gcc    -DDARWIN -DSIGPROCMASK_SETS_THREAD_MASK -no-cpp-precomp  -I/usr/include/apache2  -I/usr/include/apr-1   -I/usr/include/apr-1  -arch ppc7400 -arch ppc64 -arch i386 -arch x86_64 -I/System/Library/Frameworks/Python.framework/Versions/2.5/include/python2.5 -DNDEBUG -DMACOSX -DENABLE_DTRACE  -c -o mod_wsgi.lo mod_wsgi.c && touch mod_wsgi.slo
    /usr/share/apr-1/build-1/libtool --silent --mode=link gcc -o mod_wsgi.la  -rpath /usr/libexec/apache2 -module -avoid-version    mod_wsgi.lo -arch ppc7400 -arch ppc64 -arch i386 -arch x86_64 -Wl,-F/System/Library/Frameworks -framework Python -u _PyMac_Error -framework Python -ldl
    birkinbox:mod_wsgi-2.3 birkin$ 
    birkinbox:mod_wsgi-2.3 birkin$ ls -alF
    total 2648
    drwxr-xr-x@ 18 birkin  admin     612 Mar  2 11:26 ./
    drwxr-xr-x  20 birkin  admin     680 Mar  2 08:16 ../
    drwxr-xr-x   7 birkin  admin     238 Mar  2 11:26 .libs/
    -rw-r--r--@  1 birkin  admin   11358 Jun 23  2007 LICENCE
    -rw-r--r--   1 birkin  admin    1559 Mar  2 11:26 Makefile
    -rw-r--r--@  1 birkin  admin    1195 Dec 13  2007 Makefile-1.X.in
    -rw-r--r--@  1 birkin  admin    1247 Dec 13  2007 Makefile-2.X.in
    lrwxr-xr-x   1 birkin  admin      15 Mar  2 11:26 Makefile.in@ -> Makefile-2.X.in
    -rw-r--r--@  1 birkin  admin   16440 Mar 13  2008 README
    -rw-r--r--   1 birkin  admin    4474 Mar  2 11:26 config.log
    -rwxr-xr-x   1 birkin  admin   20621 Mar  2 11:26 config.status*
    -rwxr-xr-x@  1 birkin  admin   78314 Dec 21  2007 configure*
    -rw-r--r--@  1 birkin  admin    4151 Jan 24  2008 configure.ac
    -rw-r--r--@  1 birkin  admin  352904 Aug 23  2008 mod_wsgi.c
    -rw-r--r--   1 birkin  admin     800 Mar  2 11:26 mod_wsgi.la
    -rw-r--r--   1 birkin  admin     315 Mar  2 11:26 mod_wsgi.lo
    -rw-r--r--   1 birkin  admin  817180 Mar  2 11:26 mod_wsgi.o
    -rw-r--r--   1 birkin  admin       0 Mar  2 11:26 mod_wsgi.slo
    birkinbox:mod_wsgi-2.3 birkin$
    

    Smooth! Confirm that mod_wsgi.so file exists (the point of the build):

    birkinbox:mod_wsgi-2.3 birkin$ 
    birkinbox:mod_wsgi-2.3 birkin$ ls -alF ./.libs/
    total 4432
    drwxr-xr-x   7 birkin  admin     238 Mar  2 11:26 ./
    drwxr-xr-x@ 18 birkin  admin     612 Mar  2 11:26 ../
    -rw-r--r--   1 birkin  admin  803768 Mar  2 11:26 mod_wsgi.a
    lrwxr-xr-x   1 birkin  admin      14 Mar  2 11:26 mod_wsgi.la@ -> ../mod_wsgi.la
    -rw-r--r--   1 birkin  admin     801 Mar  2 11:26 mod_wsgi.lai
    -rw-r--r--   1 birkin  admin  817100 Mar  2 11:26 mod_wsgi.o
    -rwxr-xr-x   1 birkin  admin  633872 Mar  2 11:26 mod_wsgi.so*
    birkinbox:mod_wsgi-2.3 birkin$
    

    Looks good.

  • Install

    birkinbox:mod_wsgi-2.3 birkin$ 
    birkinbox:mod_wsgi-2.3 birkin$ sudo make install
    Password:
    /usr/sbin/apxs -i -S LIBEXECDIR=/usr/libexec/apache2 -n 'mod_wsgi' mod_wsgi.la
    /usr/share/httpd/build/instdso.sh SH_LIBTOOL='/usr/share/apr-1/build-1/libtool' mod_wsgi.la /usr/libexec/apache2
    /usr/share/apr-1/build-1/libtool --mode=install cp mod_wsgi.la /usr/libexec/apache2/
    cp .libs/mod_wsgi.so /usr/libexec/apache2/mod_wsgi.so
    cp .libs/mod_wsgi.lai /usr/libexec/apache2/mod_wsgi.la
    cp .libs/mod_wsgi.a /usr/libexec/apache2/mod_wsgi.a
    ranlib /usr/libexec/apache2/mod_wsgi.a
    chmod 644 /usr/libexec/apache2/mod_wsgi.a
    ----------------------------------------------------------------------
    Libraries have been installed in:
       /usr/libexec/apache2
    
    If you ever happen to want to link against installed libraries
    in a given directory, LIBDIR, you must either use libtool, and
    specify the full pathname of the library, or use the `-LLIBDIR'
    flag during linking and do at least one of the following:
       - add LIBDIR to the `DYLD_LIBRARY_PATH' environment variable
         during execution
    
    See any operating system documentation about shared libraries for
    more information, such as the ld(1) and ld.so(8) manual pages.
    ----------------------------------------------------------------------
    chmod 755 /usr/libexec/apache2/mod_wsgi.so
    birkinbox:mod_wsgi-2.3 birkin$
    

    Confirm the mod_wsgi.so file has been installed in '/usr/libexec/apache2', as specified in this line of my Makefile:

    LIBEXECDIR = /usr/libexec/apache2
    
    birkinbox:mod_wsgi-2.3 birkin$ 
    birkinbox:mod_wsgi-2.3 birkin$ ls -alF /usr/libexec/apache2
    total 81488
    drwxr-xr-x  72 root  wheel      2448 Mar  2 11:35 ./
    drwxr-xr-x  93 root  wheel      3162 Feb 12 16:44 ../
    (...)
    -rwxr-xr-x   1 root  wheel    633872 Mar  2 11:35 mod_wsgi.so*
    birkinbox:mod_wsgi-2.3 birkin$
    

    Nice, there it is at the bottom.

  • Load module into apache

    I can never remember exactly where the httpd.conf file is.

    birkinbox:mod_wsgi-2.3 birkin$ 
    birkinbox:mod_wsgi-2.3 birkin$ locate 'httpd.conf'
    (...)
    /private/etc/apache2/httpd.conf
    (...)
    birkinbox:mod_wsgi-2.3 birkin$
    

    First a backup:

    birkinbox:apache2 birkin$ sudo cp ./httpd.conf ./2009-03-02_httpd.conf
    

    Added to the LoadModule section:

    # Added 2009-03-02
    LoadModule wsgi_module libexec/apache2/mod_wsgi.so
    
  • Restart

    birkinbox:apache2 birkin$ sudo apachectl restart
    
  • Clean up

    birkinbox:apache2 birkin$ 
    birkinbox:apache2 birkin$ cd /Developer_3rd/mod_wsgi-2.3
    birkinbox:mod_wsgi-2.3 birkin$ 
    birkinbox:mod_wsgi-2.3 birkin$ make clean
    rm -rf .libs
    rm -f mod_wsgi.o mod_wsgi.la mod_wsgi.lo mod_wsgi.slo mod_wsgi.loT
    rm -f config.log config.status
    rm -rf autom4te.cache
    birkinbox:mod_wsgi-2.3 birkin$
    

Configure & test

  • Reference: 'Configuring An Application'. Docs recommend following this to verify that mod_wsgi is actually working properly.

  • Note: phpinfo() does indicate the module is loaded.

  • Note: docs state to follow these QuickConfiguration instructions before delving into the more thorough configuration docs.

  • Test app function & directories

    Created, per instructions, test function in a file and enclosing directory:

    birkinbox:repository birkin$ 
    birkinbox:repository birkin$ cd /Users/birkin/Documents/Brown_Library/ModWsgiTest 
    birkinbox:ModWsgiTest birkin$ 
    birkinbox:ModWsgiTest birkin$ ls -alF
    total 8
    drwxr-xr-x   3 birkin  staff   102 Mar  2 15:54 ./
    drwxr-xr-x  87 birkin  staff  2958 Mar  2 15:54 ../
    -rw-r--r--@  1 birkin  staff   277 Mar  2 15:43 mod_wsgi_test.wsgi
    birkinbox:ModWsgiTest birkin$ 
    birkinbox:ModWsgiTest birkin$ cat ./mod_wsgi_test.wsgi 
    def application(environ, start_response):
        status = '200 OK'
        output = 'Hello World!'
        response_headers = [('Content-type', 'text/plain'),
                            ('Content-Length', str(len(output)))]
        start_response(status, response_headers)
        return [output]
    birkinbox:ModWsgiTest birkin$
    
  • httpd.conf file changes

    The docs give some sample configuration:

    <VirtualHost *:80>
    
      ServerName www.example.com
      ServerAlias example.com
      ServerAdmin webmaster@example.com
    
      DocumentRoot /usr/local/www/documents
    
      <Directory /usr/local/www/documents>
        Order allow,deny
        Allow from all
      </Directory>
    
      WSGIScriptAlias /myapp /usr/local/www/wsgi-scripts/myapp.wsgi
    
      <Directory /usr/local/www/wsgi-scripts>
        Order allow,deny
        Allow from all
      </Directory>
    
    </VirtualHost>
    

    My try will be:

    <VirtualHost *:80>
    
      ServerName 127.0.0.1
      ServerAlias 127.0.0.1
      ServerAdmin birkin_diana@brown.edu
    
      DocumentRoot /Users/birkin/Sites
    
      <Directory /Users/birkin/Sites>
        Order allow,deny
        Allow from all
      </Directory>
    
      WSGIScriptAlias /myapp /Users/birkin/Documents/Brown_Library/ModWsgiTest/mod_wsgi_test.wsgi
    
      <Directory /Users/birkin/Documents/Brown_Library/ModWsgiTest>
        Order allow,deny
        Allow from all
      </Directory>
    
    </VirtualHost>
    
  • Backup

    birkinbox:~ birkin$ sudo cp /private/etc/apache2/httpd.conf /private/etc/apache2/2009-03-02b_httpd.conf
    
  • Make change and restart

    birkinbox:~ birkin$ 
    birkinbox:~ birkin$ sudo apachectl restart
    Password:
    birkinbox:~ birkin$
    

    Well, no errors on restart.

    Plain old 127.0.0.1 still yields the usual default page, which is good.

    Trying 'myapp'...

    birkinbox:~ birkin$ python
    Python 2.5.1 (r251:54863, Jan 13 2009, 10:26:13) 
    [GCC 4.0.1 (Apple Inc. build 5465)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> 
    >>> import urllib
    >>> 
    >>> urllib.urlopen( 'http://127.0.0.1/myapp/' ).read()
    '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n<html><head>\n<title>403 Forbidden</title>\n</head><body>\n<h1>Forbidden</h1>\n<p>You don\'t have permission to access /myapp/\non this server.</p>\n</body></html>\n'
    >>>
    

    Ok... no permission, but it seems to recognize it as a valid web-address at least -- that's a start!

    Got it; problem is permissions on an enclosing folder...

    drwx------@  81 birkin  staff     2754 Feb  4 16:03 Documents/
    
  • Test app function & directories #2

    Follow instructions this time and create a fully-accessible directory:

    birkinbox:~ birkin$ 
    birkinbox:~ birkin$ mv /Users/birkin/Documents/Brown_Library/ModWsgiTest /Users/birkin/ModWsgiTest
    birkinbox:~ birkin$ 
    birkinbox:~ birkin$ ls -alF /Users/birkin/
    total 1472
    (...)
    drwxr-xr-x    3 birkin  staff      102 Mar  2 17:03 ModWsgiTest/
    (...)
    birkinbox:~ birkin$
    

    Backup httpd.conf:

    birkinbox:~ birkin$ 
    birkinbox:~ birkin$ ls -alF /private/etc/apache2/
    total 256
    (...)
    -rw-r--r--    1 root  wheel  17614 Mar  1  2008 2008-03-02_httpd.conf
    -rw-r--r--    1 root  wheel  17613 Mar  2 11:55 2009-03-02_httpd.conf
    -rw-r--r--    1 root  wheel  17685 Mar  2 16:59 2009-03-02b_httpd.conf
    (...)
    -rw-r--r--    1 root  wheel  18156 Mar  2 17:08 httpd.conf
    (...)
    birkinbox:~ birkin$ 
    birkinbox:~ birkin$ sudo cp /private/etc/apache2/httpd.conf /private/etc/apache2/2009-03-02c_httpd.conf
    Password:
    birkinbox:~ birkin$ 
    birkinbox:~ birkin$ ls -alF /private/etc/apache2/
    total 296
    (...)
    -rw-r--r--    1 root  wheel  17614 Mar  1  2008 2008-03-02_httpd.conf
    -rw-r--r--    1 root  wheel  17613 Mar  2 11:55 2009-03-02_httpd.conf
    -rw-r--r--    1 root  wheel  17685 Mar  2 16:59 2009-03-02b_httpd.conf
    -rw-r--r--    1 root  wheel  18156 Mar  2 18:40 2009-03-02c_httpd.conf
    (...)
    -rw-r--r--    1 root  wheel  18156 Mar  2 17:08 httpd.conf
    (...)
    birkinbox:~ birkin$
    

    Change httpd.conf; section now is:

    <VirtualHost *:80>
    
        ServerName 127.0.0.1
        ServerAlias 127.0.0.1
        ServerAdmin birkin_diana@brown.edu
    
        DocumentRoot /Users/birkin/Sites
    
        <Directory /Users/birkin/Sites>
            Order allow,deny
            Allow from all
        </Directory>
    
        WSGIScriptAlias /myapp /Users/birkin/ModWsgiTest/mod_wsgi_test.wsgi
    
        <Directory /Users/birkin/ModWsgiTest>
            Order allow,deny
            Allow from all
        </Directory>
    
    </VirtualHost>
    
  • Restart & test

    birkinbox:~ birkin$ 
    birkinbox:~ birkin$ sudo apachectl restart
    birkinbox:~ birkin$ 
    birkinbox:~ birkin$ python
    Python 2.5.1 (r251:54863, Jan 13 2009, 10:26:13) 
    [GCC 4.0.1 (Apple Inc. build 5465)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> 
    >>> import urllib
    >>> 
    >>> urllib.urlopen( 'http://127.0.0.1/myapp/' ).read()
    'Hello World!'
    >>>
    

    Success!!!
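
The same smoke test can also be run without Apache in the loop. What follows is a minimal sketch, assuming Python 3 (where a WSGI body must be bytes, unlike the Python 2.5 session above). The call_app helper is a hypothetical name of my own, not part of mod_wsgi; it simply invokes the WSGI callable directly and captures what it hands back.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    # Python 3 variant of the test app above: the body must be bytes.
    status = '200 OK'
    output = b'Hello World!'
    response_headers = [('Content-type', 'text/plain'),
                        ('Content-Length', str(len(output)))]
    start_response(status, response_headers)
    return [output]


def call_app(app, path='/myapp/'):
    # Hypothetical helper: call a WSGI app directly and capture its reply.
    environ = {'PATH_INFO': path}
    setup_testing_defaults(environ)  # fills in the remaining required WSGI keys
    captured = {}

    def start_response(status, headers):
        captured['status'] = status
        captured['headers'] = dict(headers)

    body = b''.join(app(environ, start_response))
    return captured['status'], captured['headers'], body


status, headers, body = call_app(application)
print(status, body)
```

This only exercises the application function itself; it says nothing about the Apache directives or the directory permissions, which is where the 403 above came from.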

Recursion

sunday, october 26th, 2008 6:27pm

After writing the code below recently, I searched out a few cohorts at work to share the joy, but it was late and no one was around...

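def makeHierarchicalFolderList( folder_object_list, return_list=None, indent='' ):
    '''
    - Called by: views.uploader3()
    - Purpose: to create a list of community-folders (for upload form) with indents indicating hierarchical relationship. Will likely be replaced by better user-interface.
    '''

    # a mutable default ( return_list=[] ) would be shared across separate calls, so create the list here
    if return_list is None:
        return_list = []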

    for folder in folder_object_list:
        folder.name = '%s%s' % ( indent, folder.name )
        return_list.append( folder )

        # check for children
        children_count = folder.communityfolder_set.all().count()

        # handle check-results
        if children_count == 0:
            pass
        else:
            children_object_list = folder.communityfolder_set.all()
            indent = indent + '-'
            makeHierarchicalFolderList( children_object_list, return_list, indent=indent )
            indent = indent[1:] # since we're at the end of a processing chain, remove the indent

    return return_list

    # end def makeHierarchicalFolderList()

What enthused me so was the line a few up from the bottom, where I call the very function in which this line resides. This is called recursion. There's a wonderful, slightly mysterious, inverted-mobius-strip-like quality to recursion, except that the inward tunneling by definition doesn't continue on forever.

The code above neatly meets the needs of a project I'm working on: to prepare a hierarchical listing of folders. Not shown is an initial step in which a query is made for a user-specific list of 'top-level' folders -- that is, folders which do not have a parent folder. For each folder in this list, the folder is first appended to a sort of global list-of-folders-to-return. Then a check is made to see if the folder has any children. If so, those child-folders are selected, and passed to the function itself. So for each top-level folder, a processing-chain begins that might be very, very long, or might be very short. But each processing-chain does terminate, shifting to examine the next sibling folder, and, when there are no more sibling-folders in the lowest-level generation, shifting back upward a generation to examine and process the next sibling-folder.
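
The walk described above can be tried outside of Django with a small stand-in. This is a sketch only: the Folder class and its children attribute are hypothetical substitutes for the real model and its communityfolder_set, and the folder names are made up.

```python
class Folder:
    # Hypothetical stand-in for the Django folder model; 'children' plays the
    # role of folder.communityfolder_set.all().
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []


def make_hierarchical_list(folders, result=None, indent=''):
    if result is None:
        result = []
    for folder in folders:
        result.append(indent + folder.name)
        # Recurse one level deeper; an empty child list ends this chain.
        make_hierarchical_list(folder.children, result, indent + '-')
    return result


top_level = [
    Folder('Science', [Folder('Biology', [Folder('Genetics')]), Folder('Physics')]),
    Folder('Arts'),
]
for line in make_hierarchical_list(top_level):
    print(line)
```

Each top-level folder starts its own processing-chain; the chain ends when a folder has no children, and the loop one level up then moves on to the next sibling.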

I use recursion infrequently enough that it sort of gets buried in my toolbox. I forget about it and from habit reach for other tools first that often do the looping job well-enough. Utilizing looping-logic is a fundamental part of programming. Where a built-in looping structure doesn't directly solve looping needs, I most often use a Controller-pattern solution to a looping challenge. That is, I'll have a controller block of code prepare a list of items that need to be looped through, and perhaps set some variables to hold the results of processing, and then call a separate function that handles the actual looping. Sometimes for more complex situations the called looping code can itself call another separate block of looping code. So it is possible to handle some situations, for which recursion might be ideal, with other techniques. But not all situations -- when my usual looping implementation begins feeling overly complex, that's often a sign to root around and dust off the recursion tool.
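
For comparison, here is one non-recursive way to produce the same kind of listing. To be clear, this is not the controller approach described above but a different, commonly used technique: a plain loop over an explicit stack. The Folder class is again a hypothetical stand-in with made-up data.

```python
class Folder:
    # Hypothetical stand-in for a folder record with child folders.
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []


def list_folders_with_stack(top_level_folders):
    result = []
    # Each stack entry is (folder, depth); reversed so siblings pop in original order.
    stack = [(folder, 0) for folder in reversed(top_level_folders)]
    while stack:
        folder, depth = stack.pop()
        result.append('-' * depth + folder.name)
        for child in reversed(folder.children):
            stack.append((child, depth + 1))
    return result


tree = [
    Folder('Science', [Folder('Biology', [Folder('Genetics')]), Folder('Physics')]),
    Folder('Arts'),
]
print(list_folders_with_stack(tree))
```

It works, but the bookkeeping (depth counters, reversed pushes) is exactly the kind of creeping complexity that suggests reaching for recursion instead.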

Despite some left-over 1960s geek-stereotypes about programming being a boring left-brain process, there can be elegance and deep beauty in programming. There are numerous ways to approach and solve a challenge, any of which may work 'well-enough'. Good programmers strive for solutions that are simplest and clearest, and when the right technique lends itself to beautifully simple code, one feels like an artist.

The dashboard initiative

monday, september 22nd, 2008 12:08am

I've been putting some productive time into something I'm calling "The Dashboard Initiative". Most of this time to date has been outside of normal work hours due to a few other priorities, but in time I expect to add this to the list of on-going work projects.

Inspired by work done by Brown's Office of Institutional Research, the concept of the dashboard is to provide useful trend information about the operation of different facets of the Library. The analogy to a car dashboard is good: whereas 'instruments' make up a car's dashboard, what I'm calling 'widgets' make up the Library's dashboard.

As shown on this dashboard information page, a dashboard widget consists of three counts (baseline, trend, and current), a trend indicator, and a 'more-info' button that itself is a miniature graph. My visions for possible future dashboard usage within the Library and across campus are grand, but it is important to remember that the dashboard idea is intended to serve a rather specific data-display purpose: to usefully display trend information. Data that lends itself to pie-chart breakdowns can be important to an organization and can be an integral part of an organization's data-farm, but is somewhat outside the scope of the dashboard focus on trends.

One of the reasons I find the dashboard concept so compelling is that it provides a kind of 'template' for data-tracking feeds. Increasingly we've been building into more projects the ability to stream out statistic counts, but to date there hasn't been a clear standardized vision of how this statistical data might be presented. The dashboard offers that standard.

If we were to rebuild from scratch the easyBorrow system, we could from the start automate count-flows that could populate widgets representing trend-usage for Josiah redirects, BorrowDirect, VirtualCatalog, InRhode, and Iliad. This of course applies to all new systems, and over time I expect we'll retrofit many of our existing ad-hoc statistical counts to flow into widgets.

I have a vision of the creation, over time, of a plethora of widgets representing useful trend-information on checkouts, interlibrary-loan usage, new-titles additions, collections-web-access usage, requests for offsite materials, and physical library attendance, to name just a few. This then raises the question of how to manage all these widgets.

I envision a 'MyWidgets' page where, based on cookies and login, a user could view a listing of all Library widgets, filter by tag, and select those she finds useful for a personalized widget page. As part of my work I may pay particular attention to the flow of easyBorrow requests to our different borrowing partners, and scan other widgets tracking workbench file uploads to our in-development repository. Other folk in the library might be particularly interested in widgets that track numbers of books sent to our offsite Annex facility, as well as widgets that track the number of requests for those materials, and widgets that track how many requests for offsite materials are still made when the user is offered a link to a Google Book scan of the requested title. Our French scholarly resources librarian might choose for her page widgets tracking French new-title additions, as well as checkouts of French-language items.

Thinking even more broadly -- campus-wide -- it's easy to imagine how, if other departments adopt the dashboard idea, a facility could even be developed for, say, the chair of the French department to 'subscribe' to a Library 'French New-Titles' widget, a Library 'French-Language Checkouts' widget, as well as a Registrar widget representing the numbers of freshmen enrolled in French 1, and another Registrar widget representing numbers of French concentrators.

Along these lines, I expect to one day add an rss and html parameter-segment to a widget-url to facilitate such cross-campus usage.

For now, starting on a small-scale, I've created via Django's default admin a simple form allowing non-technical end-users to create a widget simply by typing or pasting into the form a list of key-value data-pairs.

widget entry form

Upon submitting the form, the data-points are parsed and made into the discrete data-elements comprising a widget. This is in-place now. In fact, the widget on the dashboard information page was created (and can easily be updated) via this form. Further, this weekend I implemented the ability to view detail line-chart information using Google's chart API. So changing a label or data-point via the simple form now changes not just the widget but also the detail chart, on-the-fly. Though I expect to automate many data flows used to create dashboard widgets, the utility of the form will allow non-technical folk to take data they already create via manual processes and easily make that data much more visible to others.
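
To make the parsing step concrete, here is a rough sketch, not the released dashboard code. Several things are assumptions on my part: the 'label: value' line format, the function names, the sample numbers, and especially the rule for deriving the three counts (baseline = first point, trend = second-to-last point, current = last point), which is just one plausible reading.

```python
def parse_data_points(text):
    # Parse 'label: value' lines (an assumed input format) into ordered pairs.
    points = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        label, _, value = line.partition(':')
        points.append((label.strip(), int(value.strip())))
    return points


def build_widget(title, points):
    # Assumed derivation: baseline = first, trend = second-to-last, current = last.
    if len(points) < 2:
        raise ValueError('need at least two data points to show a trend')
    baseline = points[0][1]
    trend = points[-2][1]
    current = points[-1][1]
    if current > trend:
        direction = 'up'
    elif current < trend:
        direction = 'down'
    else:
        direction = 'flat'
    return {'title': title, 'baseline': baseline, 'trend': trend,
            'current': current, 'trend_direction': direction}


# Made-up sample data, as it might be pasted into the form.
form_text = """
2008-06: 120
2008-07: 135
2008-08: 150
"""
widget = build_widget('easyBorrow requests', parse_data_points(form_text))
print(widget)
```

Keeping the parsed points around as an ordered list is what would let the same data feed both the widget and a detail chart.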

We'll see how this all unfolds. The potential is exciting.

[ Update: I presented on the dashboard at the 2009 code4lib conference. Good feedback (DC, ELM). Code released. ]

Passwordless logins

friday, may 23rd, 2008 5:48am

[These are notes from a project I worked on in grad-school in 2003-2004. As part of a 'voting' project, I wanted to automate the backup of a postgres database to an offsite location via a dump and rsync. In order to script the backup, my server needed to be able to automatically login to the backup server. A fellow student, J.E., and I worked on this piece together.

Recently a co-worker described a need to do something different, but similar in some ways, so I dug up these notes and pasted 'em in here, fairly raw. Note to hackers: the servers mentioned are long offline.]


Instructions

  • Generate the key...

    [toolbox:~/Desktop] birkin% 
    [toolbox:~/Desktop] birkin% ssh-keygen -t rsa
    Generating public/private rsa key pair.
    Enter file in which to save the key (/Users/birkin/.ssh/id_rsa): 
    /Users/birkin/.ssh/id_rsa already exists.
    Overwrite (y/n)? y
    Enter passphrase (empty for no passphrase): 
    Enter same passphrase again: 
    Your identification has been saved in /Users/birkin/.ssh/id_rsa.
    Your public key has been saved in /Users/birkin/.ssh/id_rsa.pub.
    The key fingerprint is:
    71:04:1a:69:d4:ee:4a:d5:a8:b6:77:65:20:68:12:df birkin@toolbox.local
    [toolbox:~/Desktop] birkin%
    

    The '-t rsa' flag requests an RSA key for the ssh 2 protocol.

  • Examine the created keys...

    [toolbox:~/.ssh] birkin% 
    [toolbox:~/.ssh] birkin% ls -alF
    total 48
    drwx------ 7 birkin staff 238 16 Jun 21:15 ./
    drwxr-xr-x 57 birkin staff 1938 16 Jun 17:06 ../
    -rw------- 1 birkin staff 883 17 Jun 08:05 id_rsa
    -rw-r--r-- 1 birkin staff 230 17 Jun 08:05 id_rsa.pub
    -rw------- 1 birkin staff 535 16 Jun 20:39 identity
    -rw-r--r-- 1 birkin staff 339 16 Jun 20:39 identity.pub
    -rw-r--r-- 1 birkin staff 5351 16 Jun 20:56 known_hosts
    [toolbox:~/.ssh] birkin%
    

    The 'identity' files listed were generated when I was initially trying 'ssh-keygen -t rsa1', the ssh 1 protocol, and I believe can be ignored.

    [toolbox:~/.ssh] birkin% 
    [toolbox:~/.ssh] birkin% cat id_rsa.pub 
    ssh-rsa
    AAAAB3NzaC1yc2EAAAABIwAAAIEA0xmINQ6w3KGgxEexNJeb5bRDhOyp3R5zWfL6L5ghb8TqWDoF/x1e4KxoVp3NEMd594QISQzb4w74ZNkdGKnIqOEHs1Uy3zbutijs+PQhWqXvZ40AMbOpOjawLAcrTWUfqmBcC7MW54cOiu2FIzvlHJhYVOBCyy1nBVduGJUPF5s=
    birkin@toolbox.local
    [toolbox:~/.ssh] birkin%
    
  • Copy the public key to a file titled 'authorized_keys' which will be transferred to the remote computer(s) that I want to connect to.

    [toolbox:~/.ssh] birkin% 
    [toolbox:~/.ssh] birkin% cat id_rsa.pub > ~/Desktop/authorized_keys
    [toolbox:~/.ssh] birkin%
    
  • Let's take a look to make sure it looks right...

    [toolbox:~/.ssh] birkin% 
    [toolbox:~/.ssh] birkin% cd ~/Desktop/
    [toolbox:~/Desktop] birkin% 
    [toolbox:~/Desktop] birkin% ls -alF
    total 64
    drwxr-xr-x 7 birkin staff 238 17 Jun 08:21 ./
    drwxr-xr-x 57 birkin staff 1938 16 Jun 17:06 ../
    -rwxr-xr-x 1 birkin staff 21508 17 Jun 08:20 .DS_Store*
    -rw-r--r-- 1 birkin staff 253 2 Nov 2003 .bash_profile
    -rw-r--r-- 1 birkin staff 0 20 Apr 2003 .localized
    -rw-r--r-- 1 birkin staff 230 17 Jun 08:21 authorized_keys
    drwxr-xr-x 42 birkin staff 1428 17 Jun 08:20 envelope/
    [toolbox:~/Desktop] birkin%         
    [toolbox:~/Desktop] birkin% cat authorized_keys 
    ssh-rsa
    AAAAB3NzaC1yc2EAAAABIwAAAIEA0xmINQ6w3KGgxEexNJeb5bRDhOyp3R5zWfL6L5ghb8TqWDoF/x1e4KxoVp3NEMd594QISQzb4w74ZNkdGKnIqOEHs1Uy3zbutijs+PQhWqXvZ40AMbOpOjawLAcrTWUfqmBcC7MW54cOiu2FIzvlHJhYVOBCyy1nBVduGJUPF5s=
    birkin@toolbox.local
    [toolbox:~/Desktop] birkin%
    

    Looks good.

  • Transfer the 'authorized keys' file from my OS X laptop to the remote computer...

    [toolbox:~/Desktop] birkin% 
    [toolbox:~/Desktop] birkin% rsync -v -e /usr/bin/ssh ~/Desktop/authorized_keys birkinbackup@harmonicas.msie.marlboro.edu:/home/birkinbackup/authorized_keys
    birkinbackup@harmonicas.msie.marlboro.edu's password: 
    authorized_keys
    wrote 316 bytes read 42 bytes 31.13 bytes/sec
    total size is 230 speedup is 0.64
    [toolbox:~/Desktop] birkin%
    
  • Make sure it looks right on the remote computer...

    [toolbox:~/Desktop] birkin% 
    [toolbox:~/Desktop] birkin% ssh birkinbackup@harmonicas.msie.marlboro.edu
    birkinbackup@harmonicas.msie.marlboro.edu's password: 
    [birkinbackup@harmonicas birkinbackup]$ 
    [birkinbackup@harmonicas birkinbackup]$ ls -alF
    total 64
    drwx------ 3 birkinbackup birkinbackup 4096 Jun 17 08:27 ./
    drwxr-xr-x 6 root root 4096 Jun 12 15:01 ../
    -rw-r--r-- 1 birkinbackup birkinbackup 230 Jun 17 08:27 authorized_keys
    -rw------- 1 birkinbackup birkinbackup 6306 Jun 17 08:22 .bash_history
    -rw-r--r-- 1 birkinbackup birkinbackup 24 Jun 12 15:01 .bash_logout
    -rw-r--r-- 1 birkinbackup birkinbackup 191 Jun 12 15:01 .bash_profile
    -rw-r--r-- 1 birkinbackup birkinbackup 124 Jun 12 15:01 .bashrc
    -rw-r--r-- 1 birkinbackup birkinbackup 29 Jun 17 08:29 datecrontest
    -rw-r--r-- 1 birkinbackup birkinbackup 847 Jun 12 15:01 .emacs
    -rw-r--r-- 1 birkinbackup birkinbackup 120 Jun 12 15:01 .gtkrc
    drwx------ 2 birkinbackup birkinbackup 4096 Jun 17 00:20 .ssh/
    -rw-rw-r-- 1 birkinbackup birkinbackup 14220 Jun 12 18:59 testdump
    [birkinbackup@harmonicas birkinbackup]$ 
    [birkinbackup@harmonicas birkinbackup]$ cat authorized_keys 
    ssh-rsa
    AAAAB3NzaC1yc2EAAAABIwAAAIEA0xmINQ6w3KGgxEexNJeb5bRDhOyp3R5zWfL6L5ghb8TqWDoF/x1e4KxoVp3NEMd594QISQzb4w74ZNkdGKnIqOEHs1Uy3zbutijs+PQhWqXvZ40AMbOpOjawLAcrTWUfqmBcC7MW54cOiu2FIzvlHJhYVOBCyy1nBVduGJUPF5s=
    birkin@toolbox.local
    [birkinbackup@harmonicas birkinbackup]$
    

    Looks good.

  • Move the file to the right place on the remote computer...

    [birkinbackup@harmonicas birkinbackup]$ 
    [birkinbackup@harmonicas birkinbackup]$ cat authorized_keys >> .ssh/authorized_keys 
    [birkinbackup@harmonicas birkinbackup]$
    

    The double angle-brackets ('>>') append instead of overwriting. Also, I've checked this out -- the append is correct for our purposes in that it appends the new string on the following line. Actually, what would be nicer for inspection is this...

    [birkinbackup@harmonicas birkinbackup]$ 
    [birkinbackup@harmonicas birkinbackup]$ echo "" >> .ssh/authorized_keys 
    [birkinbackup@harmonicas birkinbackup]$ 
    [birkinbackup@harmonicas birkinbackup]$ cat authorized_keys >> .ssh/authorized_keys 
    [birkinbackup@harmonicas birkinbackup]$
    

    Let's check out the 'real' authorized_keys file (I should name the transfer file something else in the future to avoid any confusion)...

    [birkinbackup@harmonicas birkinbackup]$ 
    [birkinbackup@harmonicas birkinbackup]$ cd .ssh/
    [birkinbackup@harmonicas .ssh]$ 
    [birkinbackup@harmonicas .ssh]$ ls -alF
    total 24
    drwx------ 2 birkinbackup birkinbackup 4096 Jun 17 00:20 ./
    drwx------ 3 birkinbackup birkinbackup 4096 Jun 17 08:27 ../
    -rw-r--r-- 1 birkinbackup birkinbackup 975 Jun 17 08:42 authorized_keys
    -rw------- 1 birkinbackup birkinbackup 887 Jun 17 07:24 id_rsa
    -rw-r--r-- 1 birkinbackup birkinbackup 251 Jun 17 07:24 id_rsa.pub
    -rw-r--r-- 1 birkinbackup birkinbackup 603 Jun 16 17:34 known_hosts
    [birkinbackup@harmonicas .ssh]$ 
    [birkinbackup@harmonicas .ssh]$ cat authorized_keys 
    ssh-rsa
    AAAAB3NzaC1yc2EAAAABIwAAAIEAu4tdcJlZldiAAnfviR3vXWGjwWa4For/kbi/FvBTeTEtctxsS72/ppn5vFydv4V5iLDVdfWKrnTIwfn8BHinq2yvdX9OLsEyjzBqbu+ZIZCi7UefJxEWCdOGtDd0YWiJbQJkyuoHs4ShwF5YcuMcnmiEjOUWJ7B5N9QkXeD3wc0= birkinbackup@harmonicas.msie.marlboro.edu
    
    authorized_keys
    ssh-rsa
    AAAAB3NzaC1yc2EAAAABIwAAAIEA3+PWa9l6hu6sY43u5FASYr26AhRrUQDqcjT5VO+wePg2OaQyTedcNkRIGG6tVquFC+AXH5BOkI+EJAfSCJG2AE0YxSrM16rMgPM1wADJBlmhumiY5wuX5ROOc0azPpvLyjZwwFsSxgqpdtNtvwUCQEl94y3H5qqOvXtR+IVtp30= birkin@toolbox.local
    authorized_keys
    ssh-rsa
    AAAAB3NzaC1yc2EAAAABIwAAAIEA0xmINQ6w3KGgxEexNJeb5bRDhOyp3R5zWfL6L5ghb8TqWDoF/x1e4KxoVp3NEMd594QISQzb4w74ZNkdGKnIqOEHs1Uy3zbutijs+PQhWqXvZ40AMbOpOjawLAcrTWUfqmBcC7MW54cOiu2FIzvlHJhYVOBCyy1nBVduGJUPF5s= birkin@toolbox.local
    
    ssh-rsa
    AAAAB3NzaC1yc2EAAAABIwAAAIEA0xmINQ6w3KGgxEexNJeb5bRDhOyp3R5zWfL6L5ghb8TqWDoF/x1e4KxoVp3NEMd594QISQzb4w74ZNkdGKnIqOEHs1Uy3zbutijs+PQhWqXvZ40AMbOpOjawLAcrTWUfqmBcC7MW54cOiu2FIzvlHJhYVOBCyy1nBVduGJUPF5s= birkin@toolbox.local
    [birkinbackup@harmonicas .ssh]$
    

    The last line is the one we most recently created; the blank line preceding it is the result of the echo command; the lines 'authorized_keys' are mistakes from issuing echo in my experimentation instead of cat. I'm leaving these in to illustrate that there is tolerance for non-matching entries.

  • Try connecting...

    [birkinbackup@harmonicas .ssh]$ 
    [birkinbackup@harmonicas .ssh]$ exit
    logout
    Connection to harmonicas.msie.marlboro.edu closed.
    [toolbox:~/Desktop] birkin% 
    [toolbox:~/Desktop] birkin% ssh birkinbackup@harmonicas.msie.marlboro.edu
    [birkinbackup@harmonicas birkinbackup]$
    

    No password-prompt: success!
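
The append step is the one most likely to be scripted later, so here is a minimal sketch of it in Python. The function name and the demo keys are hypothetical; the sketch assumes each key occupies a single line of the file, and it skips the append when an identical line is already present (which would have avoided the duplicate entry above).

```python
import os
import tempfile


def append_public_key(pub_key_line, authorized_keys_path):
    # Append one public-key line unless an identical line is already present.
    pub_key_line = pub_key_line.strip()
    existing = ''
    if os.path.exists(authorized_keys_path):
        with open(authorized_keys_path) as f:
            existing = f.read()
    if pub_key_line in [line.strip() for line in existing.splitlines()]:
        return False
    with open(authorized_keys_path, 'a') as f:
        # Start on a fresh line if the file does not already end with a newline.
        if existing and not existing.endswith('\n'):
            f.write('\n')
        f.write(pub_key_line + '\n')
    return True


# Demo against a throwaway file rather than a real ~/.ssh/authorized_keys.
demo_path = os.path.join(tempfile.mkdtemp(), 'authorized_keys')
print(append_public_key('ssh-rsa AAAAdemo birkin@toolbox.local', demo_path))
print(append_public_key('ssh-rsa AAAAdemo birkin@toolbox.local', demo_path))
```

The first call appends the key; the second finds it already there and leaves the file alone.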

Possible 'gotchas'

  • Before actually trying a connection-script, run a manual ssh first; you may have to once manually ok that key-exchange message you normally see on a first-time ssh.

  • If things aren't working, it could be a permissions issue...

    J.E. sent me a link [2008 note: this was in 2004] to http://kimmo.suominen.com/ssh/ and pointed out the caution to check file and directory permissions if connections still aren't working right after configuring everything.

    This site shows permissions to the ~/.ssh/ directory that allow writing by 'group', even though the text says only the 'owner' should have write permissions to that directory. On my account on the remote-computer, my ~/.ssh/ directory initially allowed group-write permissions, and passwordless login was not working. Changing those to...

    drwx------ 2 birkinbackup birkinbackup 4096 Jun 17 16:18 .ssh/
    

    ...allowed passwordless login to work. The beauteous text...

    [toolbox:~] birkin% 
    [toolbox:~] birkin% ssh birkinbackup@harmonicas.msie.marlboro.edu
    [birkinbackup@harmonicas birkinbackup]$ 
    [birkinbackup@harmonicas birkinbackup]$ ssh birkinbackup@play.msie.marlboro.edu
    [birkinbackup@play birkinbackup]$
    

    No password required. Sweet.
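
The permissions gotcha is easy to check for programmatically. This is a small sketch with hypothetical function names; it only looks at the group and other write bits, which is the condition that bit me above, and it demonstrates on a throwaway directory rather than a real ~/.ssh/.

```python
import os
import stat
import tempfile


def group_or_other_writable(path):
    # True if anyone besides the owner has write permission on path.
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IWGRP | stat.S_IWOTH))


def tighten(path, mode=0o700):
    # Same effect as 'chmod 700 path': owner-only access (drwx------).
    os.chmod(path, mode)


demo_dir = tempfile.mkdtemp()
os.chmod(demo_dir, 0o770)                 # simulate the group-writable state
print(group_or_other_writable(demo_dir))
tighten(demo_dir)
print(group_or_other_writable(demo_dir))
```

Running the check against ~/.ssh/ (and the home directory) on the remote computer before debugging anything else would have saved some head-scratching.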

Nice, lightweight SOA implementation

sunday, may 18th, 2008 4:17pm

I've evangelized service-oriented architecture (SOA) before.

To review, briefly and roughly: SOA promotes decoupled services. For example, a Fahrenheit-to-Celsius converter would likely be implemented as a web-service, instead of as a function/method embedded/tied into some bigger program. The benefits of this are multiple: 1) The service can be written in any programming language, and accessed by other services written in different languages. 2) SOA makes the idealized promise of code-reuse a reality.

I have a programmer friend who works for a large corporation who is familiar with implementing SOA using industrial-scale best-practices; I'm familiar with implementing it in a lightweight, seat-of-the-pants fashion.

Over the past year+ I've created well over a dozen SOA web-services for different projects. But I recently implemented one, with some best-practice effort put into it, that'll be a model for my future SOA work. Some links:

What I like about this one...

  • The api urls offer 'discovery' via embedding, in the built-in returned data, contact and documentation information. Having just one of these pieces of info would be great; having both is particularly nice because web urls and staff change over time. Why is this useful? If someone is looking at the code that calls this service 5 years from now, and if I'm not around, the documentation will provide info on some extra features of the service that otherwise wouldn't be apparent if, say, the web-service just returned the word 'English'.

  • The api urls are 'hackable', another way of enhancing discovery. One can intuitively try entering a code other than 'enk' to see what comes up (like 'tlh'). Also, reasonably appropriate things happen if one lops off increasing sections of the url (in this case, redirects to documentation pages).

  • The api urls are versioned. Key:value pairs can be added to this api -- but the existing key:value pairs must never be changed. The reason is that post-release, I don't know who's using it for what, thus I have to assume any changes could break someone's app. So if I want to change the label 'response' to 'language', and deliver it in xml, I can leave the existing one as is, and label the new one 'api_v2'.

  • All these urls utilize server-caching. This is an implementation rather than a design feature, but worth mentioning. Django offers a flexible and easy-to-use caching feature; I have it set so that the list and api urls only have to hit the database once a day, no matter how many times the urls are hit. Further, django's caching is intelligent: its response includes 'Cache-Control', 'Etag', and 'Expires' http-headers so that a browser or well-designed code doesn't even have to call the web-service again to redisplay the data. Nice. This would be particularly important and useful for something like RSS feeds.
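
The discovery and versioning points above can be sketched in a few lines. This is illustrative only and not the service's actual code: the lookup table, the key names other than 'response' and 'language', the example URL, and the contact address are all placeholders, and the sketch returns JSON for both versions for simplicity.

```python
import json

# Hypothetical lookup table; a real service would read its codes from a database.
LANGUAGES = {'eng': 'English', 'fre': 'French', 'tlh': 'Klingon'}


def build_response(version, code):
    name = LANGUAGES.get(code)
    if version == 'api_v1':
        # v1 keys are frozen: new keys may be added, existing ones never change.
        payload = {'response': name}
    elif version == 'api_v2':
        # A renamed label goes into a new version instead of altering v1.
        payload = {'language': name}
    else:
        raise ValueError('unknown api version: %s' % version)
    # Discovery info travels with every reply (placeholder values).
    payload['documentation'] = 'https://library.example.edu/language_translator/docs/'
    payload['contact'] = 'webmaster@example.edu'
    return json.dumps(payload, sort_keys=True)


print(build_response('api_v1', 'tlh'))
print(build_response('api_v2', 'tlh'))
```

Callers written against api_v1 keep getting the 'response' key indefinitely, while new callers can opt into api_v2.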

Good info...

  • A terrific, hands-on review-resource on http-headers: The web-services chapter of Mark Pilgrim's 'Dive Into Python' website & book.

  • Many of the features of this language_translator web-service were informed by the book 'RESTful Web Services', by Richardson & Ruby. Some parts are a bit dense, but it's chock-full of terrific detailed info and food for thought. I came across it after having written a half-dozen or so SOA web-services, each one a little different and better, and it directly addressed many issues I had begun to think about or saw referenced via web-research.

[Acknowledgements to Peter Murray's article and Richard Akerman's Access_2006 presentation that first inspired my SOA thinking.]

Weighted-randomization

wednesday, april 16th, 2008 5:28am

Many years ago one of my children hit a wall with Math in elementary school. He had always been bright, and quick to pick things up, so this was something of a surprise to him as well as to me. His class was in the early stages of learning multiplication, and it turned out that he had initially been able to add numbers in his head quickly enough that he hadn't needed to memorize the times-tables. This worked fine for him with lower numbers like 5x2, but began to break down pretty quickly with higher numbers like 9x7 and especially 23x76.

Flashcard app

So I ended up doing something I had sworn I'd never do: I created a flashcard program on the computer to help him memorize his basic times-tables. I had always thought that using computer programs for rote memorization was a travesty -- knowing that computers could be used for interesting and mind-expanding purposes rather than boring, mind-numbing ones. But this became a fascinating project. I started out implementing it in FileMaker Pro, which I had begun using extensively for some database projects, and eventually reimplemented the program in REALbasic, a wonderful program (at least at the time; I haven't used it in years) that introduced me to object-oriented programming, and through the sheer elegance of its interface and structure nurtured my growing sense that programming was about Art in addition to Logic.

In addition to wanting to present a nice interface, I also knew that I wanted the questions presented to my son to flow in such a way that he would be quizzed more often on the questions he answered incorrectly, and less often on the ones he answered correctly. In any given work-session, I also wanted him to be able to work with a narrow slice of the whole set of 'difficult' problems so that he would perceive some sense of gaining mastery over material.

To clarify this last point, imagine a problem set of 10,000 questions, 5,000 of which are very easy, and 5,000 of which are very hard. Imagine you sit down to a 10-minute 'memorization-session'. If the program utilizes a very simple algorithm which chooses randomly from among the 5,000 difficult questions, chances are that after the 10 minutes you will have learned nothing and will be quite discouraged: you'll likely not have seen the same question twice, and you'll have answered everything wrong. This would be bad enough if you were an adult committed to learning the material, but if you were a kid in elementary school, this would be likely to crush any flickering desire to learn the material.

And I wanted my program to be fun.

Details

My solution was to use what I call 'weighted-randomization'. Sometime I'll do a thorough search for the files I used those many years ago to see if I can find the code I created -- for posterity, amusement, and fond reflection. The basic idea consisted of two steps. First, from among all the... I guess 169 possible times-table queries (0x0 through 12x12), I chose a small subset: 10 as I recall, although eventually I made this number selectable via a preferences-pane (remember, I made this for a young kid, and assumed two 5-minute practice-sessions a day -- the latter one being optional but often completed because the app was fun). Second, I presented the queries. But of course the joy was in the details.

For the selection-step, my recollection is that I iterated through each possibility, and assigned it a 'selection-number', then sorted the list on selection-number and took the top 10. Before I describe the selection-number, I must first note that each possible query was initially assigned a 'score' of 0. When a query was answered correctly, the score was incremented; when answered incorrectly, the score was decremented. I eventually put the increment and decrement values into the app's preferences-pane to play around with their effect on the operation of the application, and as I recall usually had them set to increment correct answers by 1, and decrement incorrect answers by 2.
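
The score bookkeeping described above might look roughly like this in python. This is a sketch from memory of the idea, not the original REALbasic code; the function and constant names are made up for illustration, and the 1 and 2 are the preference values I recall usually using.

```python
# Hypothetical names throughout; the original app's code is not shown here.
CORRECT_INCREMENT = 1    # added to a query's score on a correct answer
INCORRECT_DECREMENT = 2  # subtracted from a query's score on an incorrect answer


def initial_scores(max_factor=12):
    """Give every query from 0x0 through max_factor x max_factor a score of 0."""
    return {(a, b): 0
            for a in range(max_factor + 1)
            for b in range(max_factor + 1)}


def record_answer(scores, query, answer):
    """Adjust the score for one answered query and return the new score."""
    a, b = query
    if answer == a * b:
        scores[query] += CORRECT_INCREMENT
    else:
        scores[query] -= INCORRECT_DECREMENT
    return scores[query]
```

With these values a single wrong answer outweighs a single right one, so a shaky query sinks toward the bottom of the list faster than it climbs back up.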

So why the selection-number rigamarole? Why not just sort on score, and take the 10 lowest numbers? That certainly would achieve the goal of presenting the user with the most-problematic problem-set. I did do that initially, but added the selection-number for two reasons. First, the simple presentation of only the hardest problems felt, well... boring. Second, I thought if I occasionally threw in some problems 'mostly' known, I would encourage crucial reinforcement to take place. And if one or two of the problem-set were 'easy', well, those queries' scores would rapidly increase and be less likely to be selected as part of the subset next time.

Thus the selection-number. I don't remember the exact details, which I changed and experimented with extensively, but the basic idea: I took the lowest score and the highest score, and split the range between them into, say, 10 equal segments. I then iterated through the queryset. For each score, I determined the segment it belonged to, then created a certain number of random-numbers (more for the lower-scoring segments), took the highest one, and assigned that as the selection-number.

A more concrete example... Say the lowest score was -25 and the highest score was 5. That's a difference of 30. Splitting that into 10 segments yields segments roughly 3 values wide. Thus the bottom segment would be -25 through -23, while the top segment would be 3 through 5. During the iteration step, if I came across -23 (in segment '10'), I would create '10' random numbers, and the highest one would become that query's selection-number. If I came across '3' (in segment '1'), I would create '1' random number; it would become that query's selection-number. Sorting then on selection-number and choosing the ten highest numbers would usually offer a subset of all queries consisting of mostly the hardest queries, mixed with some somewhat-hard queries, with an occasional easy query thrown in. Beautiful.

Again, I don't remember the specific details, and do remember that I played frequently with the number of segments and the number of random-numbers assigned for each segment, but this description reflects the general approach to the slice-selection step.
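
Here is a rough python sketch of that slice-selection step, reconstructed from the description above rather than from the original code. The function names, the use of integer scores, and the choice to clamp the very lowest score into the last segment are all assumptions made for illustration.

```python
import random


def segment_index(score, lowest, highest, segments=10):
    """Return 1 for the top (easiest) segment, up to `segments` for the bottom.

    Assumes integer scores. The lowest score is clamped into the last segment.
    """
    if highest == lowest:
        return 1
    from_top = (highest - score) * segments // (highest - lowest)
    return min(from_top, segments - 1) + 1


def selection_number(score, lowest, highest, segments=10, rng=random):
    """Draw one random number per segment step and keep the highest draw."""
    draws = segment_index(score, lowest, highest, segments)
    return max(rng.random() for _ in range(draws))


def choose_subset(scores, size=10, segments=10, rng=random):
    """Pick `size` queries, favoring low scores but leaving room for chance."""
    lowest = min(scores.values())
    highest = max(scores.values())
    ranked = sorted(
        scores,
        key=lambda q: selection_number(scores[q], lowest, highest, segments, rng),
        reverse=True,
    )
    return ranked[:size]
```

With a lowest score of -25 and a highest of 5, a query scored -23 gets ten draws and a query scored 3 gets one, matching the example above. Taking the highest of several draws nudges hard queries toward the top of the sort without guaranteeing them a place, which is the whole point.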

The program was a hit, and did exactly what I had hoped: it helped my child memorize his times-tables. Shortly after I built it, I was immersed in learning Italian, and realized that with just a few tweaks this would be a terrific tool to help me memorize vocabulary. That also worked well, especially the ability to focus a review-session on a subset of all the words, because I built up a vocabulary set of hundreds of words. But abstracted from the tool, I kept ruminating on what made 'weighted-randomization' so compelling.

The idea

The concept binds together polar notions in a way that feels 'right'. Variety, as we've all heard, is the spice of life; morphing random-chance into likelihood offers possibilities for combining intentionality with variability to create joyful experience, as any gamer knows. The game Dungeons & Dragons wonderfully codified realms of probability in spirals of detail useful to both the casual gamer and the addict: a strong character will be more 'likely' to defeat a weaker character, but more granularly, a particular delivery of a particular blow will be more or less likely to be successful based on the quality of the attacking weapon and the defending shield, the skill of the attacker and defender, etc. The varied dice used to calculate these percentages seem themselves talismans. But on a simpler level, any player of Risk or Backgammon perceives these same issues. The roll of the dice is pure chance, but the strength of one's position makes the outcome of a battle or game dependent on the same order-and-chance qualities embodied in weighted-randomization.

Making the world a better place

I find it interesting to think about applying randomization to other systems.

LifeBalance is a wonderful little to-do list program I miss using since I gave up my Palm for an iPhone. I don't have direct knowledge of its internals, but I suspect it incorporates weighted-randomization in its preparation of one's task-list. One part of the program offers the ability to define broad areas of life (e.g. work, maintaining friendships, family-life, etc.) and assign a rough percentage of time one would ideally like to devote to each area. Since to-do tasks end up being subsets of these areas, and since each task can be assigned a rough amount of time it will take to accomplish, the program tracks how much time is actually being expended in the broad areas, and optionally adjusts the current to-do list to maintain the preferred balance. The result is that there may be occasions in which a task of a slightly lower priority can appear higher on a to-do list than a task with a higher priority, if the lower-priority task will better-serve the less-immediate goal of achieving the life-area balance specified. This approach could conceivably be implemented using algorithms that don't utilize weighted-randomization, but this is just the kind of situation that weighted-randomization would be good for -- injecting a bit of flexibility into a system while still maintaining overall goals.

As a final thought experiment, imagine possibilities for injecting weighted-randomization into political systems.

I have a vague memory from a college class, likely jumbled by time, of hearing that in 15th-century Florence, a cohort of possible city-council leaders was chosen from among the populace much like those chosen for jury-duty today. As I recall, each individual in the cohort was then voted on by the populace, but only on whether the individual was fit to serve: a yes or no vote. If a person received enough yes-votes, his name was put into a hat from which the new leaders were drawn at random. This unusual form of weighted-randomization suggests that when injected into political systems, it might reduce corrupting influences and by inference improve quality. Usually, and reasonably, years-of-service or merit-evaluations or test-scores are the institutional barriers to corruption. Weighted-randomization could be another tool worth exploring.

Imagine that of all bills a legislative committee debates publicly, 80% are selected for debate according to usual political processes, and 20% via a weighted-randomization scheme akin to my math flash-card program (i.e., perhaps a manual ranking of preference followed by randomizations based on those rankings). Obviously no one would want chance to factor into whether a bill actually becomes law, but since so many good ideas die for want of being scheduled for committee debate (often for questionable political reasons), injecting a bit of weighted-randomization into such a process could be a good thing, certainly worth trying.

I may add to this post other realms in which weighted-randomization might offer systemic improvements. Feel free to suggest ideas; I may post a few of them here.

Appreciating django templates

sunday, april 6th, 2008 10:22pm

I'm loving Django templates.

In setting up a site for my bookgroup, I used a model for 'Meeting' that has a 'meeting_date' defined as a 'DateTimeField'. For those who haven't yet tasted the Django kool-aid, that's a field that requires both a valid date and a valid time. Made sense to me, since one of the main reasons for the site is to be able to list the location and date & time of upcoming meetings. However, when entering a few old meetings, the spreadsheet I was working from only listed the month and year (November 1997! -- we've been around for a while!).

There were a couple of ways I could have handled this. What I chose to do was to add two boolean fields: 'fake_date' (meaning 'day') and 'fake_time'. It might have been cleaner to allow us admins to have a year, a month, a day, and a time field -- but I knew going forward the single DateTimeField would work for us and I wanted to build more for the future than the past. So, when entering an old meeting with just a month and year, any day in that month and any time of day are entered, and both fake_date and fake_time are checked.

The complete DateTime object is passed to the template, and then some logic goes to work:

{% if meeting.fake_date %}
    <h2>{{ meeting.date_time|date:"F, Y"|lower }}</h2>
{% else %}
    {% if meeting.fake_time %}
        <h2>{{ meeting.date_time|date:"l, F jS, Y"|lower }}</h2>
    {% endif %}
{% endif %}

{% if not meeting.fake_date and not meeting.fake_time %}
    <h2>{{ meeting.date_time|date:"l, F jS, Y g:iA"|lower }}</h2>
{% endif %}

You can see the results of the non-fake dates here, and the result of the fake dates here (the older meetings) -- the same kind of date-object reaches the template; the logic above handles the presentation.

I could have more efficiently handled the 'happy-path' real-date case via nesting, but I find this a bit more readable.

If the first test matches, the DateTime object info will only show, as an example, 'march, 1998'; if the second test matches, the same DateTime object will show the day as well, something like 'friday, march 6th, 1998', but not the time.
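
For anyone who wants to poke at the same branching outside of a Django project, here's a rough plain-python mirror of the template logic. It is only a sketch: 'display_date' and 'ordinal' are names invented for this illustration, the format is rebuilt by hand rather than with Django's date filter, and it assumes an English locale for the day and month names.

```python
from datetime import datetime


def ordinal(day):
    """Return the day with an English suffix, e.g. 6 -> '6th', 21 -> '21st'."""
    if 11 <= day % 100 <= 13:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(day % 10, "th")
    return f"{day}{suffix}"


def display_date(when, fake_date=False, fake_time=False):
    """Mirror the template: hide whatever part of the datetime is fake."""
    if fake_date:
        # Only the month and year are trustworthy.
        return f"{when:%B}, {when.year}".lower()
    day_part = f"{when:%A}, {when:%B} {ordinal(when.day)}, {when.year}"
    if fake_time:
        # The day is real, the time of day is not.
        return day_part.lower()
    hour = when.hour % 12 or 12
    meridiem = "AM" if when.hour < 12 else "PM"
    return f"{day_part} {hour}:{when.minute:02d}{meridiem}".lower()
```

As in the template, a fake day wins over everything else, so the fake_time flag only matters when the day itself is real.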

This is nice. My introduction to templates was via JSPs, using expression-language to pass in values from beans. Since pure java code can be embedded in JSPs, I had trained myself to rigidly keep logic out of templates, and in the above situation would have written that logic within a Java class. When I began working more with php, I looked around for a template system. I had heard good things about 'smarty', but it seemed too heavyweight. That, combined with my fierce aversion to template logic, scared me off. I then attended a wonderful presentation on HTML_Template_ITX, was sold whole-hog, and still use that for my php end-user web work.

What I initially loved about Django's templates is that I didn't have to use any of the logical conditions I show above; it can be used very well very simply. As I've grown more comfortable with Django and python, my philosophical aversion to template-logic has gradually evaporated -- as long as it's presentation logic. The situation above is a perfect example: It's very reasonable for the business-logic end of things to pass to the presentation-layer a date. How that date is then formatted (upper or lowercase, whether or not the day or other elements of the date are shown, etc.) is a very reasonable thing for the template to handle. And that the template can also handle the presentation based on certain conditions of the Meeting instance is very, very, nice.

Discovery Tools and Standards Trends

tuesday, april 1st, 2008 5:57am

[Got a nice little blog-recognition email a couple weeks ago from a reader asking if I would write up a report on this NISO conference for possible inclusion in a newsletter. Here's the web-version.]

Thirteen presentations were given at the NISO Forum 'Next Generation Discovery: New Tools, Aging Standards', held on March 27 and 28, 2008, in Chapel Hill, North Carolina. They covered three main areas: current user-expectations, discovery tools attempting to meet those expectations, and architectures to facilitate the development and adoption of discovery tools.

Speakers reinforced that users want searching to be easy and fast. Dinah Sanders of III noted how the process of finding information has become increasingly iterative, with users expecting to refine their queries. Vinod Chachra of VTLS explained that this is why search results must be returned quickly: it allows humans to be seamlessly involved in the discernment process, scanning and instantaneously determining relevancy.

Robert Sandusky of the University of Illinois at Chicago, Cameilia Csora of 2collab.com, Karen Hawkins of scitopia.org, Dinah, and Peter Murray of OhioLINK all showed tools that incorporate at least some now-common web elements to meet these user-expectations, including faceted results, tagging, tag-clouds, and feeds.

Many new discovery tools offer truly laudable interface-improvements over previous displays of information, but suffer from a significant architectural limitation: If a tool assumes the only end-user experience is the tool's default web-page, opportunities for discovery are drastically limited. For example, if we at Brown were to license the III Encore tool, and a user were to land on one of our 'Napoleonic Satires' collection pages, it would be wonderful to be able to query an Encore API on 'Napoleon' to display in a sidebar the terrific discovery data this tool can access. But this and many other discovery tools offer no such API. Fortunately this is changing, with some vendors updating their business models to meet current library discovery needs both deeply and broadly. Ex Libris' XServer licensing is a prime example, offering API functionality to its federated-searching tool, dramatically broadening discovery possibilities beyond the default MetaLib web interface.

The presentations that touched on system-design and architectural issues to improve discovery were particularly inspiring.

Richard Akerman of NRC CISTI noted that we should utilize our power as information-producers to produce data that more easily lends itself to machine-harvesting. This can be done by encoding data where possible utilizing existing standards such as OpenURL and COinS, as well as emerging de-facto standards such as microformats. He showed as one example information displayed in a useful time-line format by a site that had queried data from a harvester site -- possible because of standardized structured date-fields.
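
As a small illustration of what 'encoding data utilizing existing standards' can mean in practice, here is a python sketch that builds a COinS span for a book. This is my own example, not something shown at the Forum: the function name and the particular fields chosen are assumptions, though the 'Z3988' class and the key/value layout follow the COinS convention of embedding an OpenURL ContextObject in a span's title attribute.

```python
from html import escape
from urllib.parse import urlencode


def coins_span(title, author, date, isbn=None):
    """Return an empty HTML span carrying book metadata as a COinS title."""
    fields = [
        ("ctx_ver", "Z39.88-2004"),                    # OpenURL 1.0 version tag
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:book"),  # 'book' metadata format
        ("rft.btitle", title),
        ("rft.au", author),
        ("rft.date", date),
    ]
    if isbn:
        fields.append(("rft.isbn", isbn))
    # URL-encode the key/value pairs, then HTML-escape the ampersands so the
    # string is safe inside an attribute value.
    context_object = urlencode(fields)
    return f'<span class="Z3988" title="{escape(context_object)}"></span>'
```

A harvester that knows to look for spans with class 'Z3988' can pull the title, author, and date back out of any page that includes one, with no knowledge of how the rest of the page is laid out.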

Mike Teets of OCLC gave a presentation on OCLC's emerging WorldCat Grid services, offering possibilities for cross-referencing standard identifiers such as ISBNs and OCLC numbers, and emerging identifiers such as 'identities' -- truly a developer's dream.

Vinod offered numerous useful suggestions for designing systems to minimize user confusion and maximize utility. He showed an example of a site that had both a facet category of LC Subject Headings, and another facet category of Dewey Subject Headings, and the ease with which a user could be confused by this explicit display of overlapping information, hindering rather than helping discovery.

Michael Winkler of the University of Pennsylvania discussed how PennTags is emblematic of an architectural approach the UPenn Library has found successful. His talk brought together multiple threads of this NISO Forum by showing how PennTags offers discovery possibilities in multiple different settings because it is designed as what he called a 'horizontal' service, as opposed to a 'vertical' service. The 'vertical' service paradigm in Winkler's view is exactly the one I described earlier as architecturally limited: much useful information is gathered together and funneled down into one website with no possibilities for alternate exposure of the gathered and massaged data. PennTags, he noted, is an example of the 'horizontal' service paradigm that he sees as the future of UPenn Library discovery software and good discovery software in general. It is not tied to any particular existing service: not the catalog, not electronic resources, not their course-software -- and yet can be used with any of these services -- and in any given context can expose interesting data from another context. Each PennTag entry is exposed as an RSS feed which illustrates the power of simple standards to enhance discovery. In fact, this shift to a horizontal paradigm is so central to the UPenn Library's current and future work, that Winkler noted he toyed with re-titling his presentation to something like 'Not PennTags, but Why'.

John Mark Ockerbloom, also of the University of Pennsylvania, followed with an update on the DLF's 'ILS Discovery Interface Task Force', of which he is the Chair. The task force is set to soon release a standard for ILS discovery services -- in other words, it will set a lightweight API standard for the OPAC layer of the ILS. In the context of this forum, this standard should help foster the shift of the OPAC from a vertical silo-service to a more flexible horizontal one, increasing opportunities for discovery of the underlying OPAC data.

I hope to see the next Forum in this 'Discovery' series showcase more tools that utilize Winkler's horizontal-service architecture concept that increases discovery possibilities. Kudos to NISO for providing another thought-provoking Forum packed with inspirational examples and ideas and conversation.

Links to presentations should be available soon from the NISO event website.

Finally installed wireshark

friday, march 21st, 2008 8:23pm

[This was like being back in school, where we did a bunch of 'compiling' and 'make'ing, and often took steps backwards, sometimes for a while, in order to go forward.]

[Update: tried the macport install again after a required library was back online; all proceeded smoothly.]

Wanted to run wireshark (formerly ethereal) to check on something. Looks like Leopard install has moved it and other stuff around. Looking into getting it back up...

I've used Fink in the past to install linux packages that don't have mac binary versions, but have heard good things about MacPorts and am trying this (I did check and they do have a wireshark port.)

  • Downloaded the universal installer for Leopard.

  • Ran install. The documentation indicated that macports would automatically edit my .profile file, but it didn't, so I added two paths to my .profile file:

    export PATH=$PATH:/opt/local/bin
    export PATH=$PATH:/opt/local/var/macports/
    
  • Looked good:

    birkinbox:~ birkin$ 
    birkinbox:~ birkin$ which port
    /opt/local/bin/port
    birkinbox:~ birkin$ 
    birkinbox:~ birkin$ sudo port selfupdate
    
    MacPorts base version 1.600 installed
    
    Downloaded MacPorts base version 1.600
    
    The MacPorts installation is not outdated and so was not updated
    selfupdate done!
    birkinbox:~ birkin$
    
  • wireshark found:

    birkinbox:~ birkin$ 
    birkinbox:~ birkin$ port search 'wireshark'
    wireshark                      net/wireshark  0.99.8       Graphical network analyzer and capture tool
    birkinbox:~ birkin$
    
  • but install unsuccessful:

    birkinbox:~ birkin$ 
    birkinbox:~ birkin$ sudo port install 'wireshark'
    Password:
    --->  Fetching expat
    --->  Attempting to fetch expat-2.0.1.tar.gz from http://downloads.sourceforge.net/expat
    --->  Verifying checksum(s) for expat
    --->  Extracting expat
    --->  Configuring expat
    Error: Target org.macports.configure returned: configure failure: shell command " cd "/opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_release_ports_textproc_expat/work/expat-2.0.1" && ./configure --prefix=/opt/local --mandir=/opt/local/share/man " returned error 77
    Command output: checking build system type... i386-apple-darwin9.2.0
    checking host system type... i386-apple-darwin9.2.0
    checking for gcc... /usr/bin/gcc-4.0
    checking for C compiler default output file name... configure: error: C compiler cannot create executables
    See `config.log' for more details.
    
    Error: The following dependencies failed to build: glib2 gettext expat libiconv gperf ncurses ncursesw pkgconfig gtk2 atk cairo fontconfig freetype zlib libpng render xrender gtk-doc docbook-xml-4.1.2 xmlcatmgr docbook-xsl libxml2 perl5.8 scrollkeeper docbook-xml docbook-xml-4.2 docbook-xml-4.3 docbook-xml-4.4 docbook-xml-4.5 libxslt p5-xml-parser jpeg pango Xft2 xorg-xproto xorg-util-macros shared-mime-info tiff libpcap openssl
    Error: Status 1 encountered during processing.
    birkinbox:~ birkin$
    
  • Another step back to try and figure this out.

  • 'Amer from Boston' here, has some steps he said worked for him; I'll try them.

    • sudo port -v install gtk2 +x11

      birkinbox:~ birkin$ 
      birkinbox:~ birkin$ sudo port -v install gtk2 +x11
      Password:
      --->  Configuring expat
      checking build system type... i386-apple-darwin9.2.0
      checking host system type... i386-apple-darwin9.2.0
      checking for gcc... /usr/bin/gcc-4.0
      checking for C compiler default output file name... configure: error: C compiler cannot create executables
      See `config.log' for more details.
      Error: Target org.macports.configure returned: configure failure: shell command " cd "/opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_release_ports_textproc_expat/work/expat-2.0.1" && ./configure --prefix=/opt/local --mandir=/opt/local/share/man " returned error 77
      Command output: checking build system type... i386-apple-darwin9.2.0
      checking host system type... i386-apple-darwin9.2.0
      checking for gcc... /usr/bin/gcc-4.0
      checking for C compiler default output file name... configure: error: C compiler cannot create executables
      See `config.log' for more details.
      
      Warning: the following items did not execute (for expat): org.macports.activate org.macports.configure org.macports.build org.macports.destroot org.macports.install
      Error: The following dependencies failed to build: atk gettext expat libiconv gperf ncurses ncursesw glib2 pkgconfig cairo fontconfig freetype zlib libpng render xrender gtk-doc docbook-xml-4.1.2 xmlcatmgr docbook-xsl libxml2 perl5.8 scrollkeeper docbook-xml docbook-xml-4.2 docbook-xml-4.3 docbook-xml-4.4 docbook-xml-4.5 libxslt p5-xml-parser jpeg pango Xft2 xorg-xproto xorg-util-macros shared-mime-info tiff
      Error: Status 1 encountered during processing.
      birkinbox:~ birkin$
      
  • Consulting installation documentation from main documentation page. Reading through the docs I see there's a lot I didn't do that I need to, but pretty basic stuff. This is good; it's always nice to have ideas about why something's not working.

  • xcode 3.0 is required; I thought I had installed that when I installed Leopard, but there's no /Developer directory in the new installation (the only one is in the 'Previous Systems' directory) so I must not have. Downloading the (big) installer.

  • The modified date on the Utilities/X11.app is 2008-02-29, which is when I installed Leopard. So I suspect that the selection I remember on the install was for the X11 install, not the whole xcode package.

  • Hmmn... the macports xcode-installation docs say "Click Customize, expand the Applications category and click the checkbox beside X11 SDK to add it to the default items." ...but the xcode installer didn't have an Applications category; proceeding with install.

  • Looks like it went ok.

  • Looks like a good source of Leopard X11 info.

  • Based on install docs, revised ~/.profile to:

    # for macport install (2008-03-21)
    export PATH=/opt/local/bin:/opt/local/sbin:$PATH
    export DISPLAY=:0.0
    
  • The 'Verify the shell environment' section instructs to run 'env' to confirm that the initial paths are the MacPort paths, like:

    PATH=/opt/local/bin:/opt/local/sbin:/bin:/sbin:/usr/bin:/usr/sbin
    
  • My path has the old '/sw/' paths previously installed via Fink:

    birkinbox:~ birkin$ 
    birkinbox:~ birkin$ env
    ...
    PATH=/sw/bin:/sw/sbin::/usr/lib/php/:/opt/local/bin:/opt/local/sbin:/usr/bin:/bin:/usr/sbin:...
    ...
    birkinbox:~ birkin$
    
  • Ok, disabled the fink stuff; the php lib still comes before the macport directories, but I'm thinking that's ok. Onward...

  • Trying the wireshark install again... OMG, I'm over an hour into the install command and stuff's still being downloaded and installed. Is this a natural series of dependencies? Or is an entire 'base' environment being installed? I now seem to remember something similar with Fink. It's not at all necessary to list all this output, but I'm going to, for the strange geeky factor, and because over the years I've occasionally been surprised at the benefits of having this bizarre documentation.

    birkinbox:~ birkin$ 
    birkinbox:~ birkin$ sudo port install 'wireshark'
    Password:
    --->  Configuring expat
    --->  Building expat with target all
    --->  Staging expat into destroot
    --->  Installing expat 2.0.1_0
    --->  Activating expat 2.0.1_0
    --->  Cleaning expat
    --->  Fetching libiconv
    --->  Attempting to fetch libiconv-1.12.tar.gz from http://ftp.gnu.org/gnu/libiconv
    --->  Verifying checksum(s) for libiconv
    --->  Extracting libiconv
    --->  Applying patches to libiconv
    --->  Configuring libiconv
    --->  Building libiconv with target all
    --->  Staging libiconv into destroot
    --->  Installing libiconv 1.12_0
    --->  Activating libiconv 1.12_0
    --->  Cleaning libiconv
    --->  Fetching ncursesw
    --->  Attempting to fetch ncurses-5.6.tar.gz from http://ftp.gnu.org/gnu/ncurses
    --->  Verifying checksum(s) for ncursesw
    --->  Extracting ncursesw
    --->  Applying patches to ncursesw
    --->  Configuring ncursesw
    --->  Building ncursesw with target all
    --->  Staging ncursesw into destroot
    --->  Installing ncursesw 5.6_1
    --->  Activating ncursesw 5.6_1
    --->  Cleaning ncursesw
    --->  Fetching ncurses
    --->  Verifying checksum(s) for ncurses
    --->  Extracting ncurses
    --->  Applying patches to ncurses
    --->  Configuring ncurses
    --->  Building ncurses with target all
    --->  Staging ncurses into destroot
    --->  Installing ncurses 5.6_0
    --->  Activating ncurses 5.6_0
    --->  Cleaning ncurses
    --->  Fetching gettext
    --->  Attempting to fetch gettext-0.17.tar.gz from http://ftp.gnu.org/gnu/gettext
    --->  Verifying checksum(s) for gettext
    --->  Extracting gettext
    --->  Applying patches to gettext
    --->  Configuring gettext
    --->  Building gettext with target all
    --->  Staging gettext into destroot
    --->  Installing gettext 0.17_3
    --->  Activating gettext 0.17_3
    --->  Cleaning gettext
    --->  Fetching pkgconfig
    --->  Attempting to fetch pkg-config-0.23.tar.gz from http://pkg-config.freedesktop.org/releases/
    --->  Verifying checksum(s) for pkgconfig
    --->  Extracting pkgconfig
    --->  Configuring pkgconfig
    --->  Building pkgconfig with target all
    --->  Staging pkgconfig into destroot
    --->  Installing pkgconfig 0.23_0
    --->  Activating pkgconfig 0.23_0
    --->  Cleaning pkgconfig
    --->  Fetching glib2
    --->  Attempting to fetch glib-2.16.1.tar.bz2 from ftp://ftp.gtk.org/pub/glib/2.16/
    --->  Attempting to fetch glib-2.16.1.tar.bz2 from http://mandril.creatis.insa-lyon.fr/linux/gnome.org/sources/glib/2.16/
    --->  Verifying checksum(s) for glib2
    --->  Extracting glib2
    --->  Applying patches to glib2
    --->  Configuring glib2
    --->  Building glib2 with target all
    --->  Staging glib2 into destroot
    --->  Installing glib2 2.16.1_0+darwin_9
    --->  Activating glib2 2.16.1_0+darwin_9
    --->  Cleaning glib2
    --->  Fetching atk
    --->  Attempting to fetch atk-1.22.0.tar.bz2 from http://mandril.creatis.insa-lyon.fr/linux/gnome.org/sources/atk/1.22/
    --->  Verifying checksum(s) for atk
    --->  Extracting atk
    --->  Configuring atk
    --->  Building atk with target all
    --->  Staging atk into destroot
    --->  Installing atk 1.22.0_1
    --->  Activating atk 1.22.0_1
    --->  Cleaning atk
    --->  Fetching zlib
    --->  Attempting to fetch zlib-1.2.3.tar.bz2 from http://www.zlib.net/
    --->  Verifying checksum(s) for zlib
    --->  Extracting zlib
    --->  Applying patches to zlib
    --->  Configuring zlib
    --->  Building zlib with target all
    --->  Staging zlib into destroot
    --->  Installing zlib 1.2.3_1
    --->  Activating zlib 1.2.3_1
    --->  Cleaning zlib
    --->  Fetching freetype
    --->  Attempting to fetch freetype-2.3.5.tar.bz2 from http://download.savannah.gnu.org/releases/freetype/
    --->  Verifying checksum(s) for freetype
    --->  Extracting freetype
    --->  Applying patches to freetype
    --->  Configuring freetype
    --->  Building freetype with target all
    --->  Staging freetype into destroot
    --->  Installing freetype 2.3.5_1
    --->  Activating freetype 2.3.5_1
    --->  Cleaning freetype
    --->  Fetching fontconfig
    --->  Attempting to fetch fontconfig-2.5.0.tar.gz from http://fontconfig.org/release/
    --->  Verifying checksum(s) for fontconfig
    --->  Extracting fontconfig
    --->  Configuring fontconfig
    --->  Building fontconfig with target all
    --->  Staging fontconfig into destroot
    --->  Installing fontconfig 2.5.0_0+macosx
    --->  Activating fontconfig 2.5.0_0+macosx
    --->  Cleaning fontconfig
    --->  Fetching libpng
    --->  Attempting to fetch libpng-1.2.25.tar.bz2 from http://downloads.sourceforge.net/libpng
    --->  Verifying checksum(s) for libpng
    --->  Extracting libpng
    --->  Configuring libpng
    --->  Building libpng with target all
    --->  Staging libpng into destroot
    --->  Installing libpng 1.2.25_0
    --->  Activating libpng 1.2.25_0
    --->  Cleaning libpng
    --->  Fetching render
    --->  Attempting to fetch renderext-0.9.tar.bz2 from http://xlibs.freedesktop.org/release/
    --->  Verifying checksum(s) for render
    --->  Extracting render
    --->  Configuring render
    --->  Building render with target all
    --->  Staging render into destroot
    --->  Installing render 0.9_1
    --->  Activating render 0.9_1
    --->  Cleaning render
    --->  Fetching xrender
    --->  Attempting to fetch libXrender-0.9.0.tar.bz2 from http://xlibs.freedesktop.org/release/
    --->  Verifying checksum(s) for xrender
    --->  Extracting xrender
    --->  Configuring xrender
    --->  Building xrender with target all
    --->  Staging xrender into destroot
    --->  Installing xrender 0.9.0_2
    --->  Activating xrender 0.9.0_2
    --->  Cleaning xrender
    --->  Fetching cairo
    --->  Attempting to fetch cairo-1.4.14.tar.gz from http://cairographics.org/releases/
    --->  Verifying checksum(s) for cairo
    --->  Extracting cairo
    --->  Configuring cairo
    --->  Building cairo with target all
    --->  Staging cairo into destroot
    --->  Installing cairo 1.4.14_0
    --->  Activating cairo 1.4.14_0
    --->  Cleaning cairo
    --->  Fetching xmlcatmgr
    --->  Attempting to fetch xmlcatmgr-2.2.tar.gz from ftp://ftp.FreeBSD.org/pub/FreeBSD/ports/distfiles/
    --->  Verifying checksum(s) for xmlcatmgr
    --->  Extracting xmlcatmgr
    --->  Configuring xmlcatmgr
    --->  Building xmlcatmgr with target all
    --->  Staging xmlcatmgr into destroot
    --->  Installing xmlcatmgr 2.2_1
    --->  Activating xmlcatmgr 2.2_1
    --->  Cleaning xmlcatmgr
    --->  Fetching docbook-xml-4.1.2
    --->  Attempting to fetch docbkx412.zip from http://www.oasis-open.org/docbook/xml/4.1.2/
    --->  Verifying checksum(s) for docbook-xml-4.1.2
    --->  Extracting docbook-xml-4.1.2
    --->  Configuring docbook-xml-4.1.2
    --->  Building docbook-xml-4.1.2 with target all
    --->  Staging docbook-xml-4.1.2 into destroot
    --->  Installing docbook-xml-4.1.2 4.1.2_1
    --->  Activating docbook-xml-4.1.2 4.1.2_1
    ######################################################################
    # As MacPorts does not currently have a post-deactivate hook, 
    # you will need to ensure that you manually remove the catalog 
    # entry for this port when you uninstall it.  To do so, run 
    # "xmlcatmgr remove nextCatalog /opt/local/share/xml/docbook/4.1.2/catalog.xml".
    ######################################################################
    --->  Cleaning docbook-xml-4.1.2
    --->  Fetching docbook-xsl
    --->  Attempting to fetch docbook-xsl-1.72.0.tar.bz2 from http://downloads.sourceforge.net/docbook
    --->  Verifying checksum(s) for docbook-xsl
    --->  Extracting docbook-xsl
    --->  Configuring docbook-xsl
    --->  Building docbook-xsl with target all
    --->  Staging docbook-xsl into destroot
    --->  Installing docbook-xsl 1.72.0_0
    --->  Activating docbook-xsl 1.72.0_0
    ######################################################################
    # As MacPorts does not currently have a post-deactivate hook, 
    # you will need to ensure that you manually remove the catalog 
    # entry for this port when you uninstall it.  To do so, run 
    # "xmlcatmgr remove nextCatalog /opt/local/share/xsl/docbook-xsl/catalog.xml".
    ######################################################################
    --->  Cleaning docbook-xsl
    --->  Fetching libxml2
    --->  Attempting to fetch libxml2-2.6.31.tar.gz from http://xmlsoft.org/sources/
    --->  Verifying checksum(s) for libxml2
    --->  Extracting libxml2
    --->  Configuring libxml2
    --->  Building libxml2 with target all
    --->  Staging libxml2 into destroot
    --->  Installing libxml2 2.6.31_0
    --->  Activating libxml2 2.6.31_0
    --->  Cleaning libxml2
    --->  Fetching perl5.8
    --->  Attempting to fetch perl-5.8.8.tar.bz2 from http://www.cpan.org/src/5.0/
    --->  Verifying checksum(s) for perl5.8
    --->  Extracting perl5.8
    --->  Applying patches to perl5.8
    --->  Configuring perl5.8
    --->  Building perl5.8 with target all
    --->  Staging perl5.8 into destroot
    --->  Installing perl5.8 5.8.8_2
    --->  Activating perl5.8 5.8.8_2
    --->  Cleaning perl5.8
    --->  Fetching docbook-xml-4.2
    --->  Attempting to fetch docbook-xml-4.2.zip from http://www.oasis-open.org/docbook/xml/4.2/
    --->  Verifying checksum(s) for docbook-xml-4.2
    --->  Extracting docbook-xml-4.2
    --->  Configuring docbook-xml-4.2
    --->  Building docbook-xml-4.2 with target all
    --->  Staging docbook-xml-4.2 into destroot
    --->  Installing docbook-xml-4.2 4.2_0
    --->  Activating docbook-xml-4.2 4.2_0
    ######################################################################
    # As MacPorts does not currently have a post-deactivate hook, 
    # you will need to ensure that you manually remove the catalog 
    # entry for this port when you uninstall it.  To do so, run 
    # "xmlcatmgr remove nextCatalog /opt/local/share/xml/docbook/4.2/catalog.xml".
    ######################################################################
    --->  Cleaning docbook-xml-4.2
    --->  Fetching docbook-xml-4.3
    --->  Attempting to fetch docbook-xml-4.3.zip from http://www.oasis-open.org/docbook/xml/4.3/
    --->  Verifying checksum(s) for docbook-xml-4.3
    --->  Extracting docbook-xml-4.3
    --->  Configuring docbook-xml-4.3
    --->  Building docbook-xml-4.3 with target all
    --->  Staging docbook-xml-4.3 into destroot
    --->  Installing docbook-xml-4.3 4.3_0
    --->  Activating docbook-xml-4.3 4.3_0
    ######################################################################
    # As MacPorts does not currently have a post-deactivate hook, 
    # you will need to ensure that you manually remove the catalog 
    # entry for this port when you uninstall it.  To do so, run 
    # "xmlcatmgr remove nextCatalog /opt/local/share/xml/docbook/4.3/catalog.xml".
    ######################################################################
    --->  Cleaning docbook-xml-4.3
    --->  Fetching docbook-xml-4.4
    --->  Attempting to fetch docbook-xml-4.4.zip from http://www.oasis-open.org/docbook/xml/4.4/
    --->  Verifying checksum(s) for docbook-xml-4.4
    --->  Extracting docbook-xml-4.4
    --->  Configuring docbook-xml-4.4
    --->  Building docbook-xml-4.4 with target all
    --->  Staging docbook-xml-4.4 into destroot
    --->  Installing docbook-xml-4.4 4.4_0
    --->  Activating docbook-xml-4.4 4.4_0
    ######################################################################
    # As MacPorts does not currently have a post-deactivate hook, 
    # you will need to ensure that you manually remove the catalog 
    # entry for this port when you uninstall it.  To do so, run 
    # "xmlcatmgr remove nextCatalog /opt/local/share/xml/docbook/4.4/catalog.xml".
    ######################################################################
    --->  Cleaning docbook-xml-4.4
    --->  Fetching docbook-xml-4.5
    --->  Attempting to fetch docbook-xml-4.5.zip from http://www.oasis-open.org/docbook/xml/4.5/
    --->  Verifying checksum(s) for docbook-xml-4.5
    --->  Extracting docbook-xml-4.5
    --->  Configuring docbook-xml-4.5
    --->  Building docbook-xml-4.5 with target all
    --->  Staging docbook-xml-4.5 into destroot
    --->  Installing docbook-xml-4.5 4.5_0
    --->  Activating docbook-xml-4.5 4.5_0
    ######################################################################
    # As MacPorts does not currently have a post-deactivate hook, 
    # you will need to ensure that you manually remove the catalog 
    # entry for this port when you uninstall it.  To do so, run 
    # "xmlcatmgr remove nextCatalog /opt/local/share/xml/docbook/4.5/catalog.xml".
    ######################################################################
    --->  Cleaning docbook-xml-4.5
    --->  Fetching docbook-xml
    --->  Verifying checksum(s) for docbook-xml
    --->  Extracting docbook-xml
    --->  Configuring docbook-xml
    --->  Building docbook-xml with target all
    --->  Staging docbook-xml into destroot
    --->  Installing docbook-xml 4.5_1
    --->  Activating docbook-xml 4.5_1
    --->  Cleaning docbook-xml
    --->  Fetching libxslt
    --->  Attempting to fetch libxslt-1.1.22.tar.gz from ftp://xmlsoft.org/libxslt/
    --->  Verifying checksum(s) for libxslt
    --->  Extracting libxslt
    --->  Configuring libxslt
    --->  Building libxslt with target all
    --->  Staging libxslt into destroot
    --->  Installing libxslt 1.1.22_0
    --->  Activating libxslt 1.1.22_0
    --->  Cleaning libxslt
    --->  Fetching p5-xml-parser
    --->  Attempting to fetch XML-Parser-2.36.tar.gz from http://ftp.ucr.ac.cr/Unix/CPAN/modules/by-module/XML
    --->  Verifying checksum(s) for p5-xml-parser
    --->  Extracting p5-xml-parser
    --->  Configuring p5-xml-parser
    --->  Building p5-xml-parser with target all
    --->  Staging p5-xml-parser into destroot
    --->  Installing p5-xml-parser 2.36_0
    --->  Activating p5-xml-parser 2.36_0
    --->  Cleaning p5-xml-parser
    --->  Fetching scrollkeeper
    --->  Attempting to fetch scrollkeeper-0.3.14.tar.gz from http://downloads.sourceforge.net/scrollkeeper
    --->  Verifying checksum(s) for scrollkeeper
    --->  Extracting scrollkeeper
    --->  Applying patches to scrollkeeper
    --->  Configuring scrollkeeper
    --->  Building scrollkeeper with target all
    --->  Staging scrollkeeper into destroot
    --->  Installing scrollkeeper 0.3.14_6
    --->  Activating scrollkeeper 0.3.14_6
    --->  Cleaning scrollkeeper
    --->  Fetching gtk-doc
    --->  Attempting to fetch gtk-doc-1.9.tar.bz2 from http://mandril.creatis.insa-lyon.fr/linux/gnome.org/sources/gtk-doc/1.9/
    --->  Verifying checksum(s) for gtk-doc
    --->  Extracting gtk-doc
    --->  Configuring gtk-doc
    --->  Building gtk-doc with target all
    --->  Staging gtk-doc into destroot
    --->  Installing gtk-doc 1.9_1
    --->  Activating gtk-doc 1.9_1
    --->  Cleaning gtk-doc
    --->  Fetching jpeg
    --->  Attempting to fetch jpegsrc.v6b.tar.gz from http://www.ijg.org/files
    --->  Attempting to fetch droppatch.tar.gz from http://sylvana.net/jpegcrop/
    --->  Verifying checksum(s) for jpeg
    --->  Extracting jpeg
    --->  Applying patches to jpeg
    --->  Configuring jpeg
    --->  Building jpeg with target all
    --->  Staging jpeg into destroot
    --->  Installing jpeg 6b_2
    --->  Activating jpeg 6b_2
    --->  Cleaning jpeg
    --->  Fetching xorg-util-macros
    --->  Attempting to fetch util-macros-1.1.5.tar.bz2 from http://www.x.org/pub/individual/util/
    --->  Verifying checksum(s) for xorg-util-macros
    --->  Extracting xorg-util-macros
    --->  Configuring xorg-util-macros
    --->  Building xorg-util-macros with target all
    --->  Staging xorg-util-macros into destroot
    --->  Installing xorg-util-macros 1.1.5_0
    --->  Activating xorg-util-macros 1.1.5_0
    --->  Cleaning xorg-util-macros
    --->  Fetching xorg-xproto
    --->  Attempting to fetch xproto-7.0.11.tar.bz2 from http://www.x.org/pub/individual/proto/
    --->  Verifying checksum(s) for xorg-xproto
    --->  Extracting xorg-xproto
    --->  Applying patches to xorg-xproto
    --->  Configuring xorg-xproto
    --->  Building xorg-xproto with target all
    --->  Staging xorg-xproto into destroot
    --->  Installing xorg-xproto 7.0.11_1
    --->  Activating xorg-xproto 7.0.11_1
    --->  Cleaning xorg-xproto
    --->  Fetching Xft2
    --->  Attempting to fetch libXft-2.1.12.tar.bz2 from http://xorg.freedesktop.org/releases/individual/lib/
    --->  Verifying checksum(s) for Xft2
    --->  Extracting Xft2
    --->  Configuring Xft2
    --->  Building Xft2 with target all
    --->  Staging Xft2 into destroot
    --->  Installing Xft2 2.1.12_0
    --->  Activating Xft2 2.1.12_0
    --->  Cleaning Xft2
    --->  Fetching pango
    --->  Attempting to fetch pango-1.20.0.tar.bz2 from http://mandril.creatis.insa-lyon.fr/linux/gnome.org/sources/pango/1.20
    --->  Verifying checksum(s) for pango
    --->  Extracting pango
    --->  Applying patches to pango
    --->  Configuring pango
    --->  Building pango with target all
    --->  Staging pango into destroot
    --->  Installing pango 1.20.0_0
    --->  Activating pango 1.20.0_0
    --->  Cleaning pango
    --->  Fetching shared-mime-info
    --->  Attempting to fetch shared-mime-info-0.23.tar.bz2 from http://people.freedesktop.org/~hadess/
    --->  Verifying checksum(s) for shared-mime-info
    --->  Extracting shared-mime-info
    --->  Configuring shared-mime-info
    --->  Building shared-mime-info with target all
    --->  Staging shared-mime-info into destroot
    --->  Installing shared-mime-info 0.23_1
    --->  Activating shared-mime-info 0.23_1
    --->  Cleaning shared-mime-info
    --->  Fetching tiff
    --->  Attempting to fetch tiff-3.8.2.tar.gz from ftp://ftp.remotesensing.org/pub/libtiff/
    --->  Verifying checksum(s) for tiff
    --->  Extracting tiff
    --->  Configuring tiff
    --->  Building tiff with target all
    --->  Staging tiff into destroot
    --->  Installing tiff 3.8.2_1+macosx
    --->  Activating tiff 3.8.2_1+macosx
    --->  Cleaning tiff
    --->  Fetching gtk2
    --->  Attempting to fetch gtk+-2.12.9.tar.bz2 from http://mandril.creatis.insa-lyon.fr/linux/gnome.org/sources/gtk+/2.12/
    --->  Verifying checksum(s) for gtk2
    --->  Extracting gtk2
    --->  Applying patches to gtk2
    --->  Configuring gtk2
    --->  Building gtk2 with target all
    --->  Staging gtk2 into destroot
    --->  Installing gtk2 2.12.9_0+x11
    --->  Activating gtk2 2.12.9_0+x11
    --->  Cleaning gtk2
    --->  Fetching libpcap
    --->  Attempting to fetch libpcap-0.9.8.tar.gz from http://www.tcpdump.org/release/
    --->  Attempting to fetch libpcap-0.9.8.tar.gz from http://svn.macports.org/repository/macports/distfiles/libpcap
    --->  Attempting to fetch libpcap-0.9.8.tar.gz from http://svn.macports.org/repository/macports/distfiles/general/
    --->  Attempting to fetch libpcap-0.9.8.tar.gz from http://svn.macports.org/repository/macports/downloads/libpcap
    --->  Attempting to fetch libpcap-0.9.8.tar.gz from http://svn.macports.org/repository/macports/distfiles/libpcap
    --->  Attempting to fetch libpcap-0.9.8.tar.gz from http://svn.macports.org/repository/macports/distfiles/general/
    --->  Attempting to fetch libpcap-0.9.8.tar.gz from http://svn.macports.org/repository/macports/downloads/libpcap
    Error: Target org.macports.fetch returned: fetch failed
    Error: The following dependencies failed to build: libpcap openssl
    Error: Status 1 encountered during processing.
    birkinbox:~ birkin$
    
  • Tried an install of libpcap directly and it failed the same way.

  • Posted a message to the MacPorts listserv, and got back a helpful suggestion to get libpcap from SfR-fresh, and put it at:

    /opt/local/var/macports/distfiles/libpcap/
    
  • Whew. Will do, but first: going to bed again. :)

  • Two weeks later...

  • Figured I'd just try the standard way again...

    birkinbox:~ birkin$ 
    birkinbox:~ birkin$ sudo port install wireshark
    Portfile changed since last build; discarding previous state.
    --->  Fetching libpcap
    --->  Attempting to fetch libpcap-0.9.8.tar.gz from http://www.tcpdump.org/release/
    --->  Verifying checksum(s) for libpcap
    --->  Extracting libpcap
    --->  Applying patches to libpcap
    --->  Configuring libpcap
    --->  Building libpcap with target all
    --->  Staging libpcap into destroot
    --->  Installing libpcap 0.9.8_0
    --->  Activating libpcap 0.9.8_0
    --->  Cleaning libpcap
    --->  Fetching wireshark
    --->  Attempting to fetch wireshark-1.0.0.tar.bz2 from http://www.wireshark.org/download/src/
    --->  Verifying checksum(s) for wireshark
    --->  Extracting wireshark
    --->  Configuring wireshark
    --->  Building wireshark with target all
    --->  Staging wireshark into destroot
    --->  Installing wireshark 1.0.0_0+darwin_9
    --->  Activating wireshark 1.0.0_0+darwin_9
    --->  Cleaning wireshark
    birkinbox:~ birkin$
    
  • Life is good!

ssh-tunneling notes

sunday, march 16th, 2008 5:49pm

[This was first posted in 2006, to a no-longer-accessible wiki of mine, to accompany a Brown Internet Programming Group talk I gave. I'm slowly consolidating some of my posts and notes to this site.]


The problem

The situation: My preferred way of working is to program, on my laptop, code that often must communicate with a database running on a remote server. What are good ways of handling this? In the past I used two different approaches.

  • Run a parallel database.
    • Pros: good when lots of database development/reconfiguration is required. No separate connection file is needed.
    • Cons: testing can require ongoing effort to keep database structures, and sometimes data, in sync.
  • Access the remote database by programming the connection-code to determine which host it's running on. If running on my laptop, the connection-code would locate the database at an internet address; if running on the same server as the database, the connection-code would locate the database at the localhost address.
    • Pros: Only need to deal with one database.
    • Cons: Non-localhost connectivity may be disabled for security reasons. If others work on the same code, the differing connection-code to detect multiple hosts can be a hassle and can reveal internet passwords. Care must be taken since the password may be transmitted over a non-secure connection.
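
For illustration, the host-detecting approach might look something like the Python sketch below. The function name is made up for this example, and it assumes the machine's reported hostname matches the server's full name, which is not always true.

```python
import socket

# Placeholder server name, as used elsewhere in these notes.
DATABASE_SERVER_HOSTNAME = "somehost.services.brown.edu"


def choose_database_host(current_hostname=None):
    """Return the address the connection code should use for the database.

    Hypothetical helper: assumes socket.gethostname() reports the same
    name as DATABASE_SERVER_HOSTNAME when running on the server.
    """
    if current_hostname is None:
        current_hostname = socket.gethostname()
    if current_hostname == DATABASE_SERVER_HOSTNAME:
        # Running on the database server itself: use the localhost address.
        return "127.0.0.1"
    # Running anywhere else (e.g. the laptop): go over the network.
    return DATABASE_SERVER_HOSTNAME
```

Every copy of the code has to carry this branching, which is part of the hassle described in the cons above.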

Solution: Another programmer showed me how he solves this issue via ssh-tunneling. It's a wonderful solution.

Overview

Background info to keep in mind: Common client-server internet connections generally do not require specification of originating ports (the computers can pick a port), but do require specification of destination ports.

Example: my browser wants to access a web-page. My browser may send out the http request from any of a range of ports, but will specifically access the server's IP address at port 80, where the web-server is listening.
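
The same point can be seen in a few lines of Python. This is only a sketch: a throwaway listener on the local machine stands in for the web server, and it asks the operating system for a free port instead of using port 80.

```python
import socket

# A stand-in "server" listening on a known destination port.
# Binding to port 0 asks the OS for any free port, so port 80 isn't needed.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
destination_port = server.getsockname()[1]

# The client names only the destination; it never chooses its own port.
client = socket.create_connection(("127.0.0.1", destination_port))
source_port = client.getsockname()[1]  # picked by the operating system
peer_port = client.getpeername()[1]    # the port we asked for

print("source port (picked for us):", source_port)
print("destination port (specified):", peer_port)

client.close()
server.close()
```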

In ssh-tunneling, the client computer is set up to 'listen' for incoming data on a specified port at the 127.0.0.1 localhost IP address -- and to 'forward' that data, via a pre-established ssh connection, to a specific port at the server's IP address. This terminology may be a bit confusing, because the 'client' -- say, the local development laptop -- is 'listening', which a 'server' normally does. In this case, think of the server as a remote database-server.
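
To make the 'listen and forward' idea concrete, here is a rough Python sketch of just that half of the mechanism. It is not how ssh is implemented, and it does no encryption at all: it simply listens on a 127.0.0.1 port and relays bytes to another host and port. The names `pump` and `start_forwarder` are invented for this example.

```python
import socket
import threading


def pump(source, sink):
    """Copy bytes from source to sink until source closes."""
    try:
        while True:
            data = source.recv(4096)
            if not data:
                break
            sink.sendall(data)
    except OSError:
        pass
    finally:
        # Tell the other side that no more data is coming.
        # (A real tool would also close both sockets when done.)
        try:
            sink.shutdown(socket.SHUT_WR)
        except OSError:
            pass


def start_forwarder(remote_host, remote_port, local_port=0):
    """Listen on 127.0.0.1:local_port and forward each connection.

    Returns the listening socket; its port is listener.getsockname()[1].
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", local_port))
    listener.listen(5)

    def accept_loop():
        while True:
            try:
                client, _ = listener.accept()
            except OSError:
                return  # the listener was closed
            try:
                upstream = socket.create_connection((remote_host, remote_port))
            except OSError:
                client.close()  # destination unreachable; drop this client
                continue
            # One thread per direction, so traffic flows both ways.
            threading.Thread(target=pump, args=(client, upstream), daemon=True).start()
            threading.Thread(target=pump, args=(upstream, client), daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return listener
```

With ssh, the relaying step happens inside the encrypted ssh connection, and the final hop to the destination port is made from the remote end.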

What this means for my database situation is that I set up my laptop to listen for incoming data at '127.0.0.1:3306' and to forward that data to 'somehost.services.brown.edu:3306'. All I have to do in my connection code is specify that it attempt to connect to the database at '127.0.0.1:3306'.

The beauty of this is two-fold...

First, the same connection code can run on my laptop and on the server with no modification at all. Example of php connection code...

<?php
    mysql_connect("127.0.0.1:3306", "username", "password") or die ("Sorry, cannot connect to server");
    mysql_select_db("databasename") or die ("Sorry, cannot connect to database");
?>

Second, because the 'set up' is using SSH as the forwarding mechanism, all data is transferred securely.

Note that there are many different ways of tunneling; this page focuses on one: 'local client' to 'remote service' (in this case, a remote database server).

Setting up the tunnel

Unices

On Linux, Unix, and the Mac, setting up a tunnel is as easy as issuing one command in a terminal window:

ssh -N -L 3306:somehost.services.brown.edu:3306 myaccount@somehost.services.brown.edu

Even if you're going to use a GUI client to set up the tunnel, examine the details of this command to get an understanding of what's going on:

  • The ssh part is the normal secure-shell command.
  • The -N flag tells ssh not to execute a remote command or open a shell on the remote computer; the connection is used only for forwarding.
  • The -L flag introduces the 'localPort:remoteHost:remotePort' section that follows it. It means that the local computer should listen for incoming connections on the specified localPort, and forward them over the ssh connection to remoteHost at the remotePort.
  • The 'myaccount@somehost.services.brown.edu' sets up the ssh connection. This prompts me to enter my account-password on the remote computer 'somehost'.
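
If you'd rather start the tunnel from a script, the pieces described above can be assembled in Python. This is a sketch, not a polished tool: the function names are made up here, and it assumes an `ssh` command is on the path and that you can authenticate (by typing the password when prompted, or with keys).

```python
import subprocess


def build_tunnel_command(local_port, remote_host, remote_port, account, ssh_host):
    """Return the argument list for:
    ssh -N -L localPort:remoteHost:remotePort account@ssh_host
    """
    forward_spec = f"{local_port}:{remote_host}:{remote_port}"
    return ["ssh", "-N", "-L", forward_spec, f"{account}@{ssh_host}"]


def open_tunnel(local_port, remote_host, remote_port, account, ssh_host):
    """Start ssh as a child process; call .terminate() on the result to close it."""
    command = build_tunnel_command(local_port, remote_host, remote_port, account, ssh_host)
    return subprocess.Popen(command)
```

For the database example, `build_tunnel_command(3306, "somehost.services.brown.edu", 3306, "myaccount", "somehost.services.brown.edu")` produces the same arguments as the command shown earlier.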

Mac: Fugu

Fugu is an open-source ssh client that supports ssh tunneling.

To set up a tunnel in fugu:

  • Select 'SSH' -> 'New SSH Tunnel'
  • Enter 'somehost.services.brown.edu' in the 'Create Tunnel to' textbox.
  • Enter '3306' in the 'Service or Port' textbox (think of this as the 'Remote Port').
  • Enter '3306' in the 'Local Port' textbox.
  • Enter 'somehost.services.brown.edu' in the 'Remote Host' textbox.
    • I'm not sure of the distinction between the two 'host' textboxes, but entering info this way works.
  • Enter your username in the 'Username' textbox.
  • Enter '22' in the 'Port' textbox (think of this as the 'SSH Port').

Windows: Putty

Putty is a free Brown-offered ssh client that supports tunneling.

  • Open the 'putty.exe' file. The 'PuTTY Configuration' window appears.
  • Click once on 'Category' -> 'Session'.
  • On the right-side of the window enter 'somehost.services.brown.edu' in the 'Host Name (or IP address)' textbox.
  • Enter '22' in the 'Port' textbox.
  • Select 'SSH' for the 'Protocol'.
  • Click once on 'Category' -> 'SSH' -> 'Tunnels'.
  • Enter '3306' in the 'Source port' textbox.
  • Enter 'somehost.services.brown.edu:3306' in the 'Destination' textbox.
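  • Select the 'Local' radio-button under the 'Destination' textbox (this is the equivalent of the -L flag in the command above; 'Remote' would set up forwarding in the opposite direction).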
  • Click the 'Add' button.
  • Click the 'Open' button. The putty terminal window opens.
  • When prompted by 'login as', enter your username and hit return.
  • When prompted for your password, enter it and hit return.
  • That's it; the tunnel is established.

More tunnel fun

Web

The idea...

ssh -N -L 5005:123.123.123.123:80 myaccount@123.123.123.123

To implement this in Fugu or Putty, just switch the IP address, use '5005' for the 'Local Port' (Fugu) or 'Source Port' (Putty), and use '80' for the 'Service or Port' (Fugu) or the port following the host+colon in the 'Category' -> 'SSH' -> 'Tunnels' 'Destination' textbox (Putty).

You can then access a web page on the 123.123.123.123 server using an http://127.0.0.1:5005 address instead of the normal http://123.123.123.123 address.

Note that the 5005 'localPort' can really be any unused port from 1024 up (lower-numbered ports generally require administrator privileges). The only reason I keep the 'localPort' and 'remotePort' the same in the database example is so my database connection code is the same and works the same on my development laptop and the actual database server.
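
If you're not sure whether a port is unused, a quick check in Python is to try binding to it. The two helper names below are invented for this sketch, and there is an unavoidable gap between checking a port and ssh actually claiming it, so treat the answer as a hint only.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if host:port can be bound right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Already in use, or not permitted (e.g. a privileged port).
            return False
        return True


def pick_free_port(host="127.0.0.1"):
    """Ask the operating system for any currently unused port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.bind((host, 0))  # port 0 means "pick one for me"
        return probe.getsockname()[1]
```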

Other

Note that these techniques can be applied in a wide variety of situations.

  • I've switched over to tunneled connections from Eclipse, my programming IDE, to my Subversion repositories.
  • Brown email is encrypted over the network, but if you have a home account that's not, you can check it from a coffee-shop unencrypted wireless network using ssh-tunneling.

Implementing horizontal scrollbars

saturday, march 8th, 2008 8:01am

[2008-03-16 update: I recently realized that IE6 does not display the horizontal scrollbars below properly (IE7 does, as do Firefox & Safari). My understanding is that IE6 does not respect the 'max-width' css setting, and that dokuwiki handles this via a section of javascript within a large block of javascript. I don't want to embed all the unnecessary javascript for this specific issue, so until I can learn about and implement an alternate solution, my apologies to those of you using IE6.]


The goal: To implement horizontal scrollbars for code containing long lines.

The reason: It's annoying when long code lines overrun right-hand sections of pages, or when they automatically widen the entire page, such that areas of normally-formatted text end up requiring horizontal scrolling.

Example of solution...

birkinbox:~ birkin$ 
birkinbox:~ birkin$ python
Python 2.4.1 (#1, Feb  1 2006, 18:35:57) 
[GCC 4.0.0 (Apple Computer, Inc. build 5026)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 
>>> import urllib
>>> reference = urllib.urlopen('http://google.com')
>>> reference.read()
'<html><head><meta http-equiv="content-type" content="text/html; charset=ISO-8859-1"><title>Google</title><style>body,td,a,p,.h{font-family:arial,sans-serif}.h{font-size:20px}.h{color:#3366cc}.q{color:#00c}.ts td{padding:0}.ts{border-collapse:collapse}.lnc:link,.lnc:visited{color:#00c}.pgtab,.pgtab:hover,.pgtabselected,.pgtabside{text-align:center;text-decoration:none;color:#00c;display:block;height:27px;float:left;overflow:hidden;background:url(/intl/ja/images/productlinktabs.png) no-repeat;padding-top:8px}.pgtab{width:130px;background-position:-274px 0}.pgtab:hover{width:130px;background-position:-144px 0}.pgtabselected{width:144px}.pgtabside{width:3px;background-position:-404px 0}.ptr{cursor:pointer;cursor:hand}.iconl{background:url() no-repeat;overflow:hidden;height:px;width:px}#gbar{float:left;height:22px;padding-left:2px}.gbh,.gb2 div{border-top:1px solid #c9d7f1;font-size:0;height:0}.gbh{position:absolute;top:24px;width:100%}.gb2 div{margin:5px}#gbi{background:#fff;border:1px solid;border-color:#c9d7f1 #36c #36c #a2bae7;font-size:13px;top:24px;z-index:1000}#guser{padding-bottom:7px !important}#gbar,#guser{font-size:13px;padding-top:1px !important}@media all{.gb1,.gb3{height:22px;margin-right:.73em;vertical-align:top}.gb2 a{display:block;padding:.2em .5em}}#gbi,.gb2{display:none;position:absolute;width:8em}.gb2{z-index:1001}#gbar a{color:#00c}.gb2 a,.gb3 a{text-decoration:none}#gbar .gb2 a:hover{background:#36c;color:#fff;display:block}</style><script>window.google={kEI:"XIjSR5-NKZz0iAHLxMxp",kEXPI:"0",kHL:"en"};\nfunction sf(){document.f.q.focus()}\nwindow.gbar={};(function(){var a=window.gbar,b,g,h;function l(c,f,e){c.display=h?"none":"block";c.left=f+"px";c.top=e+"px"}a.tg=function(c){var 
f=0,e=0,d,m=0,n,j=window.navExtra,k,i=document;g=g||i.getElementById("gbar").getElementsByTagName("span");(c||window.event).cancelBubble=!m;if(!b){b=i.createElement(Array.every||window.createPopup?"iframe":"DIV");b.frameBorder="0";b.scrolling="no";b.src="#";g[7].parentNode.appendChild(b).id="gbi";if(j&&g[7])for(n in j){k=i.createElement("span");k.appendChild(j[n]);g[7].parentNode.insertBefore(k,g[7]).className="gb2"}i.onclick=a.close}while(d=g[++m]){if(e){l(d.style,e+1,f+25);f+=d.firstChild.tagName=="DIV"?9:20}if(d.className=="gb3"){do e+=d.offsetLeft;while(d=d.offsetParent)}}b.style.height=f+"px";l(b.style,e,24);h=!h};a.close=function(c){h&&a.tg(c)}})();</script></head><body bgcolor=#ffffff text=#000000 link=#0000cc vlink=#551a8b alink=#ff0000 onload="sf();if(document.images){new Image().src=\'/images/nav_logo3.png\'}" topmargin=3 marginheight=3><div id=gbar><nobr><span class=gb1><b>Web</b></span> <span class=gb1><a href="http://images.google.com/imghp?hl=en&tab=wi">Images</a></span> <span class=gb1><a href="http://maps.google.com/maps?hl=en&tab=wl">Maps</a></span> <span class=gb1><a href="http://news.google.com/nwshp?hl=en&tab=wn">News</a></span> <span class=gb1><a href="http://www.google.com/prdhp?hl=en&tab=wf">Shopping</a></span> <span class=gb1><a href="http://mail.google.com/mail/?hl=en&tab=wm">Gmail</a></span> <span class=gb3><a href="http://www.google.com/intl/en/options/" onclick="this.blur();gbar.tg(event);return !1"><u>more</u> <small>&#9660;</small></a></span> <span class=gb2><a href="http://video.google.com/?hl=en&tab=wv">Video</a></span> <span class=gb2><a href="http://groups.google.com/grphp?hl=en&tab=wg">Groups</a></span> <span class=gb2><a href="http://books.google.com/bkshp?hl=en&tab=wp">Books</a></span> <span class=gb2><a href="http://scholar.google.com/schhp?hl=en&tab=ws">Scholar</a></span> <span class=gb2><a href="http://finance.google.com/finance?hl=en&tab=we">Finance</a></span> <span class=gb2><a 
href="http://blogsearch.google.com/?hl=en&tab=wb">Blogs</a></span> <span class=gb2><div></div></a></span> <span class=gb2><a href="http://www.youtube.com/?hl=en&tab=w1">YouTube</a></span> <span class=gb2><a href="http://www.google.com/calendar/render?hl=en&tab=wc">Calendar</a></span> <span class=gb2><a href="http://picasaweb.google.com/home?hl=en&tab=wq">Photos</a></span> <span class=gb2><a href="http://docs.google.com/?hl=en&tab=wo">Documents</a></span> <span class=gb2><a href="http://www.google.com/reader/view/?hl=en&tab=wy">Reader</a></span> <span class=gb2><div></div></a></span> <span class=gb2><a href="http://www.google.com/intl/en/options/">even more &raquo;</a></span> </nobr></div><div class=gbh style=left:0></div><div class=gbh style=right:0></div><div align=right id=guser style="font-size:84%;padding:0 0 4px" width=100%><nobr><a href="/url?sa=p&pref=ig&pval=3&q=http://www.google.com/ig%3Fhl%3Den%26source%3Diglk&usg=AFQjCNFA18XPfgb7dKnXfKz7x7g1GDH1tg">iGoogle</a> | <a href="https://www.google.com/accounts/Login?continue=http://www.google.com/&hl=en">Sign in</a></nobr></div><center><br clear=all id=lgpd><img alt="Google" height=110 src="/intl/en_ALL/images/logo.gif" width=276><br><br><form action="/search" name=f><table cellpadding=0 cellspacing=0><tr valign=top><td width=25%>&nbsp;</td><td align=center nowrap><input name=hl type=hidden value=en><input type=hidden name=ie value="ISO-8859-1"><input maxlength=2048 name=q size=55 title="Google Search" value=""><br><input name=btnG type=submit value="Google Search"><input name=btnI type=submit value="I\'m Feeling Lucky"></td><td nowrap width=25%><font size=-2>&nbsp;&nbsp;<a href=/advanced_search?hl=en>Advanced Search</a><br>&nbsp;&nbsp;<a href=/preferences?hl=en>Preferences</a><br>&nbsp;&nbsp;<a href=/language_tools?hl=en>Language Tools</a></font></td></tr></table></form><br><br><font size=-1><a href="/intl/en/ads/">Advertising&nbsp;Programs</a> - <a href="/services/">Business Solutions</a> - <a 
href="/intl/en/about.html">About Google</a></font><p><font size=-2>&copy;2008 Google</font></p></center></body></html>'
>>>

Solution

All credit goes to Dokuwiki, which implements this solution via CSS (I've also seen javascript solutions). I used the Firefox web-developer plug-in to figure out what part of the CSS works the magic, and came up with the adapted CSS below. (Note comment implying the second style may not be necessary.)

pre { /* allows horizontal scrollbars to appear; excellent for code */
    font-size: 120%;
    padding-top: 0.5em;
    padding-right: 0.5em;
    padding-bottom: 0.5em;
    padding-left: 0.5em;
    border-top-width: 1px;
    border-right-width: 1px;
    border-bottom-width: 1px;
    border-left-width: 1px;
    border-top-style: dashed;
    border-right-style: dashed;
    border-bottom-style: dashed;
    border-left-style: dashed;
    border-top-color: #8cacbb;
    border-right-color: #8cacbb;
    border-bottom-color: #8cacbb;
    border-left-color: #8cacbb;
    color: Black;
    background-color: #f7f9fa;
    overflow-x: auto;
    overflow-y: auto;
    }
* html .insitu-footnote pre.code, * html .insitu-footnote pre.file { /* 2008-03-08-Sat note to self: this was added for horizontal scrollbar support, but doesn't seem necessary for Safari/Mac or FF/Mac -- after reinstalling Parallels test in IE/Win and delete if it's not necessary */
    padding-bottom: 18px;
    }

Nice!

Website update

monday, march 3rd, 2008 5:50am

This website is about two months old. I started working on it shortly after a Christmas/New Year's vacation in small but steady increments of early-morning time. So, some of the highlights...

  • Set up hosting account with webfaction, because they offer django-hosting via mod_python and were highly recommended by numerous django folk
  • Set up domain name bspace.us (.org & .net were taken) with pair.com's domain registration division
  • Pointed domain name to hosting site
  • Set up django and mysql. My early db training was with postgres, which I'm partial to and which webfaction also offers, but I chose mysql thinking that it would give me experience useful in my work, where we use mysql. However, django abstracts direct database access so successfully that I may switch over to postgres.
  • Figured out how to utilize webfaction's tools to set up multiple django projects with subdomains
  • Implemented early version of this blog adapting the Simpla theme I had used for an old wordpress.com blog
  • Figured out how to set up webfaction static-page serving and point django admin to that, and make this work via https. As an aside, I cannot speak highly enough about the django ssl middleware that allows super-easy specification of pages which should only be able to be accessed via https.
  • Added some current content and a bit of old content
  • Added 'published' field to allow me to work on in-process articles using the django admin
  • Most recently, figured out how to integrate Markdown syntax into django. This allows me to edit a note entry with simple formatting markers which are saved to the database, and on display are automatically converted into appropriate html.

Possible future improvements

I have a couple of other personal projects I want to get to (not to mention the fact that most of my work projects are so compelling that I enjoy working on them at home when time permits) and thought I might put 'notes' development work on hold, but will continue to work on this site through March. The improvements I'm interested in exploring include...

  • Adding a comment architecture, possibly with captcha and other anti-comment-spam features
  • Implementing long-code css, to enable horizontal scrollbars to appear when a long string of code is entered. This allows page formatting to remain intact, and prevents a user from having to scroll an entire page back and forth horizontally simply because of a single line of long code. (example here)
  • Adding a previous-version or two, and enabling a user-friendly 'diff' feature
  • Learning about and implementing google analytics
  • Learning about and implementing google charting -- for instance, I could have a dynamically-generated pie-chart showing the top tag-areas in which I post
  • Learning about and exploring google ads
  • Enabling each tag to have its own RSS feed and implementing this with a user-friendly interface
  • Experimenting with microformats
  • Learning how to most easily incorporate COinS into book mentions. I understand how to use COinS, but would like to learn about the easiest ways to generate them.
  • Utilizing django's caching framework

Doubtless I'll think of other interesting work along the way.

Better logging

saturday, february 23rd, 2008 5:22pm

I'm entranced with a new practice: logging to a database instead of a file.

Long ago I got into a habit of logging to files as a way of monitoring the workings of my programs. For shell scripts I piped the standard output to be appended to a file, and then just sort of stuck with that model as I learned other languages.

Though java and python each have a robust logging library built into the language, I didn't use those, instead focusing my language-learning on features that more directly enabled me to tackle the task at hand. The result is that over time my shell and php and python and java programs ended up with log files that grew ever larger, requiring occasional manual paring.

Given an interest in best-practices, I've begun learning about and experimenting with built-in loggers when available, but on a current project have met my logging needs via a self-rolled approach that offers real benefits.

Problem -- atomized logging

Our easyBorrow project consists of a lightweight php web interface that quickly dumps the incoming request into a database queue, where a python controller takes over, calling a series of independent web tunnelers & other web services. The whole system consists of around a dozen independent web-services of varying degrees of complexity, each with a nicely scoped focus. Most of them also write to a separate log file, which in a way makes sense, but given that the majority of these web-services serve a single goal -- to move the user's request-processing along, the atomized nature of the logs can end up being a hassle.

If something goes wrong with a request, a 'history' table does give an indication of where to start tracking down the issue -- but then I may have to look into as many as half-a-dozen separate log files to see what exactly happened. This is one of those situations where problems don't arise often enough to tackle improving the existing architecture, but just enough to make the existing one annoying at times.

Problem -- data not exposed

Keeping this background in mind, I want to note another issue that happens maybe once every three months that had a role in this new architecture with which I'm so taken. Some two years ago I implemented an automated export of requests from our iii ILS for items held in an offsite location. Those requests get exported, then parsed, and then moved to a location where a different vendor's inventory-control software takes over and presents the workers at the offsite facility a list of items that need to be retrieved.

Occasionally, very occasionally, requests don't show up for the offsite staff and I'm asked if I can confirm that the requests actually got exported and parsed and handed off to the inventory-control software. So I look in my documentation to see where the server application log files are located -- grab them and let the folk know that yes, in fact, my part of the flow worked. When this happened last month, a co-worker noted that it would be terrific if they could view the information that I'm looking up so I wouldn't have to be bothered. Unfortunately, given the existing model, that would require folk having passwords to unix servers and isn't workable. But I've ruminated upon this, and given my current evangelism of APIs and exposing data, I've thought that if I had to do that logging over, I'd expose it via a web interface.

Solution

Now I'm working on a new project, or rather tackling one that's been on the back-burner far too long: exporting newly-cataloged item information from our closed and unfriendly iii ILS into a database where we can present users with useful new-item info and feeds. Like more and more projects these days, this one has many pieces, each of which, had I done this a year-plus ago, would have logged its inner workings to a separate file.

Now though, I'm logging the export script info, the posting script info, and the parsing script info to a single database table. And because one of the scripts lives on a server that doesn't have a library setup to interface with mysql, I'm 'writing' to the db by POSTing that script's log-entry info to a url which then saves it to the db. The log-table consists of (in addition to an unseen auto-incrementing id) a datestamp, an identifier, and the log message. The 'identifier' in this case is a simple number that allows me to group the entries from different sources together in the log. When I eventually apply this beauteous system back to easyBorrow, the identifier will be the request-number the system assigns early on in the process. The function/method in each separate script that writes to the log also takes a detail-level parameter, allowing me to specify a high-level of logging detail in development code, and a low-level in the ongoing in-production code.

This system is sweet. It means that I have only to look in one place to monitor the flow of all three scripts. So if the export cron job fires off at 2am, and the POST cron job fires off at 3am, and the parser cron job fires off at 4am, I can see the whole flow in one view.
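To make the single-table idea concrete, here is a minimal sketch that uses Python's built-in sqlite3 module rather than the actual mysql/django setup. The table and column names (`log`, `datestamp`, `identifier`, `message`) and the helper names `write_log` and `read_flow` are assumptions for illustration only.

```python
import sqlite3
from datetime import datetime

def make_log_db():
    """Create an in-memory db with one shared log table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE log ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " datestamp TEXT NOT NULL,"
        " identifier TEXT NOT NULL,"
        " message TEXT NOT NULL)"
    )
    return conn

def write_log(conn, identifier, message, when=None):
    """Append one entry; every script writes to the same table."""
    when = when or datetime.now()
    conn.execute(
        "INSERT INTO log (datestamp, identifier, message) VALUES (?, ?, ?)",
        (when.isoformat(sep=" ", timespec="seconds"), identifier, message),
    )
    conn.commit()

def read_flow(conn, identifier):
    """Return (datestamp, message) rows for one identifier, oldest first."""
    rows = conn.execute(
        "SELECT datestamp, message FROM log"
        " WHERE identifier = ? ORDER BY datestamp, id",
        (identifier,),
    )
    return rows.fetchall()

# Three separate scripts, one shared identifier, one place to look.
conn = make_log_db()
write_log(conn, "42", "export: started", datetime(2008, 2, 23, 2, 0))
write_log(conn, "42", "post: file sent", datetime(2008, 2, 23, 3, 0))
write_log(conn, "42", "parse: finished", datetime(2008, 2, 23, 4, 0))
for datestamp, message in read_flow(conn, "42"):
    print(datestamp, message)
```

Grouping on the identifier is what turns three scripts' worth of entries into one readable flow.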

Though all developers can write to a database in their sleep, since I'm writing to a django-managed table, it is and feels even easier. For those who haven't yet drunk framework kool-aid:

log_entry = Log()
log_entry.identifier = 'the_identifier'
log_entry.message = 'the_message'
log_entry.save()

Wrapping a function around this allows my log entries to look like:

updateLog( detail='low_detail', identifier='the_identifier', message='the_message' )
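A rough sketch of what such a wrapper might look like follows. It is not the actual code: the `Log` class here is a plain-Python stand-in for the django model so the example runs on its own, and the `LOG_LEVEL` setting and the 'low_detail' / 'high_detail' level names are assumptions.

```python
# Stand-in for the django-managed Log model, so this sketch runs without django.
class Log:
    saved = []  # collects entries in place of a database table

    def __init__(self):
        self.identifier = None
        self.message = None

    def save(self):
        Log.saved.append((self.identifier, self.message))

# Hypothetical setting: 'high_detail' in development, 'low_detail' in production.
LOG_LEVEL = "low_detail"

def updateLog(detail, identifier, message, level=None):
    """Save the entry only if its detail level is enabled by the current setting."""
    level = level or LOG_LEVEL
    # A 'high_detail' message is skipped when the setting asks for 'low_detail' only.
    if detail == "high_detail" and level == "low_detail":
        return False
    log_entry = Log()
    log_entry.identifier = identifier
    log_entry.message = message
    log_entry.save()
    return True

updateLog(detail="low_detail", identifier="the_identifier", message="export finished")
updateLog(detail="high_detail", identifier="the_identifier", message="row 1 of 500 parsed")
print(Log.saved)
```

With the setting at 'low_detail', only the first call is stored; flipping the setting to 'high_detail' in development would keep both.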

But wait, in true Ronco spirit, it gets better... Since I'm writing to a django-managed table, I automatically, without writing extra code, have a complete, useful, sortable and searchable web-interface -- with built-in authentication -- which means that not only can I view the flow of processing, I can easily allow anyone else to view the flow of processing by supplying a url.

The final sprinkle of luscious magic is that django makes it very easy to override the built-in save method of its objects. So I've added a bit of logic to the Log object's save method to delete entries older than X days (a configurable number I've put in a settings file). There's a bit of a hack in this solution. The absolute simplest code to write in this save method is just to query for all log-entries older than X days and delete them, which is what I've initially done. But this is unnecessarily expensive database access for every single log-entry, though mitigated by the fact that for this project, the scripts run only once a day and in production, log lightly. A better approach would be to have a separate job run once a day or week and perform the deletions, and I may implement that, though I've been mentally toying with an oddly enjoyable interim hack: to have the save method come up with a random number such that it would have, say, a 1-in-100 chance of running the delete code. Bottom line, though, is that auto-deletion is taken care of right up front.
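The 'random number' interim hack could be sketched roughly as below. This is only an illustration under assumptions: it uses sqlite3 in place of a django model's save method, and the names `LOG_RETENTION_DAYS`, `PURGE_ODDS`, `save_entry`, and `purge_old` are hypothetical.

```python
import random
import sqlite3
from datetime import datetime, timedelta

LOG_RETENTION_DAYS = 30   # hypothetical setting: the "X days"
PURGE_ODDS = 100          # run the delete roughly once per 100 saves

def purge_old(conn, now):
    """Delete entries older than the retention window; return how many were removed."""
    cutoff = now - timedelta(days=LOG_RETENTION_DAYS)
    cur = conn.execute(
        "DELETE FROM log WHERE datestamp < ?",
        (cutoff.isoformat(sep=" ", timespec="seconds"),),
    )
    conn.commit()
    return cur.rowcount

def save_entry(conn, identifier, message, now=None, roll=random.randrange):
    """Insert one entry, then occasionally clean up old ones."""
    now = now or datetime.now()
    conn.execute(
        "INSERT INTO log (datestamp, identifier, message) VALUES (?, ?, ?)",
        (now.isoformat(sep=" ", timespec="seconds"), identifier, message),
    )
    conn.commit()
    # roll(PURGE_ODDS) returns 0 about one time in PURGE_ODDS.
    if roll(PURGE_ODDS) == 0:
        purge_old(conn, now)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (datestamp TEXT, identifier TEXT, message TEXT)")
save_entry(conn, "1", "an old entry", now=datetime(2008, 1, 1, 2, 0))
save_entry(conn, "2", "a recent entry", now=datetime(2008, 2, 23, 2, 0))
purge_old(conn, now=datetime(2008, 2, 23, 2, 0))  # what a daily cleanup job would do
print(conn.execute("SELECT COUNT(*) FROM log").fetchone()[0])
```

Passing the random roll in as a parameter keeps the occasional cleanup testable; a separate scheduled job calling `purge_old` directly would remove the need for the roll at all.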

Put all of these improvements together, and the new system offers more useful, more accessible, and better-sized logging.

Practical campus APIs & feeds

saturday, february 16th, 2008 10:42am

For a while now I've evangelized APIs & feeds, encouraging folk (and reminding myself) to expose 'web-page' data by presenting it in some alternate structured format. That's partly for the purpose of making code-reuse a reality but even more-so for the purpose of making possible new and interesting uses of data.

At the Library, we've truly moved into the realm of moving code onto the network. The web-services we've created have, not surprisingly, been library-related:

  • An isbn converter.
  • A 'cleaner' for data output from an ILS API.
  • Many 'tunnelers' into consortial borrowing services returning results of searches, with the order number, if applicable.
  • A reprocessor of OCLC xISBN data that returns only those OCLC xISBNs that have the same format and are in the same language as the submitted ISBN.
  • An OCLC to ISBN converter that will take an OCLC number and see if there are versions of that item available with ISBNs.
  • An OPAC status & location checker.
  • etc. etc. etc.

I've wondered recently what APIs the library could offer that would be of value campus-wide. More specifically, what APIs we might develop for our own needs that would be useful to the campus as a whole. Of course, many of our APIs do currently benefit the wider campus community in that students, staff, and faculty across campus use services of ours that are made possible via our behind-the-scenes use of APIs. I'm thinking more of APIs that developers in other departments might find directly useful.

When considering APIs that would be useful for developers across the campus, I naturally think of our Computing and Information Services department (CIS). I've had good conversations, and hope to have many more in the future, with CIS folk about having them develop and evangelize campus-wide APIs. My thinking has been that over time, developing such APIs could save them an enormous amount of time as well as enhance good will from departmental developers.

An example: for one of our Library projects, we need a listing of faculty and course information. I'm not directly involved in this project, but my understanding is that we periodically request a list of faculty and courses from CIS; they produce the list; and we update some db tables for web-apps that make use of this information. My sense is that if certain Banner APIs could be enabled -- obviously with appropriate security implemented -- we could get this information directly from a feed / API call, simplifying our workflow and lightening the workflow of the CIS folk who produce the list for us.

I'm encouraged from my conversations that there are folk in CIS who share this perspective and are working to realize it. While good discussions and planning proceed, I find myself gravitating to what we in the Library could do now along these lines. Three ideas...

Cafeteria menu

As part of an idea that deserves its own post (the idea sounds a bit silly without context, but indulge me), I've thought that it would be very useful on a particular Library web page to be able to display the next upcoming meal at the main campus cafeterias. I spent about ten minutes exploring the availability of that information, and found two web-pages and a downloadable excel spreadsheet. None of these are ideal sources of information to automate, but it could be done, and I wouldn't be at all surprised if the resulting structured feed would be of use to others, from individual students to the campus newspaper.

SafeRide arrival time

We have a campus shuttle system comprised of about seven vans. A couple of these have GPS receivers, and a vendor website displays on a map, via quite gnarly javascript, the current location of the GPS-enabled vans. That's nice, but the experience could be significantly improved.

I've thought it would be extremely useful to be able to display on a Library web-page (if the student is accessing the page from within the Library) a simple line like "The next SafeRide shuttle should arrive here in about 10 minutes." Simple and seriously, wonderfully useful information, that doesn't get in the way of the task at hand. That same Library web-page, if accessed from outside of the Library, simply wouldn't have that line displayed.

The API we could create, from parsing the javascript on the vendor web-page, could most simply at a minimum return location information for the GPS-enabled shuttles, which could be interpreted by our own server-side logic to approximate arrival times. But even better, the logic of determining arrival times could be embedded in the API itself. The API could take a location-parameter and return expected arrival time for the submitted location. We at the library might only implement logic that focuses on the arrival times at the Library. But by opening up the arrival-assessment code, we could allow BioMed developers to add arrival-time logic for shuttle-stop-locations close to BioMed buildings, and students to add arrival-time logic for shuttle-stop-locations close to particular dorms.

Since developers can determine the IP address of an incoming request for information, and since developers and computer-knowledgeable students know the IP-address ranges of buildings in their purview, we really can do this.
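As a minimal sketch of the IP-address idea, the snippet below uses Python's standard `ipaddress` module to decide whether a request comes from inside a building, and only then produces the arrival line. The address ranges (taken from the reserved documentation blocks), the building names, and the `minutes_away` lookup are all made up for illustration; real arrival estimates would have to come from the shuttle GPS data.

```python
import ipaddress

# Hypothetical building ranges; these are reserved documentation networks,
# not real campus ones.
BUILDING_RANGES = {
    "library": ipaddress.ip_network("192.0.2.0/25"),
    "biomed": ipaddress.ip_network("198.51.100.0/24"),
}

def building_for_ip(ip_string):
    """Return the building whose range contains the request IP, or None."""
    ip = ipaddress.ip_address(ip_string)
    for building, network in BUILDING_RANGES.items():
        if ip in network:
            return building
    return None

def shuttle_line(ip_string, minutes_away):
    """Build the display line for in-building requests; empty string otherwise.

    minutes_away maps a building name to an estimated arrival time in minutes
    (in a real service, computed from the shuttle positions).
    """
    building = building_for_ip(ip_string)
    if building is None or building not in minutes_away:
        return ""
    return ("The next SafeRide shuttle should arrive here in about %d minutes."
            % minutes_away[building])

estimates = {"library": 10}  # only Library arrival logic exists so far
print(shuttle_line("192.0.2.17", estimates))         # request from inside the Library
print(repr(shuttle_line("203.0.113.5", estimates)))  # request from elsewhere
```

Other departments could extend the same scheme by adding their own ranges and arrival estimates.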

Public computer availability

Imagine you're a student. You need to get some good work done and know if you stay in your dorm room this evening you won't get that work done -- there are just too many distractions. So you want to go to the Library. You have a desktop computer, or maybe just don't feel like lugging your laptop in the rain, and you know the Libraries have public clusters. Problem is -- it's getting close to midterms and sometimes the clusters get pretty full. Wouldn't it be great, I mean, really, really great, to be able to access a web-page that shows public cluster availability across campus?

I've talked with some CIS folk about this and found individuals who are working hard to realize this goal. They do have software that can detect the 'in-use' status of each terminal in the clusters, and last I checked (in November, I think) had noted that the software had upgraded its web-display capability, with which they were experimenting. However, public web display of cluster availability is as of this writing only accessible... from cluster machines. But the hope is that this information will eventually be made more public. That's great, but I'd like to take the data a step further, and create an API to the data. The reason is that if the data were also exposed via an API in addition to a web-page, I could solve more specific problems in a targeted way. For instance, one of our Libraries has 15 floors, with public computers available on multiple floors. Wouldn't it be terrific if a student entering that Library could glance at a display screen and see the relevant computer availability (with floor numbers listed instead of generic cluster IDs) for just that building? An API would allow that.
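To illustrate the kind of targeted display an API would allow, here is a small sketch. The record layout, building names, cluster IDs, and the cluster-to-floor mapping are all invented; nothing here reflects the actual software's data format.

```python
# Hypothetical API payload: one record per public terminal.
terminals = [
    {"building": "sci-library", "cluster_id": "C07", "in_use": True},
    {"building": "sci-library", "cluster_id": "C07", "in_use": False},
    {"building": "sci-library", "cluster_id": "C12", "in_use": False},
    {"building": "main-library", "cluster_id": "C01", "in_use": False},
]

# Hypothetical local mapping from generic cluster IDs to floor labels.
FLOOR_FOR_CLUSTER = {"C07": "Floor 3", "C12": "Floor 9", "C01": "Floor 1"}

def free_by_floor(records, building):
    """Count free terminals per floor for a single building."""
    counts = {}
    for rec in records:
        if rec["building"] != building:
            continue
        # Fall back to the raw cluster ID if no floor label is known.
        floor = FLOOR_FOR_CLUSTER.get(rec["cluster_id"], rec["cluster_id"])
        counts.setdefault(floor, 0)
        if not rec["in_use"]:
            counts[floor] += 1
    return counts

for floor, free in sorted(free_by_floor(terminals, "sci-library").items()):
    print("%s: %d free" % (floor, free))
```

The campus-wide web page and the single-building lobby display would then just be two different consumers of the same data.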

I have other ideas as well, but this gives a good flavor of how in the future, as we meet Library needs, we might be able to offer very useful API data to developers across campus.

To close, an exhortation... In each of these three situations, I speak of creating an API from existing publicly available electronic data. My excitement about creating and then utilizing these APIs for user-services is evident. But really, I should not have to create the APIs; I should be able to spend my time building the useful services for the Library and our campus that the APIs allow. So to all: if you know anyone creating any web-information -- encourage that person to expose their data not only via a 'regular' web-page, but also in a predictable structured way that can make its re-use easy. And to anyone purchasing any vendor-service that offers electronic information, demand that the service offers an API to the data.

Feed interfaces & urls

sunday, february 10th, 2008 2:46pm

I'm going to add feeds to the site, which django makes easy to do. But I want to think about the urls for the notes feeds. I had been thinking I'd manually create a feed that can flow to planet.code4lib.org that would have been a combination of tags (at the moment 'user-context' and 'api') so that things irrelevant to code4lib don't show up. But it'd be nice to generalize this, so that anyone coming to the site can easily select multiple tag-categories and get a feed from that.

So I want to think beyond an obvious simple elegant system for single tags. (Before the idea occurred for multiple-tag feeds the url for tag 'user-context' would have been 'http://bspace.us/notes/feeds/user-context' -- but that's not extensible.)

Tripod new titles list

The tripod newtitles list has a great interface for selecting multiple categories. Selecting a few options returns a fully-parameterized url, nicely explicit if a bit busy.

For the notes feeds, I'd like to offer combined-option feeds in two ways: as parameterized, but also in a tinyurl fashion, like:

http://bspace.us/notes/feeds/arzq

If that url returned (in addition to feed info, obviously) a documentation link, the documentation could note that the link

http://bspace.us/notes/feeds/arzq/info

...would return, among other things, the explicit url.
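One way the two url styles might relate is sketched below. Everything beyond the base url is an assumption: the `tags` query parameter, the function names, and the choice to derive the short key from a hash of the tag list (the 'arzq' above is just an example and would not be produced by this scheme).

```python
import hashlib
from urllib.parse import urlencode

BASE = "http://bspace.us/notes/feeds/"

def normalize(tags):
    """Sort and de-duplicate tags so the same selection always gives the same urls."""
    return sorted(set(t.strip().lower() for t in tags if t.strip()))

def explicit_url(tags):
    """Parameterized form; the 'tags' parameter name is an assumption."""
    return BASE + "?" + urlencode({"tags": ",".join(normalize(tags))}, safe=",")

def short_key(tags, length=4):
    """Derive a short lowercase key from a hash of the normalized tag list."""
    digest = hashlib.sha1(",".join(normalize(tags)).encode("utf-8")).digest()
    return "".join(chr(ord("a") + b % 26) for b in digest[:length])

def short_url(tags):
    """Tinyurl-style form of the same feed."""
    return BASE + short_key(tags)

selection = ["user-context", "api"]
print(explicit_url(selection))
print(short_url(selection))
print(short_url(selection) + "/info")
```

A four-letter hash-derived key can collide, so a real implementation would more likely store each key and its tag list in a database table and look the explicit url up from there.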

I'll think more on this and look for other examples of interestingly-crafted feed interfaces and urls.

Moving code onto the network

saturday, february 9th, 2008 10:46am

In 2004, while in my masters program, deeply immersed in java object-oriented programming, I saw the potential benefits of code re-use that classes offer. I envisioned over time building up libraries of class-objects; by accessing them in future projects, I expected to be more and more productive.

Code-reuse never quite worked out that way, though. What I've tended to do for new projects has been to copy a similar class from a previous project, paste it into the new project, weed out unnecessary attributes and methods, and add new code. In a way this makes sense: though I lose out on 'pure' code-reuse, I gain by having all code for a project together. That's nice for version-control and portability, and isolation of concerns in that I don't have to worry that a change in a class in one project will have unintended consequences in another project.

But reading a while back about service-oriented-architecture, and shortly thereafter having a need to code a couple of lines in python that I had just coded in php a day or two earlier -- the benefits of moving code into RESTful web-services, that is: moving it onto the network, became apparent.

I do that all the time now. Just last week I had a need to convert between 10 and 13-digit isbns -- for the second time in a recent project, so rather than coding the conversion directly in the program at hand I put it into a webservice.

http://sisko.services.brown.edu/easyborrow/isbn_converter/0688052304/
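For readers curious about the conversion itself, a minimal sketch of the check-digit arithmetic follows. It is not the code behind the web-service above, whose internals and response format aren't shown here; the function names are illustrative.

```python
def isbn10_check_digit(first9):
    """Check character for the first nine digits of an ISBN-10 (may be 'X')."""
    # Weights run 10, 9, ... 2 across the nine digits.
    total = sum((10 - i) * int(d) for i, d in enumerate(first9))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)

def isbn13_check_digit(first12):
    """Check digit for the first twelve digits of an ISBN-13 (weights 1, 3, 1, 3...)."""
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)

def convert_isbn(isbn):
    """Convert ISBN-10 -> ISBN-13, or a 978-prefixed ISBN-13 -> ISBN-10."""
    digits = isbn.replace("-", "").replace(" ", "").upper()
    if len(digits) == 10:
        core = "978" + digits[:9]
        return core + isbn13_check_digit(core)
    if len(digits) == 13 and digits.startswith("978"):
        core = digits[3:12]
        return core + isbn10_check_digit(core)
    raise ValueError("expected an ISBN-10 or a 978-prefixed ISBN-13: %r" % isbn)

print(convert_isbn("0688052304"))     # the ISBN from the url above
print(convert_isbn("9780688052300"))  # and back again
```

Only 978-prefixed ISBN-13s have an ISBN-10 equivalent, which is why the sketch rejects anything else.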

In this shift, I've finally realized that goal of code reuse, while still being able to maintain the version-control and isolation of concerns benefits of focusing on my specific project at hand.

The book 'RESTful Web Services' by Richardson and Ruby, while a bit dense, offers good insights on creating web-services (example: versioning). At some point, I'd like to come up with standards for Brown Library (and/or campus-wide) web-services. Examples: specifying versioning in the url, a documentation url in the returned data, and a url in that documentation of all APIs/web-services the department offers.

For now, though, the simple shift toward moving code out of individual projects and onto the network has been extremely rewarding.

User context

saturday, january 26th, 2008 6:39pm

I recently organized a meeting of some forward-thinking folk to brainstorm about what kinds of cool library things we could do if we knew more about a user's context. I also put up a wiki-page to help the brainstorming process.

This stems from a requirement for a project: I had to be able to access a particular barcode related to a user. Turned out the only way of getting at this barcode was to first get another piece of information, and then use that first piece of information to call an API that returns a bunch of info about a user, including the barcode. Fine; did that; got the barcode and used it for a tunneler I built. I then went to the Access 2007 conference, a terrific library programming/technology conference, where, among other terrific presentations, I heard Mark Leggott speak about the repository he set up at the University of Prince Edward Island (UPEI). He mentioned the importance of understanding a user's 'context'. Something clicked, and the general implications of what our team had achieved by being able to tie a user's log-in to this API-info about the user became clear.

This is nothing new in the internet business world; Amazon has been trailblazing this path for years. But given that authentication has mostly been a simple 'boolean' system in our Library webapps of just determining whether or not a user is permitted to access a site -- this opens up worlds of exciting possibilities. Already we've implemented a proof-of-concept 'drop box' that determines, from login, the user's 'type' (faculty/staff/student) and department and uses this data to customize the page displayed after login. Exciting stuff!

Vendor API Manifesto

thursday, december 6th, 2007 11:02pm

[I wrote this early in the Fall of 2007 and circulated it to folk in the Library who were attending meetings at which vendors were advertising their wares.]

Software products are created, understandably, primarily to meet existing needs. There are varied bodies of thought as to how much a software product should be designed to meet 'future possible needs'.

At certain points in recent history, it may have been reasonable to design the sole interface to a system assuming that the user of the interface would be a person using a web-browser.

Though APIs (application programming interface) have been around for ages, the trend toward programmers wanting to access internal and external systems via APIs has accelerated tremendously over the last few years.[1] As a programmer for a creative web-services department in a creative Library, I'm part of this trend. Our team's need to be able to programmatically access systems has increased dramatically. Fortunately, a few vendors such as Ex Libris understand this and have built possibilities for programmatic access into their products. But many closed systems remain.

To managers and directors making purchasing decisions, I urge that a top-level purchasing consideration be whether the vendor's product offers an API to the information it provides (in addition to any built-in web interface). The simple reason is that a web presentation of information is designed for a single purpose: for a user to interact with the system via a browser. An API allows the system's data to be accessed in any way we see fit, now or in the future.

A concrete example for any reader not familiar with the notion of an API...

Our team is currently developing a system to simplify the process of obtaining a book through interlibrary-loan services. In order to do this we have been able to automate the process of searching a consortial web-catalog for an item, and requesting that item. But the only method of doing this involves creating a program which essentially mimics a browser, automatically simulating clicked buttons and links and reading the resulting HTML of the consortial catalog's web interface.

This works, but is terribly fragile: if the design of a web-page changes, our program may no longer work until it is reconfigured to understand the new design.

What we absolutely need (in addition to the existing web interface) is a catalog-service (the API) which would allow a defined http request to be sent to a URL that will allow a search to be performed, or an item to be requested, etc. (That http request would come from a program our team has written -- instead of from a user sitting at a browser.) Each request to this API would return predictable documented structured information (XML is one standard; there are others). Our team's program would then be able to automatically process this information.
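As an illustration of how little code is needed once structured data is available, here is a sketch that processes a made-up XML response. The element names and the sample data are invented for the example and do not describe any vendor's actual API.

```python
import xml.etree.ElementTree as ET

# Hypothetical response body from a catalog-search API; element names are invented.
SAMPLE_RESPONSE = """<searchResult>
  <query type="isbn">0688052304</query>
  <item>
    <title>Example Title</title>
    <available>true</available>
  </item>
  <item>
    <title>Example Title (large print)</title>
    <available>false</available>
  </item>
</searchResult>"""

def available_titles(xml_text):
    """Return the titles of items the (hypothetical) API reports as available."""
    root = ET.fromstring(xml_text)
    return [
        item.findtext("title")
        for item in root.findall("item")
        if item.findtext("available") == "true"
    ]

print(available_titles(SAMPLE_RESPONSE))
```

Because the structure is documented and predictable, a redesign of the vendor's web pages would not touch this code, which is exactly what the browser-mimicking approach cannot promise.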

It is worth emphasizing that I am not asking for a 'whole new program' from the catalog vendor. A system's existing internal program logic that produces the information for the regular web data-stream is applicable to production of the alternate API data-stream. Yes, it takes thought and work to create a good and secure API and document it -- but an API, essentially, presents the same data as a web interface, in a simpler format. The mind-shift in offering an API is often larger than the work-shift.

Finally, about interacting with vendors regarding this issue... Vendor sales people aren't the developers, and it sounds like I am asking for something that vendor developers would be more knowledgeable about. But I've seen different vendor sales representatives at workshops and conferences, and the representatives for products that provide APIs have universally very clearly understood the importance of this issue. Thus if a product representative does not seem to understand this important feature, I would have significant concerns with the product.

--

Notes

[1] Key aspects of this trend are articulated in this seminal article:
http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html

Welcome

friday, november 9th, 2007 6:21am

I'm Birkin James Diana, and this is my personal website: a kind of experimental technology room.

I'm a programmer & web-developer for the Brown University Library, a terrific institution doing cool work. I code primarily in the languages python, java, and php, and increasingly (and happily) use django. My interests include extreme programming, test-driven development, instant-runoff-voting, weighted-randomization, solar power -- the list is long.

One reason for this site is to have a venue to focus and disseminate a sliver of the numerous ideas I have about how to make the world a better place, mostly in a geeky web-programmer realm. Another reason is to deepen my knowledge of the django web-framework. There's a tension here between using a pre-existing solid and feature-laden service like wordpress.com to communicate ideas, and 'rolling my own' site to deepen learning.

General email address, birkin.diana@gmail.com; work-specific email address, birkin_diana@brown.edu.

Personal statement

monday, june 9th, 2003 6:41pm

[Yes, that date of June 9, 2003 is correct. I just today (Saturday, February 16, 2008) ran a disk-search on some text and saw an old ClarisWorks-format file named 'statement' that, curious, I opened. It's a statement that was part of my application to The Persons School of Marlboro College (since renamed The Marlboro College Graduate Center) -- to the Masters of Science in Internet Engineering program. The eighth paragraph, where I convey a sense of what I would like to do in the future, struck me; I hadn't remembered writing that. Interesting how life works out.]

For over fifteen years I have been captivated by the way in which computers have created possibilities for people to work together. In the 1980s I attached a 300-baud modem to my first Mac, and discovered a world of local bulletin-board systems as well as a national service called GEnie. Though GEnie used a text interface, it hosted a game which allowed members to engage in aerial combat via downloaded software. I was fascinated by this union of a graphical front-end to underlying networking.

This interest in computers connecting people led me to create a local BBS I ran for two years which focused on social change issues and introducing non-technical folk to the benefits of email and discussion groups.

My immersion in the design and logic of BBS-hosting software primed my move from desktop publishing to database programming. I created a discussion area on America Online (the 'FoxBASE/Mac Coffeehouse') devoted to the exchange of tips and techniques, and learned first-hand how powerful a learning tool a national network can be. It was during this intensive period of dBASE programming, working on a series of complex multi-user systems, that I learned programming constructs (memory variables, loops, conditional branching, etc.) that have proven invaluable to this day. I have since created a few small educational programs using REALbasic, an object-oriented programming environment similar to Visual Basic.

For years I have dabbled with web-page creation, but my interest in the web has grown significantly as I have begun to experiment with moving beyond static pages.

My first dynamic web project was the creation of an automated script that twice a day exported data from multiple office databases to text-files, massaged these files in a text-editor, and then uploaded them to a webserver where the new data was displayed via server-side-includes. Exciting, but still not interactive.

Over the past year, I have begun to learn PHP and MySQL, and have created an initial version of a user-updated web calendar for a youth group I work with. I find this work extremely compelling; my desire to do more work like this is what inspired me to apply to the MSIE program.

Nearly all of my computer knowledge has been self-taught. While I continue to value learning from books and user-groups, I am eager to return to school, to immerse myself in learning about multiple aspects of Internet-engineering, from packet-transmission to programming and database environments. And I look forward to integrating this knowledge through the Capstone Project.

I want to work on creative, interesting projects that would utilize this knowledge I will have gained. One vision is to work in a collegial environment such as Brown University in a technology-group setting where I could implement innovative, useful technology initiatives assisting students, faculty, and administrators.

I have also been mulling over very preliminary ideas for a Capstone Project, ideas evolved from my early appreciation for how the Internet has created new possibilities for people to interact and work together. One example, stemming from an interest in alternative methods of voting and democracy, is to work with a political science professor at Brown to focus on a Rhode Island community to test Internet campaigning and voting (and to integrate into this experiment alternative voting strategies such as the 'single transferable vote').

I have a strong interest in policy decision-making and in improving systems. However, my primary goal is to directly build and create systems and solutions. This is why I am excited about applying to the MSIE program, and why I hope to be a part of The Persons School experience this Fall.

-Birkin James Diana, MSIE applicant for Fall 2003