Opportunities of benign-neglect

wednesday, february 24th, 2010 7:37am

Cathy Marshall of Microsoft Research gave a keynote at the wonderful code4lib 2010 conference that provided a useful nudge to my thinking about repository layers.

I've suggested elsewhere that university libraries contemplating a repository should consider developing policies around repository 'layers'. This notion involves both an inner long-term, high-guarantee archival layer -- and an outer services-oriented work-space layer. Reasons for the archival layer are obvious. Perhaps less so are reasons for and benefits of the work-space layer: it fulfills a library mission to further scholarly work; it strengthens the library's position as a central part of the academic campus community; it creates opportunities for valuable work to be moved easily into the archival layer.

Though my Library doesn't conceptualize our repository in this way, it's compelling enough that I think about this layered approach regularly. Given some exciting video initiatives at my University, much of my recent work-space layer thinking has focused on how to avoid the possibility of having precious library disk space overwhelmed with hypothetical services-layer low(er) quality materials. Strategies I've considered to deal with this concern are combinations of limiting the size of an entity's (person/department) work-space, and/or limiting the number of years items may remain in the work-space. Given my strong belief in providing useful and friendly user-services, in this 'limiting' scenario, we would provide terrific charts and notifications which would allow work-space users to easily monitor their usage of this temporary, useful space -- and provide tools and Library staff assistance to easily move appropriate items into the archival layer.

But regardless of the intention to have this work-space be used productively, the more control we give users over their Library work-space, the more likely it is that a significant portion of it would fill up with materials that exist simply because it's more of a hassle to delete things than it is to neglect them -- one of Marshall's key points.

While Marshall specifically noted the problems of benign-neglect as a user-strategy for handling materials, she also noted that benign neglect offers opportunities. This was the nudge. I'm finding this notion of opportunities fascinating to reflect upon; it offers new realms for thinking about interesting services that could be built for this work-space layer.

The simple accretion of data from benign neglect suggests the now-common mining strategy associated with usage-data, popularized by amazon: "you may also be interested in this". An acquaintance recently told me about 'mallet', software that can mine texts to discern topics. It would be a worthy experiment to use such a tool to offer repository users an optional discovery service based on their text-based work-space materials.

Two additions to Apple's iPhoto application in the last year or so suggest other possibilities. 'Faces' scans a user's iPhoto library, using pattern-recognition routines to create groupings of people. 'Places' scans the library and extracts geo-location coordinates if available, and, if I recall correctly, timestamp data, to create a map view over time of photo-locations.

Other scans could be run on work-space data, looking for patterns of government data-sets or citations. And combinations of embedded metadata such as geo-location and mime-type and date could be gathered, so that if, for example, a pattern of images taken at a certain location on a certain date was detected, not only could auto-grouping of those items be presented, but external sources such as flickr could be queried as well, offering the user the ability to see other external views of this 'event'.
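To make the location-and-date idea a bit more concrete, here is a minimal sketch in Python. It assumes each work-space item has already had its embedded metadata extracted into a simple dictionary; the field names ('name', 'lat', 'lon', 'taken') and the sample items are hypothetical, and rounding coordinates is only a crude stand-in for real clustering (two photos a few feet apart can land in different cells).

```python
from collections import defaultdict
from datetime import datetime


def group_by_place_and_day(items, precision=2):
    """Group items whose rounded coordinates and calendar day match.

    `items` is a list of dicts with hypothetical keys 'name', 'lat',
    'lon' and 'taken' (a datetime); `precision` is the number of
    decimal places kept when rounding coordinates.
    """
    groups = defaultdict(list)
    for item in items:
        # Skip items lacking the embedded metadata this grouping relies on.
        if item.get("lat") is None or item.get("lon") is None or item.get("taken") is None:
            continue
        key = (
            round(item["lat"], precision),
            round(item["lon"], precision),
            item["taken"].date(),
        )
        groups[key].append(item["name"])
    # A single item is not much of an 'event'; keep groups of two or more.
    return {key: names for key, names in groups.items() if len(names) > 1}


# Made-up sample data, for illustration only.
items = [
    {"name": "a.jpg", "lat": 41.8268, "lon": -71.4025, "taken": datetime(2010, 2, 20, 10, 5)},
    {"name": "b.jpg", "lat": 41.8271, "lon": -71.4031, "taken": datetime(2010, 2, 20, 14, 30)},
    {"name": "c.jpg", "lat": 40.7128, "lon": -74.0060, "taken": datetime(2010, 2, 21, 9, 0)},
    {"name": "notes.txt", "lat": None, "lon": None, "taken": None},
]
events = group_by_place_and_day(items)
```

Each key of the result (rounded latitude, rounded longitude, date) could then serve as the basis for an auto-grouping display, or as the parameters of a query to an external source.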

Many of these scan/mining ideas would also be useful to apply to the repository as a whole. Such scans could offer both automated randomized general-discovery displays, as well as offer researchers additional focused discovery-views to permitted items. But to the extent that such services enhance the quality of users' work-space experience, it might help to keep the materials in the work-space more relevant: using benign-neglect to minimize benign-neglect.

Birth tips from a 50 year-old guy

sunday, january 24th, 2010 8:23pm

That title doesn't quite sound right, does it?

As a father of two, I've had occasion, over the years, to offer my 'three tips' to first-time moms-to-be, and have received positive feedback. Recently I've been thinking about birth due to recent family and friend births, and figured I'd write them up...

Childbirth classes

Think of tips & techniques as tools, not rules.

I highly recommend childbirth classes; I really cannot say enough positive things about them -- even refresher ones. However, my advice for moms (and partners): view the tips and techniques taught as 'possible tools', not 'rules to live up to'. Specific moms may or may not find specific tips and techniques useful. I note this because I've occasionally talked to moms who have been disappointed with their 'performance' during birth -- often because they had an expectation from a class of how the birth 'should have' progressed. One mom told me she was embarrassed at how loud she was, thinking that if she had performed the class's breathing and relaxation techniques 'better', she wouldn't have needed to yell. My thought: if yelling or grunting works for a particular woman -- that is fine, whether or not it is in the standard technique playbook.

Visitors

Pick a close friend or family member to not visit until after four weeks have elapsed.

This is mostly for the first kid. It comes from a midwife, and was dramatically confirmed by our experience with our firstborn. She told us that after the child is born, friends and family will, understandably, want to come and visit and help out. By the time a month has elapsed, everyone has come and gone, and that is when the effects of sleeplessness can become more pronounced, and help is most appreciated/needed. We planned to have one of our good friends fly in around five weeks after the birth of our first child. Our baby was colicky until about six weeks, and our friend's simple willingness to do laundry, to organize pizza deliveries, and to simply watch the baby while we both took a short walk was deeply, deeply appreciated.

Breastfeeding

It's not innate! Learn, and ask for advice.

I suspect that in times when multiple generations of families lived together, or for those women who have had a baby after a bunch of their women friends have, this fact would not be the surprise that it was to us. However, we were among the first of our set of friends to have children, and so hadn't had the experience of extensive conversations.

I had naively assumed that breastfeeding would be some sort of natural, somewhat instinctive process, but our midwife encouraged us to go to a breastfeeding class. The instructor basically noted that given that our society is no longer made up of multi-generational families living together, many are not aware that the process can be difficult for mom and baby to get used to. In particular, she noted that because first-time moms are understandably concerned about the baby, it's easy for an unconstructive cycle to quickly develop: mom is worried that baby isn't getting enough milk -> feedings thus become more stressful -> feedings thus become more difficult -> mom worries more. The instructor showed the (clothed) moms helpful holding techniques, gave information about dealing with sore nipples (and noted how commonly that condition occurs among first-time moms especially), and gave a hotline number for information and support and even a home-visit for a bit of coaching. Breastfeeding mostly went smoothly for mom & baby, but during a few difficult periods, the information and the normalization of the problem from that class were invaluable. (I trust that it's not even necessary here to spend any time noting why breastfeeding is a Good Thing.)

By the time we had our second baby a few years later, I noticed, in our brief hospital stay, much more information being offered about breastfeeding tips and techniques. I hope this trend has continued. Congrats to all moms-to-be and dads-to-be out there!

Fedora / Shibboleth authorization solution

saturday, january 16th, 2010 8:06am

I don't work directly on Fedora (the repository software), but am very familiar with it due to my work with a programmer who does, and because I've worked on a django front-end for ingestion of items into fedora, as well as fedora-apis. My role in fedora work is more akin to the 'corner-man' in boxing. Together the boxer and I strategize about the opponent, his defenses and threats to our plans, and devise approaches to deal with evolving challenges. We cross ourselves, the fedora programmer goes into the ring, and between rounds I provide moral support, bandage wounds, and, because of my distance from actual battle, sometimes have useful ideas for the next round. This analogy's negativity toward the software is appropriate; to use another: We've bought a car that, in hindsight, I wouldn't recommend to others, but that we're committed to getting some terrific mileage out of.

So, it's been a tough fight, but our boxer is quick, has impressive endurance, and we believe we'll come out on top.

Fedora authorization is one round in which we think we've scored well.

Fedora comes bundled with an authorization piece called XACML. I don't know if it's due to xacml, or fedora's implementation of it, but from what I gather indirectly, it's terrifying enough that few use it, and it is, in fact, scheduled to be augmented in a future release with a new Great Hope: FeSL.

But if you want to go into production now, what to do? The dearth of published authentication/authorization 'live' solutions is why, as I understand it, so many fedora installations are either completely open (all objects public), or completely locked down for internal use.

We've assumed we would use some sort of wrapper around fedora, to authenticate against Shibboleth, with which our university is slowly moving forward. Shib's lack of logout capability, and the resultant assumption that users will happily quit their browser to logout, would seem quaintly amusing if it weren't true -- but that is another topic entirely, and single sign-on is certainly convenient. Not long ago we began to tackle how, specifically, to implement shib/fedora authorization.

Recently someone described to me an authorization approach the muradora folk took. I haven't looked at any documentation myself, but I was told that they wrote a servlet filter that takes a submitted name and password, and passes them to a non-centralized custom ldap server that exists only for the purpose of allowing fedora's built-in ldap-xacml code to handle authentication. (For those unfamiliar: a java servlet filter acts as a front layer of a java webapp through which incoming requests and outgoing responses must pass, and where they can be modified.)

A few of us heard this and had divergent reactions. It sounded like a hack, which caused some to dismiss the approach. Personally, being quite partial to hacks that work around monolithic software obstacles, I thought the hack smacked of ingenious creativity and was worth further examination. I was indulged; the result: our corner has devised an approach that initial testing indicates will work well.

First, some necessary background info...

  • Our University shib implementation is integrated with Grouper. I think grouper is, or at least historically has been, a separate project from shibboleth, but they work together brilliantly. Upon shib-login, a list of the groups to which the user belongs is accessible to the server via the shib 'is-member-of' header field.

  • Our implementation of fedora item ingestion involves creating a METS record that contains a bunch of item-info -- including a rights segment. The rights segment contains a series of entries, each one listing an identity (a shib is-member-of group) and a permission. Example (content, not format): identity='chemistry-department' & permission='view_item'

  • The mets record is handed to an ingester that converts the mets xml to FOXML, then fedora grabs the object (we're using the 'managed' option at the moment), and the java messaging built into fedora fires off a message to a listener that indexes (via Solr) parts of the foxml record, including the rights information.
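As a rough illustration of the indexing step in that last bullet, here is a small Python sketch of how a listener might flatten rights entries into fields suitable for a search index. This is not our actual listener code: the function name, the 'edit_item' permission, the 'repository-staff' group and the 'demo:1' identifier are all hypothetical, and a plain dictionary stands in for a real Solr document.

```python
def rights_to_index_fields(rights_entries):
    """Flatten (identity, permission) pairs into {permission: [identities]}.

    Each permission becomes a multi-valued field listing the identities
    that hold it, which is the shape a later lookup by object would need.
    """
    fields = {}
    for identity, permission in rights_entries:
        identities = fields.setdefault(permission, [])
        if identity not in identities:  # ignore repeated entries
            identities.append(identity)
    return fields


# Hypothetical rights segment content (content, not format).
rights = [
    ("chemistry-department", "view_item"),
    ("public", "view_item"),
    ("repository-staff", "edit_item"),
]

# A plain dict standing in for the document sent to the index.
doc = {"id": "demo:1"}
doc.update(rights_to_index_fields(rights))
```

With the rights stored per object in this way, answering "who may view this item?" becomes a single lookup on the object's identifier.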

So, our approach: create a fedora servlet filter that reads the shib groups/identities, then does a solr search to see if the object being requested has a 'view' permission for any of the identities in the request's shib is-member-of header. If so, the request is allowed through; if not, it is blocked. If no shib-identity is found, the servlet filter will only yield objects with 'public' view_item permissions.
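The decision the filter makes can be sketched in a few lines. The real thing would be a java servlet filter; this is Python, purely to show the logic. Several things here are assumptions rather than facts about our setup: that the is-member-of header is a semicolon-separated list, that logged-in users may also see 'public' items, and the names parse_member_of, may_view and lookup. An in-memory dictionary stands in for the solr search.

```python
def parse_member_of(header_value, separator=";"):
    """Split an is-member-of header value into a set of group names.

    The semicolon separator is an assumption; a missing or empty
    header yields an empty set.
    """
    if not header_value:
        return set()
    return {group.strip() for group in header_value.split(separator) if group.strip()}


def may_view(object_id, member_of_header, lookup_view_identities):
    """Return True if the request should be allowed through.

    `lookup_view_identities` is any callable returning the identities
    holding 'view_item' on the object (a stand-in for the solr search).
    """
    allowed = set(lookup_view_identities(object_id))
    identities = parse_member_of(member_of_header)
    # Assumption: everyone, logged in or not, counts as 'public'.
    identities.add("public")
    return bool(allowed & identities)


# Hypothetical index contents: object id -> identities with view_item.
INDEX = {
    "demo:1": ["chemistry-department"],
    "demo:2": ["public"],
}


def lookup(object_id):
    """Unknown objects have no viewers, so requests for them are blocked."""
    return INDEX.get(object_id, [])
```

For example, may_view("demo:1", "chemistry-department;staff", lookup) allows the request, while the same object requested with no shib header is blocked; "demo:2" is visible either way.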

The beauty of this is that fedora-access can be fully open to the internet while still allowing authorized access for those objects that require it. Further, this solution offers reasonable hope that it will survive fedora upgrades, since the servlet, though a part of the fedora webapp, is somewhat of a separate layer in front of the app. Further, by adding more granular permissions (at the moment permissions are at the object-level; they could be at the data-stream level) -- or simply by a bit of extra programming in the servlet-filter -- we could allow, say, the public to access low and medium-resolution images, but allow, say, faculty to access high-resolution images.
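A hedged sketch of what data-stream-level permissions might look like, again in Python for brevity rather than as servlet-filter java. We have not built this; the data-stream names (LOW_RES, MEDIUM_RES, HIGH_RES), the 'faculty' group and the function name are hypothetical, and treating every requester as 'public' is an assumption.

```python
# Hypothetical per-data-stream rights for one image object:
# data-stream name -> identities allowed to view it.
DATASTREAM_RIGHTS = {
    "LOW_RES": {"public"},
    "MEDIUM_RES": {"public"},
    "HIGH_RES": {"faculty"},
}


def viewable_datastreams(identities, datastream_rights):
    """Return the sorted names of the data-streams these identities may view."""
    # Assumption: every requester, logged in or not, counts as 'public'.
    effective = set(identities) | {"public"}
    return sorted(
        name
        for name, allowed in datastream_rights.items()
        if allowed & effective
    )


anonymous_view = viewable_datastreams([], DATASTREAM_RIGHTS)
faculty_view = viewable_datastreams(["faculty"], DATASTREAM_RIGHTS)
```

Here an anonymous request would be offered only the low and medium-resolution data-streams, while a request carrying the 'faculty' identity would also be offered the high-resolution one.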

I'll keep this paragraph updated... Our intrepid programmer has figured out where to insert the custom servlet filter, has worked with our systems person to hook up an initial apache/tomcat connection so as to allow the shib installation on apache to pass its headers through to tomcat, and confirmed the filter's detection of the shib identity header information. A nice side-effect of installing shib on apache rather than tomcat directly is that we can allow programmatic access to port 8080.

(some technical info and some code: here)

The bell has rung; the next round begins. We cross our fingers, and the programmer heads into the ring once again.