God's Word | our words | meaning, communication, & technology | following Jesus, the Word made flesh
I've moved to a new blogging platform (goodbye Radio Userland, hello WordPress).
But if you read through an RSS aggregator (this is really important, so pay attention):
If you read directly from the website, everything will work as before at my preferred URL, http://www.semanticbible.com/blogos/. The new site includes several syndication buttons that make it easy to add Blogos to your Bloglines, MyYahoo!, or other readers.
If you have any problems with this, please send me (sean) an email at semanticbible daht com. I don't want to lose any readers in the transition (there aren't that many to start with!).
In a comment on my recent thoughts on semantic search, Matt asks a reasonable question: "Wouldn't Louw-Nida help?" Since i've recently gotten a copy of Logos 3 Scholar's Library: Silver (i'll have a lot more to say about that later, but here's the preview: it's a fantastic resource), i tried it out. For this particular question, the answer appears to be no.
Humility is under 88/G, Moral and Ethical Qualities and Related Behavior/Humility (note this is a conceptual label for the passage: the word humility doesn't actually occur). Related words here would include:
This isn't too surprising: Louw-Nida is a lexical resource, but the fundamental issue here (and the point of my post) is that there are lots of significant semantic concepts above the level of words. That's exactly what makes notions like "topic" slippery in practice.
xpound.org is a new Web 2.0 site that provides passage search, blogging, and social connections, but with an interesting new twist: Bible tagging, along the lines of del.icio.us. (I'm not sure whether it's pronounced with equal stress, like "slashdot", or as "ex-POUND", like the verb.) The basic idea of tagging is that, rather than relying on a top-down, authoritative organization and labeling of knowledge, people can simply attach whatever labels make sense to them, in a bottom-up, unstructured (and, some would say, chaotic) fashion. The natural advantage of this kind of folksonomic tagging is that, at internet scale, it can overcome a lot of the messiness, while highly structured knowledge management approaches don't always scale. As with other tagging sites, there's no guarantee that what somebody tags as, say, africa, will have meaning to anybody else. But it means something to the person who tagged it, and thus becomes a highly personalized way to organize information.
I think using this approach for Scripture makes some sense, and i've blogged about it previously. But i also have some questions. With del.icio.us, the item being tagged is clearly defined: it's a URL. But what's the natural unit for tagging Scripture? Verses are one answer, but they often don't have enough context. Books are generally too large, and chapter divisions don't necessarily line up with the content you'd want to tag. Of course, you can tag arbitrary passages: but here's where the comparison to del.icio.us breaks down. With del.icio.us, others who use the same tags as me can point me to sites i didn't know about. But when different people's tagged passages don't share the same boundaries, aggregation doesn't work nearly as cleanly.
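To make the aggregation problem concrete, here's a minimal Python sketch (not how xpound or del.icio.us actually work) that treats a tagged passage as an inclusive verse range within one chapter. Exact matching, the way del.icio.us pools tags on identical URLs, misses a passage whose boundaries differ by a few verses; scoring overlap with the Jaccard similarity of the verse sets is one possible alternative. The `Passage` class, both functions, the 0.3 threshold, and the two users' taggings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    """A tagged unit: an inclusive verse range within a single chapter."""
    book: str
    chapter: int
    start: int  # first verse, inclusive
    end: int    # last verse, inclusive

    def verses(self) -> set:
        return {(self.book, self.chapter, v) for v in range(self.start, self.end + 1)}


def exact_matches(query, tagged):
    """del.icio.us-style aggregation: only an identically bounded unit counts."""
    return [tags for passage, tags in tagged if passage == query]


def overlap_matches(query, tagged, min_jaccard=0.3):
    """Aggregate tags from any passage whose verse set overlaps the query enough."""
    q = query.verses()
    found = []
    for passage, tags in tagged:
        v = passage.verses()
        if len(q & v) / len(q | v) >= min_jaccard:  # Jaccard similarity of verse sets
            found.append(tags)
    return found


# Two hypothetical users tagging overlapping ranges of Ephesians 5.
tagged = [
    (Passage("Eph", 5, 3, 4), {"gratitude", "greed", "immorality", "impurity", "saint"}),
    (Passage("Eph", 5, 3, 7), {"immorality"}),
]
query = Passage("Eph", 5, 3, 4)
print(len(exact_matches(query, tagged)))    # 1: the Eph.5.3-7 tags are missed
print(len(overlap_matches(query, tagged)))  # 2: 2 of 5 verses shared, Jaccard 0.4
```

The threshold is the design knob: too low and every passage in a chapter pools together, too high and you're back to exact matching.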
Here's an example: a few days ago Josh tagged Ephesians 5:3-4 with five tags: gratitude, greed, immorality, impurity, and saint.
3 But sexual immorality and all impurity or covetousness must not even be named among you, as is proper among saints. 4 Let there be no filthiness nor foolish talk nor crude joking, which are out of place, but instead let there be thanksgiving. (Eph.5.3-4)
If you search on these tags (like gratitude), you'll find this unit. If instead you search by verse for Eph.5.3, you still see the tag for gratitude, presumably inherited from the larger unit that was tagged (though you could argue that gratitude really only applies to verse 4). And of course, the following verses also talk about immorality and impurity, but since they weren't part of the tagged unit, they're not retrieved.
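Here's a small sketch that reproduces both effects, assuming tags are simply copied down to every verse in the tagged range (a guess at the behavior described, not xpound's actual code): verse 3 inherits gratitude, and verse 5 gets nothing. `build_verse_index` is a hypothetical name.

```python
from collections import defaultdict


def build_verse_index(tagged):
    """Map each verse to the union of tags on every tagged passage that contains it."""
    index = defaultdict(set)
    for (book, chapter, start, end), tags in tagged:
        for verse in range(start, end + 1):
            index[(book, chapter, verse)] |= tags
    return index


tagged = [(("Eph", 5, 3, 4), {"gratitude", "greed", "immorality", "impurity", "saint"})]
index = build_verse_index(tagged)
print("gratitude" in index[("Eph", 5, 3)])  # True: inherited from the whole unit
print(index.get(("Eph", 5, 5), set()))      # set(): v.5 was never part of a tagged unit
```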
One of the most interesting new capabilities that del.icio.us creates is knowledge discovery: if i find someone who has bookmarked several of the same sites i have, i can look at their other bookmarks, and often find new sites i was unaware of. This provides a kind of search driven by the collective intelligence of a like-minded community, a really interesting counterpart to typical web search engines. I haven't found this capability in xpound, but it would be a great addition.
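The discovery idea can be sketched as simple user-to-user collaborative filtering: find people whose bookmark sets overlap yours, then rank their other items by how similar those people are to you. This is a generic technique, not anything del.icio.us or xpound is known to implement; the users, passages, and `min_sim` cutoff below are made up for illustration.

```python
def jaccard(a: set, b: set) -> float:
    """Share of items two users have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def discover(me, bookmarks, min_sim=0.2):
    """Rank items i haven't saved by the summed similarity of users who saved them."""
    mine = bookmarks[me]
    scores = {}
    for user, items in bookmarks.items():
        if user == me:
            continue
        sim = jaccard(mine, items)
        if sim < min_sim:
            continue  # ignore users with little overlap
        for item in items - mine:
            scores[item] = scores.get(item, 0.0) + sim
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical users and tagged passages, for illustration only.
bookmarks = {
    "me": {"Matt.20.25-28", "Phil.2.3-8", "Eph.5.3-4"},
    "ann": {"Matt.20.25-28", "Phil.2.3-8", "1Pet.5.5-6"},
    "bob": {"Gen.1.1-5"},
}
print(discover("me", bookmarks))  # [('1Pet.5.5-6', 0.5)]
```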
(Hat tip to the ESV Blog for pointing me to xpound.org)
I'm preparing a new version of the Composite Gospel Index pages, to standardize around the ESV text, and hopefully provide both more usability and more visual appeal. Designing an interface for this data poses some interesting challenges. There's a wealth of different attributes available, and while some (like traditional verse references) are familiar to most Bible students, i'm hoping to get outside the box a bit and do some novel things.
The whole point of the Composite Gospel is to provide a different way to look at the story of Jesus' life, in particular one that is more oriented around stories, many of which are common to multiple Gospels, and to show how they fit into the whole. So i'm hoping to reinforce this in the new interface. Right now there are two ways to access the Composite Gospel. The typical entry point is the Pericope Index: a traditional, single static page listing the pericope ID, title, and references, with hyperlinks to the content pages. It's got a number of faults:
The individual pages themselves have different navigational elements: next/previous pericope, and also next/previous for a given Gospel author. These are okay as far as they go: my major complaint is they don't go far enough. I'm also hoping to add more supplemental information:
It will be a while before i can do all this, though!
I've been searching for some time for the right visual metaphor (and corresponding interface code) to provide a much more visual index to replace the current text-heavy index. It would be great if you could scan a clear visualization of which authors covered a particular story, and how much content there is for it (number of tokens). Likewise, when you've selected an individual pericope, you should have a clear view of where it fits into the entire sequence.

In preparing for this, i got interested in the distribution of sources (an individual author's version) by their size. This graph shows that distribution, binned in groups of 10 tokens; the black trend line smooths it a little further with a moving average (window of 3). There's quite a bit of variety (no surprise), ranging from a single source with just 9 tokens (Luke's description of the beginning of Jesus' Galilean preaching ministry, "And he was preaching in the synagogues of Judea.", Pericope 048: Jesus preaches throughout Galilee) to a single source with 566 tokens (Pericope 119: Jesus prepares the disciples for persecution, found in Matthew). But the sizes roughly approximate a normal distribution (with an elongated tail on the high side): clearly the bulk have from 30 to perhaps 270 tokens, with the bins near the median holding around 30-40 instances each (since i'm binning, this number itself isn't very meaningful). This suggests the cases i need to optimize for: i should be able to fit displays of up to about 270 tokens on something close to a single page view (these days that really means 1024x768 pixels, though surprisingly i still get 15-20% of my visits from people with 800x600 displays).
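For anyone who wants to reproduce this kind of chart, here's a rough Python sketch of the two steps: binning source sizes into groups of 10 tokens, then smoothing the bin counts with a window of 3. The post doesn't say exactly how the trend line was computed (spreadsheet trend lines often use a trailing window), so the centered window here is an assumption, and the sample sizes are hypothetical apart from the 9- and 566-token extremes.

```python
from collections import Counter


def bin_sizes(token_counts, width=10):
    """Histogram of source sizes: bin lower bound -> number of sources."""
    hist = Counter((n // width) * width for n in token_counts)
    # Fill in empty bins so the smoothing below isn't distorted by gaps.
    return {b: hist[b] for b in range(min(hist), max(hist) + width, width)}


def moving_average(values, window=3):
    """Centered moving average; the first and last points average fewer neighbors."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical token counts; only 9 and 566 are figures from the post.
sizes = [9, 12, 15, 31, 38, 566]
bins = bin_sizes(sizes)
trend = moving_average(list(bins.values()))
```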
Ultimately, i'd love to have a rich treemap interface to support exploring the data in a variety of different ways (this was the substance of my presentation at the Society of Biblical Literature last year). As publisher Tim O'Reilly notes in a recent post, treemaps are really made to be interfaces, not graphs: their power lies in your ability to interact with them to explore the data. Unfortunately, i don't know how to do this live on my website: i don't have permission to host the University of Maryland's Treemap software that i use myself, and i don't know of a good substitute (O'Reilly's post is about a Rails implementation, but that's outside my current scope).
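For a sense of what a treemap does underneath, here's a minimal sketch of the simple slice-and-dice layout: each level of a hierarchy (say, pericopes, then authors within them) divides its rectangle in proportion to token counts, alternating between horizontal and vertical splits. This isn't the University of Maryland software or O'Reilly's Rails code, and it covers only the layout, not the interaction that makes treemaps useful; the tree is hypothetical except for the 9- and 566-token sources mentioned earlier.

```python
def size(node):
    """A leaf is a number of tokens; an inner node is a dict of children."""
    return sum(size(child) for child in node.values()) if isinstance(node, dict) else node


def slice_and_dice(tree, x=0.0, y=0.0, w=1.0, h=1.0, horizontal=True, path=()):
    """Return (path, x, y, width, height) for every leaf, alternating split direction by level."""
    rects = []
    total = size(tree)
    offset = 0.0
    for label, child in tree.items():
        frac = size(child) / total
        if horizontal:
            box = (x + offset * w, y, frac * w, h)
        else:
            box = (x, y + offset * h, w, frac * h)
        offset += frac
        if isinstance(child, dict):
            rects += slice_and_dice(child, *box, not horizontal, path + (label,))
        else:
            rects.append((path + (label,), *box))
    return rects


# Hypothetical tree: pericope -> author -> tokens ("demo" is a made-up pericope).
tree = {
    "048": {"Luke": 9},
    "119": {"Matthew": 566},
    "demo": {"Matthew": 100, "Mark": 80, "Luke": 120},
}
rects = slice_and_dice(tree)
```

Each rectangle's area is proportional to its token count within the unit square, so the output could be scaled to pixel coordinates and drawn as positioned page elements.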
But Jesus called them to him and said, "You know that the rulers of the Gentiles lord it over them, and their great ones exercise authority over them. It shall not be so among you. But whoever would be great among you must be your servant, and whoever would be first among you must be your slave, even as the Son of Man came not to be served but to serve, and to give his life as a ransom for many." (Matt.20.25-28)
I've been thinking about topic labels for Scripture passages lately: a deceptively simple idea that's quite hard to nail down. The notion of topic includes many different things: a person might be a topic (Jesus talks about John the Baptist in Luke.7.24-30), but every mention of a person probably isn't a topic in quite the same sense (the same passage mentions the Pharisees, but the passage isn't really about them, it simply mentions them). Sometimes key words and phrases are topics ("luxury" is a word in the same passage, and a relatively distinct one at that: it only occurs 4 times in the New Testament). But if that's what you mean by a topic, then word searches will usually find what you want. The toughest cases (and therefore the most interesting ones) are when you don't have a distinctive lexical item for a topic decision.
The classic Librarian Problem is that whatever i call a topic may mean something different to someone else, or fall outside the conceptual schema they're using for searching (Shirky has a nice overview of this). The kind of folksonomic tagging popularized by del.icio.us works well at a personal level (i know what my "facets" tag means to me, even though you may not), and it also works at a larger scale, because enough other people may happen to use the same tags that aggregating them adds value. I expect this kind of tagging for Scripture will start to show up in some interesting ways in the next year under the Web 2.0 rubric.
Here's what got me thinking about this: i was reading Humility by Andrew Murray this morning (highly recommended, by the way), and he discusses the passage above as an example of Jesus' teaching about humility. I'd agree (as would Nave's, and most other topic-oriented indexes): but if you wanted to label such passages in some automated fashion, what evidence would you use? The words "humble" and "humility" are nowhere to be found, and neither are their direct antonyms like "proud". Jesus mentions the contrasting examples of Gentiles who "lord it over them" and others who "exercise authority over them": but these complex semantic constructs aren't easy to take apart (and the first one isn't very typical English: the Contemporary English Version's translation, "order their people around", is arguably more natural). Certainly being the servant of others implies the personal trait of humility, but the relationship is quite abstract.
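A deliberately naive labeler shows the problem: if a topic is assigned whenever one of its cue words appears, this passage never gets labeled for humility. The cue list and the function are hypothetical illustrations, not taken from Nave's or any other index.

```python
import re

# Hypothetical cue words for one topic; not drawn from Nave's or any real index.
CUES = {"humility": {"humble", "humbled", "humility", "lowly", "proud", "pride"}}

MATT_20_25_28 = (
    "But Jesus called them to him and said, \"You know that the rulers of the "
    "Gentiles lord it over them, and their great ones exercise authority over "
    "them. It shall not be so among you. But whoever would be great among you "
    "must be your servant, and whoever would be first among you must be your "
    "slave, even as the Son of Man came not to be served but to serve, and to "
    "give his life as a ransom for many.\""
)


def lexical_labels(text, cues=CUES):
    """Label a passage with every topic whose cue words appear in it."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {topic for topic, cue_words in cues.items() if words & cue_words}


print(lexical_labels(MATT_20_25_28))            # set(): no cue word occurs
print(lexical_labels("the proud and the humble"))  # {'humility'}
```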
Just another argument for why this kind of annotation of Scripture will probably be done the old-fashioned way (by hand) for the foreseeable future ...
I've been putting some of the data behind the Hyper-concordance into MySQL, in preparation for computing some statistics on lexical co-occurrence. Along the way, i've been collecting some numbers that i thought others might find interesting. There are a number of other sources for NT statistics: for example, this page from Prof. Felix Just shows words per verse per chapter per book (in the Greek NT).
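The co-occurrence counting itself is simple once each verse is reduced to its lemmas. In MySQL one common approach is a self-join of a (verse, lemma) table on the verse ID; the hypothetical Python sketch below just shows the same counting logic in memory, counting each unordered pair at most once per verse.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(verses):
    """Count unordered lemma pairs that appear together in a verse (once per verse)."""
    pairs = Counter()
    for lemmas in verses:
        # set() drops repeats like "measure" twice; sorted() gives each pair one canonical order
        for a, b in combinations(sorted(set(lemmas)), 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical verse-level lemma lists.
verses = [
    ["say", "hear", "measure", "measure"],
    ["say", "hear"],
    ["measure", "add"],
]
pairs = cooccurrence(verses)
print(pairs.most_common(1))  # [(('hear', 'say'), 2)]
```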
What's different about the numbers below is that they're based on Hyper-concordance's approach, which groups various inflected forms under their base form (what linguists call a lemma). For example, 'saying', 'says', and 'said' are all pooled under 'say' (as it turns out, the most common lemma in the New Testament, with 1946 occurrences). In the example from the Hyper-concordance home page (Mark.4.24), there are 10 content lemmas (9 of them unique) in this verse of 30 words: "say", "pay", "attention", "hear", "measure" (twice), "use", "still", "more", "add".
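Here's a minimal sketch of the pooling step for the Mark.4.24 example. The lemma table and function-word list are hypothetical stand-ins covering only this verse (a real system would need a full morphological lexicon), and the verse text is assumed to be the ESV wording, which matches the 30-word count above.

```python
import re

# Hypothetical lookup tables covering just this example.
LEMMA_OF = {"saying": "say", "says": "say", "said": "say",
            "measured": "measure", "added": "add"}
FUNCTION_WORDS = {"and", "he", "to", "them", "what", "you", "with",
                  "the", "it", "will", "be"}

MARK_4_24 = ("And he said to them, \"Pay attention to what you hear: with the "
             "measure you use, it will be measured to you, and still more will "
             "be added to you.\"")


def content_lemmas(text):
    """Lower-case, tokenize, drop function words, and pool inflected forms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [LEMMA_OF.get(t, t) for t in tokens if t not in FUNCTION_WORDS]


words = re.findall(r"[a-z]+", MARK_4_24.lower())
lemmas = content_lemmas(MARK_4_24)
print(len(words), len(lemmas), len(set(lemmas)))  # 30 10 9
```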
| | Count | Unique |
|---|---|---|
| terms | 73872 | 6333 |
| base terms | 73872 | 4526 |
| name words (subset of base terms) | 6638 | 593 |
| non-name words (subset of base terms) | 67234 | 3933 |
| singletons | 1444 | 1444 |
| name words (subset of singletons) | 281 | 281 |
"Count" is the actual instances, as opposed to the unique values (which we could call the content vocabulary of the New Testament). Some comments:
Caveats:
(The second word that occurs in both capitalized and uncapitalized forms is much less obvious, though you'll figure it out if you think a lot about it ...)
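The Count and Unique columns in the table above can be computed from a token list along these lines. This is a sketch under assumptions: each token is a (surface form, lemma, is-name) triple, names are flagged by some upstream process, and "singletons" means base terms that occur exactly once; none of that is confirmed by the post. The sample tokens are hypothetical.

```python
from collections import Counter


def summarize(counter):
    """(total instances, distinct values) for a Counter."""
    return sum(counter.values()), len(counter)


def corpus_stats(tokens):
    """tokens: list of (surface_form, lemma, is_name) triples."""
    forms = Counter(form for form, _, _ in tokens)
    lemmas = Counter(lemma for _, lemma, _ in tokens)
    name_lemmas = {lemma for _, lemma, is_name in tokens if is_name}

    names = Counter({lem: c for lem, c in lemmas.items() if lem in name_lemmas})
    non_names = Counter({lem: c for lem, c in lemmas.items() if lem not in name_lemmas})
    singletons = Counter({lem: c for lem, c in lemmas.items() if c == 1})
    singleton_names = Counter({lem: c for lem, c in singletons.items() if lem in name_lemmas})
    return {
        "terms": summarize(forms),
        "base terms": summarize(lemmas),
        "name words": summarize(names),
        "non-name words": summarize(non_names),
        "singletons": summarize(singletons),
        "singleton name words": summarize(singleton_names),
    }


sample = [
    ("said", "say", False), ("says", "say", False),
    ("Jesus", "Jesus", True), ("Jesus", "Jesus", True),
    ("luxury", "luxury", False), ("Pharisees", "Pharisee", True),
]
for row, (count, unique) in corpus_stats(sample).items():
    print(f"{row:22} {count:6} {unique:6}")
```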