Adam Green's Web programming source code
RSS FEED ITEMS: Code.DarwinianWeb.com
- Changing the focus of this blog
I've decided to change this blog from focusing purely on Ruby programming to Web programming in general. This accompanies an effort on my part to experiment with some of the other common languages, such as Perl, Python and PHP. I'm not giving up on Ruby, but I am running into some of its limitations. I still think Ruby has the cleanest syntax of any programming language I've learned in the last 15 years or so, but I am finding weaknesses in the available libraries and documentation.
It looks like most of my coding over the next few years will be done with varieties of XML, including RSS, OPML and the results of API calls. I've looked at most of the XML tools available for Ruby and have done a fair amount of work with REXML, but none of them has really met my needs. I like REXML's style of parsing XML, but I've consistently been unable to figure out how to do the things I need from the available documentation. I'm sure this is something unique to me, since other Ruby programmers have assured me they don't have these problems. It may have something to do with my age, but a long list of method names with virtually no explanation of common techniques using these methods just isn't enough. I tried contacting the author of REXML for several weeks, and when I finally did get through, his explanations of how to perform the tasks I had questions about convinced me that I would never have reached the answers from the available docs. I know there are great forums, and Ruby programmers have been among the most helpful I've ever dealt with, but I like to learn how to code by reading about it.
So I plan on experimenting with other languages, recreating the work I've done with RSS in my RubyRiver aggregator, and going further in the area of OPML and APIs. This site will serve as the publication vehicle for all of my source code in any language I try. I hope to move beyond the most familiar languages, and try some new ones. In the early Nineties Borland had a really interesting version of Prolog, and I have a book on Snobol that I've always wanted to read. XSLT is another obvious language for me to try, but I guess I need to drive that with a procedural language like Python. There are probably a dozen or more free languages that I can try. I've always been a language junkie and this sounds like a lot of fun.
I won't give up on Ruby, but I will wait until I can either find a better way of dealing with XML or until REXML offers better documentation before doing anything more with it. The new domain name of code.darwinianweb.com will be the preferred way to access this site, but the earlier URL of ruby.darwinianweb.com will always remain in effect. I don't believe in ever breaking website links.
Thu, 16 Mar 2006 20:33:00 EST
- Fixing relative URLs in the reading list
I've discovered that lots of blogs are using relative addresses within the <link> tags that I use for autodiscovery of feeds. For example, they are using a URL of "/rss.xml" instead of "http://myblog.com/rss.xml". This is a no-no, and since XML people are a rather anal group, they refuse to convert these to absolute addresses when they write aggregators. Instead, feeds with relative addresses are just ignored. So I've modified my OPML creation script to convert relative addresses to absolute. The changed section of code is below. tmopml.rb
# Find the first feed link.
# One regular expression covers every feed MIME type; chaining
# strings with || would only ever test the first one.
feedtypes = /application\/(rss|rdf|atom|x[.-]atom)\+xml|text\/xml/
if !feedfound && tag.include?('rel="alternate"') && tag =~ feedtypes
  # Extract the feed's URL
  matchdata = /href=\".*?\"/.match(tag)
  matchstr = matchdata.to_s
  xmlurl = matchstr[6..matchstr.length-2]
  feedfound = true
  # NEW CODE
  # Fix relative feed addresses.
  require "uri"
  xmluri = URI.parse(xmlurl)
  if xmluri.relative?
    htmluri = URI.parse(htmlurl)
    xmlurl = htmluri.merge(xmlurl)
    xmlurl = xmlurl.to_s
  end
  # NEW CODE
  opmlfile.puts(' <outline type="rss" text="' + title +
    '" xmlUrl="' + xmlurl + '" htmlUrl="' + htmlurl + '" />')
end
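As a standalone illustration of the relative-address fix, here is a minimal sketch of resolving a feed link against the page it came from. The helper name absolute_feed_url and the example.com addresses are made up for this example and are not part of tmopml.rb; URI.join comes from Ruby's standard uri library and resolves addresses the same way the merge call does.

```ruby
require "uri"

# Hypothetical helper (not part of tmopml.rb): resolve a feed link found in
# a page against that page's own address.
def absolute_feed_url(htmlurl, xmlurl)
  # URI.join leaves absolute addresses alone and resolves relative ones
  # against the base address.
  URI.join(htmlurl, xmlurl).to_s
end

# Illustrative addresses only.
puts absolute_feed_url("http://example.com/blog/", "/rss.xml")
puts absolute_feed_url("http://example.com/blog/", "index.rdf")
puts absolute_feed_url("http://example.com/blog/", "http://feeds.example.org/blog.xml")
```

A link that starts with a slash is resolved from the site root, while a bare file name is resolved from the page's own directory, so both common forms of relative feed link end up usable.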
Wed, 08 Feb 2006 08:58:00 EST
- Automatic OPML reading list from Tech Memeorandum
This new script reads my XML list of Tech Memeorandum blogs and generates an OPML file with the matching RSS feeds. I do some simple RSS feed autodiscovery in each blog to find the feed. What this latest project really proves is that I don't actually know how to use regular expressions. If I did, I could probably condense this script down to a few lines. Anyway, it works, and maybe I'll get around to reading up on regex someday. The script is running on my mashup blog, and the resulting OPML file is updated every hour and placed here. You can grab the file once or subscribe to it as a reading list in your aggregator. Please let me know if you make any improvements to the script. tmopml.rb
#! /usr/bin/ruby
# tmopml.rb
# Create an OPML reading list based on a list of
# blogs cited on http://tech.memeorandum.com.
#
# Copyright (C) 2006 Adam Green
# http://mashup.darwinianweb.com, adam AT darwinianweb DOT com
# This program is distributed under the same license as Ruby.
#
require "open-uri"
require "rexml/document"
include REXML
# Create the OPML file.
opmlfile = File.new("public_html/projects/tmblogs/tmopml.xml", "w")
opmlfile.puts('<?xml version="1.0" encoding="UTF-8"?>')
opmlfile.puts('<opml version="1.1">')
opmlfile.puts(' <head>')
opmlfile.puts(' <title>Tech Memeorandum Reading List</title>')
opmlfile.puts(' <dateCreated>' + Time.now.rfc2822 + '</dateCreated>')
opmlfile.puts(' <ownerName>Adam Green - Mashup.Darwinianweb.com</ownerName>')
opmlfile.puts(' </head>')
opmlfile.puts(' <body>')
# Open the list of blogs maintained at
# http://mashup.darwinianweb.com/projects/tmblogs/tmblogs.xml
doc = Document.new(File.read("public_html/projects/tmblogs/tmblogs.xml"))
doc.elements.each("tmblogs/blog") do |blog|
  title = blog.elements["title"].text
  htmlurl = blog.elements["htmlUrl"].text
  begin
    # Get the blog page's text.
    page = open(htmlurl)
    pagetext = page.read
    page.close
    # Pull out all the link tags.
    feedfound = false
    # One regular expression covers every feed MIME type; chaining
    # strings with || would only ever test the first one.
    feedtypes = /application\/(rss|atom|x[.-]atom)\+xml|text\/xml/
    pagetext.scan(/<link.*?>/i).each do |tag|
      # clean up the tag.
      tag = tag.delete(" ")
      tag = tag.downcase
      # Find the first feed link.
      if !feedfound && tag.include?('rel="alternate"') && tag =~ feedtypes
        # Extract the feed's URL
        matchdata = /href=\".*?\"/.match(tag)
        matchstr = matchdata.to_s
        xmlurl = matchstr[6..matchstr.length-2]
        feedfound = true
        puts title, htmlurl
        opmlfile.puts(' <outline type="rss" text="' + title +
          '" xmlUrl="' + xmlurl + '" htmlUrl="' + htmlurl + '" />')
      end
    end
  # Trap time out errors.
  rescue Exception
    puts title, 'timeout'
  end
end
opmlfile.puts(' </body>')
opmlfile.puts('</opml>')
opmlfile.close
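For comparison, the autodiscovery step can be condensed with a few regular expressions. This is only a sketch under my own assumptions: the helper name first_feed_href, the FEED_TYPES constant and the sample page are invented for illustration, and it handles only the simple case of quoted attributes inside a single <link> tag.

```ruby
# Hypothetical constant: the feed MIME types the script looks for.
FEED_TYPES = %r{application/(rss|rdf|atom|x[.-]atom)\+xml|text/xml}i

# Hypothetical helper: return the href of the first feed <link> tag in a
# page, or nil when the page does not advertise a feed.
def first_feed_href(pagetext)
  pagetext.scan(/<link\b[^>]*>/i).each do |tag|
    next unless tag =~ /rel\s*=\s*["']alternate["']/i
    next unless tag =~ FEED_TYPES
    # Capture the text between the quotes of the href attribute.
    href = tag[/href\s*=\s*["']([^"']*)["']/i, 1]
    return href if href
  end
  nil
end

# Made-up sample page with a stylesheet link and two feed links.
page = <<~HTML
  <html><head>
  <link rel="stylesheet" type="text/css" href="/style.css">
  <link rel="alternate" type="application/atom+xml" title="Atom" href="/Atom.xml">
  <link rel="alternate" type="application/rss+xml" href="/rss.xml">
  </head></html>
HTML

puts first_feed_href(page)
```

Because the tag is matched case-insensitively rather than being downcased first, the feed address keeps its original capitalization, which matters on servers with case-sensitive paths.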
Sun, 05 Feb 2006 08:18:00 EST
- Boston Ruby Group meeting on Feb. 7th
I haven't been to one of these before. I hope they don't laugh too much at my code.
Fri, 03 Feb 2006 15:21:00 EST
- Extracting Tech Memeorandum's blog list
I know I'm supposed to be working on my RubyRiver tutorial, but I got distracted playing with ideas for mashups I want to demo at Mashup Camp. One type of mashup I want to work on is merging people's names and blog URLs with various search engines. Tech Memeorandum aggregates posts from a great set of blogs, so I'm going to use that site as the starting source for my people mashup data. I'll explain the full project on my mashup blog. I'm going to try to maintain the pattern of posting the Ruby code for anything I work on here on the Ruby blog. That way I can publish complete source code listings without scaring away the non-programmers who read my other blogs. The idea of this code is that it reads the home page of Tech Memeorandum, extracts the links to blogs, and saves them as an XML file. The XML file will be permanently located at this location. Right now this XML file is not updating, but once I get the whole system running, it will automatically refresh. Hopefully, others will use it as the basis for their own mashups. tmparse.rb
#! /usr/bin/ruby
# tmparse.rb
# Extract the blog citations from the home page of
# http://tech.memeorandum.com.
#
# Copyright (C) 2006 Adam Green
# http://ruby.darwinianweb.com, adam AT darwinianweb DOT com
# This program is distributed under the same license as Ruby.
#
# Each blog is identified in the page with the following entry:
# <CITE>First Last / <A HREF="http://url/">Blog Name</A>:</CITE>
# If there is no author's name, the citation is:
# <CITE> <A HREF="http://url/">Blog Name</A>:</CITE>
# Get the page's text.
require "open-uri"
page = open("http://tech.memeorandum.com/")
pagetext = page.read
page.close
# Convert the ellipsis used by TM (as an entity or a literal character),
# since it gives XML parsers fits.
pagetext = pagetext.gsub(/&hellip;|&#8230;|…/, "...")
# Pull out all the citations.
citelist = pagetext.scan(/<cite>.*?<\/cite>/i)
# Build a hash with them.
sortlist = {}
citelist.each do |citation|
  # Only use citations with URLs.
  if citation.match(/a href/i)
    htmlurlstart = citation.index('="')+2
    htmlurlend = citation.index('">')-1
    htmlurl = citation[htmlurlstart..htmlurlend]
    titlestart = htmlurlend+3
    titleend = citation.index('</A>')-1
    title = citation[titlestart..titleend]
    # Does the citation include an author?
    if citation.index("/") < citation.index("<A HREF")
      authorstart = 6
      authorend = citation.index("/")-2
      author = citation[authorstart..authorend]
      author = author.strip
      author = author.squeeze(" ")
      sortkey = author
    else
      author = ""
      sortkey = title
    end
    # Build the hash, so it can be sorted on author or title
    sortlist[sortkey.upcase] = { "author" => author,
                                 "htmlurl" => htmlurl,
                                 "title" => title }
  end
end
# Write the sorted list out to an XML file.
xmlfile = File.new("../../projects/tmblogs/tmblogs.xml", "w")
xmlfile.puts('<?xml version="1.0" encoding="utf-8" ?>')
xmlfile.puts('<tmblogs>')
# Hash#sort returns an array.
sortarray = sortlist.sort
sortarray.each do |item|
  info = item[1]
  xmlfile.puts(' <blog>')
  xmlfile.puts(' <author>' + info["author"] + '</author>')
  xmlfile.puts(' <title>' + info["title"] + '</title>')
  xmlfile.puts(' <htmlUrl>' + info["htmlurl"] + '</htmlUrl>')
  xmlfile.puts(' </blog>')
end
xmlfile.puts('</tmblogs>')
xmlfile.close
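The index arithmetic in tmparse.rb could also be expressed as a single pattern with capture groups. The sketch below is an assumption-laden alternative, not the script's actual method: CITE_PATTERN and parse_citations are names invented here, and the sample string simply reuses the two citation shapes described in the script's header comments.

```ruby
# Hypothetical pattern: optional "Author / " text, then the blog link.
# Captures are author (may be nil), URL and blog title.
CITE_PATTERN = %r{<cite>\s*(?:([^<]*?)\s*/\s*)?<a href="([^"]*)">(.*?)</a>}i

# Hypothetical helper: return one hash per citation found in the page text.
def parse_citations(pagetext)
  pagetext.scan(CITE_PATTERN).map do |author, htmlurl, title|
    # A citation without an author leaves the first capture nil.
    { "author" => author.to_s.squeeze(" "),
      "htmlurl" => htmlurl,
      "title" => title }
  end
end

# Sample text built from the two citation formats the script documents.
sample = '<CITE>First Last / <A HREF="http://url/">Blog Name</A>:</CITE>' \
         '<CITE> <A HREF="http://other/">Other Blog</A>:</CITE>'

parse_citations(sample).each do |blog|
  puts [blog["author"], blog["title"], blog["htmlurl"]].join(" | ")
end
```

Since the pattern is case-insensitive throughout, it does not depend on the page writing its tags in upper case.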
Fri, 03 Feb 2006 11:47:00 EST
- Tutorial logistics
I'm going to use OpenOffice to write the tutorial, since it can output .PDF files. It actually makes sense to post both .Doc and .PDF versions. This will allow people who want to make comments to download the .Doc file, mark it up, and then email it to me. I know a Wiki could also be used, but that opens up a whole bunch of issues. A Wiki makes a lot of sense for writing a reference manual, but a tutorial needs a pretty well planned out flow, which sounds hard to do with a group. My current idea is to provide links to both the complete tutorial files as they progress, and the latest section as a separate file. This lets people download just the new parts as they appear. We'll see how all this works. I'll probably have to make adjustments.
Sun, 22 Jan 2006 16:11:00 EST
- Tutorial outline
Here is the initial outline for the RubyRiver tutorial:
- Who should read this tutorial?
- What will not be included in this tutorial?
- Windows development and Linux website
- RubyRiver RSS Aggregator
- Install Ruby on Windows
- Getting started with Ruby programming
- Install Apache on Windows
- Run Ruby programs in a web browser
- Register a domain name
- Rent a Linux webserver
- Install SSH and SFTP clients
- Configure local and remote web servers
- Parallel development on Windows and Linux
- RSS anatomy
- RSS programming
- RubyRiver architecture
- Gather RSS Feeds
- Merge RSS Feeds
- HTML template programming
- Generate RubyRiver webpages
Sun, 22 Jan 2006 07:36:00 EST
- Time to start the tutorial
The RubyRiver source code has been available for a week and I haven't received a single complaint, so I guess it's safe to start building a tutorial around it. The first step is figuring out a publishing model. I want to do this blog-style, so I'll be posting updates as I write them. It doesn't make sense to post the full text of each installment in the blog, and I'd like more control of the formatting than I can get in this page format. I guess the best format will be to create individual PDF files each day, and just post links to them in individual blog posts. First thing tomorrow, I want to work out an outline, which I'll post here. I'll also add links to the installments on the RubyRiver navbar.
Fri, 20 Jan 2006 18:32:00 EST
- Ruby programmers face growing demand
According to the Indeed.com job site, the number of online want ads mentioning Ruby doubled between October and the end of the year.

I have more details on my main blog, including a comparison of the most popular languages.
(Via Steve Rubel)
Sun, 15 Jan 2006 22:31:00 EST
- RubyRiver source code is online
The code for RubyRiver is now available for download. Please let me know if you try it and find any problems. I gave it a version number of 1.0, which may seem strange for such a simplistic first attempt, but I've never understood the idea of 0.X version numbers. A version number of less than 1.0 makes sense for internal versions of code where there is a clear idea of what needs to be added before the code is released. You build up to this first feature set and then launch it with version 1.0. If you are using the model of making code available during early development, how do you ever know when 1.0 has arrived? I'd rather start with 1.0 and then work upwards from there. I'll do incremental releases until I switch to an OOP version, and then the code will become 2.0. The first Rails version will mark 3.0.
Fri, 13 Jan 2006 09:22:00 EST
- Where do I want to go with Ruby?
With the RubyRiver site launched it is time for me to lay out a long-term plan for my Ruby efforts. Here is a rough outline of what I'd like to accomplish over the coming year:
- Write a tutorial on building RubyRiver with procedural code on a Windows development machine.
- Rewrite RubyRiver using class libraries, and create a second volume of the tutorial to cover this work.
- Rewrite RubyRiver with Rails, and use this as the subject of a third tutorial.
- Resume my Really Simple Blog project, which will convert the CMS for my blogs from the current FoxPro code into Ruby with MySQL. I might go through the same three stage coding process (procedural, OOP, Rails), and I might consider writing this up as a tutorial as well.
Since I have other non-Ruby projects I want to work on, I estimate that completing this Ruby plan will take most of 2006.
Thu, 12 Jan 2006 09:55:00 EST
- RubyRiver.org is running
The code worked on the RubyRiver Linux server without any changes, which says a lot about the ease of using a local Windows development system. I'm not suggesting for a minute that anyone switch to Windows for local development. My point is that if you are already working in Windows, you don't have to put Linux on your local computer to create web apps in Ruby. I want to spend a day or so cleaning up the code before I publish all of it on the RubyRiver site. I also want to make a few changes, like finding better ways to select out just the Ruby posts. The category tag isn't a clear standard in RSS, but I can make better use of it anyway. I also need to learn more about REXML, which I use to parse the RSS. Our code is in conflict over things like non-XML entities. We both keep trying to strip them out, which results in our stepping on each other. Next week I want to start working on a tutorial based on the RubyRiver code, which was the reason for writing it in the first place. I also need to add a submission page for new feeds, and a simple FAQ.
Wed, 11 Jan 2006 09:20:00 EST
- Putting RubyRiver together
Finally, I can put it all together to be called once an hour by a cron job. So far, I've been running all of this code locally on a Windows machine. Tomorrow I'll see what has to be done to get it all working on the RubyRiver.org server. rubyriver.rb
#! /usr/bin/ruby
# Gather the latest copies of the feeds and
# generate a new RubyRiver home page.
require 'gather_feeds'
require 'merge_feeds'
require 'gen_home'
Tue, 10 Jan 2006 20:22:00 EST
- Generating the home page
The RubyRiver home page can now be assembled from the page.html template. gen_home.rb
#! /usr/bin/ruby
require 'get_param'
require 'gen_items'
require 'gen_feedlist'
require "open-uri"
require "rexml/document"
include REXML
# Read the page template into memory.
pagestr = File.read("page.html")
# Replace the template tokens with the proper data.
pagestr = pagestr.gsub("<*sitetitle*>", get_param("rubyriver.yml","sitetitle"))
pagestr = pagestr.gsub("<*publishedfeed*>", get_param("rubyriver.yml","publishedfeed"))
pagestr = pagestr.gsub("<*sitedescription*>", get_param("rubyriver.yml","sitedescription"))
pagestr = pagestr.gsub("<*content*>", gen_items)
pagestr = pagestr.gsub("<*feedlist*>", gen_feedlist)
# Write the filled template out as the home page.
pagefile = File.new(get_param("rubyriver.yml","webdir")+"index.html", "w+")
pagefile.puts(pagestr)
pagefile.close
# Copy the current RSS file into the home directory.
feedname = get_param("rubyriver.yml","publishedfeed") + ".xml"
require 'fileutils'
FileUtils.cp(feedname, get_param("rubyriver.yml","webdir") + feedname)
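The token replacement above can also be done in a single pass. The following is a rough sketch rather than RubyRiver's code: fill_template, the sample template and the sample values are all hypothetical. It uses the block form of gsub because a plain replacement string gives backslash sequences such as \1 a special meaning, while the result of a block is inserted exactly as written.

```ruby
# Hypothetical helper: replace each <*token*> in a template with the
# matching value from a hash. Unknown tokens are left untouched.
def fill_template(template, values)
  template.gsub(/<\*(\w+)\*>/) do |token|
    # $1 is the token name without the <* and *> markers.
    values.fetch($1, token)
  end
end

# Made-up template and values for illustration.
template = "<h1><*sitetitle*></h1><p><*sitedescription*></p><*unknown*>"
values = { "sitetitle" => "RubyRiver",
           "sitedescription" => 'Feeds saved in C:\feeds\1' }

puts fill_template(template, values)
```

Keeping the values in one hash also means the parameter file only has to be read once per page instead of once per token.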
Tue, 10 Jan 2006 20:03:00 EST
- Generating the item list
The feed items from the internal feed are used to generate a string of HTML when merged with the item template. gen_items.rb
#! /usr/bin/ruby
# Create an HTML string from the current feed items.
# This will appear in the RubyRiver content area.
def gen_items
  # Read the entire RSS file into memory.
  doc = Document.new(File.read(get_param("rubyriver.yml","internalfeed")+".xml"))
  # Read the item template into memory.
  templatestr = File.read("item.html")
  # Extract each RSS item.
  # Merge with the item template to create a string of HTML.
  itemstr = ""
  doc.elements.each('rubyriver/channel/item') do |item|
    currentitemstr = templatestr
    currentitemstr = currentitemstr.gsub("<*itemlink*>", item.elements['link'].text)
    currentitemstr = currentitemstr.gsub("<*itemtitle*>", item.elements['title'].text)
    currentitemstr = currentitemstr.gsub("<*feedlink*>", item.elements['feedlink'].text)
    currentitemstr = currentitemstr.gsub("<*feedtitle*>", item.elements['feedtitle'].text)
    currentitemstr = currentitemstr.gsub("<*itemdate*>", item.elements['pubDate'].text[0..21])
    currentitemstr = currentitemstr.gsub("<*description*>", item.elements['description'].text)
    itemstr += currentitemstr
  end
  return itemstr
end
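To show the item merge in isolation, here is a small self-contained sketch that works on strings instead of files. Everything in it is assumed for illustration: the helper name items_to_html, the inline feed (which uses a plain rss root rather than RubyRiver's internal rubyriver root) and the one-line template. It also treats a missing element as an empty string, so an item without a link does not raise an error.

```ruby
require "rexml/document"

# Hypothetical helper: turn the items of an RSS document into HTML using a
# simple item template. Tokens look like <*itemtitle*> or <*itemlink*>.
def items_to_html(rss_text, template)
  doc = REXML::Document.new(rss_text)
  html = ""
  doc.elements.each("rss/channel/item") do |item|
    html += template.gsub(/<\*item(\w+)\*>/) do
      # Look up the child element named by the token; missing elements
      # and empty elements both become empty strings.
      element = item.elements[$1]
      element ? element.text.to_s : ""
    end
  end
  html
end

# Made-up feed: the second item has no link element.
rss = <<~XML
  <rss version="2.0"><channel>
    <item><title>First post</title><link>http://example.com/1</link></item>
    <item><title>Second post</title></item>
  </channel></rss>
XML

puts items_to_html(rss, %(<a href="<*itemlink*>"><*itemtitle*></a>\n))
```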
Tue, 10 Jan 2006 19:54:00 EST