Merging RSS Feeds with Query of Queries

One of my all-time favorite features in ColdFusion is “query of queries”. I have yet to see this functionality duplicated as thoroughly, or as easily, in any other language outside of SQL itself. Of course the limitation with SQL is that it generally operates on relational data stored in a relational database. The query of queries functionality changes that for ColdFusion developers.

I’m a pretty blog-centric fellow. I rely on blogs to keep me up to date on the rapidliy changing world of technology, and on what it is that Adobe’s amazing community of developers is up to each week. I also publish a few blogs. I have this one for work, but I also have several personal ones. It’s a great way to distribute and share information across my family which is widely geographically distributed.

As you likely know, most blogs have an XML format called RSS that is often used to describe content in a generic form that can be read by other programs that understand RSS and/or XML in general. There are a few versions of RSS, and there’s also the emerging ATOM format. XML handling is also another great feature of ColdFusion. By comparison, in some other mainstream languages this is still amazingly painful.

Recently I decided that I’d like to merge two blogs with related content into one blog. I was publishing content too slowly to really need them both, but by combining them I could give the impression that I was very active. In my case one blog was managed by Blogger and published to my personal domain. The other blog wasn’t really even a blog at all, but my Flickr photostream. Flickr can make your photostream available as an RSS feed.

There was the basis for this little exercise - two blogs, both with RSS feeds, to be merged and sorted by date. Normally this would be quite the chore, but after some thought it occurred to me that RSS was giving me a neutral way of describing the content. The two feeds looked like identical relational database tables. If I could build a table of each feed, I could run a SQL statement on them to join them together (technically a union).

This is when the solution hit me smack in the face. Use CFHTTP to read the RSS feeds individually, parse the XML content, generate a ColdFusion query objects, fill the query objects with data and run the SQL to put them together and sort them. In the end, not only would I have merged XML structures from different sources and sorted them, but I would also have native CF query object which would be ideal for looping/output.

What I ended up with was something that looked like the following code snippet (non-pertinent information removed).

<!— Grab and parse Flickr RSS feed —>
<cfhttp url=”#flickr#” />
<cfset doc = XmlParse( CFHTTP.FileContent ) />

<!— Build query object with RSS feed data —>
<cfset photos = QueryNew( “source, title, link, description, publish”, “VarChar, VarChar, VarChar, VarChar, Date” ) />

<cfloop index=”sub” from=”1″ to=”#ArrayLen( doc.rss.channel.item )#” step=”1″>

  <!— Extract the publish date string as date/time —>
  <cfset publish = ParseDateTime( doc.rss.channel.item[sub].pubDate.XmlText ) />

  <!— Insert new row into query —>
  <cfset rows = QueryAddRow( photos ) />
  <cfset success = QuerySetCell( photos, “source”, “flickr” ) />
  <cfset success = QuerySetCell( photos, “title”, doc.rss.channel.item[sub].title.XmlText ) />
  <cfset success = QuerySetCell( photos, “link”, doc.rss.channel.item[sub].link.XmlText ) />
  <cfset success = QuerySetCell( photos, “description”, description ) />
  <cfset success = QuerySetCell( photos, “publish”, publish />
</cfloop>

<!— Print results to screen —>
<cfdump var=”#photos#” />

<!— ****************************** —>

<!— Grab and parse Blogger RSS feed —>
<cfhttp url=”#blogger#” />
<cfset doc = XmlParse( CFHTTP.FileContent ) />

<!— Build query object with ATOM feed data —>
<cfset posts = QueryNew( “source, title, link, description, publish”, “VarChar, VarChar, VarChar, VarChar, Date” ) />

<cfloop index=”sub” from=”1″ to=”#ArrayLen( doc.rss.channel.item )#” step=”1″>
  <!— Extract the publish date string as date/time —>
  <cfset publish = ParseDateTime( doc.rss.channel.item[sub].pubDate.XmlText ) />
  <cfset updated = doc.rss.channel.item[sub].updated.XmlText />
  <cfset offset = Mid( updated, Len( updated ) - 3, 1 ) />
  <cfset publish = DateAdd( “h”, offset, publish ) />

  <!— Insert new row into query —>
  <cfset rows = QueryAddRow( posts ) />
  <cfset success = QuerySetCell( posts, “source”, “blogger” ) />
  <cfset success = QuerySetCell( posts, “title”, doc.rss.channel.item[sub].title.XmlText ) />
  <cfset success = QuerySetCell( posts, “link”, doc.rss.channel.item[sub].link.XmlText ) />
  <cfset success = QuerySetCell( posts, “description”, doc.rss.channel.item[sub].description.XmlText ) />
  <cfset success = QuerySetCell( posts, “publish”, publish ) />
</cfloop>

<!— Print results to screen —>
<cfdump var=”#posts#” />

<!— Query of query to merge feed data —>
<cfquery dbtype=”query” name=”merger”>
  SELECT *
  FROM photos
  UNION
  SELECT *
  FROM posts
  ORDER BY publish DESC
</cfquery>

<!— Print merged data to screen —>
<cfdump var=”#merger#” />

I don’t know how particularly useful this is for anybody else, but I was proud of the little exercise, and felt it was worth sharing. If you do find it handy, please let me know. For me, it was simply amazing just how productive this actually was thanks to ColdFusion. The entire gathering, parsing, populating and merging of both feeds is less than 70 lines of code including comments and spacing.

WordPress database error: [Can't open file: 'wp_comments.MYI' (errno: 145)]
SELECT * FROM wp_comments WHERE comment_post_ID = '38' AND comment_approved = '1' ORDER BY comment_date

Leave a Reply