Name: Rogers Cadenhead
Member since: 2002-04-04 04:26:04
Last Login: 2008-07-09 13:49:01
Homepage: http://www.cadenhead.org/workbench
Displaying Twitter Updates on a Web Page
I recently began using Twitter, a microblogging service for posting short, chat-like blog entries and reading what other users of the service are doing. The site has severe reliability problems, but it's still an entertaining way to get real-time updates from bloggers I read along with others I know who've been sucked into Twitter's maw.
I wrote some code to display my most recent Twitter update on my weblog, Workbench, in a sidebar at upper right. This afternoon, I've released the Twitter-RSS-to-HTML PHP script under an open source license. The script requires MagpieRSS for PHP, an open source PHP library that can parse RSS and Atom feeds.
MagpieRSS caches feed data, so at times when Twitter is glacially slow or can't be accessed, this script won't hurt the performance of your server.
The first release of the script only works with a Twitter user's RSS feed, which can be found in the "RSS" link at the bottom of a user's Twitter page. The only tough part about writing the script was creating regular expressions to turn URLs into hyperlinks and "@" references into links to Twitter user pages:
// turn URLs into hyperlinks
$tweet = preg_replace('/(http:\/\/)(.*?)\/([\w.\/&=?\-,:;#_~%+]*)/', '<a href="\0">Link</a>', $tweet);
// link to users in replies
$tweet = preg_replace('(@([a-zA-Z0-9]+))', '<a href="http://www.twitter.com/\1">\0</a>', $tweet);
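For comparison, here is a rough sketch of the same two substitutions in Java using java.util.regex. The TweetLinker class and linkify method are hypothetical names, not part of the released script, and the patterns are simplified assumptions: the URL pattern only handles http:// links, and the mention pattern also accepts underscores, which the PHP pattern above does not.

```java
import java.util.regex.Pattern;

final class TweetLinker {
    // "http://", a host, then an optional path made of common URL characters
    private static final Pattern URL =
        Pattern.compile("http://[^\\s/]+(?:/[\\w./&=?,:;#~%+-]*)?");

    // "@" followed by letters, digits or underscores (assumed username rule)
    private static final Pattern MENTION = Pattern.compile("@([A-Za-z0-9_]+)");

    /** Wraps URLs in "Link" anchors, then links @name references to Twitter user pages. */
    static String linkify(String tweet) {
        // $0 is the whole match, $1 the first capture group
        String linked = URL.matcher(tweet).replaceAll("<a href=\"$0\">Link</a>");
        return MENTION.matcher(linked)
                      .replaceAll("<a href=\"http://www.twitter.com/$1\">$0</a>");
    }
}
```

URLs are replaced first so that the URL pass never sees the Twitter addresses generated by the mention pass. A call such as TweetLinker.linkify("@rcaden see http://example.com/a?b=1") should link both the user name and the URL.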
If you're reading this and wondering why anyone should bother with Twitter, I recommend reading the updates by Jay Rosen, a journalism chair at New York University who uses the service to share a running dialogue on the media. He punches above his weight in this 140-character-or-less medium.
Following Web Page Redirects with Java
CNET moved a bunch of its blogs to a different domain this weekend, including Beyond Binary, Coop's Corner, Geek Gestalt, One More Thing, Outside the Lines and The Social. I mention this because the change hosed Meme13, which treated all six as if they were newly discovered sites.
One of my ground rules for developing Meme13 is that I won't hand-edit the site to make it smarter. I need the application to recognize when existing sites in its database have moved.
Meme13 monitors sites using a Java application I wrote that downloads web pages with the Apache HttpClient 3.0 class library. Web servers indicate that a page has moved by sending an HTTP redirect response of either "301 Moved Permanently," which indicates a permanent move, or "302 Found," which is intended for temporary changes. I wrote a Java method that can find the current location of a web page, even if it has been redirected one or more times:
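import java.io.IOException;
import org.apache.commons.httpclient.Header;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpMethod;
import org.apache.commons.httpclient.methods.HeadMethod;

public String checkFeedUrl(String feedUrl) {
    String response = feedUrl;
    HttpClient client = new HttpClient();
    HttpMethod method = new HeadMethod(feedUrl);
    method.setFollowRedirects(false);
    try {
        // request feed headers only
        int statusCode = client.executeMethod(method);
        if ((statusCode == 301) || (statusCode == 302)) {
            // feed has moved
            Header location = method.getResponseHeader("Location");
            // a redirect response may arrive without a Location header
            if (location != null && !location.getValue().equals("")) {
                // recursively check URL until it's not redirected any more
                response = checkFeedUrl(location.getValue());
            }
        }
    } catch (IOException ioe) {
        // on a network error, fall back to the URL we were given
        response = feedUrl;
    } finally {
        // hand the connection back to the client
        method.releaseConnection();
    }
    return response;
}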
The HeadMethod class requests a web page's headers instead of requesting the entire page, consuming far less bandwidth as it checks for redirects. My Java method looks for both kinds of redirects, because web publishers have a bad habit of using "302 Found" when they've moved a page permanently.
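The recursive method has no limit on how many redirects it will follow, and it treats the Location header as an absolute URL. The following is a minimal sketch, not the code Meme13 runs, that adds a hop limit and resolves relative Location values. It uses the JDK's HttpURLConnection rather than HttpClient 3.0; RedirectResolver, Hop, resolve, headRequest and the 5-second timeouts are all hypothetical choices made for illustration.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.function.Function;

final class RedirectResolver {
    /** Outcome of one HEAD request: the status code and the Location header, if any. */
    static final class Hop {
        final int status;
        final String location;

        Hop(int status, String location) {
            this.status = status;
            this.location = location;
        }
    }

    /** Follows 301/302 responses for at most maxHops requests and returns the last URL reached. */
    static String resolve(String url, Function<String, Hop> head, int maxHops) {
        String current = url;
        for (int i = 0; i < maxHops; i++) {
            Hop hop = head.apply(current);
            boolean moved = hop != null && (hop.status == 301 || hop.status == 302);
            if (!moved || hop.location == null || hop.location.isEmpty()) {
                return current;
            }
            // a Location header may be relative, so resolve it against the current URL
            current = URI.create(current).resolve(hop.location).toString();
        }
        return current; // hop limit reached, possibly a redirect loop
    }

    /** Sends a HEAD request without following redirects; reports status -1 on any failure. */
    static Hop headRequest(String url) {
        try {
            HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
            try {
                conn.setRequestMethod("HEAD");
                conn.setInstanceFollowRedirects(false);
                conn.setConnectTimeout(5000); // assumed timeout, in milliseconds
                conn.setReadTimeout(5000);
                return new Hop(conn.getResponseCode(), conn.getHeaderField("Location"));
            } finally {
                conn.disconnect();
            }
        } catch (IOException | IllegalArgumentException | ClassCastException e) {
            return new Hop(-1, null);
        }
    }
}
```

A call would look like RedirectResolver.resolve(feedUrl, RedirectResolver::headRequest, 5). Passing the request function as a parameter keeps the redirect-following logic separate from the network code, so the loop can be exercised with canned responses.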
Setting the Link on a ShareThis Widget
I'm continuing to work on Meme13, a site that packages together the last 13 sites to show up on the Techmeme Leaderboard so they can be sampled as a feed or web site. The site has attracted around 25 RSS subscribers in its first month.
I've added a ShareThis widget on each entry that makes it easy to share content from Meme13 on sites like Del.icio.us, Digg and Facebook.
Normally, ShareThis links to the page the widget has been displayed on. That doesn't suit my purposes on Meme13, because I'm trying to promote the originators of the content. If someone reads the article about landing a startup job by Ryan Spoon on Meme13, the ShareThis widget should link to the article on Spoon's blog.
ShareThis has a JavaScript API that can be used to teach the widget new tricks. Here's the JavaScript code to set the widget's target link and display the widget:
<p><script language="javascript" type="text/javascript">
SHARETHIS.addEntry({
    title:'<TMPL_VAR title>',
    url:'<TMPL_VAR link ESCAPE="HTML">'
}, {button:true} );
</script></p>
The <TMPL_VAR title> and <TMPL_VAR link ESCAPE="HTML"> tags are part of the template language used by Planet Planet, the software that publishes Meme13. Here's how the same thing could be done in PHP:
<p><script language="javascript" type="text/javascript">
SHARETHIS.addEntry({
    title:'<?php echo $site_title; ?>',
    url:'<?php echo $site_link; ?>'
}, {button:true} );
</script></p>
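Both versions drop the title straight into a single-quoted JavaScript string, so a title containing an apostrophe would end the string early. As a rough illustration, assuming the values are prepared by server-side code before they reach the template, a helper along these lines could escape them first. JsStringEscaper and escape are hypothetical names, and the set of characters handled is a judgment call rather than a complete list.

```java
final class JsStringEscaper {
    /** Escapes a value so it can sit inside a quoted JavaScript string literal. */
    static String escape(String value) {
        StringBuilder out = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '\\': out.append("\\\\"); break;
                case '\'': out.append("\\'"); break;
                case '"':  out.append("\\\""); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                // written as a hex escape so "</script>" in a title cannot close the tag
                case '<':  out.append("\\x3C"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```

With this sketch, a title such as Ryan's Blog would be emitted as Ryan\'s Blog, which JavaScript reads back as the original text.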
How to Crash Your Apache Server with PHP
I returned from a trip out of town Monday to crashing web servers that ate my lunch all week long. For several days, I used the top command in Linux and watched helplessly as two servers ground to a halt with load averages higher than 100.
Top reports the processes that are taking up the most CPU, memory and time. On the server running Workbench, the culprit was always httpd, the Apache web server. This didn't make sense, because Apache serves web pages, images, and other files with incredible efficiency. You have to hose things pretty badly to make Apache suck.
If you know the process ID of a server hog, Apache can tell you what that process is doing in its server status report, a feature that requires the mod_status module. The report for Apache's web site shows what one looks like.
Using this report, I found the culprit: A PHP script I wrote to receive trackback pings was loading the originating site before accepting the ping, which helps ensure it's legit:
// make sure the trackback ping's URL links back to us
$handle = fopen($url, "r");
if ($handle === false) {
    // the page could not be opened, so there is nothing to read
    send_response(1, "No link found to this site.");
    exit;
}
$tb_page = '';
while (!feof($handle)) {
    $tb_page .= fread($handle, 8192);
}
fclose($handle);
$pos = strpos($tb_page, "http://www.cadenhead.org/workbench");
if ($pos === false) {
    $error_code = 1;
    send_response(1, "No link found to this site.");
    exit;
}
Most trackback pings are not legit -- I've received 600 from spammers in just the past three hours. Each ping required Apache to check the spammer's site, download a page if it existed, and look for a link to Workbench. A single process performing this task could occupy more than 50 percent of the CPU and run for a minute or more.
I'm surprised Apache ran at all after I added trackback a couple months ago. I was beginning to think the web server software was idiot-proof, but I've proven otherwise.
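One way to keep a check like this from tying up a server is to put hard limits on how long it waits and how much it reads. What follows is a minimal sketch in Java rather than the PHP used on Workbench, and it is not code from the site; LinkbackCheck, its method names, the 3-second timeouts and the 64 KB cap are all assumptions chosen for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

final class LinkbackCheck {
    /** Reads at most maxBytes from the stream and reports whether needle appears in that prefix. */
    static boolean containsLink(InputStream in, String needle, int maxBytes) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int remaining = maxBytes;
        try {
            while (remaining > 0) {
                int n = in.read(chunk, 0, Math.min(chunk.length, remaining));
                if (n < 0) {
                    break; // end of stream
                }
                buf.write(chunk, 0, n);
                remaining -= n;
            }
        } catch (IOException e) {
            return false; // treat an unreadable page as having no link
        }
        // ISO-8859-1 maps every byte to one char, which is enough to find an ASCII URL
        String page = new String(buf.toByteArray(), StandardCharsets.ISO_8859_1);
        return page.contains(needle);
    }

    /** Fetches pageUrl with connect and read timeouts and checks the first 64 KB for needle. */
    static boolean pageLinksBack(String pageUrl, String needle) {
        try {
            HttpURLConnection conn =
                (HttpURLConnection) URI.create(pageUrl).toURL().openConnection();
            conn.setConnectTimeout(3000); // assumed limits, in milliseconds
            conn.setReadTimeout(3000);
            try (InputStream in = conn.getInputStream()) {
                return containsLink(in, needle, 64 * 1024);
            } finally {
                conn.disconnect();
            }
        } catch (IOException | IllegalArgumentException | ClassCastException e) {
            return false;
        }
    }
}
```

The read timeout applies to each individual read, so a server that trickles data slowly could still hold a request open for a while. The byte cap is what bounds the total work per ping.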
Twitter: Where Ruby on Rails Goes Off the Track
An article posted on eWeek today was written in an alternate universe where Twitter works:
As the maker of one of the largest applications using Ruby on Rails on the Web, Twitter knows a thing or two about scaling applications built with the popular development framework.
Britt Selvitelle, a senior engineer at Twitter, offered a few tips and tricks for scaling Ruby on Rails and expressed particular appreciation for the Rails framework itself and the language is it based on, Ruby.
"For us, for a large part of our system, Ruby has been the tool that fit," Selvitelle said.
The subhead of the article: "Twitter's reliance on Ruby and Ruby on Rails proves the language's resilience."
Twitter's a nice service, but it's one of the most crash-prone sites I've ever visited. The fact it was written in Ruby on Rails makes me wonder whether the Rails framework can scale, at least once you reach the big leagues and have several hundred thousand users hammering on your web application. On the same day as the eWeek article, TechCrunch floated a rumor that Twitter is dumping Ruby on Rails.
-- via Meme13