PeterAhlstrom he/him Posted March 2, 2012

Some of you might be interested in this, so here it is. Also, if you are good at code, maybe you can tell me where I am laughably inefficient.

First I get the timeline.

    wget -O user_timeline.xml "http://api.twitter.com/1/statuses/user_timeline.xml?screen_name=brandsanderson&count=200&trim_user=true&since_id=168023980798779395"

(The since_id changes based on whatever I retrieved last time. Also, 200 posts is the max Twitter allows without OAuth.)

Then I do this:

    /usr/bin/perl tweetthing.pl < user_timeline.xml > sorted.html

Here is my tweetthing.pl script:

    #!/usr/bin/perl
    use LWP::Simple;
    use URI;
    use URI::Find;
    use Time::Piece;
    use HTML::Entities;

    my $bigbuf, $buf, $i;

    # gather up all the input
    while (read(STDIN, $buf, 1024)){
      $bigbuf .= $buf;
    }

    #remove multiple spaces
    #$bigbuf =~ s/\s+/ /g;

    # split up the input into relevant tokens
    my @parts = split(/<status>\n/, $bigbuf);
    @parts = reverse @parts;

    #remove last part, which is extraneous
    pop(@parts);

    # Add div tag to beginning of output document
    print "<div class='brantweets'>";

    foreach (@parts) {
      my @brandontweet = split(/(<created_at>|<\/created_at>\n <id>|<\/id>\n <text>|<\/text>|<in_reply_to_status_id>|<\/in_reply_to_status_id>\n <in_reply_to_user_id>|<\/in_reply_to_user_id>|<in_reply_to_screen_name>|<\/in_reply_to_screen_name>)/,$_);
      #In the array brandontweet, Part 4 is the status number
      #Part 2 is the timestamp
      #Part 6 is status
      #Part 10 is in reply to status number
      #Part 16 is in reply to status person
      #Part 12 is in reply to userID
      my $brandonstatusid = $brandontweet[4];

      #convert timezone to local
      my $brandondatestamp = Time::Piece->strptime($brandontweet[2], "%a %b %d %H:%M:%S %z %Y");
      my $brandondate = $brandondatestamp->strftime("%a %b %d");

      #get rid of html entities
      my $brandonstatus = decode_entities($brandontweet[6]);

      #remove multiple spaces between sentences
      $brandonstatus =~ s/\s+/ /g;

      find_uris($brandonstatus, sub {
        my ($find_uri, $orig_uri) = @_;
        my $uri = URI->new( $orig_uri );
        $uri = $uri->canonical->as_string;
        return '<a href="' . $uri . '">' . $uri . '</a>';
      });

      my $fanuserid = $brandontweet[12];
      my $fanusername = $brandontweet[16];
      my $fanstatusid = $brandontweet[10];

      if ($fanstatusid != ""){
        my $url="http://api.twitter.com/1/statuses/show/".$fanstatusid.".xml";
        my @fantweet = split(/(<created_at>|<\/created_at>|<text>|<\/text>|<profile_image_url>|<\/profile_image_url>)/,get($url));
        #In the array fantweet, Part 6 is the status
        #Part 2 is the timestamp
        #Part 10 is the image URL

        #convert timezone to local
        my $fandatestamp = Time::Piece->strptime($fantweet[2], "%a %b %d %H:%M:%S %z %Y");
        my $fandate = $fandatestamp->strftime("%a %b %d");

        #get rid of html entities
        my $fanstatus = decode_entities($fantweet[6]);

        #remove multiple spaces between sentences
        $fanstatus =~ s/\s+/ /g;

        find_uris($fanstatus, sub {
          my ($find_uri, $orig_uri) = @_;
          my $uri = URI->new( $orig_uri );
          $uri = $uri->canonical->as_string;
          return '<a href="' . $uri . '">' . $uri . '</a>';
        });

        my $fanimage = $fantweet[10];

        print "<p><img src='".$fanimage."'><a href='http://twitter.com/".$fanusername."/status/".$fanstatusid."'><b>".$fanusername."</b></a> ".$fandate."<br/>".$fanstatus."</p>\n<blockquote><p class='brtw'><a href='http://twitter.com/BrandSanderson/status/".$brandonstatusid."'><b>BrandSanderson</b></a> ".$brandondate."<br/>".$brandonstatus."</p></blockquote>\n\n";
      }
      else{
        print "<p class='brtw'><a href='http://twitter.com/BrandSanderson/status/".$brandonstatusid."'><b>BrandSanderson</b></a> ".$brandondate."<br/>".$brandonstatus."</p>\n\n";
      }
    }

    # Close div tag in output document
    print "</div>";

Then here is the css I stick at the beginning of a post (sorted.html).

    <style type="text/css">
    div.brantweets p {min-height:58px}
    div.brantweets img {float:left;border:0;margin:5px 5px 0 0;height:48px;width:48px}
    p.brtw {background:url(http://brandonsanderson.com/images/Llama_Face.png) no-repeat 0px 5px;padding:0 0 0 53px;}
    </style>

The max size post that Brandon's website allows is about 51k, so if I have more than 30somethingk collected, I make a new Twitter posts archive.

Sorry about the stretched screen...

EDIT: Oh yeah, after I have the sorted html file, I go through it manually and hook up the longer conversations, or when Brandon makes more than one reply to the same tweet. And I fix when he replies to the wrong person, etc. etc.
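A note on the "convert timezone to local" comments in tweetthing.pl: as far as I can tell, Time::Piece->strptime hands back a UTC-based object, so the strftime call that follows prints the UTC date rather than a local one. Below is a minimal sketch of one way to shift the date, assuming a fixed offset in hours is acceptable. The tweet_date helper and its $offset_hours parameter are hypothetical names for illustration, not part of the original script, and a fixed offset ignores daylight saving.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical helper: parse a Twitter-style timestamp and return the
# short date after shifting it by a fixed number of hours.
# $offset_hours is an assumption, e.g. -7 for a zone seven hours behind UTC.
sub tweet_date {
    my ($created_at, $offset_hours) = @_;

    # strptime gives back a UTC-based Time::Piece object
    my $utc = Time::Piece->strptime($created_at, "%a %b %d %H:%M:%S %z %Y");

    # Adding a number of seconds returns a new, shifted Time::Piece object
    my $shifted = $utc + ($offset_hours * 3600);

    return $shifted->strftime("%a %b %d");
}

# A tweet sent at 03:30 UTC falls on the previous calendar day at UTC-7.
print tweet_date("Fri Mar 02 03:30:00 +0000 2012", -7), "\n";   # prints "Thu Mar 01"
```

Passing 0 as the offset reproduces what the original strftime call prints, so the helper could be dropped in without changing anything until an offset is chosen.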
KChan she/her Posted March 2, 2012

No worries about the screen. It actually reminded me that I needed to fix that particular element, which turned out to be a much bigger pain than I thought it would. Anyways, it's all good now.
Eric Peters Posted March 2, 2012

Here's something to store the tweet archive in a simple TSV file. That way you can later change the formatting if you ever need to.

    #!/usr/bin/perl
    # echo "168023980798779395" > lastTweet.tsv
    # touch tweetArchive.tsv
    use XML::Simple;
    use Data::Dumper;

    my $xml = new XML::Simple('SuppressEmpty' => 1);
    my $tmpFile = "tmpFile$$";
    my $lastIdFile = "lastTweet.tsv";
    my $archiveFile = "tweetArchive.tsv";
    my $lastId = "";

    open(FILE, "$lastIdFile") || die "couldn't open $lastIdFile: $!";
    while(<FILE>) { $lastId .= $_ }
    chomp($lastId);
    close(FILE);

    my $URL = "http://api.twitter.com/1/statuses/user_timeline.xml?screen_name=brandsanderson&count=200&trim_user=true&since_id=$lastId";
    my $curlParams = " -s "; #silent, can add other parameters
    my $curlCmd = 'curl '.$curlParams.' -o "'.$tmpFile.'" "'.$URL.'"';
    #print $curlCmd . "\n";
    system($curlCmd);

    open(ARCHIVE, ">>$archiveFile") || die "couldn't open archive file for appending";
    $data = $xml->XMLin($tmpFile);
    unlink($tmpFile);
    #print Dumper($data) . "\n";

    my %statusHash = %{$data->{status}};
    foreach my $id ( keys %statusHash ) {
      my $unode = $statusHash{$id};
      print ARCHIVE join("\t", ($id,
        ,$unode->{text}
        ,$unode->{truncated}
        ,$unode->{favorited}
        ,$unode->{in_reply_to_status_id}
        ,$unode->{in_reply_to_user_id}
        ,$unode->{in_reply_to_screen_name}
        ,$unode->{retweet_count}
        ,$unode->{retweet_count}
        ,$unode->{user}->{name}
        ,$unode->{created_at}
      )) . "\n";
      $lastId = $id if($id > $lastId);
    }
    close(ARCHIVE);

    # Write out the last status ID
    open(FILE, ">$lastIdFile") || die "couldn't open last id tracking file: $!";
    print FILE $lastId . "\n";
    close(FILE);
PeterAhlstrom he/him Posted March 3, 2012

Eric, I'm not sure if your post is directed at me when it says "you"... Storing the tweets as tab-separated values is much much less useful than what I'm currently doing.
Eric Peters Posted March 5, 2012

> Eric, I'm not sure if your post is directed at me when it says "you"... Storing the tweets as tab-separated values is much much less useful than what I'm currently doing.

I think it's VERY useful to store tab-separated values. That way you can generate random HTML archives at any point later on. *shrug* each to his own
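To make the "generate HTML archives later" idea concrete, here is a rough sketch of turning archived TSV lines back into the same kind of paragraph that tweetthing.pl prints. It assumes the column order written by the archive script earlier in the thread (id first, text second, created_at last); escape_html and tsv_line_to_html are hypothetical helper names, and the sample line is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Escape the characters that are unsafe inside HTML text.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Turn one TSV line into one HTML paragraph.
# Assumed column order: id, text, ..., created_at (last column).
sub tsv_line_to_html {
    my ($line) = @_;
    chomp $line;
    my @fields = split /\t/, $line, -1;   # -1 keeps trailing empty columns
    my $id         = $fields[0];
    my $text       = $fields[1];
    my $created_at = $fields[-1];
    return "<p class='brtw'><a href='http://twitter.com/BrandSanderson/status/$id'>"
         . "<b>BrandSanderson</b></a> " . escape_html($created_at) . "<br/>"
         . escape_html($text) . "</p>\n";
}

# Made-up sample line standing in for the contents of tweetArchive.tsv.
my @sample_lines = (
    "123\tHello <world> & co\t\t\t\t\t\t0\t0\t\tFri Mar 02 03:30:00 +0000 2012\n",
);

print "<div class='brantweets'>\n";
foreach my $line (@sample_lines) {
    print tsv_line_to_html($line);
}
print "</div>\n";
```

To run it against a real archive, the sample array could be replaced with lines read from the TSV file. Tweets whose text contains a tab or a newline would break the one-line-per-tweet assumption, so the archive step would need to strip those first.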
Joe ST he/him Posted March 5, 2012

> > Eric, I'm not sure if your post is directed at me when it says "you"... Storing the tweets as tab-separated values is much much less useful than what I'm currently doing.
>
> I think it's VERY useful to store tab-separated values. That way you can generate random HTML archives at any point later on. *shrug* each to his own

I think he means, you should keep them *all* in a local database, and then just syphon off what you want when you want them.
PeterAhlstrom he/him Posted March 5, 2012

You can generate random html archives from TSV files if you have a script to do so. Which I don't. Anyway, to me, TSV seems less useful than the original XML, which is well tagged so I know exactly what each item is for.

I was actually hoping for a better way to do the stuff I use "split" for, the foreach @parts thing. But maybe that's a good way to do it already?

This script is essentially the whole of my perl knowledge, and I don't even understand some of the stuff it does, like s/\s+/ /g; esoteric stuff drives me nuts. I took some computer science courses in college, Java mostly, which I have almost entirely forgotten, so I just wing it when it comes to stuff like this and the various javascript stuff on Brandon's store pages.
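On the question of a better way than split with numbered parts: one option is to look each field up by its tag name, so nothing depends on remembering that part 6 is the status. The sketch below uses only core Perl; tag_value and status_chunks are hypothetical helpers, and the sample XML is a cut-down stand-in for the real timeline. It is still pattern matching rather than real XML parsing, so a proper parser such as the XML::Simple module used elsewhere in this thread would be more robust.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the text between <tag> and </tag>, or "" if the tag is missing.
# When a tag occurs more than once, the first occurrence wins.
sub tag_value {
    my ($xml, $tag) = @_;
    if ($xml =~ m{<\Q$tag\E>(.*?)</\Q$tag\E>}s) {
        return $1;
    }
    return "";
}

# Split a timeline document into one chunk per <status> element,
# reversed so the oldest tweet comes first, as in the original script.
sub status_chunks {
    my ($xml) = @_;
    my @chunks = $xml =~ m{<status>(.*?)</status>}gs;
    return reverse @chunks;
}

# Cut-down sample input (made up for illustration).
my $sample = <<'XML';
<statuses>
<status>
  <created_at>Fri Mar 02 03:30:00 +0000 2012</created_at>
  <id>2</id>
  <text>Second tweet</text>
  <in_reply_to_status_id>99</in_reply_to_status_id>
</status>
<status>
  <created_at>Thu Mar 01 10:00:00 +0000 2012</created_at>
  <id>1</id>
  <text>First tweet</text>
  <in_reply_to_status_id></in_reply_to_status_id>
</status>
</statuses>
XML

foreach my $chunk (status_chunks($sample)) {
    my $id       = tag_value($chunk, 'id');
    my $text     = tag_value($chunk, 'text');
    my $reply_to = tag_value($chunk, 'in_reply_to_status_id');

    if ($reply_to ne "") {
        print "$id: $text (reply to $reply_to)\n";
    }
    else {
        print "$id: $text\n";
    }
}
```

Because the first match wins, tag_value($chunk, 'id') picks up the status id only as long as it appears before any nested id, such as the one inside the user element.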
Joe ST he/him Posted March 5, 2012

> You can generate random html archives from TSV files if you have a script to do so. Which I don't. Anyway, to me, TSV seems less useful than the original XML, which is well tagged so I know exactly what each item is for.
>
> I was actually hoping for a better way to do the stuff I use "split" for, the foreach @parts thing. But maybe that's a good way to do it already?
>
> This script is essentially the whole of my perl knowledge, and I don't even understand some of the stuff it does, like s/\s+/ /g; esoteric stuff drives me nuts. I took some computer science courses in college, Java mostly, which I have almost entirely forgotten, so I just wing it when it comes to stuff like this and the various javascript stuff on Brandon's store pages.

I could try and comment it up if you want (to try and explain the more esoteric bits that I guess are just copypasta?), and maybe add some improvements... it's just I have no knowledge of perl, so I'd be just as much in the dark as you, lol
PeterAhlstrom he/him Posted March 6, 2012

Hey, I'm totally open to using something else like php if it's better for the situation. It just has to be an end-to-end solution that does what this does already.

Yeah, there's a lot of copypasta in here. Something else I don't understand at all is the -> operations.

Actually, for the esoteric stuff, I'd prefer a plain English alternative that works in the code. The most esoteric I'm up for is the ternary operator in javascript, and that's with reservations because every time I want to use it I have to look it up to remind myself of the syntax.

I do understand the %H:%M:%S part pretty well and the find_uris section, because they use terms that are easy to relate to the actual thing they do. Well, I don't know what @_ means. Besides one eye and a mouth.

OMG, I was just thinking how much I loved Hypercard back in the day, yet it was a shame it didn't support arrays, so I just searched and found out it DID support arrays:

> Yes, you can use variables in a manner similar to fields to simulate arrays. For example, you can say "line 1 of data", where "data" is the name of a local or global variable. For example:
>
>     put 1 into line 1 of data
>     put 2 into line 2 of data
>     put 4 into line 3 of data
>     put (line 1 of data) + (line 2 of data) + (line 3 of data) into message
>
> Of course, you can use variables instead of literals:
>
>     put 3 into n
>     put line n of data into message
>
> Instead of 'lines' you can use 'items':
>
>     put "ABC" into item 2 of data
>
> For multiply-dimensioned arrays you can do:
>
>     put "ABC" into item 2 of line 3 of data

If I'd known this back in the early 90s it would have made that Star Trek game I was making in Hypercard work much much better. Instead I had a hidden card with a ton of text fields on it and an algorithm to change a multiply-dimensioned array into a field number...

Edited March 6, 2012 by PeterAhlstrom
Eric Peters Posted March 6, 2012

> You can generate random html archives from TSV files if you have a script to do so. Which I don't. Anyway, to me, TSV seems less useful than the original XML, which is well tagged so I know exactly what each item is for.
>
> I was actually hoping for a better way to do the stuff I use "split" for, the foreach @parts thing. But maybe that's a good way to do it already?
>
> This script is essentially the whole of my perl knowledge, and I don't even understand some of the stuff it does, like s/\s+/ /g; esoteric stuff drives me nuts. I took some computer science courses in college, Java mostly, which I have almost entirely forgotten, so I just wing it when it comes to stuff like this and the various javascript stuff on Brandon's store pages.

The first s/ does a search/replace. The \s is the regex character class that matches whitespace characters (tabs, spaces, etc.), and the + matches it one or more times. The second part, / /, is the replacement: a single space. The g makes the replacement global, so it applies to every match rather than only the first. Effectively, every run of whitespace characters becomes one space.

I have much perl fu, let me know if you have any specific questions. Guess I wasn't quite sure what you were specifically asking for. I still believe the right approach is to store the archive of the "raw" tweets/etc in some sort of data file (TSV, BDB, MySQL, etc), then you gain flexibility of reformatting them later.

-Eric

EDIT:

> Hey, I'm totally open to using something else like php if it's better for the situation. It just has to be an end-to-end solution that does what this does already.
>
> Yeah, there's a lot of copypasta in here. Something else I don't understand at all is the -> operations.
>
> Actually, for the esoteric stuff, I'd prefer a plain English alternative that works in the code. The most esoteric I'm up for is the ternary operator in javascript, and that's with reservations because every time I want to use it I have to look it up to remind myself of the syntax.
>
> I do understand the %H:%M:%S part pretty well and the find_uris section, because they use terms that are easy to relate to the actual thing they do. Well, I don't know what @_ means. Besides one eye and a mouth.
>
> OMG, I was just thinking how much I loved Hypercard back in the day, yet it was a shame it didn't support arrays, so I just searched and found out it DID support arrays:
>
> If I'd known this back in the early 90s it would have made that Star Trek game I was making in Hypercard work much much better. Instead I had a hidden card with a ton of text fields on it and an algorithm to change a multiply-dimensioned array into a field number...

$_ is the default scalar variable: input operators and loops put the current item there when you don't name a variable. @_ is the array of arguments passed to a subroutine.

A good little article on it is: http://www.wellho.net/mouth/969_Perl-and-.html

Generally I like to do stuff like:

    while(<FILE>) {
      chomp($_);
      my $line = $_;
      if($line =~ /blahblah/) {
      }
    }

That way I can "save" the input in a more friendly named variable. They belong to the same family of special variables as $1, $2, $3, which hold the captured groups from a regex match.

Edited March 6, 2012 by KChan: Doublepost
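The pieces asked about in this exchange (s/\s+/ /g, @_, $_ and ->) can be seen side by side in a few lines. This is only an illustrative sketch; squeeze_spaces and the $tweet hash are made-up names, not taken from either script in the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# @_ holds the arguments passed to a subroutine.
# Copying them into named variables makes the rest read like plain English.
sub squeeze_spaces {
    my ($text) = @_;

    # s/PATTERN/REPLACEMENT/g
    #   \s+  means "one or more whitespace characters"
    #   / /  means "replace with a single space"
    #   g    means "do this for every match, not just the first"
    $text =~ s/\s+/ /g;

    return $text;
}

# -> follows a reference. Here $tweet refers to a hash (a set of named fields),
# and $tweet->{text} reads the field called "text".
my $tweet = { id => 42, text => "two  spaces\tand a tab" };
print squeeze_spaces($tweet->{text}), "\n";   # prints "two spaces and a tab"

# $_ is the default variable that foreach fills in when no loop variable is named.
foreach ("a  b", "c\n\nd") {
    print squeeze_spaces($_), "\n";
}
```

The same arrow also calls a method on a class or object, which is what URI->new( $orig_uri ) and $uri->canonical->as_string are doing in tweetthing.pl.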
KChan she/her Posted March 6, 2012

Eric, if you want to quote two different posts, please don't doublepost. We have a multi-quote feature for that instead. Thanks!
Joe ST he/him Posted March 6, 2012

Hey there, after a bit of a code, I came up with this html page. I don't think it exactly duplicates the functionality of your script, and it's probably got plenty of bugs in it (in particular, it doesn't sort the tweets yet). I got this far and ran out of API allowance.

Basically I switched to the JSON outputs, used jQuery to JSON-P them into local variables, which I then iterate over and append the `<p class='bwst'><a href=twitter.com>...` lines to the DOM directly, rather than via 'nasty' strings. Whilst doing this, I get the reply-to tweets (recursively) and append each of those onto the DOM too.

I will then do something like `$('[data-date]').sort()` on the data-date attribute, leaving them all in correct date order. I can also then maybe strip out the data-date attributes if you want.

tweets.html
PeterAhlstrom he/him Posted March 6, 2012

Joe,

That looks interesting and promising. I can't figure out how to save its output. If I open it in a browser and look at the source or save as an html file, it just gives me your code, not the output of your code. Would it do the time zone shifting that the original code does? Mine also does automatic URL parsing.

Eric,

Reformatting takes too much time to do manually. That's why I cobbled together the script in the first place.
Joe ST he/him Posted March 6, 2012

Yes, it can do url-parsing, tz-shifting, etc. I just can't put that stuff in atm, as I ran out of API requests lol.

Hmmm, the saving... good question. I can make it add a textarea at the bottom containing the source of the file, if you want. It won't be pretty printed though :\