EDIT (Oct 2, 2014): Let's just cut to the good stuff. Links to the downloads. If you want the story then have at it below.

Newest release (untested):
Oct 2, 2014 (mobi)
https://github.com/beege/MMM-Ebook/raw/master/Ebooks/MMM%20Blog%20-%20MMM.mobi
Oct 2, 2014 (azw3)
https://github.com/beege/MMM-Ebook/raw/master/Ebooks/MMM%20Blog%20-%20MMM.azw3
Oct 2, 2014 (HTML zip)
https://github.com/beege/MMM-Ebook/raw/master/Ebooks/MMM%20Blog%20-%20MMM.zip

Battle-tested stable version:
Oct 2013 version (mobi)
https://github.com/beege/MMM-Ebook/raw/9e3853a3ce5242aa15e5fd73c9650c40570a9dd6/Ebooks/MMM%20Blog%20-%20MMM.mobi
Oct 2013 version (HTML zip)
https://github.com/beege/MMM-Ebook/raw/9e3853a3ce5242aa15e5fd73c9650c40570a9dd6/Ebooks/MMM%20Blog%20-%20MMM.zip

EDIT: After receiving Mr. Money Mustache's endorsement I have uploaded the ebook in zip and MOBI formats (attached to this post). I can upload in other formats if desired, or people can download the zip format and use Calibre to convert it themselves. I will try to find the time to update the script, fix some of the issues I outlined, and add MMM's latest post(s) sometime in the future.

I recently traveled with my girlfriend to visit friends some 7 hours away by car and wanted to take Mr. Mustache along for the ride for reading. Neither my girlfriend nor I have unlimited mobile data, and the cell tower connections can be spotty along the route, so reading the blog via an Internet connection was ill-advised. I already owned a Kindle from my pre-Mustachian days so I had the idea of converting Mr. Money Mustache into an ebook for the trip. Details are at the end of this post.
Long story short, it worked pretty well, and the Kindle's text-to-speech even worked for reading to the driver, giving Mr. Mustache a strangely dry delivery. Though posts could be a bit repetitive at times and any posts involving a bit of mathematics were poorly dictated by the Kindle, it was a positive experience. I would recommend it to other Mustachians if they encounter a similar situation.
As I am not the copyright holder on Mr. Money Mustache I hesitate to distribute the ebook I created without Mr. Money Mustache's explicit permission, but I believe I can at least distribute the code and methods to create the book. Listening to the book with my girlfriend on the road trip definitely helped us both grow our mustaches a bit more.

Now let's talk about how I made the book. Note: This was hacked together right before a road trip. Definite improvements could be made. I did search around the forums here to see if anyone else had already done this but did not find anything.
The tools I used were Python 2.x with lxml installed to download and parse the RSS feed, and then Calibre to convert the resulting HTML to an ebook. I did not get around to automating the Calibre calls. I'll leave that as an exercise for the reader.
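For anyone on Python 3, the parsing step might look roughly like the sketch below. It is not the script from this post: it uses the standard library's xml.etree.ElementTree instead of lxml, and it also reads each item's pubDate. The function name parse_items and the sample feed are made up for illustration, and the content namespace URI is the one RSS feeds conventionally declare, so check it against the real feed.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Namespace conventionally used for <content:encoded> in RSS feeds.
# Assumption: the real feed declares this same URI.
NS = {"content": "http://purl.org/rss/1.0/modules/content/"}

def parse_items(rss_text):
    """Yield (title, link, date, html) for each <item> in an RSS 2.0 string."""
    channel = ET.fromstring(rss_text).find("channel")
    for item in channel.findall("item"):
        title = item.findtext("title")
        link = item.findtext("link")
        # pubDate is an RFC 822 style date string
        raw_date = item.findtext("pubDate")
        date = parsedate_to_datetime(raw_date) if raw_date else None
        html = item.findtext("content:encoded", namespaces=NS)
        yield title, link, date, html

# Made-up sample feed, only here so the sketch runs without a network
SAMPLE = """<?xml version="1.0"?>
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <item>
      <title>Example Post</title>
      <link>http://example.com/example-post/</link>
      <pubDate>Wed, 06 Apr 2011 12:00:00 +0000</pubDate>
      <content:encoded><![CDATA[<p>Hello</p>]]></content:encoded>
    </item>
  </channel>
</rss>"""

for title, link, date, html in parse_items(SAMPLE):
    print(date.strftime("%Y-%m-%d"), title, link)
```

Having the date at hand means it could be written under each post's heading, which might help when the Kindle reads the book aloud.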
I have no prior experience with RSS feeds so I could be missing things. I found some URL documentation for WordPress and used it to get the RSS feed for MMM from the beginning of time in ascending order (that's the 24 pages of feed you see in the code below). This can easily be changed to reverse the order, only retrieve the most recent posts, etc.
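As a small illustration of how those feed URLs are put together, here is a Python 3 sketch. The helper name feed_url is hypothetical, and order=DESC for newest-first is an assumption based on how the ASC parameter works; I have not checked that the site still honours either one.

```python
from urllib.parse import urlencode

BASE = "http://www.mrmoneymustache.com/feed/"

def feed_url(page, order="ASC"):
    """Build the URL for one page of the feed."""
    # urlencode keeps the dict's insertion order: order first, then paged
    return BASE + "?" + urlencode({"order": order, "paged": page})

# Oldest-first, pages 1 to 24 inclusive, matching the range in the script
urls = [feed_url(page) for page in range(1, 25)]
print(urls[0])

# Assumed variant: newest posts first, first page only
print(feed_url(1, order="DESC"))
```

The page count of 24 is just what the feed had at the time; a more robust version would keep requesting pages until one comes back empty or fails.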
Known Issues/Bugs:
- TOC could be improved
- Need to capture the date and add it to the top of each post. This will make it easier to understand the transition to a new post while the Kindle reads to you.
- Pictures show up for the ebook on my computer but not on the Kindle. Not sure if it is because I did not download them and update the links. I pasted some code at the bottom (commented out) to download them, but it is based on BeautifulSoup and needs to be rewritten.
- Link rewriting algorithm runs in N-squared time (every post is scanned once for every post URL). Wonder if this could be improved.
- I lamely added a "p" prefix on the created html files as the regex did not work if they started with a number. This is an unexpected bug and I did not dig into the regex any more to investigate.
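On the last two items, one possible approach is to scan each post once and look every link up in a dictionary, instead of compiling one regex per post URL. The Python 3 sketch below is only an illustration: rewrite_links, HREF_RE and the example.com URLs are made-up names, and it assumes double-quoted href attributes like the script's regex does. It also sidesteps the leading-digit problem, which I believe comes from the replacement string: in Python's re module, "\1" followed directly by digits is not read as "group 1, then those digits". Passing a function as the replacement (or writing \g<1>) avoids that ambiguity.

```python
import re

# Captures the URL inside href="...". Assumes double-quoted attributes.
# Note it matches any href, not only those inside <a> tags.
HREF_RE = re.compile(r'href="([^"]*)"')

def rewrite_links(html, local_names):
    """Point links to known posts at their local files; leave others alone.

    local_names maps a post's original URL to its local file name.
    """
    def swap(match):
        url = match.group(1)
        # One dict lookup per link, so each post is scanned only once
        return 'href="%s"' % local_names.get(url, url)
    return HREF_RE.sub(swap, html)

# Hypothetical data: file names start with a digit, no "p" prefix needed
local_names = {
    "http://example.com/first/": "000.html",
    "http://example.com/second/": "001.html",
}
page = ('<p>See <a href="http://example.com/second/">this</a> and '
        '<a href="http://elsewhere.example/">that</a>.</p>')
print(rewrite_links(page, local_names))
```

With this shape the total work grows with the amount of text rather than with the number of posts squared.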
Usage:
- Create a new directory and place the script below there
- Run the script
- Open calibre and import the "index.html" file as an ebook
- Run the calibre conversion process to give you the desired output format (MOBI for kindle)
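The two Calibre steps above could probably be scripted with Calibre's ebook-convert command-line tool, which takes an input file and an output file and picks the output format from the file extension. A minimal Python 3 sketch, assuming ebook-convert is on the PATH; the output name MMM.mobi and both function names are made up for illustration:

```python
import subprocess

def convert_command(source="index.html", target="MMM.mobi"):
    """Build the argument list for Calibre's ebook-convert tool."""
    # The output format is inferred from the target's extension
    return ["ebook-convert", source, target]

def convert(source="index.html", target="MMM.mobi"):
    """Run the conversion; needs Calibre installed and the source file present."""
    # check=True raises CalledProcessError if ebook-convert exits with an error
    subprocess.run(convert_command(source, target), check=True)

# Show the command that would be run
print(" ".join(convert_command()))
# convert()  # uncomment once Calibre is installed and index.html exists
```

I have not tried this against the generated index.html, so options such as the title or table of contents settings may still need tweaking.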
This code is old and lame. Newest hotness is here:
https://github.com/beege/MMM-Ebook/tree/9e3853a3ce5242aa15e5fd73c9650c40570a9dd6

#!/usr/bin/env python2
import os
import re
import sys
from lxml import etree as ET
import urllib

class RSSParser():
    def __init__(self, url):
        self.url = url

    def parse(self):
        # Yields (title, html, link) for every <item> in the feed
        print "Opening and parsing RSS feed @ <" + self.url + ">..."
        root = ET.parse(urllib.urlopen(self.url)).getroot().find('channel')
        for item in root.findall('item'):
            title = item.find('title').text
            url = item.find('link').text
            text = item.find('.//content:encoded', namespaces=root.nsmap).text
            yield (title.encode('utf-8'), text.encode('utf-8'), url.encode('utf-8'))

class Post():
    next = 0  # counter used to number the local files

    def __init__(self, title, text, url):
        self.title = title
        self.text = text
        # "p" prefix: see Known Issues about file names starting with a digit
        self.localUrl = 'p%03d.html' % (Post.next, )
        Post.next = Post.next + 1

MIN=1
MAX=24 #inclusive

if __name__=="__main__":
    postsInOrder = []
    posts = {}
    for i in range(MIN, MAX+1):
        url = "http://www.mrmoneymustache.com/feed/?order=ASC&paged=%d" % (i)
        #url = 'www.mrmoneymustache.com.htm'
        # A URL given on the command line overrides the feed URL
        if len(sys.argv) > 1:
            url = sys.argv[1]
        parser = RSSParser(url)
        for (title, text, url) in parser.parse():
            postsInOrder.append(url)
            posts[url] = Post(title, text, url)

    # Rewrite links - we do this once we have all the posts just in case MMM went and edited an earlier post to include a link to a later one
    for url in postsInOrder:
        post = posts[url]
        for url2 in postsInOrder:
            # re.escape so that "." and "?" in the URL are matched literally
            regex = re.compile('<a\\s(.*href=")%s(".*)>(.*)</a>' % re.escape(url2))
            post.text = regex.sub('<a \\1' + posts[url2].localUrl + '\\2>\\3</a>', post.text)

    # Write one HTML file per post plus an index.html table of contents
    index = open('index.html', 'wb')
    index.write('''<html>
<body>
<h1>Table of Contents</h1>
<p style="text-indent:0pt">''')
    for url in postsInOrder:
        post = posts[url]
        open(post.localUrl, 'wb').write('<title>' + post.title + "</title>\n" + '<h1>' + post.title + "</h1>\n" + post.text)
        index.write('<a href="%s">%s</a><br/>\n' % (post.localUrl, post.title))
    index.write(''' </p>
</body>
</html>''')
    index.close()

# Image download, not working yet (based on BeautifulSoup, needs a rewrite):
#for image in soup.findAll("img"):
#    print "Image: %(src)s" % image
#    image_url = urlparse.urljoin(url, image['src'])
#    filename = image["src"].split("/")[-1]
#    outpath = os.path.join(out_folder, filename)
#    urlretrieve(image_url, outpath)
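
The commented-out image snippet depends on BeautifulSoup. As a rough idea of how it might be rewritten with only the standard library, here is a Python 3 sketch that finds the img tags in a post and works out where each image would be saved. ImageCollector, image_targets, the images folder and the example.com URL are all made up for illustration, and the download itself is left commented out. Whether local copies actually fix the missing pictures on the Kindle is untested.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class ImageCollector(HTMLParser):
    """Collect the src attribute of every <img> tag."""
    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)

def image_targets(html, page_url, out_folder="images"):
    """Return (absolute_url, local_path) pairs for the images in html."""
    collector = ImageCollector()
    collector.feed(html)
    pairs = []
    for src in collector.sources:
        # Resolve relative src values against the post's own URL
        absolute = urljoin(page_url, src)
        filename = os.path.basename(urlparse(absolute).path)
        pairs.append((absolute, os.path.join(out_folder, filename)))
    return pairs

html = '<p><img src="/wp-content/uploads/pic.jpg" alt=""></p>'
for image_url, path in image_targets(html, "http://example.com/some-post/"):
    print(image_url, "->", path)
    # urllib.request.urlretrieve(image_url, path)  # the actual download
```

The src attributes in each post would also have to be rewritten to the local paths, much like the post links are.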