<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Somethink to Chew On &#187; Professional</title>
	<atom:link href="http://www.harlan.harris.name/category/professional/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.harlan.harris.name</link>
	<description>the blog of Harlan Harris</description>
	<lastBuildDate>Sun, 07 Mar 2010 23:45:55 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>ggplot and concepts &#8212; what&#8217;s right, and what&#8217;s wrong</title>
		<link>http://www.harlan.harris.name/2010/03/ggplot-and-concepts-whats-right-and-whats-wrong/</link>
		<comments>http://www.harlan.harris.name/2010/03/ggplot-and-concepts-whats-right-and-whats-wrong/#comments</comments>
		<pubDate>Sun, 07 Mar 2010 21:52:16 +0000</pubDate>
		<dc:creator>Harlan</dc:creator>
				<category><![CDATA[Professional]]></category>
		<category><![CDATA[ggplot]]></category>
		<category><![CDATA[ggplot2]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://www.harlan.harris.name/?p=47</guid>
		<description><![CDATA[A few months back I gave a presentation to the NYC R Meetup. (R is a statistical programming language. If this means nothing to you, feel free to stop reading now.) The presentation was on ggplot2, a popular package for generating graphs of data and statistics. In the talk (which you can see here, including [...]]]></description>
			<content:encoded><![CDATA[<p>A few months back I gave a presentation to the <a href="http://www.meetup.com/nyhackr/">NYC R Meetup</a>. (<a href="http://www.r-project.org/">R</a> is a statistical programming language. If this means nothing to you, feel free to stop reading now.) The presentation was on <a href="http://had.co.nz/ggplot2/">ggplot2</a>, a popular package for generating graphs of data and statistics. In the talk (<a href="http://www.vcasmo.com/video/drewconway/7017">which you can see here</a>, including both my slides and my patter!) I presented both the really great things about ggplot2 and some of its downsides. In this blog post, I wanted to expand a bit on my thinking on ggplot, the Grammar of Graphics, and how people&#8217;s conceptual representations of graphs, data, ggplot, and R all interact. ggplot is both incredibly elegant and unfortunately difficult to learn to use well, I think as a consequence of the variety of representations.<span id="more-47"></span></p>
<p>The ggplot package, written by the overachieving and remarkable <a href="http://had.co.nz/">Hadley Wickham</a>, is based on <a href="http://books.google.com/books?id=_kRX4LoFfGQC&amp;dq=grammar+of+graphics&amp;printsec=frontcover&amp;source=bn&amp;hl=en&amp;ei=7kZsS8-lDI_e8Qb4hcD2BQ&amp;sa=X&amp;oi=book_result&amp;ct=result&amp;resnum=4&amp;ved=0CB8Q6AEwAw#v=onepage&amp;q=&amp;f=false">earlier more theoretical work by Leland Wilkinson</a>. Wilkinson abstracted the process of putting data onto an image, and created a Grammar of Graphics, which describes <em>how</em> the data maps to the parts of a graph, rather than describing the final graph itself. For example, here&#8217;s how to create a pie chart, clipped from Wilkinson&#8217;s book:</p>
<p><a href="http://www.harlan.harris.name/wp-content/uploads/2010/02/Screen-shot-2010-02-05-at-11.33.36-AM.png"><img class="aligncenter size-full wp-image-50" title="Wilkinson Pie Graph Example" src="http://www.harlan.harris.name/wp-content/uploads/2010/02/Screen-shot-2010-02-05-at-11.33.36-AM.png" alt="" width="607" height="393" /></a>Don&#8217;t worry about the details, but briefly, a pie chart is just a stacked bar graph (summary.proportion) plotted in polar coordinates (polar.theta). If you took the time to learn this grammar, you would realize that the hierarchical structure of a graph on a page (elements have positions and labels and visual properties like color, each of which has its own abstract structure) maps cleanly to the hierarchical structure of the grammar, and that variables in the grammar map cleanly to the linear structure of the data. As a user of this system, you would be able to see all three key representations at once: the <span style="text-decoration: underline;">data</span>, the <span style="text-decoration: underline;">grammatical mapping</span> from data to graph, and the <span style="text-decoration: underline;">graph</span> itself.</p>
<p>Now consider ggplot, the implementation of the Grammar of Graphics in the R programming language. Does ggplot maintain three visible representations, all straightforwardly mappable to each other? Sadly, it does not. Instead, users of ggplot must map among four representations: the <span style="text-decoration: underline;">data</span> (a standard data.frame object), the <span style="text-decoration: underline;">R syntax</span> for ggplot2 (which has some quirks), an <span style="text-decoration: underline;">underlying ggplot object</span> (similar to the Grammar of Graphics, but vastly more complex and impossible to examine directly), and the generated <span style="text-decoration: underline;">graph</span>.</p>
<p>Consider the simple pie graph, below.</p>
<p><a href="http://www.harlan.harris.name/wp-content/uploads/2010/02/Screen-shot-2010-02-05-at-1.56.41-PM.png"><img class="aligncenter size-full wp-image-53" title="Simple Pie Chart" src="http://www.harlan.harris.name/wp-content/uploads/2010/02/Screen-shot-2010-02-05-at-1.56.41-PM.png" alt="" width="249" height="183" /></a>This chart is generated in ggplot2 by the following R code:</p>
<pre>&gt; zz &lt;- data.frame(cat=c("a", "b"), val=c(5,3))

&gt; zz
 cat val
1   a   5
2   b   3
&gt; pp &lt;- ggplot(zz, aes(x="", y=val, fill=cat)) + geom_bar(width=1) + coord_polar("y")
&gt; print(pp)</pre>
<p>The print() function is optional within an R interpreter session, but I include it because it illustrates a point that&#8217;s not initially obvious to many users. Unlike the built-in R plotting tools, the ggplot() function and its associated functions don&#8217;t plot anything on the screen; they just construct an object of type &#8220;ggplot&#8221;. Almost all of the actual work of mapping the data to stuff on your screen occurs when you render that object, using print() or ggsave().</p>
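<p>Here is a minimal sketch of that deferred rendering, rebuilding the same zz data so it runs on its own. It assumes ggplot2 is installed, and it passes stat="identity" explicitly (an addition that is not in the call above) so that bar heights are read from val regardless of the ggplot2 version:</p>
<pre>library(ggplot2)

zz &lt;- data.frame(cat = c("a", "b"), val = c(5, 3))

# Building the plot only constructs an object; nothing is drawn yet.
pp &lt;- ggplot(zz, aes(x = "", y = val, fill = cat)) +
  geom_bar(stat = "identity", width = 1) +  # stat = "identity" is an assumption added here
  coord_polar("y")

class(pp)  # inspect the object's class; still nothing has been drawn

# Rendering happens only here. At the interactive prompt, typing `pp` alone
# auto-prints, but inside a function, a loop, or a source()d script it does not.
print(pp)</pre>
<p>This is why a plot that appears at the prompt can silently vanish once the same code is moved into a function: the object is still built, but nothing asks for it to be printed.</p>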
<p>So what does that object look like? If you type str(pp), you&#8217;ll get an answer, but it&#8217;s about a hundred lines of undecipherable hierarchical object and list structure, not intended to be examined by mere mortals. But there&#8217;s something critically important about that structure &#8212; like the original Grammar of Graphics, and unlike the R syntax above, it&#8217;s hierarchically structured.</p>
<p>In the R syntax, you create a base ggplot structure with the ggplot() call, then you abuse the &#8220;+&#8221; operator to make changes to that structure. The geom_bar() function adds a layer to the ggplot() object, where a layer is just what it sounds like, a set of information about one of potentially many overlaid layers of content that will be put on the graph. So you construct a ggplot object by first initializing everything about the basic plot, then tack on layers with +, right? Actually no, because the coord_polar() call doesn&#8217;t create or modify a layer at all, it modifies the base object! Even if you&#8217;ve acquired the nonobvious intuition that ggplot objects are hierarchical and are created by concatenating layers, you now have to break the analogy again to fully understand what + is doing!</p>
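<p>One rough way to see the two different things + does is to peek inside the object after each step. This sketch assumes the layers and coordinates elements of the ggplot object, which are internals rather than a documented interface and may differ between ggplot2 versions; the variable names are mine, and stat="identity" is again added for portability:</p>
<pre>library(ggplot2)

zz &lt;- data.frame(cat = c("a", "b"), val = c(5, 3))

base     &lt;- ggplot(zz, aes(x = "", y = val, fill = cat))
with_bar &lt;- base + geom_bar(stat = "identity", width = 1)
with_pie &lt;- with_bar + coord_polar("y")

# geom_bar() appends a layer to the plot's layer list...
length(base$layers)
length(with_bar$layers)

# ...but coord_polar() leaves the layer list alone and instead replaces
# the plot-level coordinate system.
length(with_pie$layers)
class(with_bar$coordinates)
class(with_pie$coordinates)</pre>
<p>The layer count grows only at the geom_bar() step, while the class of the coordinates element changes only at the coord_polar() step. Treat this as a debugging aid, not as a supported way to read a plot.</p>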
<p>There is a way to partially see the structure directly, but it&#8217;s not well thought-out from the point of view of someone trying to learn how to use the package. The summary() method on ggplot objects tells you about things you didn&#8217;t specify (faceting?), it&#8217;s incomplete, and it doesn&#8217;t map well to the R syntax. If something in your plot isn&#8217;t working the way you want it to, summary() won&#8217;t help you.</p>
<pre>&gt; summary(pp)
mapping:  x = , y = val, fill = cat
faceting: facet_grid(. ~ ., FALSE)
-----------------------------------
geom_bar:
stat_bin: width = 1
position_stack: (width = NULL, height = NULL)</pre>
<p>Another shortcut that leads to conceptual problems for ggplot beginners is the use of qplot(). The qplot() function is a wrapper around ggplot(). Unlike ggplot(), you can give qplot() data that is not in the form of a data.frame, and the syntax is somewhat different. There&#8217;s nothing wrong with some syntactic sugar to make life easier, but in this case, learning ggplot by starting with qplot is like trying to learn a foreign language by starting with contractions and slang. You may be able to say a few essential things on your vacation, but you won&#8217;t be able to creatively construct new sentences as new situations arise. The brilliance of the Grammar of Graphics is exactly that it&#8217;s a grammar &#8212; you can construct new graphs and new types of graphs as new situations arise! But tutorials that start with qplot, with <a href="http://had.co.nz/ggplot2/book/" target="_blank">the ggplot book</a> an unfortunate (but in other ways excellent) example, send their learners down a linguistic garden path. To fully use the power of the system requires unlearning the conceptual structures that map the slang to charts on a screen, and starting over with learning the new, more powerful ggplot() grammar and hierarchical representations.</p>
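<p>To make the contrast concrete, here is the same scatterplot written both ways. This is only an illustration: it uses R&#8217;s built-in mtcars data, which is my choice and not something from the talk, and the variable names are arbitrary. Later ggplot2 releases mark qplot() as deprecated, so expect a warning there:</p>
<pre>library(ggplot2)

# The shorthand: data, mapping, and geom are all implied by the arguments.
q &lt;- qplot(wt, mpg, data = mtcars, colour = factor(cyl))

# The explicit grammar: the data, the aesthetic mapping, and the layer
# are each named separately.
g &lt;- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point()

print(q)
print(g)</pre>
<p>Both calls should draw the same picture, but only the second spells out which piece of the grammar each argument belongs to, which is the knowledge you need when a new kind of graph comes along.</p>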
<p>I&#8217;d like to conclude this overlong rant with two notes. First, just today <a href="http://pleasescoopme.com/2010/03/07/jjplot-yet-another-plotting-library-for-r/" target="_blank">a new graphics package for R was introduced</a>. <a href="http://code.google.com/p/jjplot/" target="_blank">jjplot</a> uses many of the ideas of the Grammar of Graphics and ggplot2, but seems to avoid at least a few of the conceptual problems. The + operator is not overloaded in conceptually confusing ways, and there is no distracting qplot function to mislead new users. Additionally, a quick look at the source code finds it much, much simpler than ggplot2&#8217;s source, which will likely lead to a more active base of contributors. I look forward to trying jjplot and watching its continuing development, and hope the authors learn from both the remarkable successes and frustrating failures of ggplot. Second, I use ggplot extensively in my work. It&#8217;s simply the best available tool for quickly generating elegant graphs of data in R, especially if that generation needs to happen automatically in code. Hadley Wickham deserves extensive praise for the amount of effort he has put into developing and popularizing the Grammar of Graphics. If you want to be maximally effective when visualizing data in R, take the time to learn ggplot2, but do so while keeping in mind that the learning process will be easiest if you skip qplot and other shortcuts, think hierarchically, and prepare for some frustration. Fortunately, the support communities on the <a href="http://groups.google.com/group/ggplot2" target="_blank">ggplot mailing list</a> and <a href="http://stackoverflow.com/questions/tagged/ggplot2" target="_blank">Stack Overflow</a> are extremely helpful, as is Hadley himself.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.harlan.harris.name/2010/03/ggplot-and-concepts-whats-right-and-whats-wrong/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Online publishing, micropayments, and warm fuzzy feelings</title>
		<link>http://www.harlan.harris.name/2009/10/online-publishing-micropayments-and-warm-fuzzy-feelings/</link>
		<comments>http://www.harlan.harris.name/2009/10/online-publishing-micropayments-and-warm-fuzzy-feelings/#comments</comments>
		<pubDate>Tue, 20 Oct 2009 17:34:39 +0000</pubDate>
		<dc:creator>Harlan</dc:creator>
				<category><![CDATA[Professional]]></category>
		<category><![CDATA[internet]]></category>
		<category><![CDATA[journalism]]></category>
		<category><![CDATA[microeconomics]]></category>
		<category><![CDATA[money]]></category>

		<guid isPermaLink="false">http://www.harlan.harris.name/?p=23</guid>
		<description><![CDATA[The problem of how to monetize online publishing, particularly news publishing,  is neither new nor all that surprising. But the ongoing lack of a solution is steadily eating into news organizations across the country. Yesterday, the Times announced it was going to buy out or lay off 8% of its newsroom staff, despite being the [...]]]></description>
			<content:encoded><![CDATA[<p>The problem of how to monetize online publishing, particularly news publishing, is neither new nor all that surprising. But the ongoing lack of a solution is steadily eating into news organizations across the country. Yesterday, <a href="http://www.nytimes.com/2009/10/20/business/media/20times.html?ref=business" target="_blank">the Times announced it was going to buy out or lay off 8% of its newsroom staff</a>, despite being the best national newspaper in the country and probably the one making the best use of Internet technologies. (<a href="http://infosthetics.com/cgi-bin/mt/mt-search.cgi?search=nytimes&amp;IncludeBlogs=1&amp;limit=20" target="_blank">Their interactive graphics are some of the best around</a>.) How can newspapers make money on the web? Ad revenue is inadequate, and people won&#8217;t generally pay for content. <a href="http://www.niemanlab.org/2009/09/micropayments-for-news-the-holy-grail-or-just-a-dangerous-delusion/" target="_blank">This post</a> from a journalism blog at Harvard discusses why micropayments will never work:</p>
<blockquote><p>Apple can charge for music because it controls access to the songs from all the major record labels. Phone companies and cable companies can charge usurious rates for text messaging and Internet because they have little or no real competition. How does any of that apply to newspapers? &#8230; Newspapers have spent the past 100 years or so with a stranglehold on both the tools of mass publishing and the means of distribution, and much of what has happened to them over the past decade is a result of them losing both of those things. The unfortunate reality is that even the best micropayment system is not going to recreate that system of artificial scarcity and control&#8230;</p></blockquote>
<p>But I think there&#8217;s a way that might work, a way that leverages human psychology. <a href="http://www.ehow.com/how_5430509_give-children-choices.html" target="_blank">People like to feel like they&#8217;re in control</a>, and <a href="http://freakonomics.blogs.nytimes.com/2005/06/13/why-dont-economists-vote/" target="_blank">they like to feel like they have a voice in the system</a>. Micropayment systems that require you to pay 10 cents to read an article, based on a headline or a link, or subscription systems that take your money and give you something you can get elsewhere for free, just make you resentful. So instead, <em>design the system so that you associate feeling good about what you have just read with giving money to the people who produced the content</em>. Here&#8217;s how it might work.</p>
<p>If I decide I want to read content from a consortium of providers (say, anything owned by The New York Times Company, or Time-Warner, or <a href="http://seedmediagroup.com/" target="_blank">Seed Media Group</a>, or a group of publishers that set up their own consortium), I set up an account, pay my $50/year, and get access. If I like a piece of content (article, podcast, interactive graphic, whatever), I click the &#8220;Tip the Author(s)&#8221; button, and a chunk of my $50, maybe 10 cents, gets redirected to the actual people creating the content I actually <em>like</em> (not just start to read). If I don&#8217;t use up my $50 for the year, the balance just gets split internally by the consortium. This way, readers have a feeling of control and an association of paying with pleasure, providers get cash, and the best providers get the most cash.</p>
<p>Information management for this would be straightforward, and it would (I think) work. People <em>like</em> to tip for good service. Let them tip for informative, well-reported news.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.harlan.harris.name/2009/10/online-publishing-micropayments-and-warm-fuzzy-feelings/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Welcome</title>
		<link>http://www.harlan.harris.name/2009/09/welcome/</link>
		<comments>http://www.harlan.harris.name/2009/09/welcome/#comments</comments>
		<pubDate>Wed, 30 Sep 2009 20:37:19 +0000</pubDate>
		<dc:creator>Harlan</dc:creator>
				<category><![CDATA[Personal]]></category>
		<category><![CDATA[Professional]]></category>
		<category><![CDATA[meta]]></category>

		<guid isPermaLink="false">http://www.harlan.harris.name/?p=19</guid>
		<description><![CDATA[Welcome to my new web page and new blog! I will have various references about me (so far, a list of publications and a list of other sites I can be found on), as well as a blog about things I&#8217;m working on or thinking about. Posts will be categorized as Personal or Professional (or [...]]]></description>
			<content:encoded><![CDATA[<p>Welcome to my new web page and new blog! I will have various references about me (so far, <a href="http://www.harlan.harris.name/publications/">a list of publications</a> and <a href="http://www.harlan.harris.name/other-sites/">a list of other sites</a> I can be found on), as well as a blog about things I&#8217;m working on or thinking about. Posts will be categorized as <a href="http://www.harlan.harris.name/category/personal/">Personal</a> or <a href="http://www.harlan.harris.name/category/professional/">Professional</a> (or both, like this one!), and I think there&#8217;s a way to subscribe to an RSS feed of just one or the other, in case you&#8217;re uninterested in statistics or food or whatever. (google, google&#8230;) Ah, yes, you can. Here&#8217;s <a href="http://www.harlan.harris.name/category/personal/feed">the Personal feed</a> and here&#8217;s <a href="http://www.harlan.harris.name/category/professional/feed">the Professional feed</a>. I&#8217;ll figure out how to put links to them in the sidebar at some point!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.harlan.harris.name/2009/09/welcome/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
