Tag Archives: operations research

Simulating Rent Stabilization Policy at the National Day of Civic Hacking

This post was originally published on Medium on June 5, 2016.


Yesterday was the 2016 National Day of Civic Hacking, a Code for America event that encourages people with technology and related skills to explore projects related to civil society and government. My friend Josh Tauberer earlier wrote a thoughtful post about the event, Why We Hack, on what the value of this sort of event might be — please read it.

For my part, this year I worked on one of the projects he discusses, understanding the impact of DC’s rent stabilization laws and what potential policy changes might yield. As Josh noted, we discovered that it’s a hard problem. Much of the most relevant data (such as the list of properties under rent stabilization and their current and historical rents) are not available, and have to be estimated. Getting to a realistic understanding of the impact of law and policy on rents seems incredibly valuable, but hard.

So I spun off from the main group and worked on an easier but much less ambitious project that could potentially be useful with just an afternoon's work. Instead of trying to understand the law's effect on actual DC rents, I built a little tool to understand the law's effect on a rather unrealistic set of simulated apartment buildings. Importantly, I did this fully aware that I'm not building, I'm tinkering; my goal was to do something fun and interesting that might lead to something substantial and usable later, probably by someone else.


INFORMS Business Analytics 2014 Blog Posts

Earlier this year, I attended the INFORMS Conference on Business Analytics & Operations Research, in Boston. I was asked beforehand if I wanted to be a conference blogger, and for some reason I said I would. This meant I was able to publish posts on the conference’s WordPress web site, and was also obliged to do so!

Here are the five posts that I wrote, along with an excerpt from each. Please click through to read the full pieces:

Operations Research, from the point of view of Data Science

  • more insight, less action — deliverables tend towards predictions and storytelling, versus formal optimization
  • more openness, less big iron — open source software leads to a low-cost, highly flexible approach
  • more scruffy, less neat — data science technologies often come from black-box statistical models, vs. domain-based theory
  • more velocity, smaller projects — a hundred $10K projects beats one $1M project
  • more science, less engineering — both practitioners and methods have different backgrounds
  • more hipsters, less suits — stronger connections to the tech industry than to the boardroom
  • more rockstars, less teams — one person can now (roughly) do everything, in simple cases, for better or worse

What is a “Data Product”?

DJ Patil says “a data product is a product that facilitates an end goal through the use of data.” So, it’s not just an analysis, or a recommendation to executives, or an insight that leads to an improvement to a business process. It’s a visible component of a system. LinkedIn’s People You May Know is viewed by many millions of customers, and it’s based on the complex interactions of the customers themselves.

Healthcare (and not Education) at INFORMS Analytics

[A]s a DC resident, we often hear of “Healthcare and Education” as a linked pair of industries. Both are systems focused on social good, with intertwined government, nonprofit, and for-profit entities, highly distributed management, and (reportedly) huge opportunities for improvement. Aside from MIT Leaders for Global Operations winning the Smith Prize (and a number of shoutouts to academic partners and mentors), there was not a peep from the education sector at tonight’s awards ceremony. Is education, and particularly K-12 and postsecondary education, not amenable to OR techniques or solutions?

What’s Changed at the Practice/Analytics Conference?

In 2011, almost every talk seemed to me to be from a Fortune 500 company, or a large nonprofit, or a consulting firm advising a Fortune 500 company or a large nonprofit. Entrepreneurship around analytics was barely to be seen. This year, there are at least a few talks about Hadoop and iPhone apps and more. Has the cost of deploying advanced analytics substantially dropped?

Why OR/Analytics People Need to Know About Database Technology

It’s worthwhile learning a bit about databases, even if you have no decision-making authority in your organization, and don’t feel like becoming a database administrator (good call). But by getting involved early in the data-collection process, when IT folks are sitting around a table arguing about platform questions, you can get a word in occasionally about the things that matter for analytics — collecting all the data, storing it in a way friendly to later analytics, and so forth.

All in all, I enjoyed blogging the conference, and recommend the practice to others! It’s a great way to organize your thoughts and to summarize and synthesize your experiences.

Survey of Data Science / Analytics / Big Data / Applied Stats / Machine Learning etc. Practitioners

As I’ve discussed here before, there is a debate raging (ok, maybe not raging) about terms such as “data science”, “analytics”, “data mining”, and “big data”. What do they mean, how do they overlap, and perhaps most importantly, who are the people who work in these fields?

Along with two other DC-area Data Scientists, Marck Vaisman and Sean Murphy, I’ve put together a survey to explore some of these issues. Help us quantitatively understand the space of data-related skills and careers by participating!

Survey link: http://bit.ly/IQNM5A

It should take 10 minutes or less, data will be kept confidential, and we look forward to sharing our results and insights in a variety of venues, including this blog! Thanks!

making meat shares more efficient with R and Symphony

In my previous post, I motivated a web application that would allow small-scale sustainable meat producers to sell directly to consumers using a meat share approach, using constrained optimization techniques to maximize utility for everyone involved. In this post, I’ll walk through some R code that I wrote to demonstrate the technique on a small scale.

Although the problem is set up in R, the actual mathematical optimization is done by Symphony, an open-source mixed-integer solver that’s part of the COIN-OR project. (The problem of optimizing assignments, in this case of cuts of meat to people, is an integer programming problem, because the solution involves assigning either 0 or 1 of each cut to each person. More generally, linear programming and related optimization frameworks allow solving for real-numbered variables.) The Rsymphony package allows problems set up in R to be solved by the C/C++ Symphony code with little hassle.

My code is in a public github repository called groupmeat-demo, and the demo code discussed here is in the subset_test.R file. (The other stuff in the repo is an unfinished version of a larger-scale demo with slightly more realistic data.)

For this toy problem, we want to optimally assign 6 items to 3 people, each of whom has a different utility (value) for each item. In this case, I’m ignoring any fixed utility, such as cost in dollars, but that could be added into the formulation. Additionally, assume that items #1 and #2 cannot both be assigned, as with pork loin and pork chops.

This sort of problem is fairly simple to define mathematically. To set up the problem in code, I’ll need to create some matrices that are used in the computation. Briefly, the goal is to maximize an objective expression, \mathbf{c}^T\mathbf{x}, where the \mathbf{x} are variables that will be 0 or 1, indicating an assignment or non-assignment, and the \mathbf{c} is a coefficient vector representing the utilities of assigning each item to each person. Here, there are 6 items for 3 people, so I’ll have a 6×3 matrix, flattened to an 18-vector. The goal will be to find 0’s and 1’s for \mathbf{x} that maximize the whole expression.

Here’s what the \mathbf{c} matrix looks like:

      pers1 pers2  pers3
item1 0.467 0.221 0.2151
item2 0.030 0.252 0.4979
item3 0.019 0.033 0.0304
item4 0.043 0.348 0.0158
item5 0.414 0.050 0.0096
item6 0.029 0.095 0.2311

It appears as if everyone likes item1, but only person1 likes item5.
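
The snippets that follow assume some setup from earlier in subset_test.R that isn’t reproduced in the post. Here’s a minimal sketch of what that setup might look like, using the variable names the later snippets rely on (num.items, num.pers, obj, and mat.0); the utility values here are fresh random draws, not the exact numbers printed above:

library(plyr)       # laply(), maply()
library(Rsymphony)  # Rsymphony_solve_LP()

num.items <- 6   # cuts of meat
num.pers  <- 3   # people sharing them

# utility of assigning each item (row) to each person (column),
# normalized so each person's utilities sum to 1 (the matrix above appears to be normalized this way)
utilities <- matrix(runif(num.items * num.pers), nrow = num.items,
                    dimnames = list(paste0("item", 1:num.items),
                                    paste0("pers", 1:num.pers)))
utilities <- sweep(utilities, 2, colSums(utilities), "/")

obj   <- as.double(utilities)            # flattened 18-vector of objective coefficients
mat.0 <- matrix(0, num.items, num.pers)  # all-zeros template for building constraint rows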

Additionally, I need to define some constraints. For starters, it makes no sense to assign an item to more than one person. So, for each row of that matrix, the sum of the variables (not the utilities) must be 1, or maybe 0 (if that item is not assigned). I’ll create a constraint matrix, where each row has 18 columns and its pattern of 0’s and 1’s picks out one item’s row of the assignment matrix. Since there are 6 items, there are 6 constraint rows (for now). Each row sum needs to be less than or equal to one (I’ll tell the solver to use integers only later), so I also define vectors of inequality symbols and right-hand-sides.
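
Written out in the notation above, these per-item constraints are simply \sum_{p=1}^{3} x_{i,p} \le 1 for each item i = 1, \dots, 6, with each x_{i,p} \in \{0, 1\}.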

# for each item/row, enforce that the sum of indicators for its assignment are <= 1
mat <- laply(1:num.items, function(ii) { x <- mat.0; x[ii, ] <- 1; as.double(x) })
dir <- rep('<=', num.items)
rhs <- rep(1, num.items)

To add the loin/chops constraint, I need to add another row, specifying that the sum of the indicators across both items’ rows must also be 1 or less.

# for rows 1 and 2, enforce that the sum of indicators for their assignments are <= 1
mat <- rbind(mat, matrix(matrix(c(1, 1, rep(0, num.items-2)), nrow=num.items, ncol=num.pers), nrow=1))
dir <- c(dir, '<=')
rhs <- c(rhs, 1)

Here’s what those matrices and vectors look like:

> mat
     1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
[1,] 1 0 0 0 0 0 1 0 0  0  0  0  1  0  0  0  0  0
[2,] 0 1 0 0 0 0 0 1 0  0  0  0  0  1  0  0  0  0
[3,] 0 0 1 0 0 0 0 0 1  0  0  0  0  0  1  0  0  0
[4,] 0 0 0 1 0 0 0 0 0  1  0  0  0  0  0  1  0  0
[5,] 0 0 0 0 1 0 0 0 0  0  1  0  0  0  0  0  1  0
[6,] 0 0 0 0 0 1 0 0 0  0  0  1  0  0  0  0  0  1
[7,] 1 1 0 0 0 0 1 1 0  0  0  0  1  1  0  0  0  0
> dir
[1] "<=" "<=" "<=" "<=" "<=" "<=" "<=" 
> rhs
[1] 1 1 1 1 1 1 1

Finally, specify that the variables must be binary (0 or 1), and call SYMPHONY to solve the problem:

# this is an IP problem, for now
types <- rep('B', num.items * num.pers)
max <- TRUE # maximizing utility
 
soln <- Rsymphony_solve_LP(obj, mat, dir, rhs, types=types, max=max)

And, with a bit of post-processing to recover matrices from vectors, here’s the result:

$solution
 [1] 0 0 0 0 1 0 0 0 1 1 0 0 0 1 0 0 0 1

$objval
[1] 1.52

$status
TM_OPTIMAL_SOLUTION_FOUND
                        0 

Person #1 got Items 5 worth 0.41
Person #2 got Items 3, 4 worth 0.38
Person #3 got Items 2, 6 worth 0.73
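
The post-processing step isn’t shown above; here’s a minimal sketch of one way to recover the assignment matrix and the per-person utilities from the solution vector, using the utilities matrix and dimensions from the setup sketch earlier (so the names here are assumptions):

# reshape the 18-vector of 0/1 indicators back into a 6 x 3 assignment matrix
assignment <- matrix(soln$solution[1:(num.items * num.pers)],
                     nrow = num.items, ncol = num.pers)
# which items each person received, and the total utility each person gets
apply(assignment, 2, function(col) which(col == 1))
colSums(assignment * utilities)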

So that’s great. It found an optimal solution worth more than 50% more than the expected value of a random assignment. But there’s a problem. There’s no guarantee that everyone gets anything, and in this case, person #3 gets almost twice as much utility as person #2. Unfair! We need to enforce an additional constraint: that the difference between the maximum utility that any one person gets and the minimum utility that any one person gets is not too large. This is sometimes called a parity constraint.

Adding parity constraints is a little tricky, but the basic idea here is to add two more variables to the 18 I’ve already defined. These variables are positive real numbers, and they are forced by constraints (plus the objective) to equal the maximum and minimum total utilities per person. In the objective function, they are weighted so that their difference is not too big. So, that expression becomes \mathbf{c}^T\mathbf{x} - \lambda x_{19} + \lambda x_{20}. The first new variable (the maximum utility of any person) is minimized, while the second (the minimum) is maximized. The free parameter \lambda defines how much to trade off parity against total utility, and I’ll set it to 1 for now.

For the existing rows of the constraint matrix, these new variables get 0 coefficients. But two more rows need to be added per person: one forcing the upper variable to be at least as large as that person’s total assigned utility, and one forcing the lower variable to be no larger. The objective then pins them to exactly the maximum and minimum over all three people.

# now for those upper and lower variables
# \forall p, \sum_i u_i x_{i,p} - d.upper \le 0
# \forall p, \sum_i u_i x_{i,p} - d.lower \ge 0
# so, two more rows per person
d.constraint <- function(iperson, ul) { # ul = 1 for upper, 0 for lower
  x <- mat.utility.0
  x[, iperson ] <- 1
  x <- x * obj.utility
  c(as.double(x), (if (ul) c(-1,0) else c(0,-1)))
}
mat <- rbind(mat, maply(expand.grid(iperson=1:num.pers, ul=c(1,0)), d.constraint, .expand=FALSE))
dir <- c(dir, c(rep('<=', num.pers), rep('>=', num.pers)))
rhs <- c(rhs, rep(0, num.pers*2))
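
(Here, obj.utility and mat.utility.0 appear to be the item-by-person utility matrix and an all-zeros template of the same shape, the analogues of utilities and mat.0 in the setup sketch above.) The objective and variable-type vectors also need two more entries before re-solving; a minimal sketch, assuming \lambda = 1:

lambda <- 1
obj2   <- c(obj, -lambda, lambda)  # minimize the upper-bound variable, maximize the lower-bound one
types2 <- c(rep('B', num.items * num.pers), 'C', 'C')  # the two new variables are continuous
soln <- Rsymphony_solve_LP(obj2, mat, dir, rhs, types = types2, max = TRUE)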

The constraint inequalities then become the following:

> print(mat, digits=2)
     1     2     3     4    5     6    7    8     9   10   11    12   13  14    15    16     17   18  19 20
  1.00 0.000 0.000 0.000 0.00 0.000 1.00 0.00 0.000 0.00 0.00 0.000 1.00 0.0 0.000 0.000 0.0000 0.00  0  0
  0.00 1.000 0.000 0.000 0.00 0.000 0.00 1.00 0.000 0.00 0.00 0.000 0.00 1.0 0.000 0.000 0.0000 0.00  0  0
  0.00 0.000 1.000 0.000 0.00 0.000 0.00 0.00 1.000 0.00 0.00 0.000 0.00 0.0 1.000 0.000 0.0000 0.00  0  0
  0.00 0.000 0.000 1.000 0.00 0.000 0.00 0.00 0.000 1.00 0.00 0.000 0.00 0.0 0.000 1.000 0.0000 0.00  0  0
  0.00 0.000 0.000 0.000 1.00 0.000 0.00 0.00 0.000 0.00 1.00 0.000 0.00 0.0 0.000 0.000 1.0000 0.00  0  0
  0.00 0.000 0.000 0.000 0.00 1.000 0.00 0.00 0.000 0.00 0.00 1.000 0.00 0.0 0.000 0.000 0.0000 1.00  0  0
  1.00 1.000 0.000 0.000 0.00 0.000 1.00 1.00 0.000 0.00 0.00 0.000 1.00 1.0 0.000 0.000 0.0000 0.00  0  0
  0.47 0.030 0.019 0.043 0.41 0.029 0.00 0.00 0.000 0.00 0.00 0.000 0.00 0.0 0.000 0.000 0.0000 0.00 -1  0
  0.00 0.000 0.000 0.000 0.00 0.000 0.22 0.25 0.033 0.35 0.05 0.095 0.00 0.0 0.000 0.000 0.0000 0.00 -1  0
  0.00 0.000 0.000 0.000 0.00 0.000 0.00 0.00 0.000 0.00 0.00 0.000 0.22 0.5 0.030 0.016 0.0096 0.23 -1  0
  0.47 0.030 0.019 0.043 0.41 0.029 0.00 0.00 0.000 0.00 0.00 0.000 0.00 0.0 0.000 0.000 0.0000 0.00  0 -1
  0.00 0.000 0.000 0.000 0.00 0.000 0.22 0.25 0.033 0.35 0.05 0.095 0.00 0.0 0.000 0.000 0.0000 0.00  0 -1
  0.00 0.000 0.000 0.000 0.00 0.000 0.00 0.00 0.000 0.00 0.00 0.000 0.22 0.5 0.030 0.016 0.0096 0.23  0 -1
> dir
 [1] "<=" "<=" "<=" "<=" "<=" "<=" "<=" "<=" "<=" "<=" ">=" "<=" "<="
> rhs
 [1] 1 1 1 1 1 1 1 0 0 0 0 0 0

Looking at just the last row, this constraint says that the sum of the utilities of the items assigned to person #3, minus the lower-limit variable, must be at least 0. Together with the corresponding rows for the other two people, this forces the lower-limit variable to be no larger than any person’s total assigned utility, which is essentially the definition of the lower limit. Similar logic applies for the upper limit.

Running the solver with this set of inputs gives the following:

$solution
 [1] 0.000 0.000 1.000 0.000 1.000 0.000 0.000 0.000 0.000 1.000 0.000 1.000 0.000 1.000 0.000 0.000 0.000
[18] 0.000 0.498 0.433
 
$objval
[1] 1.31
 
$status
TM_OPTIMAL_SOLUTION_FOUND
                        0 
 
Person #1 got Items 3, 5 worth 0.43
Person #2 got Items 4, 6 worth 0.44
Person #3 got Items 2 worth 0.50

The last two numbers in the solution are the values of the upper and lower bounds. Note that the objective value is only 41% higher than a random assignment, but the utilities assigned to each person are much closer. Dropping the \lambda value to something closer to 0 makes the parity bounds matter less, and the solution tends back toward the initial result.

Scaling this up to include constraints in pricing, farm preferences, price vs. preference meta-preferences, etc., is not conceptually difficult, but would just entail careful programming. It is left as an exercise for the well-motivated reader!

If you’ve made it this far, I’d definitely appreciate any feedback about this idea, corrections to my formulation or code or terminology, etc!

(Thanks to Paul Ruben and others on OR-Exchange, who helped me figure out how to think about the parity problem, and to the authors of WP-codebox and WP LaTeX for giving me tools to put nice scrollable R code and math in this post!)

making meat shares more efficient

A personal interest of mine is the ethical and sustainable production of food. I’ve been a member of, and helped run, Community Supported Agriculture groups, and my wife and I currently purchase the majority of our meat from a group of upstate NY pastured-livestock producers who sell their products through CSAs. It’s an à la carte business model: I place an order on a website, and the next week I pick up the frozen products, cut and packaged as if for retail.

A related way to get meat has become fairly popular recently — the meat CSA or meat share. As the NYC Meatshare group describes it, “Looking for healthy meat raised on pasture by small local farms? It’s expensive, but by banding together to buy whole animals we can support farmers and save money.” Members of a meatshare all pitch in to buy a whole animal, which is then butchered and split among the members. Here’s how one meatshare event described the tenth of a hog each member would get: “Each person will get an equal amount of bacon and sausage (about 2 lbs each), chops (center & butt), and will divide the other cuts up as equally as possible (including ham steak, loin, organs, etc.)  If you have preferences please let me know, I will do my best to accommodate.  Or, you can swap with other members at my place.”

These two business models put a substantial burden on either the farmer (in the first case) or the consumer (in the second case). The retail model requires the farmer, or a collective of farmers, to put together a retail-ordering web site, a butchery and inventory system, and a delivery and distribution system. The meat share model takes these burdens off the farmers, but requires the consumers to set up and organize the purchase and payment system, meet at a common location, and either take what is available or perform ad hoc swaps. In a more traditional producer-consumer relationship, the supply chain, payment, inventory, and preference-matching processes are taken care of by the commodification of the animals (all cows are the same) and the services provided by a retail grocery store.

One could argue that there’s a third option — Whole Foods — but it sorta defeats the purpose of non-commodified, high-quality meat, and it tends to defeat the pocketbook too. No connection with the farm, just a promise of ethical standards (probably including the pointless “organic” label), and a substantial cut by middlemen. Not really an option at all.

So what else could be done to build sustainable relationships between animal producers and people who value high-quality, ethically produced meat? Why not leverage technology? And not just selling via web sites, but the kind of logistics technology that allows Whole Foods (and UPS, and Walmart) to efficiently get huge varieties of goods from place to place? A group at a recent food-tech hack-a-thon had the start of this idea. They put together a quick demo of a front-end web site (“groupme.at” — clever!) that would allow consumers to choose smaller sets of cuts in such a way that the orders neatly add up to a whole animal. By setting up a platform that can be easily connected with many small producers all over the country, the problem of every producer needing to be a webmaster is eliminated. And the system that makes all of the pieces add up to whole animals reduces risk for the farmer. It’s a great start. But by leveraging additional open-source tools and some ingenuity, I think it should be possible to do even more.

Imagine a similar web site, but instead of selecting a pre-selected package of cuts, you instead indicate your preferences and price range. As animals become available, you get an emailed notification of a delivery with a set of products that are very similar to the preferences you specified. You might love pork belly and boneless loin. Your neighbor might love cured pork belly (bacon) and chops. You might hate liver, but you’d accept some pig ears every once in a while for your dog. And your neighbor might really like the fatback to render for lard, while you’d find that useless. Everyone who might be sharing in an animal indicates their preferences, and the web site would automatically give everyone as much as possible of what they like the most. Equally important, all of the parts add up to whole animals, so the farmer is not stuck with the risk of unsold inventory.

Now imagine that after a few months, you’ve ranked the cuts of meat from Alice’s Farm 5 stars, but the ones from Bob’s Ranch only 3 stars. And you’ve told the system that you’re willing to pay more to get more of what you really want, but your neighbor tells the web site that he’s willing to make trade-offs to spend less money. You’ve essentially added other constraints that, if balanced well, will make everyone as happy as possible. Also, notice that I mentioned both boneless loin and pork chops? They’re more-or-less the same part of the animal, cut in different ways, so you can’t sell them both off the same half of the same animal. Now you have exclusive constraints to add into the mix. Maybe everyone’s better off if you get the boneless loin, or maybe everyone’s better off if your neighbor gets the chops. It’s easy to imagine collecting all of this information, but how do you combine it all and optimize the outcome in a utilitarian way?

Why, operations research and computational optimization! Write some software that plugs everyone’s constraints into a set of equations, push a button, let the computer think for a second or two, and wham, you get a solution that balances the constraints as fairly as possible! Send the cut list to the slaughterhouse and email the product lists and bills to the customers, and you’re basically done.

In the past, this sort of supply-chain optimization required massive computing power and complex software design. But now, there are open-source code bases for solving this sort of problem, at least at the scale needed to balance the preferences of a few farmers and a few dozen or hundred customers at a time.

This is the next step in leveraging technology to make at least some aspects of the supply chain for small-scale meat operations as efficient as Perdue’s, while maintaining the high quality and personal connection to the farm that many people want now. All that’s needed are some enterprising hackers to write the code and set up a scalable, configurable web platform for preference-based meatshares.

In my next post, I’ll demonstrate how to write code that uses one of those open-source optimization libraries to solve a small version of this problem. If you’re interested in reading R code, stay tuned!

On “Analytics” and related fields

I recently attended the INFORMS Conference on Business Analytics and Operations Research, aka “INFORMS Analytics 2011”, in Chicago. This deserves a little bit of an explanation. INFORMS is the professional organization for Operations Research (OR) and Management Science (MS), terms describing approaches to improving business efficiency through mathematical optimization and simulation tools. OR is perhaps best known for the technique of Linear Programming (read “Programming” as “Planning”), which is a method for optimizing a useful class of mathematical expressions under various constraints extremely efficiently. You can, for example, solve scheduling, assignment, transportation, factory layout, and similar problems with millions of variables in seconds. These techniques came out of large-scale government and especially military logistics and decision-making needs of the mid-20th century, and have now been applied extensively in many industries. Have you seen the UPS “We (heart) Logistics” ad? That’s OR.

OR is useful, but it’s not sexy, despite UPS’ best efforts. Interest in OR programs in universities (often specialties of Industrial Engineering departments) has been down in recent years, as has attendance at INFORMS conferences. On the other hand, if you ignore the part about “optimization” and just see OR as “improving business efficiency by use of mathematical processes,” this decline makes no sense at all! Hasn’t Analytics been a buzzword for the past few years? (“analytics buzzword” gets 2.4 million results on Google.) Haven’t there been bestselling business books about mathematical tools being used in all sorts of industries? (That last link is about baseball.) Hasn’t the use of statistical and mathematical techniques in business been called “sexy” by Google’s Chief Economist? How could a field and an industry that at some level seem to be the very definition of what’s cool in business and technology right now be seen as a relic of McNamara’s vision of the world?

To answer that rhetorical question, I think it’s worth considering the many ways that organizations can use data about their operations to improve their effectiveness. SAS has a really useful hierarchy, which it calls the Eight levels of analytics.

  1. Standard Reports – pre-processed, regular summaries of historical data
  2. Ad Hoc Reports – the ability for analysts to ask new questions and get new answers
  3. Query Drilldown – the ability for non-technical users to slice and dice data to see results interactively
  4. Alerts – systems that detect atypical conditions and notify people
  5. Statistical Analysis – use of regressions and similar to find trends and correlations in historical data
  6. Forecasting – ability to extrapolate from historical data to estimate future business
  7. Predictive Analytics – advanced forecasting, using statistical and machine-learning tools and large data sets
  8. Optimization – balance competing goals to maximize results

I like this hierarchy because it distinguishes among a bunch of different disciplines and technologies that tend to run together. For example, what’s often called “Business Intelligence” is a set of tools for doing items #1-#4. No statistics per se are involved, just the ability to provide useful summaries of data to the people who need them, in various ways. At its most statistically advanced, BI includes tools for data visualization that are informed by research, and at its most technologically advanced, BI includes sophisticated database and data management systems to keep everything running quickly and reliably. These are not small accomplishments, and this is a substantial and useful thing to be able to do.

But it’s not what “data scientists” in industry do, or at least, it’s not what makes them sexy and valuable. When you apply the tools of scientific inquiry, statistical analysis, and machine learning to data, you get the abilities in levels #5-#7. Real causality can be separated from random noise. Eclectic data sources, including unstructured documents, can be processed for valuable predictive features. Models can predict movie revenue or recommend movies you want to see or any number of other fascinating things. Great stuff. Not BI.

And not really OR either, unless you redefine OR. OR is definitely #8, the ability to build sophisticated mathematical models that can be used not just to predict the future, but to find a way to get to the future you want.

So why did I go to an INFORMS conference with the word Analytics in its title? This same conference used to be called “The INFORMS Conference on OR Practice”. Why the change? This has been the topic of constant conversation recently, among the leaders of the society as well as among the attendees of the conference. There are a number of possible answers, from jumping on a bandwagon, to trying to protect academic turf, to trying to let “data geeks” know that there’s a whole world of “advanced” analytics beyond “just” predictive modeling.

I think all of those are right, and justifiable, despite the pejorative slant. SAS’ hierarchy does define a useful progression of analytic skills. INFORMS recently hired consultants to help it figure out how to position itself, and identified a similar set of overlapping distinctions:

  • Descriptive Analytics — Analysis and reporting of patterns in historical data
  • Predictive Analytics — Predicts future trends, finds complex relationships in data
  • Prescriptive Analytics — Determines better procedures and strategies, balances constraints

They also have been using “Advanced Analytics” for the Predictive and Prescriptive categories.

I do like these definitions. But do I like the OR professional society trying to add Predictive Analytics to the scope of their domain, or at least of their Business-focused conference? I’m on the fence. It’s clearly valuable to link optimization to prediction, in business as well as other sorts of domains. (In fact, I have a recent Powerpoint slide that says “You can’t optimize what you can’t predict”!) And crosstalk among practitioners of these fields can be nothing but positive. I certainly have learned a lot about appropriate technologies from my membership in a variety of professional organizations.

But the whole scope of “analytics” is a lot of ground, and the underlying research and technology spans several very different fields. I’d be surprised if there were more than a dozen people at INFORMS with substantial expertise in text mining, for example. There almost needs to be a new business-focused advanced analytics conference, sponsored jointly by the professional societies of the machine learning, statistics, and OR fields, covering everything that businesses large and small do with data that is more mathematically sophisticated (though not necessarily more useful) than the material covered by the many business intelligence conferences and trade shows. Would that address the problem of advanced analytics better than trying to expand the definition of OR?