In part 1 and part 1b of this series, we reviewed some of the ways in which disaster modelers estimate the probability of occurrence of a catastrophic event. The next phase is estimating the expected dollar losses when the catastrophe does take place. What would be the impacts on economic activity within the region, and how widespread would those impacts be?
This is where the 'sexiness' of model building meets the harsh realities of extensive ground work and data gathering. When a disaster occurs, the biggest disruptions are usually to life and property. There are then additional longer-term impacts on the economic activity of the region, driven directly by the damage to life and property and indirectly by the disruption to business continuity and, ultimately, by the confidence that consumers and tradespeople alike retain about doing business in the region. Let's examine this one piece at a time.
The disruption to property can be examined through the number of dwellings or business properties that were built specifically to resist the type of disaster event we are talking about. In the case of fires, it is the number of properties built under the right building and safety codes. This information requires some gathering but is publicly available from the property divisions of most counties. In the case of hurricanes, it can be the number of houses constructed after a certain year, when stricter building codes started to be enforced. This type of data gathering is extremely effort intensive, but it is often the difference between a good approximate model and a really accurate model that can be used for insurance pricing decisions. In a competitive market like insurance, where many companies compete essentially on price, the ability to build accurate models is a powerful edge.
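To make that concrete, here is a minimal sketch in Python of how such property data can feed a loss estimate. The damage ratios, the code-enforcement year and the sample properties are all hypothetical placeholders, not figures from any actual vulnerability study; a production model would use engineering-based vulnerability curves for each construction class.

```python
# A toy expected-loss calculation driven by building-code vintage.
# All numbers below are illustrative assumptions, not calibrated values.

STRICTER_CODE_YEAR = 1995          # hypothetical year stricter codes took effect
DAMAGE_RATIO_OLD_CODE = 0.40       # assumed mean damage fraction, pre-code homes
DAMAGE_RATIO_NEW_CODE = 0.15       # assumed mean damage fraction, post-code homes

properties = [
    # (year built, replacement value in dollars) -- placeholder data
    (1978, 220_000),
    (1989, 310_000),
    (2001, 450_000),
    (2007, 380_000),
]

def expected_loss(props, event_probability):
    """Expected annual loss = P(event) * sum over properties of value * damage ratio."""
    total = 0.0
    for year_built, value in props:
        ratio = (DAMAGE_RATIO_NEW_CODE if year_built >= STRICTER_CODE_YEAR
                 else DAMAGE_RATIO_OLD_CODE)
        total += value * ratio
    return event_probability * total

print(f"Expected annual loss: ${expected_loss(properties, 0.02):,.0f}")
```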
The damage to life often has a very direct correlation to the amount of property damage. Also, with early warning systems in place ahead of disasters (except earthquakes, I suppose), it has become quite common for even really large disasters like hurricanes not to result in any major loss of life. A significant exception was Hurricane Katrina, where more than a thousand people lost their lives in the Gulf Coast area, and particularly in New Orleans.
In the next article in the series, I will provide an overview of the reinsurance market, which is where a lot of this probabilistic modeling ultimately gets applied.
Saturday, July 17, 2010
Interesting links from Jul 17, 2010
1. The over-stated role of banking in the larger economy (Link here)
2. A very interesting article on the original monetary expansionist, John Law (Link here)
Tuesday, July 13, 2010
Disaster estimations - Part 1b/3: Understanding the probability of disaster
Part 1 of my post on modeling catastrophic risk covered measuring the probability that a risk event can occur. This probability can be derived from empirical evidence as well as from computer models of the destructive forces of nature that underlie the risk. A good example of how such a model is built and used is a paper by Karen Clark, a renowned catastrophic risk modeler and insurer. The paper was a seminal one when it came out, as it outlined a scientific method by which such risks could be estimated. It is titled "A formal approach to catastrophe risk assessment and management" and the link is here.
The paper outlines an approach to estimating losses from hurricanes impacting the US Gulf Coast and the East Coast. The model has a probability assessment for hurricanes making landfall, developed using historical information (going back to about 1910) from the US Weather Service. While this is a great starting point and helps us get to a good estimate of at least a range of losses one can expect, and therefore of the insurance premiums one should charge, there are important places where the model can be improved. One example is the cyclical nature of hurricane intensity over the last 100 years. Between 1950 and 1994, Atlantic hurricanes ran through a benign cycle. Hurricane activity and intensity (as measured by the number of named storms and the number of major hurricanes, respectively) have shown an increase since 1994, though. So a model relying only on activity from the 1950-1994 period is likely to be off in its loss estimates by more than 20%. See the table for what I am talking about.
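Here is a minimal sketch of the kind of empirical frequency estimate such a model starts from. The landfall counts below are hypothetical placeholders standing in for the historical record, not the actual US Weather Service figures; the point is only how strongly the chosen baseline period drives the estimated annual landfall probability.

```python
import math

# Hypothetical counts of major-hurricane landfalls -- placeholders, not the
# actual historical record referenced in the post.
benign_period = {"years": 45, "landfalls": 27}   # stand-in for 1950-1994
active_period = {"years": 15, "landfalls": 14}   # stand-in for 1995-2009

def annual_rate(period):
    """Empirical rate: landfalls per year over the baseline period."""
    return period["landfalls"] / period["years"]

def prob_at_least_one(rate):
    """Probability of at least one landfall in a season, assuming Poisson arrivals."""
    return 1.0 - math.exp(-rate)

for name, period in [("benign baseline", benign_period), ("active baseline", active_period)]:
    rate = annual_rate(period)
    print(f"{name}: rate={rate:.2f}/yr, P(>=1 landfall)={prob_at_least_one(rate):.2f}")
```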
How can a modeler correct for such errors in estimates? One way is to use the latest scientific understanding and modeling techniques in estimating the probabilities. Developments in the scientific understanding of phenomena such as hurricanes mean that it is now possible to build computer models that replicate the physics behind the hurricanes. These dynamic physical models incorporate some of the more recent understanding of world climatology, such as the link between sea surface temperatures (SSTs) and hurricane intensity. Using some of these models, researchers have been able to replicate the increase in hurricane intensity seen in the last fifteen years in a way that the empirical models built prior to this period have not been able to. The popular science book about global warming, Storm World by Chris Mooney, spells out these two different approaches to hurricane intensity estimation, and the conflicts between the chief protagonists of each approach. Based on the recent evidence at least, the more physics-based approach certainly appears to be tracking closer to the rapid changes in hurricane intensity. William Gray of Colorado State University, whose annual hurricane forecast has been lucky for many years, has been forced to re-fit his empirical model for the rapid increase in hurricane intensity post-1995.
Finally, I leave you with another note about how some of the dynamic physical models work. This is from one of my favourite blogs which is Jeff Masters' tropical weather blog. The latest entry talks precisely about such a dynamic physical model built by the UK Met Office. And I quote:
it is based on a promising new method--running a dynamical computer model of the global atmosphere-ocean system. The CSU forecast from Phil Klotzbach is based on statistical patterns of hurricane activity observed from past years. These statistical techniques do not work very well when the atmosphere behaves in ways it has not behaved in the past. The UK Met Office forecast avoids this problem by using a global computer forecast model--the GloSea model (short for GLObal SEAsonal model). GloSea is based on the HadGEM3 model--one of the leading climate models used to formulate the influential UN Intergovernmental Panel on Climate Change (IPCC) report. GloSea subdivides the atmosphere into a 3-dimensional grid 0.86° in longitude, 0.56° in latitude (about 62 km), and up to 85 levels in the vertical. This atmospheric model is coupled to an ocean model of even higher resolution. The initial state of the atmosphere and ocean as of June 1, 2010 were fed into the model, and the mathematical equations governing the motions of the atmosphere and ocean were solved at each grid point every few minutes, progressing out in time until the end of November (yes, this takes a colossal amount of computer power!) It's well-known that slight errors in specifying the initial state of the atmosphere can cause large errors in the forecast. This "sensitivity to initial conditions" is taken into account by making many model runs, each with a slight variation in the starting conditions which reflect the uncertainty in the initial state. This generates an "ensemble" of forecasts and the final forecast is created by analyzing all the member forecasts of this ensemble. Forty-two ensemble members were generated for this year's UK Met Office forecast. The researchers counted how many tropical storms formed during the six months the model ran to arrive at their forecast of twenty named storms for the remainder of this hurricane season. Of course, the exact timing and location of these twenty storms are bound to differ from what the model predicts, since one cannot make accurate forecasts of this nature so far in advance.
The grid used by GloSea is fine enough to see hurricanes form, but is too coarse to properly handle important features of these storms. This lack of resolution results in the model not generating the right number of storms. This discrepancy is corrected by looking back at the years 1989-2002 and coming up with correction factors (i.e., "fudge" factors) that give a reasonable forecast.
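The ensemble-plus-correction idea is simple enough to sketch. In the toy Python below, the per-member storm counts and the hindcast correction factor are invented placeholders (the actual GloSea numbers are not given in the post); the sketch only shows the mechanics of averaging ensemble members and scaling by a factor derived from past under-counting.

```python
# Toy ensemble post-processing: average the members, then apply a bias
# correction derived from hindcasts. All numbers are illustrative placeholders.

member_storm_counts = [14, 16, 13, 17, 15, 12, 18, 16]   # hypothetical ensemble members

# Suppose hindcasts over 1989-2002 produced, on average, only 80% of the
# observed storm count -- the "fudge factor" is then 1 / 0.8.
HINDCAST_UNDERCOUNT = 0.8
correction_factor = 1.0 / HINDCAST_UNDERCOUNT

raw_ensemble_mean = sum(member_storm_counts) / len(member_storm_counts)
corrected_forecast = raw_ensemble_mean * correction_factor

print(f"Raw ensemble mean:  {raw_ensemble_mean:.1f} storms")
print(f"Corrected forecast: {corrected_forecast:.1f} storms")
```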
If you go to the web page of the UK Met Office hurricane forecast, you can find a link of interest to reinsurance companies. This link is to buy the hurricane forecast, which the UK Met Office has obviously gone to great pains to develop. Their brochure on how the insurance industry could benefit from this research makes for very interesting reading as well.

Tuesday, June 15, 2010
The BP oil spill and the disaster estimations - Part 1/3
The BP oil spill is already the biggest oil spill in US history and is on its way to becoming an unprecedented industrial disaster, given the environmental impact of millions of barrels of oil gushing into the Gulf of Mexico. Even the most hardened of carbon lovers cannot but be moved at the sight of the fragile wildlife of the Gulf literally soaking in the oil. The ecosystem of the Gulf states, already ravaged by unrestrained development and the odd super-cyclone, is now being dealt a death blow by the spewing gusher.
Could the specific chain of events leading up to this spill have been predicted? The answer is no. But that doesn't mean that the outcome could not have been anticipated. Given the technological complexity that deep-sea oil drilling operations typically involve, there was always a measurable probability that one of the intermeshing systems and processes would give way and result in an oil well that was out of control. As Donald Rumsfeld, Secretary of Defense in the Bush II administration, put it, stuff happens. But where human science and industrial technology have failed abjectly is in underestimating the impact of this kind of event on a habitat and in overestimating the power of technology to fix these kinds of problems.
Fundamentally, the science of estimating the impact of disasters can be broken down into three estimations:
one, the probability that a failure occurs
two, the damage expected as a result of the failure
three, (which is probably a function of the second) our capability to fix the failure or mitigate its impact
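Put together, these three pieces multiply into an expected-loss figure. The sketch below is a deliberately naive Python illustration of that decomposition, with made-up inputs; real models replace each scalar with a full probability distribution.

```python
def expected_loss(p_failure, damage_given_failure, mitigation_effectiveness):
    """Naive decomposition: expected loss =
       P(failure) * damage if it happens * share of damage NOT mitigated."""
    return p_failure * damage_given_failure * (1.0 - mitigation_effectiveness)

# Hypothetical inputs: a 1-in-500 annual failure chance, $20B of gross damage,
# and mitigation/response that claws back 30% of it.
loss = expected_loss(p_failure=1 / 500,
                     damage_given_failure=20e9,
                     mitigation_effectiveness=0.30)
print(f"Expected annual loss: ${loss:,.0f}")
```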
In this post, I will discuss the first part of the problem - estimating the probability of failures occurring.
There is a thriving industry and a branch of mathematics, known as disaster science, devoted to the estimation of these extremely low probability events. The techniques that disaster scientists and statisticians use are based on an understanding of the specific industry (nuclear reactors, oil drilling, aerospace, rocket launches, etc.) and are constantly refreshed with our increasing understanding of the physics, or the science in general, underlying these endeavours. The nuclear-power industry's approach analyzes the engineering of the plant and tabulates every possible series of unfortunate events that could lead to the release of dangerous radioactive material, including equipment failure, operator error and extreme weather. Statisticians tabulate the probability of each disastrous scenario and add them together. Other industries, such as aviation, use more probability-based models, given the hundreds of thousands of data points available on a weekly basis. Then there are more purely probabilistic approaches, such as tail probability estimation or extreme event estimation, which use math involving heavy-tailed distributions to estimate the probability of such events occurring. Michael Lewis, in his inimitable style, talked about this in an old New York Times article called In Nature's Casino.
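To give a flavour of the heavy-tailed approach, the sketch below uses SciPy's generalized Pareto distribution to ask how likely a loss is to exceed an extreme threshold. The shape and scale parameters are assumptions picked for illustration, not values fitted to any real loss data; in practice they would be estimated from losses above a chosen high threshold (the peaks-over-threshold method).

```python
from scipy.stats import genpareto

# Assumed (not fitted) generalized Pareto parameters for losses above a
# $1B threshold, in billions of dollars. A positive shape parameter
# corresponds to a heavy, power-law-like tail.
SHAPE = 0.5        # hypothetical tail-heaviness parameter
SCALE = 2.0        # hypothetical scale, in $B
THRESHOLD = 1.0    # losses are modelled as excesses over this level, in $B

def prob_exceeds_given_large(x_billion):
    """P(loss > x | loss > threshold) under the assumed tail model.
    Multiply by the annual frequency of threshold exceedances to get an
    unconditional annual exceedance rate."""
    excess = x_billion - THRESHOLD
    return genpareto.sf(excess, SHAPE, loc=0.0, scale=SCALE)

for x in (5, 20, 50):
    print(f"P(loss > ${x}B | loss > $1B) ~= {prob_exceeds_given_large(x):.4f}")
```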
One variable that is a factor, and often the contributing factor, in many such disasters is human error. Human error is extraordinarily difficult to model just from past behaviour, because there are a number of factors that can confound such a read. For instance, as humans encounter fewer failures, our nature is to become less vigilant and therefore we become more likely to fail. Both lack of experience and too much experience (especially without having encountered failures) are risky. The quality of the human agent is another variable with wide variability. At one time, NASA had the brightest engineers and scientists from our best universities joining it. Now the brightest and the best go to Wall Street or other private firms, and it is often the rejects or the products of second-rung universities that make it to NASA. This variable of human quality is difficult to quantify, or sometimes difficult to measure in a way that does not offend people on grounds like race, national origin, age and gender. Let us suppose that the brightest and the best who joined NASA previously came from colleges or universities where admission standards required higher scores on standardized tests. We know that standardized test scores are correlated with the socio-economic levels of the test takers and hence with variables such as income, race, etc. So if NASA now draws from lower-rung colleges, does that mean it was more exclusive and discriminatory before (by taking in people with higher average scores) and is more inclusive now? And can we conclude that the drop in quality is a direct function of becoming more inclusive on the admission criteria front? It is never easy to answer these questions, or even to tackle them without feeling queasy about what one is likely to find along the way.
Another variable, again related to the human factor, is the way we interact with technology. Is the human agent at ease with the technology confronting him, or does he feel pressured and unsure from a decision-making standpoint? I have driven stick-shift cars before, and I was more comfortable and at ease with the decision making around gear changes when the car-human interface was relatively simple and spartan. In my most recent car, as I interact with multiple technology features such as the nav system, the Bluetooth-enabled radio, the steering wheel, the paddle shifter and the engine revs indicator, I find my attention diluted, and the decision making around gear changes is not as precise as it used to be.
Thursday, June 3, 2010
On Knightian Uncertainty
An interesting post appeared recently attempting to distinguish between risk and uncertainty, a distinction originally proposed by the economist Frank Knight. Knight's theory is that risk is something where the outcome is unknown but whose odds can be estimated; when the odds become inestimable, risk turns into uncertainty. In other words, risk can be measured and uncertainty cannot.
There are economists who argue that Knight's distinction only applies in theory. In the world of the casino, where the probability of a 21 turning up or of the roulette ball landing on a certain number can be estimated, it is possible to have risk. But anything outside simple games of probability becomes uncertainty, because it is so difficult to measure. The real world out there is so complex that it is difficult to make even reasonably short-term projections, let alone really long-term ones. So what is really the truth here? Does risk (as defined by Knight) even exist in the world today? Or, as recent world events (be it 9/11, the Great Recession, the threatened collapse of Greece, the oil spill in the Gulf of Mexico, or the unpronounceable Icelandic volcano) have revealed, is it a mirage to try to estimate the probability of something playing out with anything remotely close to the odds we initially estimate?
I have a couple of reactions. First, my view is that risk can be measured and outcomes predicted more or less accurately under some conditions in the real world. When forces are more or less in equilibrium, it is possible to have some semblance of predictability about political and economic events. And therefore an ability to measure the probability of outcomes happening. When forces disrupt that equilibrium and the disruptions may be caused by the most improbable and unexpected causes, then all bets are off. Everything we have learnt from the time when Knightian risk applied is no longer true and Knightian uncertainty takes over.
Second, this points to the need for risk management philosophy (as it is applied in a business context) to consider not only the risks that the system knows about and can observe, but also the risks that the system doesn't even know exist out there. That's where good management practices - constantly reviewing positions, eliminating extreme concentrations (even if they appear to be value-creating concentrations), constantly questioning one's own thinking - can lead to a set of guardrails that a business can stay within. Now, these guardrails may be frowned upon and may even invite derision from those interested in growing the business during good times, as the nature of these guardrails is always to try and avoid too much of a good thing. However, it is important for practitioners of risk management to stay firm in their convictions and make sure the appropriate guardrails are implemented.
Tuesday, May 4, 2010
Interesting data mining links
1. The NY Times recently had a piece on how data is increasingly part of our life. Link here.
2. The Web Coupon - a new way for retailers to know more about you. Link here.
3. On Principal Components Analysis. Link here.
Saturday, May 1, 2010
The future of publishing - and a new business model
The demise of an ages-old business model and the emergence of a new one to take its place is always an exciting thing to watch - unless you are part of the age-old business model on its way to its demise. There are old assumptions challenged, changes in the way consumers consume, the emergence of a technology trigger, new financing patterns, new winners and losers. Fascinating to someone looking-in from the outside.
An industry that has pretty much been under attack since the coming of the Internet is the print and publishing business. But what threatened to be the slow roll of a snowball (print obviously to be replaced by new ways of consuming and disseminating information) has taken the form of a rapidly growing avalanche now that digitized books and the digital book reader (the Kindle, predominantly) have become mainstream. As is to be expected, there are powerful players working to pull the rug from under the feet of the big publishing and media companies. First came Google, wanting to digitize every book ever published. Amazon then came with the Kindle, which cut printing costs out of the value chain and made books much more affordable for end consumers. Of course, the elimination of the printing, warehousing and physical distribution process would mean massive job cuts in the big publishing and printing houses, not to mention a necessary shrinking of the margins retained by the publisher from the printing price of the book.
An interesting article in the New Yorker talks about the demise of publishing at the hands of the digital giants in more detail (link here). Amazon, Apple and Google are the big digital players jockeying for position in this market. A few years back, Microsoft would have been a contender as well, but repeated failures to crack the consumer space (where MS does not have a monopolist advantage) have resulted in a little more circumspection.