<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-6088933028650286437</id><updated>2012-02-01T06:13:18.523-05:00</updated><category term='Simulations'/><category term='Causality'/><category term='Computing Infrastructure'/><category term='Extreme Values'/><category term='Research'/><category term='Risk Management'/><category term='Technology'/><category term='Disaster Estimation'/><category term='Statistics'/><category term='Systems Modeling'/><category term='Economics'/><category term='Probability'/><category term='Predictive Analytics'/><category term='Knowledge Workers'/><category term='Leverage'/><category term='Black Swans'/><category term='Correlation'/><category term='Reinsurance'/><category term='Loyalty Programs'/><category term='Testing'/><category term='Insurance'/><category term='Computing'/><category term='Recession'/><category term='regression'/><category term='Statistical'/><category term='Text Mining'/><category term='Great recession'/><category term='Markets'/><category term='Data Centers'/><category term='Prioritization'/><category term='Estimation'/><category term='flu'/><category term='Privacy'/><category term='Risk'/><category term='swine flu'/><category term='Outsourcing'/><category term='artificial intelligence'/><category term='bias'/><category term='Data Handling'/><category term='Sensitivity Analysis'/><category term='Stress Testing'/><category term='statistical inference'/><category term='Problem Solving'/><category term='cdc'/><category term='Financial Regulation'/><category term='Principal Component Analysis'/><category term='Physics'/><category term='Macroeconomics'/><category term='Data 
Mining'/><category term='Uncertainty'/><category term='Occam&apos;s Razor'/><category term='Modeling'/><category term='Machine learning'/><category term='Data visualization'/><category term='Big Data'/><category term='credit downturn'/><category term='Investing'/><category term='Chemical Industry'/><category term='Simpsons Paradox'/><category term='Decision Making'/><category term='Retailing'/><category term='Too Big To Fail'/><category term='Law of Large Numbers'/><category term='Exploratory Data Analysis'/><category term='Design of Experiments'/><category term='Wall Street'/><category term='Fat-tailed probabilities'/><category term='Things to read series'/><category term='Model Validation'/><category term='Budget Deficit'/><category term='Optimization'/><title type='text'>Statistics Exchange</title><subtitle type='html'>A place where mathematical models, statistical inference, the science of decision making come together to make the future a better place</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' 
uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>79</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-3420812384967417745</id><published>2012-01-11T23:08:00.000-05:00</published><updated>2012-01-11T23:08:13.135-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Testing'/><category scheme='http://www.blogger.com/atom/ns#' term='Design of Experiments'/><category scheme='http://www.blogger.com/atom/ns#' term='Causality'/><category scheme='http://www.blogger.com/atom/ns#' term='Predictive Analytics'/><title type='text'>Wooga, gaming and the power of randomized testing - part 1/3</title><content type='html'>&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;I read this really impressive piece in the &lt;a href="http://www.wired.co.uk/magazine/archive/2012/01/features/test-test-test?page=all"&gt;&lt;u&gt;Wired UK magazine&lt;/u&gt;&lt;/a&gt; that highlighted the need for systematic randomized testing. The piece is about &lt;a href="http://www.wooga.com/"&gt;&lt;u&gt;wooga&lt;/u&gt;&lt;/a&gt;, the social gaming company which uses randomized testing to develop its game ideas and concepts. Wooga is a competitor of Zynga and the conventional video gaming giant, EA.&lt;br /&gt;&lt;br /&gt;The randomized testing idea is core to the idea of predictive analytics. In the absence of randomized testing, a couple of issues could arise. The first is that the object being measured could be incorrectly estimated. The second is incorrect attribution of the effect that is being studied to causal factors. The incorrect attribution arises from the fact that non-randomized data is typically biased or “shaped” by specific conditions or factors present when the data was generated. 
So, for example, if one were to take a week of shopping information to study a marketing promotion in the retail industry, and the week that was picked at random happened to be a few days before Thanksgiving, one might be misled into thinking that (a) turkey sales make up a high percentage of the overall shopping basket in general and (b) the skew in the shopping basket distribution was caused by the promotion.&lt;br /&gt;&lt;br /&gt;A number of analytically advanced companies have embraced this idea of randomized testing. One of the early pioneers in the financial services area was Capital One (check out this really &lt;a href="http://hbr.org/2009/02/how-to-design-smart-business-experiments/ar/1"&gt;&lt;u&gt;neat link&lt;/u&gt;&lt;/a&gt; from the HBR), but the idea has since been adopted by practically every financial services company. Financial services is, of course, fertile ground for randomized testing because the large number of transactions creates a deep pool of data that lends itself to really powerful statistical modeling. In a couple of subsequent posts, I will talk about some best practices in this area and also some pitfalls to watch out for.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-3420812384967417745?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/3420812384967417745/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=3420812384967417745' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/3420812384967417745'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/3420812384967417745'/><link 
rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2012/01/wooga-gaming-and-power-of-randomized.html' title='Wooga, gaming and the power of randomized testing - part 1/3'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-5775798658116907007</id><published>2012-01-06T00:21:00.000-05:00</published><updated>2012-01-06T00:21:53.975-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Great recession'/><category scheme='http://www.blogger.com/atom/ns#' term='artificial intelligence'/><category scheme='http://www.blogger.com/atom/ns#' term='Computing'/><title type='text'>Some interesting links</title><content type='html'>&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;Prof. Raghu Rajan from the University of Chicago has written this interesting piece on the &lt;a href="http://forums.chicagobooth.edu/n/blogs/blog.aspx?nav=main&amp;amp;webtag=faultlines&amp;amp;entry=45"&gt;&lt;u&gt;Fault Lines&lt;/u&gt;&lt;/a&gt; blog about the build-up that finally ended in the financial crisis. He explains lucidly why Keynesian spending might not work as well this time around.&lt;br /&gt;&lt;br /&gt;The other interesting write-up is about secular shifts in the unemployment picture in developed countries and some of the underlying causes. 
From the &lt;a href="http://www.technologyreview.com/computing/39319/?p1=MstRcnt"&gt;&lt;u&gt;MIT Technology Review.&lt;/u&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Excellent food for thought along with this piece from Joseph Stiglitz that I summarized &lt;a href="http://stat-exchange.blogspot.com/2011/12/great-recession-new-theory-linked-to.html"&gt;&lt;u&gt;here&lt;/u&gt;&lt;/a&gt;.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-5775798658116907007?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/5775798658116907007/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=5775798658116907007' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/5775798658116907007'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/5775798658116907007'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2012/01/some-interesting-links.html' title='Some interesting links'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-526599775266391582</id><published>2012-01-05T15:59:00.001-05:00</published><updated>2012-01-05T22:28:26.511-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Big Data'/><category scheme='http://www.blogger.com/atom/ns#' term='Predictive Analytics'/><title 
type='text'>Big analytics or big bubble?</title><content type='html'>&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;As a practitioner of predictive analytics over the last 8-10 years, I find it fascinating to sit back and observe the way the field is gathering attention and importance. Most recently, NPR did a piece on analytics. The NPR piece was about Gary Loveman, who is now the CEO of Caesars Entertainment - it is a podcast and the link is &lt;a href="http://www.npr.org/blogs/money/2011/11/15/142366953/the-tuesday-podcast-from-harvard-economist-to-casino-ceo?ps=rs"&gt;&lt;u&gt;here&lt;/u&gt;&lt;/a&gt;. I loved this line from the story:&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;i&gt;There are three things that can get you fired from Caesars: Stealing, sexual harassment and running an experiment without a control group.&lt;/i&gt;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;NPR followed up with a couple of other pieces - one was an opener on &lt;a href="http://www.npr.org/2011/11/29/142521910/the-digital-breadcrumbs-that-lead-to-big-data"&gt;&lt;u&gt;Big Data&lt;/u&gt;&lt;/a&gt; and the other was about the &lt;a href="http://www.npr.org/2011/11/30/142893065/the-search-for-analysts-to-make-sense-of-big-data"&gt;&lt;u&gt;search for analytic talent&lt;/u&gt;&lt;/a&gt; that can make sense of this Big Data. What made the second article somewhat quirky was that it profiled DJ Patil, a mathematician who searches for, tracks and ultimately recruits these data geniuses using - you guessed it, data!&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;So did the &lt;a href="http://online.wsj.com/article/SB10001424052970203462304577138961342097348.html"&gt;&lt;u&gt;WSJ&lt;/u&gt;&lt;/a&gt;, just yesterday. 
The WSJ talked about two analytics consulting firms, Mu Sigma and Opera Solutions, getting substantial amounts of venture funding (think large 8-digit numbers) to expand in this space.&lt;br /&gt;&lt;br /&gt;This is the point where I start to get really worried about big data and the b-word. Is this big data, social media, mega-analytics wave a rapidly building bubble that is bound to pop in some way and leave more than a few people disappointed? I have no doubt in my mind that the idea is powerful and transformational - the idea being that predictive analytics sitting on top of data can make better business decisions, drive better customer insights and resource efficiencies and improve life for us in a holistic kind of way.&lt;br /&gt;&lt;br /&gt;But when big media gets on the bandwagon and the subject goes from being talked about in technical journals to being talked about by the WSJ and NPR, I begin to smell some frothiness. &lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-526599775266391582?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/526599775266391582/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=526599775266391582' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/526599775266391582'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/526599775266391582'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2012/01/big-analytics-or-big-bubble.html' title='Big analytics or big bubble?'/><author><name>Krish 
Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-7608037075849717533</id><published>2011-12-28T23:46:00.001-05:00</published><updated>2011-12-28T23:46:21.403-05:00</updated><title type='text'>Car Insurance savings and too-clever marketing</title><content type='html'>&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;A quick rant post.&lt;br /&gt;&lt;br /&gt;I have been reflecting a bit on GEICO, Progressive and others claiming that you can save a lot of money (15%, so many dollars) by switching to that company. This is a highly deceptive form of advertising, and here's why.&lt;br /&gt;&lt;br /&gt;First, to start off, the marketing message taken at face value seems to imply causation - switch to company X and you will save money. &lt;i&gt;&lt;u&gt;In reality, the sequence of events is the opposite.&lt;/u&gt;&lt;/i&gt; People typically shop for a quote, and when they find that the quote saves them money over what they currently have, they switch over. So it is likely that for every person who switches, there are one or more people who don't switch because they don't save any money or they save too little for it to be worth the hassle. So to say "switch and you will save money" is somewhat disingenuous. Only some people save money with Company X and they are the ones that switch.&lt;br /&gt;&lt;br /&gt;The second part of the deception comes in the dollar amount of the savings. The way this information is gathered is typically by surveying customers that have switched. Why is this deceptive? Well, because a number of studies by behavioral economists have shown that we human beings tend to rationalize. We tend to give ourselves more credit than necessary or justifiable in general. 
This manifests itself in a number of ways, such as most people thinking they are above-average drivers, people over-estimating the investment returns they make and so on. So when a customer has made what the customer thinks is an extremely smart decision to switch, they are likely to also over-estimate the savings that they have realized, as they are proud of the switch decision they just made. And so it is very likely that the savings number is inflated to some extent. &lt;br /&gt;&lt;br /&gt;So "save x% by switching to GEICO" is actually a smart ploy to get people to ask for a GEICO quote. It doesn't hurt at all to get one, in an extremely crowded market-place. But promising savings in the language that these companies use doesn't seem very above board.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-7608037075849717533?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/7608037075849717533/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=7608037075849717533' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/7608037075849717533'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/7608037075849717533'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2011/12/car-insurance-savings-and-too-clever.html' title='Car Insurance savings and too-clever marketing'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-1875693680485621562</id><published>2011-12-12T11:14:00.002-05:00</published><updated>2011-12-12T11:24:59.195-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Great recession'/><category scheme='http://www.blogger.com/atom/ns#' term='Macroeconomics'/><title type='text'>Great Recession - A new theory linked to productivity improvement</title><content type='html'>&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;I &lt;a href="http://stat-exchange.blogspot.com/2009/06/great-escape-or-great-deception.html"&gt;wrote&lt;/a&gt; a couple of years back on what has come to be known as the Great Recession of the twenty-first century. I remarked that the recession appeared to show no signs of abating, and recent events seem to have borne that out. While GDP growth in the US is in positive territory, it is only barely so. And the problems in Europe and a couple of natural disasters affecting Asia (the earthquake in Japan and the flooding in Thailand) have put the brakes on the emerging markets engine that was pulling the world economy along for the last 4 years.&lt;br /&gt;&lt;br /&gt;In the meantime, a number of well-argued articles and books have been written about the genesis of the crisis, and they have largely focused on the financial sector, the US mortgage market and the excesses there. The Nobel Prize winning economist, Joseph Stiglitz, &lt;a href="http://www.vanityfair.com/politics/2012/01/stiglitz-depression-201201"&gt;approaches this issue&lt;/a&gt; from a slightly different angle in a recent write-up in Vanity Fair. Stiglitz argues that the Great Recession has its roots in something more benign than mortgages gone toxic. 
It lies in the productivity increases of the last two decades, which caused a large number of job categories employing very large portions of the labor force to basically become redundant in the economy. What is interesting about this theory is that (Stiglitz argues) this is exactly what happened leading up to the Great Depression. The productivity improvements now are in the areas of manufacturing and services, and the productivity improvement then was in agriculture. To quote, &lt;i&gt;In 1900, it took a large portion of the U.S. population to produce enough food for the country as a whole. Then came a revolution in agriculture that would gain pace throughout the century—better seeds, better fertilizer, better farming practices, along with widespread mechanization. Today, 2 percent of Americans produce more food than we can consume.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;It is an extremely interesting article and a forcefully made argument on the cause of the crisis and what could be done to solve it.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-1875693680485621562?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/1875693680485621562/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=1875693680485621562' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/1875693680485621562'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/1875693680485621562'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2011/12/great-recession-new-theory-linked-to.html' title='Great Recession - A new theory linked to 
productivity improvement'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-17105839608945913</id><published>2011-12-10T05:27:00.001-05:00</published><updated>2011-12-10T07:50:13.233-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Machine learning'/><category scheme='http://www.blogger.com/atom/ns#' term='Big Data'/><category scheme='http://www.blogger.com/atom/ns#' term='artificial intelligence'/><category scheme='http://www.blogger.com/atom/ns#' term='Predictive Analytics'/><title type='text'>Computing based on the human brain - the answer to Big Data?</title><content type='html'>&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;A slight detour from my usual subjects around predictive analytics. I came across &lt;a href="http://www.nytimes.com/2011/12/06/science/creating-artificial-intelligence-based-on-the-real-thing.html?"&gt;this recent article&lt;/a&gt; that is prescient of the direction of modeling and predictive analytics in general. And that is the move away from the current model of computer design, based on the famous von Neumann architecture to something that is much more similar to the thing computing and modeling and decision making are ultimately designed to emulate, viz. 
the human brain.&lt;br /&gt;&lt;br /&gt;&lt;table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: right; margin-left: 1em; text-align: right;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-7AxOgkis5X4/TuM2vad3X-I/AAAAAAAACtU/a8vUemL2bmY/s1600/IBM-Watson.jpeg" imageanchor="1" style="clear: right; margin-bottom: 1em; margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" height="233" src="http://2.bp.blogspot.com/-7AxOgkis5X4/TuM2vad3X-I/AAAAAAAACtU/a8vUemL2bmY/s320/IBM-Watson.jpeg" width="320" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;IBM Watson - Super-computer or energy hog?&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;First some background. Computer architecture has consistently followed the classic &lt;a href="http://en.wikipedia.org/wiki/Von_Neumann_architecture"&gt;von Neumann architecture&lt;/a&gt;. Without getting into too many details, what the architecture boils down to is a separate processing unit (known variously as CPU, ALU, microprocessor) and a separate memory unit, both connected by a communication channel called a bus. This architecture has served computing well over the past 50 years, and has now brought the computer within reach of every single human being on Earth. The fact that 2-year-old toddlers are extremely adept with the Apple iPad is testimony to the success of the von Neumann model. After all, nothing succeeds like success. Even as processor chips have become more advanced and started incorporating their own internal memory module (called cache memory), the von Neumann architecture has been faithfully replicated. But successful doesn't mean ideal or optimal or even efficient. The burn of the laptop on my thigh as I type this post is an indication that the current computing model, while successful, is also an extremely power-hungry one. 
The IBM Watson machine, famous for playing and beating human opponents in Jeopardy, is also famous for consuming more than 4,000 times the power of its human competitors. The human brain functions with about 20 watts of power while Watson consumes more than 85,000 watts. And all that Watson can do is play Jeopardy. The human brain can do a lot more, like writing, recognizing patterns, expressing and feeling emotion, negotiating traffic, even designing computers!&lt;br /&gt;&lt;br /&gt;So what might a more efficient model look like? Well, it looks a little more like the human brain. In the human brain, logical problem solving, thinking and memory are all managed through one element of computing infrastructure, so to speak, which is the neuron, interconnected with other neurons through synapses. And that is the model that is being pursued by IBM in collaboration with Cornell, Columbia, the University of Wisconsin and the University of California, Merced. The project is also funded by DARPA, and more details can be found at the link at the start of the page. The big a-ha moment, according to the project director and IBM computer scientist Dharmendra Modha (in the middle of vacation, no less), was to drive the human-brain driven computing project through the fundamental design of the processor chip or the hardware rather than through software. To quote some details from the New York Times article by Steve Lohr,&lt;br /&gt;&lt;i&gt;The prototype chip has 256 neuron-like nodes, surrounded by more than 262,000 synaptic memory modules. That is impressive, until one considers that the human brain is estimated to house up to 100 billion neurons. In the Almaden research lab, a computer running the chip has learned to play the primitive video game Pong, correctly moving an on-screen paddle to hit a bouncing cursor. It can also recognize numbers 1 through 10 written by a person on a digital pad — most of the time.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Why is this relevant to predictive analytics? 
&lt;/b&gt;&lt;br /&gt;What is a mention of this project doing in a predictive analytics blog? It has to do with Big Data. Online, mobile, geo-spatial and RFID technologies are creating streams of data in amounts that would have been impossible to conceptualize even a decade back. As the availability of data increases and the power of conventional computing infrastructure and storage infrastructure gets overwhelmed, we will have to rely on a distributed memory storage and computing set-up that is more similar to the human brain. A space worth watching.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-17105839608945913?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/17105839608945913/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=17105839608945913' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/17105839608945913'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/17105839608945913'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2011/12/computing-based-on-human-drain-answer.html' title='Computing based on the human brain - the answer to Big Data?'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://2.bp.blogspot.com/-7AxOgkis5X4/TuM2vad3X-I/AAAAAAAACtU/a8vUemL2bmY/s72-c/IBM-Watson.jpeg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-7160551392277243606</id><published>2011-12-08T23:32:00.001-05:00</published><updated>2011-12-08T23:35:07.405-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Loyalty Programs'/><category scheme='http://www.blogger.com/atom/ns#' term='Predictive Analytics'/><category scheme='http://www.blogger.com/atom/ns#' term='Modeling'/><title type='text'>Tesco Clubcard - Metrics and Success Factors</title><content type='html'>&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;Getting back to this subject after a really long break. In the first part on this subject, we reviewed Tesco’s loyalty program and the types of business decisions aided by the Clubcard. The Tesco crucible maintains information about:&lt;br /&gt;1. Customer demographics&lt;br /&gt;2. Detailed shopping history&lt;br /&gt;3. Purchase tastes, frequency, habits and behaviours&lt;br /&gt;4. Other individual level indicators obtained from public sources&lt;br /&gt;&lt;br /&gt;Tesco then uses this information for a number of business benefits such as:&lt;br /&gt;1. Loyalty&lt;br /&gt;2. Cross Sells&lt;br /&gt;3. More optimal inventory and store network planning&lt;br /&gt;4. Optimal targeting and marketing of manufacturer’s promotions&lt;br /&gt;5. 
Generating customer insights and marketing those insights&lt;br /&gt;&lt;br /&gt;The link to the previous article that details these points is &lt;a href="http://stat-exchange.blogspot.com/2011/08/analyzing-tesco-analytics-behind-top.html"&gt;&lt;u&gt;here&lt;/u&gt;&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;So what else goes into making this program successful?&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Metrics&lt;/b&gt;&lt;br /&gt;One important factor is the set of metrics used by Tesco to measure success. There are primarily two metrics. The first is the change in share of wallet. Based on the demographic information collected, Tesco has an estimate of the total spend of each household. Dividing the household's Tesco sales by that estimate gives the share of wallet. This is of course an estimated measure, but given the right kinds of assumptions, not a particularly ambitious estimate to make.&lt;br /&gt;&lt;br /&gt;(The key here is to make sure the estimates are generated in an unbiased manner. An estimated metric is always prone to manipulation. For instance, a small increase in unit sales can be projected to be a larger increase in share of wallet, by manipulating the projected overall spend. This problem can be avoided if the estimation is done by an independent group that is incentivised on getting its estimates right rather than on the volume of sales. This is the role of Decision Sciences groups found in many organizations.)&lt;br /&gt;&lt;br /&gt;A measure related to share of wallet is the number of purchase categories into which Tesco has penetrated. Remember that Tesco is present in many purchase categories such as groceries, apparel, durables, banking products, vacation packages, insurance, auto sales, pharmacy products, gas, etc. Effectiveness of the Tesco brand is realized when the customer begins to use Tesco for multiple product categories. 
So that is a useful metric to track, both as an indication of overall profitability and as a measure of marketing and cross-sell effectiveness.&lt;br /&gt;&lt;br /&gt;The second main metric being measured is just pure customer behaviour from a frequency standpoint. How is the company changing how often customers visit, and what sorts of visits is it getting from them? Of course, with the wide use of smartphones and the tracking capabilities built into these phones, it is possible to gather a lot of spatial and temporal information such as: Which store? Duration of the visit? At what time of the day or week?&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Other Success factors&lt;/b&gt;&lt;br /&gt;No company can maintain sustained growth and profitability on the strength of analytics alone without addressing the human face of the analytics - in other words, the customer service aspect. Tesco management took care to convey to the store staff that the Clubcard program was an important value-add for customers, and hence an inherent part of customer service, and that it wasn’t fundamentally manipulative. This was done through a communication program that was rolled out across all stores and that involved every store employee of Tesco.&lt;br /&gt;&lt;br /&gt;The other critical success factor was management vision. Many organizations tend to see these programs as cost drivers and strive to minimize cost while maximizing customer satisfaction, which are often conflicting goals. But the Tesco management was clear about the ultimate goal of the Clubcard, which is to drive loyalty. What also helped was the breadth of vision that allowed for multiple revenue streams from the Clubcard program that were not directly related to the core idea of give-back to the customers and loyalty benefits.&lt;br /&gt;&lt;br /&gt;Another philosophy that the Tesco management employed fairly successfully was test-and-learn. 
Most of the major improvements and enhancements were first piloted in smaller stores. Extremely rigorous measurement mechanisms were then employed to make sure that the right inferences were drawn from the test.&lt;br /&gt;&lt;br /&gt;Overall, the key realization was that the Clubcard program is not just an electronic sales promotion, but rather that the entire business has to be physically re-engineered to be customer-insight led.&lt;br /&gt;&lt;br /&gt;In my final piece, I will touch on the impact to the overall bottom-line - and the top-line benefits that came from the Clubcard program.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-7160551392277243606?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/7160551392277243606/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=7160551392277243606' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/7160551392277243606'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/7160551392277243606'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2011/12/tesco-clubcard-metrics-and-success.html' title='Tesco Clubcard - Metrics and Success Factors'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-1528018007090531784</id><published>2011-08-21T00:44:00.000-04:00</published><updated>2011-08-21T00:44:37.321-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Technology'/><category scheme='http://www.blogger.com/atom/ns#' term='Data Mining'/><category scheme='http://www.blogger.com/atom/ns#' term='Predictive Analytics'/><title type='text'>Analyzing Tesco - the analytics behind a top-notch loyalty program</title><content type='html'>&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;My interest within predictive analytics has been as much about the technology and the data mining techniques that can be applied to the data as about the business value that can be extracted from the data. With this second interest in mind, I am going to embark on a different kind of series of blog posts.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://2.bp.blogspot.com/-pV0Fwe6hmwc/TlCLQjNJUkI/AAAAAAAACsg/2M0MteOym5E/s1600/Tesco-Clubcard.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"&gt;&lt;img border="0" height="139" src="http://2.bp.blogspot.com/-pV0Fwe6hmwc/TlCLQjNJUkI/AAAAAAAACsg/2M0MteOym5E/s200/Tesco-Clubcard.jpg" width="200" /&gt;&lt;/a&gt;Instead of mostly talking about theory, I am going to share examples of how companies are using the power of analytics to know their customers better, anticipate their needs and ultimately become more profitable. One of the beacons in this space, about which many a volume has been written, is Tesco, the UK (and now increasingly international) retailing giant. 
What I am going to cover in this piece is the Tesco loyalty card, how it works and the different ways in which a retailer can take advantage of the information base created by the card to generate economic value.&lt;br /&gt;&lt;br /&gt;First some background. Tesco hired the marketing firm of dunnhumby to develop a new loyalty program, to enable it to grow in the UK market. By 1995, the Clubcard had launched with nearly instant success as Tesco enjoyed a large increase in customer loyalty and retention. Within the first five years sales had risen over 50%.&lt;br /&gt;&lt;br /&gt;&lt;u&gt;&lt;b&gt;The structure of the card, and how the data is collected&lt;/b&gt;&lt;/u&gt;The data gathering for the loyalty program starts with a typical application which might ask for some basic demographic information such as address, age, gender, the number of members in a household and their ages, and dietary habits. Against this basic information, purchase history is appended. This includes the goods shopped for, and also information such as visit history, both to stores and online.&lt;br /&gt;&lt;br /&gt;Next, a number of summary attributes are also computed. These include share of wallet information and information on frequency and duration of visits. They also include information on customer preferences and tastes, as determined by some clever cluster analysis based on purchase history of specific fast-moving products. See &lt;a href="http://blog.ouseful.info/2008/11/06/the-tesco-data-business-notes-on-scoring-points/"&gt;&lt;u&gt;this link&lt;/u&gt;&lt;/a&gt; for a review of “Scoring Points”, a book describing the Tesco loyalty program.&lt;br /&gt;&lt;br /&gt;Tesco realized that better information leads to better results and created Crucible, a massive database of not only applicant information and purchase history, but also information purchased and collected elsewhere about participating consumers. 
Credit reports, loan applications, magazine subscription lists, Office for National Statistics, and the Land Registry are all sources of additional information that is stored in Crucible.&lt;br /&gt;&lt;br /&gt;To summarize, Tesco maintains information about:&lt;br /&gt;1. Customer demographics&lt;br /&gt;2. Detailed shopping history&lt;br /&gt;3. Purchase tastes, frequency, habits and behaviours&lt;br /&gt;4. Other individual level indicators obtained from public sources&lt;br /&gt;&lt;br /&gt;Creating this database is an undertaking in itself. Many organizations realize the value of such detailed data and are able to spend the resources to get it; however, they do such a bad job of integrating the data and making it available to analysts that only a fraction of the power in the data is realized.&lt;br /&gt;&lt;br /&gt;&lt;u&gt;&lt;b&gt;Technology challenges&lt;/b&gt;&lt;/u&gt;What were some of the challenges faced from a technology standpoint? To start with, one of scale. Specifically, how to scale up from an analytical lab scale to servicing 10 million customers. In the words of Clive Humby of dunnhumby, “we're very pragmatic, so to begin with, we worked on a sample of data. We'll find the patterns in a sample, and then look for that pattern amongst everybody, rather than just trying to find it in this huge data warehouse.”&lt;br /&gt;&lt;br /&gt;Sir Humby has revealed some interesting insights in &lt;a href="http://www.customerthink.com/interview/clive_humby_tesco_shines_at_loyalty"&gt;&lt;u&gt;this interview.&lt;/u&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Tesco uses a hybrid mix of technologies: Oracle as the main data warehouse engine, SAS for the actual modeling, and White Cross and Sand Technology as the analytic engine for applying the learnings to larger volumes of data. Additionally, the technology group used a number of home-developed technologies and algorithms. 
This is a &lt;a href="http://techfortesco.blogspot.com/"&gt;&lt;u&gt;nice blog&lt;/u&gt;&lt;/a&gt; written by Nick Lansley about the technology choices made by Tesco - with some filtering, of course.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;u&gt;&lt;b&gt;And finally, the business value or the economic benefits&lt;/b&gt;&lt;/u&gt;&lt;br /&gt;&lt;u&gt;1. Loyalty&lt;/u&gt;The first clear benefit is customer loyalty and the increased spend that comes from a customer moving most of their purchases on to Tesco. The loyalty program incentivizes customers to steer a greater share of their monthly grocery spend onto Tesco, which in turn explains the increase in market share for Tesco from about 15-20% to about 30% of the total UK market in the period from 1995 to about 2005. This is a clear objective of any loyalty program and Tesco delivers on the business objective brilliantly. Tesco does this by offering vouchers on associated products - so if a family is buying infant formula, it is quite a straightforward decision to offer them discounts on diapers and get the customer to move that part of the purchase also to Tesco.&lt;br /&gt;&lt;br /&gt;&lt;u&gt;2. Cross-sells&lt;/u&gt;The most immediate extension from increasing spend within one product category is cross-selling across product families. So an example of this would be (from the previous example) marketing a college-fund financial product to a family that has recently started buying infant food and diapers. The way Tesco would do this, I would imagine, is to have a family or customer level flag for “Has small children” or something of the sort. An alternative would be to sell Disney Cruises to a family with small children. In this case, Tesco would not only collect a channel fee from Disney for selling their cruises through its site but also a premium for being demonstrably targeted in their marketing.&lt;br /&gt;&lt;br /&gt;&lt;u&gt;3. 
Inventory, distribution and store network planning&lt;/u&gt;The first two applications are about knowing consumer needs better and targeting available products and services more effectively. The next benefit from this data is from materials movement. By getting a precise handle on demand and particularly, anticipating demand spikes in response to promotions, the company can do an effective job with its demand planning and managing the distribution pipeline efficiently from the manufacturing points to the distribution centers.&lt;br /&gt;&lt;br /&gt;Also, based on the demographic (customer self-reported) and public information that is appended to the customer level database, a basis is created for inventory planning. So let's say Tesco wants to open a store in a region where a large number of families with young children reside. It becomes possible to anticipate the demand for baby products if a Tesco branch were to be opened in that region and stock up accordingly.&lt;br /&gt;&lt;br /&gt;&lt;u&gt;4. Optimal targeting and use of manufacturer promotions&lt;/u&gt;Another area of value for Tesco is optimal use of manufacturer’s promotions, such as either direct purchase discounts or one-for-many type schemes. At the outset, it might appear that retailers like Tesco would love manufacturer’s coupons and rebates. Who wouldn’t like it if there was greater foot-traffic and purchase activity that came from a scheme, and all the cost was borne by the manufacturer? In reality though, things are never as simple as that. Retailers don't really want to run too many promotions, because managing promotions (displays, new labeling, frequent restocking, possible overstocking and the cost of damaged or expired inventory) is very labor intensive and also adds to the supply-chain costs.&lt;br /&gt;&lt;br /&gt;So one of the areas that Tesco specializes in is promotion optimization. 
That means deciding, given the hundreds of promotions available at any given point in time, which 25-30 to pick and what price to suggest when negotiating with the manufacturer. The optimization is based on:&lt;br /&gt;- The cost of running the promotion including inventory costs and labor costs&lt;br /&gt;- Local geography based factors - what kind of customers shop at a local store and what are their unique preferences&lt;br /&gt;- Ensuring there’s something for everyone - ensuring every customer has a fair chance of getting a few promotional offers, given their typical purchase behaviour&lt;br /&gt;&lt;br /&gt;&lt;u&gt;5. Consumer insight generation and marketing those insights&lt;/u&gt;A final area of economic value for Tesco is gleaning higher-level customer insights that other entities would be interested in. For example, Procter and Gamble would be EXTREMELY interested in knowing how households of different sizes and at different points of the economic spectrum buy and use laundry detergent. And how that use changes with seasons, over time and so on. Also, what is the propensity for such customers to buy and use related products such as, say, fabric softeners. &lt;br /&gt;&lt;br /&gt;Given Tesco’s vantage point and their detailed view of what a customer’s purchases really look like, it becomes really easy for Tesco to glean such insights from the data and sell the information to a bunch of interested parties. This is another source of economic value for Tesco.&lt;br /&gt;&lt;br /&gt;This post is getting really long - so let me stop here and summarize. We just discussed the types of data that are gathered by a top-notch loyalty program like Tesco’s and the sources of economic value from the data that Tesco gathers. In my next posts, I will talk about the potential value from such a program for Tesco and its comparable costs. What have been some of the unique and honestly hard-to-replicate factors that have helped Tesco succeed in this space? 
Also, what have been some of the competitive responses and how is this area evolving in the emerging SOcial, LOcal, MObile (or SoLoMo) world.&lt;br /&gt;&lt;br /&gt;A set of interesting links about Tesco's loyalty program.&lt;br /&gt;&lt;a href="http://www.guardian.co.uk/lifeandstyle/2003/jul/19/shopping.features"&gt;http://www.guardian.co.uk/lifeandstyle/2003/jul/19/shopping.features&lt;/a&gt;&lt;br /&gt;&lt;a href="http://www.customerthink.com/interview/clive_humby_tesco_shines_at_loyalty"&gt;http://www.customerthink.com/interview/clive_humby_tesco_shines_at_loyalty&lt;/a&gt;&lt;br /&gt;&lt;a href="http://blog.ouseful.info/2008/11/06/the-tesco-data-business-notes-on-scoring-points/"&gt;http://blog.ouseful.info/2008/11/06/the-tesco-data-business-notes-on-scoring-points/&lt;/a&gt;&lt;br /&gt;&lt;a href="http://techfortesco.blogspot.com/"&gt;http://techfortesco.blogspot.com/&lt;/a&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-1528018007090531784?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/1528018007090531784/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=1528018007090531784' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/1528018007090531784'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/1528018007090531784'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2011/08/analyzing-tesco-analytics-behind-top.html' title='Analyzing Tesco - the analytics behind a top-notch loyalty program'/><author><name>Krish 
Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-pV0Fwe6hmwc/TlCLQjNJUkI/AAAAAAAACsg/2M0MteOym5E/s72-c/Tesco-Clubcard.jpg' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-5216121470906041267</id><published>2011-08-03T22:53:00.003-04:00</published><updated>2011-08-03T22:56:05.741-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Model Validation'/><category scheme='http://www.blogger.com/atom/ns#' term='Data Mining'/><category scheme='http://www.blogger.com/atom/ns#' term='Predictive Analytics'/><category scheme='http://www.blogger.com/atom/ns#' term='Modeling'/><title type='text'>Tips for data mining - part 4 out of 4</title><content type='html'>&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;My labor of love which started nearly seven months back is finally drawing to a close. In previous pieces, I have talked about some of the lessons I have learned in the field of data mining. The first two pieces of advice which were covered in &lt;a href="http://stat-exchange.blogspot.com/2011/01/tips-for-data-mining-part-1-out-of-many.html"&gt;&lt;u&gt;this post&lt;/u&gt;&lt;/a&gt; were &lt;br /&gt;&lt;i&gt;1. Define the problem and the design of the solution&lt;/i&gt;&lt;br /&gt;&lt;i&gt;2. Establish how the tool you are building is going to be used&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;The next pieces were covered in &lt;a href="http://stat-exchange.blogspot.com/2011/03/tips-for-data-mining-part-2-out-of-many.html"&gt;&lt;u&gt;this post&lt;/u&gt;&lt;/a&gt; and they were&lt;br /&gt;&lt;i&gt;3. 
Frame the approach before jumping to the actual technical solution&lt;/i&gt;&lt;br /&gt;&lt;i&gt;4. Understand the data&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;In the &lt;a href="http://stat-exchange.blogspot.com/2011/06/tips-for-data-mining-part-3-out-of-4.html"&gt;&lt;u&gt;third post&lt;/u&gt;&lt;/a&gt; in this epic story (and it has really started feeling like an epic, even though it has just been three medium length posts so far), I covered:&lt;br /&gt;&lt;i&gt;5. Beware the "hammer looking for a nail"&lt;/i&gt;&lt;br /&gt;&lt;i&gt;6. Validate your solution&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Now based on everything I have talked about so far, you actually go and get some data and build a predictive model. The model seems to be working exceptionally well and showing high goodness-of-fit with the data. And there, you have reached the seventh lesson about data mining which is&lt;br /&gt;&lt;br /&gt;&lt;i&gt;&lt;b&gt;7. Beware the "smoking gun"&lt;/b&gt;&lt;/i&gt;&lt;br /&gt;Or, when something is too good to be true, it probably is not true. When the model is working so well that it seems to be answering every question that is being asked, there is something insidious going on - the model is not really predicting anything but just transferring the input straight through to the output. It could be that a field that is another representation of the target variable is used as a predictor. Let's take an example here. Let us say we are trying to predict the likelihood that a person is going to close their cellphone plan, or in business parlance, the likelihood that the customer is going to attrite. Also, let's say one of the predictors used is whether someone called up the service cancellation queue through customer service. By using the "called service cancellation queue" as a predictor, we are in effect using the outcome of the call (service cancellation) as both a predictor and the target variable. 
Of course the model is going to slope extremely nicely and rank everyone who met the service cancellation queue condition as the most likely to attrite. This is an example of a spurious model; it is not even a bad or an overfit model. Not understanding the different predictors available (or rather not paying attention to the way the data is being collected), and not justifying why each one is selected as a predictor in the predictive model, is the most common reason why spurious models get built. So when you see something too good to be true, watch out.&lt;br /&gt;&lt;br /&gt;&lt;i&gt;&lt;b&gt;8. Establish the value upside and generate buy-in&lt;/b&gt;&lt;/i&gt;&lt;br /&gt;Now let's say you manage to avoid the spurious model trap and actually build a good model. That is, a model that is not overspecified, is independently validated, and uses a set of predictors that have been tested for quality and are well understood by the modeler (you). Now the model should be translated to business value in order to get the support of the different business stakeholders who are going to use the model or will need to support the deployment of the model. A good understanding of the economics of the underlying business model is required to value the greater predictive capability afforded by the model. It is usually not too difficult to come up with this value estimate, though it might seem like an extra step at the end of a long and arduous model build. It is nonetheless a critically important step to get right. Hard-nosed business customers are not likely to be impressed by the technical strengths of the model - they will want to know how this adds business value and either increases revenue, reduces costs or decreases the exposure to unexpected losses or risk.&lt;br /&gt;&lt;br /&gt;So, there. A summary of all that I have learned in the last 4-5 years of being very close to predictive analytics and data mining. 
It was enjoyable writing this down (even if it took seven months) and I hope the aspiring data scientist gets at least a fraction of the enjoyment I have had in writing this piece.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-5216121470906041267?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/5216121470906041267/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=5216121470906041267' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/5216121470906041267'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/5216121470906041267'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2011/08/tips-for-data-mining-part-4-our-of-4.html' title='Tips for data mining - part 4 out of 4'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-698668390103564617</id><published>2011-08-01T22:55:00.000-04:00</published><updated>2011-08-01T22:55:59.904-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Data Mining'/><category scheme='http://www.blogger.com/atom/ns#' term='Data Handling'/><category scheme='http://www.blogger.com/atom/ns#' term='Modeling'/><title type='text'>Good documentation about data - a must for credible analytics</title><content type='html'>One of the 
cardinal principles of predictive analytics is that you are only as good as the data that you use to build your analysis. However, another important principle is that the data handling processes also have to be well managed and generally free of error.&lt;br /&gt;&lt;br /&gt;Recently a set of incidents came to light that showed the damage that can be caused by indifferent data handling processes. This was in the field of cancer research, which points to the human cost of some of these mistakes. One of the popular recent techniques for analyzing gene-level data in cancer research is microarray analysis. A primer on what this analysis involves can be found &lt;a href="http://en.wikipedia.org/wiki/DNA_microarray"&gt;&lt;u&gt;here&lt;/u&gt;&lt;/a&gt;. &lt;br /&gt;&lt;br /&gt;Duke University cancer researchers promised some revolutionary new treatments of cancer. But when patients actually enrolled in trials, the results were disappointing. Then the truth came out. The analysis was done wrong and the reports resulted from some elementary errors in data handling by the researchers. Two researchers, Baggerly and Coombes, who had to literally &lt;a href="http://www.ndns.nl/static/files/sls/presentations/Baggerly-AnnalsAppliedStats.pdf"&gt;&lt;u&gt;reverse engineer&lt;/u&gt;&lt;/a&gt; the analytical approach used, concluded that some simple errors resulted in the wrong conclusions.&lt;br /&gt;&lt;br /&gt;A few takeaways for a data scientist:&lt;br /&gt;1. Data handling scripts and processes need to be checked and double-checked. Dual validation is a well-known technique; it is also known as a parallel run. The idea here is to have two independent sets of analysts or systems process the same input data and make sure the outputs are the same.&lt;br /&gt;&lt;br /&gt;2. Data handling needs to be well-documented. 
The approach used to arrive at a set of significant findings can never be shrouded in mystery, either intentionally or because of sloppy documentation. At best, it gives the appearance of slipshod and careless work. At worst, it appears like a deliberate deception. Neither of these impressions is a good one to make.&lt;br /&gt;&lt;br /&gt;A summary presentation from Baggerly and Coombes about this issue can be found &lt;a href="http://madison.byu.edu/mcmski/pdfs/KeithBaggerly.pdf"&gt;&lt;u&gt;here&lt;/u&gt;&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-698668390103564617?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/698668390103564617/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=698668390103564617' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/698668390103564617'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/698668390103564617'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2011/08/good-documentation-about-data-must-for.html' title='Good documentation about data - a must for credible analytics'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-2952212843838499114</id><published>2011-07-04T07:46:00.001-04:00</published><updated>2011-07-04T07:47:27.684-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Occam&apos;s Razor'/><category scheme='http://www.blogger.com/atom/ns#' term='Predictive Analytics'/><title type='text'>Errata: Simple vs Complex models</title><content type='html'>&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;The comment I attributed to Schumpeter in my last post on &lt;a href="http://stat-exchange.blogspot.com/2011/06/simple-vs-complex-models.html"&gt;&lt;u&gt;Simple v Complex models&lt;/u&gt;&lt;/a&gt; actually belongs to EF Schumacher, the writer of "Small is Beautiful". Have that book lined up in the library reading list.&lt;br /&gt;&lt;br /&gt;To summarize why I like simpler models,&lt;br /&gt;&lt;b&gt;1. More interpretable:&lt;/b&gt; Particularly important when there is data overload happening and one isn't really sure what is signal and what is noise&lt;br /&gt;&lt;b&gt;2. Easier to maintain and update&lt;/b&gt; as part of a production system&lt;br /&gt;&lt;b&gt;3. Likely a truer representation of the world:&lt;/b&gt; Going back to good old Occam's Razor principles. 
At least in a way that lends itself to meaningful decision making&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-2952212843838499114?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/2952212843838499114/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=2952212843838499114' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/2952212843838499114'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/2952212843838499114'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2011/07/errata-simple-vs-complex-models.html' title='Errata: Simple vs Complex models'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-4200811529000159670</id><published>2011-06-26T16:21:00.001-04:00</published><updated>2011-06-26T16:23:38.635-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Predictive Analytics'/><category scheme='http://www.blogger.com/atom/ns#' term='Modeling'/><title type='text'>Simple vs complex models</title><content type='html'>&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;I came across the subject of simple models (or crude models - somehow didn't like the word crude, it sounded .... 
well, crude) vs more complicated models in this very interesting blog by John D Cook called "The Endeavour". The link to the article is &lt;a href="http://www.johndcook.com/blog/2011/05/25/crude-models/"&gt;&lt;u&gt;here&lt;/u&gt;&lt;/a&gt;.&amp;nbsp; There was a good discussion on the pros and cons between simple and complex models, and so I thought I'd add some of my own thoughts on the matter.&lt;br /&gt;&lt;br /&gt;First, in terms of definitions. We are talking about reduced-form predictive analytics models here. Simple or crude models are ones that use a small number of predictors and interactions between them; complex models are ones that use many more predictors and get closer to that line between a perfectly specified and an overspecified model. John Cook makes the article come alive with an interesting quote from Schumpeter … &lt;span style="font-size: small;"&gt;&lt;i style="font-family: &amp;quot;Trebuchet MS&amp;quot;,sans-serif;"&gt;there is an awful temptation to squeeze the lemon until it is dry and to present a picture of the future which through its very precision and verisimilitude carries conviction.&lt;/i&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;Simple models have their benefits and uses. They are usually quicker to build, easier to implement, easier to interpret and to update. I particularly like the easier to implement and easier to interpret/ update bits. I have seldom come across a model that was so good and so reliable that it needed no interpretation or updating. The fact of the matter is that any model captures some of the peculiarities in the training data set used to build the model, and therefore, by definition is somewhat over-specified for that dataset. There is never a sample that is a perfect microcosm of the world - if there were, it wouldn't be a sample at all; rather, it would be almost the world that it is supposed to be a representation of. 
So any sample and therefore any model built off it is going to have biases. A model builder therefore would be well served to understand and mitigate those biases and build an understanding that is more robust and less cute.&lt;br /&gt;&lt;br /&gt;Also, the implementation of the model should be straightforward. The model complexity should not lead to implementation headaches whose resolution ends up costing a significant portion of the purported benefits from the model.&lt;br /&gt;&lt;br /&gt;Another reason why I prefer simpler models is their relative transparency when it comes to their ultimate use. Models invariably get applied in contexts very different from the ones they were designed for. They are frequently scored on different populations (i.e., different from the training set) and used to make predictions and decisions that again are far removed from what they were originally intended to do. In those situations, I much prefer having the ability to understand what the model is saying and why, and then apply the corrections that my world experience and intuition tell me. 
Versus relying on a model that is such a "black box" that it is impossible to understand and therefore leads to this very dangerous train of thought that says "if it is so complex, it must be right".&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-4200811529000159670?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/4200811529000159670/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=4200811529000159670' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/4200811529000159670'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/4200811529000159670'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2011/06/simple-vs-complex-models.html' title='Simple vs complex models'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-4828986319536914280</id><published>2011-06-17T23:00:00.000-04:00</published><updated>2011-06-17T23:00:26.964-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Data Mining'/><category scheme='http://www.blogger.com/atom/ns#' term='Predictive Analytics'/><title type='text'>Tips for data mining - Part 3 out of 4</title><content type='html'>&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;&lt;br /&gt;I had written two pieces early on 
in the year about data mining tips. These talked about the first four tips to keep in mind while undertaking any data-mining project.&lt;br /&gt;1. Define the problem and the design of the solution&lt;br /&gt;2. Establish how the tool you are building is going to be used&lt;br /&gt;3. Frame the approach before jumping to the actual technical solution&lt;br /&gt;4. Understand the data&lt;br /&gt;&lt;br /&gt;The links for the first two parts can be found &lt;a href="http://stat-exchange.blogspot.com/2011/01/tips-for-data-mining-part-1-out-of-many.html"&gt;&lt;u&gt;here&lt;/u&gt;&lt;/a&gt; and &lt;a href="http://stat-exchange.blogspot.com/2011/03/tips-for-data-mining-part-2-out-of-many.html"&gt;&lt;u&gt;here&lt;/u&gt;&lt;/a&gt;. Now let me talk about the fifth and sixth tips from what I have learned.&lt;br /&gt;&lt;br /&gt;&lt;i&gt;&lt;b&gt;5. Beware the "hammer looking for a nail"&lt;/b&gt;&lt;/i&gt;&lt;br /&gt;This lesson basically recommends that you make sure you are using the appropriate complexity and sophistication of the analytical solution for the problem at hand. It is very easy to get excited about any one analytical solution and try to apply it to every single problem that you come across. But approaching a business problem like a "hammer looking for a nail" creates a set of issues. One, application of the technique becomes more important than understanding the problem. When that happens, the desire to implement the technique successfully becomes more important than solving the specific problem. Two, the solution can sometimes reach a level of complexity that the problem does not really need - on several occasions, simple solutions work the best.&lt;br /&gt;&lt;br /&gt;&lt;i&gt;&lt;b&gt;6. Validate your solution&lt;/b&gt;&lt;/i&gt;&lt;br /&gt;One of the most common mistakes that a data miner can make when confronted with a problem is to produce an overfit model. 
An overfit model is an overspecified model - many of the relationships between the predictor and target variables implied by the model are not real. They are an artifact of the dataset used to build the model. The problem with overfit models is that they tend to fail spectacularly when applied to a different situation. Therefore, it is crucial to do out-of-sample validation of the model. If the model does not perform well on the validation sample, it usually means the model is overspecified. (Holdouts from the original build dataset - the typical two-thirds/ one-third split between the build and validation datasets - don't really result in an independent validation.) The model might need to be simplified. One way to do it is to examine all the relationships between the predictor variables and the target variable and make sure they are sensible and believable. Another way to simplify is to make sure only linear relationships are maintained in the model. To be clear, this is often a gross oversimplification - but it is sometimes better than an overspecified model that is unusable.&lt;br /&gt;&lt;br /&gt;So these were tips #5 and 6. I will soon close out with the last two tips. Thanks for the patience with my slow pace of writing on this. This is the link to Tom Breur's &lt;a href="http://www.xlntconsulting.com/newsletter-archive/how-to-build-predictive-models-november-2010.htm"&gt;tips on building predictive models&lt;/a&gt;. 
A lot of good ideas here as well.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-4828986319536914280?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/4828986319536914280/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=4828986319536914280' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/4828986319536914280'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/4828986319536914280'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2011/06/tips-for-data-mining-part-3-out-of-4.html' title='Tips for data mining - Part 3 out of 4'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-6593175249833938924</id><published>2011-06-01T20:29:00.000-04:00</published><updated>2011-06-01T20:29:05.865-04:00</updated><title type='text'>Principal Components - the math behind it</title><content type='html'>A really delightful tutorial on the &lt;a href="http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf"&gt;&lt;u&gt;mathematical basis for Principal Components Analysis&lt;/u&gt;&lt;/a&gt;. 
It really clarified a lot of the basics for me.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-6593175249833938924?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/6593175249833938924/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=6593175249833938924' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/6593175249833938924'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/6593175249833938924'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2011/06/principal-components-math-behind-it.html' title='Principal Components - the math behind it'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-9132448172472283266</id><published>2011-05-30T09:09:00.000-04:00</published><updated>2011-05-30T09:09:26.983-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Principal Component Analysis'/><category scheme='http://www.blogger.com/atom/ns#' term='Chemical Industry'/><category scheme='http://www.blogger.com/atom/ns#' term='Predictive Analytics'/><category scheme='http://www.blogger.com/atom/ns#' term='Modeling'/><title type='text'>Interesting uses of Principal Components Analysis</title><content type='html'>I had shared a &lt;a 
href="http://abbottanalytics.blogspot.com/2010/02/prinicpal-components-for-modeling.html"&gt;&lt;u&gt;link on Principal Components Analysis&lt;/u&gt;&lt;/a&gt; a while back and have had the opportunity to revisit this space, or rather visit it in a professional capacity recently. &lt;br /&gt;&lt;br /&gt;As a part of my interest, I came across a few interesting links on this subject - one of the better tutorials is &lt;a href="http://www.snl.salk.edu/%7Eshlens/pca.pdf"&gt;&lt;u&gt;here&lt;/u&gt;&lt;/a&gt;. The primary purpose behind PCA is dimensionality reduction to make analysis more efficient. Typical applications have been in the area of image processing, though of late, there has been a lot of interest in applying these techniques to micro-array data.&lt;br /&gt;&lt;br /&gt;Some of the historical applications of PCA have been in the field of statistical process control or SPC. The genesis of the application came from the chemical industry, and the early practitioners were interestingly known as chemometricians. The aim here was to model plant yield as a function of its input parameters. The input parameters were typically the temperatures and pressures at different points in the reactor vessel, recorded at different points in time. Since the plant operator has control over some of these parameters, these can be varied in order to improve the plant yield. &lt;br /&gt;&lt;br /&gt;The sheer complexity of the data involved here is one complication. When processes have hundreds of inputs (temperature, pressure, gradients, energy released, moisture content - all captured by hundreds of sensors embedded within the reactor), it becomes difficult to build any meaningful models - given the limited number of observations available. What is helpful is that many of these input variables are highly correlated. 
The temperature at the entry point of reactor feed is obviously going to be correlated to the temperature at the center of the reactor vessel. PCA can be used to reduce the dimensionality of the inputs and model the outputs as a function of the principal components rather than the input variables. PCA, by definition, simply reduces the hundreds of correlated inputs to a few principal components (typically 3 or 4), each of which is a linear combination of the raw inputs. The other application here is the monitoring of these reactions. When the operator runs different reactions with different input parameters, it is important to identify 'outliers': runs where the inputs have been so far away from norms that the outputs need to be appropriately flagged or, in some cases, totally discarded. Some more details on the application of these techniques can be found &lt;a href="http://www2.sas.com/proceedings/sugi26/p252-26.pdf"&gt;&lt;u&gt;here&lt;/u&gt;&lt;/a&gt;. The link goes to a paper on Principal Component Techniques by Robert Rodriguez from SAS. &lt;br /&gt;&lt;br /&gt;These applications can be extended to other areas as well. 
In consumer behaviour modeling, PCA can be used to reduce the hundreds of different  inputs about a consumer to the essential principal components and these  can then be used to simplify the modeling and monitoring processes.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-9132448172472283266?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/9132448172472283266/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=9132448172472283266' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/9132448172472283266'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/9132448172472283266'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2011/05/interesting-uses-of-principal.html' title='Interesting uses of Principal Components Analysis'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-6013117740627631637</id><published>2011-05-18T22:02:00.001-04:00</published><updated>2011-05-19T22:04:51.978-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Data Mining'/><category scheme='http://www.blogger.com/atom/ns#' term='Predictive Analytics'/><title type='text'>The Heritage Prize</title><content type='html'>The latest in data mining competitions is the &lt;a 
href="http://www.heritagehealthprize.com/c/hhp"&gt;&lt;u&gt;Heritage Prize&lt;/u&gt;&lt;/a&gt;. If you haven't heard about the prize before, it is a competition to bring predictive analytics to the health-care business. The reward: a cool $3 million. This is the next phase of predictive analytics - when corporates are willing to pay good money for great analytical work.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-6013117740627631637?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/6013117740627631637/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=6013117740627631637' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/6013117740627631637'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/6013117740627631637'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2011/05/heritage-prize.html' title='The Heritage Prize'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-8532895744626500917</id><published>2011-03-05T17:38:00.000-05:00</published><updated>2011-03-05T17:38:25.320-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Data visualization'/><category scheme='http://www.blogger.com/atom/ns#' term='Exploratory Data Analysis'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Data Mining'/><category scheme='http://www.blogger.com/atom/ns#' term='Modeling'/><title type='text'>Tips for data mining - Part 2 out of many</title><content type='html'>Writing after a long time on the blog. Blame it on regular writer's cramp - a marked reluctance and inertia of sorts to put pen to paper, or rather fingers to keyboard. &lt;br /&gt;&lt;br /&gt;My last &lt;a href="http://stat-exchange.blogspot.com/2011/01/tips-for-data-mining-part-1-out-of-many.html"&gt;&lt;u&gt;post&lt;/u&gt;&lt;/a&gt; introduced the idea of defining the problem as the first step for any data mining exercise aiming to achieve success. This is the ability to state the problem you are trying to solve in terms of business outcomes that are measurable. After that comes the step of envisioning the solution and expressing it in a really simple form. The aim should be to create a path from input to output - the output being a set of decisions that will ultimately result in the measurable business outcomes we mentioned above. The next step involves establishing how the developed solution would be used in the real world. Not doing this early enough or with enough clarity could result in the creation of a library curio. Defining how the solution will be used will also point to other needs such as training the users on the right way to use the solution, the expected skills from the end users and so on.&lt;br /&gt;&lt;br /&gt;In this post, we will discuss the third and fourth steps. These are&lt;br /&gt;3. Frame the approach before jumping to the actual technical solution&lt;br /&gt;4. Understand the data&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Frame the approach before jumping to the actual technical solution.&lt;/b&gt; Once the business problem has been defined, it is tempting to point the closest tool at hand at the data and start hacking away. But oftentimes, the most obvious answer is not necessarily the right answer. 
It is valuable  to construct the nuts and bolts of the approach to get to the solution  on a whiteboard or sheet of paper before getting started. Taking the  example of some recent text-mining work I have been involved in, one of  the important steps was to create an industry-specific lexicon or  dictionary. While creating a comprehensive dictionary is often tedious  and dull work, this step is an important building block for any data  mining effort and hence deserves the upfront attention. We couldn't have seen the value of this step, but for the exercise of comprehensively thinking through the solution. This is also the  place where prototyping using sandbox tools like Excel or JMP (the  "lite" statistical software from the SAS stable) becomes extremely  valuable. Framing this approach in detail allows the data miner to  budget for all the small steps along the way that are critical for a  successful solution. It also enables putting something tangible in front  of decision makers and stakeholders which can be invaluable in getting  their buy-in and sponsorship for the solution.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Understand the  data.&lt;/b&gt; This is such an obvious step that it has almost become a cliche;  having said that, &lt;i&gt;&lt;u&gt;incomplete understanding of the data continues to be the  reason why the greatest number of data mining projects falter in  attempting to fulfill their potential and solve the business goal&lt;/u&gt;&lt;/i&gt;. Some  of the data checks like data distributions, variable attributes like  mean, standard deviations, missing rates are quite obvious but I want to  call out a couple of critical steps here that might be somewhat  non-obvious. The first is to focus extensively on data visualization or  exploratory data analysis. In the blog, I have written a few pieces  before on data visualization which can be found &lt;a href="http://stat-exchange.blogspot.com/search/label/Data%20visualization"&gt;&lt;u&gt;here&lt;/u&gt;&lt;/a&gt;.  
Another good example of this type of visualization is from the &lt;a href="http://junkcharts.typepad.com/junk_charts/"&gt;&lt;u&gt;Junk Charts blog&lt;/u&gt;&lt;/a&gt;. The second is to track data lineage - in other words, where did the data come from and how was it gathered? Also, is it going to be gathered in the same way going forward? This step is important in understanding whether there have been biases in the historical data. There could be coverage bias or responder bias, where people are invited or requested to provide information. In both these cases, the analytical reads are usually specific to the data collected and cannot be easily extrapolated to non-responders or people outside the coverage of the historical data. &lt;br /&gt;&lt;br /&gt;This covers the background work that needs to take place before the solution build can be taken up in earnest. In the next few posts, I will share some thoughts on the things to keep in mind while building out the actual data mining solution.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-8532895744626500917?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/8532895744626500917/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=8532895744626500917' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/8532895744626500917'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/8532895744626500917'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2011/03/tips-for-data-mining-part-2-out-of-many.html' title='Tips for data mining - Part 2 out of 
many'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-2083242200239148158</id><published>2011-01-15T06:53:00.000-05:00</published><updated>2011-01-15T06:53:27.414-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Data Mining'/><category scheme='http://www.blogger.com/atom/ns#' term='Predictive Analytics'/><title type='text'>Tips for data mining - part 1 out of many!</title><content type='html'>Having spent a good part of the last four years mining data and working in general in the business/ predictive analytics areas, I thought I'd take a step back and summarize some of my lessons learnt through data mining. I was inspired to do this by a very revealing article by Tom Breur, principal at XLNT Consulting. More on Tom and his writings later.&lt;br /&gt;&lt;br /&gt;So what have I worked on these past years? As a management consultant in my previous life, working with data and tons of it was a given. Using data extensively, building business analytics models which aim to replicate real world processes, establishing objective criteria to take decisions are all bread-and-butter ways of problem solving in the consulting world. In fact, I'd go as far as to say that consultants seriously suffer from a lack of self-confidence when they consider out of the box solutions that do not include any/ all of the above. My consulting experience was mainly in the area of consumer goods distribution and marketing. I then moved to retail financial services. The main area of experience there has been in credit risk modeling and consumer cash flow modeling. 
Modeling how much of these events (credit risk, cash flow) is driven by internal factors and how much by exogenous occurrences. Also modeling consumer response to marketing products. A recent foray has been into text mining unstructured responses from applicants.&lt;br /&gt;&lt;br /&gt;So what have I learnt? I will try and summarize in a few posts, with potential reader fatigue in due consideration. At a summary level, these are the eight steps to data mining salvation.&lt;br /&gt;&lt;div style="font-family: Georgia,&amp;quot;Times New Roman&amp;quot;,serif;"&gt;1. Define the problem and the design of the solution&lt;/div&gt;&lt;div style="font-family: Georgia,&amp;quot;Times New Roman&amp;quot;,serif;"&gt;2. Establish how the tool you are building is going to be used&lt;/div&gt;&lt;div style="font-family: Georgia,&amp;quot;Times New Roman&amp;quot;,serif;"&gt;3. Frame the approach before jumping to the actual technical solution&lt;br /&gt;4. Understand the data&lt;br /&gt;5. Beware the "hammer looking for a nail"&lt;br /&gt;6. Validate your solution&lt;br /&gt;7. Beware the "smoking gun"&lt;br /&gt;8. Establish the value upside and generate buy-in&lt;/div&gt;&lt;br /&gt;I will tackle each of these steps in some detail now. &lt;br /&gt;&lt;br /&gt;&lt;b&gt;1. Define the problem and the design of the solution&lt;/b&gt;&lt;br /&gt;This is the first step. Define the problem that you are really trying to solve. The key here to defining the problem well is to frame the solution in terms of business outcomes.&lt;br /&gt;&lt;br /&gt;Complete the following sentence. &lt;i&gt;"Solving this problem will lead to x% increase in sales, y% decrease in costs, a multiplier of efficiency and speed by z"&lt;/i&gt; etc. 
If this sentence does not flow easily, then I am afraid you have not spent the time defining the overall problem well enough.&lt;br /&gt;&lt;br /&gt;Understand the context surrounding the problem: why has it been difficult to crack over the years, where is the data going to come from, where has the data come from in the past and are there going to be changes to how it will be available in the future? Speak to others who have taken a crack at this problem and get their view on where the constraints lie.&lt;br /&gt;&lt;br /&gt;Once you have defined the problem well enough, envision what the solution is going to look like. Which parts of the solution are pure process excellence, and where do you need advanced analytics? Answering this clearly establishes the place for the analytical piece of the solution (something that the reader of this blog is primarily interested in). The aim should be to create a simple block diagram on how one goes from input (usually data) to output (ideally, a set of decisions) and what are all the pieces that come in between.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;2. Establish how the tool you are building is going to be used&lt;/b&gt;&lt;br /&gt;In the previous step, the problem has been defined and the solution has been scoped at a high level. Then the next step is to put some detail into how the analytical solution is going to be actually used. Will the model be used mainly for understanding purposes or for doing exploratory analysis, with the results of the analysis then implemented using some simple decisioning rules or heuristics? Or is the desire or the plan to use the model in the "live" production environment for decision making? 
It is important to get good answers or at least good likely answers to all of these questions because they play a very important role in determining the actual tools that will be used to build the solution, the checks and audits that need to be put in the overall system of decision making, the process of overrides, the infrastructure and technology needed to make the solution effective and so on. Also the people aspect needs to be considered at this step. The use-conditions of the tool will determine the type of user training that needs to be provided, the skills that the end-users need to have, and how much of those skills can be imparted by on-the-job training vs what skills are entry conditions into the job.&lt;br /&gt;&lt;br /&gt;More about all of this in subsequent posts.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-2083242200239148158?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/2083242200239148158/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=2083242200239148158' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/2083242200239148158'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/2083242200239148158'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2011/01/tips-for-data-mining-part-1-out-of-many.html' title='Tips for data mining - part 1 out of many!'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-8268897366921902510</id><published>2011-01-13T22:42:00.001-05:00</published><updated>2011-01-13T22:42:26.895-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Data visualization'/><title type='text'>The Joy of Stats is finally online</title><content type='html'>My first post of 2011. I have been writing this blog for nearly two years now and am happy to still have the energy and the enthusiasm to keep at it. Like I have said earlier on &lt;a href="http://stat-exchange.blogspot.com/2009/06/why-i-blog.html"&gt;&lt;u&gt;why I blog&lt;/u&gt;&lt;/a&gt;, this is a way for me to keep abreast of the latest developments in the fields of data mining, analytics and visualization.&lt;br /&gt;&lt;br /&gt;The Joy of Stats program was aired on BBC4 in December 2010. Now the &lt;a href="http://www.gapminder.org/videos/the-joy-of-stats/"&gt;&lt;u&gt;video is available&lt;/u&gt;&lt;/a&gt; on Hans Rosling's Gapminder website. This is the program that highlights the data visualization examples used for mapping San Francisco crimes, the graphics made by Florence Nightingale and the Gapminder visualization of the economic and demographic statistics of different countries over the last 200 years. &lt;br /&gt;&lt;br /&gt;Another example of nifty graphics. The New York Times has been a trendsetter in putting up very clever and informative graphics supporting its news stories. Amanda Cox of the NYT graphics department did a &lt;a href="http://newmediadays.dk/amanda-cox"&gt;&lt;u&gt;presentation&lt;/u&gt;&lt;/a&gt; recently on some of the examples that the NYT has used in its print as well as its online media. 
This is a long presentation but worth sitting through.&lt;br /&gt;&lt;br /&gt;Hopefully you will enjoy both these presentations!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-8268897366921902510?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/8268897366921902510/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=8268897366921902510' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/8268897366921902510'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/8268897366921902510'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2011/01/joy-of-stats-is-finally-online.html' title='The Joy of Stats is finally online'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-3220781504675676895</id><published>2010-12-24T14:51:00.003-05:00</published><updated>2010-12-24T14:56:41.953-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Data Mining'/><category scheme='http://www.blogger.com/atom/ns#' term='Privacy'/><category scheme='http://www.blogger.com/atom/ns#' term='Predictive Analytics'/><title type='text'>Data mining and the inevitable conflict with privacy issues</title><content type='html'>The explosion in the availability of data in the past decade and the explosion in 
analytical techniques to interpret this data and find patterns in it have been a huge benefit for businesses, governments and individual customers. Amongst businesses, companies like Amazon.com, Harrah's, Target, Netflix and FedEx have made the analysis of large data sets their business model. These companies have come up with increasingly sophisticated and intricate ways of capturing data about customer behaviour and offering targeted products based on that behaviour.&lt;br /&gt;&lt;br /&gt;Big government has been somewhat late to the game but is making big strides in the field of data mining. Increasingly, areas like law enforcement, counter-terrorism, anti-money laundering and the IRS have leveraged cutting-edge techniques to become more effective at what they do, which is usually to detect the needle of criminal activity in the haystack of normal law-abiding activity and take the appropriate preventive or retributory action.&lt;br /&gt;&lt;br /&gt;But as the saying goes, there are two sides to every coin. While the explosion of data and its analysis has mostly been driven by good intentions, the consequences of some of this work are beginning to look increasingly murky. For example, if there is surveillance of an individual's emails to identify money laundering trails, where is the bright line between legitimate monitoring of criminal activity and unwanted intrusion into the activities of law-abiding citizens? The defense from those who do the monitoring has always been that only suspicious activities are targeted, and that they use sophisticated analytics to model these criminal activities. But as any model builder worth his salt knows, an effective model is one that maximizes true positives AND minimizes false positives. The false positives in this case are individuals who display similar so-called 'suspicious' behaviour but turn out to be innocent. 
How then can one build an effective model by being very selective about the data points used to build it (i.e. by only including behaviour that is understood to be suspicious)? In order to truly understand the false positives and attempt to reduce them, one HAS to include points in the model-build sample that are very likely to be false positives. And therein lies the paradox. To build a really good predictive system, the sample needs to be randomized to include good and bad outcomes, highly suspicious and borderline innocent behaviours.&lt;br /&gt;&lt;br /&gt;I want to share two different perspectives on this issue. The first is a piece from the MIT Technology Review that extols the virtues of a data-driven law enforcement system, as practiced by the police department of Memphis, TN. The link is &lt;a href="http://www.technologyreview.com/business/26887/"&gt;&lt;u&gt;here&lt;/u&gt;&lt;/a&gt;. An excerpt from this article:&lt;br /&gt;&lt;div style="font-family: Georgia,&amp;quot;Times New Roman&amp;quot;,serif;"&gt;&lt;span style="font-size: small;"&gt;&lt;i&gt;The predictive software, which is called Blue CRUSH (for "criminal reduction utilizing statistical history"), works by crunching crime and arrest data, then combining it with weather forecasts, economic indicators, and information on events such as paydays and concerts. The result is a series of crime patterns that indicate when and where trouble may be on the way. "It opens your eyes within the precinct," says Godwin. "You can literally know where to put officers on a street in a given time." 
The city's crime rate has dropped 30 percent since the department began using the software in 2005.&lt;/i&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="font-family: Georgia,&amp;quot;Times New Roman&amp;quot;,serif;"&gt;&lt;span style="font-size: small;"&gt;&lt;i&gt;&lt;br /&gt;&lt;/i&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="font-family: Georgia,&amp;quot;Times New Roman&amp;quot;,serif;"&gt;&lt;span style="font-size: small;"&gt;&lt;i&gt;Memphis is one of a small but growing number of U.S. and U.K. police units that are turning to crime analytics software from IBM, SAS Institute, and other vendors. So far, they are reporting similar results. In Richmond, Virginia, the homicide rate dropped 32 percent in one year after the city installed its software in 2006.&lt;/i&gt;&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;b&gt;Now read this other piece, painting a slightly different picture on what is going on.&lt;/b&gt;&lt;br /&gt;&lt;div style="font-family: Georgia,&amp;quot;Times New Roman&amp;quot;,serif;"&gt;&lt;span style="font-size: small;"&gt;&lt;i&gt;Suspicious Activity Report N03821 says a local law enforcement officer observed "a suspicious subject . . . taking photographs of the Orange County Sheriff Department Fire Boat and the Balboa Ferry with a cellular phone camera." ... noted that the subject next made a phone call, walked to his car and returned five minutes later to take more pictures. He was then met by another person, both of whom stood and "observed the boat traffic in the harbor." 
Next another adult with two small children joined them, and then they all boarded the ferry and crossed the channel.&lt;/i&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="font-family: Georgia,&amp;quot;Times New Roman&amp;quot;,serif;"&gt;&lt;span style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="font-family: Georgia,&amp;quot;Times New Roman&amp;quot;,serif;"&gt;&lt;span style="font-size: small;"&gt;&lt;i&gt;All of this information was forwarded to the Los Angeles fusion center for further investigation after the local officer ran information about the vehicle and its owner through several crime databases and found nothing ... there are several paths a suspicious activity report can take:&lt;/i&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="font-family: Georgia,&amp;quot;Times New Roman&amp;quot;,serif;"&gt;&lt;span style="font-size: small;"&gt;&lt;i&gt;The FBI could collect more information, find no connection to terrorism and mark the file closed, though leaving it in the database.&lt;/i&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="font-family: Georgia,&amp;quot;Times New Roman&amp;quot;,serif;"&gt;&lt;span style="font-size: small;"&gt;&lt;i&gt;It could find a possible connection and turn it into a full-fledged case.&lt;/i&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="font-family: Georgia,&amp;quot;Times New Roman&amp;quot;,serif;"&gt;&lt;span style="font-size: small;"&gt;&lt;i&gt;Or, as most often happens, it could make no specific determination, which would mean that Suspicious Activity Report N03821 would sit in limbo for as long as five years, during which time many other pieces of information could be added to the file ... 
employment, financial and residential histories; multiple phone numbers; audio files; video from the dashboard-mounted camera in the police cruiser at the harbor where he took pictures; anything else in government or commercial databases "that adds value".&lt;/i&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="font-family: Georgia,&amp;quot;Times New Roman&amp;quot;,serif;"&gt;&lt;span style="font-size: small;"&gt;&lt;i&gt;&lt;br /&gt;&lt;/i&gt;&lt;/span&gt;&lt;/div&gt;This is from an insightful piece in the Washington Post titled "&lt;a href="http://projects.washingtonpost.com/top-secret-america/articles/monitoring-america/"&gt;&lt;u&gt;Monitoring America&lt;/u&gt;&lt;/a&gt;". The Post Article goes on to describe the very same Memphis PD and asks some pointed questions on some of the data gathering techniques used.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;This is where this whole concept of capturing information at the individual level and using it for specific targeting enters unstable ground - when this takes place in an intrusive manner and without due consent from the individuals. When organizations do it, it can be definitely irritating and border-line creepy. When governments do it, it reminds one of George Orwell's "Big Brother". 
It will be interesting to see how the field of predictive analytics survives the privacy backlash which is just beginning.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-3220781504675676895?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/3220781504675676895/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=3220781504675676895' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/3220781504675676895'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/3220781504675676895'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2010/12/data-mining-and-inevitable-conflict.html' title='Data mining and the inevitable conflict with privacy issues'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-3296690374559607517</id><published>2010-12-18T22:10:00.000-05:00</published><updated>2010-12-18T22:10:30.068-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Data visualization'/><category scheme='http://www.blogger.com/atom/ns#' term='Exploratory Data Analysis'/><title type='text'>Visualization of the data and animation - part II</title><content type='html'>I had written a &lt;a 
href="http://stat-exchange.blogspot.com/2010/11/animating-data-and-better-story-telling.html"&gt;&lt;u&gt;piece&lt;/u&gt;&lt;/a&gt; earlier about Hans Rosling's animation of country-level data using the Gapminder tool. Here are some more extremely cool examples of data animation.&lt;br /&gt;&lt;br /&gt;At the start of this series, there is more animation from the Joy Of Stats program that Rosling hosted on the BBC. The &lt;a href="http://www.open.ac.uk/openlearn/society/the-law/criminology/the-joy-stats-why-you-might-go-hill-come-down-crime-victim"&gt;&lt;u&gt;landing page&lt;/u&gt;&lt;/a&gt; shows the plotting of crime data in downtown San Francisco and how this visual overlay on the city topography provides some valuable insights on where one might expect to find crime. This is a valuable tool for police departments (to try and prevent crime that is local to an area and has some element of predictability), residents (to research neighbourhoods before they buy property, for example) and tourists (who might want to double-check a part of the city before deciding on a really attractive Priceline.com hotel deal). The researchers in the clip, who created &lt;a href="http://sanfrancisco.crimespotting.org/"&gt;&lt;u&gt;this tool&lt;/u&gt;&lt;/a&gt; for overlaying crime data on maps, talk about how tools such as this can be used to improve citizen power and government accountability. Another good example of crime data, this time reported by police departments across the US, can be found &lt;a href="http://www.crimemapping.com/map/ca/sanfrancisco"&gt;&lt;u&gt;here&lt;/u&gt;&lt;/a&gt;. Finally, towards the end of the clip, the researchers go on to mention what could be the Holy Grail of this kind of visualization. They talk about how real-time data posted on social media and networking sites like Facebook and Twitter (geo-tagged perhaps) could provide a real-time feed into these maps. 
This would certainly have been in the realm of science fiction only a few years back, but now it doesn't seem so impossible.&lt;br /&gt;&lt;br /&gt;The San Francisco crime mapping link has a few other really impressive videos as you scroll further down. I really like the &lt;a href="http://www.open.ac.uk/openlearn/body-mind/health/health-sciences/the-joy-stats-the-lady-data-visualisation"&gt;&lt;u&gt;one of Florence Nightingale&lt;/u&gt;&lt;/a&gt;, whose graphs during the Crimean war helped reveal important insights on how injuries and deaths were occurring in hospitals. It is interesting to know that the Lady with the Lamp was not just renowned for tending to the sick, but was also a keen student of statistics. Her graphs of deaths that were accidental, those caused by war injuries and wounds, and finally those that were preventable (caused by the poor hygiene that was quite prevalent at the time) created very powerful imagery of the high incidence of preventable deaths and the need to address this area with the right focus.&lt;br /&gt;&lt;br /&gt;&lt;u&gt;&lt;b&gt;Why are visualization and animation of data so helpful, and such critical tools in the arsenal of any serious data scientist?&lt;/b&gt;&lt;/u&gt; For a few reasons.&lt;br /&gt;&lt;br /&gt;For one, it helps tell a story way better than equations or tables of data do. That is essential for conveying the message to people who are not necessarily experts able to read the tables, but who are nevertheless important influencers and stakeholders and need to be educated on the subject. 
Think of it as how an advertisement (either picture or moving image) is more powerful in conveying the strength of a brand as compared to boring old text.&lt;br /&gt;The other reason, in my opinion, is that graphical depiction and visualization of the data allows the powerful human brain (which is far more powerful than any computer at pattern recognition) to take over the part of data analysis that the human brain is really good at and computers generally not so good at. This is forming hypotheses on-the-fly about the data being displayed and reaching conclusions based on visual patterns in the data. Also the ability to hook into remote memory banks within our brains and form linkages. While Machine Learning and AI are admirable goals, there is still some way to go before computers can match the sheer ingenuity and flexibility of thought that the human brain possesses.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-3296690374559607517?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/3296690374559607517/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=3296690374559607517' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/3296690374559607517'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/3296690374559607517'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2010/12/visualization-of-data-and-animation.html' title='Visualization of the data and animation - part II'/><author><name>Krish 
Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-3126918276986326549</id><published>2010-12-12T16:06:00.001-05:00</published><updated>2010-12-12T16:07:27.244-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Decision Making'/><category scheme='http://www.blogger.com/atom/ns#' term='Data Mining'/><title type='text'>Thinking statistically – and why that’s so difficult</title><content type='html'>I came across &lt;a href="http://www.wired.com/magazine/2010/04/st_thompson_statistics/"&gt;&lt;u&gt;this piece&lt;/u&gt;&lt;/a&gt; from a few months back by the Wired magazine writer Clive Thompson on “Why we should learn the language of data”. The article is one amongst a stream of recent articles in the popular media on how data-driven applications are changing our world. The New York Times has had quite a few pieces on this topic recently.&lt;br /&gt;&lt;br /&gt;Clive Thompson calls out how the language of data and statistics is going to be transformational for the world going forward, and how it needs to be a core part of general education. Thompson also calls out why thinking about data trends or statistics is hard. It is hard because it is not something that the intuitive wiring in the human brain readily recognizes or appreciates. The human psyche with its fight-or-flight instincts reacts to big, dramatic events well and to subtle trends badly. 
We are not fundamentally good at a number of things that good decision making calls for, such as being open to both supporting and refuting evidence, not confusing correlation with causality, factoring in uncertainty and estimating rare events.&lt;br /&gt;&lt;br /&gt;Most of the applications where a data-driven insight has changed the world in any meaningful way have been driven by private enterprise. These changes have also been somewhat incremental in nature. Of course, they have allowed companies to recommend movies to interested subscribers, position goods in stores more effectively, distribute at lower cost, price tickets so as to ensure maximum returns and so on. In other words, these changes may have been game changing for specific industries but not necessarily for the human race at large. &lt;br /&gt;&lt;br /&gt;Numbers can have greater power than just impacting a few industries at a time, one would think. Given the sheer amount of data that is being produced in the world today and the rate at which both computing power and bandwidth continue to grow, we ought to have seen a much more wide-ranging impact from data-driven analysis. We should have been firmly down the road to making progress on combating global warming and diseases like heart disease, diabetes and cancer. Government agencies, which are a really big part of the modern economy, have not been as successful at driving this form of data-driven innovation. Why is that?&lt;br /&gt;&lt;br /&gt;This probably has to do with a fundamental lack of understanding of numbers and statistics amongst the population at large. The places in the world where a lot of the data gathering and processing is happening, i.e. the Western world, are also the places where an education in science and math is somewhat undervalued in relation to studies like liberal arts, media, legal studies, etc. That is where the emerging economies of the world have an edge. 
The study of math, science and engineering has always been appropriately valued in countries like India, China and other emerging Asian giants. Now as these countries also begin to generate, process and store data, the math and science educated talent will be champing at the bit to get into the data and harness its potential. Data has rightly been called another factor of production, like labour, capital and land. It is an irony of the world today that those who have data within easy reach are less inclined to use it.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-3126918276986326549?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/3126918276986326549/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=3126918276986326549' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/3126918276986326549'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/3126918276986326549'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2010/12/thinking-statistically-and-why-thats-so.html' title='Thinking statistically – and why that’s so difficult'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-1890384748992156559</id><published>2010-12-10T13:44:00.001-05:00</published><updated>2010-12-10T13:45:29.563-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Systems Modeling'/><category scheme='http://www.blogger.com/atom/ns#' term='Predictive Analytics'/><category scheme='http://www.blogger.com/atom/ns#' term='Optimization'/><title type='text'>Swarm Intelligence, Ant Colony Optimizations – advances in analytic computing</title><content type='html'>Advances in computing have led to some new and interesting developments in modeling techniques. This post gives some examples of these techniques. But before that, a small primer on basic modeling techniques. Most of the more commonly used models are generalized linear models. As the name suggests, these models try to establish a more-or-less linear relationship between the quantity being predicted and the inputs. Ultimately the model fit problem is an optimization problem – an attempt to use a generalized curve to represent the data and, while doing so, minimize the gap between the actual data and the approximate representation of the data produced by the model.&lt;br /&gt;&lt;br /&gt;Of course, optimization problems present themselves in a number of areas. One is model fitting, but other applications are in areas like planning and logistics – an example being the ever-popular traveling salesman problem. One of the more recent and interesting techniques for solving optimization problems is called &lt;a href="http://en.wikipedia.org/wiki/Ant_colony_optimization"&gt;&lt;u&gt;Ant Colony Optimization&lt;/u&gt;&lt;/a&gt; (ACO). ACO is part of a family of more generic AI/machine learning tools called swarm intelligence. 
Wikipedia defines swarm intelligence as follows&lt;br /&gt;&lt;div style="font-family: &amp;quot;Helvetica Neue&amp;quot;,Arial,Helvetica,sans-serif;"&gt;&lt;span style="font-size: small;"&gt;&lt;i&gt;Swarm intelligence (SI) is the collective behaviour of decentralized, self-organized systems, natural or artificial…. SI systems are typically made up of a population of simple agents or bodies interacting locally with one another and with their environment. The agents follow very simple rules, and although there is no centralized control structure dictating how individual agents should behave, local, and to a certain degree random, interactions between such agents lead to the emergence of "intelligent" global behavior, unknown to the individual agents.&lt;/i&gt;&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;The ACO algorithm tries to mimic the behaviour of ants in search of food. When ants forage for food, every ant involved in the foraging process moves out of the colony in random ways to search for food. When a food source is located, the ant uses the scent trail of its own pheromones to bring the food back to the colony. Other ants begin to then use the trail left behind by the first ant to make further excursions to the food source and bring back food. Also, by the very nature of the pheromone trail (which is a volatile chemical and therefore evaporates after a certain point in time), the tendency of later ants is to follow more recent and fresher trails, which should also be the shortest ones logically speaking.&lt;br /&gt;&lt;br /&gt;One of the more interesting business applications has been indeed in the area of material movement, i.e. logistics. The Italian pasta maker Barilla as well as Migros, the Swiss supermarket chain have been using these techniques to optimize their distribution networks and routes. 
A paper about this technique is available &lt;a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.70.1052&amp;amp;rep=rep1&amp;amp;type=pdf"&gt;&lt;u&gt;here&lt;/u&gt;&lt;/a&gt;. It is more technical. A more layman-friendly treatment of the technique appeared recently in the Economist and was also an interesting &lt;a href="http://www.economist.com/node/16789226"&gt;&lt;u&gt;read&lt;/u&gt;&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-1890384748992156559?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/1890384748992156559/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=1890384748992156559' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/1890384748992156559'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/1890384748992156559'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2010/12/swarm-intelligence-ant-colony.html' title='Swarm Intelligence, Ant Colony Optimizations – advances in analytic computing'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-5193209454285015404</id><published>2010-11-30T04:52:00.000-05:00</published><updated>2010-11-30T04:52:03.923-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Data 
visualization'/><category scheme='http://www.blogger.com/atom/ns#' term='Modeling'/><title type='text'>Animating the data and better "story telling"</title><content type='html'>One of the challenges with talking about and presenting any analysis about data mining or statistics is that a lay audience is seldom excited by the same things as a more technical audience. A technical audience is as interested in how the answer was reached as in the answer itself. A non-technical consumer of the same information is probably interested in the implications of the answer as well as the answer itself, with some gut-check to make sure that the process wasn't totally crazy. In other words, they are looking for a story.&lt;br /&gt;&lt;br /&gt;Recent trends around the pervasiveness of data and data-driven applications have meant that there is a greater ask from data scientists to tell a compelling "story" to support their analysis. Data scientists need to come up with ways of telling the story behind the data and the projections of the model that used the data as input: ways that generate insight, skip unnecessary detail and paint the various facets of the final solution, rather than just stating the final answer. Data animation and data visualization are some of the answers here.&lt;br /&gt;&lt;br /&gt;I came across a couple of good examples of such animation recently. Hans Rosling made an interesting presentation about a tool called Gapminder at TED.com. The presentation is &lt;a href="http://blog.revolutionanalytics.com/2009/04/gapminder-animating-the-worlds-data.html"&gt;&lt;u&gt;here&lt;/u&gt;&lt;/a&gt;. &lt;a href="http://www.gapminder.org/"&gt;&lt;u&gt;Gapminder&lt;/u&gt;&lt;/a&gt; is an organization that makes social, environmental and economic development data from all the countries of the world available and accessible to all, for free. 
The visualization tool at Gapminder called Gapminder World shows ways in which this data can be animated and made to come alive for the non-technical consumer in illuminating and exciting ways. &lt;br /&gt;&lt;br /&gt;Rosling recently appeared in a promotional trailer for a BBC 4 program called "The Joy Of Stats". &lt;object height="385" width="450"&gt;&lt;param name="movie" value="http://www.youtube.com/v/jbkSRLYSojo?fs=1&amp;amp;hl=en_US"&gt;&lt;/param&gt;&lt;param name="allowFullScreen" value="true"&gt;&lt;/param&gt;&lt;param name="allowscriptaccess" value="always"&gt;&lt;/param&gt;&lt;embed src="http://www.youtube.com/v/jbkSRLYSojo?fs=1&amp;amp;hl=en_US" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="450" height="385"&gt;&lt;/embed&gt;&lt;/object&gt; (Link is &lt;a href="http://www.youtube.com/watch?v=jbkSRLYSojo"&gt;here&lt;/a&gt; if the embed doesn't work.) One hopes that this program airs sometime in the US. It is due to air on Dec 7 and 8 in the UK. Any UK readers of the blog are encouraged to check out the program and share what they thought of it. 
The content of the program (Link and  timings &lt;a href="http://www.bbc.co.uk/programmes/b00wgq0l"&gt;&lt;u&gt;here&lt;/u&gt;&lt;/a&gt;) sounds  interesting enough for me to at least contemplate taking a flight to  London and catching the program on the Beeb.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-5193209454285015404?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/5193209454285015404/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=5193209454285015404' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/5193209454285015404'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/5193209454285015404'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2010/11/animating-data-and-better-story-telling.html' title='Animating the data and better &quot;story telling&quot;'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-277364349023358835</id><published>2010-10-23T10:28:00.002-04:00</published><updated>2010-10-23T10:30:30.205-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='credit downturn'/><category scheme='http://www.blogger.com/atom/ns#' term='Macroeconomics'/><title type='text'>The quintessential Greek (financial) tragedy</title><content type='html'>For the 
last 6 months, the travails of highly indebted countries in the European Union, and Greece in particular, have been the source of considerable turmoil in the financial system. In addition to increasing the cost of borrowing for Greece and other countries in a similar situation like Spain, Portugal, Ireland and Italy, the other impact has been to increase the overall systemic risk and push the world economy back into the depths of 2008.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;The Greece story is particularly fascinating. Unlike other countries where banks made ruinous bets and had their capital wiped out, impacting lending and slowing down economic activity, the banks had no role to play in Greece. Instead it was the systemic lack of fiscal discipline, lack of enforcement of basic property and taxation principles and a proliferation of special interests that has put Greece on a slippery slope to sovereign bankruptcy and default. The inimitable Michael Lewis has written a highly entertaining but also illuminating piece on why this came to happen. Link is &lt;a href="http://www.vanityfair.com/business/features/2010/10/greeks-bearing-bonds-201010"&gt;&lt;u&gt;here&lt;/u&gt;&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Now what is very interesting and scary is that many of the ills mentioned here are present in many countries around the world. Talk about the aversion towards taxes, the large-scale tax evasion, the rampant bribery and corruption in government circles. Seems scarily familiar to people from India and other developing countries. 
What do you think is causing Greece to fail in such spectacular fashion? (Well, if it hasn't already failed, this article should convince you to "short" Greek debt.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-277364349023358835?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/277364349023358835/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=277364349023358835' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/277364349023358835'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/277364349023358835'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2010/10/quintessential-greek-financial-tragedy.html' title='The quintessential Greek (financial) tragedy'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-7510574088223171805</id><published>2010-10-19T05:23:00.004-04:00</published><updated>2010-10-19T05:46:48.821-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Wall Street'/><category scheme='http://www.blogger.com/atom/ns#' term='credit downturn'/><title type='text'>Sensational ... 
but true</title><content type='html'>&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_nGE-rKiiGEc/TL1oXPHvP0I/AAAAAAAAClk/1aXPNcfuGJk/s1600/jeffc0_sewer.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;/a&gt;&lt;/div&gt;&lt;table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; text-align: left;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;img border="0" height="200" src="http://4.bp.blogspot.com/_nGE-rKiiGEc/TL1oXPHvP0I/AAAAAAAAClk/1aXPNcfuGJk/s200/jeffc0_sewer.jpg" style="margin-left: auto; margin-right: auto;" width="200" /&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;Photograph by Terry McCombs&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;My writings in this space are usually somewhat academic and, whenever I stray from highly academic topics, what I think is an objective viewpoint. A recent article from the Rolling Stone magazine, though, caught my eye. The writer talks about the systematic way in which Wall Street banks, touting their financial engineering expertise, have expanded their influence into small governments in America. The banks have peddled products that have invariably resulted in a huge amount of financial distress for these agencies. See link &lt;a href="http://www.rollingstone.com/politics/news/12697/64833"&gt;&lt;u&gt;here&lt;/u&gt;&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The phenomenon of bankers behaving badly with small governments or for that matter even with larger ones (state pension funds) is nothing new. The 1994 bankrupting of Orange County by Robert Citron (Citron? for Orange County?), the county treasurer, is a well-known story. See a really good article about this failure &lt;a href="http://www.erisk.com/learning/casestudies/orangecounty.asp"&gt;&lt;u&gt;here&lt;/u&gt;&lt;/a&gt;. 
Another notable example from the same period is Procter &amp;amp; Gamble's dalliance with derivatives that resulted in a lot of grief for themselves as well as for Bankers Trust, their investment banking advisers. The human psyche seems particularly frail and susceptible to smooth-talking operators, talking interesting numbers and displaying other forms of spreadsheet gadgetry, and promising the moon in return for money. In addition to the banks' rapacity, the people at the customer end - who sought to invest in little-understood financial instruments, where the risk of the counterparty is bounded but your own downside is unlimited - are as much to blame. Not for the lack of financial savvy, but for getting into a situation where such financial gimmickry needed to be resorted to in the first place.&lt;br /&gt;&lt;br /&gt;The primary problem here is the particular tendency of small government bodies to be reckless about spending during good times. In an attempt to do something big and important for their constituents (ascribing the best motives), governmental bodies take on big projects when the economic cycle is positive and tax revenues are abundant. They take on big loans which need servicing even when things go bad - when tax revenues decline or interest rates rise or whatever. And then these agencies find themselves strapped for money and start to resort to financial gimmickry. And fall into the arms of the Wall Street firms.&lt;br /&gt;&lt;br /&gt;Reminds me of the famous Roald Dahl story about the old man who has a priceless painting tattooed on his back. And who goes away with a smooth-talking stranger who promises to keep him happy for the rest of his life, if only the old man will display his painting to the stranger's guests at his hotel. 
Needless to say, within a few weeks the painting appears &lt;i&gt;sans&lt;/i&gt; the old man in a famous art gallery.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-7510574088223171805?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/7510574088223171805/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=7510574088223171805' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/7510574088223171805'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/7510574088223171805'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2010/10/sensational-but-true.html' title='Sensational ... 
but true'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_nGE-rKiiGEc/TL1oXPHvP0I/AAAAAAAAClk/1aXPNcfuGJk/s72-c/jeffc0_sewer.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-9112147204688857179</id><published>2010-09-28T06:26:00.000-04:00</published><updated>2010-09-28T06:26:09.635-04:00</updated><title type='text'>Facebook's "revenue model"</title><content type='html'>A really &lt;a href="http://www.businessweek.com/magazine/content/10_40/b4197064860826.htm"&gt;&lt;u&gt;interesting article&lt;/u&gt;&lt;/a&gt; from Businessweek on the purported revenue model behind Facebook. One of the more insightful articles I have read recently.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-9112147204688857179?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/9112147204688857179/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=9112147204688857179' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/9112147204688857179'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/9112147204688857179'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2010/09/facebooks-revenue-model.html' title='Facebook&apos;s &quot;revenue 
model&quot;'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-2853846936058312088</id><published>2010-09-28T06:20:00.000-04:00</published><updated>2010-09-28T06:20:13.209-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Statistics'/><category scheme='http://www.blogger.com/atom/ns#' term='Modeling'/><title type='text'>Statistical models and ivory towers</title><content type='html'>Increasingly, business analytics and the use of advanced statistics in business decision making are surging. Companies from WalMart to FedEx to Netflix have demonstrated how they can build a sustainable business model on the foundations of good data and solid analysis of that data. To analyze the data, people with the right statistical background and training as well as the required business acumen are critical. In other words, smart people. And this is usually one of the barriers to organizations making the transition from the pre-analytics to the post-analytics space.&lt;br /&gt;&lt;br /&gt;Every so often, the people who push analytics within an organization come from an angle of intellectual superiority. "I can do math better than you and therefore I am right and better than you" is the mindset that many such practitioners bring to the field. This often results in resistance, and sometimes downright hostility, from the rest of the organization to what the "statistical ones" are recommending. Statistical practitioners often end up plowing a lonely furrow in organizations. And then one day, when the implicit sponsorship that got them into that position goes away, the statistical modelers follow it out of the organization. 
The feeling when they leave is one of profound disappointment and disillusionment on the side of the modelers, and profound relief and also some good old schadenfreude on the side of the old organization hands. How can this situation be averted? How can people who are obviously so intelligent and well-educated avoid making fools of themselves because they failed to fit into an organization?&lt;br /&gt;&lt;br /&gt;A few pieces of advice:&lt;br /&gt;&lt;b&gt;1. Be there to "solve the problem" vs "showcase your smarts"&lt;/b&gt;&lt;br /&gt;It is important to keep in mind why smart people are hired by organizations. It is usually to solve some business problem or the other. It is not because the organization suddenly discovered that it needed show ponies to come out and parade their smarts. So the first piece of advice to the smart ones is to focus on fixing organizational challenges, i.e. focus on what they were hired for and build their credibility. Once credibility is built up, it becomes far easier to take on work that is more intellectually stimulating and challenging.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;2. Simplify your communication around the solution&lt;/b&gt;&lt;br /&gt;Smart people often have the ability to get into really deep thinking about the work that they are involved in. Deep thinking indeed is required to fix many of the more difficult problems that companies and society are faced with. However, deep thinking around communications is counter-productive. Human beings are simple creatures and usually favour clean narratives over complex ones. Keep the communication around the solution simple and crisp - you may need to get rid of some of the fancy footwork to get there but the trade-off is usually worth it.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;3. Be open to idea "give and take"&lt;/b&gt;&lt;br /&gt;Finally, approach idea sharing with positive intent. Ideas usually get better when they are critiqued by other people. 
Valuable perspectives come to light and weaknesses unrecognized by the idea's creator are called out. Smart people tend to have a bias towards thinking "my way or the highway". This not only prevents ideas from realizing their full potential but also destroys the buy-in that is required from stakeholders. Buy-in is the oxygen that ideas need to survive and grow, and developing the political savvy to get that buy-in is always critical.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-2853846936058312088?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/2853846936058312088/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=2853846936058312088' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/2853846936058312088'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/2853846936058312088'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2010/09/statistical-models-and-ivory-towers.html' title='Statistical models and ivory towers'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-2793852862979674024</id><published>2010-08-26T05:44:00.000-04:00</published><updated>2010-08-26T05:44:22.151-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Great recession'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Decision Making'/><category scheme='http://www.blogger.com/atom/ns#' term='statistical inference'/><category scheme='http://www.blogger.com/atom/ns#' term='Modeling'/><title type='text'>The Judgment Deficit - a real-wordliness deficit</title><content type='html'>I usually don't use my blog to take on or pick apart published pieces - my aim with the blog is to present a diversity of ideas and viewpoints to the reader. There is plenty of intelligent writing on the Web that is thought-provoking and worth bringing to the attention of readers interested in the general ideas of statistics and machine learning. But I came across this piece recently that - I have to admit - caused a fair amount of angst and therefore an urge to act. This was the &lt;a href="http://hbr.org/product/the-judgment-deficit/an/R1009B-HCB-ENG?Ns=publication_date%7C1&amp;amp;Ntt=finance"&gt;&lt;u&gt;Judgment Deficit&lt;/u&gt;&lt;/a&gt; by Amar Bhide, a professor of Finance at Tufts University.&lt;br /&gt;&lt;br /&gt;The journal article from HBR talks about how machines or computers can make decisions in certain types of situations while human judgment needs to come in at other places. Fair enough. The article then bemoans the recent Great Recession and lays part of the blame on statistical models used in Finance. Specifically, the author says:&lt;br /&gt;&lt;i&gt;In recent times,  though, a new form of centralized control has taken root: mechanistic  decision making based on top-down statistical models and algorithms.  This has been especially true in finance, where risk models have  replaced the judgments of thousands of individual bankers and investors,  to disastrous effect.&lt;/i&gt;&lt;br /&gt;This kind of thinking is not only delusional but also dangerous. 
(Another part of the article that didn't necessarily get me singing from the rooftops was the lengthy encomium heaped on the economics of Friedrich Hayek, the libertarian economist and a leading figure of the famous Austrian School of economics. I am still not clear how that is related to the topic at hand.)&lt;br /&gt;&lt;br /&gt;The fundamental reason why banks took the risks they took was that there were incentives to do so and there was not enough of an appreciation of the downside. Bankers thought the spiral of rising home prices, along with the ability to take the assets off balance sheet and maintain minimal capital reserves, was an unending one, and they either failed to spot the inevitable edge of the cliff or were too late to pull back once they spotted it. Also the desire to have these activities as unregulated as possible (to allow free pursuit of profit, or to make 'markets efficient' as Wall Street would argue) led to a number of opacities (about risk) developing in the system, which led to situations where high schools in Norway were exposed to the collapse of Bear Stearns. So let's not put the blame on top-down statistical models and algorithms. If the alternative that Bhide suggests, having manual underwriters take more of the decisions, had happened, I am not sure the conclusions reached by these underwriters would have been any different. Apart from a few economists, fund managers and people like Roubini and Taleb (who have made an image of themselves as Cassandras of Doom and therefore have to say anything to maintain that image), nobody - let me say that again - nobody saw this edifice collapsing. No one thought house prices in the US would ever come down. Everyone (human beings and computers alike) was a victim of the rear-view mirror bias, i.e. 
expecting that the  future would play out exactly as the past.&lt;br /&gt;&lt;br /&gt;So let's go a little bit easy on computers, statistical models and automated decision making.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-2793852862979674024?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/2793852862979674024/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=2793852862979674024' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/2793852862979674024'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/2793852862979674024'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2010/08/judgment-deficit-real-wordliness.html' title='The Judgment Deficit - a real-wordliness deficit'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-580785307634337783</id><published>2010-08-19T22:00:00.000-04:00</published><updated>2010-08-19T22:00:49.172-04:00</updated><title type='text'>Part 2/3 of disaster estimation - Understanding the expected monetary loss</title><content type='html'>In &lt;a href="http://stat-exchange.blogspot.com/2010/06/bp-oil-spill-and-disaster-estimations.html"&gt;&lt;u&gt;part 1&lt;/u&gt;&lt;/a&gt; and &lt;a 
href="http://stat-exchange.blogspot.com/2010/07/disaster-estimations-part-1b3.html"&gt;&lt;u&gt;part 1b&lt;/u&gt;&lt;/a&gt; of this series, we reviewed some of the ways in which disaster estimation modelers go about estimating the probability of occurrence of a catastrophic event. The next phase is the estimation of expected dollar losses when the catastrophe does take place. What would be some of the impacts on the economic activity within a region and how widespread would they be?&lt;br /&gt;&lt;br /&gt;This is where the 'sexiness' of model building techniques meets the harsh realities of extensive ground work and data gathering. When a disaster does occur, the biggest disruptions are usually to life and property. Then there are additional longer term impacts to the economic activity of the region and this is driven both directly by the damage to life and property and also indirectly by the impacts to business continuity and ultimately by the confidence that consumers and tradespeople alike continue to have about doing business in the region. Let's examine this one piece at a time.&lt;br /&gt;&lt;br /&gt;The disruptions to life and property can be estimated from the number of dwellings or business properties that are built specifically to resist the type of disaster event we are talking about. In the case of fires, it is the number of properties that are built to the right building and safety codes. This type of information requires some gathering but is publicly available from the property divisions of several counties. In the case of hurricanes, it can be the number of houses that were constructed after a certain year when stricter building codes started to be enforced. This type of data gathering is extremely effort intensive but is often the difference between a good approximate model and a really accurate model that can be used for insurance pricing decisions. 
In a competitive market like insurance, where many companies compete essentially on price, the ability to build accurate models is a powerful edge.&lt;br /&gt;&lt;br /&gt;The damage to life often has a very direct correlation to the amount of property damage. Also with the early warning systems in place ahead of disasters (except earthquakes, I suppose), it has become quite common for really large disasters like hurricanes not to result in any major loss of life. One significant exception was Hurricane Katrina, where more than a thousand people lost their lives in the Gulf Coast area and particularly in New Orleans.&lt;br /&gt;&lt;br /&gt;In the next article in the series, I will provide an overview of the reinsurance market, which is where a lot of this probabilistic modeling ultimately gets applied.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-580785307634337783?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/580785307634337783/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=580785307634337783' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/580785307634337783'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/580785307634337783'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2010/08/part-23-of-disaster-estimation.html' title='Part 2/3 of disaster estimation - Understanding the expected monetary loss'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-8306599812121816209</id><published>2010-07-17T00:17:00.001-04:00</published><updated>2010-07-17T00:18:51.593-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='credit downturn'/><category scheme='http://www.blogger.com/atom/ns#' term='Text Mining'/><category scheme='http://www.blogger.com/atom/ns#' term='Great recession'/><category scheme='http://www.blogger.com/atom/ns#' term='Economics'/><category scheme='http://www.blogger.com/atom/ns#' term='Predictive Analytics'/><title type='text'>Interesting links from Jul 17, 2010</title><content type='html'>&lt;div style="font-family: &amp;quot;Helvetica Neue&amp;quot;,Arial,Helvetica,sans-serif;"&gt;1. The overstated role of banking in the larger economy (Link &lt;a href="http://www.economist.com/node/16592286"&gt;&lt;u&gt;here&lt;/u&gt;&lt;/a&gt;)&lt;/div&gt;&lt;div style="font-family: &amp;quot;Helvetica Neue&amp;quot;,Arial,Helvetica,sans-serif;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="font-family: &amp;quot;Helvetica Neue&amp;quot;,Arial,Helvetica,sans-serif;"&gt;2. A very interesting article on the original monetary expansionist, John Law (Link &lt;a href="http://economist.com/blogs/buttonwood/2010/07/monetary_policy_asset_prices_and_wealth"&gt;&lt;u&gt;here&lt;/u&gt;&lt;/a&gt;)&lt;/div&gt;&lt;div style="font-family: &amp;quot;Helvetica Neue&amp;quot;,Arial,Helvetica,sans-serif;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="font-family: &amp;quot;Helvetica Neue&amp;quot;,Arial,Helvetica,sans-serif;"&gt;3. My latest area of passion, text mining and analytics. A blog entry from SAS. 
(Link &lt;a href="http://blogs.sas.com/inspire/index.php?/archives/10-Business-Analytics-101-Text-Analytics.html"&gt;&lt;u&gt;here&lt;/u&gt;&lt;/a&gt;)&lt;br /&gt;&lt;br /&gt;4. Commentary from Prof. Rajan on income inequality in the US and how it inevitably leads to a crisis. His analysis of how income inequality forces asset-price inflation is fascinating (Link &lt;a href="http://www.project-syndicate.org/commentary/rajan7/English"&gt;&lt;u&gt;here&lt;/u&gt;&lt;/a&gt;)&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-8306599812121816209?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/8306599812121816209/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=8306599812121816209' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/8306599812121816209'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/8306599812121816209'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2010/07/interesting-links-from-jul-17-2010.html' title='Interesting links from Jul 17, 2010'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-4720779193506438934</id><published>2010-07-13T05:15:00.009-04:00</published><updated>2010-07-13T05:40:53.996-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' 
term='Insurance'/><category scheme='http://www.blogger.com/atom/ns#' term='Risk Management'/><category scheme='http://www.blogger.com/atom/ns#' term='Probability'/><category scheme='http://www.blogger.com/atom/ns#' term='Disaster Estimation'/><category scheme='http://www.blogger.com/atom/ns#' term='Reinsurance'/><title type='text'>Disaster estimations - Part 1b/3 Understanding the probability of disaster</title><content type='html'>Part 1 of my post on modeling catastrophic risk covered measuring the probability that a risk event can occur. This probability can be derived from empirical evidence as well as from computer models of the destructive forces of nature. A good example of how such a model is built and used is outlined in a paper by Karen Clark, a renowned catastrophic risk modeler and insurer. The paper &lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_nGE-rKiiGEc/TDwzJcUYaLI/AAAAAAAACks/ujIyPht3SLY/s1600/table7.gif"&gt;&lt;img style="float: right; margin: 0pt 0pt 10px 10px; cursor: pointer; width: 320px; height: 171px;" src="http://4.bp.blogspot.com/_nGE-rKiiGEc/TDwzJcUYaLI/AAAAAAAACks/ujIyPht3SLY/s320/table7.gif" alt="" id="BLOGGER_PHOTO_ID_5493321882798680242" border="0" /&gt;&lt;/a&gt; was a seminal one when it came out as it outlined a scientific method by which such risks could be estimated. The paper is titled "A formal approach to catastrophe risk assessment and management" and the link is &lt;a href="http://www.signallake.com/innovation/Clark110986.pdf"&gt;&lt;u&gt;here&lt;/u&gt;&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The paper outlines an approach to estimate losses from hurricanes impacting the US Gulf Coast and the East Coast. The model has a probability assessment for hurricanes making landfall, developed using historical information (going back to about 1910) from the US Weather Service.  
While this is a great starting point and helps us get to a good estimate of at least a range of losses one can expect and therefore the insurance premiums one should expect to charge, there are important places where the model can be improved. One example is the cyclical nature of hurricane intensity over the last 100 years. Between 1950 and 1994, Atlantic hurricanes ran through a benign cycle. Hurricane activity and intensity (as measured by the number of named storms and the number of major hurricanes, respectively) have shown an increase since 1994, though. So a model relying on activity from the 1950-1994 period is likely to be off in its loss estimates by more than 20%. See the table for what I am talking about.&lt;br /&gt;&lt;br /&gt;How can a modeler correct for such errors in estimates? One way is to use the latest in scientific technology and modeling in estimating the probabilities. Developments in scientific understanding of phenomena such as hurricanes mean that it is now possible to build computer models that replicate the physics behind the hurricanes. The dynamic physical models incorporate some of the more recent understanding of world climatology, such as the link between Sea Surface Temperatures or SSTs and hurricane intensity. Using some of these models, researchers have been able to replicate the increase in hurricane intensity seen in the last fifteen years in a way that the empirical models built prior to this period have not been able to. The popular science book about global warming called &lt;a href="http://www.amazon.com/Storm-World-Hurricanes-Politics-Warming/dp/0151012873"&gt;&lt;u&gt;Storm World&lt;/u&gt;&lt;/a&gt; by Chris Mooney spells out these two different approaches to hurricane intensity estimation, and the conflicts between the chief protagonists of each of these approaches. 
Based on the recent evidence at least, the physics-based approach certainly appears to be tracking the rapid changes in hurricane intensity more closely. William Gray of Colorado State University, whose annual hurricane forecast has been lucky for many years, has been forced to re-fit his empirical model for the rapid increase in hurricane intensity post-1995.&lt;br /&gt;&lt;br /&gt;Finally, I leave you with another note about how some of the dynamic physical models work. This is from one of my favourite blogs, Jeff Masters' tropical weather blog. The latest &lt;a href="http://www.wunderground.com/blog/JeffMasters/comment.html?entrynum=1541"&gt;&lt;u&gt;entry&lt;/u&gt;&lt;/a&gt; talks precisely about such a dynamic physical model built by the UK Met Office. And I quote:&lt;br /&gt;&lt;br /&gt;&lt;i&gt;&lt;span class="small" id="lj1k"&gt;it is based on a promising new method--running a dynamical  computer model of the global atmosphere-ocean system. The CSU forecast  from Phil Klotzbach is based on statistical patterns of hurricane  activity observed from past years. These statistical techniques do not  work very well when the atmosphere behaves in ways it has not behaved in  the past. The UK Met Office forecast avoids this problem by using a  global computer forecast model--the GloSea model (short for GLObal  SEAsonal&lt;/span&gt;&lt;/i&gt;&lt;i&gt;&lt;span class="small" id="lj1k"&gt; model). GloSea is based on the HadGEM3 model--one of the  leading climate models used to formulate the influential UN  Intergovernmental Panel on Climate Change (IPCC) report. GloSea  subdivides the atmosphere into a 3-dimensional grid 0.86° in longitude,  0.56° in latitude (about 62 km), and up to 85 levels in the vertical.  This atmospheric model is coupled to an ocean model of even higher  resolution. 
The initial state of the atmosphere and ocean as of June 1,  2010 were fed into the model, and the mathematical equations governing  the motions of the atmosphere and ocean were solved at each grid point  every few minutes, progressing out in time until the end of November  (yes, this takes a colossal amount of computer power!) It's well-known  that slight errors in specifying the initial state of the atmosphere &lt;/span&gt;&lt;/i&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_nGE-rKiiGEc/TDwzj3M7XHI/AAAAAAAACk0/0syUKWufk28/s1600/glosea.gif"&gt;&lt;img style="float: right; margin: 0pt 0pt 10px 10px; cursor: pointer; width: 233px; height: 190px;" src="http://1.bp.blogspot.com/_nGE-rKiiGEc/TDwzj3M7XHI/AAAAAAAACk0/0syUKWufk28/s320/glosea.gif" alt="" id="BLOGGER_PHOTO_ID_5493322336691772530" border="0" /&gt;&lt;/a&gt;&lt;i&gt;&lt;span class="small" id="lj1k"&gt;can  cause large errors in the forecast. This "sensitivity to initial  conditions" is taken into account by making many model runs, each with a  slight variation in the starting conditions which reflect the  uncertainty in the initial state. This generates an "ensemble" of  forecasts and the final forecast is created by analyzing all the member  forecasts of this ensemble. Forty-two ensemble members were generated  for this year's UK Met Office forecast. The researchers counted how many  tropical storms formed during the six months the model ran to arrive at  their forecast of twenty named storms for the remainder of this  hurricane season. Of course, the exact timing and location of these  twenty storms are bound to differ from what the model predicts, since  one cannot make accurate forecasts of this nature so far in advance.&lt;br /&gt;&lt;br /&gt;The  grid used by GloSea is fine enough to see hurricanes form, but is too  coarse to properly handle important features of these storms. 
This lack  of resolution results in the model not generating the right number of  storms. This discrepancy is corrected by looking back at time for the  years 1989-2002, and coming up with correction factors (i.e., "fudge"  factors) that give a reasonable forecast.&lt;br /&gt;&lt;/span&gt;&lt;/i&gt;&lt;br /&gt;If you go to the web-page of the UK Met Office &lt;a href="http://www.metoffice.gov.uk/weather/tropicalcyclone/northatlantic.html"&gt;&lt;u&gt;hurricane forecast&lt;/u&gt;&lt;/a&gt;, you can find a link of interest to reinsurance companies. This link is to buy the hurricane forecast, which the UK Met Office has obviously gone to great pains to develop. Their brochure on how the insurance industry could benefit from this research makes for very interesting reading as well.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-4720779193506438934?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/4720779193506438934/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=4720779193506438934' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/4720779193506438934'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/4720779193506438934'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2010/07/disaster-estimations-part-1b3.html' title='Disaster estimations - Part 1b/3 Understanding the probability of disaster'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' 
height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_nGE-rKiiGEc/TDwzJcUYaLI/AAAAAAAACks/ujIyPht3SLY/s72-c/table7.gif' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-840477406018408780</id><published>2010-06-15T22:03:00.006-04:00</published><updated>2010-06-15T22:54:07.040-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Risk Management'/><category scheme='http://www.blogger.com/atom/ns#' term='Black Swans'/><category scheme='http://www.blogger.com/atom/ns#' term='Disaster Estimation'/><category scheme='http://www.blogger.com/atom/ns#' term='Fat-tailed probabilities'/><title type='text'>The BP oil spill and the disaster estimations - Part 1/3</title><content type='html'>The BP oil spill is already the biggest oil spill in the US and is on its way to becoming an unprecedented industrial disaster, given the environmental impact of millions of barrels of oil gushing into the Gulf of Mexico. Even the most hardened of carbon lovers cannot but be moved at the sight of the fragile wildlife in the Gulf literally soaking in the oil. The ecosystem of the Gulf states, which was already ravaged by unrestrained development and the odd super-cyclone, is now being struck a death blow by the spewing gusher.&lt;br /&gt;&lt;br /&gt;Could the specific chain of events leading up to this spill have been predicted? The answer is no. But that doesn't mean that the outcome could not have been anticipated. Given the technological complexity that some of the deep-sea oil drilling operations typically involve, there was always a measurable probability that one of the intermeshing systems and processes would give way and result in an oil well that was out of control. As Donald Rumsfeld, Secretary of Defense in the Bush II administration, put it, stuff happens. 
But where there has been an abject failure of human science and industrial technology is in underestimating the impact of this kind of event on a habitat and overestimating the power of technology to fix these kinds of problems.&lt;br /&gt;&lt;br /&gt;Fundamentally, the science of estimating the impact of disasters can be broken &lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_nGE-rKiiGEc/TBg73y_27mI/AAAAAAAACkc/DWtEkgLLWec/s1600/NA-BG445A_NUMBG_NS_20100611232002.jpg"&gt;&lt;img style="float: right; margin: 0pt 0pt 10px 10px; cursor: pointer; width: 200px; height: 383px;" src="http://4.bp.blogspot.com/_nGE-rKiiGEc/TBg73y_27mI/AAAAAAAACkc/DWtEkgLLWec/s320/NA-BG445A_NUMBG_NS_20100611232002.jpg" alt="" id="BLOGGER_PHOTO_ID_5483198376092036706" border="0" /&gt;&lt;/a&gt;down into three estimations:&lt;br /&gt;first, the probability that a failure occurs;&lt;br /&gt;second, the damage expected as a result of the failure;&lt;br /&gt;third (and probably a function of the second), our capability to fix the failure or mitigate its impact.&lt;br /&gt;&lt;br /&gt;In this post, I will discuss the first part of the problem - estimating the probability of failures occurring.&lt;br /&gt;&lt;br /&gt;There is a thriving industry and a branch of mathematics, known as Disaster Science, that works on the estimation of these extremely low probability events. The techniques that the disaster scientists or statisticians use are based on an understanding of the specific industry (nuclear reactors, oil drilling, aerospace, rocket launches, etc.) and are constantly refreshed with our increasing understanding of the physics, or science in general, underlying some of these endeavours. 
The nuclear-power industry's approach analyzes the engineering of the plant and tabulates every possible series of unfortunate events that could lead to the release of dangerous radioactive material, including equipment failure, operator error and extreme weather. Statisticians tabulate the probability of each disastrous scenario and add them together. Other industries, such as aviation, use more probability-based models given the hundreds of thousands of data points available on a weekly basis. Then there are more probabilistic approaches such as tail probability estimation or extreme event estimation, which use math involving heavy-tailed distributions to estimate the probability of such events occurring. Michael Lewis in his inimitable style talked about this in an old New York Times article called &lt;a href="http://www.nytimes.com/2007/08/26/magazine/26neworleans-t.html"&gt;&lt;u&gt;In Nature's Casino.&lt;/u&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;One variable that is a factor, and often the main contributing factor, in many such disasters is human error. Human error is extraordinarily difficult to model based on past behaviour alone, because a number of factors could confound such a read. For instance, as humans encounter fewer failures, our nature is to become less vigilant and therefore at greater risk of failing. Both lack of experience and too much experience (especially without having encountered failures) are risky. The quality of the human agent is another variable that has wide variability. At one time, NASA had the brightest engineers and scientists from our best universities join. Now, the brightest and the best go to Wall Street or other private firms and it is often the rejects or the products of second-rung universities that make it to NASA. 
This variable of human quality is difficult to quantify, or sometimes difficult to measure in a way that does not offend people on grounds like race, national origin, age and gender. Let us suppose that the brightest and the best joining NASA previously came from colleges or universities where admission standards required higher scores on standardized tests. Now we know that standardized test scores are correlated with the socio-economic levels of the test takers and hence with variables such as income, race, etc. So now if NASA goes to lower-rung colleges, does it mean that it was being more exclusive and discriminatory before (by taking in people with higher average scores) and is more inclusive now? And can we conclude that the drop in quality now is a direct function of becoming more inclusive on the admission criteria front? It is never easy to answer these questions or even tackle the question without feeling queasy about what one is likely to find while answering the question.&lt;br /&gt;&lt;br /&gt;Another variable, again related to the human factor, is the way we interact with technology. Is the human agent at ease with the technology confronting him or does he feel pressured and unsure from a decision making standpoint? I have driven stick-shift cars before and I have been more comfortable and at ease with the decision making around gear changes when the car-human interface was relatively simple and spartan. 
In my most recent car, as I  interact with multiple technology features such as the nav system, the  bluetooth enabled radio, the steering wheel, the paddle shifter, the  engine revs indicator, I find my attention diluted and I have seen that  the decision making around gear changes is not as precise as it used to  be.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-840477406018408780?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/840477406018408780/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=840477406018408780' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/840477406018408780'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/840477406018408780'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2010/06/bp-oil-spill-and-disaster-estimations.html' title='The BP oil spill and the disaster estimations - Part 1/3'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_nGE-rKiiGEc/TBg73y_27mI/AAAAAAAACkc/DWtEkgLLWec/s72-c/NA-BG445A_NUMBG_NS_20100611232002.jpg' height='72' 
width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-1773662639597010173</id><published>2010-06-03T22:47:00.000-04:00</published><updated>2010-06-03T22:48:54.628-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Uncertainty'/><category scheme='http://www.blogger.com/atom/ns#' term='Risk Management'/><category scheme='http://www.blogger.com/atom/ns#' term='Risk'/><title type='text'>On Knightian Uncertainty</title><content type='html'>An interesting post appeared recently attempting to distinguish between risk and uncertainty. The distinction was proposed by an economist called Frank Knight. Knight's theory is that risk is something where the outcome is unknown but the odds can be estimated. But when the odds become inestimable, risk turns to uncertainty. In other words, risk can be measured and uncertainty cannot.&lt;br /&gt;&lt;br /&gt;There are economists who argue that Knight's distinction only applies in theory. In the world of the casino, where the probability of a 21 turning up or the roulette ball landing on a certain number can be estimated, it is possible to have risk. But anything outside simple games of probability becomes uncertainty because it is difficult to measure the odds. The real world out there is so complex that it is indeed difficult to make even reasonably short term projections, let alone the really long term ones. So what is really the truth here? Does risk (as defined by Knight) even exist in the world today? Or, as recent world events (be it 9/11, the Great Recession, the threatened collapse of Greece, the oil spill in the Gulf of Mexico, the unpronounceable Icelandic volcano) have revealed, is it a mirage to try and estimate the probability of something playing out with anything remotely close to the kinds of odds we initially estimate?&lt;br /&gt;&lt;br /&gt;I have a couple of reactions. 
First, my view is that risk can be measured and outcomes predicted more or less accurately under some conditions in the real world. When forces are more or less in equilibrium, it is possible to have some semblance of predictability about political and economic events, and therefore an ability to measure the probability of outcomes. When forces disrupt that equilibrium (and the disruptions may come from the most improbable and unexpected causes), all bets are off. Everything we have learnt from the time when Knightian risk applied is no longer true and Knightian uncertainty takes over.&lt;br /&gt;&lt;br /&gt;Second, this points to the need for the risk management philosophy (as it is applied to a business context) to consider not only what the system knows and can observe but also the risks that the system doesn't even know exist out there. That's where good management practices, such as constantly reviewing positions, eliminating extreme concentrations (even if they appear to be value-creating concentrations) and constantly questioning the cognitive thinking, can lead to a set of guardrails that a business can stay within. Now these guardrails may be frowned upon and may even invite derision from those interested in growing the business during good times, as the nature of these guardrails is always going to be to try and avoid too much of a good thing. 
However, it is important for the practitioners of risk management to stay firm in their convictions and make sure the appropriate guardrails are implemented.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-1773662639597010173?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/1773662639597010173/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=1773662639597010173' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/1773662639597010173'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/1773662639597010173'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2010/06/on-knightian-uncertainty.html' title='On Knightian Uncertainty'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-9009372701191851822</id><published>2010-05-04T05:57:00.003-04:00</published><updated>2010-05-04T06:00:41.031-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Technology'/><category scheme='http://www.blogger.com/atom/ns#' term='Retailing'/><category scheme='http://www.blogger.com/atom/ns#' term='Data visualization'/><category scheme='http://www.blogger.com/atom/ns#' term='Computing Infrastructure'/><category scheme='http://www.blogger.com/atom/ns#' term='Data Mining'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Predictive Analytics'/><title type='text'>Interesting data mining links</title><content type='html'>1. The NY Times recently had a piece on how data is increasingly part of our life. Link &lt;a href="http://www.nytimes.com/2010/05/02/magazine/02self-measurement-t.html?ref=magazine"&gt;&lt;u&gt;here&lt;/u&gt;&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;2. The Web Coupon - a new way for retailers to know more about you. Link &lt;a href="http://www.nytimes.com/2010/04/17/business/media/17coupon.html?scp=1&amp;amp;sq=web%20coupons&amp;amp;st=cse"&gt;&lt;u&gt;here&lt;/u&gt;&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;3. On Principal Components Analysis. Link &lt;a href="http://abbottanalytics.blogspot.com/2010/02/prinicpal-components-for-modeling.html"&gt;&lt;u&gt;here&lt;/u&gt;&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-9009372701191851822?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/9009372701191851822/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=9009372701191851822' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/9009372701191851822'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/9009372701191851822'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2010/05/interesting-data-mining-links.html' title='Interesting data mining links'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-1004489357872996918</id><published>2010-05-01T09:20:00.001-04:00</published><updated>2010-05-01T09:24:34.078-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Technology'/><category scheme='http://www.blogger.com/atom/ns#' term='Computing Infrastructure'/><title type='text'>The future of publishing - and a new business model</title><content type='html'>The demise of an age-old business model and the emergence of a new one to take its place is always an exciting thing to watch - unless you are part of the age-old business model on its way to its demise. There are old assumptions challenged, changes in the way consumers consume, the emergence of a technology trigger, new financing patterns, new winners and losers. Fascinating to someone looking in from the outside.&lt;br /&gt;&lt;br /&gt;An industry that has pretty much been under attack since the coming of the Internet is the print and publishing business. But what threatened to be a slow roll of a snowball (obviously to be replaced with new ways of consuming and disseminating information) has taken the form of a rapidly growing avalanche now that digitized books and the digital book reader (the Kindle, predominantly) have become mainstream. As is to be expected, there are powerful players working to pull the rug from under the feet of the big publishing and media companies. First came Google, wanting to digitize every book ever published. Amazon then came with the Kindle, which cut printing costs out of the value chain and made books much more affordable for end-consumers. 
Of course, the elimination of the printing, warehousing and physical distribution processes would mean massive job cuts in the big publishing and printing houses, not to mention a necessary shrinking in the margins retained by the publisher from the printed price of the book.&lt;br /&gt;&lt;br /&gt;An interesting article in the New Yorker talks about the demise of publishing at the hands of the digital giants in more detail. Link &lt;a href="http://www.newyorker.com/reporting/2010/04/26/100426fa_fact_auletta"&gt;&lt;u&gt;here&lt;/u&gt;&lt;/a&gt;. Amazon, Apple and Google are the big digital players jockeying for position in this market. A few years back, Microsoft would have been a contender as well, but repeated failures to crack the consumer space (where MS does not have a monopolist advantage) have resulted in a little more circumspection.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-1004489357872996918?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/1004489357872996918/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=1004489357872996918' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/1004489357872996918'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/1004489357872996918'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2010/05/future-of-publishing-and-new-business.html' title='The future of publishing - and a new business model'/><author><name>Krish 
Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-8074615548261507295</id><published>2010-02-20T00:50:00.000-05:00</published><updated>2010-02-20T00:52:03.053-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Risk Management'/><category scheme='http://www.blogger.com/atom/ns#' term='Financial Regulation'/><title type='text'>Bank Regulation in the Canadian context - Part 2</title><content type='html'>To paraphrase from my previous post on the subject (link &lt;a href="http://stat-exchange.blogspot.com/2010/02/bank-regulation-in-canadian-context.html"&gt;&lt;u&gt;here&lt;/u&gt;&lt;/a&gt;), the stock prices of Canadian banks outperformed those of large American banks during two separate periods through the late 90s and the 2000s. One was a benign period from 1998 to 2005, and the other was the period from 2002 to 2009 (which culminated with the Great Recession), i.e. a combined good and bad period. However, Canadian banks faced tighter banking regulation than US banks all through this period. What worked in the Canadian example?&lt;br /&gt;&lt;br /&gt;Per the FT article, there were three factors involved. And extrapolating from these factors, my belief is that they translated to one important difference in the operating philosophy of Canadian banks vis-a-vis US banks, or for that matter, even the ones in the UK and continental Europe.&lt;br /&gt;- The first factor was &lt;b&gt;a simple regulatory framework.&lt;/b&gt; The US famously had an alphabet soup of regulatory agencies that were competing for banks' business. Canada by contrast had a very simple set up. 
One agency served as the central bank, responsible for the stability of the overall system; one served as the banking supervisor; one handled consumer protection; and the finance ministry set the broad rules on ownership of financial institutions and the design of financial products.&lt;br /&gt;- The second factor was a set of really &lt;b&gt;simple and easy-to-follow risk guardrails&lt;/b&gt; on individual institutions, with little to no room for flexibility. The first such rule was a requirement that 7% of assets be maintained as Tangible Common Equity or TCE. Now, 7% is quite a conservative number when compared with the 4.5-6% that US regulators have been comfortable with at different points in time. Additionally, the OSFI (the Office of the Superintendent of Financial Institutions, the Canadian banking supervisor) required that the capital maintained be of the highest quality - shareholder equity. The Canadian regulators require that 75% of TCE be comprised of shareholder equity. There is no room for quasi-equity products like preferred shares (which, incidentally, have not turned out to be very useful from a capital standpoint for US institutions). Finally, the third requirement was a leverage cap of 20:1. Compare this with US banks that have consistently maintained higher leverage ratios in an attempt to expand investments and improve returns to stakeholders in an environment supposedly insulated from risk.&lt;br /&gt;- Finally, a third important factor was the dealings between the Canadian bank regulator and the banks when it came to following rules. The Canadian system was &lt;b&gt;based on principles, rather than narrowly following specific rules&lt;/b&gt;. It is about the spirit rather than the letter of the law. The head of the OSFI regularly met with the bank CEOs and was a frequent attendee at board meetings, especially those attended by the non-executive board members. 
The bank CEOs on their part took interest in maintaining a stable system and paid serious attention to the advice of the regulators.&lt;br /&gt;&lt;br /&gt;Now, I am attempting to fill in the blanks beyond this point. My hypothesis on the operating philosophy of Canadian banks is that these simple and non-negotiable guidelines did not leave too much room for adventures such as optimization around edges, getting into illiquid and structurally untested asset classes (like the synthetic ABSs and MBSs), etc. Canadian banks realized that the one safe and reliable way of making money would be to focus on consumer and business borrowing needs and meet them with simple lending products. The returns from a plain-vanilla banking business, centered around taking deposits and lending them directly to consumers and businesses, were secure and good enough to generate a healthy return on capital for these banks, which then got captured in a healthy stock price. The creativity and the management talent of the bankers went towards meeting customer needs, as against getting into even more arcane areas of structured finance.&lt;br /&gt;&lt;br /&gt;What does this all mean for risk management and its application? There is a myth out there somewhere that tighter regulation tends to dampen shareholder returns. The high-impact downside resulting from tail-events is prevented, but that is at the cost of profits during more normal times. However, that doesn't seem to have been the case considering the performance of Canadian banks. Canadian banks were more tightly regulated than US banks, i.e. risk management was tighter. But the banks clearly did not suffer as a result. 
Rather, a principles-based risk management practice resulted in greater co-operation between banks and the regulators, allowed the banks to focus on the long-term drivers of value in banking and ultimately delivered better returns to shareholders.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-8074615548261507295?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/8074615548261507295/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=8074615548261507295' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/8074615548261507295'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/8074615548261507295'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2010/02/bank-regulation-in-canadian-context_20.html' title='Bank Regulation in the Canadian context - Part 2'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-8488008958532258027</id><published>2010-02-09T17:04:00.018-05:00</published><updated>2010-02-09T17:36:01.353-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Risk Management'/><category scheme='http://www.blogger.com/atom/ns#' term='Wall Street'/><category scheme='http://www.blogger.com/atom/ns#' term='Financial Regulation'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Stress Testing'/><title type='text'>Bank Regulation in the Canadian context - Part 1</title><content type='html'>The fallout of the 2008-09 Great Recession, in terms of failed banks, lost jobs, shuttered plants and bankrupt companies, is well known to all by now. What started off as a repayment crisis had an amplified impact on the overall economy - driven by reckless risk-taking by big banks, over-leveraging and ultimately the pursuit of a path that suggests they believed they were too big to fail. Which ultimately turned out to be the case: witness the bailout of AIG, the arranged marriage for Bear Stearns, the government takeovers of Fannie Mae and Freddie Mac and so on.&lt;br /&gt;&lt;br /&gt;The contagion has not been limited to US banks and institutions by any means. European banks (UBS, Deutsche and Societe Generale), British banks, Irish and Icelandic banks - all showed similar behaviours and a similar disdain for any consideration of their long-term health, believing themselves to be too big to fail. One glorious exception in all of this has been the large Canadian banks. Compared to some of their US and European rivals, these banks have been the very paragon of well-managed and well-run financial institutions; they hardly suffered a blip in their profitability and did not need any government largesse to survive the Great Recession. In fact, Canada is the only G7 country to survive the financial crisis without a state bail-out for its financial sector.&lt;br /&gt;&lt;br /&gt;(The top 5 Canadian banks are Royal Bank of Canada, Scotiabank, Toronto-Dominion Bank, Bank of Montreal and the Canadian Imperial Bank of Commerce. Besides cornering nearly 90% of the Canadian market, these banks are in reality large international banks with operations in 40-50 countries, and stock listings on multiple exchanges. 
A quick primer on Canadian banks is &lt;a href="http://ezinearticles.com/?Canadian-Banks---The-Big-Five-Banks&amp;amp;id=838750"&gt;here&lt;/a&gt;.)&lt;br /&gt;&lt;br /&gt;What caused the Canadian banks to survive? An immediate reaction (which incidentally would be wrong) is that Canadians are somehow too nice to participate in the kind of no-holds-barred plundering practiced by the American banks. They play a soft form of capitalism, one that protects the downside but also somehow limits the upside. Hmmmm, not entirely true. The net shareholder returns of Canadian banks have exceeded those of UK and US banks in the last 5 years, as evidenced in the graph below.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_nGE-rKiiGEc/S3HcgWovZcI/AAAAAAAACjY/-Wr24CVZXUc/s1600-h/shareholder+returns.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 169px;" src="http://4.bp.blogspot.com/_nGE-rKiiGEc/S3HcgWovZcI/AAAAAAAACjY/-Wr24CVZXUc/s320/shareholder+returns.jpg" alt="" id="BLOGGER_PHOTO_ID_5436368673603282370" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;What about returns over a longer time period? How do the top Canadian banks compare to the top US banks in terms of stock price performance?&lt;br /&gt;&lt;br /&gt;Looking at a 7 1/2 year period from mid-2002, the return on a basket of large Canadian banks (the ones mentioned above), taken as the simple average of the price changes in the tables below, was 144%. In the same period, large US banks (Citi, Chase, BofA, Wells, Goldman, Morgan Stanley) had a paltry return of 2%. OK, the US banks' returns were decimated by the recent credit crisis. Maybe the market over-reacted. But if you look at returns over the period from Jan 1998 to Dec 2005, when we were having a so-called 'Goldilocks' economy, the story isn't too different. 
The return on US bank stocks rises to a more respectable 69%, but the performance of Canadian bank stocks improves even more, to 183%.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;u&gt;Table of stock price performance for top Canadian banks - followed by US banks&lt;br /&gt;(Boom and Bust Period)&lt;br /&gt;&lt;/u&gt;&lt;/span&gt;&lt;table str="" style="border-collapse: collapse; width: 168pt;" width="224" border="0" cellpadding="0" cellspacing="0"&gt;&lt;tbody&gt;&lt;tr style="height: 12.75pt;" height="17"&gt;&lt;td style="height: 12.75pt; width: 48pt;" width="64" height="17"&gt;&lt;br /&gt;&lt;/td&gt;&lt;td class="xl24" style="width: 60pt;" str="'June 2002" width="80"&gt;June 2002&lt;/td&gt;   &lt;td class="xl25" style="width: 60pt;" str="'Feb 2010" width="80"&gt;Feb 2010&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 12.75pt;" height="17"&gt;   &lt;td style="height: 12.75pt;" height="17"&gt;RBC&lt;/td&gt;   &lt;td class="xl25" num=""&gt;16.06&lt;/td&gt;   &lt;td class="xl25" num=""&gt;50.44&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 12.75pt;" height="17"&gt;   &lt;td style="height: 12.75pt;" height="17"&gt;TD&lt;/td&gt;   &lt;td class="xl25" num=""&gt;23.19&lt;/td&gt;   &lt;td class="xl25" num=""&gt;59.56&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 12.75pt;" height="17"&gt;   &lt;td style="height: 12.75pt;" height="17"&gt;CIBC&lt;/td&gt;   &lt;td class="xl25" num=""&gt;32.85&lt;/td&gt;   &lt;td class="xl25" num="59.134999999999998"&gt;59.135&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 12.75pt;" height="17"&gt;   &lt;td style="height: 12.75pt;" height="17"&gt;BofM&lt;/td&gt;   &lt;td class="xl25" num=""&gt;21.86&lt;/td&gt;   &lt;td class="xl25" num=""&gt;48.86&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 12.75pt;" height="17"&gt;   &lt;td style="height: 12.75pt;" height="17"&gt;Scotia&lt;/td&gt;   &lt;td class="xl25" num=""&gt;17.41&lt;/td&gt;   &lt;td class="xl25" num=""&gt;42.87&lt;/td&gt;  &lt;/tr&gt; &lt;/tbody&gt;&lt;/table&gt;&lt;br 
/&gt;&lt;table str="" style="border-collapse: collapse; width: 168pt;" width="224" border="0" cellpadding="0" cellspacing="0"&gt;&lt;col style="width: 48pt;" width="64"&gt;  &lt;col style="width: 60pt;" span="2" width="80"&gt;  &lt;tbody&gt;&lt;tr style="height: 12.75pt;" height="17"&gt;   &lt;td style="height: 12.75pt; width: 48pt;" width="64" height="17"&gt;&lt;br /&gt;&lt;/td&gt;   &lt;td class="xl24" style="width: 60pt;" str="'June 2002" width="80"&gt;June 2002&lt;/td&gt;   &lt;td class="xl25" style="width: 60pt;" str="'Feb 2010" width="80"&gt;Feb 2010&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 12.75pt;" height="17"&gt;   &lt;td style="height: 12.75pt;" height="17"&gt;Chase&lt;/td&gt;   &lt;td class="xl25" num=""&gt;22.25&lt;/td&gt;   &lt;td class="xl25" num=""&gt;38.39&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 12.75pt;" height="17"&gt;   &lt;td style="height: 12.75pt;" height="17"&gt;Wells&lt;/td&gt;   &lt;td class="xl25" num=""&gt;25.2&lt;/td&gt;   &lt;td class="xl25" num=""&gt;26.71&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 12.75pt;" height="17"&gt;   &lt;td style="height: 12.75pt;" height="17"&gt;BofA&lt;/td&gt;   &lt;td class="xl25" num=""&gt;35.18&lt;/td&gt;   &lt;td class="xl25" num=""&gt;14.47&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 12.75pt;" height="17"&gt;   &lt;td style="height: 12.75pt;" height="17"&gt;Citi&lt;/td&gt;   &lt;td class="xl25" num=""&gt;28.68&lt;/td&gt;   &lt;td class="xl25" num=""&gt;3.18&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 12.75pt;" height="17"&gt;   &lt;td style="height: 12.75pt;" height="17"&gt;Goldman&lt;/td&gt;   &lt;td class="xl25" num=""&gt;73.35&lt;/td&gt;   &lt;td class="xl25" num=""&gt;152.49&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 12.75pt;" height="17"&gt;   &lt;td style="height: 12.75pt;" height="17"&gt;MS&lt;/td&gt;   &lt;td class="xl25" num=""&gt;35.62&lt;/td&gt;   &lt;td class="xl25" num=""&gt;27.13&lt;/td&gt;  &lt;/tr&gt; &lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;&lt;span 
style="font-weight: bold;"&gt;&lt;u&gt;Table of stock price performance for top Canadian banks - followed by US banks&lt;br /&gt;(Boom Period only)&lt;/u&gt;&lt;/span&gt;&lt;br /&gt; &lt;table str="" style="border-collapse: collapse; width: 168pt;" width="224" border="0" cellpadding="0" cellspacing="0"&gt;&lt;col style="width: 48pt;" width="64"&gt;  &lt;col style="width: 60pt;" span="2" width="80"&gt;  &lt;tbody&gt;&lt;tr style="height: 12.75pt;" height="17"&gt;   &lt;td style="height: 12.75pt; width: 48pt;" width="64" height="17"&gt;&lt;br /&gt;&lt;/td&gt;   &lt;td class="xl24" style="width: 60pt;" str="'Jan 1998" width="80"&gt;Jan 1998&lt;/td&gt;   &lt;td class="xl24" style="width: 60pt;" str="'Dec 2005" width="80"&gt;Dec 2005&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 12.75pt;" height="17"&gt;   &lt;td style="height: 12.75pt;" height="17"&gt;RBC&lt;/td&gt;   &lt;td class="xl25" num=""&gt;12.15&lt;/td&gt;   &lt;td class="xl25" num=""&gt;39.27&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 12.75pt;" height="17"&gt;   &lt;td style="height: 12.75pt;" height="17"&gt;TD&lt;/td&gt;   &lt;td class="xl25" num=""&gt;17.49&lt;/td&gt;   &lt;td class="xl25" num=""&gt;52.55&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 12.75pt;" height="17"&gt;   &lt;td style="height: 12.75pt;" height="17"&gt;CIBC&lt;/td&gt;   &lt;td class="xl25" num=""&gt;24.02&lt;/td&gt;   &lt;td class="xl25" num=""&gt;65.8&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 12.75pt;" height="17"&gt;   &lt;td style="height: 12.75pt;" height="17"&gt;BofM&lt;/td&gt;   &lt;td class="xl25" num=""&gt;20.22&lt;/td&gt;   &lt;td class="xl25" num=""&gt;55.94&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 12.75pt;" height="17"&gt;   &lt;td style="height: 12.75pt;" height="17"&gt;Scotia&lt;/td&gt;   &lt;td class="xl25" num=""&gt;16.6&lt;/td&gt;   &lt;td class="xl25" num=""&gt;39.93&lt;/td&gt;  &lt;/tr&gt; &lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt; &lt;table str="" style="border-collapse: collapse; width: 168pt;" 
width="224" border="0" cellpadding="0" cellspacing="0"&gt;&lt;col style="width: 48pt;" width="64"&gt;  &lt;col style="width: 60pt;" span="2" width="80"&gt;  &lt;tbody&gt;&lt;tr style="height: 12.75pt;" height="17"&gt;   &lt;td style="height: 12.75pt; width: 48pt;" width="64" height="17"&gt;&lt;br /&gt;&lt;/td&gt;   &lt;td class="xl24" style="width: 60pt;" str="'Jan 1998" width="80"&gt;Jan 1998&lt;/td&gt;   &lt;td class="xl24" style="width: 60pt;" str="'Dec 2005" width="80"&gt;Dec 2005&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 12.75pt;" height="17"&gt;   &lt;td style="height: 12.75pt;" height="17"&gt;Chase&lt;/td&gt;   &lt;td class="xl25" num=""&gt;51.29&lt;/td&gt;   &lt;td class="xl25" num=""&gt;48.3&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 12.75pt;" height="17"&gt;   &lt;td style="height: 12.75pt;" height="17"&gt;Wells&lt;/td&gt;   &lt;td class="xl25" num=""&gt;18.22&lt;/td&gt;   &lt;td class="xl25" num=""&gt;35.56&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 12.75pt;" height="17"&gt;   &lt;td style="height: 12.75pt;" height="17"&gt;BofA&lt;/td&gt;   &lt;td class="xl25" num=""&gt;29.94&lt;/td&gt;   &lt;td class="xl25" num=""&gt;46.15&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 12.75pt;" height="17"&gt;   &lt;td style="height: 12.75pt;" height="17"&gt;Citi&lt;/td&gt;   &lt;td class="xl25" num=""&gt;24.78&lt;/td&gt;   &lt;td class="xl25" num=""&gt;48.53&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 12.75pt;" height="17"&gt;   &lt;td style="height: 12.75pt;" height="17"&gt;Goldman&lt;/td&gt;   &lt;td class="xl25" num=""&gt;73.72&lt;/td&gt;   &lt;td class="xl25" num=""&gt;133.26&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 12.75pt;" height="17"&gt;   &lt;td style="height: 12.75pt;" height="17"&gt;MS&lt;/td&gt;   &lt;td class="xl25" num=""&gt;29.19&lt;/td&gt;   &lt;td class="xl25" num=""&gt;56.74&lt;/td&gt;  &lt;/tr&gt; &lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;&lt;span style="font-style: italic;"&gt;(Goldman Sachs 
and Bank of Montreal did not have full information over these periods. But having them in the numbers - or taking them out - doesn't change the story.)&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;So what can explain the better performance of Canadian banks? What allows them not only to perform better through the cycle but also to do so with minimal government handouts? The answer is superior risk management, and that will be the subject of the next part of this series.&lt;br /&gt;&lt;br /&gt;Chrystia Freeland of FT.com has a fascinating article on the subject and the link is &lt;a href="http://www.ft.com/cms/s/2/db2b340a-0a1b-11df-8b23-00144feabdc0.html"&gt;here&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-8488008958532258027?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/8488008958532258027/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=8488008958532258027' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/8488008958532258027'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/8488008958532258027'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2010/02/bank-regulation-in-canadian-context.html' title='Bank Regulation in the Canadian context - Part 1'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail 
xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_nGE-rKiiGEc/S3HcgWovZcI/AAAAAAAACjY/-Wr24CVZXUc/s72-c/shareholder+returns.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-4933614245457332817</id><published>2009-12-29T00:07:00.003-05:00</published><updated>2009-12-29T00:08:52.519-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Decision Making'/><category scheme='http://www.blogger.com/atom/ns#' term='statistical inference'/><category scheme='http://www.blogger.com/atom/ns#' term='Modeling'/><title type='text'>A serious problem - but analytics may have some common-sense solutions</title><content type='html'>My family and I just got back from a vacation in India. As always, we had a great time and as always, the travel was painful, partly because of its length and partly because of all the documentation checks at various points in the journey. But in hindsight, I am feeling thankful that we were back in the States before the latest attempted terrorist attack, on the Northwest Airlines jetliner to Detroit, took place. A Nigerian man, Umar Farouk Abdulmutallab, tried to set off an explosive device but thankfully did not succeed.&lt;br /&gt;&lt;br /&gt;Now apparently, this individual had been on the anti-terrorism radar for a while. He was on the terrorist watch-list but not on the official no-fly list. Hence, he was allowed to board the flight going from Amsterdam to Detroit, where he tried to perpetrate his misdeed. The events have raised a number of valid questions about the job the TSA (the agency in charge of ensuring safe air travel within and to/from the US) is doing in spotting these kinds of threats. There were a number of red flags in this case. A passenger who had visited Yemen - a place as bad as Pakistan when it comes to providing a safe haven for terrorists. A ticket paid for in cash. Just one carry-on bag and no bags checked in. 
A warning coming from this individual's family, no less. A denied visa from Britain - another country that has as much to fear from terrorism as the US. The question I have is: could more have been done? Could analytics have been deployed more effectively to identify and isolate the perpetrator? And how could all of this be achieved without giving a very overt impression of profiling? A few ideas come to mind.&lt;br /&gt;&lt;br /&gt;First, a scoring system to continually update the threat level of individuals and give a more precise read of the threat posed by an individual at a given point in time. A terror list of 555,000 is too bloated and is likely to contain a fair number of false positives. This model would use the latest information about the traveler, all of which can be gathered at or before the time of travel. Is the traveler a US citizen or a citizen of a friendly country? (US Citizen or Perm Resident = 1, Citizen of US ally = 2, Other countries = 3, Known terrorist nation = 5) Has the person bought the ticket in cash or by electronic payment? (Electronic payment = 1, Physical instrument such as a cheque = 2, Cash = 5) Does the person have a US contact? Is the contact a US citizen or a permanent resident? Is the person traveling to a valid residential address? What are the countries the individual has visited in the last 24 months? And so on. You get the idea. Now the weights that have been attached are quite arbitrary to start, but they can always be adjusted as the perception of these risk factors changes and our understanding evolves.&lt;br /&gt;&lt;br /&gt;Now what needs to be done is to update the parameters of this model every 3-6 months or so. Then every individual on the database as well as every person traveling needs to be scored using this model, and high scorers (high risk of either having connections to a terrorist network or traveling with some nefarious intent) can be identified for additional screening and scrutiny. 
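&lt;br /&gt;&lt;br /&gt;The scoring idea above can be sketched in a few lines of Python. This is only an illustration: the citizenship and payment weights are the arbitrary ones listed earlier, while the Traveler record, its field names and the rule of ranking by a simple sum of weights are hypothetical choices made for the example.&lt;br /&gt;&lt;pre&gt;
```python
from dataclasses import dataclass

# Weights taken from the post; they are arbitrary starting points.
CITIZENSHIP_WEIGHTS = {
    "us_citizen_or_resident": 1,
    "us_ally": 2,
    "other": 3,
    "known_terrorist_nation": 5,
}
PAYMENT_WEIGHTS = {
    "electronic": 1,
    "cheque": 2,
    "cash": 5,
}


@dataclass(frozen=True)
class Traveler:
    # Hypothetical record; the field names are illustrative only.
    name: str
    citizenship: str
    payment: str


def risk_score(traveler):
    """Sum the weights of the two factors the post spells out."""
    return (CITIZENSHIP_WEIGHTS[traveler.citizenship]
            + PAYMENT_WEIGHTS[traveler.payment])


def rank_for_screening(travelers, top_n):
    """Return the top_n highest scorers, highest score first."""
    ranked = sorted(travelers, key=risk_score, reverse=True)
    return ranked[:top_n]


travelers = [
    Traveler("A", "us_citizen_or_resident", "electronic"),
    Traveler("B", "other", "cash"),
    Traveler("C", "us_ally", "cheque"),
]
for t in rank_for_screening(travelers, top_n=2):
    print(t.name, risk_score(t))
```
&lt;/pre&gt;The other factors (US contact, destination address, travel history) would enter as further lookup tables in the same way, with all the weights revisited at each periodic update.&lt;br /&gt;&lt;br /&gt;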
These are the types of common-sense solutions that can be deployed to solve these types of ticklish problems. When the size of the problem has been reduced from 555,000 people on whom you need to spend the same amount of time, to one where the amount of scrutiny can be sloped based on the propensity to cause trouble, the problem suddenly becomes a lot more tractable.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-4933614245457332817?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/4933614245457332817/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=4933614245457332817' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/4933614245457332817'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/4933614245457332817'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2009/12/serious-problem-but-analytics-may-have.html' title='A serious problem - but analytics may have some common-sense solutions'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-8453645377223406118</id><published>2009-12-28T22:31:00.004-05:00</published><updated>2009-12-28T22:46:22.590-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Investing'/><category scheme='http://www.blogger.com/atom/ns#' 
term='statistical inference'/><category scheme='http://www.blogger.com/atom/ns#' term='Macroeconomics'/><title type='text'>Some end of year reading</title><content type='html'>1. Krugman on America's own lost decade - link &lt;a href="http://www.nytimes.com/2009/12/28/opinion/28krugman.html"&gt;&lt;u&gt;here&lt;/u&gt;&lt;/a&gt; &lt;span style="font-style: italic;"&gt;&lt;br /&gt;The usual Krugman rant on how things are going downhill and accelerating.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;2. The Freakonomics blog on the practice of not inflation-adjusting stock returns - link &lt;a href="http://freakonomics.blogs.nytimes.com/2009/12/28/the-quiet-danger-of-non-inflation-adjusted-stock-returns/"&gt;&lt;u&gt;here&lt;/u&gt;&lt;/a&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Stock returns are seldom adjusted for inflation, transaction costs and taxes. While savvy investors usually do account for these factors, it is easy to get misled unless one reads the fine print.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;3. How did buy and hold do in the last 10 years? - link &lt;a href="http://www.nytimes.com/2000/02/20/business/business-10-stocks-for-2010-buy-and-hold-picks-from-top-investors.html"&gt;&lt;u&gt;here&lt;/u&gt;&lt;/a&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;It is unfair to ask a stock picker to pick just one stock. It makes for good headlines but does not really allow stock pickers to demonstrate their skills. The probability that any one company could be impacted by freak events is usually pretty high.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;4. 
Health Statistics and the mammogram controversy - link &lt;a href="http://online.wsj.com/article/SB126031689043682715.html"&gt;&lt;u&gt;here&lt;/u&gt;&lt;/a&gt; (from the WSJ) and &lt;a href="http://blogs.wsj.com/numbersguy/mammogram-math-858/"&gt;&lt;u&gt;here&lt;/u&gt;&lt;/a&gt; from the Numbers Guy blog&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Reading any kind of reporting coming from the US (except sports, maybe) has become a painful drag through the ideology of the author.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;5. Happiness - State of mind or state of body? - link &lt;a href="http://www.nytimes.com/2009/12/22/nyregion/22nyc.html"&gt;&lt;u&gt;here&lt;/u&gt;&lt;/a&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;An interesting 'light' piece of reading. Turns out that they are one and the same thing.&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-8453645377223406118?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/8453645377223406118/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=8453645377223406118' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/8453645377223406118'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/8453645377223406118'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2009/12/some-end-of-year-reading.html' title='Some end of year reading'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' 
height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-5012595396469749424</id><published>2009-12-19T13:36:00.002-05:00</published><updated>2009-12-19T13:40:54.009-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Systems Modeling'/><title type='text'>The place of Systems Modeling in Analytics</title><content type='html'>When one talks about predictive analytics, the typical thought process goes in the direction of regression, neural nets and data mining: techniques that savvy marketers (consumer product companies, banks) have been using for close to two decades now to build insights about consumer behaviour. Systems modeling, or System Dynamics, is not something that immediately springs to mind.&lt;br /&gt;&lt;br /&gt;So what is systems modeling all about? Systems modeling is &lt;span style="font-weight: bold; color: rgb(51, 51, 255);"&gt;creating a mathematical representation of a real-world phenomenon&lt;/span&gt;, trying to cover as wide a range of inputs as feasible and the most valuable outputs. The systems model tries to explain how the inputs translate to outputs. Where the systems model differs from a statistical predictive model is that its purpose is not to explain variance in the output. The systems model instead tries to establish structural relationships between the inputs and the output. The model then stress-tests those structural relationships by varying the inputs and looking at the impact on the output.&lt;br /&gt;&lt;br /&gt;A good example of a subject that can be systems-modeled (my verb!) is the problem of terrorism. The problem has different inputs: unhappy people, territorial disputes, foreign powers wanting to create trouble, funding, media coverage, etc. The immediate output is various acts of terrorism such as assassinations of leaders, suicide bombings, etc. 
It might be feasible to build a model that lays out how these various inputs combine and interact with one another to cause the outputs. (If one goes back over the past 150 years, there should be plenty of data points.) Another way of looking at the output is a more holistic view that measures the damage done in terms of lives lost, economic damage incurred, etc.&lt;br /&gt;&lt;br /&gt;What would be the purpose of this model? In my opinion, the value of such a model lies less in predicting where the next terrorist strike is going to be, or how big the next strike is going to be. (This is incidentally what a classic statistical model is going to try to do.) Rather, the model should try to explain what confluence of factors produces a large output event (lives lost, economic damage) and how some of those factors can be controlled ONCE an insurgency is already underway. The hypothetical model I am talking about does not try to predict, but rather to strengthen our understanding of the system dynamics. The model would have a PoV on which inputs can be controlled and to what extent.&lt;br /&gt;&lt;br /&gt;The model would then be used to understand how a large impact event can be prevented or its impact minimized. So if the federal government had $100 billion to spend, how much should it spend on homeland security vs. promoting a positive image of the United States through foreign media? The model might tell us that it is pointless to spend more than, say, $500 million on sophisticated software to block large untraced wire transfers, as there are other ways in which funding can be made available to the perpetrators of a terrorist act. 
So controlling the funding for an insurgency through sophisticated money laundering and layering detection algorithms may be pointless if the actual money gets exchanged through a non-electronic channel.&lt;br /&gt;&lt;br /&gt;So an agency interested in curbing terrorism might be better advised to, say, over-invest in trauma care facilities and emergency services in vulnerable areas. This is so that when a strike does take place, medical help for the people affected is close at hand and casualties are minimized.&lt;br /&gt;&lt;br /&gt;Why am I writing all this? Analytical problem solving is not just about fancy statistical algorithms or cool math; it is also about thinking hard about problems and creating their mathematical representations - and then being crystal clear about what those mathematical representations can and cannot do. This is where the systems modeling approach can be a very effective part of the arsenal of a business modeler.&lt;br /&gt;&lt;p class="MsoNormal"&gt; I'll close out with a couple of links, which prompted this wave of thinking on this subject. One is a paper in the journal Nature where the authors have presented a statistical model of insurgency events. The link is &lt;a href="http://www.nature.com/nature/journal/v462/n7275/full/nature08631.html"&gt;&lt;u&gt;here&lt;/u&gt;&lt;/a&gt;. 
It's a gated article.&lt;br /&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;The following &lt;a href="http://www.drewconway.com/zia/?p=1623"&gt;&lt;u&gt;link&lt;/u&gt;&lt;/a&gt; has a very good critique on the article.&lt;br /&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-5012595396469749424?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/5012595396469749424/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=5012595396469749424' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/5012595396469749424'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/5012595396469749424'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2009/12/place-of-systems-modeling-in-analytics.html' title='The place of Systems Modeling in Analytics'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-7867965049675319205</id><published>2009-12-14T23:21:00.003-05:00</published><updated>2009-12-14T23:30:13.060-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Macroeconomics'/><category scheme='http://www.blogger.com/atom/ns#' term='Computing'/><title type='text'>The new lean economy</title><content type='html'>I have commented earlier (link &lt;a 
href="http://stat-exchange.blogspot.com/2009/09/productivity-growth-and-never-to-return.html"&gt;&lt;u&gt;here&lt;/u&gt;&lt;/a&gt;) on the phenomenon of the jobless recovery that the US economy (and definitely many of the more 'open' economies) is likely to be facing in the next few years. Of course, this is barring any serious effort by the government to create jobs through stimulus-like efforts - though there is going to be a limit to that as well, given the link between stimulus efforts and future debt creation.&lt;br /&gt;&lt;br /&gt;In my opinion, the 2008-09 Great Recession has forced companies to seriously evaluate their cost structures. A lot of what passed muster before (in the fat bubble years) has been cut, and companies have begun to realize serious benefits from cutting out fat, leveraging efficiencies at the work place by eliminating redundancies, moving their applications to open source platforms and so on. And my intuition is that many of these changes are not just a temporary reaction to the downturn. Companies are seeing that the quality of output has not suffered significantly even though there are fewer people to do the work, or because the work is no longer being done by expensive software. Thomas Friedman, in a &lt;a href="http://www.nytimes.com/2009/12/13/opinion/13friedman.html"&gt;&lt;u&gt;piece&lt;/u&gt;&lt;/a&gt; in the New York Times, wrote that the Great Recession has also brought about a Great Inflection.&lt;br /&gt;&lt;br /&gt;According to Friedman, the Great Inflection is "the mass diffusion of low-cost, high-powered innovation technologies — from hand-held computers to Web sites that offer any imaginable service — plus cheap connectivity. They are transforming how business is done." Friedman talks about two examples in his piece. One is a small not-for-profit that needs to create an ad campaign. Given constrained budgets, the need of the hour is to be innovative, but with cost efficiencies firmly in mind. 
The ad creator uses a mix of collaboration tools (enabled by cheap and high throughput communication), online sourcing (through the availability of online marketplaces for media products) and multimedia editing (enabled by software and hardware improvements) to deliver a solution that is both innovative and appealing as well as one that fits in the client budget.&lt;br /&gt;&lt;br /&gt;The second example, of the furniture manufacturer Ethan Allen, talks about transformations the business has had to make to drive productivity improvements. The transformations have been both traditional: workforce reductions of over 25%, multi&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_0"&gt;skilling&lt;/span&gt; of the remaining workers to make them more fungible, consolidation of manufacturing and process engineering. Additionally the company has also adopted other non-conventional means to conserve cash and survive. This has included moving a lot of the advertising activity in-house leveraging the multimedia desktop tools that are available today.&lt;br /&gt;&lt;br /&gt;Finally, Friedman makes a point that the flow of credit (which is still very constrained) would make these companies create jobs. I disagree. I think it is going to take a lot for companies to start hiring again. And when they do, they are going to be the &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_1"&gt;multiskilled&lt;/span&gt; talent that is now constituting the workforce at Ethan Allen. Companies across the spectrum have tasted blood - of keeping productivity and output high and costs low. They are not likely to go back to being fat again anytime soon.&lt;br /&gt;&lt;br /&gt;In Indian banks, I am seeing an increasing phenomenon in the growth of branches. 
Most big banks are expanding their branch networks - like &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_2"&gt;HDFC&lt;/span&gt; Bank, &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_3"&gt;ICICI&lt;/span&gt; Bank and even the venerable State Bank of India. But the branches increasingly are being staffed at low staff levels. Staff is usually multi-skilled. Specialists are assigned across branches and are mobile. As a customer, if you need any specialized service, the representative at the branch contacts the specialist who then makes an appointment within 24 hours. Instead of having committed staff in every branch, the staffing model comprises fungible generalists allotted to branches and shared, mobile specialists across branches.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-7867965049675319205?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/7867965049675319205/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=7867965049675319205' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/7867965049675319205'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/7867965049675319205'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2009/12/new-lean-economy.html' title='The new lean economy'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-812479003166830800</id><published>2009-12-08T10:33:00.004-05:00</published><updated>2009-12-08T10:47:22.071-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Financial Regulation'/><category scheme='http://www.blogger.com/atom/ns#' term='Too Big To Fail'/><title type='text'>Too Big To Fail - contd.</title><content type='html'>I have &lt;a href="http://stat-exchange.blogspot.com/2009/11/too-big-to-fail-or-too-scared-to.html"&gt;&lt;u&gt;commented earlier&lt;/u&gt;&lt;/a&gt; on the TBTF doctrine.&lt;br /&gt;&lt;br /&gt;Recently, I came across a couple of other references on the TBTF situation and what to do about it. The &lt;a href="http://blogs.ft.com/maverecon/2009/06/too-big-to-fail-is-too-big/"&gt;&lt;u&gt;first from the FT&lt;/u&gt;&lt;/a&gt; has the author Willem Buiter presenting a slew of solutions on what to do about banks becoming TBTF. (Interesting how this abbreviation seems to have taken on a life of its own!) Definitely worth taking a more detailed read as the author goes to a fair degree of detail on what are some of the probable solutions.&lt;br /&gt;&lt;br /&gt;The second is a novel way of valuing the benefit that banks get from becoming TBTF. The approach (proposed by a couple of economists Elijah Brewer and Julapa Jagtiani from the Philadelphia Fed) argues that the measure of the benefit that banks expected to get could be ascertained by the acquisition premium paid by these very banks along their journey to becoming TBTF. The estimate of this premium (looking at acquisitions from 1991 to 2004) is about $14 billion. 
This &lt;a href="http://www.philadelphiafed.org/research-and-data/publications/working-papers/2009/wp09-34.pdf"&gt;&lt;u&gt;link&lt;/u&gt;&lt;/a&gt; references the actual paper written by the economists.&lt;br /&gt;&lt;br /&gt;Given my obsession on getting to an optimal risk management framework for financial institutions, I thought I'd share a couple of these links.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-812479003166830800?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/812479003166830800/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=812479003166830800' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/812479003166830800'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/812479003166830800'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2009/12/too-big-to-fail-contd.html' title='Too Big To Fail - contd.'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-138536474943525872</id><published>2009-12-08T09:36:00.005-05:00</published><updated>2009-12-08T10:20:41.542-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='credit downturn'/><category scheme='http://www.blogger.com/atom/ns#' term='Leverage'/><category scheme='http://www.blogger.com/atom/ns#' 
term='Simpsons Paradox'/><title type='text'>Interesting reads from Dec 8</title><content type='html'>Interesting reads from the Net&lt;br /&gt;&lt;br /&gt;1. Consumer credit declines for the 9th straight month - &lt;a href="http://www.calculatedriskblog.com/2009/12/consumer-credit-declines-for-9th.html"&gt;&lt;u&gt;link&lt;/u&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;2. NY Fed remarks on lessons from the crisis - &lt;a href="http://www.newyorkfed.org/newsevents/speeches/2009/dud091207.html"&gt;&lt;u&gt;link&lt;/u&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;3. Credit/ Leverage and its role in creating financial crises over the years - &lt;a href="http://www.voxeu.org/index.php?q=node/4349"&gt;&lt;u&gt;link&lt;/u&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;I particularly liked this excerpt:&lt;br /&gt;&lt;span style="font-size:85%;"&gt;&lt;span style="font-style: italic;"&gt;Long-run historical evidence therefore suggests that credit has an important role to play in central bank policy. Its exact role remains open to debate. After their recent misjudgements, central banks should clearly pay some attention to credit aggregates and not confine themselves simply to following targeting rules based on output and inflation.&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;4. Simpson's paradox always fascinates me. This example uses unemployment rate comparisons between today and the 1981 recession - &lt;a href="http://online.wsj.com/article/SB125970744553071829.html?mod=googlenews_wsj"&gt;&lt;u&gt;link&lt;/u&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;4b. 
This response by Andrew Gelman talks about when the comparison at the sub-group level is appropriate (when the definitions of the sub-groups being compared between the two samples are robust and more apples-to-apples) and also where the aggregate level is more appropriate (where the definitions have not remained stable - typically happens when the two samples are temporally divided - and therefore any comparison is not necessarily apples-to-apples) - &lt;a href="http://www.stat.columbia.edu/%7Ecook/movabletype/archives/2009/12/simpsons_parado.html"&gt;&lt;u&gt;link&lt;/u&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Holidaying in India and just beginning to recover from the sensory overload (of family, friends, food, the media, the general environment). Really looking forward to the remaining two weeks.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-138536474943525872?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/138536474943525872/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=138536474943525872' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/138536474943525872'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/138536474943525872'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2009/12/interesting-reads-from-dec-8.html' title='Interesting reads from Dec 8'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-370697494769265885</id><published>2009-11-24T21:22:00.001-05:00</published><updated>2009-11-24T21:24:46.077-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Risk Management'/><category scheme='http://www.blogger.com/atom/ns#' term='Financial Regulation'/><title type='text'>Too Big To Fail or Too Scared To Confront?</title><content type='html'>Back to the blog after a long break. I do need to find a way to become more regular at updating the blog and to keep expressing my thoughts and ideas.&lt;br /&gt;&lt;br /&gt;What spurred this latest post was a &lt;a href="http://www.prospectmagazine.co.uk/2009/11/how-to-shrink-the-banks/"&gt;&lt;u&gt;decent article&lt;/u&gt;&lt;/a&gt; I read on the Too-Big-To-Fail (TBTF) doctrine. Of course, one is talking about banks. The article goes into details about the cost of propping up the banks and some of the estimates are truly mind-boggling.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: times new roman; font-style: italic;"&gt;According to the Bank of England, governments and central banks in the US, Britain, and Europe have spent or committed more than $14 trillion—the equivalent to roughly 25 per cent of the world’s economic output—to prop up financial institutions. Combined with a global recession, this bailout has undermined the public finances of the developed world.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Another related set of articles is &lt;a href="http://www.economist.com/blogs/freeexchange/rajan_roundtable/"&gt;&lt;u&gt;here&lt;/u&gt;&lt;/a&gt; from the Free Exchange blog of the Economist. Raghuram Rajan from the University of Chicago School of Business has contributed some good ideas and a robust discussion of the pros and cons of various options has been presented. 
As always, there is little effort on the part of the various contributors to synthesize a viewpoint. Rather the tendency is to point out why a specific solution presented may not work.&lt;br /&gt;&lt;br /&gt;To a large extent, I think these ideas miss an important point. The ideas consistently treat financial institutions as rational entities, which seem to operate largely on principles of rational economics, capital theory and other such textbook ideas. The fact of the matter is that management matters. And management is a function of the human beings that take important decisions within these organizations, their incentive structures and also, more broadly, the set of values and identity that seems to motivate these human players.&lt;br /&gt;&lt;br /&gt;And unless we all as stakeholders begin to take notice that the destiny of corporations is driven by the individuals managing those organizations, that the laws of economics are (unlike the laws of nature) created by its human players, we will continue to argue around the margins on window dressing the regulatory system and make no significant progress towards creating a more stable financial system. A stable system by definition is going to provide fewer opportunities to pursue supernormal profits. A stable system needs the suppression of that oldest of human sins, greed. Do we have the courage to confront ourselves and seriously consider a slew of workable solutions to fix a broken system? 
Only time will tell but I am not holding my breath.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-370697494769265885?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/370697494769265885/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=370697494769265885' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/370697494769265885'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/370697494769265885'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2009/11/too-big-to-fail-or-too-scared-to.html' title='Too Big To Fail or Too Scared To Confront?'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-4408801075979292372</id><published>2009-09-27T12:52:00.003-04:00</published><updated>2009-09-27T12:57:11.718-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Risk Management'/><title type='text'>The Fed's failure in end to end risk management</title><content type='html'>Another one in a series of risk management write-ups. (I guess this is becoming more and more common as this is my full-time job right now.) 
I came across a recent article in the Washington Post about the malpractices in lending practiced by subprime affiliates of large banks and the reluctance of the Federal Reserve to play an effective regulatory role. The article is &lt;a href="http://www.washingtonpost.com/wp-dyn/content/article/2009/09/26/AR2009092602706.html?hpid=topnews"&gt;&lt;u&gt;here&lt;/u&gt;&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The article talks about how the Fed gradually withdrew from its regulatory responsibility for consumer finance companies as these were not "banks". The Fed reduced its oversight of these companies because it believed it did not have the right &lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_nGE-rKiiGEc/Sr-ZVPR_2nI/AAAAAAAACi0/ZCLgTd__KoE/s1600-h/PH2009092602709.jpg"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer; width: 228px; height: 217px;" src="http://1.bp.blogspot.com/_nGE-rKiiGEc/Sr-ZVPR_2nI/AAAAAAAACi0/ZCLgTd__KoE/s320/PH2009092602709.jpg" alt="" id="BLOGGER_PHOTO_ID_5386192269516724850" border="0" /&gt;&lt;/a&gt;jurisdiction to regulate these companies. This was despite a considerable amount of evidence from individuals and other watch-dog bodies that were reporting egregious practices by these institutions. Another big factor that was playing at the time was the good old "markets self-regulate" belief (I was going to say theory and I corrected myself. Maybe I should say, myth.) but I am not going to spend too much time in this post on that.&lt;br /&gt;&lt;br /&gt;Why did the Fed turn its head away from the problem? One of my hypotheses is too much of a reliance on "literalism". The Fed chose to literally interpret its mandate of regulating banks and decided to look no further - even though there were other institutions whose practices were exactly the same as what any bank would do. Literalism is a particular problem I have observed in the US. 
It is the strong objection to interpreting a piece of policy/ law developed years ago in line with the world today. This problem is most commonly seen with respect to the US Constitution and its various amendments. But "literalism" is a problem when it creates blind spots in end-to-end risk management and ends up threatening the viability of the corporation or, as in this case, the entire financial system. &lt;span style="font-weight: bold;"&gt;An effective risk manager is expected to be proactive in identifying gaps in the end-to-end risk management and to be open to taking on more responsibility, proposing changes to the system, as needed.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The other problem was that the Fed tended to be influenced more by grand economic theories and conceptual/ philosophical frameworks and decided to discount the data coming up from the ground. According to the article, the Fed tended to discount these pieces of anecdotal evidence as their place within a broader framework or their systematic impact was well-known. This is another problem often with smart people. It is a thinking that goes: &lt;span style="font-style: italic;"&gt;I think and talk in concepts, abstractions and theories. Therefore, I will only listen when other people talk the same way. &lt;/span&gt;Now this is a problem which afflicts many of us, and therefore might be even borderline acceptable in everyday life. But this is fatal in risk management, where your job is to anticipate different ways in which the system can be at risk. 
&lt;span style="font-weight: bold;"&gt;An effective risk manager is expected to constantly keep her radar up for pieces of information that might be contrary to a pre-existing framework and have an efficient means of investigating whether the anecdotal evidence points to any material threat.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Finally, one important lesson that is worth taking away is that when it comes to human-created systems, &lt;span style="font-weight: bold;"&gt;there is no one overarching framework or "truth"&lt;/span&gt;. Because interactions between humans and institutions created by humans are not governed by the laws of physics, there are often no absolutes in these things. Many theories or frameworks could be simultaneously true or may apply in portions of the world we are trying to understand. Depending on the prevailing conditions, one set of rules may hold. And as conditions change or as the previous framework pushes the environment to one extreme, the competing framework often becomes more relevant and appropriate to apply. It is often practical to keep one's mind open to other theories and frameworks. 
Ideological alignment or obsession with one "truth" system only makes one closed to other explanations or possibilities.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-4408801075979292372?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/4408801075979292372/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=4408801075979292372' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/4408801075979292372'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/4408801075979292372'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2009/09/feds-failure-in-end-to-end-risk.html' title='The Fed&apos;s failure in end to end risk management'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_nGE-rKiiGEc/Sr-ZVPR_2nI/AAAAAAAACi0/ZCLgTd__KoE/s72-c/PH2009092602709.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-3349848547363539929</id><published>2009-09-22T06:03:00.003-04:00</published><updated>2009-09-22T06:10:43.913-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Things to read series'/><title type='text'>Things that I am reading this morning ...</title><content type='html'>1. 
Seth Godin's post about building better graphs - Link &lt;a href="http://sethgodin.typepad.com/seths_blog/2009/07/how-to-make-graphs-that-work.html"&gt;here&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;2. Phoenix's light rail success, driven by weekend travelers. That downtown really needed some life and looks like light rail did the trick - Link &lt;a href="http://www.nytimes.com/2009/09/20/us/20rail.html?adxnnl=1&amp;amp;adxnnlx=1253613973-eaBNutNqlHB22HZ1DfDnDw"&gt;here&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;3. A couple of exciting sounding book reviews from The Economist. These are on my library hold list - Link &lt;a href="http://www.economist.com/books/displaystory.cfm?story_id=14401030"&gt;here&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-3349848547363539929?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/3349848547363539929/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=3349848547363539929' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/3349848547363539929'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/3349848547363539929'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2009/09/things-that-i-am-reading-this-morning.html' title='Things that I am reading this morning ...'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-2136223314412426303</id><published>2009-09-16T21:48:00.002-04:00</published><updated>2009-09-16T21:52:37.123-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Risk Management'/><category scheme='http://www.blogger.com/atom/ns#' term='Recession'/><category scheme='http://www.blogger.com/atom/ns#' term='Modeling'/><title type='text'>A case study in risk management</title><content type='html'>The credit crisis of 2008, or the Great Recession as it is now known, has had many books written on it. Writers from across the ideological spectrum have written about why the crisis occurred and how their brand of ideology could have prevented the crisis. Which is why I was skeptical when I came across &lt;a href="http://www.thedailybeast.com/blogs-and-stories/2009-09-14/the-unnecessary-meltdown/"&gt;this piece&lt;/a&gt; which seemed to rehash the story of the collapse of Lehman. I was pleasantly surprised that this article was about one element that has been whispered off and on, but not very convincingly: about risk management based on common sense. (The reader needs to get past the title and the opening blurb, though. The title seems to suggest the credit crisis would have never taken place if Goldman Sachs hadn't spotted the game early enough. That is plain ridiculous. The leveraging of the economy + the decline in lending standards created a ticking time-bomb. But I digress.)&lt;br /&gt;&lt;br /&gt;The article is not about having some fancy risk management metrics or why our models are wrong or why we should not trust a Ph.D that offers to build a model for you. (Of course, all of these elements contributed to why the crisis was ignored for all these years.) Instead, the article recounts a real-life meeting that took place in Goldman Sachs at the end of 2006. 
The meeting was convened by Goldman CFO, David Viniar, based on some seemingly innocuous happenings. The company had been losing money on its mortgage backed securities for 10 days in a row. The resulting deep-dive into the details of the trades pointed to a sense of unease about the mortgage market. Which then caused Goldman to famously back off from the market.&lt;br /&gt;&lt;br /&gt;I'll leave the reading to you to get more details of what happened. But some thoughts on what contributes to effective risk management practices.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;- A real-life feel for the business.&lt;/span&gt; You can't be just into the models, you need to be savvy enough to understand how the models you build interact with the real world outside. And it is an appreciation of this interaction that causes the hairs to stand on the back of your neck when you encounter something that just doesn't seem right.&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;- Proper role of risk management in the decision making hierarchy.&lt;/span&gt; Effective risk management takes place when the risk governance has the authority to put the brakes on risk takers (i.e., the traders, in this case). In Goldman, there were a number of enablers for this type of interaction to take place effectively. Most importantly, risk management reported to the CFO, i.e. high enough in the corporate hierarchy. Second, investment decisions needed a go-ahead from both the risk takers and risk governance.&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;- Mutual respect between risk governance and risk takers.&lt;/span&gt; Goldman encourages a collaborative style of decision making. This allows multiple conflicting opinions to be present at the table. Minority opinions are encouraged and appreciated. Over time, this fosters a culture that genuinely tolerates dissonance of opinions. 
This also allows the CFO to be influenced by the comptroller group as much as he typically would by the trading group.&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;- Finally, a certain intellectual probity to acknowledge what it does not know or understand. &lt;/span&gt;During the meeting, the Goldman team was not able to pinpoint what their source of unease was. But they were able to honestly admit that they didn't really understand what was going on, but that it was also most appropriate to bring the ship to harbour, given their blindspot about what they didn't know. It takes courage to back-off from an investing decision, saying "I don't understand this well enough" in the alpha-male investment banking culture.&lt;br /&gt;&lt;br /&gt;All in all, a really interesting read.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-2136223314412426303?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/2136223314412426303/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=2136223314412426303' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/2136223314412426303'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/2136223314412426303'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2009/09/case-study-in-risk-management.html' title='A case study in risk management'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-6078115449143209642</id><published>2009-09-10T22:15:00.004-04:00</published><updated>2009-09-10T22:19:42.476-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Macroeconomics'/><category scheme='http://www.blogger.com/atom/ns#' term='Computing'/><title type='text'>Productivity growth and the never-to-return jobs</title><content type='html'>I have talked about the economy in a couple of previous posts. This was here talking about &lt;a href="http://stat-exchange.blogspot.com/2009/06/green-shoots-or-bust.html"&gt;&lt;u&gt;green shoots&lt;/u&gt;&lt;/a&gt;, and about the signs of frailness in the &lt;a href="http://stat-exchange.blogspot.com/2009/06/great-escape-or-great-deception.html"&gt;&lt;u&gt;recovery&lt;/u&gt;&lt;/a&gt;. Over the months of April through August, the life of this blog, news about the economy did seem mixed. The first clear signs that things were beginning to stabilize came around the May timeframe when the drumbeat of negative economic news first started to turn mixed. The jobless claims did not rise as quickly as anticipated, the economy continued to lose jobs but fell off from the rate of close to 0.5 million a month. Around the same time period, existing home sales started to pick up for the first time in more than 2 years and finally in August, the sales pickup translated to rise in prices, for the first time in nearly 2 and a half years.&lt;br /&gt;&lt;br /&gt;Meanwhile, Asia continued to power ahead, creating hope and optimism that it would serve as the engine for the stabilization and subsequent growth of the US economy. But even as sectors such as auto, manufacturing and - in some geographies - retail sales have started to show modest increases, job growth still eludes the economy. 
Quoting the WSJ blog Real Time Economics, a rough sketch of the numbers looks something like this. Average hours worked are declining at an annual rate of nearly 3%, based on quarterly numbers from earlier this year. This is largely driven by the job cuts, but also by anaemic hiring on the part of companies. On the other hand, economic indicators point to GDP growth returning to its cruising rate of about 2-3% a year. The combination of reduced work hours and economic growth translates to positive growth in an interesting metric called Labor Productivity, which is output per hour worked. One can therefore expect a productivity jump of nearly 4-6% in the third quarter. And given that incomes are flat, this is going to be good news for corporate profits. Dow at 11,000 by the end of the year, anyone?&lt;br /&gt;&lt;br /&gt;It has been said earlier that this looks like the famous jobless recovery that everyone fears. Here is my take on what is going on. The slumping economy has given corporations the leeway to embrace job automation and computer-driven efficiency measures in a pretty radical manner. The people getting eased out are the ones who have enjoyed a successful run at holding down 'Old Economy' jobs in a world which doesn't value these jobs any longer. While the housing bubble was on, the inefficiency of these jobs never surfaced. But as corporate bottom lines are exposed, companies are making do with fewer and more talented people: employees who are adept with computers and technology, and who can use its power to ruthlessly drive efficiencies.&lt;br /&gt;&lt;br /&gt;For every one of us, this is a sign of how ephemeral our much 'valued' skills are in today's economic reality. 
A call to action that will be heard by the smart amongst us, but which will also be sadly ignored by many.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-6078115449143209642?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/6078115449143209642/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=6078115449143209642' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/6078115449143209642'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/6078115449143209642'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2009/09/productivity-growth-and-never-to-return.html' title='Productivity growth and the never-to-return jobs'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-1218191303359218479</id><published>2009-09-07T00:00:00.002-04:00</published><updated>2009-09-07T00:03:13.805-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Markets'/><title type='text'>Markets in everything - buying friends</title><content type='html'>You think you are just too anti-social to cut it in the hypernetworked 21st century? Go and buy your own friends! 
That's the latest attempt at creating a market for everything.&lt;br /&gt;&lt;br /&gt;Read this &lt;a href="http://www.wired.com/epicenter/2009/09/not-enough-facebook-friends-buy-them/"&gt;&lt;u&gt;link&lt;/u&gt;&lt;/a&gt;. An Australian company will 'sell' you anywhere from 1,000 to 10,000 friends - for a price, of course. Creeps me out.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-1218191303359218479?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/1218191303359218479/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=1218191303359218479' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/1218191303359218479'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/1218191303359218479'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2009/09/markets-in-everything-buying-friends.html' title='Markets in everything - buying friends'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-5179111954101226843</id><published>2009-09-01T23:41:00.000-04:00</published><updated>2009-09-01T23:42:16.109-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Risk Management'/><category scheme='http://www.blogger.com/atom/ns#' term='Knowledge Workers'/><title 
type='text'>Knowledge-worker roles in the 21st century - 2/2</title><content type='html'>I am now going to talk about the second kind of job that is going to become increasingly attractive for knowledge workers. In discussing the first type of job, I talked about the advances in computing and communication technology that make it extremely attractive for jobs hitherto performed by humans to be transferred to machines. Does this mean that we are all headed into a world depicted in the Matrix or in the Terminator movies?&lt;br /&gt;&lt;br /&gt;I think not. As these jobs get outsourced, I anticipate a blowback where society discovers that there are certain types of jobs that cannot be handled by computers at all. These are tasks where highly interrelated decisions need to be made, and where the decisions themselves have second-, third- and fourth-order implications. Also, the situations are such that these implications cannot be 'hard-coded' but keep evolving at a rate that makes it necessary for the decision maker to not only follow rules but also exercise judgment. These are places where a 'human touch' is required even in a knowledge role. (I say 'even' because knowledge roles by definition should be easier to codify and outsource to computers.)&lt;br /&gt;&lt;br /&gt;One role that is certainly judgment-based is risk management. Risk management is about anticipating and mitigating the different ways in which downside loss can impact a system. Risks can be of two types. First, there are standard 'known' risks whose frequency, pattern of occurrence and downside loss impact are comparatively well-known and therefore easier to plan for and mitigate. Second, there are the unknown risks whose occurrence and intensity cannot be predicted. Now any system needs to be set up (if it wants to survive for the long term, that is) to handle both these types of risks. 
But as you make the system more mechanized to handle the first type of known and predictable risks, it has less ability and flexibility to handle the second 'unknown' type of risk.&lt;br /&gt;&lt;br /&gt;This is where the role of an experienced risk manager comes in. A risk manager typically has a fair amount of experience in his space. Additionally, he has the ability to hold in his head mental models of systems which have multiple interactions and whose impacts span multiple time periods. The role of the risk manager is then to devise a system that works equally effectively against both known and unknown risks. The system needs to be such that standard breakdowns are handled without intervention. At the same time, a dashboard of metrics is created that gives visibility into the fundamental relationships underlying the system. And when the metrics point to the underlying fundamentals being stretched to breaking point, that's the point at which the occurrence of the unexpected risks becomes imminent. The risk manager then steers the system away from the downside implications that can result.&lt;br /&gt;&lt;br /&gt;My role in my industry is a risk management role, and the role has given me the chance to think deeply about risk and failure modes. 
And it certainly seems clear to me that there will always be room for human judgment and skills in this domain.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-5179111954101226843?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/5179111954101226843/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=5179111954101226843' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/5179111954101226843'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/5179111954101226843'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2009/09/knowledge-worker-roles-in-21st-century.html' title='Knowledge-worker roles in the 21st century - 2/2'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-4607944203597416146</id><published>2009-08-09T19:03:00.002-04:00</published><updated>2009-08-09T19:05:30.062-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Outsourcing'/><category scheme='http://www.blogger.com/atom/ns#' term='Computing Infrastructure'/><title type='text'>Knowledge-worker roles in the 21st century - 1/2</title><content type='html'>It is rare to find a piece in the media nowadays that doesn't have a certain "view" on the important social and economic issues of today. 
Underlying every opinion piece is an ideology of some sort. Slavish commitment to the ideology typically results in the writer producing such a biased view that it only appeals to those who already hold the same view. It has become extremely difficult (especially with the evolution of the Internet and with blogs) to reach an informed and balanced view on a subject by referencing an authoritative piece on the subject.&lt;br /&gt;&lt;br /&gt;Which is why I was pleasantly surprised to come across &lt;a href="http://www.washingtonpost.com/wp-dyn/content/article/2009/08/07/AR2009080702043.html"&gt;&lt;u&gt;this piece&lt;/u&gt;&lt;/a&gt; in the Washington Post today. The piece by Gregory Clark, a professor of economics at UC Davis, presents a view that many of us are afraid to admit. And that is that the US will very soon be forced to confront a reality where the technological advances in the economy today create their own Haves and Have-Nots. And the chasm between the Haves and Have-Nots would be so huge and so impossible to bridge that the government will be forced to play an equalizing role, so that the social order remains more or less intact. So how is new technology creating this chasm? More importantly, for me and my readers, what are the kinds of knowledge-worker jobs that are going to be valued in the twenty-first century?&lt;br /&gt;&lt;br /&gt;The last fifty years of the second millennium have been marked by the emergence of the computer. A machine designed to do millions of logical and mathematical operations in a fraction of a second, the computer has now started to take over a vast majority of the computing and logical thinking that human beings would usually perform. With the ability of the computer (through programming languages) to execute long sequences of operations at high speed, the end-result is a powerful "proxy" intelligence that can be harnessed to do both good and harm. 
And this proxy-intelligence is taking the place of traditional intelligence, the role performed by human beings in society. And this intelligence comes without moods or expectations of recognition or praise; in fact, without any of the emotional inconsistencies and quirks shown by human beings. No surprise that many of the front-end business processes involving the delivery of basic and transactional services to consumers are being taken over by the computer (the ATM is one example). With the computer becoming an increasingly integral part of the economy, I see two kinds of jobs that knowledge-workers can embrace in this economy. I am going to cover one of these roles in this post and the second in the next post.&lt;br /&gt;&lt;br /&gt;The first role is that of the accelerator of the increasing automation of simple business processes. The cost benefit of the computer over human beings is obvious; however, for the computer to perform even in a limited way like human beings, detailed instruction sets with logical end-points at each node need to be created. It requires the imagination and creativity of the human mind to do this programming in a really effective manner - i.e. the computer actually being able to do what the human being in the same position would have been able to do. Also, it requires human ingenuity to engineer the machine to do this efficiently - within the desired speed and operating cost constraints. This role of an accelerator or an enabler of the "outsourcing" of hitherto human-performed activity to machines will be increasingly in demand over the next 10-15 years.&lt;br /&gt;&lt;br /&gt;This role will require a unique mix of skills. &lt;span style="font-weight: bold;"&gt;First and foremost, the role requires a detailed understanding of business processes,&lt;/span&gt; the roles played by the various players, and the inputs and outputs at various stages. The business process understanding needs to span multiple companies and industries. 
Let's take something that Clark refers to in his article: a change to a flight reservation. The business process calls for not just access to the reservations database and the flights database, but also things like changing meal options, providing seating information (with information about the aircraft seating chart), reconfirming the frequent flyer account number, etc. Additionally, it calls for providing payment options if there is going to be a fee involved.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Second, the role requires the ability to understand the capabilities of IT platforms and packages to be able to perform the desired function.&lt;/span&gt; This role actually has two components. One is the mapping of human actions into the logic understood by a computer system. The second is the system architecture/engineering side, which is the configuration of the various building blocks (made up of different IT "boxes" delivering different functionality) to create an end-to-end process delivery capability. Given the lack of standards for these types of solutions, any deep skill in this area involves understanding the peculiarities of specific solutions in a great level of detail.&lt;br /&gt;&lt;br /&gt;I'd love to hear more from readers on this. Have you seen these roles emerging in your industry? 
What other types of skills does the enabler or accelerator role need?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-4607944203597416146?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/4607944203597416146/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=4607944203597416146' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/4607944203597416146'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/4607944203597416146'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2009/08/knowledge-worker-roles-in-21st-century.html' title='Knowledge-worker roles in the 21st century - 1/2'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-4186829537955081755</id><published>2009-08-03T23:53:00.003-04:00</published><updated>2009-08-03T23:55:57.953-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Exploratory Data Analysis'/><category scheme='http://www.blogger.com/atom/ns#' term='statistical inference'/><title type='text'>Why individual level data analysis is difficult</title><content type='html'>I recently completed a piece of data analysis using individual level data. 
The project was one of the more challenging pieces of analysis I have undertaken at work, and I was (happily, for myself and everyone else who worked on it) able to turn it into something useful. And there were some good lessons learned at the end of it all, which I want to share in today's post.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;So, what is unique and interesting about individual level data and why is it valuable?&lt;/span&gt;&lt;br /&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt; &lt;/span&gt;- With any dataset that you want to derive insights from, there are a number of attributes about the unit being analyzed (individual, group, state, country) and one or more attributes that you are trying to explain. Let's call the first group predictors and the second group target variables. Individual level data has a much wider range of values for predictor and target variables. There is also a much wider range of interactions between these various predictors. For example, while on average older people tend to be wealthier, individual level data reveals that there are older people who are broke and younger people who are millionaires. As a result of these wide ranges of data and the different types of interactions between these variables (H-L, M-M, M-H, L-H ... you get the picture), it is possible to understand fundamental relationships between the predictors and the targets and interactions between the predictors. Digging a little deeper into the age vs wealth data, what this might tell you is that what really matters for your level of wealth is your education level, the type of job you do, etc. This level of variation is not available with group level data. In other words, group level data is just not as rich.&lt;br /&gt;- Now, along with the upside comes the downside. The richness of the individual level predictors means that the data is occasionally messy. What is messy? 
Messy means having wrong values, and sometimes missing or null values, at an individual level. At a group level, many of the mistakes average themselves out, especially if the errors are distributed evenly around zero. But at the individual level, the likelihood of errors has to be managed as part of the analysis work. With missing data, the challenge is magnified. Is missing data truly missing? Or did it get dropped during some data gathering step? Is there something systematic to missing data, or is it random? Should missing data be treated as missing or should it be imputed to some average value? Or should it be imputed to a most likely value? These are all things that can materially impact the analysis and therefore should be given due consideration.&lt;br /&gt;&lt;br /&gt;Now to the analysis itself. What were some of my important lessons?&lt;br /&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;- Problem formulation has to be crystal clear and that in turn should drive the technique.&lt;/span&gt;&lt;br /&gt;Problem formulation is the most important step of the problem solving exercise. What are we trying to do with the data? Are we trying to build a predictive model with all the data? Are we examining interactions between predictors? Are we studying the relationship between predictors one at a time and the target? All of these outcomes require different analytical approaches. Sometimes, analysts learn a technique and then look around for a nail to hit. But judgment is needed to make sure the appropriate technique is used. The analyst needs to have the desire to learn to use a technique that he/she is not familiar with. 
By the same token, the analyst needs the discipline to use a simpler technique where appropriate.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt; - Spending time to understand the data is a big investment that is completely worth it.&lt;/span&gt;&lt;br /&gt;You cannot spend too much time understanding the data. Let me repeat that for effect. You cannot spend too much time understanding the data. And I have come to realize that far from being a drudge, understanding the data is one of the most fulfilling and value-adding pieces of any type of analysis. The most interesting part of understanding data (for me) is the sheer number of data points that are located so far away from the mean or median of the sample. So if you are looking at people with mortgages and the average mortgage amount is $150,000, the number of cases where the mortgage amount exceeds $1,000,000 lends a completely new perspective on the type of people in your sample.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt; - Explaining the results in a well-rounded manner is a critical close-out at the end.&lt;/span&gt;&lt;br /&gt;The result of a statistical analysis is usually a set of predictors which have met the criteria for significance. Or it could be a simple two variable correlation that is above a certain threshold. But whatever the results of the analysis, it is important to ground them in real-life insights that can be understood by the audience. So, if the analysis reveals that people with large mortgages have a higher propensity to pay off their loans, further clarification will be useful around the income level of these people, their education levels, the types of jobs they hold, etc. 
All these ancillary data points are ways of closing out the profile of the "thing" that has been revealed by the analysis.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt; - Speed is critical to get the results in front of the right people before they lose interest.&lt;/span&gt;&lt;br /&gt;And finally, if you are going to spend a lot of everyone's (and your) precious time doing a lot of the above, the results need to be delivered in extra-short time for people to keep their interest in what you are doing. In today's information-saturated world, it only takes the next headline in the WSJ for people to start talking about something else. So, you need to do the analysis in a smart manner, and it also needs to be super-fast. Delivered yesterday, as the cliche goes.&lt;br /&gt;&lt;br /&gt;In hindsight, it gives me an appreciation of why data analysis or statistical analysis using individual level data is one of the more challenging analytical exercises. 
And why it is so difficult to get it right.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-4186829537955081755?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/4186829537955081755/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=4186829537955081755' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/4186829537955081755'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/4186829537955081755'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2009/08/why-individual-level-data-analysis-is.html' title='Why individual level data analysis is difficult'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-1630365672469752686</id><published>2009-07-21T07:23:00.003-04:00</published><updated>2009-07-21T07:31:43.374-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Data visualization'/><category scheme='http://www.blogger.com/atom/ns#' term='Correlation'/><category scheme='http://www.blogger.com/atom/ns#' term='Causality'/><title type='text'>More data visualization - this time about books</title><content type='html'>Ever wonder where the proof was about reading ..ummm, erotica being bad for you. Here it is. 
Check this &lt;a href="http://booksthatmakeyoudumb.virgil.gr/"&gt;&lt;u&gt;link&lt;/u&gt;&lt;/a&gt; out.&lt;br /&gt;&lt;br /&gt;An interesting study was done that went somewhat like this.&lt;br /&gt;- Get the ten most frequent "favorite books" at every college using the college's Network Statistics page on Facebook. Possibly these books represent the intellectual calibre of the college.&lt;br /&gt;- Get the SAT/ACT scores for those colleges.&lt;br /&gt;- You can now get a relationship between the types of books read and scholastic achievement.&lt;br /&gt;&lt;br /&gt;The results are pretty impressive, though still somewhat dubious. According to the study, Classics are usually good for you (agree with that), while Erotica is way bad. Controversially, so are African-American literature and chick-lit. In the link, check out the visual that stacks the books by genre.&lt;br /&gt;&lt;br /&gt;Make what you want of this, but be careful to distinguish correlation from causality.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-1630365672469752686?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/1630365672469752686/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=1630365672469752686' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/1630365672469752686'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/1630365672469752686'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2009/07/more-data-visualization-this-time-about.html' title='More data visualization - this time about books'/><author><name>Krish 
Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-8187249080842749014</id><published>2009-07-18T15:55:00.006-04:00</published><updated>2009-07-18T16:26:01.617-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Data visualization'/><category scheme='http://www.blogger.com/atom/ns#' term='Exploratory Data Analysis'/><title type='text'>Data visualization</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_nGE-rKiiGEc/SmIvZorNsuI/AAAAAAAACVU/8e3NzvyU7yA/s1600-h/junetemps.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 240px;" src="http://4.bp.blogspot.com/_nGE-rKiiGEc/SmIvZorNsuI/AAAAAAAACVU/8e3NzvyU7yA/s320/junetemps.gif" alt="" id="BLOGGER_PHOTO_ID_5359898623986217698" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;An example of a really well-done graphic is from the NOAA website. Science and particularly math aficionados seem to have a particular affinity for following weather science. (I am wondering whether it is a visceral reaction to global warming naysayers who, the scientists think, are possibly insulting their learning.)&lt;br /&gt;&lt;br /&gt;The graph is a world temperature graph, and this type of graph has come in so many different forms that it is difficult not to have seen one.  
What I like about this is the elegant and non-intrusive form in which the overlays are done.&lt;br /&gt;• By using dots and varying the size of the dots, the creator of the graph is making sure that the underlying geographic details still come through. (This is important in a world map, where great detail needs to be captured in a small area and you therefore cannot use very thick lines for country borders.)&lt;br /&gt;• The other thing that I liked is some of the simplifications the creator has made. The dots are equally spaced but I am pretty sure that’s not exactly how the data was gathered. But to tell the story, that detail is not as important.&lt;br /&gt;&lt;br /&gt;The graphic came from Jeff Masters' weather blog which is one of the best of its kind. Here's a &lt;a href="http://www.wunderground.com/blog/JeffMasters/show.html"&gt;&lt;u&gt;link&lt;/u&gt;&lt;/a&gt; if you are interested.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-8187249080842749014?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/8187249080842749014/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=8187249080842749014' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/8187249080842749014'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/8187249080842749014'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2009/07/data-visualization.html' title='Data visualization'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_nGE-rKiiGEc/SmIvZorNsuI/AAAAAAAACVU/8e3NzvyU7yA/s72-c/junetemps.gif' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-8947075886156966096</id><published>2009-07-15T23:01:00.004-04:00</published><updated>2009-07-15T23:47:43.838-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Technology'/><category scheme='http://www.blogger.com/atom/ns#' term='Physics'/><title type='text'>Two great finds for physics fans</title><content type='html'>Back after a long break in the posts. Call it a mixture of home responsibilities, writer's block and just some plain old laziness.&lt;br /&gt;&lt;br /&gt;One of my other interests (apart from statistics and social science) is physics and technology. I really enjoy reading about emerging applications of technology in various spheres of social and economic importance. The Technology Quarterly of the Economist is one of my treasured reads (though I end up reading very little of it, because I keep wanting to set aside "quality time" to do the reading).&lt;br /&gt;&lt;br /&gt;I want to share two recent finds in the science and physics space. One is a really good book called "&lt;a href="http://www.amazon.com/Great-Equations-Breakthroughs-Pythagoras-Heisenberg/dp/039306204X"&gt;&lt;u&gt;The Great Equations&lt;/u&gt;&lt;/a&gt;" by Robert Crease. The book covers ten of the seminal equations in physics and spins a story around how the formulator of each equation came to create it. 
There is usually a little mathematical proof behind the story, but most of the book is about the professional journey made by the scientist from an existing view of the world (or an older paradigm, to be more exact) to a new paradigm. And the paradigm is usually encapsulated in the form of an equation.&lt;br /&gt;&lt;br /&gt;I found a couple of aspects about the journey extremely interesting. One, it was fascinating to have a window into the minds of physics greats (Newton, Maxwell, Einstein, Schrodinger, to name a few) and see how they synthesized the various different world views around them to create or arrive at their respective equations. &lt;span style="font-weight: bold; font-style: italic;"&gt;The ability to deal with all the complexity of observed phenomena, the different philosophies and world views and to come up with something as elegant as a great equation&lt;/span&gt;, that defines genius for me. The second aspect that I found extremely interesting was that there were usually years and years of experimentation or mathematical work that preceded arriving at the great equation. One might be inclined to think that the great equations (given their utter simplicity) happen through a flash of inspiration. Nothing could be further from the truth.&lt;br /&gt;&lt;br /&gt;The next find was the Feynman lectures. Now, many of us have read some of the Feynman lectures or have seen the lectures on a place like YouTube. But how cool would it be to have these lectures be annotated by Bill Gates? Check this &lt;a href="http://research.microsoft.com/apps/tools/tuva/index.html"&gt;&lt;u&gt;link&lt;/u&gt;&lt;/a&gt; out at the Microsoft Research website. And happy watching!&lt;br /&gt;&lt;br /&gt;I am guessing this blog has a fair share of aspiring or one-time physics and engineering fans. How do you keep your engineering bone tickled? 
I'd love to hear your pet indulgences.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-8947075886156966096?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/8947075886156966096/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=8947075886156966096' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/8947075886156966096'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/8947075886156966096'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2009/07/two-great-finds-for-physics-fans.html' title='Two great finds for physics fans'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-7731143120899794129</id><published>2009-07-08T00:02:00.003-04:00</published><updated>2009-07-08T00:10:36.902-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='credit downturn'/><category scheme='http://www.blogger.com/atom/ns#' term='Macroeconomics'/><title type='text'>Market chills</title><content type='html'>I have argued in a number of recent posts: &lt;a href="http://stat-exchange.blogspot.com/2009/06/great-escape-or-great-deception.html"&gt;&lt;u&gt;here&lt;/u&gt;&lt;/a&gt;, &lt;a 
href="http://stat-exchange.blogspot.com/2009/06/green-shoots-or-bust.html"&gt;&lt;u&gt;here&lt;/u&gt;&lt;/a&gt; and &lt;a href="http://stat-exchange.blogspot.com/2009/05/macro-economic-indicators-good-source.html"&gt;&lt;u&gt;here&lt;/u&gt;&lt;/a&gt; that we are nowhere close to the bottom when it comes to this economic downturn. The jobless numbers are back to sliding downwards at an accelerated pace after one month of deceleration.&lt;br /&gt;&lt;br /&gt;And the markets seem to have caught the chills.&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_nGE-rKiiGEc/SlQbiNBiBYI/AAAAAAAACU8/34esZagnft4/s1600-h/Sp.png"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer; width: 251px; height: 141px;" src="http://1.bp.blogspot.com/_nGE-rKiiGEc/SlQbiNBiBYI/AAAAAAAACU8/34esZagnft4/s320/Sp.png" alt="" id="BLOGGER_PHOTO_ID_5355936131275949442" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;We discussed this at work a few months back. Someone who is very well-respected in banking circles and who has seen a few past recessions called out that you can tell that a recovery is underway when there is a sustained period where the indicators yo-yo between good and bad news. 
We seem to be entering this phase now.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-7731143120899794129?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/7731143120899794129/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=7731143120899794129' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/7731143120899794129'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/7731143120899794129'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2009/07/market-chills.html' title='Market chills'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_nGE-rKiiGEc/SlQbiNBiBYI/AAAAAAAACU8/34esZagnft4/s72-c/Sp.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-5093194798297067914</id><published>2009-07-03T19:47:00.003-04:00</published><updated>2009-07-03T19:52:44.356-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Research'/><category scheme='http://www.blogger.com/atom/ns#' term='Modeling'/><title type='text'>Best Coffee Survey and a research methodology question</title><content type='html'>A recent Zagat survey rated the best coffee in the US. The best coffee rating went (expectedly, I guess) to Starbucks. 
Even though I have had better coffee at other places, I guess Starbucks combines great coffee with ubiquitous presence and therefore ends up getting the top rating. Now, I think Starbucks coffee is good and the baristas are extremely friendly, but in terms of pure coffee flavour, I would rate Panera's Hazelnut coffee higher. Some of the Kona coffees that you find at places like WaWa are also really good. Any kind of place serving Jamaica's Blue Mountain coffee will obviously be great. So what makes Starbucks special? Are there other factors at play beyond the pure taste of the coffee?&lt;br /&gt;&lt;br /&gt;One hypothesis is that the national-level presence of Starbucks could be contributing towards the voting going for Starbucks. In places where Starbucks has to compete with other chains like Peets (San Francisco) and Dunkin Donuts (New England), comparative ratings between Starbucks and other chains show a narrower gap. In places where Starbucks has no competition, however, it is likely to get disproportionately good ratings.&lt;br /&gt;&lt;br /&gt;Let us say you are one of the contributors in the survey and are in St. Louis, MO. The competition for Starbucks in St. Louis is likely to be (I guess) the burnt robusta coffee at the local restaurant. In such a market, Starbucks will enjoy a clear advantage, both for the quality of the coffee as well as the ambience. So, let's say, you had to rate Starbucks on a scale of 1-5. It is likely you would give Starbucks a 4-5 in a non-competitive market, such as St. Louis, in the absence of valid benchmarks or competition to compare against. In a competitive market dominated by multiple brands, the difference between Starbucks and other brands is likely to be narrower. 
Also, the assertion can be made that a more discerning audience (having had the opportunity to sample multiple chains) is less likely to give extremely high scores (4s and 5s) to any of the choices under consideration.&lt;br /&gt;&lt;br /&gt;Therefore, the sampling design and the analysis methodology become extremely critical for surveys like this. To avoid the "no-competition" bias, there are a number of questions a market research analyst would need to ask herself:&lt;br /&gt;1. Should we use only data points from places where there are multiple chains in the same geography? (Doesn't sound fair. We will be throwing away data, which a lot of sensible people have explained is a cardinal sin. We should probably weight the information in some way).&lt;br /&gt;2. Should we consider data for the analysis only where a person has provided ratings about multiple chains voluntarily or penalize when people have not rated a chain that could have been rated?&lt;br /&gt;3. Or are there modeling solutions available to manage this conundrum? 
Topic of my next post!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-5093194798297067914?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/5093194798297067914/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=5093194798297067914' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/5093194798297067914'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/5093194798297067914'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2009/07/best-coffee-survey-and-research.html' title='Best Coffee Survey and a research methodology question'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-460712333126619150</id><published>2009-06-28T00:31:00.003-04:00</published><updated>2009-06-28T00:36:58.691-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Research'/><category scheme='http://www.blogger.com/atom/ns#' term='Prioritization'/><title type='text'>Research funding or Why we have still haven't found a cure for cancer</title><content type='html'>Cancer has been known to medicine since the time of Hippocrates. And modern medicine has known and studied the causes of cancer since the mid-18th century. 
In 1971, Nixon announced a project to create a cure for cancer and (a la Kennedy, with regard to the moon mission) announced a definite cure within the next five years. Today, nearly 40 years and $105 billion of public investment later (the private investment can be considered to be at least a significant fraction of the public investment), we are no closer to finding a cure. In fact, after adjusting for age and size of the population, the cancer death rate has &lt;a href="http://www.nytimes.com/2009/04/24/health/policy/24cancer.html"&gt;&lt;u&gt;dropped&lt;/u&gt;&lt;/a&gt; by only 5% in the last 50 years. Compare this with the nearly 60% drop in the death rates of diseases like pneumonia and influenza. Why is this the case?&lt;br /&gt;&lt;br /&gt;Part of the reason is that cancer has multiple causes and we are not really sure about the true causal linkage between the various factors and the cancer cells misbehaving. Environmental factors cause some types, exposure to radioactivity causes other types, tobacco is a well-known factor causing mouth and lung cancer and there are viruses that cause still other types. The common thread linking all of these causal factors and the various different types of cancers they cause is difficult to isolate. And therefore while we continue to make some improvements around the margin (getting people to live for a few additional months or years), a true cure has been elusive.&lt;br /&gt;&lt;br /&gt;But another likely cause is the way in which various research funding agencies have made investment prioritization decisions. The funds have invariably gone to small-budget, incremental improvement type projects which are usually along previously established avenues of inquiry. The truly innovative approaches and especially the risky (from a success of the project standpoint) proposals have seldom obtained funding. The processes developed to identify research subjects have been good at avoiding funding truly bad research. 
However, by the same token, they have continued to fund projects that are conventional and low risk and, as a result, contribute only marginal improvements. A &lt;a href="http://www.nytimes.com/2009/06/28/health/research/28cancer.html"&gt;&lt;u&gt;recent article&lt;/u&gt;&lt;/a&gt; in the New York Times sheds some more light on this topic.&lt;br /&gt;&lt;br /&gt;My view is that this is quite a common problem (sub-optimal funds allocation) when funds are limited. This is not only true for cancer research in particular or any other form of medical research in general, but even in the financial services industry that I am part of. The funds allocation agency feels pressure not to waste the limited funds and also to make sure that the maximum number of research projects get the benefit of the funds. Hence the push to fund projects that are from proven areas and are set up to make incremental improvements to those areas. Also, this leads to a tendency to parcel the funds and distribute small quantities into a large number of projects, while what they should paradoxically be doing (given the shortage of funds) is to make the bold bet and fund those areas which may not be as proven but show the highest promise for overall success. Again, this happens more commonly than just in the field of cancer research.&lt;br /&gt;&lt;br /&gt;Challenging the financial budget and the status-quo way of thinking is not easy to do. There will be people who will say No and be discouraging, rarely because they have something to lose but mostly because the tendency is to play safe. People usually do not get fired for taking the safe, conventional-wisdom driven decisions. 
It is the risk-takers that get panned if the risks do not play out as expected.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-460712333126619150?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/460712333126619150/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=460712333126619150' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/460712333126619150'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/460712333126619150'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2009/06/research-funding-or-why-we-have-still.html' title='Research funding or Why we have still haven&apos;t found a cure for cancer'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-1719062149762278250</id><published>2009-06-25T22:33:00.003-04:00</published><updated>2009-06-25T22:41:35.457-04:00</updated><title type='text'>Why I blog</title><content type='html'>I have been regular at maintaining the blog for the past month or so. My feelings have been mixed so far. On the one hand, it is an effort to keep up the writing effort day after day. One of my goals is to make sure that the blog remains fresh for future readers. 
And the freshness of the blog is totally a function of keeping up the effort of adding new and interesting material. With my day job and with the challenges of keeping up with the ever growing demands of our 5-month old, writing is not always easy.&lt;br /&gt;&lt;br /&gt;But being a glass-half-full kind of guy, what has this exercise brought me?&lt;br /&gt;&lt;br /&gt;For starters, it has got me to start writing again. I am a firm believer that writing is a great way of organizing your thoughts and making them more logical and structured. And it is a habit that I had at one time, lost at some point and am keen on regaining. Communication is an important skill in today's world. With all the clutter, media generated noise, terabytes of data and messages flowing back and forth, the Internet driven distractions, it is important to cut through the clutter and reach out powerfully with one's words to make a difference. Gandhi had a difficult enough time getting his word out to millions of his countrymen and getting them united against the British. But that was close to a century back. Imagine Obama's difficulty in getting his thoughts out to people in today's hyper-information age. And the way you get better at communication is by keeping at it through weekdays and weekends, through work deadlines and daughter's shrieks of excitement. Hopefully, I am getting better at this stuff.&lt;br /&gt;&lt;br /&gt;The other big positive for me is that I am beginning to learn a lot more and at a much faster pace on my professional interest, math models and statistical inference. As Hal Varian, chief economist at Google has &lt;a href="http://www.mckinseyquarterly.com/Hal_Varian_on_how_the_Web_challenges_managers_2286"&gt;&lt;u&gt;famously remarked&lt;/u&gt;&lt;/a&gt;, the statistician job is going to be the sexy job for the next ten years. 
And this field is evolving so rapidly that it is extremely critical to keep updating one's knowledge and skills and remaining ahead of the curve. In order for me to provide a stream of meaningful material for the audience of this blog, I have had to spend a good amount of my time reading and updating my own knowledge base. Just last morning, I managed to read an interesting article on multi-level modeling. This took me to a &lt;a href="http://www.cmm.bristol.ac.uk/index.shtml"&gt;&lt;u&gt;web-site&lt;/u&gt;&lt;/a&gt; dedicated to multi-level modeling at the University of Bristol. And the lecture notes in turn made me aware of some of the ways I could tackle some ticklish problems at work. (Look at this &lt;a href="http://www.cmm.bristol.ac.uk/learning-training/videos/index.shtml"&gt;&lt;u&gt;really cool lecture&lt;/u&gt;&lt;/a&gt;. It is a video link and needs Internet Explorer as the browser.) I have become much more aware of the latest problems and solution kits out there in the last month, than what I learnt in the past several years. A huge plus for me.&lt;br /&gt;&lt;br /&gt;So all-in-all, I am hoping to learn something out of all this and make at least a fractional improvement to what I want to add to the world. And hopefully keep my audience engaged and interested in the stuff I write. My writings are clearly not meant for the masses, I don't have any such hopes! The people who are likely to like my writing are going to be similar to me: numbers-obsessed, math-loving and tech savvy geeks. 
And if I can make a fraction of a difference to my readers as this exercise is making for me, I will be a happy blogger.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-1719062149762278250?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/1719062149762278250/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=1719062149762278250' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/1719062149762278250'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/1719062149762278250'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2009/06/why-i-blog.html' title='Why I blog'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-8030017127469499177</id><published>2009-06-22T22:57:00.004-04:00</published><updated>2009-06-22T23:05:00.799-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Macroeconomics'/><category scheme='http://www.blogger.com/atom/ns#' term='Recession'/><title type='text'>... the Great Escape? 
Or the Great Deception?</title><content type='html'>In an earlier &lt;a href="http://stat-exchange.blogspot.com/2009/06/green-shoots-or-bust.html"&gt;&lt;u&gt;post&lt;/u&gt;&lt;/a&gt;, I commented on the now famous "Green Shoots" of recovery but the very real long term threats to continued economic growth. It turns out that the "so-called" economic recovery seems to be more of a financial market recovery. Conventional wisdom goes that the financial markets turnaround precedes the real economy turnaround by about 6 months. Early signs did point to this phenomenon. Market indices in both emerging markets and the developed markets showed smart 30%+ growths in the last 3 months. Corporate bond offerings began to surge and even below investment grade offerings jumped up (and were well subscribed) in June.&lt;br /&gt;&lt;br /&gt;However, some temperance seems to have set in of late. Emerging market indices like the Sensex and the Hang Seng are at least about 10-15% down from their early June peaks. Likewise with the DJIA. The steady upward trend seen for the best part of the last 8 weeks seems to have been interrupted. The yield on 10-year US treasuries had gone up to nearly 4% but has now trended back down to about 3.5%, basically signalling that everything is not as hunky-dory as we expected. There is still a high demand for quality (the irony of it all is that quality is denoted by US treasuries!). The &lt;a href="http://www.economist.com/businessfinance/displaystory.cfm?story_id=13871130"&gt;&lt;u&gt;Economist&lt;/u&gt;&lt;/a&gt; states that not all economic indicators have magically turned positive, which is what one would expect if the markets and the media are to be believed. According to the Economist,&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;&lt;span style="font-style: italic;"&gt;The June Empire State survey of manufacturing activity in New York showed a retreat. German export figures for April showed a 4.8% month-on-month fall. 
The latest figures for American and euro-zone industrial production showed similar dips. American raw domestic steel production is down 47% year on year; railway traffic in May was almost a quarter below its level of a year earlier. Bankers say that chief executives seem a lot less confident about the existence of “green shoots” than markets are.&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;We shouldn't be either. For a bunch of reasons.&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;1. Losses are nowhere close to bottoming out.&lt;/span&gt; Large credit defaults amongst corporates are expected to be higher than 11% for 2009 and to remain there for 2010.&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;2. At the individual level, unemployment is showing no signs of abating.&lt;/span&gt; There was a good article in the Washington Post today on how the economic recovery seems to be taking place in the absence of jobs. Check this &lt;a href="http://www.washingtonpost.com/wp-dyn/content/article/2009/06/21/AR2009062101859.html"&gt;&lt;u&gt;link&lt;/u&gt;&lt;/a&gt; out. Unemployment is expected to be north of 10% and remain there for a good part of 2009 and into 2010. Unemployment is closely linked with the consumer confidence number and therefore any sluggishness in the job market is going to impact consumer spending and therefore further impact the rate of recovery of the economy.&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;3. Emerging markets were the promised land for the world economy, but not any longer.&lt;/span&gt; The markets don't seem to think so however. Indian economic growth is expected to be the slowest in the past 6 years. With much more fragile safety nets in the Asian economic tigers, these economies are going to be even more careful while navigating out of the downturn.&lt;br /&gt;&lt;br /&gt;In short, a long haul seems clear. What also seems clear is a fundamental remaking of industries as a whole. 
Financial services, automobiles and potentially health-care are industries where a new business model is ripe for discovery. This should create many more opportunities for the data scientist, the topic of my next post.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-8030017127469499177?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/8030017127469499177/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=8030017127469499177' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/8030017127469499177'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/8030017127469499177'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2009/06/great-escape-or-great-deception.html' title='... the Great Escape? 
Or the Great Deception?'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-7165469950138113193</id><published>2009-06-20T07:09:00.006-04:00</published><updated>2009-06-20T07:43:50.161-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Simulations'/><category scheme='http://www.blogger.com/atom/ns#' term='Decision Making'/><category scheme='http://www.blogger.com/atom/ns#' term='Stress Testing'/><category scheme='http://www.blogger.com/atom/ns#' term='Modeling'/><title type='text'>Monte Carlo simulations gone bad</title><content type='html'>In my &lt;a href="http://stat-exchange.blogspot.com/2009/06/stress-testing-your-model-part-33.html"&gt;&lt;u&gt;series&lt;/u&gt;&lt;/a&gt; on stress testing models, I concluded with Monte Carlo simulations as a way of understanding the set of outcomes a model can produce and being able to handle a wide set of inputs without breaking down. However, Monte Carlo simulations can be done in ways that at best, are totally useless and at worst, can produce highly misleading outcomes. I want to discuss some of these breakdown modes in this post.&lt;br /&gt;&lt;br /&gt;So, (drumroll), top Monte Carlo simulation fallacies I have come across.&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;1. Assuming all of the model drivers are normally distributed&lt;/span&gt;&lt;br /&gt;Usually the biggest fallacy of them all. I have seen multiple situations where people have merrily assumed that all drivers are normally distributed and hence can be modeled as such. 
For most phenomena in nature (heights and weights of human beings, sizes of stars), it is fair to expect and find distributions that are normal or close to normal. However, not so with business data. Because of the influence of human beings, business data tends to get pretty severely attenuated at places and stretched out at some other places. Now, there are a number of other important distributions to consider (which will probably form part of another post sometime), but assuming all distributions are normal is pure bunkum. But this is usually a rookie mistake! Move on to ...&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;2. Ignoring the probabilities of extreme tail events&lt;/span&gt;&lt;br /&gt;Another quirk of business events is the size and frequency of tail events. Tail events astound us frequently with both their size and their frequency. Just when you thought Q4 08's GDP drop of close to 6% was a once-a-100-years event, it then goes and repeats itself in the next quarter. Likewise with 10% falls in market cap in a day. Guess what you see the next trading day! The short advice is: be very afraid of things that happen in the tails. Because these events occur so infrequently, distributions are usually misleading in this space. So if you are expecting your model to tell you when things go bump in the night, you will be in for a rude shock when they actually go bump. But why go to the tails when there are bigger things that lurk in the main body of the distribution, such as...&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;3. Assuming that model inputs are independent&lt;/span&gt;&lt;br /&gt;Again, this is another example of a lazy assumption. People make these assumptions because they are obsessed with the tool at hand and its coolness-coefficient and cannot be bothered to use their heads and use the tool to solve the problem at hand. I am going to have a pretty big piece on lazy assumptions soon. (One of my favourite soap-box items!) 
When people run Monte Carlo simulations, the assumptions and inputs to the model are usually correlated with each other to different degrees. This means that the distribution of outcomes that you get at the end is going to be crunched together (probability-density wise) at some places and sparse at some other places. But assuming a perfectly even distribution on either side of the mean is really not the goal here. The goal is to get as close an approximation of real-life distributions as possible. But then if only things were that simple! Now, you could be really smart and get all of the above just right and build a really cool tool. You could then get into the fourth fallacy of thinking ...&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;4. That it is about the distribution or the tool, it is NOT! It is about what you do with the results of the analysis&lt;/span&gt;&lt;br /&gt;The Monte Carlo simulation tool is indeed just that, a tool. The distributions produced at the end of running the tool are not an end in themselves, they are an aid to decision making. In my experience, a well-thought-out decision making framework needs to be created to make use of the distribution outputs. The decision-making framework could go something like this. Let's take a framework to evaluate investment decisions, that uses NPV. One framework could be: I will make the investment only if a.) the mean NPV I can make is positive, and b.) fewer than 20% of the outcomes have a negative NPV, and c.) fewer than 5% of the outcomes have an NPV below -$50 million. There's really no great science in coming up with these frameworks, but it has to be something that the decision maker is comfortable with and it should address uncertainty in outcomes.&lt;br /&gt;&lt;br /&gt;So, have you come across some of these fallacies in your work? How have you seen the Monte Carlo tool used and misused in your work? 
And what decision-making frameworks (if any) were allied with this tool to drive good decisions?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-7165469950138113193?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/7165469950138113193/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=7165469950138113193' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/7165469950138113193'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/7165469950138113193'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2009/06/monte-carlo-simulations-gone-bad.html' title='Monte Carlo simulations gone bad'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-195773421647549839</id><published>2009-06-17T22:33:00.005-04:00</published><updated>2009-06-18T08:34:02.966-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='bias'/><category scheme='http://www.blogger.com/atom/ns#' term='statistical inference'/><title type='text'>Why sitting down and talking does not help - (and what it means for the data scientist)</title><content type='html'>What is the sane person's advice to two people who cannot agree on something? It is usually to sit down and resolve their differences. 
A set of recent studies seems to suggest that it doesn't really help.&lt;br /&gt;&lt;br /&gt;These studies have shown that when people with strong opposing positions are put together to talk it out, it makes them even more entrenched in their opinions. This is the point put forward by Cass Sunstein in his book &lt;a href="http://www.amazon.com/Going-Extremes-Minds-Unite-Divide/dp/0195378016"&gt;&lt;u&gt;Going to Extremes&lt;/u&gt;&lt;/a&gt;. Even when the groups or people with opposing views are presented with objective evidence, they tend to "see" what they want to believe in the data and ignore the rest.&lt;br /&gt;&lt;br /&gt;Another &lt;a href="http://www.economist.com/sciencetechnology/displayStory.cfm?story_id=13815141"&gt;&lt;u&gt;study&lt;/u&gt;&lt;/a&gt; cited by the Economist struck a similar note. The study looked at self-help books that stress positive thinking, and their impact on people. What the study found was that positive thinking only helps people who are predisposed to thinking positively. The study can be credited to Joanne Wood of the University of Waterloo in Canada and her colleagues. The researchers report in the journal Psychological Science that when people with high self-esteem are made to repeat positive reinforcing messages, they do tend to take more positive positions (on standardized tests) than people who do not repeat positive reinforcing messages.&lt;br /&gt;&lt;br /&gt;So far so good. It sounds as though positive reinforcement helps. But when the test was done on people with low self-esteem, the results were quite the opposite. People who repeated the positive reinforcing message took less positive positions than the ones who did not repeat the message. So it seems to imply that positive reinforcement actually hurts when applied to people who are inclined to believe otherwise. To me, this sounds like another example of people entrenching towards their own biases. 
When people with entrenched positions are forced to take a contrary position (or look at objective data), they tend to entrench even further in their original positions.&lt;br /&gt;&lt;br /&gt;So what are the implications for the data scientist from all of this?&lt;br /&gt;Mostly that predisposed positions produce a sort of "blindness" to objective data. We all suffer from confirmation bias; we like to believe what we like to believe. It therefore takes great effort for us to look at data objectively and take what the data is telling us, rather than putting on the spin that suits us. The data scientist needs to exercise tremendous discipline here. It takes a superhuman effort not to succumb to our biases, not to simply believe what we want to believe, and to take a genuine interest in forming an objective opinion.&lt;br /&gt;&lt;br /&gt;One of the bigger lessons for me in all of this is the importance of give-and-take in making progress on any issue. Because of the entrenchment bias, people seldom change their views (i.e., come around to your way of thinking) based on objective data and logical persuasion. They only come along when they have skin in the game, and for that to happen there has to be an active element of give-and-take between the two parties. Which makes me even more admiring of GOOD politicians and diplomats. 
Their ability to keep moving forward on an issue in a bipartisan manner comes out of their skill in give-and-take, and thereby overcoming the entrenchment bias.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-195773421647549839?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/195773421647549839/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=195773421647549839' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/195773421647549839'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/195773421647549839'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2009/06/why-sitting-down-and-talking-does-not.html' title='Why sitting down and talking does not help - (and what it means for the data scientist)'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-9105809079475550484</id><published>2009-06-14T18:20:00.006-04:00</published><updated>2009-06-14T18:57:42.254-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Computing'/><title type='text'>Connecting to the data centers - Netbooks</title><content type='html'>An apt follow-up to the &lt;a href="http://stat-exchange.blogspot.com/2009/06/massive-cloud-of-1s-and-0s.html"&gt;post on data centers&lt;/a&gt; should be on the 
evolving tools to access the data centers. I went to Costco this morning and came across netbooks. These are just stripped-down laptops (or notebooks, if you will) that are perfect for accessing the Internet and getting your work done. Quite the rage with the college crowd, apparently.&lt;br /&gt;&lt;br /&gt;This is an attempt by the computer hardware industry to break the price barrier on portable computers. Prices used to be about $1500 &lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_nGE-rKiiGEc/SjV-xc8rw6I/AAAAAAAACU0/FofTLmyt9_E/s1600-h/netbooks.sidebyside.520w.jpg"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer; width: 253px; height: 150px;" src="http://4.bp.blogspot.com/_nGE-rKiiGEc/SjV-xc8rw6I/AAAAAAAACU0/FofTLmyt9_E/s320/netbooks.sidebyside.520w.jpg" alt="" id="BLOGGER_PHOTO_ID_5347319520621740962" border="0" /&gt;&lt;/a&gt;and came down to about $1000 four to five years back. But then the barrier stayed there for a while, with manufacturers adding feature on top of feature but refusing to reduce the price. Till netbooks came along. These devices are priced at about $200-$350 and are pretty minimalist in their design. They have a fairly robust processor and a good-sized keyboard and screen. No CD drive, for only Neanderthals use a CD. But they come loaded with things like a webcam, WiMax, etc. Two parallel phenomena drove the evolution of the netbook idea. One was the high-profile $100 laptop for third-world kids, which really didn't go anywhere. The other was the increase in broadband penetration in the United States.&lt;br /&gt;&lt;br /&gt;Another driver (probably) is the coming of age of the Millennial generation. When I grew up, the cool computer company of our times was Microsoft (or Apple, if you hated Microsoft). Both these companies had built their business models on paid products, products that needed upgrades and that cost money. 
We therefore had a certain reverence towards these companies and an implicit acceptance of their pay-for-use business model.&lt;br /&gt;&lt;br /&gt;Today's generation has come of age in the era of Google, Linux, Napster and social networking sites, all of which are free. Today's kids feel less beholden to the idea of a computer company putting out formal products that you need to pay for, and that get upgraded once every two years, at which point you need to pay again. In today's age, the idea of freeware and products that actively evolve with use is becoming more and more accepted. Ergo, the netbook.&lt;br /&gt;&lt;br /&gt;Enough of my pop-psychology for now. Anyway, netbooks are really cool gadgets and I am tempted to get one really soon. The Economist had &lt;a href="http://www.economist.com/business/displayStory.cfm?story_id=13832588&amp;amp;source=hptextfeature"&gt;a good article&lt;/a&gt; on the subject. Let me know if you are an early adopter of netbooks and what your experience has been so far.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-9105809079475550484?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/9105809079475550484/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=9105809079475550484' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/9105809079475550484'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/9105809079475550484'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2009/06/connecting-to-data-centers-netbooks.html' title='Connecting to the data centers - 
Netbooks'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_nGE-rKiiGEc/SjV-xc8rw6I/AAAAAAAACU0/FofTLmyt9_E/s72-c/netbooks.sidebyside.520w.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-6361670094171696751</id><published>2009-06-12T23:15:00.005-04:00</published><updated>2009-06-13T17:25:23.737-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Simulations'/><category scheme='http://www.blogger.com/atom/ns#' term='Stress Testing'/><category scheme='http://www.blogger.com/atom/ns#' term='Modeling'/><title type='text'>Stress testing your model - Part 3/3</title><content type='html'>We discussed two techniques of ensuring the robustness of models in two previous posts. In the first post, we discussed &lt;a href="http://stat-exchange.blogspot.com/2009/06/stress-testing-your-model-part-13.html"&gt;out-of-sample validation&lt;/a&gt;. In the second post, we discussed &lt;a href="http://stat-exchange.blogspot.com/2009/06/stress-testing-your-model-part-23.html"&gt;sensitivity analysis&lt;/a&gt;. I find sensitivity analysis to be a really valuable technique for ensuring the robustness of model outputs and decisions driven by models - but only when it is done right.&lt;br /&gt;&lt;br /&gt;Another, more computing-intensive technique for ensuring the robustness of model output is Monte Carlo simulation. Monte Carlo simulation basically involves running the model literally thousands of times, changing each of the inputs a little with every run. 
With advances in computing power, and with that power now within reach of most modelers and researchers, it has become fairly easy to set up and run the simulation.&lt;br /&gt;&lt;br /&gt;So let's say we have a model with 3 inputs. And now let's assume that each input is varied in 10 steps over its entire valid range. The model will then produce 1000 different outputs for the various combinations of input values (1000 = 10 x 10 x 10), each output having a theoretical probability of 0.001 if every step is equally likely and the inputs are independent.&lt;br /&gt;&lt;br /&gt;How are the inputs varied?&lt;br /&gt;Typically using a distribution that varies the inputs in a probabilistic manner. The input distribution is the most important assumption that goes into the Monte Carlo simulation. The typical approach is to assume that most events are normally distributed. But the reality is that the normal distribution is usually observed only in natural phenomena. In most business applications, distributions are usually skewed in one direction. (Take loan sizes on a financial services product, like a credit card. The distribution is always skewed towards the higher side, as balances cannot be less than zero but can take really large positive values.)&lt;br /&gt;&lt;br /&gt;Correlation or covariance of the inputs&lt;br /&gt;In a typical business model, inputs are seldom independent; they have various degrees of correlation. It is important to keep this correlation in mind while running the scenarios. By factoring in the covariance of inputs explicitly while running the simulation, the output is probabilistically weighted towards the results that occur when the inputs move together as they do in reality.&lt;br /&gt;&lt;br /&gt;Of course, as with any piece of modeling, there are ways in which this technique can be misused. 
Some of my pet gripes about MC simulation will form the subject of a later post.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-6361670094171696751?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/6361670094171696751/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=6361670094171696751' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/6361670094171696751'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/6361670094171696751'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2009/06/stress-testing-your-model-part-33.html' title='Stress testing your model - Part 3/3'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-8833075261462301471</id><published>2009-06-10T21:56:00.005-04:00</published><updated>2009-06-10T22:07:39.682-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Data Centers'/><category scheme='http://www.blogger.com/atom/ns#' term='Computing Infrastructure'/><category scheme='http://www.blogger.com/atom/ns#' term='Computing'/><title type='text'>The massive cloud of 1s and 0s</title><content type='html'>Read an &lt;a href="http://www.nytimes.com/2009/06/14/magazine/14search-t.html?ref=magazine&amp;amp;pagewanted=all"&gt;interesting article&lt;/a&gt; 
in the New York Times about the growth in data centers as we become an increasingly internet-based world. Some of the numbers around &lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_nGE-rKiiGEc/SjBmlrQJqOI/AAAAAAAACUs/7rYnVaffdmU/s1600-h/data-center-t01.jpg"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer; width: 229px; height: 172px;" src="http://2.bp.blogspot.com/_nGE-rKiiGEc/SjBmlrQJqOI/AAAAAAAACUs/7rYnVaffdmU/s320/data-center-t01.jpg" alt="" id="BLOGGER_PHOTO_ID_5345885555140438242" border="0" /&gt;&lt;/a&gt;the data center capacities at places like Microsoft and Google, and at e-commerce or bidding sites like Amazon and eBay, were quite astounding. Not to mention the various electronic financial exchanges in the world.&lt;br /&gt;&lt;br /&gt;Microsoft has more than 200,000 and its massively bigger competitor has to have more. And this massive data center infrastructure is already beginning to have a serious impact on the power requirements of our new world. More power for our data centers so that we can upload pictures of the weekend do onto Facebook, or an intact polar ice cap. 
You choose!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-8833075261462301471?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/8833075261462301471/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=8833075261462301471' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/8833075261462301471'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/8833075261462301471'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2009/06/massive-cloud-of-1s-and-0s.html' title='The massive cloud of 1s and 0s'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_nGE-rKiiGEc/SjBmlrQJqOI/AAAAAAAACUs/7rYnVaffdmU/s72-c/data-center-t01.jpg' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-2114286422949515441</id><published>2009-06-09T21:26:00.003-04:00</published><updated>2009-06-09T21:58:21.060-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Budget Deficit'/><category scheme='http://www.blogger.com/atom/ns#' term='Macroeconomics'/><category scheme='http://www.blogger.com/atom/ns#' term='Recession'/><title type='text'>Green shoots ... 
or bust</title><content type='html'>Equity markets, both international and US, seem to have taken to heart the signs of a bottoming of the world economy and the rebound seen in Asia. The Brazilian, Chinese and Indian stock markets are at least 50% higher than their lows in Q4 2008. The DJIA went down to the 6500 level for a while but has since rebounded, gyrating around the 8500 mark and, on occasion, flirting with the 9000 level.&lt;br /&gt;&lt;br /&gt;The rate of job loss is falling in the US and, despite a glut in world oil supply, crude prices are nearing the $70 mark after crashing down to the $30s only recently. Are we past the worst then? Despite the lingering weakness in W.Europe (something seen arguably since the start of WW II!), the signs of economic recovery seem to be unmistakable.&lt;br /&gt;&lt;br /&gt;Is the US consumer then going to go back to his/her free-spending ways? While we seem to have come up a fair bit from the Q4 depths, at least from a consumer confidence standpoint, there could be a few big obstacles to growth.&lt;br /&gt;1. The budget deficit. With the famous American aversion to taxes and the growing burden of entitlements (driven mainly by healthcare costs) as the baby-boomer generation retires, the deficit is only going to get worse.&lt;br /&gt;2. The cost of borrowing to feed the deficit. The US Treasury's pride of place as the investment of the highest quality could be under threat as the domestic debt as a % of GDP grows. The US government will need to borrow more and pay higher interest rates on that borrowing. The higher interest burden is going to crimp its ability to make productive investments.&lt;br /&gt;3. 
Finally, with increasing protectionism and government involvement in the economy, the vitality of US business enterprise to identify and capitalize on opportunity looks to be suppressed for the next several years.&lt;br /&gt;&lt;br /&gt;A number of prognosticators have made some long-range predictions of the US Economy in &lt;a href="http://nymag.com/news/features/57070/"&gt;this article&lt;/a&gt;. An interesting read, as the forecasters have taken a 5-10 year view rather than the 6-12 month view typically taken by realtor types. Another interesting article, about the &lt;a href="http://www.economist.com/finance/displaystory.cfm?story_id=13648998"&gt;long term speed limit&lt;/a&gt; of the US economy from the Economist. Summarizing the article,&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;According to Robert Gordon, a productivity guru at Northwestern University, America’s trend rate of growth in 2008 was only 2.5%, the lowest rate in its history, and well below the 3-3.5% that many took for granted a few years ago. 
Without factoring in the financial crisis, Mr Gordon expects potential growth to fall to 2.35% over the coming years.&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-2114286422949515441?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/2114286422949515441/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=2114286422949515441' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/2114286422949515441'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/2114286422949515441'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2009/06/green-shoots-or-bust.html' title='Green shoots ... 
or bust'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-3567989083459546651</id><published>2009-06-07T10:57:00.005-04:00</published><updated>2009-06-07T11:57:21.618-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Risk Management'/><category scheme='http://www.blogger.com/atom/ns#' term='Sensitivity Analysis'/><category scheme='http://www.blogger.com/atom/ns#' term='Model Validation'/><category scheme='http://www.blogger.com/atom/ns#' term='Modeling'/><title type='text'>Stress testing your model - Part 2/3</title><content type='html'>Continuing on the topic of risk management for models. After building a model, how do you make sure the model remains robust under working conditions? More importantly, how do you make sure it works well under extreme conditions? We discussed the importance of independent validation for empirical models in a previous post. In my experience, model failures have been frequent when the validation process has been superficial.&lt;br /&gt;&lt;br /&gt;Now, I want to move on to sensitivity analysis. Sensitivity analysis involves understanding the variability of the model output as the inputs to the model are varied. The inputs are changed by + or - 10 to 50% and the output is recorded. The range of outputs gives a sense of the various outcomes that can be anticipated and that one needs to prepare for. Sensitivity analysis can also be used to stress test the other components of the system which the model drives. For example, let's say the output of the model is a financial forecast that goes into a system that is used to drive deposit generation. 
The sensitivity analysis output gives an opportunity to check the robustness of the downstream system. By knowing that one might occasionally need to generate deposits at 4-5 times the usual monthly volumes, one can prepare accordingly.&lt;br /&gt;&lt;br /&gt;Now, sensitivity analysis is one piece of stress testing that has usually been misdirected and incomplete. Good sensitivity analysis looks at both the structural components of the model and the inputs to the model. Most sensitivity analyses I have encountered stress only the structural components. What is the difference between the two?&lt;br /&gt;&lt;br /&gt;Let's say you have a model to project the performance of the balance sheet of a bank. One of the typical stresses that one would apply is to the expected level of losses on the loan portfolio of the bank. A stress of a 20-50%, and sometimes even 100%, increase in losses is applied and the model outputs are assessed. When this is done consistently with all the other components of the balance sheet, you get a sense of the sensitivity of the model to the various components.&lt;br /&gt;&lt;br /&gt;But that's not the same as the sensitivity to inputs. Because inputs are based in real-world phenomena, their impact is usually spread out across multiple components of the model. For example, if the 100% increase in losses were driven by a recession in the economy, there would be other impacts that one would need to worry about. A recession is usually accompanied by a flight to quality among investors. So if there is a recessionary outlook, the value of equity holdings could crash as well, as investors sell equities and move into more stable instruments. A third impact could be that of higher capital requirements on the value of traded securities. As other banks face the same recessionary environment, their losses could increase to such an extent that a call to increase capital becomes inevitable. 
How does one increase capital? The easiest route is to liquidate existing holdings, which drives a greater fall in the market prices of traded securities and thus puts further stress on the balance sheet.&lt;br /&gt;&lt;br /&gt;So, the scenario of a 50% increase in loan losses run in isolation is a purely illusory one. When loan losses increase, one has to contend with what the fundamental driver could be and how that fundamental driver could impact other portions of the balance sheet.&lt;br /&gt;&lt;br /&gt;The other place where sensitivity analysis is often incomplete is in not looking at the impact on upstream and downstream processes and strategies. A model is never a stand-alone entity. It has upstream sources of data and downstream uses of the model output. So if the model has to face situations where there are extreme values of inputs, what could be the implications for upstream and downstream strategies? These are the questions any serious model builder should be asking.&lt;br /&gt;&lt;br /&gt;This discussion on sensitivity analysis has hopefully been eye-opening to modeling practitioners. We will go on to a third technique, Monte Carlo simulation, in another post. But before we go there, what are other examples of sensitivity analyses that you have seen in your work? How has this analysis been used effectively (or otherwise)? 
What are good graphical ways of sharing the output?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-3567989083459546651?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/3567989083459546651/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=3567989083459546651' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/3567989083459546651'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/3567989083459546651'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2009/06/stress-testing-your-model-part-23.html' title='Stress testing your model - Part 2/3'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-2984035199203453599</id><published>2009-06-06T08:28:00.005-04:00</published><updated>2009-06-06T08:48:23.617-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Model Validation'/><category scheme='http://www.blogger.com/atom/ns#' term='Stress Testing'/><category scheme='http://www.blogger.com/atom/ns#' term='Modeling'/><title type='text'>Stress testing your model - Part 1/3</title><content type='html'>So, you've built a model. 
You have been careful about understanding your data, transforming it appropriately, using the right modeling technique, and doing an independent validation (if it is an empirical model), and now you are ready to use the model to make forecasts, drive decisions, etc.&lt;br /&gt;&lt;br /&gt;Wait. Not so fast. Before the model is ready for prime time, you need to make sure that the model is robust. What defines a robust model?&lt;br /&gt;- the inputs should cover not just the expected events but also extreme events&lt;br /&gt;- the model should not break down (i.e., mispredict) when the inputs turn extreme (Well, no model can be expected to perform superbly when the inputs turn extreme. If models could do that, the events wouldn't be termed extreme events. But the worst thing that a model can do is provide an illusion of normal (in the everyday English sense) outputs when the inputs are extreme.)&lt;br /&gt;&lt;br /&gt;I want to share some of the techniques that are used for understanding the robustness of models, what I like about them and what I don't.&lt;br /&gt;&lt;br /&gt;1. When it comes to empirical models, one of the most useful techniques is Out of Sample Validation. This is done by building the model on one data set and validating the algorithm on another. For the truest validation, the validation dataset should be independent of the build sample and should be drawn from a different time period. "Check-the-box" type validation is when you validate the model on a portion of the build sample itself. Such validation often holds and just as often offers a false sense of security, because in real terms you have not validated anything.&lt;br /&gt;Caveat: Out of sample validation is of no use if the future is going to look very different from the past. 
Validating a model to predict the probability of mortgage default using conventional mortgages data would have been of no use in a world where no-documentation mortgages and other exotic-term mortgages were being marketed.&lt;br /&gt;&lt;br /&gt;The other two approaches I want to discuss are Sensitivity Analysis and Monte Carlo Simulation.  I will cover them in subsequent posts.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-2984035199203453599?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/2984035199203453599/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=2984035199203453599' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/2984035199203453599'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/2984035199203453599'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2009/06/stress-testing-your-model-part-13.html' title='Stress testing your model - Part 1/3'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-2572895923061031974</id><published>2009-06-02T23:08:00.003-04:00</published><updated>2009-06-02T23:29:19.491-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Problem Solving'/><category scheme='http://www.blogger.com/atom/ns#' 
term='Modeling'/><category scheme='http://www.blogger.com/atom/ns#' term='Estimation'/><title type='text'>Using your grey cells ...</title><content type='html'>Growing up as a singularly unathletic child, my favourite form of recreation was usually books. And my favourites amongst the books were Agatha Christie's murder mysteries featuring Hercule Poirot. Poirot fascinated me. I guess there was the element of vicariously living through the act of evil being punished by good. (Which is probably what attracts us to all mystery writers.)&lt;br /&gt;&lt;br /&gt;But another aspect that made Poirot more appealing than the more energetic specimens like Sherlock Holmes was his reliance on "ze little grey cells": the power of analytical reasoning, practiced through the simple mechanism of question and answer, being used to solve fiendishly difficult murders. How romantic an idea!&lt;br /&gt;&lt;br /&gt;I recently came across an intriguing set of problems, which require rigorous exercise of the grey cells. Called Fermi problems, these are just plain old estimation problems. Typical examples go like "estimate the number of piano tuners in New York City" or "estimate the number of Mustangs in the US". They require one to start out with some basic facts and figures and then get to the answer through a number of logical reasoning steps. For the piano tuner question, it is usually good to start off with some estimate of the population of New York City. Starting off with ridiculous numbers (like 1 million or 100 million) will definitely lead you to the wrong answer. 
So, what the estimation exercise really calls for is some general knowledge along with an ability to think and reason logically.&lt;br /&gt;&lt;br /&gt;The solution to the piano tuner problem typically goes as follows:&lt;br /&gt;- Number of people in NYC -&gt; Number of households&lt;br /&gt;- Number of households -&gt; Number of households with a piano&lt;br /&gt;- Number of pianos -&gt; Some tuning frequency -&gt; Number of pianos that need tuning in a month&lt;br /&gt;- Assuming a certain number of pianos that can be tuned in a day and a certain number of working days, you get to the number of likely tuners&lt;br /&gt;&lt;br /&gt;The exercise definitely teaches some ability to make logical connections. The other thing this type of thinking teaches is parsimony of assumptions. One could make the problem more complex by assuming a different population for each of NYC's boroughs, different estimates for the proportion of households with pianos in Manhattan vs the Bronx and so on. In practice, however, these assumptions only introduce false precision to the answer. Just because you have thought through the solution in an enormous degree of detail doesn't necessarily make it right.&lt;br /&gt;&lt;br /&gt;Some typical examples of Fermi problems can be found at &lt;a href="http://iws.ccccd.edu/mbrooks/demos/fermi_questions.htm"&gt;this&lt;/a&gt; link. Enjoy the experience. And I would love to hear some of the thoughts that strike you as you try and solve these problems. 
Some of the "a-ha" moments for me were around the parsimony of assumptions and the need to find the point of greatest uncertainty and then fix it at the least cost, so as to narrow down the range of answers.&lt;br /&gt;&lt;br /&gt;The exercise overall taught me a fair bit about modeling: the way we approach modeling problems, ranges of uncertainty and how we deal with them, and parsimony of assumptions.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-2572895923061031974?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/2572895923061031974/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=2572895923061031974' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/2572895923061031974'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/2572895923061031974'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2009/06/using-your-grey-cells.html' title='Using your grey cells ...'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-1015686533624285339</id><published>2009-05-31T10:40:00.004-04:00</published><updated>2009-05-31T15:42:20.282-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Extreme Values'/><category scheme='http://www.blogger.com/atom/ns#' term='Law of Large 
Numbers'/><category scheme='http://www.blogger.com/atom/ns#' term='Probability'/><title type='text'>The counter-intuitiveness of probability - small sample sizes</title><content type='html'>The sports enthusiasts amongst you (who also read this website) have to be into sports statistics. My earliest memories of cricket were not of playing my first impressive air cover-drive or of charging in (in my shorts) and delivering a toe-crushing yorker fired towards the base of the leg-stump. My most vivid early memories were of buying cricket and sports magazines hot off the presses and reading sheets and sheets of cricket statistics.&lt;br /&gt;&lt;br /&gt;These statistics covered a wide range of topics. There was the usual stuff: highest number of runs, highest number of wickets. There were also ratio-type statistics: number of five-wicket innings (5WI) per game, number of centuries per innings, proportion of winning games in which a certain batsman scored a century. With a lot of the ratio metrics, there was usually a minimum number of matches or innings the player should have played before being included in the statistics. To my unschooled mind, it was a way of separating the one-match wonders and the flukes from the more serious practitioners.&lt;br /&gt;&lt;br /&gt;Recalling some of those memories and looking at them afresh with my relatively better schooled analytical mind, it struck me that what the statistician (or more precisely, compiler of statistics) was trying to do was to use the law of large numbers (and large event sizes) to produce averages that sit close to the true mean. Put another way, when the sample is small, one is more apt to get extreme values. So, if the "true" average for number of innings per century is 4.5, there could be 3-4 innings' stretches where the player scores consecutive hundreds, pushing the average well down. 
And if these stretches occur (by chance) at the start of someone's career, they are apt to lead to wrong conclusions about the ability of the batsman.&lt;br /&gt;&lt;br /&gt;A simple exercise. How many Heads should one expect out of 5 tosses of an unbiased coin? One would say, between 2 and 3 Heads (the expected value is 2.5). But if you had one trial vs several, what would you expect? What would the mean look like and what would the distribution be? For the sake of simplicity, let's label 2 and 3 Heads as Central values, 1 and 4 as Off-Central and 0 and 5 as Extreme values.&lt;br /&gt;&lt;br /&gt;I did a quick simulation. Following are the results, listed in the order: mean number of Heads, then the number of trials that gave Central, Off-Central and Extreme values.&lt;br /&gt;With 1 trial, the results were: 3,1,0,0.&lt;br /&gt;With 3 trials, the results were: 1.66,0,2,1.&lt;br /&gt;With 5 trials, the results were: 1.8,1,4,0.&lt;br /&gt;With 10 trials, the results were: 2.7,10,0,0.&lt;br /&gt;Now 10 is no magic number, but it is easy to see how the mean moves closer to the expected 2.5, and how a single extreme outcome matters less, as the number of trials gets larger. Hence the push to dilute the effect of outliers by increasing the number of trials. I would love to get a cool snippet of SAS or R code that can do this simulation.&lt;br /&gt;&lt;br /&gt;Now this is the paradox of small trials. When the number of trials is small, when you have fewer shots at observing something, chances are greater that what you observe is dominated by extreme values, whose frequency cannot be predicted. What does this mean for risk management? Does one try and manage greater volatility at a unit level or lesser at the system level? 
And how do you make sure the greater volatility at the unit level does not sink your business?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-1015686533624285339?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/1015686533624285339/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=1015686533624285339' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/1015686533624285339'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/1015686533624285339'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2009/05/counter-intuitiveness-of-probability.html' title='The counter-intuitiveness of probability - small sample sizes'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-4741546786839735801</id><published>2009-05-29T22:46:00.006-04:00</published><updated>2009-05-30T15:15:33.450-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Data visualization'/><category scheme='http://www.blogger.com/atom/ns#' term='Exploratory Data Analysis'/><category scheme='http://www.blogger.com/atom/ns#' term='statistical inference'/><title type='text'>Data handling - the heart of good analysis - Part 1</title><content type='html'>I have been building consumer behaviour models using regression and 
classification tree techniques for the last 4 years now. Most of this work has been in SAS. Now, there are a large number of interesting SAS procedures that are only slightly different from one another. Many of them can be used interchangeably, like PROC REG and PROC GLM.&lt;br /&gt;&lt;br /&gt;But the single most important learning for me over this period has been that you can't spend enough time understanding and transforming the data. Very many interesting and potentially promising pieces of analysis go nowhere because the researcher has not spent enough time understanding the data and then, having understood it, transforming it into a form that is relevant to the problem at hand.&lt;br /&gt;&lt;br /&gt;One of the seminal works on understanding data and plotting it in useful ways is John Tukey's "Exploratory Data Analysis". This book introduces some unique and important ways of graphing and understanding what the data is trying to say. My personal favorite SAS procedures are PROC MEANS and PROC UNIVARIATE. And of course PROC GPLOT. My advice to the budding social scientist and quantitative practitioner is to learn to use these techniques before learning the cooler procedures like LOGISTIC and Linear Models. This was one of the first things I learnt in my own journey as a statistical modeler and I have some very good and experienced colleagues to thank for making sure I learnt the basics first.&lt;br /&gt;&lt;br /&gt;Over the next several posts, I am going to share some of my favorite forms of data depiction. 
The next several days will be a very interesting read, I promise.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-4741546786839735801?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/4741546786839735801/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=4741546786839735801' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/4741546786839735801'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/4741546786839735801'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2009/05/data-handling-heart-of-good-analysis.html' title='Data handling - the heart of good analysis - Part 1'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-8645582242365296614</id><published>2009-05-26T22:45:00.004-04:00</published><updated>2009-05-26T22:56:51.068-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='credit downturn'/><category scheme='http://www.blogger.com/atom/ns#' term='Macroeconomics'/><title type='text'>Macro-economic indicators - a good source</title><content type='html'>I found a good source of macro-economic indicators on the Internet. This is on the NY Times web-site. Go to the Blogs section and look for a blog called Economix. 
There is a really good graphic along the right side of the page. The graphic covers important macro-economic metrics such as the unemployment rate, inventory-to-sales ratio, GDP growth, consumer price index (or inflation), factory orders, durable goods orders, etc. Click for a link &lt;a href="http://economix.blogs.nytimes.com/"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Why are these metrics important? Looking across a broad swathe of metrics gives a good blend of the various viewpoints one should consider when forming a view of the economy and where it is headed. And it is extremely clear that while some of the indicators seem to have stabilized and are pointing to a bottom having been reached, it is by no means consistent across indicators.&lt;br /&gt;&lt;br /&gt;We have merely gone from all bad news to mostly bad news with some stable news thrown in.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-8645582242365296614?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/8645582242365296614/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=8645582242365296614' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/8645582242365296614'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/8645582242365296614'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2009/05/macro-economic-indicators-good-source.html' title='Macro-economic indicators - a good source'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-1660702990333334380</id><published>2009-05-24T12:23:00.002-04:00</published><updated>2009-05-24T12:51:02.269-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='flu'/><category scheme='http://www.blogger.com/atom/ns#' term='statistical inference'/><category scheme='http://www.blogger.com/atom/ns#' term='regression'/><category scheme='http://www.blogger.com/atom/ns#' term='cdc'/><category scheme='http://www.blogger.com/atom/ns#' term='swine flu'/><title type='text'>Simple regression applications - estimating flu impacts</title><content type='html'>Came across this interesting &lt;a href="http://www.slate.com/id/2218367/"&gt;piece&lt;/a&gt; on the estimation of flu impacts. From Slate, one of my favourite web-sites.&lt;br /&gt;&lt;br /&gt;You must have come across the news articles which say that flu caused so many thousand deaths in a certain year. Now attributing deaths to flu is not as straightforward as it would seem. Flu is not listed as the "Cause of Death" that often on a death certificate. Flu usually kills by causing secondary conditions like pneumonia, heart disease, etc. which the enfeebled body is not able to resist. So one can find relatively few cases where the cause of death can be directly attributed to the flu. So how is the estimation done? The answer is simple regression using deaths as the dependent variable and the number of flu cases as the independent variable.&lt;br /&gt;&lt;br /&gt;One piece of data is the number of deaths in the US. This can be broken down by week or by month for the flu season (approximately October to April). The other piece of data is the number of flu cases tracked by various testing labs across the country. This information is also available broken down by week or by month. 
The &lt;a href="http://www.cdc.gov"&gt;CDC website&lt;/a&gt; is a ready source of such morbidity data, going back at least to the early 90s! Then it is a matter of running a simple regression to create a link between flu cases and deaths. The regression takes the form: deaths = intercept + coefficient * number of flu cases, the intercept being the number of deaths one can expect due to other baseline causes.&lt;br /&gt;&lt;br /&gt;It almost seems too simple to be true. How can you be sure that deaths occurring in a certain month can be linked to flu cases from that period? Or does one assume a certain lag for flu to lead to mortality? How do we know we have normalized for everything else? What is the confidence interval (CI) of the estimates? Check out &lt;a href="http://jama.ama-assn.org/cgi/content/abstract/289/2/179"&gt;this paper&lt;/a&gt; by William W. Thompson for more details!&lt;br /&gt;&lt;br /&gt;Now with the emergence of the potentially more deadly H1N1 flu, how can one go about estimating its impact?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-1660702990333334380?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/1660702990333334380/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=1660702990333334380' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/1660702990333334380'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/1660702990333334380'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2009/05/simple-regression-applications.html' title='Simple regression applications - estimating flu 
impacts'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-9213524190487167128</id><published>2009-05-23T00:18:00.002-04:00</published><updated>2009-05-23T00:28:16.817-04:00</updated><title type='text'>Demise of Chrysler's dealers and a modeling problem</title><content type='html'>An interesting thought experiment.&lt;br /&gt;&lt;br /&gt;Chrysler is closing many of its dealers over the next 3 weeks as part of its planned bankruptcy. If you are a dealer who is facing the axe, how might you go about liquidating your inventory at the best possible return? Or more realistically, the least loss.&lt;br /&gt;&lt;br /&gt;One idea might be to have an auction for individuals. Define a strike price (don't disclose it, of course!) and promise at least as good a price as other dealers are willing to offer for comparable cars. And let the bidding begin. There might be some marketing spend to print out a few hundred mailers and send them out to the neighborhood.&lt;br /&gt;&lt;br /&gt;Another parallel strategy can be to have a reverse-auction for other dealers, which can then be volume driven. Assuming that Chrysler demand does not plummet to zero when its factories are shut and it is going through its bankruptcy, there will be a phase, a few months long, when at least some Chrysler dealers will not have enough inventory to meet their demand. The reverse auction you set up can promise to drive down the average price of the vehicle if the purchasing dealer is willing to pick up volume. 
So internally, have a strike price of (say) $20K / vehicle if you buy one Jeep, but the price drops to $18.5K if you buy 2, $17.75K if you are willing to buy 3 and so on.&lt;br /&gt;&lt;br /&gt;But any other ideas on how to model this? What might a good algorithm be? Does one use game theory to solve the problem?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-9213524190487167128?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/9213524190487167128/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=9213524190487167128' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/9213524190487167128'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/9213524190487167128'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2009/05/demise-of-chryslers-dealers-and.html' title='Demise of Chrysler&apos;s dealers and a modeling problem'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-9133099635680018750</id><published>2009-05-03T17:32:00.004-04:00</published><updated>2009-05-03T17:36:20.577-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Wall Street'/><category scheme='http://www.blogger.com/atom/ns#' term='Statistics'/><category scheme='http://www.blogger.com/atom/ns#' 
term='Modeling'/><title type='text'>It's been a long time</title><content type='html'>It's been a long time since I shared my views on my blog. Excuses are multiple, but none is particularly credible.&lt;br /&gt;&lt;br /&gt;But I think this is a great time to start again. In the last 2 years, the world has been painfully exposed to the fallibility of models. From being the engines of modern finance and the economy at large, models have gone to being the #1 reason for the economy's collapse. Even worse, model builders have become a bit of a laughing stock in the post-Wall Street society.&lt;br /&gt;&lt;br /&gt;Here's an attempt to put models and statistics back where they belong. So let's see how it goes this time around.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-9133099635680018750?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/9133099635680018750/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=9133099635680018750' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/9133099635680018750'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/9133099635680018750'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2009/05/its-been-long-time.html' title='It&apos;s been a long time'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-2775018348497149411</id><published>2007-04-16T02:22:00.000-04:00</published><updated>2007-04-16T02:37:37.629-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Statistics'/><category scheme='http://www.blogger.com/atom/ns#' term='Probability'/><category scheme='http://www.blogger.com/atom/ns#' term='Risk'/><category scheme='http://www.blogger.com/atom/ns#' term='Modeling'/><title type='text'>Quantifying the tails - Another interesting book</title><content type='html'>One of my readers, Rusen, pointed me to another very interesting (if somewhat specialized) book which deals with quantifying extreme or &lt;em&gt;once-in-a-hundred-years&lt;/em&gt; events. The book is "Fooled by Randomness: The Hidden Role of Chance in Life and in the Markets" by Nassim Nicholas Taleb.&lt;br /&gt;&lt;br /&gt;Taleb talks about events that occur at the extreme end of a probability distribution, where we &lt;em&gt;think&lt;/em&gt; our models work but the models break down (for a number of reasons). Taleb presents a perspective on where we should depend on models and where our intuition should tell us not to trust the models.&lt;br /&gt;&lt;br /&gt;If readers want a preview of what's in the book, check out the website &lt;a href="http://www.fooledbyrandomness.com"&gt;www.fooledbyrandomness.com&lt;/a&gt;. 
(Taleb is releasing a second book this month called "The Black Swan: the impact of the highly improbable".)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-2775018348497149411?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/2775018348497149411/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=2775018348497149411' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/2775018348497149411'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/2775018348497149411'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2007/04/quantifying-tails-another-interesting.html' title='Quantifying the tails - Another interesting book'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-9178835650645884080</id><published>2007-04-12T14:28:00.000-04:00</published><updated>2007-04-16T02:22:34.133-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Statistics'/><category scheme='http://www.blogger.com/atom/ns#' term='Probability'/><category scheme='http://www.blogger.com/atom/ns#' term='Risk'/><category scheme='http://www.blogger.com/atom/ns#' term='Modeling'/><title type='text'>Some interesting books on risk and probability</title><content type='html'>My holiday reading in India is two books I 
have been planning to read for a while now, but never really found the time for.&lt;br /&gt;&lt;br /&gt;One is "Against the Gods: The Remarkable Story of Risk" by Peter Bernstein. People who have read Peter Bernstein will know what to expect: very detailed coverage of fairly arcane concepts without getting into too many technicalities. A good jumping-off point for getting into the actual academic papers. (Something I love doing is to pick up these books and then go through the bibliography, especially the academic papers. A very interesting exercise and highly encouraged for those amongst us who want to know a little more.) I should post a review of the book in the next few weeks. For those who want to read another quality book from Peter Bernstein, try "Capital Ideas".&lt;br /&gt;&lt;br /&gt;Second is a book called "Chances are: Adventures in Probability" by Kaplan and Kaplan. I am midway through this book. Again very informative and entertaining. The underpinnings of insurance are particularly interesting. (Bayes is coming up in the next chapter. I am super excited!)&lt;br /&gt;&lt;br /&gt;A third book I really enjoyed was "When Genius Failed" by Roger Lowenstein. I hope to start a discussion about the book very soon, as I see very immediate parallels between what LTCM went through and my current line of work. (No, I don't work with a hedge fund.)&lt;br /&gt;&lt;br /&gt;Even while I am adding stuff about these books, do try and grab them from the local library or better still, own them. They will make for pleasurable reading for many many years, I promise. 
For those living in Fairfax County in Virginia, USA, the library system has all of these books.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-9178835650645884080?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/9178835650645884080/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=9178835650645884080' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/9178835650645884080'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/9178835650645884080'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2007/04/some-interesting-books-on-risk-and.html' title='Some interesting books on risk and probability'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6088933028650286437.post-2226338493928569902</id><published>2007-04-05T07:25:00.000-04:00</published><updated>2007-04-05T07:54:46.144-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Statistics'/><category scheme='http://www.blogger.com/atom/ns#' term='Statistical'/><category scheme='http://www.blogger.com/atom/ns#' term='Modeling'/><title type='text'>Prolog</title><content type='html'>If you have come to the Statistics Exchange, I am guessing you are someone fascinated by Statistics and intrigued by the idea of what an exchange means.&lt;br 
/&gt;&lt;br /&gt;This blog is just getting started, but let me give you a preview of what you can expect. This is a place where people can get together and share statistical insights and ideas purely from a practitioner standpoint. Don't expect to see too much theory here; this space is purely about interesting applications of statistics. I am in the middle of a vacation, which is a perfect time to step back and do something like this. My regular job is full of statistical modeling, so it will be exciting to learn things from work and apply them here, and vice versa. I am hoping to put some interesting stuff here pretty soon.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6088933028650286437-2226338493928569902?l=stat-exchange.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://stat-exchange.blogspot.com/feeds/2226338493928569902/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6088933028650286437&amp;postID=2226338493928569902' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/2226338493928569902'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6088933028650286437/posts/default/2226338493928569902'/><link rel='alternate' type='text/html' href='http://stat-exchange.blogspot.com/2007/04/prolog.html' title='Prolog'/><author><name>Krish Swamy</name><uri>http://www.blogger.com/profile/10690110192473170310</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
