Exchange Ideas

Causal Capital

RMB - Risk, Markets & Banking

 

August 11, 2006

The issues with external data in models are not what one would expect

Banks reading the AMA section of the Basel Accord will find that it mentions many parts of the methodology of a typical framework: scenarios, loss data, forward-looking measures (indicators), control assessment and the nebulous use of external data. Of all the quantitative measures an operational risk practitioner uses to size potential exposure, external data seems to present one of the largest concerns. In this article we look at some of the problems that come about from the use of external data and at some of the applications where it is genuinely worth using.

Basel Accord Paragraph 674

A bank’s operational risk measurement system must use relevant external data (either public data and/or pooled industry data), especially when there is reason to believe that the bank is exposed to infrequent, yet potentially severe, losses. These external data should include data on actual loss amounts, information on the scale of business operations where the event occurred, information on the causes and circumstances of the loss events, or other information that would help in assessing the relevance of the loss event for other banks.

Like many of the wonderful paragraphs of the Basel Accord, this statement gives some insight into what we should focus our attention on. It also leaves enough latitude for some banks to create their own elaborate machinations, while many of us in the industry simply struggle to work out the best use of a scarce resource.


The Providers
Firstly, we need to locate a good provider of such information before we can do anything with it. A few companies promote this service, but not many, and I am sure plenty of risk managers who have scanned the internet for hours trying to locate a good source of external data know this to be true. Firms that provide external data include OpVantage (part of the Fitch group), OpRisk Analytics, Aon OpBase and, a popular one, the British Bankers' Association Global Operational Loss Database (GOLD). Some of these providers are membership based and require all subscribers to contribute as a condition of sale. However, some banks seem reluctant to add to a database even though they wish to draw from it, and this highlights how immature the risk discipline still is across the industry. Should the social good of all rest on disclosure by a few? In the world of data there isn't a place for utilitarianism, and all banks will place a certain level of importance on Pillar II and Pillar III of the accord, but that is another debate altogether.

The largest problem with external data (once we have found it) is applicability: are losses in Europe, for example, representative of exposures in Asia? Many banks have been looking to industry-specific focus centres for a good source of information.


FSA Capital Requirements Directive Implementation, March 2006

We observed that all AMA aspirants had access to at least one centrally-sourced external loss database, and we saw some convergence in the data providers used ... Although many firms are assessing the need to scale external data, the ‘reliability’ of such data did not appear to have been evaluated by many firms. One simple test a firm could carry out is to assess the quality of reporting of its own events in the external database(s) to which it subscribes and some business areas were making active use of specialist/niche databases, for example on fraud and IT incidents. One concern is that there was limited awareness or recognition of the alignment of loss data to internal events in the design of the OR framework and the AMA solution overall.
FSA Article


Applicability is, of course, a two-sided equation. Banks that don't classify their internal losses by product, by the Basel II risk event classification and by other relevant, transparent dimensions are going to have real difficulty combining external data with internal data. As the saying goes, “one doesn't want to compare apples with alphabet letters”, and to be blunt, I would concentrate on a homogeneous internal taxonomy before worrying about the specific problems of data scaling. It is scaling, though, that seems to be on the lips of most bankers when they talk about external data, so we will briefly address it.
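To make the taxonomy point concrete, here is a minimal sketch of tagging every loss, internal or external, with a Basel II business line and level-1 event type and then grouping losses into matching cells. The class and function names (`LossEvent`, `group_by_cell`, `unmatched_cells`) and the sample amounts are hypothetical; only the business-line and event-type names come from the Basel II classification. Cells that hold data from just one source are the ones where internal and external data cannot be blended at all.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class EventType(Enum):
    """Basel II level-1 loss event types."""
    INTERNAL_FRAUD = "Internal Fraud"
    EXTERNAL_FRAUD = "External Fraud"
    EMPLOYMENT_PRACTICES = "Employment Practices and Workplace Safety"
    CLIENTS_PRODUCTS = "Clients, Products & Business Practices"
    PHYSICAL_ASSETS = "Damage to Physical Assets"
    SYSTEM_FAILURES = "Business Disruption and System Failures"
    EXECUTION_DELIVERY = "Execution, Delivery & Process Management"


class BusinessLine(Enum):
    """Basel II business lines."""
    CORPORATE_FINANCE = "Corporate Finance"
    TRADING_SALES = "Trading & Sales"
    RETAIL_BANKING = "Retail Banking"
    COMMERCIAL_BANKING = "Commercial Banking"
    PAYMENT_SETTLEMENT = "Payment and Settlement"
    AGENCY_SERVICES = "Agency Services"
    ASSET_MANAGEMENT = "Asset Management"
    RETAIL_BROKERAGE = "Retail Brokerage"


@dataclass(frozen=True)
class LossEvent:
    """One loss record; hypothetical schema for illustration."""
    amount: float
    business_line: BusinessLine
    event_type: EventType
    source: str  # "internal" or "external"

    def __post_init__(self):
        if self.source not in ("internal", "external"):
            raise ValueError("source must be 'internal' or 'external'")


def group_by_cell(events):
    """Group loss amounts by (business line, event type) and by source."""
    cells = defaultdict(lambda: {"internal": [], "external": []})
    for e in events:
        cells[(e.business_line, e.event_type)][e.source].append(e.amount)
    return dict(cells)


def unmatched_cells(cells):
    """Cells with data from only one source cannot be blended."""
    return [key for key, data in cells.items()
            if not data["internal"] or not data["external"]]


# Hypothetical losses.
events = [
    LossEvent(50_000, BusinessLine.RETAIL_BANKING, EventType.EXTERNAL_FRAUD, "internal"),
    LossEvent(2_000_000, BusinessLine.RETAIL_BANKING, EventType.EXTERNAL_FRAUD, "external"),
    LossEvent(75_000, BusinessLine.TRADING_SALES, EventType.EXECUTION_DELIVERY, "internal"),
]
cells = group_by_cell(events)
gaps = unmatched_cells(cells)
print(gaps)
```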


Scaling
Is the size of an operational loss related to firm size? Great question, isn't it? A group of risk analysts decided to put the theory to the test some six years ago, when the European Commission proposed that capital charges for operational risk might be based on the size and income of a firm.

What they discovered was that “While it seems intuitive that operational risk is to some degree a function of firm size, the nature of this relationship is not straightforward”, and if we really think this through, it would be naive to believe that it is. The analysts found that firm size accounts for only a very small portion (about 5%) of the variability in loss severity.
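One common way to express “size explains about 5% of the variability” is as the R² of a regression of log loss on log firm size. The sketch below assumes that log-log specification (not necessarily the study's exact model) and runs it on randomly generated, hypothetical data built so that size has only a weak effect; it does not reproduce the study's figures. The function name `loglog_fit` and every parameter value are illustrative.

```python
import math
import random


def loglog_fit(sizes, losses):
    """Ordinary least squares of log(loss) on log(size).

    Returns (intercept, slope, r_squared). The slope is the elasticity of
    loss with respect to firm size; r_squared is the share of the variance
    of log(loss) explained by log(size).
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(v) for v in losses]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_tot = sum((y - my) ** 2 for y in ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, 1.0 - ss_res / ss_tot


# Hypothetical data: firm sizes span three orders of magnitude, loss size
# depends weakly on size (true slope 0.2) and is dominated by noise.
rng = random.Random(42)
sizes = [math.exp(rng.uniform(math.log(1e8), math.log(1e11))) for _ in range(2000)]
losses = [math.exp(2.0 + 0.2 * math.log(s) + rng.gauss(0.0, 2.0)) for s in sizes]

intercept, slope, r2 = loglog_fit(sizes, losses)
print(f"slope={slope:.2f}  R^2={r2:.3f}")
```

A positive slope with a tiny R² is exactly the pattern the quote describes: size matters on average, but it says very little about any individual loss.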

The results of the investigation can be found on GloriaMundi.

The size of a firm is related to the magnitude of its losses, but the association is not linear: there is clear evidence of a diminishing relationship between firm size and loss magnitude. In the real world we see small firms suffering losses on a business line in similar proportion to big firms. This is to be expected when one considers a banking product and its faults, and if we contemplate the problem carefully, it actually undermines the scaling argument itself. That is, a poor correlation between firm size and loss size makes a place for external data, takes us into one of its good uses and resolves the scaling issue altogether.
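A diminishing relationship is often written as a power law, loss ∝ size^α with 0 < α < 1. The sketch below assumes that form (the `scale_loss` helper and the α values are hypothetical, not taken from the study) and shows why weak dependence defuses the scaling debate: as α shrinks, the scaled external loss converges on the raw external loss, so scaling stops mattering.

```python
def scale_loss(external_loss, external_size, internal_size, alpha):
    """Rescale an external loss to our own firm's size with a power law.

    scaled = external_loss * (internal_size / external_size) ** alpha
      alpha = 1      -> loss proportional to firm size (linear scaling)
      0 < alpha < 1  -> diminishing relationship between size and loss
      alpha = 0      -> size irrelevant; the external loss is used as-is
    """
    if external_loss < 0:
        raise ValueError("loss must be non-negative")
    if external_size <= 0 or internal_size <= 0:
        raise ValueError("firm sizes must be positive")
    return external_loss * (internal_size / external_size) ** alpha


# Hypothetical case: a 10m loss at a peer ten times our size.
for alpha in (1.0, 0.5, 0.2, 0.0):
    print(alpha, round(scale_loss(10e6, 1e11, 1e10, alpha)))
```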


Stratification
Let us agree that, to calculate the capital of a bank, we take our historical internal loss data, fit it to a family of curves to create a hypothetical loss distribution and then read off a specific quantile of that curve (the confidence level) to give us an OpVaR number. The problem is that internal data is often insufficient to estimate the upper tail of the loss distribution accurately, because extreme losses rarely occur.
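As a minimal sketch of the fit-then-read-a-quantile step, the code below assumes a lognormal severity curve fitted by maximum likelihood and reads off its 99.9% quantile. The lognormal choice, the function names and the loss amounts are all hypothetical. A full OpVaR would also model loss frequency and compound the two (for example by Monte Carlo), which this sketch deliberately leaves out.

```python
import math
import statistics
from statistics import NormalDist


def fit_lognormal(losses):
    """Maximum-likelihood lognormal fit: returns (mu, sigma) of log(loss)."""
    logs = [math.log(x) for x in losses]
    mu = statistics.fmean(logs)
    # The MLE of sigma uses the population (divide-by-n) standard deviation.
    sigma = statistics.pstdev(logs, mu)
    return mu, sigma


def lognormal_quantile(mu, sigma, p):
    """Loss level exceeded with probability 1 - p under the fitted curve."""
    return math.exp(mu + sigma * NormalDist().inv_cdf(p))


# Hypothetical internal losses for one business line / event type cell.
internal_losses = [12_000, 45_000, 8_500, 230_000, 19_000,
                   67_000, 5_200, 1_100_000, 31_000, 14_500]
mu, sigma = fit_lognormal(internal_losses)
severity_999 = lognormal_quantile(mu, sigma, 0.999)
print(f"99.9% severity quantile: {severity_999:,.0f}")
```

With only ten observations, the 99.9% figure is driven almost entirely by the fitted curve rather than by any observed loss, which is precisely the tail problem described above.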

By combining external data with internal data we can increase our sample size and thus improve the estimate of our capital. This assumes that both are drawn from the same loss distribution and that, as we “hope”, internal loss data includes all losses that have occurred while external data includes only losses exceeding a known peer-group reporting threshold.
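The benefit of a larger sample can be checked with a small simulation. The sketch below, which assumes a single hypothetical lognormal (mu = 10, sigma = 2) for every observation, repeatedly draws samples of 30 and of 300 losses, refits the curve each time and compares how widely the 99.9% quantile estimates scatter. The sample sizes, the distribution and the function names are illustrative assumptions; the comparison only holds because both samples come from the same distribution.

```python
import math
import random
import statistics
from statistics import NormalDist

Z_999 = NormalDist().inv_cdf(0.999)


def estimate_q999(sample):
    """99.9% quantile of a lognormal fitted by maximum likelihood."""
    logs = [math.log(x) for x in sample]
    mu = statistics.fmean(logs)
    sigma = statistics.pstdev(logs, mu)
    return math.exp(mu + sigma * Z_999)


def spread_of_estimates(n, trials, rng, mu=10.0, sigma=2.0):
    """Relative spread (stdev / mean) of the 99.9% quantile estimate across
    repeated samples of size n drawn from the same lognormal distribution."""
    estimates = [
        estimate_q999([rng.lognormvariate(mu, sigma) for _ in range(n)])
        for _ in range(trials)
    ]
    return statistics.stdev(estimates) / statistics.fmean(estimates)


rng = random.Random(7)
small = spread_of_estimates(30, 200, rng)   # e.g. internal data alone
large = spread_of_estimates(300, 200, rng)  # e.g. internal plus external
print(f"relative spread, n=30: {small:.2f}; n=300: {large:.2f}")
```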

One could of course throw all the data together into a single loss dataset, but because the external data contains only losses above the reporting threshold, that generally overestimates the likelihood of high losses and takes us back to the scaling debate. The canonical solution to this problem is to stratify: combine internal and external data into a larger sample in which losses below and above the threshold keep their proper proportions. It works like this:

Suppose we have Y internal loss observations, half of which (Y/2) fall below a threshold Z and half above it, and Y + Y/2 = 1.5Y external loss observations, all of which exceed Z because external losses are only reported above that threshold. Above Z we now hold Y/2 + 1.5Y = 2Y losses, four times the Y/2 internal losses there. To preserve the internal proportions, a sample from the loss distribution must therefore contain all of the losses over Z from both the internal and external datasets, plus four copies of each internal loss below Z. This new sample is not biased toward higher losses and incorporates all of the available information.
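The replication step can be sketched in a few lines. The `stratify` helper below is hypothetical, as are the loss amounts, chosen to match the example with Y = 100: 50 internal losses below Z, 50 above and 150 external losses above Z. It computes the number of copies from the counts rather than hard-coding four, and it refuses non-integer weights, where the weighted approach in the next paragraph is the natural alternative. The final lines contrast the naive pooled sample with the stratified one.

```python
def stratify(internal, external, threshold):
    """Pool internal and external losses so the pooled sample keeps the
    internal data's below/above-threshold proportions.

    internal: all internal losses (assumed complete).
    external: external losses, reported only when they exceed `threshold`.
    Returns (pooled_sample, copies), where copies is how many times each
    internal loss below the threshold appears in the pooled sample.
    """
    if any(x <= threshold for x in external):
        raise ValueError("external losses should all exceed the threshold")
    below = [x for x in internal if x <= threshold]
    internal_above = [x for x in internal if x > threshold]
    if not internal_above:
        raise ValueError("need at least one internal loss above the threshold")
    # Losses above Z grow from len(internal_above) to
    # len(internal_above) + len(external); losses below Z must grow by the
    # same factor to keep the original proportions.
    ratio = (len(internal_above) + len(external)) / len(internal_above)
    copies = round(ratio)
    if abs(ratio - copies) > 1e-9:
        raise ValueError("non-integer weight; use a weighted estimator instead")
    pooled = below * copies + internal_above + list(external)
    return pooled, copies


# Hypothetical data matching the Y = 100 example.
threshold = 50_000.0
internal = ([1_000.0 + 10 * i for i in range(50)]          # Y/2 below Z
            + [100_000.0 + 1_000 * i for i in range(50)])  # Y/2 above Z
external = [150_000.0 + 500 * i for i in range(150)]       # 1.5Y above Z

pooled, copies = stratify(internal, external, threshold)
naive = internal + external
naive_below_share = sum(x <= threshold for x in naive) / len(naive)
stratified_below_share = sum(x <= threshold for x in pooled) / len(pooled)
print(copies, naive_below_share, stratified_below_share)
```

Internally, half the losses sit below Z; naive pooling drops that share to a fifth, while the stratified sample restores it to a half.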

An example of this can be found on the IDEAS NETWORK

Another approach is to estimate the loss distribution by weighted averaging: calculate the moments and quantiles of the loss distribution so that four times as much weight is given to any loss below the threshold as to any loss above it, whether internal or external.
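A minimal sketch of the weighted version follows. It assumes the same hypothetical data and the weight of four from the example, and defines `weighted_quantile` as the weighted empirical inverse CDF (the smallest loss whose cumulative weight reaches the target share). With integer weights this gives the same mean and quantiles as physically copying the losses, but it also works when the weight is not a whole number.

```python
def weighted_mean(values, weights):
    """First moment of the weighted sample."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


def weighted_quantile(values, weights, p):
    """Smallest value whose cumulative weight reaches p of the total weight."""
    pairs = sorted(zip(values, weights))
    target = p * sum(weights)
    cumulative = 0.0
    for v, w in pairs:
        cumulative += w
        if cumulative >= target:
            return v
    return pairs[-1][0]


# Hypothetical data: internal losses below Z get weight 4, all losses above
# Z (internal or external) get weight 1.
threshold = 50_000.0
internal = ([1_000.0 + 10 * i for i in range(50)]
            + [100_000.0 + 1_000 * i for i in range(50)])
external = [150_000.0 + 500 * i for i in range(150)]
values = internal + external
weights = [4.0 if x <= threshold else 1.0 for x in values]

print(weighted_mean(values, weights), weighted_quantile(values, weights, 0.999))
```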

There are plenty of examples of stratification on the internet, and I consider it a straightforward yet statistically sound method for mixing the two datasets. We have to remember, though, that our data must come from the same distribution (be homogeneous); otherwise the resulting measure is totally inaccurate.

One can liken it to a voting poll in which a reporter asks ten people in a room (a sample of the population) who they are going to vote for; we all know that the more people the reporter asks, the more accurate the assessment becomes.

Posted by CausalEvents at 06:04 AM
