Common mistakes to avoid while using big data in risk management

Managing risk is a challenging enterprise, and mistakes can have catastrophic consequences. Today, big data analytics using tools such as Hadoop or Splunk is seeing growing adoption among corporations looking to mitigate risk. There is optimism that analyzing big data can yield insights that help manage risk more effectively and thus prevent disasters such as the 2008 financial crisis. For example, many banks now perform real-time analytics on customer data such as credit history, transaction history and employment history to more accurately determine which customer segments represent a high or low risk for a mortgage or loan.

Similarly, many product manufacturers use big data analytics to determine their customers’ likes and dislikes, which lets them create products that match those customers’ specific tastes. Doctors use big data to identify high-risk patients who require more immediate care. The energy industry uses big data to spot problems in the production process early, before they become unmanageable. And the list goes on across many other industries.

Nevertheless, while big data offers tremendous potential for managing risk across many industries and sectors, it’s important to avoid common mistakes when handling that data. These mistakes can produce inaccurate results that increase risk instead of reducing it.

Using incomplete or irrelevant data

Data scientists must ensure the data they are using is a relevant and complete representation of what they want to analyze (such as customer behavior or oil pressure readings). Using incomplete or skewed data sets can lead to erroneous conclusions that undermine risk management.
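
A completeness check is one concrete way to act on this before any analysis runs. The Python sketch below, assuming records arrive as dictionaries where a missing value is either absent or None, reports what fraction of records populate each field and flags fields below a threshold. The field names, sample values, and the 0.95 threshold are hypothetical.

```python
# A minimal sketch: measure how complete each field is across a set of records.
# Field names, sample values, and the 95% threshold are hypothetical.
from typing import Dict, List, Optional

Record = Dict[str, Optional[object]]


def completeness_report(records: List[Record], fields: List[str]) -> Dict[str, float]:
    """Return, for each field, the fraction of records where it is present (not None)."""
    if not records:
        return {field: 0.0 for field in fields}
    return {
        # A key that is absent counts as missing, because r.get() returns None.
        field: sum(1 for r in records if r.get(field) is not None) / len(records)
        for field in fields
    }


def incomplete_fields(report: Dict[str, float], threshold: float = 0.95) -> List[str]:
    """List fields whose completeness is below the threshold, in report order."""
    return [field for field, ratio in report.items() if ratio < threshold]


if __name__ == "__main__":
    applicants = [
        {"income": 52000, "credit_score": 710, "employment_years": 4},
        {"income": None, "credit_score": 640, "employment_years": 1},
        {"income": 87000, "credit_score": None, "employment_years": 10},
        {"income": 61000, "credit_score": 690, "employment_years": 2},
    ]
    fields = ["income", "credit_score", "employment_years"]
    report = completeness_report(applicants, fields)
    print(report)                     # {'income': 0.75, 'credit_score': 0.75, 'employment_years': 1.0}
    print(incomplete_fields(report))  # ['income', 'credit_score']
```

A per-field count like this only catches gaps. It cannot tell you whether the records that are present are skewed, for example if every applicant comes from one customer segment; that requires comparing the data against what you know about the population you want to analyze.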

Using data that’s not up-to-date

Historical data is important for generating insights to manage risk. However, it is recommended to also incorporate the most up-to-date data available, preferably in real time, for the most accurate insights. With the world continually in flux, what was true yesterday may not be true today.
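
Two simple safeguards follow from this: flag data sources that have not been refreshed recently, and weight recent observations more heavily than old ones without discarding history. The sketch below is one way to express both in Python; the source names, the one-day maximum age, and the exponential half-life weighting are assumptions chosen for illustration, not a prescribed method.

```python
# A minimal sketch of two freshness safeguards. Source names, the one-day
# maximum age, and the half-life are hypothetical choices for illustration.
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Tuple


def stale_sources(last_updated: Dict[str, datetime], max_age: timedelta, now: datetime) -> List[str]:
    """Return, sorted by name, the sources whose most recent update is older than max_age."""
    return sorted(name for name, ts in last_updated.items() if now - ts > max_age)


def recency_weighted_mean(
    observations: List[Tuple[datetime, float]], now: datetime, half_life: timedelta
) -> float:
    """Average values while halving an observation's weight for every half_life of age.

    Older history still contributes, but recent observations dominate.
    """
    if not observations:
        raise ValueError("no observations")
    weighted_sum = 0.0
    total_weight = 0.0
    for ts, value in observations:
        age_in_half_lives = (now - ts) / half_life  # timedelta / timedelta -> float
        weight = 0.5 ** age_in_half_lives
        weighted_sum += weight * value
        total_weight += weight
    return weighted_sum / total_weight


if __name__ == "__main__":
    # A fixed "now" keeps the example reproducible.
    now = datetime(2024, 1, 31, 12, 0, tzinfo=timezone.utc)
    feeds = {
        "transactions": now - timedelta(minutes=5),
        "credit_bureau": now - timedelta(days=3),
        "employment_records": now - timedelta(days=40),
    }
    print(stale_sources(feeds, max_age=timedelta(days=1), now=now))
    # ['credit_bureau', 'employment_records']

    readings = [(now - timedelta(days=d), v) for d, v in [(0, 10.0), (7, 4.0)]]
    print(recency_weighted_mean(readings, now, half_life=timedelta(days=7)))  # 8.0
```

The half-life controls the trade-off the paragraph describes: a short half-life makes the estimate track today’s conditions closely, while a long one lets historical data carry more weight.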

Not taking into account all the key variables

A frequent mistake in big data analytics is failing to include all the pertinent variables in the calculations. Data scientists must ensure that all relevant variables (e.g. customer income, credit history and employment history when evaluating mortgage suitability) are captured, since even one missing variable can dramatically alter the accuracy of the result. Deciding which variables are pertinent is not always straightforward; it often requires careful thought and even iterative trial and error.
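
The effect of a single missing variable can be shown with synthetic data. In the sketch below, a hypothetical outcome depends on two correlated, made-up risk factors (factor_a and factor_b). Fitting ordinary least squares with both factors recovers the true coefficients, while dropping factor_b distorts the estimated effect of factor_a, a textbook case of omitted-variable bias. It uses only the Python standard library; the data and names are illustrative assumptions.

```python
# A minimal sketch of omitted-variable bias using ordinary least squares on
# synthetic data. factor_a and factor_b are hypothetical, correlated risk factors.
from typing import List, Tuple


def _mean(values: List[float]) -> float:
    return sum(values) / len(values)


def _centered_cross(a: List[float], b: List[float]) -> float:
    """Sum of products of deviations from the mean (n times the covariance)."""
    ma, mb = _mean(a), _mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b))


def slope_one_variable(x: List[float], y: List[float]) -> float:
    """OLS slope of y regressed on a single predictor x."""
    return _centered_cross(x, y) / _centered_cross(x, x)


def slopes_two_variables(x1: List[float], x2: List[float], y: List[float]) -> Tuple[float, float]:
    """OLS slopes of y on two predictors, solving the 2x2 normal equations by Cramer's rule."""
    s11 = _centered_cross(x1, x1)
    s22 = _centered_cross(x2, x2)
    s12 = _centered_cross(x1, x2)
    s1y = _centered_cross(x1, y)
    s2y = _centered_cross(x2, y)
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("predictors are perfectly collinear")
    return (s1y * s22 - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


if __name__ == "__main__":
    factor_a = [1.0, 2.0, 3.0, 4.0, 5.0]
    factor_b = [2.0, 1.0, 4.0, 3.0, 5.0]  # correlated with factor_a
    # Synthetic outcome built so that its true coefficients are 2 and 3.
    outcome = [2 * a + 3 * b for a, b in zip(factor_a, factor_b)]

    print(slopes_two_variables(factor_a, factor_b, outcome))  # (2.0, 3.0)
    print(slope_one_variable(factor_a, outcome))               # 4.4 (factor_b omitted)
```

Because the two factors are correlated, the single-variable slope (4.4 instead of 2) silently absorbs part of factor_b’s effect. Refitting with and without each candidate variable and comparing the results, as done here, is one simple form of the trial-and-error iteration described above.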

Selection bias

Perhaps the most serious mistake of all is cherry-picking the data set so that the results are skewed by the analyst’s bias. Data scientists must be very careful not to let their subjective views affect which data sets they select for evaluation. This point seems highly relevant in today’s era of ‘fake news’, where people gravitate toward news they want to be true, even when it isn’t. The same principle applies to big data analytics.
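
Cherry-picking can be made visible by comparing a hand-picked subset with the full data set it came from. The Python sketch below uses a hypothetical loan portfolio: an analyst who looks only at one region sees no defaults at all, while the portfolio as a whole defaults at 30%. The record layout, region names, and the five-percentage-point tolerance are assumptions for illustration.

```python
# A minimal sketch contrasting a cherry-picked subset with the full data set.
# The loan records, region names, and 5-point tolerance are hypothetical.
from typing import Dict, List

Loan = Dict[str, object]


def default_rate(loans: List[Loan]) -> float:
    """Fraction of loans marked as defaulted."""
    if not loans:
        raise ValueError("no loans")
    return sum(1 for loan in loans if loan["defaulted"]) / len(loans)


def is_unrepresentative(subset: List[Loan], population: List[Loan], tolerance: float = 0.05) -> bool:
    """Flag a subset whose default rate differs from the population's by more than tolerance."""
    return abs(default_rate(subset) - default_rate(population)) > tolerance


if __name__ == "__main__":
    portfolio = [{"region": "north", "defaulted": False} for _ in range(4)]
    portfolio += [{"region": "south", "defaulted": d} for d in (True, True, True, False, False, False)]

    # An analyst who expects the portfolio to be safe looks only at the north.
    cherry_picked = [loan for loan in portfolio if loan["region"] == "north"]

    print(default_rate(portfolio))      # 0.3
    print(default_rate(cherry_picked))  # 0.0
    print(is_unrepresentative(cherry_picked, portfolio))  # True
```

A single-statistic comparison is only a coarse check. A stronger habit is to fix the selection rule before looking at results, for example by drawing subsets at random with Python’s random.sample rather than choosing them by hand.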


Follow me to keep abreast with the latest technology news, industry insights, and developer trends.
