How We Used Data Analysis to Detect Fraudulent Transactions
Transaction fraud is a problem that financial companies face constantly, and ordinary customers suffer from it too. However, if you learn to detect such fraud in time, the situation improves significantly. Fortunately, certain types of data analysis can be a great help here.
This is what our article is devoted to. We're going to discuss online transaction fraud detection and describe the main data analysis steps needed to achieve the desired result. And we won't limit ourselves to theory alone: we'll give you an actual example from our own practice.
Let’s get started!
Problem with fraudulent transactions
In KPMG's international study on banking risks, fraud and scams rank among the top five challenges that banks face. 60% of respondents from the information security departments of financial organizations admit that they have recently noticed increased activity by cyber fraudsters, which has led to significant losses.
Clients of financial institutions lose money too. According to the BBB Scam Tracker, in 2019 the average American lost $75 to $2,500 (depending on the scenario used) to the tricks of bank fraudsters.
But what is transaction fraud? What types of scams are there and what do bank clients complain about?
We’ll list some of the most popular cases:
- the creation of so-called “clone cards”. Fraudsters read secret information from the magnetic stripe of the user's card and then make “white cards”: pieces of plastic with a magnetic stripe that carries the stolen information. After that, attackers can freely use the account of the real cardholder;
- phishing, when secret card data is obtained from the users themselves. There are many forms of phishing, and the most striking one is when attackers contact users, supposedly on the bank's behalf, usually by email or SMS. The cyber-crooks convince users that important changes are being made to the bank's security system (or come up with another convincing excuse to get what they want). The scammers then ask unsuspecting bank clients to renew their private card information, either by replying to the message or by filling out an attached form;
- persuading the user to transfer money to the attacker's account (of course, the victim doesn’t suspect a thing and is quite sure they are dealing with a trustworthy company);
- conversion of a check stolen by an attacker.
Okay, we've discussed the fraud issue, and now it's time to consider which data analysis to use to avoid these problems. Moreover, we'd like to share with you our own experience and describe a real-life case.
So let's take a look at our financial data analysis example.
What was the primary task?
It all started when a reputable European company requested our assistance. Its owners asked us to help them identify fraudulent card transactions quickly and efficiently. As you have already seen, this problem is much more serious than it might seem and causes a lot of trouble for financial institutions.
We were provided with a set of bank account (card) transactions from two payment systems, some of which were fraudulent. Our job was to create a reliable transaction fraud detection algorithm.
Inputs
The data was a collection of daily transaction records. We therefore treated it as a time series {yt}, in which frauds were singled out by a special algorithm. The result was two time series: “normal” payments {yn} and frauds {fn}.
In Figure 1, normal payments {yn} are marked in blue, and frauds {fn} are highlighted in red.
Figure 1
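As a rough illustration of how such input might be prepared, the sketch below aggregates raw transactions into two daily series. The record layout (day, amount, fraud flag), the choice to sum amounts per day, and the sample values are all assumptions made for this example; the actual data schema and the algorithm used to flag frauds are not described here.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction records: (day, amount, is_fraud).
# The real schema and the fraud-flagging algorithm are not part of this sketch.
transactions = [
    (date(2020, 1, 1), 120.0, False),
    (date(2020, 1, 1), 40.0, True),
    (date(2020, 1, 2), 75.5, False),
    (date(2020, 1, 2), 10.0, False),
    (date(2020, 1, 3), 300.0, True),
]


def split_daily_series(records):
    """Sum amounts per day into two series: normal payments and frauds."""
    normal = defaultdict(float)
    fraud = defaultdict(float)
    for day, amount, is_fraud in records:
        target = fraud if is_fraud else normal
        target[day] += amount
    days = sorted({day for day, _, _ in records})
    # Days without payments of a given kind get 0.0 so both series align.
    y_n = [normal.get(d, 0.0) for d in days]
    f_n = [fraud.get(d, 0.0) for d in days]
    return days, y_n, f_n


days, y_n, f_n = split_daily_series(transactions)
print(y_n)  # normal payments per day
print(f_n)  # frauds per day
```

Keeping both series on the same list of days makes it straightforward to plot them together, as in Figure 1.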
The initial approach to financial data analysis
To analyze the structure of the {fn} series, we used standard tests: the augmented Dickey-Fuller (ADF) test, which checks for the presence of unit roots, and the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test for stationarity of the series. Both indicated that the series {fn} is stationary, namely, Δfn~I(0).
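To make the unit-root check more concrete, here is a minimal sketch of the simplest Dickey-Fuller regression in plain Python. It is not the full test applied to our data: it has no lagged differences, it does not cover the second (stationarity) test, and the synthetic series and the critical value of about -2.86 (the usual large-sample 5% level for a regression with a constant) are assumptions for illustration only. In practice a statistics library would be the better choice.

```python
import math
import random


def dickey_fuller_stat(series):
    """t-statistic of gamma in: dy_t = alpha + gamma * y_{t-1} + e_t.

    This is the plain (non-augmented) Dickey-Fuller regression with a
    constant and no lagged differences.
    """
    x = series[:-1]                                   # y_{t-1}
    dy = [b - a for a, b in zip(series, series[1:])]  # y_t - y_{t-1}
    n = len(x)
    mean_x = sum(x) / n
    mean_dy = sum(dy) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (di - mean_dy) for xi, di in zip(x, dy))
    gamma = sxy / sxx
    alpha = mean_dy - gamma * mean_x
    rss = sum((di - alpha - gamma * xi) ** 2 for xi, di in zip(x, dy))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return gamma / std_err


rng = random.Random(0)
noise = [rng.gauss(0, 1) for _ in range(200)]  # stationary by construction
walk = []
level = 0.0
for step in noise:                             # cumulative sum has a unit root
    level += step
    walk.append(level)

CRITICAL_5PCT = -2.86  # approximate large-sample 5% value, constant only
for name, data in (("noise", noise), ("random walk", walk)):
    stat = dickey_fuller_stat(data)
    verdict = "unit root rejected" if stat < CRITICAL_5PCT else "unit root not rejected"
    print(f"{name}: t = {stat:.2f} ({verdict})")
```

A strongly negative statistic speaks against a unit root, which is the pattern one would expect from a stationary series.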
Using smoothing (the red curve in Figure 2), the optimal curve was selected to make a further forecast (Fig. 3).
Figure 2
Figure 3
Even though the series was stationary, we used an ARIMA(1,1,2) model, which gave an acceptable result.
Another way to solve the problem
The above outcome is more or less satisfactory; however, there is a small problem. The fraud forecast, like any time series forecast, has a confidence band that widens as the forecast horizon grows. To be precise, only 2-3 forecast periods (in our case, 2-3 days) can be considered a relevant result. In addition, fraud is detected much later than it actually occurs, which forces us to take the compensation amount into account.
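The widening of the forecast band can be illustrated numerically. The sketch below computes forecast standard errors for an ARIMA(1,1,2) model from its moving-average (psi) weights. The coefficient values are hypothetical and are not the ones fitted to our data; the function name is ours as well.

```python
import math


def arima_112_forecast_std(phi, theta1, theta2, sigma, horizon):
    """Forecast standard errors for ARIMA(1,1,2) at steps 1..horizon.

    Model: (1 - phi*B)(1 - B) y_t = (1 + theta1*B + theta2*B^2) e_t.
    The h-step forecast variance is sigma^2 * sum(psi_j^2 for j < h),
    where psi_j are the weights of the infinite moving-average form.
    """
    # Expanding (1 - phi*B)(1 - B) gives 1 - (1 + phi)*B + phi*B^2.
    a1, a2 = 1.0 + phi, -phi
    theta = [1.0, theta1, theta2]
    psi = []
    for j in range(horizon):
        value = theta[j] if j < len(theta) else 0.0
        if j >= 1:
            value += a1 * psi[j - 1]
        if j >= 2:
            value += a2 * psi[j - 2]
        psi.append(value)
    std_errors = []
    cumulative = 0.0
    for weight in psi:
        cumulative += weight ** 2
        std_errors.append(sigma * math.sqrt(cumulative))
    return std_errors


# Hypothetical coefficients, chosen only to show the shape of the band.
errors = arima_112_forecast_std(phi=0.5, theta1=-0.3, theta2=0.1,
                                sigma=1.0, horizon=7)
for day, err in enumerate(errors, start=1):
    print(f"day {day}: 95% band half-width = {1.96 * err:.2f}")
```

Because the series is differenced once, the psi weights do not die out, so every extra day adds to the forecast variance. This is why only the first few days of the forecast are of practical use.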
These findings prompted us to reconsider our approach to the problem, and we found another way to analyze the financial data.
We concluded that it would be more convenient to apply a different principle and first determine the type of each payment: normal, suspicious, or fraudulent. The task then comes down to classification in the space of principal components, using nearest-neighbor methods or neural networks.
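A nearest-neighbor classifier of this kind can be sketched in a few lines. The example below is a toy version under stated assumptions: the two numeric features per payment and their labels are made-up values standing in for component scores, and k = 3 is an arbitrary choice, not a setting from our project.

```python
import math
from collections import Counter

# Hypothetical labelled payments: (feature_1, feature_2) -> class.
# The two features stand in for component scores; the values are made up.
training = [
    ((0.1, 0.2), "normal"),
    ((0.3, 0.1), "normal"),
    ((0.2, 0.4), "normal"),
    ((2.0, 2.1), "suspicious"),
    ((2.2, 1.9), "suspicious"),
    ((1.8, 2.3), "suspicious"),
    ((4.0, 4.2), "fraud"),
    ((4.3, 3.9), "fraud"),
    ((3.9, 4.1), "fraud"),
]


def classify(point, samples, k=3):
    """Label a point by majority vote among its k nearest training samples."""
    by_distance = sorted(samples, key=lambda s: math.dist(point, s[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


print(classify((0.2, 0.3), training))  # normal
print(classify((4.1, 4.0), training))  # fraud
```

Unlike the forecast, this approach gives a verdict for each individual payment, so it does not depend on how far ahead we look.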
Now, after describing the approximate data analysis process, we'd like to tell you where else it can be used (in addition to detecting financial transaction card fraud).
More options to use data analysis techniques
So where and when should you apply big data analysis? We'll provide some of the most obvious and sought-after examples.
#1. Classic analysis
Here we're talking about statistics in all its forms. As you know, much of what people do involves working with data: studying, processing, and analyzing information. What's more, a key characteristic of statistical analysis methods is their complexity.
A statistical study and related data analysis can be performed using the following methods:
- Statistical observation;
- Grouping of materials of statistical observation;
- Absolute and relative statistical values;
- Variation series;
- Competent sampling;
- Correlation and regression analysis;
- Time series (series of dynamics).
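To give one of the methods listed above a concrete shape, here is a minimal sketch of correlation and regression analysis in plain Python. The observations are made-up numbers, and the function names are ours.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Slope and intercept of the ordinary least-squares line y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


# Made-up observations, e.g. advertising spend vs. sales.
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.1, 3.9, 6.2, 7.8, 10.1]
print(round(pearson_r(spend, sales), 3))
print(least_squares_line(spend, sales))
```

The correlation coefficient shows how strongly the two variables move together, while the fitted line describes the form of that dependence.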
#2. Modeling
In this case, we’re dealing with neural networks and classification methods, which are an important part of modern machine learning techniques.
Let's look at the kind of problem that classification solves.
Imagine that there are many elements (situations) divided into specific classes (let’s name these elements the initial set). Also, you have a finite set of objects, and you know which classes they belong to. Such a set is called a training sample. What classes the remaining objects belong to is unknown. You have to construct a data analysis algorithm capable of classifying an arbitrary element from the initial set.
The algorithm you created can be used to solve the following problems:
- Assessment of creditworthiness of borrowers;
- Prediction of customer attrition;
- Optical character recognition;
- Speech recognition;
- Spam detection;
- Classification of documents.
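The classification setup described above (a training sample with known classes, then a rule for arbitrary new objects) can be sketched with a perceptron, the simplest building block of a neural network. The example uses the creditworthiness task from the list; the two features per borrower and all values are hypothetical.

```python
# Hypothetical training sample: two made-up, already-scaled features per
# borrower and a label (+1 = creditworthy, -1 = not creditworthy).
training_sample = [
    ((2.0, 2.0), 1),
    ((3.0, 3.0), 1),
    ((3.0, 2.0), 1),
    ((-2.0, -2.0), -1),
    ((-3.0, -1.0), -1),
    ((-1.0, -3.0), -1),
]


def train_perceptron(samples, max_epochs=100):
    """Learn weights and bias with the classic perceptron update rule."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for features, label in samples:
            score = sum(w * x for w, x in zip(weights, features)) + bias
            if label * score <= 0:  # misclassified (or on the boundary)
                weights = [w + label * x for w, x in zip(weights, features)]
                bias += label
                mistakes += 1
        if mistakes == 0:  # every training object is classified correctly
            break
    return weights, bias


def predict(weights, bias, features):
    """Assign class +1 or -1 to an object that was not in the training sample."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else -1


weights, bias = train_perceptron(training_sample)
print(predict(weights, bias, (4.0, 4.0)))
print(predict(weights, bias, (-4.0, -4.0)))
```

This toy sample is linearly separable, so the update rule is guaranteed to stop making mistakes; real data usually calls for richer models and a separate validation set.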
#3. Web scraping
Let’s talk about another case when we might need data analysis techniques, namely, web scraping.
In a broad sense, web scraping is the collection of information from various Internet resources. Useful data may include:
- catalogs of goods;
- all sorts of images;
- videos;
- text content;
- open contact details: email addresses, phone numbers, etc.
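Extracting a few of the data types listed above might look like the sketch below. To stay self-contained it parses a made-up HTML fragment instead of downloading a real page, and the email pattern is a deliberately simple assumption that will not match every valid address.

```python
import re
from html.parser import HTMLParser

# A made-up page fragment used instead of a real download.
SAMPLE_HTML = """
<html><body>
  <h1>Catalog</h1>
  <img src="/img/phone.png" alt="Phone">
  <p>Questions? Write to <a href="mailto:sales@example.com">sales@example.com</a>.</p>
  <img src="/img/laptop.png" alt="Laptop">
</body></html>
"""

# Simplified pattern for illustration; real-world addresses can be trickier.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


class PageCollector(HTMLParser):
    """Collects image URLs and visible text while the HTML is parsed."""

    def __init__(self):
        super().__init__()
        self.images = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)

    def handle_data(self, data):
        if data.strip():
            self.text_parts.append(data.strip())


collector = PageCollector()
collector.feed(SAMPLE_HTML)
text = " ".join(collector.text_parts)
emails = sorted(set(EMAIL_PATTERN.findall(text)))

print(collector.images)
print(emails)
```

When scraping real sites, it is worth checking the site's terms of use and robots.txt before collecting anything.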
#4. Social Network Analysis, or Graph Theory
Graph theory belongs to discrete mathematics and is widely used in solving various problems in different fields of activity, including economics, programming, communication, and sociology.
Using social graphs, we deal with the following issues:
- user identification;
- social search;
- generation of recommendations helping to choose “friends”, media content, and news;
- identification of "real" relationships;
- collecting open information for graph modeling.
However, the data analysis process in the case of social graphs is associated with a number of difficulties, such as differences between social networks and closed social data.
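As a small illustration of the recommendation task mentioned above, the sketch below suggests new “friends” by counting shared connections in a graph. The friendship graph is made up, and counting common neighbors is only one of many possible ranking rules.

```python
from collections import Counter

# A made-up undirected friendship graph as an adjacency mapping.
friends = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "cat", "dan"},
    "cat": {"ann", "bob", "dan", "eve"},
    "dan": {"bob", "cat"},
    "eve": {"cat"},
}


def recommend_friends(graph, user):
    """Rank non-friends of `user` by the number of friends they share."""
    shared = Counter()
    for friend in graph[user]:
        for candidate in graph[friend]:
            if candidate != user and candidate not in graph[user]:
                shared[candidate] += 1
    # Sort by shared-friend count (descending), then by name for stable output.
    return sorted(shared.items(), key=lambda item: (-item[1], item[0]))


print(recommend_friends(friends, "ann"))  # [('dan', 2), ('eve', 1)]
```

The same adjacency structure can serve the other tasks in the list, for example searching for users within a given number of connections.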
#5. Medical research
Analysis of medical data helps you tackle tasks such as:
- Medical research planning and data collection;
- Calculation of the main descriptive characteristics of the studied values;
- Visual representation of data through graphs such as histograms, scatterplots, etc.;
- Identifying statistically significant differences between samples;
- Analysis of dependencies between factors;
- Survival analysis;
- Calculation of the required sample size;
- Prediction of treatment outcome.
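Survival analysis, one of the tasks listed above, is often introduced through the Kaplan-Meier estimator. Here is a minimal sketch of it in plain Python; the follow-up data for six patients is invented for the example.

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier survival estimate.

    durations: follow-up time of each patient.
    observed:  True if the event happened at that time, False if censored.
    Returns a list of (time, survival probability) at each event time.
    """
    event_times = sorted({t for t, seen in zip(durations, observed) if seen})
    survival = 1.0
    curve = []
    for time in event_times:
        at_risk = sum(1 for t in durations if t >= time)
        events = sum(1 for t, seen in zip(durations, observed)
                     if seen and t == time)
        survival *= (at_risk - events) / at_risk
        curve.append((time, survival))
    return curve


# Made-up follow-up data (in months) for six patients.
months = [2, 3, 3, 5, 8, 8]
event = [True, True, False, True, False, True]
for time, probability in kaplan_meier(months, event):
    print(time, round(probability, 3))
```

Censored patients (those who left the study without the event) still count as being at risk up to their last follow-up, which is what distinguishes this estimate from a simple proportion.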
Summary
We've examined the key data analysis steps and described how to implement transaction fraud detection. We sincerely hope our review will be useful to you.