Geography of Online Scams
Abstract—This paper presents an analysis of online dating fraud’s geography. Working with real romance scammer dating profiles collected from both proxied and direct connections, we analyse geographic patterns in the targeting and distinct characteristics of dating fraud from different countries, revealing several strong markers indicative of particular national origins having distinctive approaches to romance scamming. We augment IP geolocation information with other evidence about the dating profiles. By analysing the resource overlap between scam profiles, we discover that up to 11% of profiles created from proxied connections could be assigned a different national origin on the basis of text or images shared with profiles from direct connections. Our methods allow for improved understanding of the origins of dating fraud, beyond only direct geolocation of IP addresses, with patterns and resource sharing revealing approximate location information which could be used to target prevention campaigns.

I. INTRODUCTION

The online romance scam is one of the most prevalent forms of mass-marketing fraud in many Western countries. False dating profiles are created by scammers as a prelude to a sustained false romance, during which the target is repeatedly defrauded of large sums of money. The impact on victims in terms of both monetary loss and emotional harm can be substantial. However, technical analysis of the methods used by these scammers remains sparse, with few quantitative analyses of attacks and attackers.

Previous work has explored victim understanding of the scam process in interview settings [1], text reuse in romance scammer approaches via Craigslist [2] and strategies deployed in an anonymous Chinese dating site [3]. A major unaddressed hurdle for combatting this fraud is understanding its true global origins, as misrepresentation of location is common. Uncertainty about location and international legal obstacles can hinder investigation and prosecution.

The locations scammers give in their profile are typically regarded as being as false as the profile picture, calculated to attract the interest of their targets [1]. Dating sites record the IP addresses used by scammers in creating and accessing their profiles, and may compare those addresses to blacklists or use the IP geolocation (especially when compared to the profile’s declared location) to inform a judgement about the likelihood that a profile is genuine. In response, most scam profile authors make use of web proxies to disguise their IP address connection information, and so they appear to be using a connection from the location given in their profile information. Dating sites are predictably countering by banning access to their site via known web proxies and similarly allocated IP blocks. There are however limitations to the effectiveness of these countermeasures, with privately hosted or intentionally disguised proxies escaping the checks of proxy listing services.

The real location, even at a national level, of the creators of the scam profiles is of interest both to law enforcement and for other preventative efforts – not only for the purpose of identifying that a given profile is a scam, but for following up with appropriate countermeasures once a significant origin of scams has been identified (e.g., contacting local law enforcement, funding targeted preventative campaigns). This paper is the first study we know of to address this topic.

In this paper, we use a dataset of real online dating scam profiles which includes profiles created via both proxied and direct connections. We set out to answer the following research questions:

• Where does dating fraud come from? What does IP geolocation evidence tell us about the origins of profiles created via direct connections, and how does this connect to the locations given in the profiles?
• Do profile elements get reused internationally? Does reuse suggest different origins for dating profiles? Can we complement IP geolocation by examining profile elements being reused between unproxied and proxied connections?
• Does dating fraud from different regions present different characteristics? Do countries tend towards certain forms of romance scam in a distinctive manner?

In Section II below, we describe the available data, and note its limitations. In Section III below, we outline the significant origin countries within the SOURCE dataset, and the national locations those profiles present. In Section IV we look at text and images being shared between romance scam profiles, and what these patterns suggest about the PROXY dataset. In Section V, we examine the major scam origin nations to identify patterns in other elements of the profiles, before concluding in Section VI with a discussion of the policy implications of this analysis.

II. DATA SOURCE

The data used in this paper comes from a public online dating scamlist maintained at scamdigger.com, which offers up romance scammer profile data for public awareness. An exhaustive collection of the 5,402 scam profile instances, as collected during March 2017, was examined with respect to two sources of geographic information:
1) The location given in the scammer dating profile information.
2) The IP address used to create the profile, as reported by the dating site.

Other profile elements of note include the age, gender, occupation, marital status and self-description, which are analysed in detail in related work. Of the two sources of geographic information, the former was recorded as a string, often specifying location to a city level. This was geocoded to lat/lon coordinates and a standard format through queries to the Open Street Map’s Nominatim service¹. For the sake of brevity, the locations given in profiles are referred to as the presented locations.

The IP address information was mapped to a location through the use of a geolocation service², providing both coordinates and structured address information. Some 368 records contained no IP address information and were excluded, leaving 5,194 profile instances. Of the IP addresses used, many (67.9%) have been identified as known web proxies or VPN end-points by the dating site, raising doubts about the reliability of the inferred geographic location. For this purpose, we separate the data into the SOURCE (i.e., un-proxied users) and PROXY (i.e., proxied users) subsets, of 1,666 and 3,528 profiles respectively. It is possible that IP addresses from the SOURCE dataset are in fact unknown proxies, perhaps shared secretly amongst criminals, and similarly, it is possible that PROXY users are only masking their specific connection information rather than their national origins. We address these possibilities below as they touch upon our results.

Some important limitations of the data source must be acknowledged as context for our analysis. Firstly, the scamdigger.com site is primarily a scam-list for profiles submitted to a particular dating site, datingnmore.com, which reviews submitted profiles with particular focus on online dating fraud, and lists those identified as scammers either at registration or after interaction with members. The profiles presented are thus those of scammers that attempt to target this particular dating site, which may be a source of unknown bias. As with almost all criminal data analysis, these are also those dating fraud profiles from scammers who have been identified or caught, and it is possible that they are not representative of a more skillful subpopulation, which could also be geographically biased. The former issue could be explored further through comparison with statistics from other dating sites, where they can be persuaded to release this information. The latter is an inherent limitation of criminological data.

III. GEOGRAPHIC ORIGINS OF DATING FRAUD

Table I lists the significant origin countries for the SOURCE dataset. The largest single origin by far was Nigeria, at over 30% of the dataset. West Africa in general accounts for over 50% of the SOURCE locations. These proportions closely match previous observations of the national origins of advance-fee fraud, as determined by email header IP addresses [4], [5], suggesting potential commonality between these types of fraud. The next largest origins, Malaysia and South Africa, are also well-known for producing other forms of internet fraud. All of the listed nations score below 50 on the 2016 Corruption Perception Index [6], except for the United States and the United Kingdom, suggesting these may be unusual cases.

      Nation            Count   Proportion
  1   Nigeria             488   0.302
  2   Ghana               216   0.134
  3   Malaysia            178   0.110
  4   South Africa        140   0.087
  5   United Kingdom       86   0.053
  6   United States        57   0.035
  7   Turkey               50   0.031
  8   India                47   0.029
  9   Togo                 41   0.025
 10   Senegal              40   0.025
 11   Philippines          29   0.018
 12   Ukraine              28   0.017
 13   Russia               24   0.015
 14   Ivory Coast          23   0.014
 15   Kenya                22   0.014

TABLE I: The SOURCE countries for > 20 scam profiles

Figure 1 plots the major scam origins against their profile’s presented location, as directional arrows weighted by volume of scams. The United States is the location most commonly presented in dating profiles, at 63% of the SOURCE dataset, followed by the UK (11%), Germany (3%) and Canada (2%). As presented locations are usually indicative of the victims’ nationality, we can understand the data as reporting that residents of the US are the major target of romance scams, followed by those of other western nations.

Africa: Most African sources focus their attention on the major western targets reported above. A notable exception is a cluster of profiles from Ghana which appear to report their location accurately. This may be a simple reaction to a scam-detection methodology which uses mismatches between presented and IP-geolocated locations³; or could represent a more ‘honest’ scam format aimed at extracting funds through straight seduction. A similar but smaller group appears in South Africa. Other exceptions include a small cluster of profiles from South Africa and Ghana which present their location as Iraq and Afghanistan. These are classic “military scam” profiles, purporting to be members of the US military stationed overseas. A small number of Nigerian profiles present their location as Malaysia, for unclear reasons.

Europe: Almost all SOURCE profiles from the United Kingdom presented themselves as from the United States, with only 9% targeting the United Kingdom itself, despite this also being an internationally targeted location. Profiles originating in Turkey targeted the United States and Germany, in keeping with the international norm. Most interestingly, profiles from the Ukraine and Russia almost always presented their national location as consistent with their IP address. This marked deviation from the pattern of romance scams originating elsewhere highlights the distinctive nature of Russian and Ukrainian dating fraud.

¹ https://wiki.openstreetmap.org/wiki/Nominatim (March 2017)
² http://freegeoip.net (September 2017)
³ Such a method is in use by the dating site operators
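The data preparation described in Section II, and the presented-versus-IP mismatch check mentioned in footnote 3, can be illustrated with a minimal sketch. The `Profile` record and its field names are hypothetical and chosen only for illustration; the paper does not describe its data structures, and country-level string comparison is an assumed simplification of the comparison actually used by the dating site.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Profile:
    # Hypothetical record layout; field names are assumptions.
    presented_country: str        # country geocoded from the profile's stated location
    ip_country: Optional[str]     # country from IP geolocation, None if no IP was reported
    known_proxy: bool             # True if the dating site flagged the IP as a proxy/VPN


def split_dataset(profiles: List[Profile]) -> Tuple[List[Profile], List[Profile]]:
    """Drop records without an IP address, then split into SOURCE and PROXY."""
    with_ip = [p for p in profiles if p.ip_country is not None]
    source = [p for p in with_ip if not p.known_proxy]
    proxy = [p for p in with_ip if p.known_proxy]
    return source, proxy


def location_mismatch(profile: Profile) -> bool:
    """True when the presented country differs from the IP-geolocated country."""
    return (profile.ip_country is not None
            and profile.presented_country != profile.ip_country)
```

Under these assumptions, a SOURCE profile geolocated to Nigeria but presenting itself as located in the United States would be flagged as a mismatch, while a Ghanaian profile that reports its location accurately would not.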
Fig. 1: The major paths from SOURCE IP addresses to the locations given in profiles
Asia: India follows the international norm in presenting profiles as from the United States and United Kingdom, although the ratio allocated to each is weighted more in favour of the United Kingdom (2:1 vs the 10:1 in West Africa), perhaps due to closer national ties. There are some small groups of Indian source IPs which present profiles in Singapore or Malaysia. Malaysian scammers also present profiles in the US and UK at the Indian 2:1 ratio, with small secondary clusters presenting from Malaysia and nearby Australia. Scammers in the Philippines split their presentation between the Philippines itself and the US, an unusual pattern that likely reflects the close links between the US and the Philippines.

United States: Almost all SOURCE profiles from the United States gave their location as within the United States. However, the most common presented state locations were New York and Texas, while the source addresses were mostly located in Arizona, California and Virginia, suggesting a degree of location misrepresentation within the nation or else imprecision of unknown proxying attempts.

IV. AUGMENTING GEOLOCATION EVIDENCE

As previously highlighted, SOURCE IP addresses are not necessarily accurate origins – they could be unknown proxies which escaped detection. While this is inherently an unknown factor, we can make use of certain additional evidence as an augmentation. For SOURCE IP information we can assess the likelihood of impersonations, and for the unknown PROXY subset’s true locations we can examine the reuse of text and images with direct connections.

[...] that knowledge of proxies is affected similarly despite their location around the globe, means we are searching for an unknown threshold at which to discard the idea that certain origins are genuine – the rate of false negative error in these proxy lists. As we cannot be certain of this rate, no hard conclusions can be drawn from proxy ratios alone, but we can say that a large SOURCE:PROXY ratio is a signal carrying some information about the credibility of location information. Where the number of profiles with an unknown IP address is a small fraction of the number of known proxies for this location, we will regard these locations as suspect. Where this is not the case, we can be more confident that the IP address accurately reflects the origin of the scam profile.

 Nation            PROXY   SOURCE:PROXY
 United States      1949    0.03
 United Kingdom      204    0.42
 Russia               47    0.50
 Ukraine              23    1.17
 Philippines          11    2.42
 Turkey               10    4.55
 India                 5    7.83
 Kenya                 1   11.00
 Ivory Coast           0   23.00
 Malaysia              5   29.67
 South Africa          3   35.00
 Nigeria              12   37.54
 Senegal               0   40.00
 Togo                  0   41.00
 Ghana                 4   43.20

TABLE II: Ratio of suspected source IPs to known proxies by country
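The ratios in Table II can be reproduced from the SOURCE counts in Table I and the PROXY counts in Table II. The minimal sketch below assumes the ratio is SOURCE / (PROXY + 1). This add-one form is an inference from the published figures (it matches every row and explains the finite values for countries with no known proxies); it is not a formula stated in the text.

```python
# SOURCE counts are taken from Table I, PROXY counts from Table II.
SOURCE_COUNTS = {
    "United States": 57, "United Kingdom": 86, "Russia": 24, "Ukraine": 28,
    "Philippines": 29, "Turkey": 50, "India": 47, "Kenya": 22,
    "Ivory Coast": 23, "Malaysia": 178, "South Africa": 140, "Nigeria": 488,
    "Senegal": 40, "Togo": 41, "Ghana": 216,
}
PROXY_COUNTS = {
    "United States": 1949, "United Kingdom": 204, "Russia": 47, "Ukraine": 23,
    "Philippines": 11, "Turkey": 10, "India": 5, "Kenya": 1,
    "Ivory Coast": 0, "Malaysia": 5, "South Africa": 3, "Nigeria": 12,
    "Senegal": 0, "Togo": 0, "Ghana": 4,
}


def source_proxy_ratio(source: int, proxy: int) -> float:
    """SOURCE:PROXY ratio, rounded to two decimals.

    ASSUMPTION: the denominator is proxy + 1 (add-one smoothing), inferred
    from the published table rather than stated by the authors.
    """
    return round(source / (proxy + 1), 2)


ratios = {
    nation: source_proxy_ratio(SOURCE_COUNTS[nation], PROXY_COUNTS[nation])
    for nation in SOURCE_COUNTS
}

# List nations from most suspect (lowest ratio) to most credible (highest).
for nation, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{nation:<16}{PROXY_COUNTS[nation]:>6}{ratio:>8.2f}")
```

Sorting by ascending ratio puts the locations the text treats as most suspect (the US, then the UK) at the top of the list; where to draw a cut-off between suspect and credible locations is left open, as in the text.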
[...] dataset from outside the US have presented their location as being in the US, attesting international effort at exactly this form of misinformation. Looking at temporal reporting information, we find that the proportion of SOURCE profiles in the US has been decreasing since 2013, suggestive of gradually improving proxy detection.

The UK is the next most suspect IP location, also attracting a large volume of SOURCE profiles as a falsely presented location, and with more PROXY than SOURCE IP addresses. However, scammers would have to be an order of magnitude more effective at masking their IP addresses as UK locations than as US locations, in order to explain the ratios of scam profiles generated by these IP addresses. It is notable that both SOURCE and PROXY profiles from UK IP addresses most often present themselves as located in the US. This suggests either that the UK supports a population of relatively security-conscious romance scammers targeting the US, or is acting as a significant staging ground for fraud from elsewhere directed at the US. Temporal information here also suggests a downward trend since a spike in 2014.

Russia and the Ukraine are also locations with a significant number of PROXY profiles, but here there is less reason to suspect the SOURCE IP addresses do not reflect the national origin of the scam. Unlike the US and UK, we do not see any significant number of other SOURCE profiles presenting Russia and the Ukraine as their location, and unlike the SOURCE profiles, most PROXY profiles from these locations present their location as the US. The reporting figures appear stable over the observed period. The few presented Russian and Ukrainian PROXY profiles may simply be scammers protecting their individual location and connection information, without interest in masking their national origins. Similarly, known proxies account for just over a quarter of the IP addresses from the Philippines, but there are few profiles traced from outside the country which purport to be located there, so there is little reason to suspect large-scale misrepresentation.

The remaining locations are only lightly populated by IP addresses from known proxies, and we may have confidence that these are genuine national origins of online dating fraud.

Some locations show up neither as significant SOURCE origins nor as presented locations in profiles, but only as transit points in the PROXY dataset. These are locations with significant proxy populations, but apparently of low appeal as targets for international dating fraud. All such profiles predominantly presented as located in the United States, with the proxy country being at best a distant second. Notable transit locations include the Netherlands, Switzerland, Sweden, France, Australia, Romania and Finland.

B. Profile Description Reuse

Previous work has shown that romance scammers engage in substantial reuse of certain profile elements to save on labour, using certain cached images and making use of textual “scripts” which can be copied and pasted with minimal editing [2]. We here seek to explore how these sharing patterns appear geographically. Understanding which sources are sharing resources can help identify cooperating criminals and similar scam types. Geographic clusters of resources can also be useful in identifying the true origins of profiles using proxies to hide their location.

Text reuse is common in scam profiles, with key chunks of text and expressions being observed across different unique profiles. To identify these overlaps, we first preprocessed the textual descriptions to standardise case and remove punctuation, and then used a longest common substring method to cluster texts. Any two texts which shared more than a threshold of 10 tokens (words) were considered to be part of the same cluster. By this method, 899 unique profiles could be assigned to a cluster, sharing text with at least one other profile⁴.

 Location          Assigned
 Nigeria                 88
 Ghana                   56
 Malaysia                41
 Italy                   11
 South Africa             8
 India                    5
 United Kingdom           5
 Benin                    4
 Kenya                    4
 Philippines              4
 Other                   15

TABLE III: Inferred true locations of PROXY profiles

Looking first of all at reuse within the SOURCE subset, the greatest text reuse occurred within nations, with multiple unique profiles originating in Nigeria and South Africa sharing description text. The greatest international text reuse was between Nigeria and South Africa, with multiple profiles in each country sharing elements, and, interestingly, between Nigeria and the United States. Given the previous evidence that the SOURCE profiles in the United States may have been created through undetected proxies, we can take these Nigerian and South African scripts appearing in the US as further evidence of this under-detection. Similarly, scripts appearing in the United Kingdom suggest that there are undetected proxies amongst the SOURCE IP addresses from the UK. Text reuse within Africa, and between Nigeria and, to a lesser extent, each of Malaysia, India and Turkey, suggests a common approach to romance scamming in these nations. Notably, we see little to no direct text reuse from Russia, the Ukraine or the Philippines, either internally or externally, though it is worth noting that we have relatively few examples from these countries in comparison to the numbers from West Africa.

Turning to the PROXY dataset, we find that 241 (11%) share text with SOURCE profiles, meaning that their true location can be indirectly inferred. Table III reveals the results of assigning the majority national label for shared clusters. As well as adding significantly to the totals for the already-dominant West African and Malaysian scam origins, this inference also reveals a number of Italian scam profiles. Combining these discovered origins with the smaller number of Italian SOURCE profiles which enabled this inference, Italy would place 11th in Table I, with more profiles originating here than in Russia or the Ukraine.

⁴ This number does not count variants of the same profile identified as such from the dataset, so these 899 reflect 28% of the dataset
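The text-reuse clustering and majority-label inference described in Section IV-B can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the paper: whitespace tokenisation, transitive merging of pairwise matches via union-find, and the function names are all assumptions, and ties in the majority vote are broken arbitrarily.

```python
import string
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Standardise case, strip ASCII punctuation, split on whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()


def longest_common_run(a, b):
    """Length of the longest run of consecutive tokens shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def cluster_profiles(texts, threshold=10):
    """Group profile ids whose descriptions share more than `threshold` tokens.

    `texts` maps profile id -> description. Pairwise matches are merged
    transitively (an assumption). Only clusters of two or more are returned.
    """
    tokens = {pid: tokenize(text) for pid, text in texts.items()}
    parent = {pid: pid for pid in texts}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for p, q in combinations(texts, 2):
        if longest_common_run(tokens[p], tokens[q]) > threshold:
            parent[find(p)] = find(q)

    groups = {}
    for pid in texts:
        groups.setdefault(find(pid), set()).add(pid)
    return [group for group in groups.values() if len(group) > 1]


def infer_origins(clusters, source_country):
    """Give each unlabelled (PROXY) profile its cluster's majority SOURCE country."""
    inferred = {}
    for cluster in clusters:
        votes = Counter(source_country[p] for p in cluster if p in source_country)
        if not votes:
            continue  # no SOURCE profile in this cluster, nothing to infer
        label = votes.most_common(1)[0][0]
        for p in cluster:
            if p not in source_country:
                inferred[p] = label
    return inferred
```

The pairwise comparison is quadratic in the number of profiles, which is workable for a few thousand descriptions but would need indexing (for example, on shared token n-grams) for much larger collections.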
TABLE IV: Dominant demographic characteristics by origin country. Significant differences highlighted.