This post gives a quick example of using the faker package in Python to generate fake customer data. You may find yourself building something where you’ll ultimately be processing personal data, such as names, addresses or phone numbers. Given the sensitivity of this data, you might want to start out using some fake data. This is where faker comes in. It provides a range of functions to generate real-looking personal data.

The first step in using faker is to create a Faker instance. We’re also importing pandas, as we’ll be using it shortly. Before creating my object fake, I’m also setting the seed with Faker.seed(10). The seed is set on the class itself, before any specific instance is created. If you try to set it by calling seed() on an instance, you get an error.

import pandas as pd
from faker import Faker

# set the seed (on the class, before creating an instance)
Faker.seed(10)

# set the locale to GB
fake = Faker("en_GB")

When creating your Faker instance you can also set the locale. Here I’m using en_GB to access some UK-specific functions. You can also set multiple locales by passing a list of locales. The documentation lists the available locales along with their respective functions.

The next step is to actually start creating some fake data. When you call one of the methods of a Faker instance you’ll get a single result back. For example, fake.first_name() will return a single fake first name. Below, list comprehensions are used to call each method N times and get a list of N results back. The postcode() method is available from the en_GB locale and gives us UK-style postcodes.

# how many customers to fake
N = 1000

# get first names, last names and postcodes
first_name = [fake.first_name() for _ in range(N)]
last_name = [fake.last_name() for _ in range(N)]
postcode = [fake.postcode() for _ in range(N)]

Finally, the data is combined into a pandas dataframe so it can be saved out to file or processed further.

# create a dataframe
df_fake = pd.DataFrame({
    "first_name": first_name,
    "last_name": last_name,
    "postcode": postcode})

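# save to a csv (the filename here is only illustrative) and preview the first rows
df_fake.to_csv("fake_customers.csv", index=False)
df_fake.head()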
##   first_name last_name  postcode
## 0  Alexandra      Cole  HR20 9HP
## 1       Ryan      Shaw  OL5P 7ZE
## 2       Anne     Riley    E9 3XZ
## 3     Pamela  Campbell   B2W 8PS
## 4    Melissa   Doherty   M6E 7GQ

Next steps

There are myriad other things you can fake with faker. Below are just a few examples, and you can find many more in the docs.

  • Get a full address with fake.address()
## 9 Hughes forest
## Millerside
## RM7 7WQ
  • A fake dictionary with fake.pydict(), with the ability to control the types allowed for the values (value_types) and whether the number of entries can vary (variable_nb_elements)
## {'nostrum': '', 'dicta': Decimal('-70767313.43577077714920803205648961606945059892963516160911845072344355')}

You can also create fake credit card details, which is very useful given the security controls around this sort of data.

## JCB 15 digit
## Jacob Power
## 213164059033701 09/27
## CVC: 190

There are also more niche options depending on the data you work with, such as:

  • fake.isbn10() for ISBNs
  • fake.color() for a colour hexcode, with various options for other colour formats or families of colours

A number of additional community packages are available to extend faker, e.g.

  • Air travel data with faker_airtravel
  • Music genres and instruments with faker_music
  • Fake markdown posts with mdgen. (It’s worth a look at the docs just to laugh at the nonsense posts it creates.)

A final mention goes to the catch_phrase() method, because it lets us create some pretty amusing company catch phrases. Below are some fake companies and their associated catch phrases.

# print five fake companies with a catch phrase each
for _ in range(5):
    print(f"{fake.company().title()}: {fake.catch_phrase().title()}")
## Taylor-Davies: Front-Line Dynamic Portal
## Smith Ltd: Secured 3Rdgeneration Moratorium
## Andrews Ltd: Realigned Explicit Product
## Booth-Mccarthy: Expanded Disintermediate Success
## Martin, Bennett And Matthews: Integrated Reciprocal Attitude