This post gives a quick example of using the faker package in Python to generate fake customer data.
You may find yourself building something where you’ll ultimately be processing personal data, such as names, addresses or phone numbers.
Given the sensitivity of this data, you might want to start out using some fake data.
This is where faker comes in.
It provides a range of functions to generate real-looking personal data.
The first step in using faker is to create a Faker instance.
We’re also importing pandas, as we’ll be using it shortly.
Prior to creating my object fake, I’m also setting the seed with Faker.seed(10).
The seed is set upfront on the class, before any specific instance is created.
If you try to call seed() on an instance instead, you get an error; to seed a single instance, use its seed_instance() method.
import pandas as pd
from faker import Faker
# set the seed
Faker.seed(10)
# set the locale to GB
fake = Faker("en_GB")
When creating your Faker instance you can also set the locale.
Here I’m using en_GB to access some UK-specific functions.
You can also set multiple locales, as shown in the documentation: all you need to do is pass a list of locale codes.
The documentation lists the available locales along with their respective functions.
The next step is to actually start creating some fake data.
When you call one of the methods of a Faker instance you’ll get a single result back.
For example, fake.first_name() will return a single fake first name.
Below, list comprehensions are used to call each function N times and get a list of N results back.
The postcode() method is available from the en_GB locale to give us UK-format postcodes.
# how many customers to fake
N = 1000
# get first names, last names and postcodes
first_name = [fake.first_name() for _ in range(N)]
last_name = [fake.last_name() for _ in range(N)]
postcode = [fake.postcode() for _ in range(N)]
Finally, the data is combined into a pandas dataframe so it can be saved out to file or processed further.
# create a dataframe
df_fake = pd.DataFrame({
    "first_name": first_name,
    "last_name": last_name,
    "postcode": postcode})
# preview the first five rows
df_fake.head()
##   first_name last_name  postcode
## 0  Alexandra      Cole  HR20 9HP
## 1       Ryan      Shaw  OL5P 7ZE
## 2       Anne     Riley    E9 3XZ
## 3     Pamela  Campbell   B2W 8PS
## 4    Melissa   Doherty   M6E 7GQ
Next steps
There are a myriad of other things you can fake with faker.
Below are just a few examples, and you can find many more in the docs.
- Get a full address
print(fake.address())
## 9 Hughes forest
## Millerside
## RM7 7WQ
- A fake dictionary, with the ability to control types allowed for the values (value_types) and whether the number of entries can vary (variable_nb_elements)
fake.pydict(nb_elements=4)
## {'nostrum': 'pateljayne@example.net', 'dicta': Decimal('-70767313.43577077714920803205648961606945059892963516160911845072344355')}
You can also create fake credit card details, which is very useful given all the security controls around this sort of data.
print(fake.credit_card_full())
## JCB 15 digit
## Jacob Power
## 213164059033701 09/27
## CVC: 190
There are also more niche options depending on the data you work with, such as:
- fake.isbn10() for ISBNs
- fake.color() for a colour hexcode, with various options for other colour formats or families of colours
A number of additional community packages are available to extend faker, e.g.
- Air travel data with faker_airtravel
- Music genres and instruments with faker_music
- Fake markdown posts with mdgen. (It’s worth a look at the docs just to laugh at the nonsense posts it creates: pypi.org/project/mdgen/)
A final mention goes to the .catch_phrase() method, because it lets us create some pretty amusing company catch-phrases.
Below are some fake companies and their associated catch-phrases.
for _ in range(5):
    print(f"{fake.company().title()}: {fake.catch_phrase().title()}")
## Taylor-Davies: Front-Line Dynamic Portal
## Smith Ltd: Secured 3Rdgeneration Moratorium
## Andrews Ltd: Realigned Explicit Product
## Booth-Mccarthy: Expanded Disintermediate Success
## Martin, Bennett And Matthews: Integrated Reciprocal Attitude