Reading Pickle Files in Pandas using read_pickle

Mokhtar Ebrahim
Last Updated On: October 16, 2023

read_pickle in Pandas allows you to load pickled Pandas objects.

It can load data such as DataFrames and Series that were saved with the Pandas to_pickle method.

In this tutorial, we’ll cover its syntax, load pickle files into DataFrames, and benchmark its performance under different compression algorithms.

Table of Contents

1 Pandas read_pickle Syntax and Parameters
2 Risks of Unpickling Data from Untrusted Sources
3 How to read a pickle file
4 Read Compressed Pickle
5 Read first Row or n Rows from Pickle
6 Benchmark for Different Compression Algorithms
7 Error Handling and Troubleshooting
- 7.1 Solutions
8 Resource

Pandas read_pickle Syntax and Parameters

The basic syntax for read_pickle is as follows:

pandas.read_pickle(filepath_or_buffer, compression='infer', storage_options=None)

filepath_or_buffer: The path of the file that contains the pickled object. This can be a string path, a path object (such as pathlib.Path), a URL, or a file-like object opened in binary mode.
compression: The type of on-the-fly decompression to apply, if any. The default, ‘infer’, detects the compression from the file extension (for example ‘.gz’, ‘.bz2’, ‘.xz’, or ‘.zip’). Supported compression types include ‘gzip’, ‘bz2’, ‘xz’, and ‘zip’; recent versions of Pandas also support ‘zstd’ and ‘tar’. Set it to None to read the file as uncompressed regardless of its extension.
storage_options: An optional dict of extra options for a particular storage connection, such as host, port, or credentials. It matters mainly when reading from remote storage like S3 or GCS (the options are forwarded to fsspec) or from HTTP(S) URLs (the options are sent as request headers).

Risks of Unpickling Data from Untrusted Sources

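It’s crucial to understand the potential dangers of unpickling data, especially when the source of that data is untrusted. Loading a pickle can execute arbitrary code, because the file itself can instruct the unpickler to call any importable function. Only use read_pickle on files from sources you trust.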
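A harmless sketch shows why. Any class can define __reduce__, which tells pickle which callable to invoke when the object is loaded. The Payload class and the file name untrusted.pkl are hypothetical, and eval("6 * 7") stands in for whatever an attacker might choose to run.

```python
import pickle

import pandas as pd


class Payload:
    """Hypothetical malicious object, used only for this demonstration."""

    def __reduce__(self):
        # On load, pickle calls eval("6 * 7") instead of rebuilding a Payload.
        # A real attacker could name any importable callable here.
        return (eval, ("6 * 7",))


# "Attacker" side: write the payload to a file
with open("untrusted.pkl", "wb") as f:
    pickle.dump(Payload(), f)

# "Victim" side: read_pickle runs the embedded call while loading
result = pd.read_pickle("untrusted.pkl")
print(type(result), result)  # <class 'int'> 42, not a DataFrame
```

Note that loading never needed the Payload class: the file alone told pickle which function to call.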

How to read a pickle file

You can use the read_pickle function to read a pickle file in Pandas like this:

import pandas as pd

df = pd.read_pickle('sample_data.pkl')
print(df)

Output:

   Name  Age  Salary
0  Alex   25   50000
1  John   30   60000
2  Jane   28   55000

Here, we loaded the DataFrame stored in the “sample_data.pkl” file.

As the output shows, the DataFrame has three columns (‘Name’, ‘Age’, and ‘Salary’) and three rows of sample data.
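To reproduce this example, here is a minimal sketch that creates sample_data.pkl first. The values are taken from the output above; writing the file with to_pickle is an assumption about how it was produced.

```python
import pandas as pd

# Build the sample DataFrame shown in the output above
sample = pd.DataFrame({
    "Name": ["Alex", "John", "Jane"],
    "Age": [25, 30, 28],
    "Salary": [50000, 60000, 55000],
})

# Save it as a pickle, then load it back
sample.to_pickle("sample_data.pkl")
df = pd.read_pickle("sample_data.pkl")
print(df)

# The round trip preserves values, dtypes, and the index
pd.testing.assert_frame_equal(df, sample)
```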

Read Compressed Pickle

Pandas natively supports several compression protocols:

gzip: An extensively used compression method, particularly suitable for textual data.
bz2: Another compression method that often provides a better compression ratio than gzip, albeit at a noticeably slower speed.
xz: Provides one of the best compression ratios, although compressing with it can be much slower than with the other methods.
zip: Widely known and used, it is also supported by Pandas for both pickling and reading.

Assuming you have pickled and compressed a DataFrame using one of the supported methods, you can read the compressed file directly using read_pickle by specifying the appropriate compression type.

For gzip compression:

df_gzip = pd.read_pickle('dataframe.pkl.gz', compression='gzip')

For bz2 compression:

df_bz2 = pd.read_pickle('dataframe.pkl.bz2', compression='bz2')

For xz compression:

df_xz = pd.read_pickle('dataframe.pkl.xz', compression='xz')

For zip compression:

df_zip = pd.read_pickle('dataframe.pkl.zip', compression='zip')

One of the handy features is that if you omit the compression parameter when calling read_pickle, Pandas will try to infer the compression based on the file extension.
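Inference only works when the extension matches the compression. The sketch below (the file names are hypothetical) writes a gzip-compressed pickle under a .bin extension: with the default compression='infer', Pandas finds no known extension, treats the file as uncompressed, and loading fails, while passing compression='gzip' explicitly works.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# Extension matches the compression, so 'infer' works on both write and read
df.to_pickle("dataframe.pkl.gz")
df_inferred = pd.read_pickle("dataframe.pkl.gz")

# Same gzip data, but under an extension Pandas does not recognize
df.to_pickle("dataframe_gz.bin", compression="gzip")

try:
    pd.read_pickle("dataframe_gz.bin")  # read as an uncompressed pickle
    inference_worked = True
except Exception as exc:  # the exact exception type may vary between versions
    inference_worked = False
    print(f"Inference failed: {type(exc).__name__}")

# Naming the compression explicitly reads the file correctly
df_explicit = pd.read_pickle("dataframe_gz.bin", compression="gzip")
pd.testing.assert_frame_equal(df_explicit, df_inferred)
```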

Read first Row or n Rows from Pickle

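Unlike CSV files, which read_csv can read partially through its nrows parameter, pickle files are not designed for partial reading: read_pickle has no option to load only some rows.

Loading a pickle is all or nothing, so you load the whole object first and then select the rows you need, such as the first row.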

Load the Pickle and Access the First Row:

df = pd.read_pickle('dataframe.pkl')
first_row = df.iloc[0]
print(first_row)

Output:

A    1
B    4
Name: 0, dtype: int64

The output shows the values from the first row of our sample DataFrame, returned as a Series. Here, we’ve used the DataFrame’s iloc indexer, which selects rows by integer position.

Alternative Method using head():

Pandas DataFrames have a built-in method called head(), which returns the first n rows of the DataFrame (5 by default).

first_row_with_head = df.head(1)
print(first_row_with_head)

Output:

   A  B
0  1  4

To retrieve more rows, pass the number you want; for example, df.head(10) returns the first 10 rows.

Output (for df.head(3), as an example):

   A  B
0  1  4
1  2  5
2  3  6
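Putting this section together, here is a self-contained sketch. It assumes dataframe.pkl holds the small two-column DataFrame implied by the outputs above (columns A and B with values 1 to 3 and 4 to 6). Note that read_pickle still loads the whole object before any rows are selected.

```python
import pandas as pd

# Recreate the sample DataFrame implied by the outputs above
pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}).to_pickle("dataframe.pkl")

# read_pickle has no nrows option: the entire DataFrame is loaded into memory
df = pd.read_pickle("dataframe.pkl")

first_row = df.iloc[0]      # Series holding a single row
first_as_df = df.head(1)    # DataFrame with one row
first_n = df.head(3)        # first 3 rows; head() defaults to 5
same_as_head = df.iloc[:3]  # positional slicing selects the same rows

print(first_row)
print(first_as_df)
print(first_n)
```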

Benchmark for Different Compression Algorithms

The following Python code creates sample pickle files with different compression methods and benchmarks their reading times using read_pickle:

import time

import pandas as pd

data = {'A': range(1, 100001), 'B': range(100001, 1, -1)}
df = pd.DataFrame(data)

# Pickle with different compressions
df.to_pickle("dataframe.pkl")  # No compression
df.to_pickle("dataframe_gzip.pkl.gz", compression='gzip')
df.to_pickle("dataframe_bz2.pkl.bz2", compression='bz2')
df.to_pickle("dataframe_xz.pkl.xz", compression='xz')
df.to_pickle("dataframe_zip.pkl.zip", compression='zip')

# Measure load times
files = ["dataframe.pkl", "dataframe_gzip.pkl.gz", "dataframe_bz2.pkl.bz2",
         "dataframe_xz.pkl.xz", "dataframe_zip.pkl.zip"]
compression_methods = ["No Compression", "gzip", "bz2", "xz", "zip"]

for file, method in zip(files, compression_methods):
    start_time = time.perf_counter()  # high-resolution timer for benchmarking
    _ = pd.read_pickle(file)  # compression is inferred from the extension
    elapsed_time = time.perf_counter() - start_time
    print(f"Reading time with {method}: {elapsed_time:.4f} seconds")

Output:

Reading time with No Compression: 0.0932 seconds
Reading time with gzip: 0.7555 seconds
Reading time with bz2: 4.9183 seconds
Reading time with xz: 2.1486 seconds
Reading time with zip: 0.7317 seconds

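As you can see, the uncompressed pickle is the fastest to read. Among the compressed files, zip and gzip are the fastest (about 0.73 and 0.76 seconds), while bz2 is by far the slowest (about 4.9 seconds), followed by xz (about 2.1 seconds).

On the other hand, xz produced the smallest file when creating the pickles. Exact timings will vary with your hardware and data.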
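The file-size side of the trade-off is easy to check yourself. This sketch recreates the benchmark DataFrame and reports the on-disk size of each pickle with os.path.getsize; the exact numbers depend on your Pandas version and data, so none are claimed here.

```python
import os

import pandas as pd

df = pd.DataFrame({"A": range(1, 100001), "B": range(100001, 1, -1)})

# Map each output file to the compression used when writing it
targets = {
    "dataframe.pkl": None,  # no compression
    "dataframe_gzip.pkl.gz": "gzip",
    "dataframe_bz2.pkl.bz2": "bz2",
    "dataframe_xz.pkl.xz": "xz",
    "dataframe_zip.pkl.zip": "zip",
}

sizes = {}
for path, method in targets.items():
    df.to_pickle(path, compression=method)
    sizes[method or "none"] = os.path.getsize(path)

# Print from smallest to largest file
for method, size in sorted(sizes.items(), key=lambda item: item[1]):
    print(f"{method:>5}: {size / 1024:,.1f} KiB")
```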

Error Handling and Troubleshooting

One of the common errors when unpickling in Pandas is the unsupported pickle protocol issue.

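It arises when a pickle was written by a newer Python version using a pickle protocol that the Python version reading it does not support. For example, to_pickle uses pickle.HIGHEST_PROTOCOL by default, which is protocol 5 on Python 3.8 and later, so such a file cannot be unpickled on Python 3.7 or earlier.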

The error message might look something like: ValueError: unsupported pickle protocol: 5.

Solutions

Upgrade Python: If the error is due to an older Python version, consider upgrading to a newer one that supports the required protocol.

Re-Pickle with a Lower Protocol: If you have access to the environment where the data was originally pickled, you can re-pickle it with a lower protocol. For example, protocol 4 can be read by Python 3.4 and later:

df.to_pickle("dataframe_lower_protocol.pkl", protocol=4)
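If you are not sure which protocol a file uses, you can inspect it without unpickling it. For protocol 2 and above, an uncompressed pickle starts with the PROTO opcode (byte 0x80) followed by one byte holding the protocol number. The helper pickle_protocol_of is a hypothetical name, and the check only applies to uncompressed files (decompress them first otherwise).

```python
import pandas as pd


def pickle_protocol_of(path):
    """Return the protocol number of an uncompressed pickle, or None if unknown.

    Hypothetical helper: pickles written with protocol >= 2 begin with the
    PROTO opcode (0x80) followed by one byte holding the protocol number.
    """
    with open(path, "rb") as f:
        header = f.read(2)
    if len(header) == 2 and header[0] == 0x80:
        return header[1]
    return None  # protocol 0/1 pickles have no PROTO header


df = pd.DataFrame({"A": [1, 2, 3]})
df.to_pickle("dataframe_default.pkl")                     # pickle.HIGHEST_PROTOCOL
df.to_pickle("dataframe_lower_protocol.pkl", protocol=4)  # readable by Python 3.4+

print(pickle_protocol_of("dataframe_default.pkl"))
print(pickle_protocol_of("dataframe_lower_protocol.pkl"))
```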

Use a Virtual Environment: If you need to maintain multiple Python versions or library versions, consider using tools like venv or conda to create isolated environments.

General Troubleshooting Tips

Always check the versions of Python and Pandas when facing such issues. This can be done using:

import sys
print(sys.version)

and

print(pd.__version__)
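Because the unsupported protocol error comes from Python’s pickle module rather than from Pandas, it also helps to print pickle.HIGHEST_PROTOCOL, the highest protocol the current interpreter supports. A small sketch that gathers all three values:

```python
import pickle
import sys

import pandas as pd

# Collect the versions most relevant to pickle compatibility problems
report = {
    "python": sys.version.split()[0],
    "pandas": pd.__version__,
    "highest_pickle_protocol": pickle.HIGHEST_PROTOCOL,
}

for key, value in report.items():
    print(f"{key}: {value}")

# For a much more detailed environment report, Pandas also offers:
# pd.show_versions()
```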

Resource

https://pandas.pydata.org/docs/reference/api/pandas.read_pickle.html

Mokhtar Ebrahim

Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.


FAQs

How to read pickle files in Pandas?

Use the read_pickle() function. It takes the path of the pickle file as an argument and returns the unpickled object. Similar to read_csv(), it returns a DataFrame when the file contains a pickled DataFrame, although it can also return a Series or any other pickled object.

How do I see the contents of a pickle file?

Command line usage

When invoked from the command line, python -m pickletools will disassemble the contents of one or more pickle files. Note that if you want to see the Python object stored in the pickle rather than the details of pickle format, you may want to use -m pickle instead.
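The same disassembly is available from Python through pickletools.dis. Unlike loading, disassembly only reads the opcodes and never calls any function named in the file, which makes it a reasonable way to peek at a pickle you do not trust. This sketch disassembles a small in-memory pickle; the dictionary is arbitrary example data.

```python
import io
import pickle
import pickletools

# Arbitrary example data pickled to bytes
data = pickle.dumps({"Name": "Alex", "Age": 25}, protocol=4)

# Disassemble into a string buffer instead of printing straight to stdout
listing = io.StringIO()
pickletools.dis(data, out=listing)
print(listing.getvalue())
```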


How to read model pickle file in Python?

To load a saved model from a pickle file, open the file in binary mode and pass the file object to pickle.load(), which deserializes the model. Assign the result to a variable, and you can then call the original model's predict() method with some test data to get back an array of predictions.

How to view .pkl file?

A .pkl file is binary, so opening it in a text editor or a browser won't show its contents in readable form. Load it in Python with pd.read_pickle() or pickle.load(), or disassemble it with python -m pickletools.

What is the difference between Pandas read pickle and CSV?

Pickle is a serialized way of storing a Pandas DataFrame: you are writing the exact representation of the DataFrame to disk, so the column dtypes and the index are preserved when you load it. If you save the DataFrame as CSV, you only store its values as comma-separated text, and the types have to be inferred again when the file is read.
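A quick way to see the difference is to round-trip the same DataFrame through both formats. In this sketch (the column names and the file name are illustrative), the datetime and categorical columns come back from pickle with their dtypes intact, while the CSV round trip returns them as plain text columns unless you re-parse them.

```python
import io

import pandas as pd

df = pd.DataFrame({
    "when": pd.to_datetime(["2023-01-01", "2023-06-15"]),
    "grade": pd.Categorical(["high", "low"]),
})

# Pickle round trip: the exact dtypes are restored
df.to_pickle("roundtrip.pkl")
from_pickle = pd.read_pickle("roundtrip.pkl")

# CSV round trip: values are written as text and re-inferred on read
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
from_csv = pd.read_csv(buffer)

print(from_pickle.dtypes)
print(from_csv.dtypes)
```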


How do I load a pickle file?

Pickling with a File

First write the data to a file with the pickle.dump() function. Then open the file in binary mode and load it back in your Python script with the pickle.load() function; if the pickled data was a dictionary, you get the same Python dictionary back and can print it.

Which methods are used to read data from a pickle file?

pickle.load() reads a pickled object from a file, whereas pickle.loads() deserializes it from a bytes-like object. Their counterparts for writing are pickle.dump() and pickle.dumps().
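The difference between the file-based and bytes-based functions is easiest to see side by side. A minimal sketch; the file name my_object.pickle and the dictionary are placeholders:

```python
import pickle

my_object = {"model": "example", "scores": [0.9, 0.8]}

# dump()/load() work with file objects opened in binary mode
with open("my_object.pickle", "wb") as f:
    pickle.dump(my_object, f)
with open("my_object.pickle", "rb") as f:
    from_file = pickle.load(f)

# dumps()/loads() work with bytes in memory
raw = pickle.dumps(my_object)
from_bytes = pickle.loads(raw)

print(type(raw))                             # <class 'bytes'>
print(from_file == my_object == from_bytes)  # True
```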


How to check if a pickle file exists in Python?

Use the is_file() method of the Path class from the pathlib module, or the exists() function from the os.path module, to check whether a file exists in Python.
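Both approaches look like this in practice. The sketch checks for a hypothetical sample_data.pkl before handing it to read_pickle, so a missing file produces a clear message instead of a FileNotFoundError traceback.

```python
import os
from pathlib import Path

import pandas as pd

path = Path("sample_data.pkl")  # hypothetical file name

# Create the file so the example runs; normally it would already exist
pd.DataFrame({"A": [1, 2]}).to_pickle(path)

# pathlib: is_file() is True only for an existing regular file
print(path.is_file())  # True
# os.path: exists() is True for any existing path, directories included
print(os.path.exists("sample_data.pkl"))  # True

# Guard the read so a missing file gives a clear message
if path.is_file():
    df = pd.read_pickle(path)
    print(df)
else:
    print(f"{path} not found")
```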


How to use pickle in Python?

To use pickle in Python, first create the object you want to save, such as my_object. Then open a file such as my_object.pickle with open() in binary write mode ('wb') and pass the object and the file to dump() to serialize it for later use. Binary mode is required because pickle produces bytes rather than text.

What is a .pickle file?

Pickle in Python is primarily used in serializing and deserializing a Python object structure. In other words, it's the process of converting a Python object into a byte stream to store it in a file/database, maintain program state across sessions, or transport data over the network.

What is the difference between pickle and marshal in Python?

The pickle module differs from marshal in several significant ways: The pickle module keeps track of the objects it has already serialized, so that later references to the same object won't be serialized again. marshal doesn't do this. This has implications both for recursive objects and object sharing.

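How to open pickle file in pandas?

Use the read_pickle() method, passing the file path from which the pickled object will be loaded. Its compression parameter handles on-the-fly decompression of on-disk data: with the default 'infer', Pandas picks the decompression from the extension, for example gzip for a path ending in '.gz', bz2 for '.bz2', xz for '.xz', or zip for '.zip'.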

How to parse a PKL file?

A .pkl file is read with Python's built-in pickle module. It is part of the standard library, so there is nothing to install. Open the file in binary mode and load it:

import pickle

with open('filename.pkl', 'rb') as file:
    my_object = pickle.load(file)
print(my_object)

What is the extension of a Python pickle file?

Python pickle files may have the extension ".pickle" or ".pkl".

How to read ODS file using Pandas?

read_excel() returns a new DataFrame that contains the values from a spreadsheet such as data.xlsx. You can also use read_excel() with OpenDocument spreadsheets, or .ods files; reading them requires the odfpy package.

How do I read a zip file in Pandas?

Pandas can read zip files directly, without unzipping them first, and it can write files straight into a compressed format instead of compressing them in a separate step. When the archive contains a single file, specify the compression format (or let Pandas infer it from the .zip extension). When you need one file from an archive with several members, use the ZipFile class from the zipfile module.
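For the multi-file case, here is a sketch. It builds a hypothetical archive multi.zip holding two pickled DataFrames, then reads one member by name with zipfile.ZipFile. Wrapping the member's bytes in io.BytesIO gives read_pickle an ordinary binary file-like object.

```python
import io
import pickle
import zipfile

import pandas as pd

df_a = pd.DataFrame({"A": [1, 2, 3]})
df_b = pd.DataFrame({"B": [4, 5, 6]})

# Build an archive with two members (in practice you would already have one)
with zipfile.ZipFile("multi.zip", "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("a.pkl", pickle.dumps(df_a))
    zf.writestr("b.pkl", pickle.dumps(df_b))

# Read a single member by name without extracting the archive to disk
with zipfile.ZipFile("multi.zip") as zf:
    print(zf.namelist())  # ['a.pkl', 'b.pkl']
    df_loaded = pd.read_pickle(io.BytesIO(zf.read("b.pkl")))

print(df_loaded)
```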


How to read a binary file using pickle?

Reading all records of a binary file using the pickle module

When several objects have been pickled one after another into the same binary file, read them back by calling pickle.load() repeatedly on the open file, displaying or storing each record as you go. Once the end of the file is reached, load() raises an EOFError exception, which tells you that all records have been read.
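A minimal sketch of that pattern, using a hypothetical records.bin file: several objects are dumped one after another, then read back in a loop that stops at EOFError.

```python
import pickle

records = [{"id": 1, "name": "Alex"}, {"id": 2, "name": "John"}, {"id": 3, "name": "Jane"}]

# Write each record as a separate pickle, back to back in one file
with open("records.bin", "wb") as f:
    for record in records:
        pickle.dump(record, f)

# Read records until load() signals the end of the file
loaded = []
with open("records.bin", "rb") as f:
    while True:
        try:
            loaded.append(pickle.load(f))
        except EOFError:
            break

print(loaded)
```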

