The pandas astype() method is a convenient way to cast an entire DataFrame, specific columns, or a Series to a different dtype. In this post, we'll go over the basic syntax and a few examples. The data we use comes from a Kaggle dataset on Goodreads.
df.astype(dtype)
Basic Syntax: The only required argument is dtype, the data type you want ALL of the data in your DataFrame to have. Let's first check what the DataFrame looks like.
df.info()
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5495 entries, 0 to 5494
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 bookID 5495 non-null int32
1 average_rating 5495 non-null float32
2 isbn13 5495 non-null int64
3 num_pages 5495 non-null int32
4 ratings_count 5495 non-null int32
5 text_reviews_count 5495 non-null int32
dtypes: float32(1), int32(4), int64(1)
memory usage: 150.4 KB
We can see that the DataFrame initially contains 3 data types. Now let's cast all of the columns to floats. We can designate the dtype with a string, such as "float", or pass the type itself, such as float.
df.astype("float")
Example 1:
df2 = df.astype("float")
df2.dtypes
Output:
bookID float64
average_rating float64
isbn13 float64
num_pages float64
ratings_count float64
text_reviews_count float64
dtype: object
df.astype(float)
Example 2:
df3 = df.astype(float)
df3.dtypes
Output:
bookID float64
average_rating float64
isbn13 float64
num_pages float64
ratings_count float64
text_reviews_count float64
dtype: object
In both cases, all of the data was successfully cast to float64.
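To try the two spellings without downloading the Kaggle data, here is a minimal sketch on a small made-up DataFrame. The name toy and its values are hypothetical stand-ins for the Goodreads columns. It also shows that astype() returns a new object and leaves the original untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Goodreads data: three numeric columns with mixed dtypes.
toy = pd.DataFrame(
    {
        "bookID": np.array([1, 2, 3], dtype="int32"),
        "average_rating": np.array([4.5, 3.9, 4.1], dtype="float32"),
        "isbn13": np.array(
            [9780000000001, 9780000000002, 9780000000003], dtype="int64"
        ),
    }
)

as_string_name = toy.astype("float")  # dtype given as a string
as_python_type = toy.astype(float)    # dtype given as the built-in type

print(as_string_name.dtypes)  # every column should now be float64
print(as_python_type.dtypes.equals(as_string_name.dtypes))  # both spellings agree
print(toy.dtypes)  # the original frame keeps its dtypes
```

Because the result is a new DataFrame, you need to assign it to a name (as df2 and df3 do above) if you want to keep the cast data.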
NOTE: astype() raises an error if the data cannot be cast to the requested dtype.
ValueError
Example 3: If we keep all of the columns in the initial dataset, including string columns such as the title and authors of books, and try to cast the entire DataFrame to float, the cast fails with a ValueError.
df.info()
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5495 entries, 0 to 5494
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 bookID 5495 non-null int32
1 title 5495 non-null string
2 authors 5495 non-null string
3 average_rating 5495 non-null float32
4 isbn 5495 non-null string
5 isbn13 5495 non-null int64
6 language_code 5495 non-null string
7 num_pages 5495 non-null int32
8 ratings_count 5495 non-null int32
9 text_reviews_count 5495 non-null int32
10 publication_date 5495 non-null string
11 publisher 5495 non-null string
dtypes: float32(1), int32(4), int64(1), string(6)
memory usage: 408.0 KB
Notice that there are several columns containing string data.
df.astype("float")
Output:
ValueError: could not convert string to float: 'Angle of Repose'
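One possible way to handle this, sketched here on a made-up two-column DataFrame, is to catch the ValueError, or to restrict the cast to the numeric columns with select_dtypes(). The name toy, the second title, and the page counts are hypothetical, and this is only one option among several.

```python
import pandas as pd

# Hypothetical miniature of the full dataset: one string column, one numeric column.
toy = pd.DataFrame(
    {
        "title": ["Angle of Repose", "Some Other Book"],  # second title is made up
        "num_pages": [569, 320],                          # made-up page counts
    }
)

# Casting the whole frame fails because the titles are not numbers.
try:
    toy.astype("float")
    cast_failed = False
except ValueError as err:
    cast_failed = True
    print(f"ValueError: {err}")

# Cast only the numeric columns and leave the string column out.
numeric_only = toy.select_dtypes(include="number").astype("float")
print(numeric_only.dtypes)
```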
Casting specific columns
df.astype({"col1": "dtype", "col2": "dtype"})
Example 1: In this example, you call the astype() method on the DataFrame, df, as before, but instead of passing one dtype to apply to all of the columns, you pass a dictionary in which each key is a column name and each value is the dtype to cast that column to.
df4 = df.astype({"ratings_count": "int64", "text_reviews_count": "int64"})
df4.dtypes
Output:
bookID int32
title string
authors string
average_rating float32
isbn string
isbn13 int64
language_code string
num_pages int32
ratings_count int64
text_reviews_count int64
publication_date string
publisher string
dtype: object
As you can see, the two columns ratings_count and text_reviews_count were successfully cast to int64.
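The dictionary form can be sketched on a small made-up DataFrame as follows. The name toy and its values are hypothetical. The sketch also shows that a key that is not a column name raises a KeyError rather than being silently ignored.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the dataset with three int32 columns.
toy = pd.DataFrame(
    {
        "ratings_count": np.array([10, 250], dtype="int32"),
        "text_reviews_count": np.array([1, 12], dtype="int32"),
        "num_pages": np.array([569, 320], dtype="int32"),
    }
)

# Only the columns named in the dictionary are cast; num_pages stays int32.
widened = toy.astype({"ratings_count": "int64", "text_reviews_count": "int64"})
print(widened.dtypes)

# A key that does not match any column raises a KeyError.
try:
    toy.astype({"no_such_column": "int64"})
    missing_key_raised = False
except KeyError:
    missing_key_raised = True
print(missing_key_raised)
```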
df["col"].astype(dtype)
Example 2: Another option is to call the astype() method directly on the column. In the example below, we cast bookID to float.
df["bookID"] = df["bookID"].astype(float)
df.dtypes
Output:
bookID float64
title string
authors string
average_rating float32
isbn string
isbn13 int64
language_code string
num_pages int32
ratings_count int32
text_reviews_count int32
publication_date string
publisher string
dtype: object
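Note that the example assigns the result back to df["bookID"]. A minimal sketch on made-up data (the name toy and its values are hypothetical) shows why: calling astype() on the column without assigning the result leaves the DataFrame unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the dataset.
toy = pd.DataFrame(
    {
        "bookID": np.array([1, 2, 3], dtype="int32"),
        "num_pages": np.array([569, 320, 412], dtype="int32"),
    }
)

# The result is discarded here, so the column keeps its original dtype.
toy["bookID"].astype(float)
dtype_without_assignment = str(toy["bookID"].dtype)

# Assigning the result back replaces the column with the cast version.
toy["bookID"] = toy["bookID"].astype(float)
dtype_after_assignment = str(toy["bookID"].dtype)

print(dtype_without_assignment, dtype_after_assignment)
```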
NOTE: astype() works for many data types, but for certain data types, such as datetime, your data needs to be in a specific format before you can call astype(). Otherwise, you'll have to use a more specialized conversion function like pd.to_datetime().
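As a rough illustration of the datetime case, the sketch below parses date strings with pd.to_datetime() and an explicit format. The sample values and the month/day/year layout are assumptions made for this example and are not taken from the dataset.

```python
import pandas as pd

# Made-up date strings in an assumed month/day/year layout.
publication_date = pd.Series(["9/16/2006", "11/1/2003", "4/26/2005"])

# An explicit format tells pandas exactly how to read each string.
parsed = pd.to_datetime(publication_date, format="%m/%d/%Y")

print(parsed.dtype)
print(parsed.dt.year.tolist())
```

Passing format explicitly avoids ambiguity between day-first and month-first dates, which is a common source of silent parsing mistakes.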