The pandas.concat() function is a part of the
Pandas library in Python, and it is used for concatenating two or more
Pandas objects along a particular axis, either row-wise (axis=0) or
column-wise (axis=1). This function is extremely useful when you have
data spread across multiple tables, files, or arrays and you want to
combine them into a single DataFrame for easier analysis and
manipulation. The concat function is quite flexible, allowing for
different types of set logic for indexes and relational algebra
functionality between the objects you want to concatenate.
1. Pandas concat() Syntax Explained
The general syntax for pandas.concat() is:
pandas.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=True)
Here’s a brief overview of some key parameters:
objs: A sequence or mapping of Series or DataFrame objects to be concatenated. This is the primary argument, and the objects you want to concatenate must be passed here.axis: Specifies the axis along which the objects will be concatenated.axis=0indicates row-wise concatenation, whileaxis=1indicates column-wise concatenation.join: Indicates the type of join operation to perform. You can choose between ‘inner’ and ‘outer’ joins.ignore_index: When set toTrue, this will reset the index for the resulting DataFrame. This is useful if you don’t want to keep the index labels from the original DataFrames.keys: Allows you to create hierarchical index levels when you concatenate the DataFrames.verify_integrity: If set toTrue, this will check for duplicate index values and will raise an exception if found.sort: Sorts the result DataFrame’s index if set toTrue.
An example to concatenate two DataFrames row-wise:
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2'],
'C': ['C0', 'C1', 'C2']},
index=[0, 1, 2])
df2 = pd.DataFrame({'A': ['A3', 'A4', 'A5'],
'B': ['B3', 'B4', 'B5'],
'C': ['C3', 'C4', 'C5']},
index=[3, 4, 5])
# Concatenate DataFrames
result = pd.concat([df1, df2])
In the example above, result will be a DataFrame containing rows from
both df1 and df2, concatenated along the rows (axis=0 by default).
1.1 Example - Row-wise Concatenation (Axis=0)
Row-wise concatenation in Pandas is done using the axis=0 parameter.
This concatenates two or more DataFrames along the rows, stacking them
on top of each other. Here’s a simple example to demonstrate this:
# Import the Pandas library
import pandas as pd
# Create two simple DataFrames
df1 = pd.DataFrame({'A': [1, 2, 3],
'B': ['a', 'b', 'c']})
df2 = pd.DataFrame({'A': [4, 5, 6],
'B': ['d', 'e', 'f']})
# Perform row-wise concatenation
result = pd.concat([df1, df2], axis=0)
# Display the resulting DataFrame
print(result)
The output of this code will be:
A B
0 1 a
1 2 b
2 3 c
0 4 d
1 5 e
2 6 f
Explanation of Result
As you can see, the DataFrames df1 and df2 are stacked on top of
each other, and the resulting DataFrame result contains all the rows
from both. Note the following:
- Index Labels: The index labels are retained from the original
DataFrames. If you want to ignore these and have a new continuous
index, you can set the
ignore_indexparameter toTrue. - Column Matching: The columns from each DataFrame (
'A'and'B'in this example) match, so there are no NaN values. If the columns didn’t match, the missing ones would be filled with NaN in the resulting DataFrame. - Order of DataFrames: The order in which DataFrames are listed in
the
pd.concatfunction determines the order of the rows in the concatenated DataFrame. In the example, rows fromdf1appear first, followed by rows fromdf2.
1.2 Example - Column-wise Concatenation (Axis=1)
Column-wise concatenation in Pandas is done using the axis=1
parameter. This stacks DataFrames side by side, appending the columns of
one DataFrame next to another. Here’s a simple example:
# Import the Pandas library
import pandas as pd
# Create two simple DataFrames
df1 = pd.DataFrame({'A': [1, 2, 3],
'B': ['a', 'b', 'c']})
df2 = pd.DataFrame({'C': [4, 5, 6],
'D': ['d', 'e', 'f']})
# Perform column-wise concatenation
result = pd.concat([df1, df2], axis=1)
# Display the resulting DataFrame
print(result)
The output will look like this:
A B C D
0 1 a 4 d
1 2 b 5 e
2 3 c 6 f
Explanation of Result
The DataFrames df1 and df2 are concatenated side-by-side, resulting
in a new DataFrame result that includes all the columns from both.
Let’s discuss some of the details:
- Index Matching: By default,
pandas.concataligns the DataFrames based on their index labels. If the index labels in the DataFrames match, as they do in this example, the rows will be correctly aligned. - New Columns: All the columns from
df1('A'and'B') appear first inresult, followed by all the columns fromdf2('C'and'D'). The order depends on the sequence of DataFrames in thepd.concatfunction. - No NaN Values: Since both
df1anddf2have matching indices and cover all the rows, there are no NaN (Not a Number) values inresult.
2. Important Parameters in Pandas concat()
The pandas.concat function offers several parameters that provide you
with fine-grained control over how DataFrames are concatenated. Let’s
delve into some of these important parameters, complete with examples.
1. objs: Specifying the DataFrames
The objs parameter is where you specify the sequence of DataFrame or
Series objects that you want to concatenate.
Example:
import pandas as pd
df1 = pd.DataFrame({'A': [1, 2]})
df2 = pd.DataFrame({'A': [3, 4]})
result = pd.concat(objs=[df1, df2])
# Output: A DataFrame with values from df1 and df2
2. axis: Control over Rows or Columns
The axis parameter determines whether the concatenation is row-wise
(axis=0) or column-wise (axis=1).
Example:
Row-wise:
result_row = pd.concat([df1, df2], axis=0)
Column-wise:
result_column = pd.concat([df1, df2], axis=1)
3. join: Inner vs Outer Joins
The join parameter allows you to specify how to handle overlapping
column names. You can opt for an
‘outer’ join, which includes all columns,
or an ‘inner’ join,
which includes only overlapping columns.
Example:
df3 = pd.DataFrame({'B': [1, 2]})
# Inner Join
result_inner = pd.concat([df1, df3], axis=1, join='inner')
# Outer Join
result_outer = pd.concat([df1, df3], axis=1, join='outer')
4. ignore_index: Re-indexing
When set to True, the ignore_index parameter will reset the index of
the resulting DataFrame. This is often useful when the index doesn’t
contain meaningful data.
Example:
result_ignore_index = pd.concat([df1, df2], ignore_index=True)
5. keys: Multi-level Indexing
The keys parameter allows you to create a hierarchical index
(multi-level index) by providing an array or sequence of keys.
Example:
result_keys = pd.concat([df1, df2], keys=['dataset_1', 'dataset_2'])
6. sort: Sorting Index
The sort parameter, when set to True, will sort the columns or index
of the resulting DataFrame.
Example:
result_sort = pd.concat([df1, df3], sort=True)
3. Handling Indexes in Concatenation
When using pandas concat to concatenate DataFrames, managing indexes
can be crucial for data integrity and further data manipulations. There
are several options to handle indexes, which include retaining indexes,
ignoring indexes, and creating multi-level indexes.
1. Retaining Indexes
By default, pandas concat retains the index labels from the original
DataFrames when performing the concatenation. This is most useful when
the indexes have meaningful data.
Example:
import pandas as pd
# Create two DataFrames with index labels
df1 = pd.DataFrame({'A': ['A0', 'A1']}, index=['a', 'b'])
df2 = pd.DataFrame({'A': ['A2', 'A3']}, index=['c', 'd'])
# Retaining Indexes
result_retain = pd.concat([df1, df2])
# Output retains the index labels from df1 and df2
print(result_retain)
2. Ignoring Indexes
If you want to ignore the index labels from the original DataFrames and
instead want to use a new continuous index, you can set the
ignore_index parameter to True.
Example:
# Ignoring Indexes
result_ignore = pd.concat([df1, df2], ignore_index=True)
# Output will have a new continuous index
print(result_ignore)
3. Creating Multi-level Indexes
When you’re dealing with multiple DataFrames and you want to remember
which row came from which DataFrame, you can create multi-level indexes
using the keys parameter.
Example:
# Creating Multi-level Indexes
result_multi = pd.concat([df1, df2], keys=['df1', 'df2'])
# Output will have a hierarchical index
print(result_multi)
Through these different index-handling options, pandas concat provides
a flexible way to manage how your concatenated DataFrame is indexed.
4. Handling Missing Data in Concatenation
When concatenating DataFrames using pandas concat, it’s common to
encounter missing data, especially if the DataFrames you are combining
don’t have the same columns or rows. Knowing how to handle these
situations is essential for maintaining data integrity.
4.1 What Happens When There Are Missing Columns or Rows?
If you concatenate DataFrames that don’t have the same columns, Pandas will introduce NaN (Not a Number) values for the columns that are missing in some DataFrames.
Example:
import pandas as pd
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'C': [7, 8]})
# Columns 'B' and 'C' are not common between df1 and df2
result_missing_columns = pd.concat([df1, df2])
# Output will contain NaN values in columns 'B' and 'C'
print(result_missing_columns)
Similarly, if you perform a column-wise concatenation (axis=1) of
DataFrames with differing row indices, Pandas will introduce NaN values
for the rows that don’t align.
4.2 Using fillna to Replace NaN Values
Once you have NaN values in your DataFrame, you can choose to replace
them with specific values or estimates using the fillna method.
Example:
# Replace NaN values with 0
result_filled = result_missing_columns.fillna(0)
# Output will have 0s instead of NaN
print(result_filled)
You can also replace NaN values in specific columns:
# Replace NaN values in column 'B' with the mean of column 'A'
result_missing_columns['B'] = result_missing_columns['B'].fillna(result_missing_columns['A'].mean())
# Output will have mean of column 'A' in place of NaN in column 'B'
print(result_missing_columns)
5. Advanced Concatenation with Pandas concat() Function
Once you’re comfortable with the basics of pandas concat, you might
find yourself needing more advanced concatenation techniques for complex
data manipulation tasks. Here, we’ll cover some advanced topics, such as
dealing with different column names, different row indexes, and how to
combine concat with other Pandas functions like groupby, merge,
and join.
5.1 Example - Concatenation with Different Column Names
Sometimes the DataFrames you’re working with have different column names
but hold similar types of data. You can still use pandas concat to
concatenate these DataFrames.
Example:
import pandas as pd
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'X': [5, 6], 'Y': [7, 8]})
# Concatenating DataFrames with different column names
result_diff_columns = pd.concat([df1, df2], axis=0)
# Output will contain NaN values due to different column names
print(result_diff_columns)
5.2 Example - Concatenation with Different Row Indexes
If you concatenate DataFrames column-wise (axis=1) and they have
different row indexes, pandas concat will introduce NaN values where
the indexes don’t align.
Example:
df3 = pd.DataFrame({'A': [9, 10]}, index=['x', 'y'])
# Concatenating DataFrames with different row indexes
result_diff_index = pd.concat([df1, df3], axis=1)
# Output will contain NaN values where indexes don't align
print(result_diff_index)
5.3 Example - Combining concat with groupby, merge, and join
The real power of pandas concat emerges when you combine it with other
operations like groupby, merge, and join.
Example with groupby:
# Assume each DataFrame represents data from different years
df1['Year'] = 2020
df2['Year'] = 2021
# Concatenate and then group by 'Year'
result_groupby = pd.concat([df1, df2], keys=['2020', '2021'])
grouped = result_groupby.groupby('Year').sum()
# Output will be the sum of values grouped by 'Year'
print(grouped)
Example with merge:
# Assume you have a third DataFrame with additional info
df_info = pd.DataFrame({'A': [1, 5], 'Info': ['Type1', 'Type2']})
# Concatenate df1 and df2, and then merge with df_info
result_merge = pd.concat([df1, df2], ignore_index=True)
result_merge = pd.merge(result_merge, df_info, on='A', how='left')
# Output will contain additional 'Info' column
print(result_merge)
Example with join:
# Assume df1 and df3 contain related data but have different indexes
result_join = df1.join(df3, how='outer', lsuffix='_df1', rsuffix='_df3')
# Output will be an outer join of df1 and df3
print(result_join)
6. Practical Examples and Use Cases
In this section, we’ll explore some of the most common challenges that
users face while using Pandas concat() Function
1. Concatenating DataFrames with Different Column Names
Problem: Users often ask how to concatenate DataFrames that have different column names.
Example: Let’s say DataFrame df1 has columns ['A', 'B'] and
DataFrame df2 has columns ['X', 'Y'].
Solution: You can concatenate them and get a DataFrame with all four
columns, but the values will be filled with NaN where information is
not available.
import pandas as pd
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'X': [5, 6], 'Y': [7, 8]})
result = pd.concat([df1, df2], axis=0)
# Expected Output:
# A B X Y
# 0 1.0 3.0 NaN NaN
# 1 2.0 4.0 NaN NaN
# 0 NaN NaN 5.0 7.0
# 1 NaN NaN 6.0 8.0
print(result)
2. How to Keep the Original Indices After Concatenation
Problem: Users often want to keep the original index values after
using pandas concat.
Example: DataFrame df1 and df2 both have an index ranging from 0
to 2.
Solution: Using the default settings of pd.concat() will retain
the original indices.
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})
result = pd.concat([df1, df2])
# Expected Output:
# A B
# 0 1 3
# 1 2 4
# 0 5 7
# 1 6 8
print(result)
Output: The resulting DataFrame will have duplicated indices, each
ranging from 0 to 2, unless ignore_index is set to True.
3. Concatenating a List of DataFrames in a Loop
Problem: Users sometimes need to concatenate a list of DataFrames in a loop, which can be inefficient.
Example: You have a list of DataFrames [df1, df2, df3].
Solution: It’s more efficient to concatenate them all in a single line instead of using a loop.
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})
df3 = pd.DataFrame({'A': [9, 10], 'B': [11, 12]})
result = pd.concat([df1, df2, df3])
# Expected Output:
# A B
# 0 1 3
# 1 2 4
# 0 5 7
# 1 6 8
# 0 9 11
# 1 10 12
print(result)
Output: This produces a single DataFrame that contains all the rows
from df1, df2, and df3.
4. Concatenating DataFrames with Different Row Indices
Problem: How to concatenate DataFrames that have different row indices?
Example: DataFrame df1 has a row index of ['a', 'b'] and df2
has a row index of ['x', 'y'].
Solution: If you use pandas concat with axis=0, the row indices
will be retained, leading to a DataFrame with indices
['a', 'b', 'x', 'y'].
df1 = pd.DataFrame({'A': [1, 2]}, index=['a', 'b'])
df2 = pd.DataFrame({'A': [3, 4]}, index=['x', 'y'])
result = pd.concat([df1, df2], axis=0)
# Expected Output:
# A
# a 1
# b 2
# x 3
# y 4
print(result)
Output: The resulting DataFrame will have indices
['a', 'b', 'x', 'y'].
5. Handling NaN Values After Concatenation
Problem: What to do if the concatenated DataFrame has NaN values?
Example: DataFrame df1 has columns ['A', 'B'] and df2 has
columns ['A', 'C'].
Solution: You can use fillna to replace NaN values after
concatenation.
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'C': [7, 8]})
result = pd.concat([df1, df2], axis=0)
result.fillna(0, inplace=True)
# Expected Output:
# A B C
# 0 1.0 3.0 0.0
# 1 2.0 4.0 0.0
# 0 5.0 0.0 7.0
# 1 6.0 0.0 8.0
print(result)
Output: The NaN values in the resulting DataFrame will be replaced
with 0.
7. Best Practices for Pandas concat() function
Understanding when to use pandas concat versus other functions like
merge and join, as well as considerations for performance, can help
you write efficient and maintainable code.
7.1 When to Use pandas.concat vs merge and join
Use pandas concat when you want simple stacking of DataFrames either
vertically or horizontally.
# Vertical stacking with `concat`
result_vertical = pd.concat([df1, df2], axis=0)
Use merge when you want to combine DataFrames based on a common
column.
Example:
# Merging on a common column 'A'
result_merge = pd.merge(df1, df2, on='A', how='inner')
Use join when you want to combine DataFrames based on index.
Example:
# Joining on index
result_join = df1.join(df2, how='outer', lsuffix='_df1', rsuffix='_df2')
7.2 Performance Considerations
Preallocate Resulting DataFrame: When you are concatenating a large
number of DataFrames in a loop, it’s generally more efficient to
preallocate the resulting DataFrame and populate it, rather than using
concat within the loop.Example:
# Inefficient
result = pd.DataFrame()
for df in list_of_dfs:
result = pd.concat([result, df])
# Efficient
result = pd.concat(list_of_dfs, axis=0)
Avoid Using ignore_index=True Unnecessarily: Resetting the index
by using ignore_index=True may lead to additional computational
overhead. Use it only when you’re sure you don’t need the existing
index.
Example:
# If retaining index is not necessary
result_ignore = pd.concat([df1, df2], ignore_index=True)
Be Mindful of Data Types: When using pandas concat, ensure that
the data types of columns are consistent across all DataFrames to avoid
type upcasting, which may consume more memory.
Example:
# Ensure consistent data types
df1['A'] = df1['A'].astype('int32')
df2['A'] = df2['A'].astype('int32')
8. Common Pitfalls and How to Avoid Them
While pandas concat is an incredibly useful function, there are some
pitfalls that users commonly encounter. Understanding these can save you
a lot of time and trouble.
1. Duplicated Indices
When using pandas concat, it’s possible to end up with duplicated
index labels, especially if you are concatenating DataFrames with
overlapping indices and you’ve not specified to ignore or regenerate the
indices.
Example:
# Create DataFrames with overlapping indices
df1 = pd.DataFrame({'A': [1, 2]}, index=['x', 'y'])
df2 = pd.DataFrame({'B': [3, 4]}, index=['x', 'y'])
# Using `concat` without resetting the index
result_duplicated = pd.concat([df1, df2], axis=0)
# Output will have duplicated indices
print(result_duplicated)
How to Avoid:
You can avoid this issue by setting the ignore_index parameter to
True, or by using the verify_integrity parameter to raise an error
if there are overlapping indices.
# Using `ignore_index` to reset index
result_ignore = pd.concat([df1, df2], ignore_index=True, axis=0)
# Using `verify_integrity` to check for overlapping indices
try:
result_verify = pd.concat([df1, df2], verify_integrity=True, axis=0)
except ValueError as e:
print(f"ValueError encountered: {e}")
2. Memory Issues
Concatenating very large DataFrames can lead to memory issues. This is particularly true if you’re working in a constrained memory environment.
How to Avoid:
You can avoid memory issues by using the copy parameter. Setting it to
False tells Pandas not to copy data unnecessarily, saving memory.
# Using `copy=False` to save memory
result_memory = pd.concat([df1, df2], copy=False)
3. Data Type Preservation
When using pandas concat, the data type of columns should ideally be
consistent across all DataFrames to avoid type upcasting, which can
consume more memory and lead to unexpected behaviors.
# DataFrame with integers
df1 = pd.DataFrame({'A': [1, 2]})
# DataFrame with floats
df2 = pd.DataFrame({'A': [1.1, 2.2]})
# Using `concat`
result_dtype = pd.concat([df1, df2])
# Column 'A' has been upcast to float
print(result_dtype.dtypes)
How to Avoid:
Ensure that the data types are consistent before using pandas concat.
# Ensure data types are consistent
df1['A'] = df1['A'].astype('float64')
df2['A'] = df2['A'].astype('float64')
result_dtype_fixed = pd.concat([df1, df2])
# Now the data types are consistent
print(result_dtype_fixed.dtypes)
9. Frequently Asked Questions
Can I use pandas concat to merge two DataFrames vertically and
horizontally at the same time?
No, pandas concat only allows concatenation either row-wise (axis=0)
or column-wise (axis=1) at a time. You cannot concatenate both
vertically and horizontally in a single operation.
How do I handle duplicate index values when using pandas concat?
You have several options to deal with duplicate index values:
- Use the
ignore_index=Trueparameter to reset the index. - Use the
keysparameter to create a multi-level index that distinguishes between the source DataFrames. - Use the
verify_integrity=Trueparameter to raise an error if the operation would result in duplicate indices.
What happens when there are missing columns or rows?
If the DataFrames have different columns or indexes and you’re using the
default outer join, pandas concat will include all columns or rows
but will fill the missing entries with NaN (Not a Number).
How can I replace NaN values that appear after concatenation?
You can use the fillna method on the concatenated DataFrame to replace
NaN values with a value of your choice. For example, if your DataFrame
is stored in a variable called df, you could use df.fillna(0) to
replace all NaN values with zeros.
Is pandas concat the same as merge and join?
No, each function has its specific use-cases:
- Use
pandas concatfor simple stacking of DataFrames either vertically or horizontally. - Use
mergeto combine DataFrames based on a common column. - Use
jointo combine DataFrames based on index.
How can I improve the performance of pandas concat?
If you’re concatenating multiple DataFrames, consider doing it in a
single concat call rather than using multiple calls in a loop. Also,
using the copy=False parameter can save memory.
Can I concatenate DataFrames with different data types?
Yes, you can, but be aware that this might lead to type upcasting, which could consume more memory and lead to unexpected behavior.
What are the most common pitfalls to avoid?
The most common pitfalls include duplicated indices, memory issues due
to large DataFrames, and data type inconsistencies. Understanding these
and how to avoid them can make your usage of pandas concat more
effective.
10. Summary
pandas concatis a powerful function for concatenating DataFrames either row-wise (axis=0) or column-wise (axis=1).- Important parameters like
axis,join,ignore_index, andkeysoffer control over how DataFrames are concatenated. - Handling of missing data, indexes, and duplicate indices are key areas where caution is required.
- Real-world examples illustrated the utility of
pandas concatin scenarios such as merging monthly sales data, combining datasets with different columns, and using multi-level indexing in financial data.
When to Use pandas concat
- Use
pandas concatwhen you need to simply stack DataFrames either vertically or horizontally. - It’s especially useful when the DataFrames to be combined don’t share a common key, or when you need to maintain the existing indices.
Other Alternatives
merge: When you want to combine DataFrames based on a common column.join: For combining DataFrames based on their indexes.
References: The official Pandas documentation
offers in-depth explanations, examples, and best practices related to
pandas concat.

![Pandas concat() Function Tutorial [Practical Examples]](/pandas-concat-function/pandas_concat_func.jpg)
