Overview of the Pandas iloc Function
In the realm of data analysis and data manipulation, the pandas
library in Python stands out as one of the most powerful tools
available. One feature that makes pandas incredibly flexible and
user-friendly is its diverse range of indexing options. Among these, the
pandas iloc function is particularly noteworthy.
The term iloc stands for “integer-location,” and as the name suggests,
it is used for integer-based indexing. With pandas iloc, you can
effortlessly select rows and
columns from your DataFrame by specifying their integer-based
positions. Whether you are slicing the DataFrame, selecting particular
cells, or even performing conditional selections, iloc provides an
intuitive yet efficient way to carry out these operations.
What sets pandas iloc apart is its straightforwardness and ease of
use. You don’t need to worry about the row or column labels; all you
need is the integer-based position, and iloc will take care of the
rest. This makes it an excellent option for scenarios where you don’t
have the luxury of labeled data or simply prefer to index using integer
values.
To sum up, pandas iloc is a versatile, efficient, and user-friendly
way to handle row and column selection based solely on integer
locations, making it an indispensable tool for anyone working with data
in Python.
Syntax and Parameters
Understanding the syntax is the first step in mastering any function,
and pandas iloc is no exception. The general syntax for using iloc
can be illustrated as follows:
DataFrame.iloc[<row_selection>, <column_selection>]
Here, <row_selection> and <column_selection> can be:
- A single integer (e.g., 5)
- A list of integers (e.g., [4, 5, 6])
- A slice object with integers (e.g., 1:7)
Note that iloc operates solely on the basis of integer-based
positions, so the indexes and column names in the DataFrame are not
considered during selection.
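To make the position-versus-label distinction concrete, here is a minimal sketch. The DataFrame and its index labels (10, 20, 30) are made up for illustration; the point is only that iloc counts positions from 0 regardless of what the index labels are.

```python
import pandas as pd

# Hypothetical frame whose index labels do not match the row positions
df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]},
    index=[10, 20, 30],
)

by_position = df.iloc[0]   # first row, whatever its label is
by_label = df.loc[10]      # row whose index label is 10

print(by_position["Name"], by_label["Name"])  # both refer to Alice's row

# Positions 0 and 2 of the second column (Age), ignoring labels entirely
ages = df.iloc[[0, 2], 1].tolist()
print(ages)
```

Note that df.iloc[10] would fail on this frame, because there is no row at position 10 even though a row carries the label 10.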
Parameters Explained
Technically, pandas iloc is more of a property than a method, so you
won’t see traditional parameters as you might with other functions.
However, the arguments you pass when slicing can be thought of as
informal parameters. Let’s discuss them:
Row Selection (<row_selection>): The integer-based position(s) of
the row(s) you wish to select. This can be a single integer, a list of
integers, or an integer-based slice object.
- Single Integer: df.iloc[0] selects the first row.
- List of Integers: df.iloc[[0, 1, 2]] selects the first three rows.
- Slice Object: df.iloc[0:3] selects rows from index 0 to 2.
Column Selection (<column_selection>): The integer-based
position(s) of the column(s) you wish to select. Similar to row
selection, you can use a single integer, a list of integers, or an
integer-based slice object.
- Single Integer: df.iloc[:, 0] selects the first column.
- List of Integers: df.iloc[:, [0, 1]] selects the first and second columns.
- Slice Object: df.iloc[:, 0:2] selects columns from index 0 to 1.
Simple Examples
The pandas iloc function’s versatility can be better understood
through examples. Below are some straightforward yet powerful examples
to demonstrate how to make various types of selections from a DataFrame
using pandas iloc.
1. Single Row Selection
Selecting a single row is as simple as passing a single integer to
iloc.
# Import pandas library
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({
'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 35, 40],
'Occupation': ['Engineer', 'Doctor', 'Teacher', 'Lawyer']
})
# Select the first row
first_row = df.iloc[0]
In this example, first_row will contain the data
[Alice, 25, Engineer] from the DataFrame.
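It can help to know what type comes back. The sketch below, using the same sample data, shows that a single integer returns a Series indexed by the column names, while a one-element list keeps the result as a DataFrame. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 35, 40],
    "Occupation": ["Engineer", "Doctor", "Teacher", "Lawyer"],
})

as_series = df.iloc[0]    # Series: one value per column
as_frame = df.iloc[[0]]   # DataFrame with a single row

print(type(as_series).__name__, as_series["Name"])
print(type(as_frame).__name__, as_frame.shape)
```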
2. Single Column Selection
To select a single column, you’ll need to specify the integer index of
that column, making sure to include a colon : to indicate that you
want all rows for that column.
# Select the first column
first_column = df.iloc[:, 0]
first_column will contain all names from the DataFrame.
3. Multiple Row and Column Selection
To select multiple rows and columns, you can use lists of integers or slice objects.
# Select first two rows and first two columns
subset = df.iloc[0:2, 0:2]
subset will contain the names and ages of Alice and Bob.
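One detail worth checking when slicing: an iloc slice excludes its end position, like ordinary Python slicing, whereas a loc slice includes its end label. A small sketch with the same sample data (variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 35, 40],
    "Occupation": ["Engineer", "Doctor", "Teacher", "Lawyer"],
})

by_position = df.iloc[0:2, 0:2]        # positions 0 and 1 on each axis
by_label = df.loc[0:2, "Name":"Age"]   # labels 0, 1 AND 2; Name through Age

print(by_position.shape)  # (2, 2)
print(by_label.shape)     # (3, 2)
```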
4. Other Examples
Select Last Row: To get the last row, you can use negative indexing.
last_row = df.iloc[-1]
Select Specific Rows and Columns: You can select non-consecutive rows and columns by passing lists of integers.
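# Rows 0 and 2, columns 0 and 2 (the sample DataFrame has only three columns, so position 3 would raise an IndexError)
specific_selection = df.iloc[[0, 2], [0, 2]]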
Conditional Row Selection: pandas iloc does not accept a boolean
Series as a mask, but it does accept a boolean array. Converting the
condition to a NumPy array with .values lets you filter rows with iloc.
condition = df['Age'] > 30
filtered_rows = df.iloc[condition.values]
Advanced Use-Cases
For more advanced data manipulation tasks, pandas iloc can be used in
conjunction with other pandas features to perform complex operations. In
this section, we will explore some of the advanced use-cases where
pandas iloc really shines.
1. Conditional Selection
While iloc is not primarily designed for condition-based selection and
does not accept a boolean Series, it does accept a boolean array, so you
can convert the condition with .values and pass that. Here’s how:
import pandas as pd
# Create DataFrame
df = pd.DataFrame({
'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 35, 40],
'Occupation': ['Engineer', 'Doctor', 'Teacher', 'Lawyer']
})
# Create a condition where Age is greater than 30
condition = df['Age'] > 30
# Use iloc for conditional selection
filtered_rows = df.iloc[condition.values]
print(filtered_rows)
In this example, filtered_rows will contain the data for Charlie and
David, who are older than 30.
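As a hedged sketch of the alternatives: the mask can be passed as a boolean array, converted to integer positions with NumPy's flatnonzero, or handed directly to loc as a Series. Passing the boolean Series itself to iloc is rejected in the pandas versions I am aware of, though the exact exception type depends on the index. Variable names here are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 35, 40],
    "Occupation": ["Engineer", "Doctor", "Teacher", "Lawyer"],
})

mask = (df["Age"] > 30).to_numpy()   # boolean NumPy array, same as .values

via_mask = df.iloc[mask]             # boolean array accepted by iloc
positions = np.flatnonzero(mask)     # integer positions where mask is True
via_positions = df.iloc[positions]   # plain positional selection
via_loc = df.loc[df["Age"] > 30]     # loc takes the boolean Series directly

print(positions.tolist())
print(via_mask["Name"].tolist())
```

All three selections return the same rows; loc is usually the more direct choice when the condition is already a Series.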
2. Step-wise Slicing
When dealing with large DataFrames, you may want to skip some rows or columns. This is where step-wise slicing can be handy.
# Select every other row from positions 0 to 4 (clipped to the 4 available rows) and the first two columns
stepwise_slice = df.iloc[0:5:2, 0:2]
print(stepwise_slice)
Here, stepwise_slice will contain the data for Alice and Charlie,
skipping Bob and David.
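A few more slice patterns follow the same start:stop:step rules as Python lists. This is a small sketch on the same sample data, with illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 35, 40],
    "Occupation": ["Engineer", "Doctor", "Teacher", "Lawyer"],
})

every_other = df.iloc[::2]        # positions 0 and 2
reversed_rows = df.iloc[::-1]     # all rows, last to first
last_two_cols = df.iloc[:, -2:]   # the final two columns

print(every_other["Name"].tolist())
print(reversed_rows["Name"].tolist())
print(list(last_two_cols.columns))
```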
3. Using iloc with groupby
The pandas iloc property can be used effectively with the groupby
method to analyze grouped data.
# Group by Occupation and then select the first entry for each group using iloc
grouped = df.groupby('Occupation')
# Select the first entry for each group
first_entry_each_group = grouped.apply(lambda x: x.iloc[0])
print(first_entry_each_group)
In this example, first_entry_each_group will contain the first entry
for each occupational group in the DataFrame.
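In the sample data every occupation appears once, so each group has a single row. The sketch below uses a hypothetical variant with repeated occupations to make the grouping visible. It also shows groupby(...).head(1) as an alternative for taking the first row per group, and applies iloc to a single selected column, which avoids the warning that recent pandas versions may emit when apply operates on the grouping column.

```python
import pandas as pd

# Hypothetical data: two people per occupation
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 35, 40],
    "Occupation": ["Engineer", "Doctor", "Engineer", "Doctor"],
})

# First row of each group, keeping the original row order and index
first_rows = df.groupby("Occupation").head(1)

# Oldest person per occupation: sort, then take the last position in each group
oldest = (
    df.sort_values("Age")
      .groupby("Occupation")["Name"]
      .apply(lambda s: s.iloc[-1])
)

print(first_rows["Name"].tolist())
print(oldest.to_dict())
```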
Differences between iloc, loc, and at
Understanding the nuanced differences between iloc, loc, and at
can help you choose the most appropriate indexing method for your
specific needs. Below, we break down these differences in terms of
speed, flexibility, and limitations.
Table Comparing iloc, loc, and at
| Feature | pandas iloc | pandas loc | pandas at |
|---|---|---|---|
| Indexing Type | Integer-based | Label-based | Label-based |
| Speed | Fast | Moderate | Fastest (for single cell) |
| Single Cell Access | Yes | Yes | Yes |
| Row/Column Slicing | Yes | Yes | No |
| Conditional Access | Boolean array only (not a boolean Series) | Yes (directly) | No |
| Multi-axis Indexing | Yes | Yes | No |
| Read/Write Access | Both | Both | Both |
| Complex Queries | No | Yes | No |
Speed Comparison
- pandas iloc: Generally faster for integer-based indexing.
- pandas loc: Not as fast as iloc but offers more functionality like label-based indexing.
- pandas at: Extremely fast for accessing a single cell, but limited to that use-case.
Flexibility and Limitations
- pandas iloc: Very flexible for integer-based row/column slicing but does not directly support conditional access or label-based indexing.
- pandas loc: Offers a broad range of functionalities like label-based indexing and conditional access but can be slower than iloc.
- pandas at: Provides the fastest access for single cell values but is not suited for slicing or conditional access.
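The three accessors can be compared side by side on a frame with string labels. This is a minimal sketch with a made-up index. iat, the positional counterpart of at, is not part of the comparison above and is included only for completeness.

```python
import pandas as pd

# Hypothetical frame with string index labels
df = pd.DataFrame(
    {"Age": [25, 30, 35], "Occupation": ["Engineer", "Doctor", "Teacher"]},
    index=["alice", "bob", "charlie"],
)

by_iloc = df.iloc[1, 0]       # position: second row, first column
by_loc = df.loc["bob", "Age"] # label: row 'bob', column 'Age'
by_at = df.at["bob", "Age"]   # label-based scalar access
by_iat = df.iat[1, 0]         # position-based scalar access

print(by_iloc, by_loc, by_at, by_iat)

# Slicing works with iloc and loc, but not with at
print(df.iloc[0:2, 0].tolist())
print(df.loc["alice":"bob", "Age"].tolist())
```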
Performance Comparison of Pandas iloc
When working with large data sets, the speed of data manipulation and
retrieval operations can be a critical factor. In this context,
understanding the performance characteristics of pandas iloc can offer
valuable insights. Below, we compare the performance of iloc with other
pandas indexing
methods, particularly loc and at.
Let’s create a sample DataFrame with 100,000 rows and 5 columns to test
the performance. We’ll time how long it takes to access a single cell
using iloc, loc, and at.
import pandas as pd
import numpy as np
import time
# Create a DataFrame with random sample data
n_rows = 100000
n_cols = 5
data = np.random.rand(n_rows, n_cols)
columns = [f'Column_{i}' for i in range(1, n_cols+1)]
df = pd.DataFrame(data, columns=columns)
# Using iloc
start_time = time.time() # Record start time in seconds
cell_value = df.iloc[50000, 2] # Perform operation
iloc_time = time.time() - start_time # Calculate elapsed time in seconds
# Using loc
start_time = time.time() # Record start time in seconds
cell_value = df.loc[50000, 'Column_3'] # Perform operation
loc_time = time.time() - start_time # Calculate elapsed time in seconds
# Using at
start_time = time.time() # Record start time in seconds
cell_value = df.at[50000, 'Column_3'] # Perform operation
at_time = time.time() - start_time # Calculate elapsed time in seconds
# Display the time taken for each operation in seconds
print("iloc time: {:.6f}".format(iloc_time))
print("loc time: {:.6f}".format(loc_time))
print("at time: {:.6f}".format(at_time))
Output
iloc time: 0.000142
loc time: 0.000761
at time: 0.000023
Observations:
- Speed of at: at is the fastest method for single-cell access in this run, taking about 0.000023 seconds. This is consistent with its design as a scalar accessor.
- Speed of iloc vs loc: iloc (0.000142 seconds) is faster than loc (0.000761 seconds) in this run, which fits the fact that iloc does not have to resolve labels.
- General Performance: The ranking in this run is at first, followed by iloc, and then loc. Each figure comes from a single call timed with time.time(), so the absolute numbers are noisy and will vary between runs and machines.
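A single call timed with time.time() is sensitive to clock resolution and one-off overhead, so repeating each lookup tends to give steadier figures. The following is a minimal sketch using the standard-library timeit module on a DataFrame of the same shape. The repeat count n is an arbitrary choice, and no output is shown because timings differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Same shape as the sample benchmark: 100,000 rows, 5 columns
rng = np.random.default_rng(0)
columns = [f"Column_{i}" for i in range(1, 6)]
df = pd.DataFrame(rng.random((100_000, 5)), columns=columns)

n = 1_000  # number of repetitions per accessor (arbitrary)

# Average seconds per call for each accessor
timings = {
    "iloc": timeit.timeit(lambda: df.iloc[50000, 2], number=n) / n,
    "loc": timeit.timeit(lambda: df.loc[50000, "Column_3"], number=n) / n,
    "at": timeit.timeit(lambda: df.at[50000, "Column_3"], number=n) / n,
}

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e6:.2f} microseconds per call")
```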
Row Selection
Now, let’s compare the time taken to select a row using iloc and
loc.
# Using iloc
start_time = time.time()
row_data = df.iloc[50000]
iloc_row_time = time.time() - start_time
# Using loc
start_time = time.time()
row_data = df.loc[50000]
loc_row_time = time.time() - start_time
print(f'iloc row time: {iloc_row_time}')
print(f'loc row time: {loc_row_time}')
Output:
iloc row time: 0.0002033710479736328
loc row time: 0.0001373291015625
Column Selection
Here, we’ll time the selection of a column.
# Using iloc
start_time = time.time()
column_data = df.iloc[:, 2]
iloc_col_time = time.time() - start_time
# Using loc
start_time = time.time()
column_data = df.loc[:, 'Column_3']
loc_col_time = time.time() - start_time
print(f'iloc column time: {iloc_col_time}')
print(f'loc column time: {loc_col_time}')
Output:
iloc column time: 0.00023794174194335938
loc column time: 0.00024199485778808594
Recommendations:
- Single-Cell Access: at was the fastest option for single-cell access in the measurements above and is a good default when scalar lookups dominate.
- Integer-Based Slicing: prefer iloc when you are working with integer positions. In the row and column measurements above, the iloc and loc timings are close (loc was slightly faster for the row), so choose based on whether you have positions or labels rather than on speed alone.
- Label-Based or Conditional Selection: loc remains invaluable for more complex, label-based data manipulations.
Performance Summary
Based on the above examples, you can generally conclude:
- iloc is usually faster for integer-based row and column selection.
- loc is flexible but can be slower for large DataFrames.
- at is extremely fast for accessing single cells but doesn’t support slicing.
Top 10 Frequently Asked Questions on Pandas iloc
Is iloc zero-based?
Yes, pandas iloc uses zero-based indexing. This means the index starts
from 0. The first row can be accessed with df.iloc[0], the second with
df.iloc[1], and so on.
Can iloc accept boolean values?
pandas iloc accepts a boolean array (a NumPy array or a list) whose
length matches the axis, but it does not accept a boolean Series. A
condition like df['Age'] > 30 produces a Series, so convert it first,
for example with .values, before passing it to iloc.
How to select multiple rows and columns with iloc?
You can select multiple rows and columns by providing lists or slices of
integers. For example, df.iloc[0:2, [0, 2]] would select the first two
rows and the first and third columns.
Can I use negative integers with iloc?
Yes, negative integers can be used to index rows or columns in reverse
order. For instance, df.iloc[-1] will return the last row of the
DataFrame.
Can iloc modify DataFrame values?
Absolutely, iloc can be used for assignment operations to modify the
DataFrame. For example, df.iloc[0, 0] = 'New Value' would modify the
first cell of the DataFrame.
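A short sketch of positional assignment, covering a single cell, a slice of a column, and an in-place update. The data and the new values are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 35, 40],
})

df.iloc[0, 0] = "Alicia"      # single cell: first row, first column
df.iloc[1:3, 1] = [31, 36]    # rows 1 and 2 of the Age column
df.iloc[-1, 1] += 1           # read-modify-write on the last row

print(df)
```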
Is iloc faster than loc?
Generally, iloc is faster for integer-based indexing compared to loc
because it doesn’t have to resolve labels. However, the speed difference
may not be noticeable for smaller DataFrames.
Is it possible to use iloc with groupby?
Yes, iloc can be used with groupby to select particular rows from
each group. For example, using groupby and then applying
lambda x: x.iloc[0] would return the first entry for each group.
Can iloc handle NaN or missing values?
iloc itself does not deal with NaN or missing values; it only performs
integer-based selection. You’ll have to handle missing values separately
using functions like dropna or fillna.
What happens if the index passed to iloc is out of bounds?
If an out-of-bounds index is passed to iloc, it raises an
IndexError. However, if a slice with an out-of-bounds index is used,
iloc will return values up to the maximum available index without
raising an error.
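The difference between a scalar position and a slice can be seen in a small sketch. The four-row DataFrame and the position 10 are chosen only to trigger the out-of-bounds case.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 35, 40],
})

# A single out-of-bounds position raises IndexError
caught = False
try:
    df.iloc[10]
except IndexError as exc:
    caught = True
    print(f"IndexError: {exc}")

# Slices are clipped to the available rows instead of raising
clipped = df.iloc[2:10]   # positions 2 and 3 only
empty = df.iloc[10:20]    # no rows, but still a valid DataFrame

print(len(clipped), empty.empty)
```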
Can iloc be used on Series as well as DataFrames?
Yes, iloc works on both pandas Series and DataFrames. The usage is
largely similar, involving integer-based indexing to select or modify
data.
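On a Series there is only one axis, so iloc takes a single selector. A minimal sketch with a made-up labeled Series:

```python
import pandas as pd

# Hypothetical Series with string labels
ages = pd.Series([25, 30, 35, 40], index=["alice", "bob", "charlie", "david"])

first = ages.iloc[0]    # scalar at position 0
tail = ages.iloc[-2:]   # last two entries, still a Series
ages.iloc[1] = 31       # assignment by position

print(first, tail.tolist(), ages["bob"])
```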
Conclusion
The pandas iloc indexer is a powerful tool for selecting and
manipulating data within pandas DataFrames and Series. Its utility
ranges from simple row and column selections to more complex operations
when combined with other pandas features like groupby. Although it
primarily focuses on
integer-based indexing, it can be adapted to work with boolean
conditions, thereby offering a flexible approach to data manipulation
tasks. Whether you are a beginner in data analysis or an experienced
professional, understanding iloc is crucial for efficient data
handling.
- pandas iloc uses zero-based integer indexing for both row and column selection.
- It supports various forms of slicing, including step-wise slicing and selection of specific rows and columns.
- iloc is generally faster than loc for integer-based indexing but lacks some of the flexibility that loc offers for label-based and conditional selection.
- Advanced use-cases include combining iloc with groupby for group-specific selections and using boolean masks for conditional selection.


