Getting started with Python subprocess module
The subprocess module in Python is a powerful tool that allows you to
spawn new processes, connect to their input/output/error pipes, and
obtain their return codes. In simple terms, it enables your Python
script to run shell commands, just as you would if you were operating
from a terminal. Whether you want to run a simple command like ls on a
UNIX system or execute more complex chained commands using pipes,
subprocess has you covered.
Before diving into the nitty-gritty of how to use the subprocess
module, it’s important to understand the historical context and the
basics of setting up your environment.
What subprocess Replaces (e.g., os.system, os.spawn*)
Prior to the introduction of the subprocess module, Python developers
had a few other options for running shell commands, including functions
like os.system() and os.spawn*(). Here’s a quick comparison:
os.system(): This function allows you to run shell commands, but
it’s less powerful than subprocess. It doesn’t allow you to capture
the standard output (stdout) or standard error (stderr) easily, nor
does it provide good error handling options.
import os
os.system('ls -l')
os.spawn*(): This family of functions provides more control over
the process, but it’s also more complex to use and less Pythonic in its
approach.
import os
os.spawnlp(os.P_WAIT, 'ls', 'ls', '-l')
The subprocess module aims to replace these older functions with a
more powerful, flexible, and Pythonic interface. By using subprocess,
you can perform everything from running a simple shell command to
launching a process and interacting with its input/output streams, all
while writing more maintainable and readable code.
Basic Requirements and Setup
To use the subprocess module, you’ll need to import it in your Python
script. It’s a built-in module, so you don’t need to install any
external packages.
import subprocess
Once imported, you can begin using its methods to interact with the
system. Here’s a quick example of running a simple shell command
(ls -l):
import subprocess
subprocess.run(['ls', '-l'])
Different subprocess Methods and Their Options
The Python subprocess module provides several methods to work with
external processes. Each method has a specific use-case and offers
certain features. Let’s explore the most commonly used methods along
with their supported options and examples.
1. subprocess.run (Python 3.5+)
What It Does: This is the recommended method for invoking
subprocesses in Python 3.5 and above. It runs a command, waits for it to
finish, and then returns a CompletedProcess instance that contains
information about the executed process.
Supported Options:
- args: The command to execute, as a list or a string.
- capture_output: If set to True, captures standard output and standard error.
- cwd: Specifies the working directory.
- timeout: Sets a timeout for the command.
import subprocess
result = subprocess.run(["ls", "-l"], capture_output=True, text=True)
print("STDOUT:", result.stdout)
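The options listed above can be combined in a single call. The following is a minimal sketch, assuming a Python interpreter is reachable through sys.executable; it is used in place of ls only so the example does not depend on the platform, and the child one-liner is purely illustrative.

```python
import subprocess
import sys
import tempfile

# Run a tiny child Python program inside a temporary directory.
# sys.executable stands in for a real command so the sketch is portable.
with tempfile.TemporaryDirectory() as workdir:
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.getcwd())"],
        capture_output=True,  # collect stdout and stderr
        text=True,            # decode bytes to str
        cwd=workdir,          # working directory of the child
        timeout=30,           # raise TimeoutExpired after 30 seconds
    )

print("Return code:", result.returncode)
print("Child ran in:", result.stdout.strip())
```

The child reports the temporary directory as its working directory, which shows that cwd affects only the child process and not the calling script.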
2. subprocess.call
What It Does: Runs a command, waits for it to finish, and then returns the return code. It’s a simple way to run a command and check its return code but doesn’t capture output.
Supported Options: Similar to subprocess.run, except for run()-only options such as capture_output, input, and check.
import subprocess
return_code = subprocess.call(["ls", "-l"])
print("Return Code:", return_code)
3. subprocess.check_call
What It Does: Similar to subprocess.call, but raises a
CalledProcessError exception if the command returns a non-zero exit
code.
Supported Options: Similar to subprocess.run, except for run()-only options such as capture_output, input, and check.
import subprocess
try:
    subprocess.check_call(["false"])
except subprocess.CalledProcessError as e:
    print(f"Command failed with error {e.returncode}")
4. subprocess.check_output
What It Does: Runs a command, waits for it to finish, captures its
output, and then returns that output as a byte string. It raises a
CalledProcessError if the command returns a non-zero exit code.
Supported Options:
- stderr: Redirect standard error (usually set to subprocess.STDOUT to capture errors).
- text: If set to True, the output is returned as a string instead of bytes.
import subprocess
try:
    output = subprocess.check_output(["ls", "-l"], text=True)
    print("STDOUT:", output)
except subprocess.CalledProcessError as e:
    print(f"Command failed with error {e.returncode}")
Other Options
- stdout, stderr: To redirect output, either to capture or pipe it to other commands.
- shell: If set to True, the command is executed through the shell.
- env: A dictionary representing the environment variables to set for the new process.
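As a rough illustration of the stdout, stderr, and env options working together, the sketch below merges the error stream into standard output and passes a custom environment variable to the child. The variable name GREETING and the child script are hypothetical, chosen only for this example.

```python
import os
import subprocess
import sys

# Child program: writes one line to stdout and one to stderr,
# including the value of an environment variable set by the parent.
child_code = (
    "import os, sys\n"
    "print('out:', os.environ['GREETING'])\n"
    "sys.stdout.flush()\n"
    "print('err: something went wrong', file=sys.stderr)\n"
)

# Start from the current environment so PATH and similar variables survive.
child_env = os.environ.copy()
child_env["GREETING"] = "hello"  # hypothetical variable, for illustration

result = subprocess.run(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,    # capture stdout
    stderr=subprocess.STDOUT,  # merge stderr into the same stream
    env=child_env,
    text=True,
)

print(result.stdout)
```

Because stderr is redirected into stdout, result.stderr is None and both lines arrive in result.stdout.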
For Beginners: Basic Operations
If you’re new to the Python subprocess module, you’re in the right
place. In this section, we’ll cover the basic operations you can perform
with this incredibly versatile tool.
Running a Shell Command with subprocess.run()
The subprocess.run() method is the simplest way to run a command. It
runs the command, waits for it to finish, and then returns a
CompletedProcess instance that contains information about the process,
such as the exit code and any output.
Here’s an example:
import subprocess
subprocess.run(["ls", "-l"])
In this example, we’re running the ls -l command, which lists files in
a directory in a detailed manner.
Arguments and Options
The command and its options or arguments are passed as a list of
strings. For example, if you’re running a command that looks like this
in the shell—find . -name '*.txt'—you would convert it to the
following list when using subprocess.run():
subprocess.run(["find", ".", "-name", "*.txt"])
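To see why the list form matters, the sketch below passes a few arguments to a small child program that prints exactly what it received. The child one-liner is an illustrative stand-in for a real command such as find.

```python
import subprocess
import sys

# The child simply reports the arguments it received.
show_args = "import sys; print(sys.argv[1:])"

result = subprocess.run(
    [sys.executable, "-c", show_args, "-name", "*.txt", "two words"],
    capture_output=True,
    text=True,
)

# With no shell involved, "*.txt" is not expanded and "two words"
# stays a single argument.
print(result.stdout.strip())  # ['-name', '*.txt', 'two words']
```

Each list element reaches the program as one argument, so no quoting or escaping is needed for spaces or wildcard characters.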
Return Code
The returncode attribute of the returned CompletedProcess object
gives you the exit code of the command. A 0 usually means that the
command executed successfully, and any other value indicates an error.
result = subprocess.run(["ls", "-l"])
print("Return code:", result.returncode)
Capturing Output with stdout
By default, subprocess.run() lets the command write directly to the
console. To capture the output as a Python string, pass
capture_output=True (shorthand for stdout=subprocess.PIPE and
stderr=subprocess.PIPE) together with text=True:
result = subprocess.run(["ls", "-l"], capture_output=True, text=True)
print("Have {} characters in stdout:\n{}".format(len(result.stdout), result.stdout))
Here, capture_output=True captures the output, and text=True makes
it a string rather than bytes.
Error Handling with stderr
Similarly, capture_output=True also captures standard error, which is
available through the stderr attribute of the result:
result = subprocess.run(["ls", "-l", "/nonexistent"], capture_output=True, text=True)
print("stderr:\n{}".format(result.stderr))
If the directory /nonexistent does not exist, the stderr attribute
of the CompletedProcess object will contain the error message.
Intermediate Topics
Once you’re comfortable with the basics of the subprocess module, you
can begin to explore some of its more advanced features. These include
working with the Popen class, redirecting input/output, setting
timeouts, and more.
The Popen Class
The Popen class is the backbone of the subprocess module and offers
more flexibility compared to the run() method. It allows you to spawn
a new process and interact with its input/output streams in a
non-blocking manner.
Here’s how you can initiate a Popen object:
from subprocess import Popen
process = Popen(["ls", "-l"])
Communicating with the Process
You can send data to stdin or read from stdout and stderr, using
the communicate() method.
from subprocess import Popen, PIPE
process = Popen(["sort"], stdin=PIPE, stdout=PIPE, stderr=PIPE, text=True)
stdout, stderr = process.communicate(input="banana\napple\ncherry")
print(stdout)
This sorts the input strings and prints the sorted output.
Redirecting Input and Output
You can redirect the stdin, stdout, and stderr using file objects.
with open("input.txt", "w") as f:
    f.write("banana\napple\ncherry")
with open("input.txt", "r") as infile, open("output.txt", "w") as outfile:
    process = Popen(["sort"], stdin=infile, stdout=outfile)
    process.wait()  # Wait for sort to finish writing output.txt
Timeouts and How to Implement Them
Timeouts can be added to make sure a subprocess operation doesn’t hang
indefinitely. Use the timeout parameter with communicate() or
wait().
from subprocess import Popen, PIPE, TimeoutExpired
try:
    process = Popen(["sleep", "10"], stdout=PIPE, stderr=PIPE)
    process.communicate(timeout=5)
except TimeoutExpired:
    process.kill()
    process.communicate()  # Reap the killed process and close its pipes
    print("Process timed out and was killed.")
Working with Pipes
Pipes can be used to chain multiple subprocesses together, just like in a Unix shell.
from subprocess import Popen, PIPE
p1 = Popen(["ls", "-l"], stdout=PIPE)
p2 = Popen(["grep", "txt"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close() # Allow p1 to receive a SIGPIPE if p2 exits.
output = p2.communicate()[0]
print(output.decode('utf-8'))
Setting Environment Variables
The env parameter allows you to set
environment variables for the subprocess.
import os
my_env = os.environ.copy()
my_env["MY_VARIABLE"] = "value"
process = Popen(["printenv", "MY_VARIABLE"], env=my_env, stdout=PIPE, text=True)
stdout, _ = process.communicate()
print(stdout.strip())
Advanced Usage
Once you’ve mastered the intermediate functionalities of the
subprocess module, you’re ready to tackle its more advanced features.
These include running commands in parallel, working with long-running
processes, considering security implications, and handling text
encoding.
Running Commands in Parallel
Python’s threading or multiprocessing libraries can be used alongside
subprocess to run multiple commands in parallel.
from threading import Thread
from subprocess import run
def execute_command(cmd):
    run(cmd)
commands = [["ls", "-l"], ["df", "-h"], ["uptime"]]
threads = []
for cmd in commands:
    thread = Thread(target=execute_command, args=(cmd,))
    thread.start()
    threads.append(thread)
# Wait for all threads to finish
for thread in threads:
    thread.join()
print("All commands executed.")
Interacting with Long-Running Processes
For long-running processes, you may need more intricate interaction,
which you can achieve by using the poll() or wait() methods.
from subprocess import Popen, TimeoutExpired
process = Popen(["some_long_running_command"])
try:
    process.wait(timeout=60)
except TimeoutExpired:
    print("Process is still running.")
    process.terminate()
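The poll() method mentioned above can be sketched as follows. This is a minimal example in which a short-lived child Python process stands in for the long-running command; the sleep durations are arbitrary.

```python
import subprocess
import sys
import time

# Stand-in for a long-running command: a child that sleeps briefly.
process = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(1)"])

polls = 0
# poll() returns None while the child is still running,
# and its exit code once it has finished.
while process.poll() is None:
    polls += 1
    time.sleep(0.2)  # do other useful work here instead of sleeping

print(f"Finished with exit code {process.returncode} after {polls} polls")
```

Unlike wait(), poll() never blocks, so the calling script stays free to update a progress indicator or check on other processes between calls.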
Security Considerations (e.g., shell=True risks)
While using shell=True can be convenient, it poses a security risk,
especially when the command string is built dynamically from external
input. This opens the door to shell injection vulnerabilities.
# Potentially dangerous
run("ls -l " + user_input, shell=True)
Always sanitize user input or avoid using shell=True with dynamic
input.
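When a shell is genuinely required, one way to sanitize a value is shlex.quote() from the standard library. The sketch below assumes a POSIX shell (shlex.quote() is designed for Unix shells only), and the user_input value is a hypothetical example of hostile input.

```python
import shlex
import subprocess

# Hypothetical untrusted value containing shell metacharacters.
user_input = "notes.txt; echo INJECTED"

# shlex.quote() wraps the value so a POSIX shell sees one literal word.
command = "echo " + shlex.quote(user_input)
result = subprocess.run(command, shell=True, capture_output=True, text=True)

print(command)                # echo 'notes.txt; echo INJECTED'
print(result.stdout.strip())  # notes.txt; echo INJECTED
```

The semicolon is printed as ordinary text rather than starting a second command. Passing the arguments as a list with shell=False remains the simpler and safer choice whenever it is possible.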
Universal Newlines and Text Encoding
The text parameter (added in Python 3.7 as a more readable alias for
universal_newlines) can be set to True if you wish to work with text
instead of binary data for stdin, stdout, and stderr.
result = run(["ls", "-l"], capture_output=True, text=True, encoding='utf-8')
Here, text=True tells Python to open the process's pipes in text mode,
and encoding='utf-8' specifies the text encoding to be used.
Platform-Specific Concerns and Handling
While Python is a cross-platform language, it’s important to be aware of
the platform-specific nuances that can affect how the subprocess
module behaves. The key areas to consider are the differences between
Unix-based systems and Windows, as well as some cross-platform best
practices.
Differences Between Unix and Windows
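Command Interpreter: With shell=True, Unix-based systems run the
command through /bin/sh, whereas Windows uses cmd.exe (the interpreter
named by the COMSPEC environment variable). This difference can affect
how commands are parsed and executed. Note that dir is built into
cmd.exe rather than being a standalone program, so on Windows it has to
be run through the shell.
# Unix-based
subprocess.run(["ls", "-l"])
# Windows
subprocess.run(["dir", "/S"], shell=True)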
- Path Separators: Unix uses / whereas Windows uses \ as the path separator. This is crucial when specifying file paths.
- Environment Variables: Environment variables are accessed differently on Unix ($HOME) and Windows (%USERPROFILE%).
- Case Sensitivity: Unix is case-sensitive, while Windows is not. Therefore, filenames and commands need to be case-accurate on Unix but not on Windows.
Cross-Platform Best Practices
Using os Module for Path Handling: Use the os.path module to
handle file paths so that they are automatically formatted to suit the
operating system.
import os
filepath = os.path.join("folder", "file.txt")
Checking Platform: You can conditionally execute code depending on
the platform using sys.platform.
import sys
if sys.platform == "win32":
    subprocess.run(["dir", "/S"], shell=True)
else:
    subprocess.run(["ls", "-l"])
Avoid shell=True When Possible: This is a security best practice,
but it also can make your code more portable.
Specify Text Encoding: When capturing output, specify the encoding to avoid surprises with character sets on different platforms.
subprocess.run(["ls", "-l"], capture_output=True, text=True, encoding='utf-8')
Difference Between shell=True and shell=False
When working with Python’s subprocess module, you’ll often come across
the shell parameter. By default, shell=False, but you can set it to
True to change the behavior of how commands are executed. Let’s break
down the difference in layman’s terms and see when you should use each.
shell=False (Default)
What It Does: When shell=False, the command you provide is
directly executed without invoking an additional shell process. Each
argument in the command is a separate item in a list.
import subprocess
# Using shell=False
subprocess.run(["ls", "-l"])
Pros:
- More Secure: No risk of shell injection attacks, which we’ll discuss below.
- Clearer Syntax: The command and its arguments are clearly defined in a list, which makes it easy to construct dynamically.
Cons:
- Less Flexible: You can’t use shell features like wildcard characters (*), variable expansion ($VAR), and piping commands (|).
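Wildcards can still be handled without a shell by expanding them in Python first. The following is a minimal sketch using the standard glob module; the file names and the counting child program are illustrative assumptions.

```python
import glob
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    # Create a few sample files (names are illustrative).
    for name in ("a.txt", "b.txt", "c.log"):
        open(os.path.join(workdir, name), "w").close()

    # Expand the wildcard in Python instead of relying on a shell.
    txt_files = sorted(glob.glob(os.path.join(workdir, "*.txt")))

    # Pass the expanded names as ordinary arguments; the child counts them.
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(len(sys.argv) - 1)", *txt_files],
        capture_output=True,
        text=True,
    )

print("Files matched:", result.stdout.strip())  # Files matched: 2
```

The same idea applies to the other shell features: os.environ can supply variable values, and chained Popen objects can take the place of a pipe.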
shell=True
What It Does: When shell=True, Python will run your command inside
a new shell process. This enables you to take advantage of shell
features like wildcard expansion, variable substitution, and more.
import subprocess
# Using shell=True
subprocess.run("ls -l *.txt", shell=True)
Pros:
- More Flexible: You can use all features of the shell, such as wildcards, piping, and others.
- Concise for Simple Commands: For a quick script with simple commands, shell=True can be more concise.
Cons:
- Less Secure: Risk of shell injection attacks. If you’re building a command string using external input, the user could potentially execute arbitrary commands.
Example of Security Risk:
Imagine you have the following code snippet where user_input comes
from an external source.
# This is dangerous!
subprocess.run(f"echo {user_input}", shell=True)
If the user provides a value like ; rm -rf /, the shell would run echo
and then attempt to delete every file the script has permission to
remove!
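For comparison, here is a sketch of the same kind of input passed as a list element with the default shell=False. The malicious value is a harmless stand-in, and a small child Python program plays the role of echo so the example runs on any platform.

```python
import subprocess
import sys

# Hypothetical malicious input (harmless here: it only echoes a marker).
user_input = "hello; echo INJECTED"

# The child prints its single argument back, standing in for `echo`.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", user_input],
    capture_output=True,
    text=True,
)

# The semicolon is just a character in the argument; no second command runs.
print(result.stdout.strip())  # hello; echo INJECTED
```

Since no shell parses the string, the input cannot introduce additional commands.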
Which One to Use?
- Use shell=False when:
  - You don’t need any shell-specific features.
  - You’re using external or untrusted input to construct your command.
- Use shell=True when:
  - You absolutely need shell features, and you’re aware of the security implications.
  - The command and its arguments are fixed (hardcoded) and do not depend on external input.
Troubleshooting and Common Pitfalls
Even experienced developers sometimes encounter issues while working
with the subprocess module. In this section, we will cover some common
pitfalls, how to debug subprocess calls, and ways to handle exceptions.
Debugging subprocess Calls
Logging: Use Python’s logging module to log the exact command being run, along with its output and errors.
import logging
logging.basicConfig(level=logging.DEBUG)
cmd = ["ls", "-l"]
logging.debug(f"Executing command: {' '.join(cmd)}")
result = subprocess.run(cmd, capture_output=True, text=True)
logging.debug(f"Output: {result.stdout}")
logging.debug(f"Errors: {result.stderr}")
Print Statements: For quick debugging, strategically place print
statements to display key subprocess attributes like stdout, stderr,
and returncode.
result = subprocess.run(["ls", "-l"], capture_output=True, text=True)
print("STDOUT:", result.stdout)
print("STDERR:", result.stderr)
print("Return Code:", result.returncode)
How to Handle Exceptions
CalledProcessError: This exception is raised when a process
returns a non-zero exit code. It can be caught to handle the error
gracefully.
try:
    subprocess.run(["false"], check=True, capture_output=True, text=True)
except subprocess.CalledProcessError as e:
    print(f"Command failed with error {e.returncode}, output: {e.output}")
TimeoutExpired: As previously discussed, this exception can be
caught when using the timeout parameter.
try:
    subprocess.run(["sleep", "10"], timeout=1)
except subprocess.TimeoutExpired:
    print("Process timed out.")
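Another common pitfall is a command that does not exist at all: with shell=False, subprocess.run raises FileNotFoundError before any process starts. The sketch below gathers the three cases into one helper; run_checked is a hypothetical name used only for illustration, not part of the subprocess module.

```python
import subprocess
import sys


def run_checked(cmd, timeout=10):
    """Run cmd and return (ok, message); a hypothetical helper for illustration."""
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, check=True, timeout=timeout
        )
    except FileNotFoundError:
        # The executable itself could not be found.
        return False, f"command not found: {cmd[0]}"
    except subprocess.CalledProcessError as e:
        # The command ran but exited with a non-zero code.
        return False, f"exit code {e.returncode}: {e.stderr.strip()}"
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout} seconds"
    return True, result.stdout.strip()


print(run_checked([sys.executable, "-c", "print('ok')"]))
print(run_checked([sys.executable, "-c", "import sys; sys.exit(3)"]))
print(run_checked(["definitely-not-a-real-command-12345"]))
```

Returning a status and a message keeps the calling code free of try/except blocks; raising a custom exception instead would be an equally reasonable design.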
Real-World Examples and Use-Cases of Python subprocess
The subprocess module in Python is highly versatile and can be applied
in various real-world scenarios. Here are some typical use-cases.
Scripting
Scenario: You want to periodically back up your important documents to a remote server.
import subprocess
import datetime
# Create a timestamp
timestamp = datetime.datetime.now().strftime("%Y%m%d%H%M%S")
# Compress the folder into a tarball (check=True stops the script if tar fails)
subprocess.run(["tar", "-czvf", f"backup_{timestamp}.tar.gz", "/path/to/important_folder"], check=True)
# Transfer it to a remote server
subprocess.run(["scp", f"backup_{timestamp}.tar.gz", "username@remote-server:/path/to/backup/"])
Automating System Tasks
Scenario: You want to update your system and installed packages automatically.
import subprocess
# Update package list and upgrade all packages in a Debian-based system
subprocess.run(["sudo", "apt-get", "update"])
subprocess.run(["sudo", "apt-get", "upgrade", "-y"])
# Or for a Red Hat-based system
# subprocess.run(["sudo", "yum", "update", "-y"])
Data Pipeline Integrations
Scenario: You have different tools for different steps in your data
pipeline. One tool generates data and saves it as a .csv file, another
reads this .csv file and processes the data, and a third tool
visualizes the data.
import subprocess
# Step 1: Generate data with Tool A
subprocess.run(["tool_a", "--output", "data.csv"])
# Step 2: Process data with Tool B
subprocess.run(["tool_b", "--input", "data.csv", "--output", "processed_data.csv"])
# Step 3: Generate visualizations with Tool C
subprocess.run(["tool_c", "--input", "processed_data.csv", "--output", "data_plot.png"])
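In a pipeline like this, a later step usually should not run if an earlier one failed. The following is a minimal sketch of that idea, assuming each tool signals failure through a non-zero exit code; run_pipeline is a hypothetical helper, and small child Python programs stand in for tool_a, tool_b, and tool_c.

```python
import subprocess
import sys


def run_pipeline(steps):
    """Run each command in order; stop at the first failure.

    Returns the number of steps that completed successfully.
    """
    completed = 0
    for cmd in steps:
        try:
            subprocess.run(cmd, check=True)
        except subprocess.CalledProcessError as e:
            print(f"Step {completed + 1} failed with exit code {e.returncode}")
            break
        completed += 1
    return completed


# Stand-in commands; replace with the real tool invocations.
steps = [
    [sys.executable, "-c", "print('generate')"],
    [sys.executable, "-c", "import sys; sys.exit(1)"],  # simulated failure
    [sys.executable, "-c", "print('visualize')"],       # never reached
]
print("Steps completed:", run_pipeline(steps))  # Steps completed: 1
```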
FAQs: Frequently Asked Questions about subprocess
What is the subprocess module used for?
The subprocess module is used for spawning new processes, interacting
with process input/output, and retrieving their return codes in Python
scripts.
How do I execute a simple shell command?
You can use the subprocess.run function:
subprocess.run("ls -l", shell=True)
How do I run multiple commands in a sequence or in parallel?
For running commands in sequence, simply call subprocess.run multiple
times. To run commands in parallel, you can use Python’s
concurrent.futures.ThreadPoolExecutor or
concurrent.futures.ProcessPoolExecutor.
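A minimal sketch of the parallel case with ThreadPoolExecutor follows. The run_command helper and the child one-liners are illustrative stand-ins for real commands.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_command(cmd):
    """Run one command and return its exit code and captured stdout."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stdout.strip()


# Stand-in commands; each child just prints its own label.
commands = [
    [sys.executable, "-c", f"print('job {i}')"] for i in range(3)
]

# Threads are sufficient here: each one mostly waits on its child process.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(run_command, commands))  # results keep input order

for code, output in results:
    print(code, output)
```

pool.map returns results in the order the commands were submitted, even when the child processes finish in a different order.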
What’s the difference between shell=True and shell=False?
Setting shell=True runs the command in a new shell process, allowing
you to use shell features like wildcard characters (*), variable
expansion ($VAR), and piping commands (|). However, it’s generally
less secure. shell=False (the default) directly runs the command
without invoking a shell, making it more secure but less flexible.
How do I set a timeout for a command?
Use the timeout argument with subprocess.run:
subprocess.run(["ls", "-l"], timeout=10)
How can I change the working directory for the command?
Use the cwd parameter:
subprocess.run(["ls", "-l"], cwd='/some/other/directory')
How do I handle errors and exceptions?
For checking the return code, you can look at the returncode attribute
of the object returned by subprocess.run. To raise an exception when
the command fails, pass check=True to subprocess.run, or use
subprocess.check_call or subprocess.check_output.
What are some alternatives to subprocess?
Alternatives include the sh library for more Pythonic subprocess
handling, fabric for tasks and commands over SSH, and paramiko for
lower-level SSH interactions.
Alternatives to Python subprocess module
While the subprocess module is incredibly powerful and flexible, there
are other libraries and modules you might consider depending on your
specific needs. Let’s explore some of those alternatives and when they
might be more appropriate to use.
1. shlex for Command Parsing
Overview: The shlex library is used for parsing shell-like
syntaxes, splitting the
command line into a list of strings that can be passed to
subprocess.
import shlex
import subprocess
command = 'ls -l "My Folder"'
args = shlex.split(command)
subprocess.run(args)
When to Use: Use shlex when you need to parse complex command
strings, especially ones that include special characters or spaces.
2. sh library
Overview: The sh library aims to make subprocess interfacing more
Pythonic and easier to work with.
import sh
print(sh.ls("-l"))
When to Use: sh is great for quick scripting tasks and reduces
boilerplate code. However, it may not be suitable for projects where you
need lower-level control over the subprocess.
3. fabric library
Overview: Fabric is primarily used for SSH and is higher-level than
subprocess. It’s particularly useful for deployment scripts and system
administration tasks.
from fabric import Connection
with Connection('my-server') as c:
c.run('ls -l')
When to Use: Choose fabric when you’re working with remote systems
over SSH and require a mix of local and remote command execution.
4. paramiko library
Overview: Like Fabric, paramiko is used for SSH connectivity but
is a lower-level library.
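import paramiko
ssh = paramiko.SSHClient()
ssh.load_system_host_keys()  # Trust hosts already listed in known_hosts
ssh.connect('my-server')
stdin, stdout, stderr = ssh.exec_command('ls -l')
print(stdout.read().decode())
ssh.close()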
When to Use: paramiko is ideal for custom SSH interactions and
when you need finer control over the SSH layer itself.
When to Use Alternatives
- Complex Parsing: Use shlex if command parsing becomes too complex.
- Simpler Syntax: For simpler, more Pythonic code, consider using sh.
- Remote Operations: For SSH-based operations, fabric or paramiko may be more suitable.
- Advanced Features: When you need features that subprocess does not offer, you may consider alternatives. Note that capturing stdout and stderr simultaneously is already possible with capture_output=True.
Summary and Conclusion
The Python subprocess module serves as a powerful tool for spawning
new processes and interacting with their input/output streams, making it
an indispensable utility for both simple scripts and complex workflows.
Whether you’re a beginner automating basic tasks or an experienced
developer constructing data pipelines, subprocess offers robust
capabilities for process management. The versatility of this module
ranges from running simple shell commands with subprocess.run to
complex operations using the Popen class. Additionally, the module
supports various options like timeouts, error handling, and environment
variable customization, making it suitable for a wide array of
applications.
Key Takeaways
- Simple to Advanced: From subprocess.run for basic needs to the more advanced Popen class, subprocess offers different levels of complexity depending on your requirements.
- Cross-Platform: It works on both Unix and Windows, although with some platform-specific considerations.
- Flexible and Secure: While shell=True provides shell capabilities like wildcard and piping, shell=False is often more secure, especially with untrusted input.
- Error Handling: Methods like subprocess.check_call and subprocess.check_output can automatically check for errors, saving you additional manual error-checking code.
- Capture Output: Easy ways to capture standard output and error streams for further processing.
Further Reading and Resources
- Python Official Documentation on subprocess - The official documentation is always a great place to dive deeper.
- Stack Overflow - For troubleshooting and quick queries.
- GitHub Code Samples - For real-world code examples.

![Mastering Python subprocess Module [In-Depth Tutorial]](/python-subprocess/python_subprocess.jpg)
