In this tutorial, we’ll explore how to extract specific files, folders,
and symbolic links from a tar.gz archive, using straightforward
command-line techniques suitable for beginners.
Before diving into the commands, it’s important to understand what a
tar.gz file is:
- Tar: Short for “tape archive”, it’s a file format and a
command-line utility used for collecting multiple files into a single
archive file (.tar). It’s often used in Unix and Linux environments.
- Gzip: A compression method to reduce the
size of files.
When a tar file is compressed with gzip, it gets a
.tar.gz or .tgz extension.
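The relationship between the two tools can be sketched as follows. This is a minimal example, assuming a POSIX shell with tar and gzip available. The scratch directory and the two-step.tar.gz name are made up for illustration, and the demo-archive layout mirrors the one used in the examples of this tutorial.

```shell
set -e
workdir=$(mktemp -d)            # hypothetical scratch directory
cd "$workdir"

# Build a small tree shaped like the demo-archive used in this tutorial.
mkdir -p demo-archive/reports
touch demo-archive/file1.txt demo-archive/file2.txt
touch demo-archive/reports/report1.txt demo-archive/reports/report2.txt demo-archive/reports/report3.txt
ln -s file1.txt demo-archive/link_to_file1.txt

# Two steps: bundle into a .tar, then write a gzip-compressed copy.
tar -cf demo-archive.tar demo-archive
gzip -c demo-archive.tar > two-step.tar.gz

# One step: the z option makes tar apply gzip compression itself.
tar -czf demo-archive.tar.gz demo-archive

# Both archives hold the same members.
tar -tzf two-step.tar.gz | sort > two-step.list
tar -tzf demo-archive.tar.gz | sort > one-step.list
cmp two-step.list one-step.list && echo "same contents"
```

Running this leaves a demo-archive.tar.gz in the scratch directory that can be used to try the commands in the following steps.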
Steps to extract specific file(s), link(s) and directories from an archive
You’ll need a Unix-like environment (Linux, macOS, FreeBSD, etc.) or a Windows system with tools like Git Bash, Cygwin, or WSL (Windows Subsystem for Linux) installed. The tar utility should be pre-installed in most Unix-like environments.
1. Open Terminal or Command Line Interface
- Open your command-line interface (CLI).
- Navigate to the directory containing your
tar.gz file using the cd command. For example, if your file is in the Downloads folder, you’d type cd Downloads.
2. Listing Contents of the Archive
Before extracting, you might want to see what’s inside the archive, because you must give the complete path of each file or directory you intend to extract, exactly as it is stored in the archive.
Use the command: tar -tzvf filename.tar.gz.
- t tells tar to list contents.
- z is for gzip compressed files.
- v is for verbose mode, showing details.
- f specifies that you’re working with a file.
For example:
# tar -tzvf demo-archive.tar.gz
drwx------ root/root 0 2024-01-22 12:10 demo-archive/
lrwxrwxrwx root/root 0 2024-01-22 12:10 demo-archive/link_to_file1.txt -> file1.txt
-rw------- root/root 0 2024-01-22 12:10 demo-archive/file2.txt
-rw------- root/root 0 2024-01-22 12:10 demo-archive/file1.txt
drwx------ root/root 0 2024-01-22 12:10 demo-archive/reports/
-rw------- root/root 0 2024-01-22 12:10 demo-archive/reports/report3.txt
-rw------- root/root 0 2024-01-22 12:10 demo-archive/reports/report2.txt
-rw------- root/root 0 2024-01-22 12:10 demo-archive/reports/report1.txt
You can also pipe the output to grep to search for a specific file or directory, using a regular expression if needed:
tar -tzvf demo-archive.tar.gz | grep -E 'file[0-9]+\.txt'
Here’s a breakdown of the regex pattern:
- file: Matches the literal text “file”.
- [0-9]+: Matches one or more digits. This part will match 1, 2, etc.
- \.txt: The dot . is a special character in regex, so it’s escaped with a backslash to match a literal dot. Then txt matches the file extension.
The -E flag in grep enables extended regular expressions, in which
operators such as + can be used without a preceding backslash.
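As a small sketch of the same filtering on a names-only listing, the example below assumes a POSIX shell with tar and grep and builds a throwaway archive in a hypothetical scratch directory. The anchored pattern is an illustrative variation, not the pattern used above: without anchors, file[0-9]+\.txt also matches link_to_file1.txt.

```shell
set -e
workdir=$(mktemp -d)            # hypothetical scratch directory
cd "$workdir"
mkdir -p demo-archive/reports
touch demo-archive/file1.txt demo-archive/file2.txt demo-archive/reports/report1.txt
ln -s file1.txt demo-archive/link_to_file1.txt
tar -czf demo-archive.tar.gz demo-archive

# Without v, tar prints one member path per line and nothing else.
# The leading "/" and trailing "$" anchor the match to a whole file name,
# so link_to_file1.txt is not picked up.
matches=$(tar -tzf demo-archive.tar.gz | grep -E '/file[0-9]+\.txt$' | sort)
printf '%s\n' "$matches"
```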
3. Extracting Specific File or Multiple Files
The general command to extract specific files is:
tar -xzvf filename.tar.gz path/to/file1 path/to/file2
Here we must specify each path exactly as it appeared in the listing from the previous
step. For example, to extract demo-archive/file2.txt and
demo-archive/reports/report3.txt we will use:
# tar -xzvf demo-archive.tar.gz demo-archive/file2.txt demo-archive/reports/report3.txt
demo-archive/file2.txt
demo-archive/reports/report3.txt
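The need for the full member path can be checked with a short sketch. It assumes a POSIX shell with tar, uses a hypothetical scratch directory, and extracts into an out directory with the -C option so the result is easy to inspect.

```shell
set -e
workdir=$(mktemp -d)            # hypothetical scratch directory
cd "$workdir"
mkdir -p demo-archive/reports out
touch demo-archive/file1.txt demo-archive/file2.txt demo-archive/reports/report3.txt
tar -czf demo-archive.tar.gz demo-archive

# A bare file name does not match any member, so tar exits with an error.
if ! tar -xzf demo-archive.tar.gz -C out file2.txt 2>/dev/null; then
    echo "file2.txt: no member with that exact name"
fi

# The full member paths, as printed by the listing, are found.
tar -xzf demo-archive.tar.gz -C out demo-archive/file2.txt demo-archive/reports/report3.txt
find out -type f | sort
```

Only the two named members end up under out; file1.txt stays in the archive.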
4. Extracting Symbolic Links
Extracting symbolic links from a tar.gz
archive is similar to extracting regular files or directories, but there
are a few key points to understand about how symbolic links are handled
in archives.
- When you extract a symlink from a tar.gz archive, the extracted symlink still points to the same target path that it referenced when it was archived.
- If the target of the symlink does not exist at the expected location, the symlink will be broken (it will point to a non-existent location), so make sure to extract the file that the symbolic link points to along with the link itself.
# tar -xzvf demo-archive.tar.gz demo-archive/file1.txt demo-archive/link_to_file1.txt
demo-archive/link_to_file1.txt
demo-archive/file1.txt
# ls -l demo-archive
total 0
-rw-------. 1 root root 0 Jan 22 12:10 file1.txt
lrwxrwxrwx. 1 root root 9 Jan 22 12:10 link_to_file1.txt -> file1.txt
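The broken-link case can be reproduced with a minimal sketch, assuming a POSIX shell with tar and readlink available. The scratch directory and the link-only and both output directories are hypothetical names chosen for illustration.

```shell
set -e
workdir=$(mktemp -d)            # hypothetical scratch directory
cd "$workdir"
mkdir -p demo-archive link-only both
touch demo-archive/file1.txt
ln -s file1.txt demo-archive/link_to_file1.txt
tar -czf demo-archive.tar.gz demo-archive

# Extract only the link: it is recreated, but its target is missing.
tar -xzf demo-archive.tar.gz -C link-only demo-archive/link_to_file1.txt
link=link-only/demo-archive/link_to_file1.txt
if [ -L "$link" ] && [ ! -e "$link" ]; then
    echo "broken link -> $(readlink "$link")"
fi

# Extract the link together with its target: the link now resolves.
tar -xzf demo-archive.tar.gz -C both demo-archive/file1.txt demo-archive/link_to_file1.txt
if [ -e both/demo-archive/link_to_file1.txt ]; then
    echo "link resolves"
fi
```

The test [ -L path ] checks that the path is a symlink, while [ -e path ] follows the link and succeeds only if the target exists, so the combination detects a dangling link.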
5. Extracting all files from a specific directory
You can also choose to extract a specific directory and all the content inside this directory:
tar -xzvf demo-archive.tar.gz demo-archive/reports/
Sample Output:
demo-archive/reports/
demo-archive/reports/report3.txt
demo-archive/reports/report2.txt
demo-archive/reports/report1.txt
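To confirm that nothing outside the directory is extracted, a small sketch like the following can help. It assumes a POSIX shell with tar and find, and the scratch directory and out directory are hypothetical.

```shell
set -e
workdir=$(mktemp -d)            # hypothetical scratch directory
cd "$workdir"
mkdir -p demo-archive/reports out
touch demo-archive/file1.txt
touch demo-archive/reports/report1.txt demo-archive/reports/report2.txt demo-archive/reports/report3.txt
tar -czf demo-archive.tar.gz demo-archive

# Naming the directory member extracts it and everything below it.
tar -xzf demo-archive.tar.gz -C out demo-archive/reports/
extracted=$(find out -type f | sort)
printf '%s\n' "$extracted"
```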
6. Handling Wildcards during extraction
Extracting files that match a pattern from a
tar.gz archive can be a useful technique, especially when dealing with
large numbers of files. By default, GNU tar treats the member names
given on the command line literally during extraction; it matches
shell-style patterns only when the --wildcards option is given (for
example, tar -xzvf archive_name.tar.gz --wildcards 'path/to/directory/*pattern*').
An approach that does not depend on that option is to combine tar with other commands like
grep for listing and xargs for extracting. Here’s how you can do it:
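First, list the contents of the archive and use grep with a regular
expression to filter the files you’re interested in. List names only
(without the v option) so that each output line is exactly one member
path. Note that grep takes a regular expression, not a shell wildcard,
so write .* where a shell pattern would use *.
tar -tzf archive_name.tar.gz | grep 'path/to/directory/.*pattern'
After you have the list of files, you can use xargs to pass these file
names to tar for extraction.
tar -tzf archive_name.tar.gz | grep 'path/to/directory/.*pattern' | xargs -I '{}' tar -xzvf archive_name.tar.gz '{}'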
For example:
Let us list the files that match a pattern:
# tar -tzvf demo-archive.tar.gz | grep 'demo-archive/reports/.*\.txt'
-rw------- root/root 0 2024-01-22 12:10 demo-archive/reports/report3.txt
-rw------- root/root 0 2024-01-22 12:10 demo-archive/reports/report2.txt
-rw------- root/root 0 2024-01-22 12:10 demo-archive/reports/report1.txt
Next we will combine this command with xargs to also extract the files in a single line:
tar -tzvf demo-archive.tar.gz | grep 'demo-archive/reports/.*\.txt' | awk '{print $6}' | xargs -I '{}' tar -xzvf demo-archive.tar.gz '{}'
Here:
- We get the list of files using tar -tzvf demo-archive.tar.gz | grep 'demo-archive/reports/.*\.txt'.
- Next we print only the path of each file using awk '{print $6}', because the path is the sixth field of the verbose listing, and pass this list to xargs. Note that this breaks for paths that contain spaces.
- Lastly, xargs runs tar -xzvf demo-archive.tar.gz once for each path, substituting the path for {}, to extract it from the archive.
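Two variations on this pipeline are sketched below, under stated assumptions. The first feeds a names-only listing to a single tar run through -T - (read member names from standard input), which avoids both awk and one tar process per file. The second uses --wildcards, a GNU tar option, so it is guarded by a version check. The scratch directory, the out-list and out-glob directories, and the notes.md file are hypothetical names for illustration.

```shell
set -e
workdir=$(mktemp -d)            # hypothetical scratch directory
cd "$workdir"
mkdir -p demo-archive/reports out-list out-glob
touch demo-archive/file1.txt
touch demo-archive/reports/report1.txt demo-archive/reports/report2.txt demo-archive/reports/notes.md
archive="$workdir/demo-archive.tar.gz"
tar -czf "$archive" demo-archive

# Variation 1: names-only listing, filtered by grep, passed to one tar
# run that reads the member names from standard input (-T -).
(
    cd out-list
    tar -tzf "$archive" | grep 'demo-archive/reports/.*\.txt$' | tar -xzf "$archive" -T -
)

# Variation 2 (GNU tar only): let tar match the shell-style pattern itself.
if tar --version 2>/dev/null | grep -q 'GNU tar'; then
    (
        cd out-glob
        tar -xzf "$archive" --wildcards 'demo-archive/reports/*.txt'
    )
fi
```

In both cases only the .txt files under reports are extracted. Quoting the pattern keeps the shell from expanding it before tar sees it.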
Extracting specific files from a tar.gz archive is a useful skill,
particularly for managing large archives or when dealing with limited
storage space. By following these steps and understanding the commands
used, you can efficiently manage .tar.gz files in a Unix-like
environment. Remember, the key is to know the exact path of the files
you want to extract and to use the correct options with the tar
command. You can read more by running
man tar.


