Commands and Scripting
Guide to Bash commands and scripting
This is not an exhaustive guide, so here are additional sources of information just in case:
Exercises: https://www.learnshell.org/
Another guide: https://tldp.org/LDP/abs/html/
What is a Shell?
A shell is a command-line interpreter - it's the program that takes the commands you type and translates them into actions the operating system can understand. It's called a "shell" because it wraps around the operating system kernel, providing a user interface to access system functions.
Bash (Bourne Again Shell)
Bash is the most widely used shell, especially on Linux systems:
Default on most Linux distributions and older macOS versions
Written in C
Highly compatible - most shell scripts you find online are written for bash
Rich scripting capabilities with good documentation
Stable and mature - been around since 1989
Extensive history and tab completion
Zsh (Z Shell)
Zsh is a more modern shell with enhanced features:
Default on newer macOS (since Catalina)
Better autocompletion - more intelligent suggestions
Advanced globbing - more powerful pattern matching
Themes and plugins - highly customizable (especially with Oh My Zsh)
Better interactive features - spelling correction, shared history
You’ll likely use Bash or Zsh if you’re using macOS. To switch between them temporarily, just type their name: bash or zsh.
But if you want to permanently change the default shell, use these commands:
Check which one you’re currently using:
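A minimal sketch of both steps; the shell paths are typical defaults, but check /etc/shells on your system before changing anything:

```shell
# Show the shell you're currently running
echo "$0"

# Show the default shell recorded for your account
echo "$SHELL"

# Permanently change the default shell (prompts for your password);
# the paths below are common defaults and may differ on your system
# chsh -s /bin/zsh
# chsh -s /bin/bash
```

The chsh change takes effect for new terminal sessions, not the current one.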
Expansions
Let's consider one type of expansions.
Brace Expansion
Brace expansion is a convenient Bash feature that generates multiple strings from a pattern containing braces. It happens before any other expansions and allows you to create multiple arguments or strings efficiently.
Increment: you can specify an increment in the brace expansion, such as {1..10..2} to get 1 3 5 7 9.
Zero-padding: you can prefix numbers with 0 to force consistent width, e.g., {01..10} would expand to 01 02 ... 10.
Example 1
{1..10} utilizes brace expansion to generate a sequence of numbers from 1 to 10.
Printing the sequence.
This command will output:
Using in a for loop.
This loop will iterate, assigning each number from 1 to 10 to the variable i in turn, and print a message for each.
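The two usages described above might look like this:

```shell
# Print the whole sequence on one line
echo {1..10}
# -> 1 2 3 4 5 6 7 8 9 10

# Use the same expansion in a for loop
for i in {1..10}; do
    echo "Processing number $i"
done
```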
Example 2
Nested Braces
You can nest brace expansions for more complex patterns:
Practical Uses
Creating directories:
Backing up files:
Batch renaming or operations:
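A sketch of those three uses; the block works in a scratch directory so the names (data, config.txt) are just examples:

```shell
cd "$(mktemp -d)"        # work in a throwaway directory

# Creating directories: one command, three subfolders
mkdir -p data/{raw,processed,archive}
ls data                  # -> archive  processed  raw

# Backing up files: {,.bak} expands to "config.txt config.txt.bak"
touch config.txt
cp config.txt{,.bak}
ls config.txt*           # -> config.txt  config.txt.bak

# Batch operations: generate many names at once
touch report_{jan,feb,mar}.csv
```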
Important notes:
No variables in brace expansion. You cannot directly use variables within brace expansion for the start and end values. For example, echo {$from..$to} where from=1 and to=10 will not work as expected; it would literally output {$from..$to}. For variable-based ranges, consider using the seq command or a traditional for ((i=start; i<=end; i++)) loop.
Brace expansion doesn't use wildcards or match existing files - it just generates text.
No spaces should appear inside the braces unless you want them in the output
Simple commands
Echo
echo - Print text to the screen
Appends a timestamped message to a log file
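A sketch of both uses; pipeline.log is a placeholder name:

```shell
# Print text to the screen
echo "Hello, world"

# Append a timestamped message to a log file
echo "$(date) - pipeline started" >> pipeline.log
```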
Ls
ls - List the contents of a directory (shows files and folders).
Combinations
-l  Long format (detailed)
-a  Show all (including hidden)
-h  Human-readable sizes
-t  Sort by time
-S  Sort by size
-r  Reverse order
-R  Recursive (subdirectories)
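The flags combine freely; a few common mixes:

```shell
ls -la       # long format, including hidden files
ls -laht     # long, all, human-readable sizes, newest first
ls -lhS      # largest files first
ls -ltr      # oldest first (reverse time sort)
ls -R        # recurse into subdirectories
```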
Pwd
Print current working directory
Cd
cd - Change directory (move to a different folder)
Go to data pipeline folder and immediately list all files
Mkdir
mkdir - Make a new directory (create a folder)
Creates nested folder structure: data/raw, data/processed, data/archive
Rmdir
Remove empty directories only.
You can specify one or many directories to remove.
Remove nested empty directories:
Mv
mv - Move or rename files/folders
Move all CSV files to backup folder and count how many were moved
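A runnable sketch of both uses (the block creates its own scratch files; backup/ is an example name):

```shell
cd "$(mktemp -d)"
touch sales.csv users.csv notes.txt
mkdir backup

# Rename a file
mv notes.txt readme.txt

# Move all CSV files to backup, then count how many landed there
mv *.csv backup/
ls backup | wc -l     # -> 2
```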
Rm
The remove command deletes a file or empty directory:
Remove directory with contents (DANGEROUS):
You can optionally add a -r flag to tell the rm command to delete a directory and all of its contents recursively. "Recursively" is just a fancy way of saying "do it again on all of the subdirectories and their contents".
Remove with confirmation (safer practice):
Asks before deleting each file
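A sketch of the variants; run in a scratch directory because rm has no undo:

```shell
cd "$(mktemp -d)"
mkdir -p olddir/sub && touch olddir/sub/file.txt important.txt

# Delete a single file
rm important.txt

# Delete a directory and everything inside it (DANGEROUS)
rm -r olddir

# Ask before deleting each file (answer y or n at the prompt)
touch precious.txt
rm -i precious.txt
```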
Cp
cp - Copy files or folders
Copy entire data folder to backup with today's date in the name
Touch
touch - Create an empty file or update timestamp
Creates 10 files: file1.txt, file2.txt, ... file10.txt
Every file has metadata that includes timestamps:
Last modified time - when the file content was last changed
Last accessed time - when the file was last opened
When you use touch on an existing file, it updates these timestamps to the current time WITHOUT changing the file's content.
Cat
The cat command is used to view the contents of a file. It's short for "concatenate", which is a fancy way of saying "put things together". It can feel like a confusing name if you're using the cat command to view a single file, but it makes more sense when you're using it to view multiple files at once.
You can do something like this:
This would read the contents of error.log and pipe it to the grep command, which will search for the word “date”.
Or this:
This would read the contents of example.txt and pipe it into wc command.
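Both patterns, sketched with scratch files (error.log and example.txt stand in for your real files):

```shell
cd "$(mktemp -d)"
printf 'all good\ndate: 2025-10-02 failure\n' > error.log
printf 'one\ntwo\nthree\n' > example.txt

# View a file, or several at once
cat error.log example.txt

# Pipe into grep to search
cat error.log | grep date      # -> date: 2025-10-02 failure

# Pipe into wc to count lines
cat example.txt | wc -l        # -> 3
```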
Head/tail
Sometimes you don't want to print everything in a file. Files can be really big after all.
The head Command
The head command prints the first n lines of a file, where n is a number you specify.
If you don't specify a number, it will default to 10.
The tail Command
The tail command prints the last n lines of a file, where n is a number you specify.
Less/more
less and more - they're both commands for viewing files, but less is the more powerful one.
The more and less commands let you view the contents of a file, one page (or line) at a time.
In the context of these commands, less is literally more. The less command does everything that the more command does but also has more features. As a general rule, you should use less instead of more.
You would only use more if you're on a system that doesn't have less installed.
more Command
The older, simpler file viewer:
What you can do:
Press Space - go to next page
Press Enter - go down one line
Press q - quit
That's basically it!
Limitations:
You can only scroll DOWN (not back up)
Once you pass something, you can't go back to see it
Less features overall
less Command
The newer, better file viewer:
What you can do:
Press Space or Page Down - go to next page
Press b or Page Up - go back up a page
Press the arrow keys - move up/down line by line
Press /searchterm - search for text
Press n - go to next search result
Press N - go to previous search result
Press g - go to beginning of file
Press G - go to end of file
Press q - quit
Why it's better:
You can scroll both up AND down
You can search within the file
It doesn't load the entire file into memory (great for huge files)
Much more control
Which
The which command is used in Unix-like systems (Linux, macOS) to find the full path of an executable file that would be run when you type a command. It searches through directories listed in your system's PATH environment variable to locate the specified program. For example, typing which ls would show the path to the ls command's executable file.
Uname
uname -a - Print all system information
uname -s - Print kernel name
uname -r - Print kernel release
Date
Show current date and time:
Output: Thu Oct 2 16:45:23 AQTT 2025
Show date in specific format:
Output: 2025-10-02
Show time only:
Output: 16:45:23
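The three invocations above:

```shell
# Current date and time in the default format
date

# Date in a specific format (YYYY-MM-DD)
date +"%Y-%m-%d"

# Time only
date +"%H:%M:%S"
```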
CURL
Transfer data to/from servers (Client URL).
What it does:curl is a command-line tool for making HTTP/HTTPS requests. It's like a browser, but for the terminal. You can download files, interact with APIs, send data, and test web services.
Think of it as: A programmable web browser for the command line
Basic GET request (fetch webpage):
Download data from a URL:
Download JSON data from API and save it to dataset.json file
Save with original filename:
Downloads and saves as 'dataset.csv' (keeps original name)
Working with APIs
GET request with headers:
Sends request with authentication and specifies JSON response
POST request with JSON data:
Creates new user by sending JSON data
POST data from file:
The @ symbol reads data from file
PUT request (update):
Updates user with ID 123
DELETE request:
Deletes user with ID 123
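A sketch of the invocations above. api.example.com and the JSON fields are placeholders; to keep the block runnable offline, the live calls target a local file:// URL (curl accepts the same flags for http and https):

```shell
cd "$(mktemp -d)"
echo '{"id": 123, "name": "John"}' > user.json

# Fetch a URL and print the body (here a local file; normally https://...)
curl -s "file://$PWD/user.json"

# Save the response under a chosen filename
curl -s -o dataset.json "file://$PWD/user.json"

# Typical HTTP forms (not run here; the endpoint is hypothetical):
# curl -s -H "Authorization: Bearer $TOKEN" -H "Accept: application/json" \
#      https://api.example.com/users                      # GET with headers
# curl -s -X POST -H "Content-Type: application/json" \
#      -d '{"name":"John"}' https://api.example.com/users  # POST JSON
# curl -s -X POST -d @payload.json https://api.example.com/users  # @ reads a file
# curl -s -X PUT    https://api.example.com/users/123      # update
# curl -s -X DELETE https://api.example.com/users/123      # delete
# curl -s -O https://example.com/dataset.csv               # keep original filename
```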
Why it's essential for data engineers:
Fetch data from APIs
Download datasets from URLs
Test API endpoints
Automate data ingestion
Monitor web services
Read
Bash style:
With timeout:
Waits 10 seconds for input, if no response, continues with defaults
Zsh style:
Find
find - Search for files based on criteria
Example:
Finds all CSV files larger than 100MB modified in the last 7 days. mtime stands for “modified time”
Delete old log files:
Finds and deletes log files older than 30 days
Find and process files:
Finds all JSON files, counts lines in each, then sums them up
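A sketch of the three uses; the scratch files here are tiny, so the size/age filters are shown for syntax rather than for matching output:

```shell
cd "$(mktemp -d)"
touch big.csv small.csv app.json old.log

# CSV files larger than 100MB modified in the last 7 days
find . -name "*.csv" -size +100M -mtime -7

# Delete log files older than 30 days
# (dry-run with plain find first before adding -delete!)
# find /var/log/myapp -name "*.log" -mtime +30 -delete

# Count lines in every JSON file, then show the total
find . -name "*.json" -exec wc -l {} + | tail -1
```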
Tee
The tee command in Bash reads from standard input and writes to both standard output AND one or more files simultaneously. Think of it like a "T" pipe fitting in plumbing - the data flow splits in two directions.
Syntax:
tee [-ai] [file ...]
-a  Append the output to the files rather than overwriting them.
-i  Ignore the SIGINT signal.
file  A pathname of an output file.
tee is almost always used with an upstream source because its whole purpose is to duplicate data flowing through a pipeline.
Typical usage pattern:
Common examples:
This is incredibly useful for logging command output while still monitoring it in real-time, or when you need to save intermediate results in a pipeline.
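The typical pattern, sketched with a scratch log file:

```shell
cd "$(mktemp -d)"

# Print to the screen AND save to a file at the same time
echo "pipeline started" | tee run.log

# -a appends instead of overwriting
echo "pipeline finished" | tee -a run.log

cat run.log
# -> pipeline started
#    pipeline finished
```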
Can you use tee without a pipe?
Technically yes, but it's uncommon:
SIGINT
SIGINT is a signal (Signal Interrupt) sent to a process, typically when you press Ctrl+C in the terminal. It's a request for the program to terminate gracefully. The process can catch this signal and handle it (e.g., clean up resources before exiting) or ignore it.
Common signals include:
SIGINT (2): Interrupt from keyboard (Ctrl+C)
SIGTERM (15): Termination request
SIGKILL (9): Forceful kill (cannot be caught or ignored)
Explanation of the -i option:
By default: if tee receives a signal like SIGINT (Ctrl+C), it does what any normal program would do - it terminates immediately.
With the -i option: the -i flag tells tee to ignore SIGINT signals.
Why is this useful?
Imagine you have a long-running command pipeline:
If you press Ctrl+C, SIGINT goes to all processes in the pipeline. Without -i, tee would stop immediately, breaking the pipeline. With -i:
Now tee will ignore Ctrl+C and keep running, allowing the data flow to continue even if you accidentally hit Ctrl+C or intentionally want to stop only certain parts of the pipeline.
Tar
Archive and compress files
Create compressed backup with exclusions:
Creates compressed archive excluding temp files and cache folder, with date in filename
Extract to specific directory:
Extracts compressed archive to a specific location
List contents without extracting:
Shows only CSV files inside the archive without extracting
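The three operations above, sketched against scratch data (the names data/, cache, and backup_*.tar.gz are examples):

```shell
cd "$(mktemp -d)"
mkdir -p data/cache
touch data/a.csv data/scratch.tmp data/cache/x

# Create a compressed archive, excluding temp files and the cache folder
tar -czf "backup_$(date +%Y%m%d).tar.gz" --exclude="*.tmp" --exclude="cache" data/

# List only CSV files inside the archive without extracting
tar -tzf backup_*.tar.gz | grep '\.csv$'

# Extract into a specific directory
mkdir restore
tar -xzf backup_*.tar.gz -C restore
ls restore/data       # -> a.csv
```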
rsync - Remote/local file synchronization tool
What it does: rsync is a file copying/syncing tool that only transfers the differences between source and destination. It's much smarter and faster than the regular cp command, especially for large files or when syncing repeatedly.
Think of it as: Smart copy that only updates what changed
Key advantages over cp:
Only copies changed files (not everything)
Can resume interrupted transfers
Shows progress
Works over network (SSH)
Preserves permissions, timestamps, ownership
Can delete files in destination that don't exist in source
Basic syntax:
Important note about trailing slashes:
Common Examples
Simple local sync:
Syncs data folder to backup (archive mode, verbose)
Sync with progress bar:
Shows progress, human-readable sizes
Sync to remote server:
Syncs from remote server to local machine (with compression)
Zip / unzip
Compress and extract zip files
Create zip with password protection:
Creates encrypted zip file (will prompt for password)
Zip multiple directories:
Combines multiple folders into one zip file
Unzip to specific directory:
Extracts zip contents to specific location
List contents without extracting:
Unzip specific file:
Extracts only one specific file from the zip
Gzip and gunzip - Compress and decompress files
What they do:
gzip compresses files (makes them smaller)
gunzip decompresses files (restores original)
Think of it as: ZIP files for Linux (but only for single files)
File extension: .gz
gzip - Compress files
Basic syntax:
What happens:
Original file gets compressed
Creates filename.gz
Original file is DELETED (replaced with compressed version)
Simple Examples
Compress a file:
Creates data.csv.gz, deletes data.csv
Keep original file:
Creates data.csv.gz, KEEPS data.csv
Compress multiple files:
Each file becomes file1.txt.gz, file2.txt.gz, file3.txt.gz
gunzip - Decompress files
Basic syntax:
What happens:
Compressed file gets decompressed
Creates original filename
Compressed file is DELETED
Simple Examples
Decompress a file:
Creates data.csv, deletes data.csv.gz
Keep compressed file:
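The round trip, sketched on a scratch file (note: the -k flag requires a reasonably recent gzip, 1.6 or later):

```shell
cd "$(mktemp -d)"
echo "col1,col2" > data.csv

# Compress: creates data.csv.gz and deletes data.csv
gzip data.csv
ls                 # -> data.csv.gz

# Decompress: restores data.csv and deletes data.csv.gz
gunzip data.csv.gz

# -k keeps the original alongside the compressed copy
gzip -k data.csv
ls                 # -> data.csv  data.csv.gz

# -c writes to stdout, so the .gz survives and you pick the output name
gunzip -c data.csv.gz > copy.csv
```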
Top
Real-time system monitoring
Basic usage:
Shows live view of processes, CPU, memory usage
Once inside top:
Press M - sort by memory usage
Press P - sort by CPU usage
Press k - kill a process (then enter PID)
Press q - quit
Run top in batch mode (for logging):
Takes one snapshot of system state and saves the top 20 lines to a file
Monitor specific user's processes:
Shows only processes belonging to specific user
Show only specific number of processes:
Shows top 15 processes once (useful for scripts)
Alternative: htop (more user-friendly if installed):
Interactive, colorful, easier to use than top
Awk - Pattern scanning and text processing tool
What it does: awk is a powerful programming language designed for processing text files, especially structured data like CSV files. It works by reading files line-by-line and letting you perform operations on specific columns (fields).
Think of it as: Excel formulas for the command line
Best for:
Extracting specific columns from CSV/tab-delimited files
Performing calculations on data (sum, average, count)
Filtering rows based on conditions
Reformatting structured data
Simple example:
Prints columns 1 and 3 from a CSV file
More complex example:
For rows where column 3 > 100, calculate the average of column 4
How it works:
-F',' = field separator is comma (for CSV files)
$1, $2, $3 = column 1, column 2, column 3
$0 = entire line
You can use conditions, loops, and calculations
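The two examples described above, run against a small scratch CSV (columns: name, id, amount, score):

```shell
cd "$(mktemp -d)"
printf 'alice,10,150,4\nbob,20,50,8\ncarol,30,200,6\n' > data.csv

# Simple: print columns 1 and 3
awk -F',' '{print $1, $3}' data.csv
# -> alice 150
#    bob 50
#    carol 200

# Complex: for rows where column 3 > 100, average column 4
awk -F',' '$3 > 100 {sum += $4; n++} END {print sum/n}' data.csv
# -> 5
```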
Sed - Stream editor for find/replace and text transformation
What it does: sed is a tool for editing text in a stream (line by line). It's most commonly used for find-and-replace operations, but can also delete lines, insert text, and transform data.
Think of it as: Find and Replace on steroids
Best for:
Finding and replacing text in files
Deleting specific lines
Extracting specific line ranges
Modifying text without opening an editor
Simple example:
Replaces all occurrences of "old" with "new"
More complex example:
Converts date format from YYYY-MM-DD to DD/MM/YYYY
Common operations:
s/find/replace/g = substitute (find and replace)
/pattern/d = delete lines matching pattern
10,20d = delete lines 10-20
-i = edit file in-place (modify the actual file)
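The two examples described above, sketched on scratch input:

```shell
cd "$(mktemp -d)"
echo "old habits die old" > file.txt

# Replace every occurrence of "old" with "new"
sed 's/old/new/g' file.txt
# -> new habits die new

# Convert YYYY-MM-DD to DD/MM/YYYY using capture groups
echo "2025-10-02" | sed -E 's/([0-9]{4})-([0-9]{2})-([0-9]{2})/\3\/\2\/\1/'
# -> 02/10/2025
```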
Time
What it does:time measures how long a command takes to run. It shows three different time measurements.
Basic usage:
Example:
Output explanation:
real = actual time that passed (what you'd see on a stopwatch)
user = time CPU spent running your program
sys = time CPU spent on system operations (file I/O, etc.)
Real-world examples:
Save timing to variable:
Diff
Common Uses
Compare two files:
Side-by-side comparison:
Shows files next to each other
Unified format (like Git):
Shows context around changes
Ignore whitespace differences:
Compare directories:
Shows which files are different
Brief output (just show which files differ):
Colorized output:
Understanding diff Output
Format: <line_number><action><line_number>
a = add
c = change
d = delete
Grep
grep - Search text using patterns
Basic search
Case-insensitive search
Search recursively in directories
Show line numbers
Invert match (show lines that don't match)
You can also search multiple files at once. For example, if we wanted to search for the word "hello" in hello.txt and hello2.txt, we could run:
Recursive Search
You can also search an entire directory, including all subdirectories. For example, to search for the word "hello" in the current directory and all subdirectories:
The . is a special alias for the current directory.
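The searches described above, run against scratch files:

```shell
cd "$(mktemp -d)"
printf 'Hello world\ngoodbye\nhello again\n' > hello.txt
cp hello.txt hello2.txt

grep "hello" hello.txt             # basic search -> hello again
grep -i "hello" hello.txt          # case-insensitive -> matches Hello too
grep -n "hello" hello.txt          # with line numbers -> 3:hello again
grep -v "hello" hello.txt          # invert match -> Hello world, goodbye
grep "hello" hello.txt hello2.txt  # search multiple files
grep -r "hello" .                  # recursive search from current directory
```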
Sort
The sort command in Linux/Unix is used to sort lines of text files or input in various ways.
Basic Usage
Common Options
Sort Order:
Numeric Sorting:
Case Sensitivity:
Unique Values:
Real-World Use Cases
Find top 10 largest files:
Sort log entries by timestamp:
Get unique IP addresses from logs:
Sort processes by memory usage:
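A sketch of the main options and one of the use cases above (the du pipeline's output depends on your machine, so no sample output is shown):

```shell
printf '10\n2\n33\n' | sort       # lexical order -> 10, 2, 33
printf '10\n2\n33\n' | sort -n    # numeric order -> 2, 10, 33
printf 'b\na\nc\n'  | sort -r     # reverse -> c, b, a
printf 'a\nA\na\n'  | sort -u     # unique values only

# Find the largest entries in the current directory
du -h . 2>/dev/null | sort -hr | head -10
```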
Uniq
The uniq command filters out or reports repeated lines. Important: It only detects adjacent duplicates, so the input usually needs to be sorted first.
Basic Syntax
Basic Usage
Common Options
Count occurrences:
Show only duplicates:
Show only unique lines:
Ignore case:
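The options above in action; note the sort before every uniq, since uniq only sees adjacent duplicates:

```shell
cd "$(mktemp -d)"
printf 'apple\nbanana\napple\napple\n' > fruit.txt

sort fruit.txt | uniq        # deduplicate -> apple, banana
sort fruit.txt | uniq -c     # count occurrences -> 3 apple, 1 banana
sort fruit.txt | uniq -d     # only duplicated lines -> apple
sort fruit.txt | uniq -u     # only lines appearing once -> banana
sort fruit.txt | uniq -i     # case-insensitive comparison
```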
Cut
The cut command extracts sections from each line of files - great for working with columnar data.
Basic Syntax
Cutting by Characters
Cutting by Fields (Columns)
Custom Delimiters
Practical Examples
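The character, field, and delimiter forms sketched on sample lines:

```shell
# By characters: positions 1-4
echo "2025-10-02" | cut -c1-4                 # -> 2025

# By fields with a custom delimiter (comma)
echo "alice,30,engineer" | cut -d',' -f1      # -> alice
echo "alice,30,engineer" | cut -d',' -f1,3    # -> alice,engineer

# Space-delimited columns
echo "a b c" | cut -d' ' -f2                  # -> b
```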
jq - JSON processor and query tool
What it does: jq is like grep, sed, and awk combined, but specifically for JSON data. It lets you parse, filter, transform, and extract data from JSON files or API responses.
Think of it as: SQL queries for JSON
Why it's essential for data engineering:
Most APIs return JSON
Modern logs are often in JSON format
Easy to extract specific fields from complex JSON
Basic syntax:
Common Examples
Pretty print JSON:
Makes JSON readable with proper indentation
Extract a specific field:
Output: "John"
Extract nested field:
Gets city from nested structure
Extract from array:
Gets name from first item in array
Extract multiple fields:
Output: "John" and 30 on separate lines
Filter array based on condition:
Shows only users older than 25
Create new JSON structure:
Transforms JSON with new field names
Extract to CSV:
Converts JSON array to CSV format
Count items in array:
Returns number of items
Get all values of a specific field:
Extracts all names from array of users
Filter and transform:
Gets only active users, shows only id and name fields
Real-World Data Engineering Example
Extract data from API response:
Fetches API data and extracts all emails
Convert JSON logs to CSV:
Filter error logs:
Count errors per day:
Extract nested data:
Combine multiple JSON files:
The -s flag slurps all files into one array
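A sketch of several of the operations above, assuming jq is installed and using a two-record scratch file:

```shell
cd "$(mktemp -d)"
echo '[{"name":"John","age":30,"active":true},
      {"name":"Ann","age":22,"active":false}]' > users.json

jq '.' users.json                    # pretty print
jq '.[0].name' users.json            # first item's field -> "John"
jq '.[].name' users.json             # all names
jq 'length' users.json               # count items -> 2
jq '.[] | select(.age > 25)' users.json              # filter by condition
jq '.[] | select(.active) | {id: .name}' users.json  # filter + reshape
jq -r '.[] | [.name, .age] | @csv' users.json        # rows as CSV
```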
du
Disk Usage (check how much space files/folders use)
What it does: du shows how much disk space files and directories are using. It's essential for finding what's eating up your storage.
Think of it as: A disk space analyzer for the command line
Simple Examples
Check size of current directory:
Shows size of current directory and all subdirectories (in kilobytes)
Check size of specific folder:
Human-readable sizes:
Shows sizes as 1K, 234M, 2G instead of kilobytes
Summary only (total size):
Shows just one line with total size
Check multiple folders:
Find largest datasets:
Shows all folders in /data sorted by size
Check database size:
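A sketch of the invocations above; it builds a tiny scratch tree so the commands run anywhere (real paths like /data or a database directory would go in their place):

```shell
cd "$(mktemp -d)"
mkdir -p data/raw data/processed
echo "x" > data/raw/file.txt

du .                     # current directory and subdirectories, in KB
du -h data               # human-readable sizes
du -sh data              # summary only: one total line
du -sh data/raw data/processed    # several folders at once
du -h -d 1 . | sort -hr  # largest folders first, one level deep
```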
du Common Flags
-h  Human-readable (KB, MB, GB)
-s  Summary only (total)
-a  Show all files (not just directories)
-c  Show grand total at end
-d N  Max depth of N levels
--max-depth=N  Same as -d N
-k  Show in kilobytes
-m  Show in megabytes
history
View history of previously run commands.
ln - Create links (shortcuts to files)
What it does: Creates links to files or directories. There are two types: hard links and symbolic (soft) links.
Think of it as: Creating shortcuts or aliases to files
Symbolic Links (Soft Links) - Most Common
Create a symbolic link:
Example:
Creates a shortcut called report_link.txt that points to the original file
Link to directory:
Check if it's a link:
Output shows: lrwxr-xr-x ... report_link.txt -> /home/user/documents/report.txt
Create shortcut to frequently used directory:
su - Switch User
What it does: Switches to another user account. Stands for "substitute user" or "switch user".
Basic syntax:
Switch to root:
Prompts for root password
Switch to root with root's environment:
The dash (-) loads root's environment variables and home directory
Switch to specific user:
Prompts for bob's password
Exit back to your user:
sudo - Execute command as another user (usually root)
What it does: Runs a single command with elevated privileges (usually as root). Stands for "superuser do".
Think of it as: Temporary admin powers for one command
Basic syntax:
Example:
Runs apt update as root
Common Uses
Install software:
Edit system files:
View protected files:
Operators
Arithmetic Operators
+ - Addition
- - Subtraction
* - Multiplication
/ - Division
% - Modulus (remainder)
** - Exponentiation
Example:
Comparison Operators
For numeric comparisons
For string comparisons
Example:
Logical Operators
Example:
&& - Run next command ONLY if previous succeeds
Note: && operator signifies conditional execution. The core function of && is to create a dependency between commands. The command to the right of && will only run if the command to its left exits with a status of 0. In Bash, an exit status of 0 conventionally signifies success, while any non-zero exit status indicates failure.
If mkdir fails (e.g., the directory already exists), cd will not be attempted.
Short-circuiting: The && operator exhibits "short-circuiting" behavior. If the first command fails, Bash immediately stops evaluating the expression and does not execute the subsequent commands linked by &&. This is efficient as it avoids unnecessary operations.
|| - Run next command ONLY if previous fails
Syntax:
Behavior:
command2 runs ONLY if command1 exits with non-zero (failure)
If command1 succeeds, command2 never runs
Examples:
; (Semicolon) - Run next command REGARDLESS
Syntax:
Behavior:
command2 runs no matter what
Doesn't care if command1 succeeded or failed
Examples:
! (NOT) - Negate exit status
Syntax:
Examples:
Combining Operators
AND then OR:
If command1 succeeds, run command2; if either fails, run command3
Example:
Grouping with parentheses:
Real-World Examples
Safe script execution:
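The operators above, sketched in a scratch directory (the directory names are examples):

```shell
cd "$(mktemp -d)"

# && : cd runs only if mkdir succeeded
mkdir logs && cd logs
cd ..

# || : fallback runs only because the second mkdir fails (dir exists)
mkdir logs 2>/dev/null || echo "logs already exists"

# ;  : second command runs regardless of the first's exit status
false; echo "runs regardless"

# !  : negate an exit status
! false && echo "negated failure counts as success"

# && ... || : classic success/failure message pattern
grep -q root /etc/passwd && echo "found" || echo "missing"
```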
Permissions
Each file and directory in Unix systems has permissions associated with them.
You have to ask 2 questions when talking about permissions:
Who has the permissions?
What permissions do they have?
Any user accessing a specific file/directory may or may not have access to read it, write to it, or execute it.
Both questions, i.e. who and what, are answered by a 10-character string. Here are examples for each type of file:
Regular Files
Directories
Special Files
What do these characters mean?
The first character is always either - or d, so that a user recognizes whether it's a directory or not.
Regular file (e.g. -rwxrwxrwx)
Directory (e.g. drwxrwxrwx)
The next 3 characters r, w, x represent the three permissions - read, write, execute. Who do they apply to? Usually the owner, i.e. the one who created the file (unless changed manually afterwards).
Each permission has a state: granted or not granted. If it's granted, the letter is present; if not, a - appears instead. Example: r-x means the owner can read and execute but not write.
Finally, the next 6 characters are another 2 sets of rwx. The second set of rwx applies to the group instead of the owner, and the last set applies to everyone else.
Changing permissions
For more information: https://www.stationx.net/linux-file-permissions-cheat-sheet/#def-per
chmod command (stands for "change mode")
Example: chmod -R u=rwx,g=,o= DIRECTORY. This means:
The owner can read, write, and execute
The group can do nothing
Others can do nothing
In the command above, u means "user" (aka "owner"), g means "group", and o means "others". The = means "set the permissions to the following", and the rwx means "read, write and execute". The g= and o= mean "set group and other permissions to nothing". The -R means "recursively", which means "do this to all of the contents of the directory as well".
Remember, . is a special alias for the current directory.
There is symbolic and numeric notations for permission definition:
Symbolic notation:
u = user/owner
g = group
o = others
a = all (user + group + others)
Numeric notation:
First digit = owner permissions (instead of the first three letters)
Second digit = group permissions (instead of the second three letters)
Third digit = other permissions (instead of the third three letters)
So chmod 755 file means:
7 (rwx) for owner
5 (r-x) for group
5 (r-x) for other
Common Permission Patterns
755: Executable files (owner can do everything, others can read/execute)
644: Regular files (owner can read/write, others read-only)
600: Private files (only owner can read/write)
777: Full access for everyone (generally avoided for security)
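Both notations side by side, on scratch files:

```shell
cd "$(mktemp -d)"
touch script.sh secret.txt

# Symbolic notation: equivalent to 755
chmod u=rwx,g=rx,o=rx script.sh

# Numeric notation: owner read/write only
chmod 600 secret.txt

# Add execute permission for everyone
chmod +x script.sh

ls -l     # inspect the resulting permission strings
```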
chown (stands for “change owner”)
What it does: chown changes who owns a file or directory. In Unix/Linux systems, every file has an owner (user) and a group. This command lets you change either or both.
Think of it as: Transferring ownership of files to different users
Why it matters:
Control who can access/modify files
Fix permission issues
Set up proper access for web servers, databases, etc.
Essential for multi-user systems and servers
Basic Syntax
or
Running scripts
Creating a script
A "shebang" is a special line at the top of a script that tells your shell which program to use to execute the file - in other words, which interpreter to run it with.
The format of a shebang is:
For example, if your script is a Python script and you want to use Python 3, your shebang might look like this:
This tells the system to use the Python 3 interpreter located at /usr/bin/python3 to run the script.
If you're writing scripts that need to work on bash and zsh shells, use the portable version:
Running a script
If the program is in the current directory, you need to prefix it with ./ to run it:
./program.sh
A script can also be made so that it can't be closed with the Ctrl+C combination - the script itself traps the SIGINT signal. Scripts written this way often enable the behavior via an argument added after the script name, e.g.: ./program.sh force
Writing Professional Bash Script Headers
When creating Bash scripts for professional or collaborative environments, including a well-structured header makes your code more maintainable and easier to understand. Here's how to document your scripts effectively.
Header Placement
Place your documentation header immediately after the shebang line (#!/bin/bash) and before any executable code. Use the # symbol to create comments that won't be executed.
Essential Header Information
A professional script header should include these five key pieces of information:
Author - Who wrote the script (name or username)
Creation Date - When the script was originally created
Last Modified - When the script was last updated
Description - A brief explanation of what the script does
Usage - How to run the script, including any arguments or flags
Why This Matters
Including these details helps anyone who encounters your script (including your future self) quickly understand:
Its purpose and functionality
Who to contact with questions
Whether it's current or potentially outdated
How to execute it correctly
Example Header
Additional Considerations
For more complex scripts, you might also include:
Version number for tracking script evolution
Dependencies listing required tools or packages
License information for shared or open-source code
Contact information such as email or support channels
Adopting this convention from the start establishes good habits and makes your scripts production-ready.
Shell configuration
Bash and Zsh both have configuration files that run automatically each time you start a new shell session. These files are used to set up your shell environment. They can be used to set up aliases, functions, and environment variables.
These files are located in your home directory (~) and are hidden by default. The ls command has a -a flag that will show hidden files:
If you're using Bash, .bashrc is probably the file you want to edit.
If you're using Zsh, .zshrc is probably the file you want to edit, or create if it doesn't yet exist.
Environment variables
Apart from regular variables, there is another type of variable called an environment variable. They are available to all programs that you run in your shell.
You can view all of the environment variables that are currently set in your shell with the env command.
To set a variable in your shell, use the export command:
What's particularly useful is that any programs or scripts you execute in your shell will inherit access to these environment variables.
To demonstrate this, let's create a simple script file named greet.sh:
Now we can make it executable and run it:
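The missing steps might look like this; NAME is an example variable, and greet.sh is the script described above:

```shell
cd "$(mktemp -d)"

# Create greet.sh -- it reads the NAME environment variable
cat > greet.sh << 'EOF'
#!/bin/sh
echo "Hello, $NAME!"
EOF

# Make it executable and run it
chmod +x greet.sh
export NAME=World
./greet.sh             # -> Hello, World!

# Set the variable for a single command only (no export needed)
NAME=Bash ./greet.sh   # -> Hello, Bash!
```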
You can also temporarily set a variable for a single command, instead of exporting it (exporting means the variable will persist until you close the shell).
For example:
Your shell comes with several environment variables that are essentially "standard" - meaning various programs and system components recognize and utilize them automatically. The PATH variable is a prime example of this.
Why is the PATH Variable Important?
Without the PATH variable, you'd need to specify the complete filesystem location for every command you want to execute. Rather than simply typing ls, you'd be forced to type /bin/ls (or wherever the ls program lives on your particular system). This would be extremely tedious.
The PATH variable contains a collection of directory paths that your shell searches through whenever you enter a command. When you type ls, your shell examines each directory listed in PATH looking for an executable file named ls. Once found, it executes that program. If no matching executable is discovered, you'll receive a "command not found" error.
You can view your current PATH setting with this command:
This will display a long string of directory paths separated by colons (:). Each path represents a location where your shell searches for executable programs.
Note: Restarting your shell session will reset the PATH variable to its default.
Adding a directory to PATH
To add a directory to your PATH without overwriting all of the existing directories, use the export command and reference the existing PATH variable:
As you know, this is a temporary change that lasts only until your session is closed; after that, executables in the added directory will no longer be found by name.
Permanently adding a directory to PATH
The most common way to do this is to add the same export command from above to your shell's configuration file.
Man command
The man command is short for "manual". It's a program that displays the manual for other programs.
The man command functions only with programs that have documentation available in the manual system, though fortunately this includes most shell built-ins and standard Unix utilities. To use it, simply provide the command name as an argument. The logical starting point is to examine the manual for the manual system itself:
How to search for what you need:
Command flag conventions
The availability and nature of command flags depends entirely on how each program's developer designed it. However, most Unix commands follow established patterns:
Single-letter flags use one dash as a prefix (e.g., -v)
Word-based flags use two dashes as a prefix (e.g., --version)
Many commands offer both short and long versions of the same option (e.g., -v and --version)
Help flag
Standard practice among mature command-line applications is to include a "help" feature that displays usage instructions. This assistance is typically accessible through one of these methods:
--help (long flag format)
-h (short flag format)
help (as the initial argument)
The help output tends to be more digestible than comprehensive man documentation. Rather than serving as exhaustive reference material, it functions more like a concise getting-started tutorial.
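For instance, GNU grep prints a short usage summary with the long flag (grep is just an example here; not every tool supports all three methods):

```shell
# Print the first few lines of grep's built-in help text
grep --help | head -n 3
```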
Nano editor
Ctrl+O to save the file (confirm any prompts with "enter")
Ctrl+X to exit the editor.
There should be a list of commands at the bottom of the screen.
Program Exit Codes
Exit codes (also known as "return codes" or "status codes") serve as a communication mechanism for programs to indicate whether their execution completed successfully.
A program returns 0 to signal successful completion. All other exit codes indicate some form of failure or error condition. In most cases when something goes wrong, you'll see exit code 1, which serves as a general-purpose error indicator.
These exit codes enable programs to monitor and respond to the success or failure of other programs they execute. For instance, at Boot.dev, our monitoring system checks the exit code of our server application - if it terminates with a non-zero code, our monitoring automatically restarts the service and records the failure for investigation.
Within your shell environment, you can examine the exit code from the most recently executed command using the special variable $?. Here are some practical examples:
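A small sketch - the || trick keeps the failing command from stopping the script while still letting us inspect its exit code:

```shell
true          # a command that succeeds
echo $?       # prints 0

# false always fails; $? holds its exit code when echo runs
false || echo "false exited with status $?"
# prints: false exited with status 1
```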
Standard output (stdout), Standard error (stderr), Standard input (stdin)
Redirecting Streams
You can redirect stdout and stderr to different places using the > and 2> operators. > redirects stdout, and 2> redirects stderr.
Capturing Standard Output to a File
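A minimal sketch - the file name hello.txt is just an example:

```shell
# > redirects stdout into a file, creating or overwriting it
echo "Hello world" > hello.txt
cat hello.txt   # Hello world
```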
Note: use >> to append to a file instead of rewriting it.
Capturing Error Output to a File
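A sketch of the demonstration described below (the || true keeps the script going despite the deliberate failure):

```shell
# ls fails because the directory doesn't exist;
# 2> sends the resulting error message into errors.log
ls /does/not/exist 2> errors.log || true
cat errors.log
```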
In this demonstration, ls is used to deliberately trigger an error message (attempting to list a directory that doesn't exist), and this error output gets redirected into errors.log.
Standard input
Since we have standard output, it makes sense that there would also be standard input, correct?
"Standard Input," commonly referred to as "stdin," represents the default source from which programs receive their input data. It functions as a data stream that applications can consume during their execution.
Note: The read command prompts for and accepts user input from stdin (standard input).
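A sketch of read in action - a here-string supplies the input here so the example runs without an interactive prompt:

```shell
# read pulls one line from stdin into a variable
read name <<< "Alice"
echo "Hello, $name"   # Hello, Alice
```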
Piping
Among the shell's most elegant features is the ability to chain programs together by sending one program's output directly into another program's input. This single mechanism enables remarkably sophisticated automation workflows.
The Pipe Operator
The pipe symbol is | - a vertical line character typically found on the same key as the backslash (\) above your enter key. This operator captures the stdout from the command on its left side and feeds it as stdin to the command on its right side.
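The demonstration described below looks like this:

```shell
# Send echo's stdout into wc's stdin; -w counts words
echo "I find your lack of faith disturbing" | wc -w   # 7
```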
In this demonstration, the echo command produces the text "I find your lack of faith disturbing" as its output. Rather than displaying this text in your terminal, the pipe operator redirects it to the wc (word count) utility. The wc program tallies the words in whatever input it receives, and the -w flag instructs it to report only the word count.
This functionality works because wc, like most command-line utilities, can accept input from stdin as an alternative to reading from a file path.
Xargs
xargs is a powerful bash command that builds and executes commands from standard input. It's particularly useful for handling situations where you need to pass a large number of arguments to a command, or when you want to convert input into arguments for another command.
Basic Concept
xargs reads items from standard input (separated by spaces or newlines) and passes them as arguments to another command. Think of it as a bridge that converts input lines into command arguments.
Simple Examples
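A couple of sketches - the input strings are arbitrary:

```shell
# Each input line becomes an argument to echo
printf "alpha\nbeta\ngamma\n" | xargs echo   # alpha beta gamma

# -n 1 runs the command once per argument instead of once overall
printf "alpha\nbeta\n" | xargs -n 1 echo     # two separate echo calls
```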
Interrupt and Kill
Interrupt
Occasionally, a running program will become unresponsive or you'll need to terminate it. This typically happens when:
The command contains an error and isn't behaving as expected
The program is attempting network operations while you're offline
You're processing large datasets and decide not to wait for completion
A software defect is causing the application to freeze
When you encounter these situations, you can terminate the program using ctrl + c. This keyboard combination sends a "SIGINT" (interrupt signal) to the running process, instructing it to terminate gracefully.
Kill
Occasionally, a program becomes completely unresponsive (or behaves maliciously) and ignores the SIGINT signal entirely. When this occurs, your best approach is to open a separate shell session (another terminal window) and forcibly terminate the problematic process.
Command Format
PID represents "process ID" - a unique numerical identifier assigned to every running process on your system. To discover the process IDs currently active on your machine, you can use the ps ("process status") command:
The "aux" flags specify "display all processes, including those belonging to other users, with detailed information for each process".
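Putting the pieces together - here a background sleep stands in for a misbehaving process so the example is safe to run:

```shell
# List running processes with their PIDs
ps aux | head -n 5

# Start a long-running process in the background, then kill it by PID
sleep 100 &
pid=$!            # PID of the last background command
kill "$pid"       # sends SIGTERM (use kill -9 for the unignorable SIGKILL)
```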
More about Scripting
Positional arguments
Positional arguments allow your script to accept input from the command line when executed.
Basic Positional Parameters
When you run a script like ./script.sh arg1 arg2 arg3, Bash automatically assigns these values to special variables:
$0: The script name itself
$1: First argument
$2: Second argument
$3: Third argument
... and so on up to $9
${10}: Tenth argument and beyond (use braces)
Example:
Running ./greet.sh Alice 30 outputs:
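A minimal version of greet.sh matching the description (the exact wording of the messages is illustrative), with its output shown as comments:

```shell
# Create the script and make it executable
cat > greet.sh << 'EOF'
#!/bin/bash
echo "Script: $0"
echo "Hello, $1! You are $2 years old."
EOF
chmod +x greet.sh

./greet.sh Alice 30
# Script: ./greet.sh
# Hello, Alice! You are 30 years old.
```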
Special Parameter Variables
$#: Number of arguments passed to the script
$@: All arguments as separate words
$*: All arguments as a single word
$?: Exit status of the last command
$$: Process ID of the current script
$!: Process ID of the last background command
Example:
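A sketch exercising a few of these variables (info.sh is a hypothetical name):

```shell
cat > info.sh << 'EOF'
#!/bin/bash
echo "Number of arguments: $#"
echo "All arguments: $@"
echo "Script PID: $$"
EOF
chmod +x info.sh

./info.sh one two three
# Number of arguments: 3
# All arguments: one two three
# Script PID: (varies per run)
```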
Looping Through Arguments
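The idiomatic pattern uses "$@", which expands to every argument as a separate, properly quoted word:

```shell
cat > each.sh << 'EOF'
#!/bin/bash
for arg in "$@"; do
  echo "Argument: $arg"
done
EOF
chmod +x each.sh

./each.sh red green blue
# Argument: red
# Argument: green
# Argument: blue
```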
Data types
Bash is not a strongly-typed language - it treats almost everything as strings by default. However, it does support some data structures:
1. Variables (Strings/Numbers)
Basic variables:
Everything is a string unless you do math:
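A quick sketch of both points - the variable names are arbitrary:

```shell
name="Alice"
age=30              # still stored as a string internally
next=$((age + 1))   # $(( )) forces arithmetic evaluation
echo "$name will be $next next year"   # Alice will be 31 next year
```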
2. Arrays (Indexed)
What they are: Ordered lists of values, accessed by numeric index (0, 1, 2, ...)
Creating Arrays
Method 1: Direct assignment
Method 2: Individual assignment
Method 3: Empty array
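The three methods above might look like this (the array names are illustrative):

```shell
# Method 1: direct assignment
fruits=("apple" "banana" "cherry")

# Method 2: individual assignment by index
colors[0]="red"
colors[1]="green"

# Method 3: start empty, append later
todo=()
todo+=("write script")
```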
Accessing Array Elements
Get single element:
Get all elements:
Get array length:
Get length of specific element:
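Sketches of these four operations on one example array:

```shell
fruits=("apple" "banana" "cherry")

echo "${fruits[0]}"    # apple                 (single element)
echo "${fruits[@]}"    # apple banana cherry   (all elements)
echo "${#fruits[@]}"   # 3                     (array length)
echo "${#fruits[0]}"   # 5                     (length of "apple")
```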
Modifying Arrays
Add element:
Update element:
Remove element:
Remove entire array:
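The four modifications above, in order:

```shell
fruits=("apple" "banana")

fruits+=("cherry")       # add an element to the end
fruits[1]="blueberry"    # update an element in place
unset 'fruits[0]'        # remove one element (indices are NOT renumbered)
unset fruits             # remove the entire array
```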
Looping Through Arrays
Method 1: For loop
Method 2: Index-based loop
Method 3: C-style loop
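The three loop styles side by side:

```shell
fruits=("apple" "banana" "cherry")

# Method 1: iterate over the values
for fruit in "${fruits[@]}"; do
  echo "$fruit"
done

# Method 2: iterate over the indices (${!arr[@]} expands to them)
for i in "${!fruits[@]}"; do
  echo "$i: ${fruits[$i]}"
done

# Method 3: C-style loop
for ((i = 0; i < ${#fruits[@]}; i++)); do
  echo "${fruits[$i]}"
done
```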
Array Slicing
Get subset:
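The slicing syntax is ${array[@]:offset:length}:

```shell
fruits=("apple" "banana" "cherry" "date")
echo "${fruits[@]:1:2}"   # banana cherry  (2 elements starting at index 1)
```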
3. Associative Arrays (Bash 4.0+)
What they are: Key-value pairs (like dictionaries in Python or objects in JavaScript)
Must declare first:
Creating Associative Arrays
Method 1: Individual assignment
Method 2: All at once
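Both methods, assuming Bash 4.0 or newer (the keys and values are illustrative):

```shell
# Declaration with -A is required before use
declare -A ages

# Method 1: individual assignment
ages["alice"]=30
ages["bob"]=25

# Method 2: all at once
declare -A capitals=([france]="Paris" [japan]="Tokyo")
```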
Accessing Associative Arrays
Get value by key:
Get all keys:
Get all values:
Check if key exists:
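Sketches of these operations (the -v key test requires Bash 4.3+; note that associative arrays do not preserve insertion order):

```shell
declare -A ages=([alice]=30 [bob]=25)

echo "${ages[alice]}"   # 30         (value by key)
echo "${!ages[@]}"      # the keys   (order is not guaranteed)
echo "${ages[@]}"       # the values (order is not guaranteed)

# Check whether a key exists
if [[ -v ages[alice] ]]; then
  echo "alice is present"
fi
```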
Looping Through Associative Arrays
Loop through keys and values:
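A sketch of the key/value loop, with its output as comments (line order may differ, since key order is not guaranteed):

```shell
declare -A ages=([alice]=30 [bob]=25)

for name in "${!ages[@]}"; do
  echo "$name is ${ages[$name]} years old"
done
# alice is 30 years old
# bob is 25 years old
```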
4. Strings
String operations:
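A few common parameter-expansion operations on strings:

```shell
greeting="Hello, World"

echo "${#greeting}"            # 12            (string length)
echo "${greeting:7}"           # World         (substring from index 7)
echo "${greeting:0:5}"         # Hello         (5 characters from index 0)
echo "${greeting/World/Bash}"  # Hello, Bash   (replace first match)
```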
5. Integers (with declare)
Declare as integer:
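With declare -i, assignments are evaluated arithmetically:

```shell
declare -i count    # -i forces integer arithmetic on assignment
count=5+5
echo "$count"       # 10 (a plain variable would have stored the string "5+5")
```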
Real-World Examples
Example 1: Processing Files
Example 2: Configuration
Example 3: Log Levels
Example 4: Data Pipeline
Example 5: Environment Variables
Example 6: User Data
Example 7: Counting
Output:
Array vs Associative Array

|             | Indexed Array        | Associative Array    |
| ----------- | -------------------- | -------------------- |
| Keys        | Numbers (0, 1, 2...) | Strings              |
| Declaration | Optional             | declare -A required  |
| Access      | ${arr[0]}            | ${arr[key]}          |
| Use case    | Lists, sequences     | Key-value pairs      |
Common Patterns
Read file into array:
Split string into array:
Command output to array:
Check if element exists:
Remove duplicates:
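Sketches of these patterns, assuming Bash 4+ for mapfile (file names are illustrative):

```shell
# Read a file into an array, one line per element
printf "one\ntwo\nthree\n" > lines.txt
mapfile -t lines < lines.txt

# Split a string into an array on a delimiter
IFS=',' read -ra parts <<< "a,b,c"

# Capture command output into an array
# (simple, but breaks on filenames containing spaces)
files=($(ls))

# Check whether an element exists
if [[ " ${parts[*]} " == *" b "* ]]; then
  echo "found b"
fi

# Remove duplicates by round-tripping through sort -u
nums=(3 1 3 2)
unique=($(printf "%s\n" "${nums[@]}" | sort -u))
```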
Limitations
No multi-dimensional arrays (natively):
No true objects:
Quick Reference
Indexed Arrays:
Associative Arrays:
Bash Control Structures: Loops, Conditionals, and More
This guide covers the essential control structures in Bash scripting that allow you to create dynamic, decision-making scripts.
Conditionals
If Statements
The if statement lets you execute code based on conditions.
Basic syntax:
If-else:
If-elif-else:
Example:
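A sketch combining the if / elif / else forms described above (the threshold values are arbitrary):

```shell
count=15

if [ "$count" -gt 20 ]; then
  echo "big"
elif [ "$count" -gt 10 ]; then
  echo "medium"
else
  echo "small"
fi
# medium
```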
String comparisons:
=: equal to
!=: not equal to
-z: string is empty
-n: string is not empty
File tests:
-f: file exists and is a regular file
-d: directory exists
-r: file is readable
-w: file is writable
-x: file is executable
-e: file exists (any type)
Example:
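A sketch of two of the file tests (notes.txt is a hypothetical file created for the demonstration):

```shell
touch notes.txt

if [ -f notes.txt ]; then
  echo "notes.txt is a regular file"
fi

if [ ! -d notes.txt ]; then
  echo "notes.txt is not a directory"
fi
```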
Case Statements
Use case for multiple conditions based on pattern matching.
Example:
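A sketch of a case statement - note the | for alternative patterns and * as the catch-all:

```shell
fruit="apple"

case "$fruit" in
  apple)
    echo "An apple a day..." ;;
  banana|plantain)
    echo "Yellow and curved" ;;
  *)
    echo "Unknown fruit" ;;
esac
# An apple a day...
```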
Loops
For Loop
Iterate over a list of items.
Basic syntax:
Examples:
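Two sketches - a literal list and a brace-expanded range:

```shell
# Iterate over a literal list
for color in red green blue; do
  echo "$color"
done

# Iterate over a brace-expanded range
for i in {1..3}; do
  echo "Number $i"
done
```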
While Loop
Execute code while a condition is true.
Example:
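A simple counting loop - note the counter must be updated inside the body, or the loop never ends:

```shell
count=1
while [ "$count" -le 3 ]; do
  echo "count is $count"
  count=$((count + 1))
done
# count is 1
# count is 2
# count is 3
```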
Until Loop
Execute code until a condition becomes true (opposite of while).
Example:
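The same counter, written with until - the condition is inverted relative to the while version:

```shell
count=1
until [ "$count" -gt 3 ]; do
  echo "count is $count"
  count=$((count + 1))
done
# count is 1
# count is 2
# count is 3
```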
Loop Control
break: Exit the loop entirely
continue: Skip to the next iteration
Example:
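A sketch showing both in one loop:

```shell
for i in {1..5}; do
  if [ "$i" -eq 2 ]; then
    continue        # skip 2, keep looping
  fi
  if [ "$i" -eq 4 ]; then
    break           # stop the loop entirely before 4
  fi
  echo "$i"
done
# 1
# 3
```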
Functions
Define reusable blocks of code.
Basic syntax:
Example:
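A minimal function - arguments use the same $1, $2... scheme as script arguments, and local keeps variables scoped to the function:

```shell
greet() {
  local name="$1"
  echo "Hello, $name!"
}

greet "World"   # Hello, World!
```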
Practical examples
Example 1: File Backup Script
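A hypothetical backup script along these lines - every path and file name here is illustrative:

```shell
cat > backup.sh << 'EOF'
#!/bin/bash
src="$1"
backup_dir="backups/$(date +%Y-%m-%d)"

if [ ! -f "$src" ]; then
  echo "Error: $src is not a file" >&2
  exit 1
fi

mkdir -p "$backup_dir"
cp "$src" "$backup_dir/"
echo "Backed up $src to $backup_dir"
EOF
chmod +x backup.sh

echo "important data" > notes.txt
./backup.sh notes.txt
```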
Example 2: Menu System
Practical Combined Example using some of the commands
Complete backup workflow:
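A hypothetical end-to-end sketch tying together pipes, xargs, and redirection - the directories and log contents are made up for the demonstration:

```shell
# Set up some sample logs to work with
mkdir -p logs backup
echo "error: disk full" > logs/app.log
echo "ok" > logs/web.log

# Copy each log file into the backup directory via xargs
ls logs/*.log | xargs -I {} cp {} backup/

# Capture a listing of what was backed up (stdout redirection)
ls backup > manifest.txt

# Count error lines across the backup using a pipe
grep -r "error" backup | wc -l

echo "Backup finished with exit code $?"
```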