Commands and Scripting

Guide to Bash commands and scripting

This is not an exhaustive guide, so here are some additional sources of information in case you need them:


What is a Shell?

A shell is a command-line interpreter - it's the program that takes the commands you type and translates them into actions the operating system can understand. It's called a "shell" because it wraps around the operating system kernel, providing a user interface to access system functions.

Bash (Bourne Again Shell)

Bash is the most widely used shell, especially on Linux systems:

  • Default on most Linux distributions and older macOS versions

  • Written in C

  • Highly compatible - most shell scripts you find online are written for bash

  • Rich scripting capabilities with good documentation

  • Stable and mature - been around since 1989

  • Extensive history and tab completion

Zsh (Z Shell)

Zsh is a more modern shell with enhanced features:

  • Default on newer macOS (since Catalina)

  • Better autocompletion - more intelligent suggestions

  • Advanced globbing - more powerful pattern matching

  • Themes and plugins - highly customizable (especially with Oh My Zsh)

  • Better interactive features - spelling correction, shared history

You’ll likely use Bash or Zsh if you’re using macOS. To switch between them temporarily, just type the shell's name: bash or zsh.

But if you want to permanently change the default shell, use these commands:

Check which one you’re currently using:
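A quick sketch of both steps (chsh is commented out because it prompts for your password and changes your account settings):

```shell
# Print the login shell recorded for your account
echo $SHELL

# Print the shell that is actually running right now
echo $0

# Permanently change your default shell (uncomment to use):
#   chsh -s /bin/zsh
#   chsh -s /bin/bash
```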


Expansions

All kinds of expansions

Let's consider one type of expansion.

Brace Expansion

Brace expansion is a convenient Bash feature that generates multiple strings from a pattern containing braces. It happens before any other expansions and allows you to create multiple arguments or strings efficiently.

  • Increment: You can specify an increment in the brace expansion, such as {1..10..2} to get 1 3 5 7 9

  • Zero-padding: You can prefix numbers with 0 to force consistent width, e.g., {01..10} would expand to 01 02 ... 10.

Example 1

{1..10} utilizes brace expansion to generate a sequence of numbers from 1 to 10.

Printing the sequence.

This command outputs the numbers 1 through 10 separated by spaces.

Using in a for loop.

This loop will iterate, assigning each number from 1 to 10 to the variable i in turn, and print a message for each.
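The steps above can be sketched as:

```shell
# Printing the sequence
echo {1..10}
# Output: 1 2 3 4 5 6 7 8 9 10

# With an increment of 2
echo {1..10..2}
# Output: 1 3 5 7 9

# Zero-padded for consistent width
echo {01..10}

# Using the sequence in a for loop
for i in {1..10}; do
    echo "Processing number $i"
done
```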

Example 2

Nested Braces

You can nest brace expansions for more complex patterns:
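For example, the inner braces expand first and then combine with the outer pattern:

```shell
echo {a,b{1,2},c}         # a b1 b2 c
echo file{A,B}{1..2}.txt  # fileA1.txt fileA2.txt fileB1.txt fileB2.txt
```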

Practical Uses

Creating directories:

Backing up files:

Batch renaming or operations:
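A minimal sketch of all three uses (file and folder names here are made up for the demo):

```shell
# Creating directories: one command builds the whole tree
mkdir -p project/{data/{raw,processed},logs}

# Backing up a file: "config.txt{,.bak}" expands to "config.txt config.txt.bak"
echo "settings" > config.txt
cp config.txt{,.bak}

# Batch operations: create a set of similarly named files
touch report_{jan,feb,mar}.csv
```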

Important notes:

  • No variables in brace expansion. You cannot directly use variables within brace expansion for the start and end values. For example, echo {$from..$to} where from=1 and to=10 will not work as expected; it would literally output {$from..$to}. For variable-based ranges, consider using the seq command or a traditional for ((i=start; i<=end; i++)) loop.

  • Brace expansion doesn't use wildcards or match existing files—it just generates text

  • No spaces should appear inside the braces unless you want them in the output


Simple commands

Echo

echo - Print text to the screen

Appends a timestamped message to a log file
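For example (the log file name is arbitrary):

```shell
# Print text to the screen
echo "Hello, world"

# Append a timestamped message to a log file
echo "$(date '+%Y-%m-%d %H:%M:%S') pipeline started" >> pipeline.log
```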


Ls

ls - List the contents of a directory (shows files and folders).

Combinations

  • -l - Long format (detailed)

  • -a - Show all (including hidden)

  • -h - Human-readable sizes

  • -t - Sort by time

  • -S - Sort by size

  • -r - Reverse order

  • -R - Recursive (subdirectories)


Pwd

Print current working directory


Cd

cd - Change directory (move to a different folder)

Go to data pipeline folder and immediately list all files


Mkdir

mkdir - Make a new directory (create a folder)

Creates nested folder structure: data/raw, data/processed, data/archive
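A sketch of that command, combining -p with brace expansion:

```shell
# -p creates parent directories as needed; the braces expand to three paths
mkdir -p data/{raw,processed,archive}
```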


Rmdir

Remove empty directories only.

You can specify one or many directories to remove.

Remove nested empty directories:


Mv

mv - Move or rename files/folders

Move all CSV files to backup folder and count how many were moved
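A self-contained sketch (the CSV names are assumptions for the demo):

```shell
# Sample files and a backup folder
touch sales.csv users.csv notes.txt
mkdir -p backup

# Move all CSV files to backup, then count how many landed there
mv *.csv backup/ && ls backup/*.csv | wc -l
```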


Rm

The remove command (rm) deletes a file or empty directory:

Remove directory with contents (DANGEROUS):

You can optionally add a -r flag to tell the rm command to delete a directory and all of its contents recursively. "Recursively" is just a fancy way of saying "do it again on all of the subdirectories and their contents".

Remove with confirmation (safer practice):

Asks before deleting each file
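A sketch of all three forms, kept safe by operating only on scratch files it creates itself:

```shell
# Scratch files to delete
mkdir -p scratch
touch scratch/a.txt scratch/b.txt

# Remove a single file
rm scratch/a.txt

# Remove a directory and ALL of its contents recursively (dangerous!)
rm -r scratch

# -i asks before each deletion; here we pipe "y" in to answer the prompt
touch victim.txt
echo y | rm -i victim.txt
```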


Cp

cp - Copy files or folders

Copy entire data folder to backup with today's date in the name


Touch

touch - Create an empty file or update timestamp

Creates 10 files: file1.txt, file2.txt, ... file10.txt
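For example:

```shell
# Create ten empty files at once (brace expansion)
touch file{1..10}.txt

# Running touch on an existing file just refreshes its timestamps
touch file1.txt
```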

Every file has metadata that includes timestamps:

  • Last modified time - when the file content was last changed

  • Last accessed time - when the file was last opened

When you use touch on an existing file, it updates these timestamps to the current time WITHOUT changing the file's content.


Cat

The cat command is used to view the contents of a file. It's short for "concatenate", which is a fancy way of saying "put things together". It can feel like a confusing name if you're using the cat command to view a single file, but it makes more sense when you're using it to view multiple files at once.

You can do something like this:

This would read the contents of error.log and pipe it into the grep command, which searches for the word “date”.

Or this:

This would read the contents of example.txt and pipe it into wc command.
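The two pipelines above can be sketched as follows (the sample file contents are made up):

```shell
# Sample files matching the names used above
printf 'date: 2025-10-02\nERROR: disk full\n' > error.log
printf 'hello\nworld\n' > example.txt

# Pipe error.log into grep to search for "date"
cat error.log | grep date

# Pipe example.txt into wc (lines, words, bytes)
cat example.txt | wc

# And the original meaning of cat: concatenating several files
cat error.log example.txt
```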


Head/tail

Sometimes you don't want to print everything in a file. Files can be really big after all.

The head Command

The head command prints the first n lines of a file, where n is a number you specify.

If you don't specify a number, it will default to 10.

The tail Command

The tail command prints the last n lines of a file, where n is a number you specify.
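For example:

```shell
seq 1 20 > sample.txt    # a 20-line file to play with

head sample.txt          # first 10 lines (the default)
head -n 3 sample.txt     # first 3 lines
tail -n 3 sample.txt     # last 3 lines
```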


Less/more

less and more - they're both commands for viewing files, but less is the more powerful one.

The more and less commands let you view the contents of a file, one page (or line) at a time.

In the context of these commands, less is literally more. The less command does everything that the more command does but also has more features. As a general rule, you should use less instead of more.

You would only use more if you're on a system that doesn't have less installed.

more Command

The older, simpler file viewer:

What you can do:

  • Press Space - go to next page

  • Press Enter - go down one line

  • Press q - quit

  • That's basically it!

Limitations:

  • You can only scroll DOWN (not back up)

  • Once you pass something, you can't go back to see it

  • Less features overall

less Command

The newer, better file viewer:

What you can do:

  • Press Space or Page Down - go to next page

  • Press b or Page Up - go BACK up a page

  • Press Arrow keys - move up/down line by line

  • Press /searchterm - search for text

  • Press n - go to next search result

  • Press N - go to previous search result

  • Press g - go to beginning of file

  • Press G - go to END of file

  • Press q - quit

Why it's better:

  • You can scroll both up AND down

  • You can search within the file

  • It doesn't load the entire file into memory (great for huge files)

  • Much more control


Which

The which command is used in Unix-like systems (Linux, macOS) to find the full path of an executable file that would be run when you type a command. It searches through directories listed in your system's PATH environment variable to locate the specified program. For example, typing which ls would show the path to the ls command's executable file.

Uname

  • uname -a - Print all system information

  • uname -s - Print kernel name

  • uname -r - Print kernel release


Date

Show current date and time:

Output: Thu Oct 2 16:45:23 AQTT 2025

Show date in specific format:

Output: 2025-10-02

Show time only:

Output: 16:45:23
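The three commands behind those outputs:

```shell
date              # full date and time
date +%Y-%m-%d    # e.g. 2025-10-02
date +%H:%M:%S    # e.g. 16:45:23
```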


CURL

Transfer data to/from servers (Client URL).

What it does: curl is a command-line tool for making HTTP/HTTPS requests. It's like a browser, but for the terminal. You can download files, interact with APIs, send data, and test web services.

Think of it as: A programmable web browser for the command line

Basic GET request (fetch webpage):

Download data from a URL:

Download JSON data from API and save it to dataset.json file

Save with original filename:

Downloads and saves as 'dataset.csv' (keeps original name)

Working with APIs

GET request with headers:

Sends request with authentication and specifies JSON response

POST request with JSON data:

Creates new user by sending JSON data

POST data from file:

The @ symbol reads data from file

PUT request (update):

Updates user with ID 123

DELETE request:

Deletes user with ID 123
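The request shapes above can be sketched as follows. The base URL, paths, and token are all placeholders, and each command ends in `|| true` so the demo keeps going on a machine without network access:

```shell
BASE="https://api.example.com"   # placeholder; substitute a real endpoint

# Basic GET request
curl -s --max-time 5 "$BASE/users" || true

# Save the response to a named file, or keep the server's filename (-O)
curl -s --max-time 5 -o dataset.json "$BASE/dataset" || true
curl -s --max-time 5 -O "$BASE/dataset.csv" || true

# GET with headers (the token is a placeholder)
curl -s --max-time 5 -H "Authorization: Bearer TOKEN" \
     -H "Accept: application/json" "$BASE/users" || true

# POST JSON inline, or read the body from a file with @
curl -s --max-time 5 -X POST -H "Content-Type: application/json" \
     -d '{"name": "John"}' "$BASE/users" || true
curl -s --max-time 5 -X POST -H "Content-Type: application/json" \
     -d @payload.json "$BASE/users" || true

# PUT updates, DELETE removes
curl -s --max-time 5 -X PUT -d '{"name": "Jane"}' "$BASE/users/123" || true
curl -s --max-time 5 -X DELETE "$BASE/users/123" || true
```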

Why it's essential for data engineers:

  • Fetch data from APIs

  • Download datasets from URLs

  • Test API endpoints

  • Automate data ingestion

  • Monitor web services


Read

Bash style:

With timeout:

Waits 10 seconds for input; if there is no response, it continues with the defaults.

Zsh style:
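A sketch of both styles. The here-strings stand in for interactive typing so the example runs unattended:

```shell
# Bash style: -p shows a prompt
read -r -p "Enter your name: " name <<< "Alice"
echo "Hello, $name"

# With a timeout: wait up to 10 seconds, then fall back to a default
read -r -t 10 -p "Enter region: " region <<< "eu-west"
echo "Region: ${region:-us-east}"

# Zsh uses a different prompt syntax:
#   read "name?Enter your name: "
```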


Find

find - Search for files based on criteria

Example:

Finds all CSV files larger than 100MB modified in the last 7 days. mtime stands for “modified time”

Delete old log files:

Finds and deletes log files older than 30 days

Find and process files:

Finds all JSON files, counts lines in each, then sums them up
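The three patterns above, sketched against a small sample tree (file names are made up):

```shell
# Sample tree
mkdir -p logs
touch logs/app.json logs/db.json logs/old.log

# CSV files over 100MB modified in the last 7 days (-mtime -7)
find . -name "*.csv" -size +100M -mtime -7

# Delete log files older than 30 days
find logs -name "*.log" -mtime +30 -delete

# Count lines in every JSON file, then sum them up
find logs -name "*.json" -exec wc -l {} + | tail -n 1
```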


Tee

The tee command in Bash reads from standard input and writes to both standard output AND one or more files simultaneously. Think of it like a "T" pipe fitting in plumbing - the data flow splits in two directions.

Syntax:

  • tee [-ai] [file ...]

    • -a Append the output to the files rather than overwriting them.

    • -i Ignore the SIGINT signal.

    • file A pathname of an output file.

tee is almost always used with an upstream source because its whole purpose is to duplicate data flowing through a pipeline.

Typical usage pattern:

Common examples:

This is incredibly useful for logging command output while still monitoring it in real-time, or when you need to save intermediate results in a pipeline.
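For example:

```shell
# Print to the screen AND save to a file at the same time
echo "run started" | tee run.log

# -a appends instead of overwriting
echo "run finished" | tee -a run.log

# Save an intermediate result mid-pipeline while continuing downstream
printf '3\n1\n2\n' | sort | tee sorted.txt | head -n 1
```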

Can you use tee without a pipe?

Technically yes, but it's uncommon:

SIGINT

SIGINT is a signal (Signal Interrupt) sent to a process, typically when you press Ctrl+C in the terminal. It's a request for the program to terminate gracefully. The process can catch this signal and handle it (e.g., clean up resources before exiting) or ignore it.

Common signals include:

  • SIGINT (2): Interrupt from keyboard (Ctrl+C)

  • SIGTERM (15): Termination request

  • SIGKILL (9): Forceful kill (cannot be caught or ignored)

Explanation of the -i option:

  • By default: If tee receives a signal like SIGINT (Ctrl+C), it does what any normal program would do - it terminates immediately

  • With the -i option: The -i flag tells tee to ignore SIGINT signals

Why is this useful?

Imagine you have a long-running command pipeline:

If you press Ctrl+C, SIGINT goes to all processes in the pipeline. Without -i, tee would stop immediately, breaking the pipeline. With -i:

Now tee will ignore Ctrl+C and keep running, allowing the data flow to continue even if you accidentally hit Ctrl+C or intentionally want to stop only certain parts of the pipeline.


Tar

Archive and compress files

Create compressed backup with exclusions:

Creates compressed archive excluding temp files and cache folder, with date in filename

Extract to specific directory:

Extracts compressed archive to a specific location

List contents without extracting:

Shows only CSV files inside the archive without extracting
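The three operations above, sketched end to end (folder and file names are assumptions):

```shell
# Sample data with a file we want excluded
mkdir -p data
echo "a,b" > data/report.csv
touch data/cache.tmp

# Create a compressed archive, excluding temp files, with the date in the name
tar -czf "backup_$(date +%Y%m%d).tar.gz" --exclude="*.tmp" data

# List only the CSV files inside, without extracting
tar -tzf backup_*.tar.gz | grep '\.csv$'

# Extract into a specific directory (-C changes there first)
mkdir -p restore
tar -xzf backup_*.tar.gz -C restore
```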


rsync - Remote/local file synchronization tool

What it does: rsync is a file copying/syncing tool that only transfers the differences between source and destination. It's much smarter and faster than the regular cp command, especially for large files or when syncing repeatedly.

Think of it as: Smart copy that only updates what changed

Key advantages over cp:

  • Only copies changed files (not everything)

  • Can resume interrupted transfers

  • Shows progress

  • Works over network (SSH)

  • Preserves permissions, timestamps, ownership

  • Can delete files in destination that don't exist in source

Basic syntax:

Important note about trailing slashes:

Common Examples

Simple local sync:

Syncs data folder to backup (archive mode, verbose)

Sync with progress bar:

Shows progress, human-readable sizes

Sync to remote server:

Syncs from remote server to local machine (with compression)


Zip / unzip

Compress and extract zip files

Create zip with password protection:

Creates encrypted zip file (will prompt for password)

Zip multiple directories:

Combines multiple folders into one zip file

Unzip to specific directory:

Extracts zip contents to specific location

List contents without extracting:

Unzip specific file:

Extracts only one specific file from the zip


Gzip and gunzip - Compress and decompress files

What they do:

  • gzip compresses files (makes them smaller)

  • gunzip decompresses files (restores original)

Think of it as: ZIP files for Linux (but only for single files)

File extension: .gz

gzip - Compress files

Basic syntax:

What happens:

  • Original file gets compressed

  • Creates filename.gz

  • Original file is DELETED (replaced with compressed version)

Simple Examples

Compress a file:

Creates data.csv.gz, deletes data.csv

Keep original file:

Creates data.csv.gz, KEEPS data.csv

Compress multiple files:

Each file becomes file1.txt.gz, file2.txt.gz, file3.txt.gz

gunzip - Decompress files

Basic syntax:

What happens:

  • Compressed file gets decompressed

  • Creates original filename

  • Compressed file is DELETED

Simple Examples

Decompress a file:

Creates data.csv, deletes data.csv.gz

Keep compressed file:
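A round trip through all four forms:

```shell
echo "id,value" > data.csv

gzip data.csv           # creates data.csv.gz, deletes data.csv
gunzip data.csv.gz      # restores data.csv, deletes data.csv.gz

gzip -k data.csv        # -k keeps the original next to data.csv.gz
gunzip -kf data.csv.gz  # decompress but keep the .gz (-f overwrites data.csv)
```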


Top

Real-time system monitoring

Basic usage:

Shows live view of processes, CPU, memory usage

Once inside top:

  • Press M - sort by memory usage

  • Press P - sort by CPU usage

  • Press k - kill a process (then enter PID)

  • Press q - quit

Run top in batch mode (for logging):

Takes one snapshot of system state and saves the top 20 lines to a file

Monitor specific user's processes:

Shows only processes belonging to specific user

Show only specific number of processes:

Shows top 15 processes once (useful for scripts)

Alternative: htop (more user-friendly if installed):

Interactive, colorful, easier to use than top


Awk - Pattern scanning and text processing tool

What it does: awk is a powerful programming language designed for processing text files, especially structured data like CSV files. It works by reading files line-by-line and letting you perform operations on specific columns (fields).

Think of it as: Excel formulas for the command line

Best for:

  • Extracting specific columns from CSV/tab-delimited files

  • Performing calculations on data (sum, average, count)

  • Filtering rows based on conditions

  • Reformatting structured data

Simple example:

Prints columns 1 and 3 from a CSV file

More complex example:

For rows where column 3 > 100, calculate the average of column 4
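The two examples sketched against a small sample CSV (the columns name, region, units, price are assumptions for the demo):

```shell
printf 'alice,eu,150,10\nbob,us,90,20\ncarol,eu,200,30\n' > sales.csv

# Print columns 1 and 3
awk -F',' '{print $1, $3}' sales.csv

# For rows where column 3 > 100, compute the average of column 4
awk -F',' '$3 > 100 {sum += $4; count++} END {print sum / count}' sales.csv
# Output: 20
```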

How it works:

  • -F',' = set the field separator to a comma (for CSV files)

  • $1, $2, $3 = column 1, column 2, column 3

  • $0 = entire line

  • You can use conditions, loops, and calculations


Sed - Stream editor for find/replace and text transformation

What it does: sed is a tool for editing text in a stream (line by line). It's most commonly used for find-and-replace operations, but can also delete lines, insert text, and transform data.

Think of it as: Find and Replace on steroids

Best for:

  • Finding and replacing text in files

  • Deleting specific lines

  • Extracting specific line ranges

  • Modifying text without opening an editor

Simple example:

Replaces all occurrences of "old" with "new"

More complex example:

Converts date format from YYYY-MM-DD to DD/MM/YYYY
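Both examples sketched on a small sample file:

```shell
printf 'old value\ndate: 2025-10-02\n' > notes.txt

# Replace every "old" with "new" (writes to stdout, file untouched)
sed 's/old/new/g' notes.txt

# Convert YYYY-MM-DD to DD/MM/YYYY with capture groups
sed -E 's|([0-9]{4})-([0-9]{2})-([0-9]{2})|\3/\2/\1|' notes.txt

# -i edits the file in place (on macOS the form is: sed -i '')
sed -i 's/old/new/g' notes.txt
```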

Common operations:

  • s/find/replace/g = substitute (find and replace)

  • /pattern/d = delete lines matching pattern

  • 10,20d = delete lines 10-20

  • -i = edit the file in place (modify the actual file)


Time

What it does: time measures how long a command takes to run. It shows three different time measurements.

Basic usage:

Example:

Output explanation:

  • real = actual time that passed (what you'd see on a stopwatch)

  • user = time CPU spent running your program

  • sys = time CPU spent on system operations (file I/O, etc.)

Real-world examples:

Save timing to variable:
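A minimal sketch, including capturing the timing output (which time writes to stderr):

```shell
# "real" = wall clock, "user"/"sys" = CPU time
time sleep 0.2

# Capture the timing with a brace group and a stderr redirect
{ time sleep 0.2 ; } 2> timing.txt
cat timing.txt
```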


Diff

Common Uses

Compare two files:

Side-by-side comparison:

Shows files next to each other

Unified format (like Git):

Shows context around changes

Ignore whitespace differences:

Compare directories:

Shows which files are different

Brief output (just show which files differ):

Colorized output:

Understanding diff Output

Format: <line_number><action><line_number>

  • a = add

  • c = change

  • d = delete
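A demo of that output format. diff exits non-zero when files differ, so each call ends in `|| true` to keep the example running:

```shell
printf 'one\ntwo\nthree\n' > a.txt
printf 'one\nTWO\nthree\nfour\n' > b.txt

diff a.txt b.txt || true
# "2c2" = line 2 changed; "3a4" = after line 3, a line was added

diff -u a.txt b.txt || true   # unified format (like Git)
diff -y a.txt b.txt || true   # side by side
diff -w a.txt b.txt || true   # ignore whitespace differences
```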


Grep

grep - Search text using patterns

Basic search

Case-insensitive search

Search recursively in directories

Show line numbers

Invert match (show lines that don't match)

You can also search multiple files at once. For example, if we wanted to search for the word "hello" in hello.txt and hello2.txt, we could run:

Recursive Search

You can also search an entire directory, including all subdirectories. For example, to search for the word "hello" in the current directory and all subdirectories:

The . is a special alias for the current directory.
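All the searches above, sketched against two small sample files:

```shell
printf 'Hello there\ngoodbye\nhello again\n' > hello.txt
printf 'hello world\n' > hello2.txt

grep "hello" hello.txt              # basic, case-sensitive search
grep -i "hello" hello.txt           # case-insensitive
grep -n "hello" hello.txt           # with line numbers
grep -v "hello" hello.txt           # invert: lines NOT matching
grep "hello" hello.txt hello2.txt   # several files at once
grep -r "hello" .                   # recursive through the current directory
```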


Sort

The sort command in Linux/Unix is used to sort lines of text files or input in various ways.

Basic Usage

Common Options

Sort Order:

Numeric Sorting:

Case Sensitivity:

Unique Values:
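The options above, sketched on small sample inputs:

```shell
printf '10\n2\n33\n2\n' > nums.txt

sort nums.txt      # lexicographic: 10, 2, 2, 33
sort -n nums.txt   # numeric: 2, 2, 10, 33
sort -nr nums.txt  # numeric, descending
sort -nu nums.txt  # numeric, duplicates removed

printf 'Banana\napple\n' > words.txt
sort -f words.txt  # fold case: apple sorts before Banana
```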

Real-World Use Cases

Find top 10 largest files:

Sort log entries by timestamp:

Get unique IP addresses from logs:

Sort processes by memory usage:


Uniq

The uniq command filters out or reports repeated lines. Important: It only detects adjacent duplicates, so the input usually needs to be sorted first.

Basic Syntax

Basic Usage

Common Options

Count occurrences:

Show only duplicates:

Show only unique lines:

Ignore case:
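The options above in one sketch; note every pipeline sorts first, since uniq only collapses adjacent duplicates:

```shell
printf 'a\nb\na\na\nc\n' > letters.txt

sort letters.txt | uniq      # a b c
sort letters.txt | uniq -c   # prefix each line with its count
sort letters.txt | uniq -d   # only repeated lines: a
sort letters.txt | uniq -u   # only lines appearing exactly once: b c
```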


Cut

The cut command extracts sections from each line of files - great for working with columnar data.

Basic Syntax

Cutting by Characters

Cutting by Fields (Columns)

Custom Delimiters

Practical Examples
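A sketch covering characters, fields, and a custom delimiter (the sample CSV is made up):

```shell
printf 'alice,30,paris\nbob,25,london\n' > people.csv

cut -d',' -f1 people.csv     # first field of each line
cut -d',' -f1,3 people.csv   # fields 1 and 3
cut -c1-3 people.csv         # first three characters of each line
```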


jq - JSON processor and query tool

What it does: jq is like grep, sed, and awk combined, but specifically for JSON data. It lets you parse, filter, transform, and extract data from JSON files or API responses.

Think of it as: SQL queries for JSON

Why it's essential for data engineering:

  • Most APIs return JSON

  • Modern logs are often in JSON format

  • Easy to extract specific fields from complex JSON

Basic syntax:

Common Examples

Pretty print JSON:

Makes JSON readable with proper indentation

Extract a specific field:

Output: "John"

Extract nested field:

Gets city from nested structure

Extract from array:

Gets name from first item in array

Extract multiple fields:

Output: "John" and 30 on separate lines

Filter array based on condition:

Shows only users older than 25

Create new JSON structure:

Transforms JSON with new field names

Extract to CSV:

Converts JSON array to CSV format

Count items in array:

Returns number of items

Get all values of a specific field:

Extracts all names from array of users

Filter and transform:

Gets only active users, shows only id and name fields
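The common patterns above, sketched against a made-up users.json (field names are assumptions). The whole demo is guarded because jq isn't installed everywhere:

```shell
cat > users.json <<'EOF'
[{"name": "John", "age": 30, "active": true},
 {"name": "Ann", "age": 22, "active": false}]
EOF

if command -v jq > /dev/null; then
    jq '.' users.json                        # pretty-print
    jq '.[0].name' users.json                # "John"
    jq '.[].name' users.json                 # every name
    jq 'length' users.json                   # 2
    jq '.[] | select(.age > 25)' users.json  # filter by condition
    jq -r '.[] | [.name, .age] | @csv' users.json   # JSON -> CSV rows
fi
```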

Real-World Data Engineering Example

  1. Extract data from API response:

Fetches API data and extracts all emails

  1. Convert JSON logs to CSV:

  1. Filter error logs:

  1. Count errors per day:

  1. Extract nested data:

  1. Combine multiple JSON files:

The -s flag slurps all files into one array


du

Disk Usage (check how much space files/folders use)

What it does: du shows how much disk space files and directories are using. It's essential for finding what's eating up your storage.

Think of it as: A disk space analyzer for the command line

Simple Examples

Check size of current directory:

Shows size of current directory and all subdirectories (in kilobytes)

Check size of specific folder:

Human-readable sizes:

Shows sizes as 1K, 234M, 2G instead of kilobytes

Summary only (total size):

Shows just one line with total size
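For example:

```shell
mkdir -p data
echo "some content" > data/file.txt

du data      # per-directory usage in 1K blocks
du -h data   # human-readable units
du -sh data  # one summary line with the total
```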

Check multiple folders:

Find largest datasets:

Shows all folders in /data sorted by size

Check database size:

du Common Flags

  • -h - Human-readable (KB, MB, GB)

  • -s - Summary only (total)

  • -a - Show all files (not just directories)

  • -c - Show grand total at end

  • -d N - Max depth of N levels

  • --max-depth=N - Same as -d N

  • -k - Show in kilobytes

  • -m - Show in megabytes


history

View history of previously run commands.


ln - Create links

What it does: Creates links to files or directories. There are two types: hard links and symbolic (soft) links.

Think of it as: Creating shortcuts or aliases to files

Symbolic Links (Soft Links) - Most Common

Create a symbolic link:

Example:

Creates a shortcut called report_link.txt that points to the original file

Link to directory:

Check if it's a link:

Output shows: lrwxr-xr-x ... report_link.txt -> /home/user/documents/report.txt

Create shortcut to frequently used directory:
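The examples above, sketched with made-up names:

```shell
mkdir -p documents
echo "quarterly numbers" > documents/report.txt

# Symbolic link to a file; reading the link reads the target
ln -s documents/report.txt report_link.txt
cat report_link.txt

# Symbolic link to a directory
ln -s documents docs_link

# Long listing shows the arrow to the target
ls -l report_link.txt
```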


su - Switch User

What it does: Switches to another user account. Stands for "substitute user" or "switch user".

Basic syntax:

Switch to root:

Prompts for root password

Switch to root with root's environment:

The dash (-) loads root's environment variables and home directory

Switch to specific user:

Prompts for bob's password

Exit back to your user:


sudo - Execute command as another user (usually root)

What it does: Runs a single command with elevated privileges (usually as root). Stands for "superuser do".

Think of it as: Temporary admin powers for one command

Basic syntax:

Example:

Runs apt update as root

Common Uses

Install software:

Edit system files:

View protected files:


Operators

Arithmetic Operators

  • + - Addition

  • - - Subtraction

  • * - Multiplication

  • / - Division

  • % - Modulus (remainder)

  • ** - Exponentiation

Example:
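All six operators inside Bash arithmetic expansion:

```shell
a=7; b=3
echo $((a + b))   # 10
echo $((a - b))   # 4
echo $((a * b))   # 21
echo $((a / b))   # 2 (integer division truncates)
echo $((a % b))   # 1
echo $((a ** b))  # 343
```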

Comparison Operators

For numeric comparisons

For string comparisons

Example:
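A sketch of both kinds of comparison:

```shell
a=5; b=10

# Numeric comparisons: -eq, -ne, -gt, -lt, -ge, -le
if [ "$a" -lt "$b" ]; then
    echo "$a is less than $b"
fi

# String comparisons: = and != (quote variables to survive empty values)
name="admin"
if [ "$name" = "admin" ]; then
    echo "welcome, admin"
fi
```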

Logical Operators

Example:

&& - Run next command ONLY if previous succeeds

Note: && operator signifies conditional execution. The core function of && is to create a dependency between commands. The command to the right of && will only run if the command to its left exits with a status of 0. In Bash, an exit status of 0 conventionally signifies success, while any non-zero exit status indicates failure.

If mkdir fails (e.g., the directory already exists), cd will not be attempted.

Short-circuiting: The && operator exhibits "short-circuiting" behavior. If the first command fails, Bash immediately stops evaluating the expression and does not execute the subsequent commands linked by &&. This is efficient as it avoids unnecessary operations.
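For example (the second line previews || as a fallback so the demo always exits cleanly):

```shell
# cd is attempted only if mkdir succeeds
mkdir reports && cd reports && pwd
cd ..

# Here mkdir fails (the directory exists), so "created" never prints
mkdir reports 2>/dev/null && echo "created" || echo "already exists"
```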

|| - Run next command ONLY if previous fails

Syntax:

Behavior:

  • command2 runs ONLY if command1 exits with non-zero (failure)

  • If command1 succeeds, command2 never runs

Examples:

; (Semicolon) - Run next command REGARDLESS

Syntax:

Behavior:

  • command2 runs no matter what

  • Doesn't care if command1 succeeded or failed

Examples:

! (NOT) - Negate exit status

Syntax:

Examples:

Combining Operators

AND then OR:

If command1 succeeds, run command2; if either fails, run command3

Example:

Grouping with parentheses:

Real-World Examples

Safe script execution:
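The operators above combined in one sketch (patterns and file names are made up):

```shell
# || runs the right side only when the left side fails
grep -q "no-such-pattern" /etc/hosts || echo "pattern not found"

# ; runs the next command regardless of success
echo "first" ; echo "second"

# ! negates an exit status
if ! grep -q "no-such-pattern" /etc/hosts; then
    echo "definitely not there"
fi

# command1 && command2 || command3: success path, then fallback
[ -f config.txt ] && echo "config found" || echo "no config, using defaults"

# Parentheses group commands (they run in a subshell)
( echo "backing up" && mkdir -p backup ) || echo "backup failed"
```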


Permissions

Each file and directory in Unix systems has permissions associated with them.

You have to ask 2 questions when talking about permissions:

  1. Who has the permissions?

  2. What permissions do they have?

    1. Any user accessing a specific file/directory may or may not have access to read it, write to it, or execute it.

Both aspects, i.e. the who and the what, are represented by a 10-character string. Here are examples for each type of file:

  • Regular Files

  • Directories

  • Special Files

What do these characters mean?

  • The first character is either - or d (or another letter for special files, e.g. l for a symbolic link), so a user can recognize whether it's a directory or not.

Regular file (e.g. -rwxrwxrwx)

Directory (e.g. drwxrwxrwx)

  • The next 3 characters, r, w, and x, represent the three permissions - read, write, and execute. Who do they apply to? The owner, i.e. the user who created the file (unless ownership was changed manually afterwards).

    • Each permission has a state: granted or not granted. If it's granted, the letter is present; if not, a - appears in its place. Example: r-x means the owner can read and execute but not write.

  • Finally, the next 6 characters are another 2 sets of rwx. The second set of rwx applies to the group instead of the owner. And the last set applies to everyone else.


Changing permissions

For more information: https://www.stationx.net/linux-file-permissions-cheat-sheet/#def-per

chmod command (stands for "change mode")

Example: chmod -R u=rwx,g=,o= DIRECTORY. This means:

  • The owner can read, write, and execute

  • The group can do nothing

  • Others can do nothing

In the command above, u means "user" (aka "owner"), g means "group", and o means "others". The = means "set the permissions to the following", and the rwx means "read, write and execute". The g= and o= mean "set group and other permissions to nothing". The -R means "recursively", which means "do this to all of the contents of the directory as well".

Remember, . is a special alias for the current directory.

There is symbolic and numeric notations for permission definition:

Symbolic notation:

  • u = user/owner

  • g = group

  • o = other

  • a = all (user + group + other)

Numeric notation:

  • First digit = owner permissions (instead of the first three letters)

  • Second digit = group permissions (instead of the second three letters)

  • Third digit = other permissions (instead of the third three letters)

So chmod 755 file means:

  • 7 (rwx) for owner

  • 5 (r-x) for group

  • 5 (r-x) for other

Common Permission Patterns

  • 755: Executable files (owner can do everything, others can read/execute)

  • 644: Regular files (owner can read/write, others read-only)

  • 600: Private files (only owner can read/write)

  • 777: Full access for everyone (generally avoided for security)
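Both notations sketched on scratch files:

```shell
touch script.sh private.txt

# Symbolic: owner gets rwx, group and others get nothing
chmod u=rwx,g=,o= script.sh

# Numeric equivalents
chmod 755 script.sh    # rwxr-xr-x
chmod 600 private.txt  # rw-------

ls -l script.sh private.txt
```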


chown (stands for “change owner”)

What it does: chown changes who owns a file or directory. In Unix/Linux systems, every file has an owner (user) and a group. This command lets you change either or both.

Think of it as: Transferring ownership of files to different users

Why it matters:

  • Control who can access/modify files

  • Fix permission issues

  • Set up proper access for web servers, databases, etc.

  • Essential for multi-user systems and servers

Basic Syntax

or


Running scripts

Creating a script

A "shebang" is a special line at the top of a script that tells your shell which program to use to execute the file - in other words, which interpreter to run it with.

The format of a shebang is:

For example, if your script is a Python script and you want to use Python 3, your shebang might look like this:

This tells the system to use the Python 3 interpreter located at /usr/bin/python3 to run the script.

If you're writing scripts that need to work on bash and zsh shells, use the portable version:
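For example, a Python 3 script would start with #!/usr/bin/python3, and the portable Bash form is #!/usr/bin/env bash (env finds bash wherever it is installed). A minimal end-to-end sketch:

```shell
# Create a tiny script with a portable shebang, make it executable, run it
cat > hello.sh <<'EOF'
#!/usr/bin/env bash
echo "Hello from $0"
EOF
chmod +x hello.sh
./hello.sh
```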

Running a script

If the program is in the current directory, you need to prefix it with ./ to run it:

  • ./program.sh

Some scripts are written so they can't be stopped with Ctrl+C: the script traps the SIGINT signal itself. A common pattern is to enable this only when an extra argument is passed, e.g. ./program.sh force. Note that this behavior comes from logic inside the script (a trap), not from the shell.

Writing Professional Bash Script Headers

When creating Bash scripts for professional or collaborative environments, including a well-structured header makes your code more maintainable and easier to understand. Here's how to document your scripts effectively.

Header Placement

Place your documentation header immediately after the shebang line (#!/bin/bash) and before any executable code. Use the # symbol to create comments that won't be executed.

Essential Header Information

A professional script header should include these five key pieces of information:

  1. Author - Who wrote the script (name or username)

  2. Creation Date - When the script was originally created

  3. Last Modified - When the script was last updated

  4. Description - A brief explanation of what the script does

  5. Usage - How to run the script, including any arguments or flags

Why This Matters

Including these details helps anyone who encounters your script (including your future self) quickly understand:

  • Its purpose and functionality

  • Who to contact with questions

  • Whether it's current or potentially outdated

  • How to execute it correctly

Example Header
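A template with the five pieces above; the name, dates, and script purpose are placeholders to fill in:

```shell
#!/bin/bash
# Author:        Jane Doe
# Created:       2025-10-02
# Last Modified: 2025-10-02
# Description:   Backs up the data directory into a dated archive.
# Usage:         ./backup.sh [target_directory]
```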

Additional Considerations

For more complex scripts, you might also include:

  • Version number for tracking script evolution

  • Dependencies listing required tools or packages

  • License information for shared or open-source code

  • Contact information such as email or support channels

Adopting this convention from the start establishes good habits and makes your scripts production-ready.


Shell configuration

Bash and Zsh both have configuration files that run automatically each time you start a new shell session. These files are used to set up your shell environment. They can be used to set up aliases, functions, and environment variables.

These files are located in your home directory (~) and are hidden by default. The ls command has a -a flag that will show hidden files:

  • If you're using Bash, .bashrc is probably the file you want to edit.

  • If you're using Zsh, .zshrc is probably the file you want to edit or create if it doesn't yet exist.


Environment variables

Apart from regular variables, there is another type of variable called an environment variable. They are available to all programs that you run in your shell.

You can view all of the environment variables that are currently set in your shell with the env command.

To set a variable in your shell, use the export command:


What's particularly useful is that any programs or scripts you execute in your shell will inherit access to these environment variables.

To demonstrate this, let's create a simple script file named greet.sh:

Now we can make it executable and run it:
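All of the above in one sketch, including the single-command form of setting a variable (GREETING is an arbitrary example name):

```shell
# Export a variable so child processes inherit it
export GREETING="Hello from the environment"

# greet.sh simply prints the variable it inherits
cat > greet.sh <<'EOF'
#!/usr/bin/env bash
echo "$GREETING"
EOF
chmod +x greet.sh
./greet.sh

# Set a variable for a single command only (not kept in the session)
GREETING="just this once" ./greet.sh
```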


You can also temporarily set a variable for a single command, instead of exporting it (exporting means the variable will persist until you close the shell).

For example:


Your shell comes with several environment variables that are essentially "standard" - meaning various programs and system components recognize and utilize them automatically. The PATH variable is a prime example of this.

Why is the PATH Variable Important?

Without the PATH variable, you'd need to specify the complete filesystem location for every command you want to execute. Rather than simply typing ls, you'd be forced to type /bin/ls (or wherever the ls program lives on your particular system). This would be extremely tedious.

The PATH variable contains a collection of directory paths that your shell searches through whenever you enter a command. When you type ls, your shell examines each directory listed in PATH looking for an executable file named ls. Once found, it executes that program. If no matching executable is discovered, you'll receive a "command not found" error.

You can view your current PATH setting with this command:

This will display a long string of directory paths separated by colons (:). Each path represents a location where your shell searches for executable programs.

Note: Restarting your shell session will reset the PATH variable to its default.

Adding a directory to PATH

To add a directory to your PATH without overwriting all of the existing directories, use the export command and reference the existing PATH variable:
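For example, using ~/bin as a placeholder directory:

```shell
# View the current search path
echo $PATH

# Append a directory without clobbering the existing entries
mkdir -p "$HOME/bin"
export PATH="$PATH:$HOME/bin"
echo $PATH
```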

As you know, this is a temporary change that lasts until your session is closed; after that, the directory is dropped from PATH again and its executables can no longer be run by name alone.

Permanently adding a directory to PATH

The most common way to do this is to add the same export command shown above to your shell's configuration file (e.g. ~/.bashrc for bash or ~/.zshrc for zsh).
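A sketch for zsh (use ~/.bashrc instead of ~/.zshrc if your shell is bash; the directory is illustrative):

```shell
# Append the export line to the config file, then reload it
echo 'export PATH="$PATH:$HOME/bin"' >> ~/.zshrc
source ~/.zshrc
```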


Man command

The man command is short for "manual". It's a program that displays the manual for other programs.

The man command functions only with programs that have documentation available in the manual system, though fortunately this includes most shell built-ins and standard Unix utilities. To use it, simply provide the command name as an argument. The logical starting point is to examine the manual for the manual system itself:

How to search for what you need:
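A sketch of common ways to find and search documentation:

```shell
man man                # read the manual for man itself
man -k copy || true    # keyword search across page descriptions (same as apropos); may find nothing

# While a page is open: type /pattern to search forward, n for the next match, q to quit
```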


Command flag conventions

The availability and nature of command flags depends entirely on how each program's developer designed it. However, most Unix commands follow established patterns:

  • Single-letter flags use one dash as a prefix (e.g., -v)

  • Word-based flags use two dashes as a prefix (e.g., --version)

  • Many commands offer both short and long versions of the same option (e.g., -v and --version)

Help flag

Standard practice among mature command-line applications is to include a "help" feature that displays usage instructions. This assistance is typically accessible through one of these methods:

  • --help (long flag format)

  • -h (short flag format)

  • help (as the initial argument)

The help output tends to be more digestible than comprehensive man documentation. Rather than serving as exhaustive reference material, it functions more like a concise getting-started tutorial.
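For example, with grep (the exact output wording varies between implementations):

```shell
grep --help | head -n 3   # long flag form: prints a concise usage summary
# Many tools also accept -h; for bash builtins, use e.g.: help cd
```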


Nano editor

  • Ctrl+O to save the file (confirm any prompts with "enter")

  • Ctrl+X to exit the editor.

There should be a list of commands at the bottom of the screen.


Program Exit Codes

Exit codes (also known as "return codes" or "status codes") serve as a communication mechanism for programs to indicate whether their execution completed successfully.

A program returns 0 to signal successful completion. All other exit codes indicate some form of failure or error condition. In most cases when something goes wrong, you'll see exit code 1, which serves as a general-purpose error indicator.

These exit codes enable programs to monitor and respond to the success or failure of other programs they execute. For instance, at Boot.dev, our monitoring system checks the exit code of our server application - if it terminates with a non-zero code, our monitoring automatically restarts the service and records the failure for investigation.

Within your shell environment, you can examine the exit code from the most recently executed command using the special variable $?. Here are some practical examples:
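```shell
ls /tmp             # a command that succeeds
echo $?             # prints 0

ls /no/such/dir     # a command that fails
echo $?             # prints a non-zero code (e.g. 2 for GNU ls)
```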


Standard output (stdout), Standard error (stderr), Standard input (stdin)

Redirecting Streams

You can redirect stdout and stderr to different places using the > and 2> operators. > redirects stdout, and 2> redirects stderr.

Capturing Standard Output to a File
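A sketch (the file path is illustrative):

```shell
echo "hello" > /tmp/out.txt   # stdout is written to the file instead of the screen
cat /tmp/out.txt              # hello
```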

Note: use >> to append to a file instead of rewriting it.

Capturing Error Output to a File
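A sketch of the idea:

```shell
ls /does/not/exist 2> errors.log   # stderr is redirected into errors.log
cat errors.log                     # shows the "No such file or directory" message
```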

In this demonstration, ls is used to deliberately trigger an error message (attempting to list a directory that doesn't exist), and this error output gets redirected into errors.log.

Standard input

Since we have standard output, it makes sense that there would also be standard input, correct?

"Standard Input," commonly referred to as "stdin," represents the default source from which programs receive their input data. It functions as a data stream that applications can consume during their execution.

Note: The read command prompts for and accepts user input from stdin (standard input).
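A sketch of read consuming stdin (here supplied by a pipe rather than the keyboard):

```shell
echo "Alice" | {
  read name               # read takes one line from stdin into the variable
  echo "Hello, $name"     # prints: Hello, Alice
}
```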


Piping

Among the shell's most elegant features is the ability to chain programs together by sending one program's output directly into another program's input. This single mechanism enables remarkably sophisticated automation workflows.

The Pipe Operator

The pipe symbol is | - a vertical line character typically found on the same key as the backslash (\) above your enter key. This operator captures the stdout from the command on its left side and feeds it as stdin to the command on its right side.
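```shell
echo "I find your lack of faith disturbing" | wc -w   # prints 7
```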

In this demonstration, the echo command produces the text "I find your lack of faith disturbing" as its output. Rather than displaying this text in your terminal, the pipe operator redirects it to the wc (word count) utility. The wc program tallies the words in whatever input it receives, and the -w flag instructs it to report only the word count.

This functionality works because wc, like most command-line utilities, can accept input from stdin as an alternative to reading from a file path.


Xargs

xargs is a powerful bash command that builds and executes commands from standard input. It's particularly useful for handling situations where you need to pass a large number of arguments to a command, or when you want to convert input into arguments for another command.

Basic Concept

xargs reads items from standard input (separated by spaces or newlines) and passes them as arguments to another command. Think of it as a bridge that converts input lines into command arguments.

Simple Examples
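A couple of sketches (the input values are illustrative):

```shell
# Collect all input lines into the argument list of a single echo
printf 'a\nb\nc\n' | xargs echo          # runs: echo a b c

# Run the command once per item, substituting {} with each input line
printf '1\n2\n' | xargs -I{} echo "item {}"
```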


Interrupt and Kill

Interrupt

Occasionally, a running program will become unresponsive or you'll need to terminate it. This typically happens when:

  • The command contains an error and isn't behaving as expected

  • The program is attempting network operations while you're offline

  • You're processing large datasets and decide not to wait for completion

  • A software defect is causing the application to freeze

When you encounter these situations, you can terminate the program using ctrl + c. This keyboard combination sends a "SIGINT" (interrupt signal) to the running process, instructing it to terminate gracefully.

Kill

Occasionally, a program becomes completely unresponsive (or behaves maliciously) and ignores the SIGINT signal entirely. When this occurs, your best approach is to open a separate shell session (another terminal window) and forcibly terminate the problematic process.

Command Format

PID represents "process ID" - a unique numerical identifier assigned to every running process on your system. To discover the process IDs currently active on your machine, you can use the ps ("process status") command:

The "aux" flags specify "display all processes, including those belonging to other users, with detailed information for each process".
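A sketch of the full workflow, using a background sleep as a harmless stand-in for a stuck process:

```shell
sleep 300 &               # start a long-running background process
pid=$!                    # $! holds the PID of the last background command

ps aux | grep sleep | head -n 1   # locate it in the process list

kill "$pid"               # sends SIGTERM; use kill -9 "$pid" (SIGKILL) as a last resort
```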


More about Scripting

Positional arguments

Positional arguments allow your script to accept input from the command line when executed.

Basic Positional Parameters

When you run a script like ./script.sh arg1 arg2 arg3, Bash automatically assigns these values to special variables:

  • $0 : The script name itself

  • $1 : First argument

  • $2 : Second argument

  • $3 : Third argument

  • ... and so on up to $9

  • ${10} : Tenth argument and beyond (use braces)

Example:

Running ./greet.sh Alice 30 outputs:
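A sketch of what greet.sh might contain, together with the run above (the exact output wording is illustrative):

```shell
cat > /tmp/greet.sh << 'EOF'
#!/bin/bash
echo "Script name: $0"
echo "Hello, $1! You are $2 years old."
EOF
chmod +x /tmp/greet.sh

/tmp/greet.sh Alice 30
# Script name: /tmp/greet.sh
# Hello, Alice! You are 30 years old.
```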

Special Parameter Variables

  • $# : Number of arguments passed to the script

  • $@ : All arguments as separate words

  • $* : All arguments as a single word

  • $? : Exit status of the last command

  • $$ : Process ID of the current script

  • $! : Process ID of the last background command

Example:
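A sketch exercising a few of these variables (the script path is illustrative):

```shell
cat > /tmp/info.sh << 'EOF'
#!/bin/bash
echo "Argument count: $#"
echo "All arguments: $@"
echo "Script PID: $$"
EOF
chmod +x /tmp/info.sh

/tmp/info.sh one two three   # Argument count: 3
```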

Looping Through Arguments
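A sketch, wrapped in a function so the loop has arguments to iterate over:

```shell
process_all() {
  for arg in "$@"; do          # "$@" expands each argument as a separate word
    echo "Processing: $arg"    # quoting preserves spaces inside arguments
  done
}

process_all "first" "second item"
```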


Data types

Bash is not a strongly-typed language - it treats almost everything as strings by default. However, it does support some data structures:

1. Variables (Strings/Numbers)

Basic variables:

Everything is a string unless you do math:
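Sketches for both points:

```shell
name="Alice"          # no spaces around = in assignments
count=5

sum=$((count + 3))    # $(( )) performs integer arithmetic
echo "$sum"           # 8

text=$count+3         # outside $(( )), + is just a character
echo "$text"          # 5+3
```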


2. Arrays (Indexed)

What they are: Ordered lists of values, accessed by numeric index (0, 1, 2, ...)

Creating Arrays

Method 1: Direct assignment

Method 2: Individual assignment

Method 3: Empty array
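The three methods above, sketched with illustrative values (indexed arrays require bash, not plain sh):

```shell
# Method 1: direct assignment
fruits=("apple" "banana" "cherry")

# Method 2: individual assignment by index
colors[0]="red"
colors[1]="green"

# Method 3: start empty, then append
nums=()
nums+=(1 2 3)
```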

Accessing Array Elements

Get single element:

Get all elements:

Get array length:

Get length of specific element:
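Sketches for each access pattern above:

```shell
fruits=("apple" "banana" "cherry")

echo "${fruits[0]}"      # single element: apple
echo "${fruits[@]}"      # all elements: apple banana cherry
echo "${#fruits[@]}"     # array length: 3
echo "${#fruits[0]}"     # length of one element: 5
```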

Modifying Arrays

Add element:

Update element:

Remove element:

Remove entire array:
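Sketches for each modification above (note that unset on one element leaves a gap in the indices):

```shell
fruits=("apple" "banana")

fruits+=("cherry")       # add an element
fruits[1]="blueberry"    # update an element
unset 'fruits[0]'        # remove one element (index 0 is now a gap)
unset fruits             # remove the entire array
```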

Looping Through Arrays

Method 1: For loop

Method 2: Index-based loop

Method 3: C-style loop
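The three looping methods above, sketched:

```shell
fruits=("apple" "banana" "cherry")

# Method 1: iterate over the values
for fruit in "${fruits[@]}"; do
  echo "$fruit"
done

# Method 2: iterate over the indices
for i in "${!fruits[@]}"; do
  echo "$i: ${fruits[$i]}"
done

# Method 3: C-style counter loop
for ((i = 0; i < ${#fruits[@]}; i++)); do
  echo "${fruits[$i]}"
done
```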

Array Slicing

Get subset:
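```shell
letters=(a b c d e)
echo "${letters[@]:1:3}"   # b c d  (3 elements starting at index 1)
```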


3. Associative Arrays (Bash 4.0+)

What they are: Key-value pairs (like dictionaries in Python or objects in JavaScript)

Must declare first:

Creating Associative Arrays

Method 1: Individual assignment

Method 2: All at once
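Sketches for the declaration and both creation methods (keys and values are illustrative):

```shell
declare -A ages            # associative arrays must be declared with -A

# Method 1: individual assignment
ages["alice"]=30
ages["bob"]=25

# Method 2: all at once
declare -A capitals=([france]="Paris" [japan]="Tokyo")
```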

Accessing Associative Arrays

Get value by key:

Get all keys:

Get all values:

Check if key exists:
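Sketches for each access pattern above (the -v key test needs bash 4.3+):

```shell
declare -A ages=([alice]=30 [bob]=25)

echo "${ages[alice]}"    # value by key: 30
echo "${!ages[@]}"       # all keys (order is not guaranteed)
echo "${ages[@]}"        # all values

if [[ -v ages[alice] ]]; then
  echo "alice exists"
fi
```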

Looping Through Associative Arrays

Loop through keys and values:

Output:
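A sketch of the loop and its output (key order is not guaranteed in associative arrays):

```shell
declare -A ages=([alice]=30 [bob]=25)

for name in "${!ages[@]}"; do
  echo "$name is ${ages[$name]} years old"
done
# Possible output:
# alice is 30 years old
# bob is 25 years old
```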


4. Strings

String operations:
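A sketch of common string operations (case conversion needs bash 4+):

```shell
s="Hello, World"

echo "${#s}"             # length: 12
echo "${s:7}"            # substring from index 7: World
echo "${s/World/Bash}"   # replace first match: Hello, Bash
echo "${s^^}"            # uppercase: HELLO, WORLD
```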


5. Integers (with declare)

Declare as integer:
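```shell
declare -i num       # num now only holds integers
num=5+3              # arithmetic is evaluated automatically on assignment
echo "$num"          # 8

num="hello"          # non-numeric strings become 0
echo "$num"          # 0
```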


Real-World Examples

Example 1: Processing Files

Example 2: Configuration

Example 3: Log Levels

Example 4: Data Pipeline

Example 5: Environment Variables

Example 6: User Data

Example 7: Counting

Output:
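A few compact sketches in the spirit of the labeled examples above (all names and values are illustrative):

```shell
# Configuration as key-value pairs
declare -A config=([host]="localhost" [port]=8080)
echo "Connecting to ${config[host]}:${config[port]}"

# Environment variable with a fallback default
log_level="${LOG_LEVEL:-info}"

# Counting occurrences with an associative array
declare -A count
for word in apple banana apple; do
  count[$word]=$(( ${count[$word]:-0} + 1 ))
done
echo "apple: ${count[apple]}"   # apple: 2
```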


Array vs Associative Array

| Feature     | Indexed Array        | Associative Array   |
| ----------- | -------------------- | ------------------- |
| Keys        | Numbers (0, 1, 2...) | Strings             |
| Declaration | Optional             | declare -A required |
| Access      | ${arr[0]}            | ${arr[key]}         |
| Use case    | Lists, sequences     | Key-value pairs     |


Common Patterns

Read file into array:

Split string into array:

Command output to array:

Check if element exists:

Remove duplicates:
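Sketches for the patterns above (file paths and values are illustrative; mapfile needs bash 4+):

```shell
# Read a file into an array, one line per element
printf 'a\nb\nc\n' > /tmp/lines.txt
mapfile -t lines < /tmp/lines.txt

# Split a string into an array on a delimiter
IFS=',' read -r -a parts <<< "one,two,three"

# Capture command output into an array (word-split on whitespace)
words=($(echo "x y z"))

# Check whether an element exists
found=0
for p in "${parts[@]}"; do
  [ "$p" = "two" ] && found=1
done

# Remove duplicates via sort -u
dedup=($(printf '%s\n' b a b | sort -u))
```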


Limitations

No multi-dimensional arrays (natively):

No true objects:
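A common workaround for both limitations is to fake structure with composite keys in an associative array:

```shell
declare -A matrix
matrix[0,0]="a"      # "0,0" is just a string key, not a real 2D index
matrix[0,1]="b"
matrix[1,0]="c"
echo "${matrix[0,1]}"   # b

declare -A user      # "object fields" as prefixed keys
user[name]="Alice"
user[age]=30
```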


Quick Reference

Indexed Arrays:

Associative Arrays:
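A condensed cheat sheet for both kinds:

```shell
arr=(a b c)              # create indexed array
echo "${arr[0]}"         # access by index
echo "${#arr[@]}"        # length
arr+=(d)                 # append

declare -A map=([k]="v") # create associative array
echo "${map[k]}"         # access by key
echo "${!map[@]}"        # list keys
```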


Bash Control Structures: Loops, Conditionals, and More

This guide covers the essential control structures in Bash scripting that allow you to create dynamic, decision-making scripts.

Conditionals

If Statements

The if statement lets you execute code based on conditions.

Basic syntax:

If-else:

If-elif-else:

Example:
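A sketch covering the basic, if-else, and if-elif-else forms in one example (the threshold values are illustrative):

```shell
age=20

if [ "$age" -lt 13 ]; then
  echo "child"
elif [ "$age" -lt 18 ]; then
  echo "teenager"
else
  echo "adult"          # this branch runs for age=20
fi
```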

String comparisons:

  • = : equal to

  • != : not equal to

  • -z : string is empty

  • -n : string is not empty

File tests:

  • -f : file exists and is a regular file

  • -d : directory exists

  • -r : file is readable

  • -w : file is writable

  • -x : file is executable

  • -e : file exists (any type)

Example:
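A sketch using two of the file tests (the paths are illustrative):

```shell
file="/etc/hosts"

if [ -f "$file" ]; then
  echo "$file is a regular file"
fi

if [ -d /tmp ]; then
  echo "/tmp is a directory"
fi
```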

Case Statements

Use case for multiple conditions based on pattern matching.

Example:
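A sketch (the patterns are illustrative; | separates alternatives and * is the catch-all):

```shell
fruit="apple"

case "$fruit" in
  apple)           echo "It's an apple" ;;
  banana|plantain) echo "It's banana-like" ;;
  *)               echo "Unknown fruit" ;;
esac
```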

Loops

For Loop

Iterate over a list of items.

Basic syntax:

Examples:
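Sketches of the basic form and a brace-expansion variant:

```shell
for name in Alice Bob Carol; do
  echo "Hello, $name"
done

for i in {1..5}; do    # brace expansion: 1 2 3 4 5
  echo "$i"
done
```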

While Loop

Execute code while a condition is true.

Example:
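```shell
count=1
while [ "$count" -le 3 ]; do
  echo "count is $count"
  count=$((count + 1))   # without this, the loop would never end
done
```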

Until Loop

Execute code until a condition becomes true (opposite of while).

Example:
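```shell
n=1
until [ "$n" -gt 3 ]; do    # runs while the condition is still false
  echo "n is $n"
  n=$((n + 1))
done
```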

Loop Control

  • break : Exit the loop entirely

  • continue : Skip to the next iteration

Example:
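```shell
for i in 1 2 3 4 5; do
  if [ "$i" -eq 2 ]; then
    continue          # skip 2, keep looping
  fi
  if [ "$i" -eq 4 ]; then
    break             # stop the loop entirely before 4
  fi
  echo "$i"           # prints 1, then 3
done
```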

Functions

Define reusable blocks of code.

Basic syntax:

Example:
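A sketch showing definition, a local variable, and arguments (functions receive them as $1, $2, ... just like scripts):

```shell
greet() {
  local name="$1"      # local keeps the variable scoped to the function
  echo "Hello, $name"
}

greet "Alice"          # prints: Hello, Alice
```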

Practical examples

Example 1: File Backup Script

Example 2: Menu System
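Sketches for both examples (all paths are illustrative; the menu reads its choice from stdin):

```shell
# Example 1: back up a directory into a timestamped archive
backup() {
  local src="$1"
  local dest="/tmp/backup-$(date +%Y%m%d-%H%M%S).tar.gz"
  tar -czf "$dest" "$src" && echo "Backed up $src to $dest"
}

# Example 2: a simple menu driven by read and case
menu() {
  echo "1) greet  2) date  3) quit"
  read -r choice
  case "$choice" in
    1) echo "Hello!" ;;
    2) date ;;
    3) echo "Bye" ;;
    *) echo "Invalid choice" ;;
  esac
}

mkdir -p /tmp/demo-src && touch /tmp/demo-src/file.txt
backup /tmp/demo-src
echo 1 | menu
```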


Practical Combined Example using some of the commands

Complete backup workflow:
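A sketch combining redirection, exit codes, and pipes from the sections above (all paths are illustrative):

```shell
#!/bin/bash
src="/tmp/demo-data"
archive="/tmp/demo-backup.tar.gz"
log="/tmp/backup.log"

mkdir -p "$src" && echo "sample" > "$src/data.txt"

# tar's stderr is appended to the log; its exit code drives the branch
if tar -czf "$archive" "$src" 2>> "$log"; then
  echo "backup ok: $(date)" >> "$log"
else
  echo "backup FAILED: $(date)" >> "$log"
fi

# Inspect the most recent log entries through a pipe
tail -n 5 "$log" | grep "backup"
```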

