# Commands and Scripting

This is not an exhaustive guide, so here are some additional sources of information:

* Exercises: <https://www.learnshell.org/>
* Another guide: <https://tldp.org/LDP/abs/html/>
* <https://www.freecodecamp.org/news/the-linux-commands-handbook/#heading-the-linux-gzip-command>
* <https://www.freecodecamp.org/news/bash-scripting-tutorial-linux-shell-script-and-command-line-for-beginners/>

***

### What is a Shell?

A **shell** is a command-line interpreter - it's the program that takes the commands you type and translates them into actions the operating system can understand. It's called a "shell" because it wraps around the operating system kernel, providing a user interface to access system functions.

#### Bash (Bourne Again Shell)

**Bash** is the most widely used shell, especially on Linux systems:

* **Default on most Linux distributions** and older macOS versions
* **Written in C**
* **Highly compatible** - most shell scripts you find online are written for bash
* **Rich scripting capabilities** with good documentation
* **Stable and mature** - been around since 1989
* **Extensive history and tab completion**

#### Zsh (Z Shell)

**Zsh** is a more modern shell with enhanced features:

* **Default on newer macOS** (since Catalina)
* **Better autocompletion** - more intelligent suggestions
* **Advanced globbing** - more powerful pattern matching
* **Themes and plugins** - highly customizable (especially with Oh My Zsh)
* **Better interactive features** - spelling correction, shared history

*You’ll likely use Bash or Zsh if you’re using macOS. To switch between them temporarily, just type the shell’s name: `bash` or `zsh`.*

But if you want to permanently change the default shell, use these commands:

```bash
chsh -s /bin/bash    # set bash as default
chsh -s /bin/zsh     # set zsh as default
```

Check which one you’re currently using:

```bash
echo $SHELL
```
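Note that `$SHELL` reports your default login shell, not necessarily the shell you're typing into right now (for example, after temporarily starting `bash` from zsh). To see the shell that is actually running, inspect the current process:

```bash
echo "$SHELL"        # default login shell (may be stale after chsh or a temporary switch)
ps -p $$ -o comm=    # the shell process you are actually typing into ($$ is its PID)
```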

***

#### Expansions

[All kinds of expansions](https://www.gnu.org/software/bash/manual/html_node/Shell-Expansions.html)

Let's consider one type of expansion in detail.

**Brace Expansion**

Brace expansion is a convenient Bash feature that generates multiple strings from a pattern containing braces. It happens before any other expansions and allows you to create multiple arguments or strings efficiently.

* **Increment:** You can specify an increment in the brace expansion, such as `{1..10..2}` to get 1 3 5 7 9
* **Zero-padding:** You can prefix numbers with 0 to force consistent width, e.g., `{01..10}` would expand to `01 02 ... 10`.

*Example 1*

`{1..10}` uses brace expansion to generate a sequence of numbers from 1 to 10.

Printing the sequence:

```bash
echo {1..10}
```

This command will output:

```bash
1 2 3 4 5 6 7 8 9 10
```

Using it in a `for` loop:

```bash
for i in {1..10}; do
    echo "Current number: $i"
done
```

This loop will iterate, assigning each number from 1 to 10 to the variable `i` in turn, and print a message for each.

*Example 2*

```bash
echo file{1,2,3}.txt

# Output: file1.txt file2.txt file3.txt

echo {a..z}
# Output: a b c d e f g h i j k l m n o p q r s t u v w x y z

echo {0..10..2}
# Output: 0 2 4 6 8 10

echo {10..1..2}
# Output: 10 8 6 4 2
```

**Nested Braces**

You can nest brace expansions for more complex patterns:

```bash
echo {a,b}{1,2}
# Output: a1 a2 b1 b2

echo {{A..Z},{a..z}}
# Output: A B C ... Z a b c ... z
```

**Practical Uses**

*Creating directories:*

```bash
mkdir -p project/{src,bin,docs,tests}
```

*Backing up files:*

```bash
cp file.txt{,.backup}
# Expands to: cp file.txt file.txt.backup
```

*Batch renaming or operations:*

```bash
touch report_{jan,feb,mar,apr}.txt
mv photo.{jpg,png}  # rename photo.jpg to photo.png
```

*Important notes:*

* **No variables in brace expansion.** You cannot use variables for the start and end values. For example, with `from=1` and `to=10`, `echo {$from..$to}` will not work as expected: brace expansion runs before variable expansion, so the braces are left alone and the command prints the literal text `{1..10}`. For variable-based ranges, use the `seq` command or a C-style `for ((i=start; i<=end; i++))` loop.
* Brace expansion doesn't use wildcards or match existing files—it just generates text
* No spaces should appear inside the braces unless you want them in the output
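For a range driven by variables, two common workarounds are `seq` and the C-style arithmetic loop (the numbers below are just an illustration):

```bash
from=1
to=5

# seq computes the range at runtime, so variables work
for i in $(seq "$from" "$to"); do
    echo "seq gives: $i"
done

# C-style arithmetic loop (bash/zsh)
for ((i = from; i <= to; i++)); do
    echo "loop gives: $i"
done
```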

***

### Simple commands

#### Echo

**`echo`** - Print text to the screen

```bash
echo "Processing complete at $(date)" >> log.txt
```

*Appends a timestamped message to a log file*

***

#### Ls

`ls` - **List** the contents of a directory (shows files and folders).

**Combinations**

```bash
ls -lah      # See everything with details and readable sizes
ls -lth      # Recent files first with readable sizes
ls -lhS      # Biggest files first
ls -lat      # All files, newest first (including hidden)
```

| Option | What it does                |
| ------ | --------------------------- |
| `-l`   | Long format (detailed)      |
| `-a`   | Show all (including hidden) |
| `-h`   | Human-readable sizes        |
| `-t`   | Sort by time                |
| `-S`   | Sort by size                |
| `-r`   | Reverse order               |
| `-R`   | Recursive (subdirectories)  |

***

#### Pwd

Print current working directory

***

#### Cd

**`cd`** - Change directory (move to a different folder)

```bash
cd ~/projects/data-pipeline && ls -la
```

*Go to data pipeline folder and immediately list all files*

***

#### Mkdir

**`mkdir`** - Make a new directory (create a folder)

```bash
mkdir -p data/{raw,processed,archive}
```

*Creates nested folder structure: data/raw, data/processed, data/archive*

***

#### Rmdir

Remove empty directories only.

```bash
rmdir [directory_1] [directory_2] 
```

*You can specify one or many directories to remove.*

**Remove nested empty directories:**

```bash
rmdir -p path/to/empty/dirs
```

***

#### Mv

**`mv`** - Move or rename files/folders

```bash
mv *.csv backup/ && echo "Moved $(ls backup/*.csv | wc -l) files"
```

*Move all CSV files to backup folder and count how many were moved*

***

#### Rm

The [remove command](https://www.ibm.com/docs/en/aix/7.3?topic=files-deleting-rm-command) deletes a file (on its own, `rm` won't remove directories; use `rmdir` for empty ones, or `rm -r` as described below):

```bash
rm some_file.txt
```

**Remove directory with contents (DANGEROUS):**

You can optionally add a `-r` flag to tell the `rm` command to delete a directory and *all* of its contents recursively. "Recursively" is just a fancy way of saying "do it again on all of the subdirectories and their contents".

```bash
rm -r some_directory
```

**Remove with confirmation (safer practice):**

```bash
rm -ri folder_name
```

*Asks before deleting each file*

***

#### Cp

**`cp`** - Copy files or folders

```bash
cp -r /source/data /backup/data_$(date +%Y%m%d)
```

*Copy entire data folder to backup with today's date in the name*

***

#### Touch

**`touch`** - Create an empty file or update timestamp

```bash
touch file{1..10}.txt
```

*Creates 10 files: file1.txt, file2.txt, ... file10.txt*

Every file has **metadata** that includes timestamps:

* **Last modified time** - when the file content was last changed
* **Last accessed time** - when the file was last opened

When you use `touch` on an **existing file**, it updates these timestamps to the current time WITHOUT changing the file's content.
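A quick way to see this in action: create a file, create a newer reference file, then `touch` the first file again and confirm it is now the newer of the two:

```bash
touch demo.txt             # create demo.txt
sleep 1
touch marker.txt           # marker.txt is now the newer file
sleep 1
touch demo.txt             # refresh demo.txt's timestamps; content unchanged
find . -name demo.txt -newer marker.txt   # prints ./demo.txt
rm demo.txt marker.txt
```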

***

#### Cat

The `cat` command is used to view the contents of a file. It's short for "concatenate", which is a fancy way of saying "put things together". It can feel like a confusing name if you're using the `cat` command to view a single file, but it makes more sense when you're using it to view multiple files at once.

```bash
# Print the contents of a file to the terminal
cat file1.txt
```

```bash
# Concatenate the contents of multiple files and print them to the terminal
cat file1.txt file2.txt
```

You can do something like this:

```bash
cat error.log | grep date
```

*This would read the contents of error.log and redirect (chain) it to grep command which will search for the word “date”.*

Or this:

```bash
cat example.txt | wc
# Example output:  4     102     637
# number of newlines, words, characters
```

*This would read the contents of example.txt and pipe it into `wc` command.*

***

#### Head/tail

Sometimes you don't want to print *everything* in a file. Files can be really big after all.

**The head Command**

The head command prints the first `n` lines of a file, where `n` is a number you specify.

```bash
head -n 10 file1.txt
```

If you don't specify a number, it will default to `10`.

**The tail Command**

The tail command prints the *last* `n` lines of a file, where `n` is a number you specify.

```bash
tail -n 10 file1.txt
```
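The two combine nicely for grabbing a slice out of the middle of a file. For example, to print lines 16-20 (sketched here with a generated sample file):

```bash
seq 1 30 > sample.txt               # a 30-line sample file
head -n 20 sample.txt | tail -n 5   # first 20 lines, then the last 5 of those: lines 16-20
rm sample.txt
```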

***

#### Less/more

`less` and `more` are both pagers: commands that let you view the contents of a file one page (or line) at a time.

In the context of these commands, `less` is *literally* more: the `less` command does everything that `more` does, plus extra features. As a general rule, use `less`; you'd only fall back to `more` on a system that doesn't have `less` installed.

**`more` Command**

The **older, simpler** file viewer:

```bash
more filename.txt
```

**What you can do:**

* Press `Space` - go to next page
* Press `Enter` - go down one line
* Press `q` - quit
* **That's basically it!**

**Limitations:**

* You can only scroll DOWN (not back up)
* Once you pass something, you can't go back to see it
* Less features overall

**`less` Command**

The **newer, better** file viewer:

```bash
less filename.txt
```

**What you can do:**

* Press `Space` or `Page Down` - go to next page
* Press `b` or `Page Up` - go BACK up a page
* Press `Arrow keys` - move up/down line by line
* Press `/searchterm` - search for text
* Press `n` - go to next search result
* Press `N` - go to previous search result
* Press `g` - go to beginning of file
* Press `G` - go to END of file
* Press `q` - quit

**Why it's better:**

* You can scroll both up AND down
* You can search within the file
* It doesn't load the entire file into memory (great for huge files)
* Much more control

***

#### Which

The `which` command is used in Unix-like systems (Linux, macOS) **to find the full path of an executable file that would be run when you type a command**. It searches through directories listed in your system's `PATH` environment variable to locate the specified program. For example, typing `which ls` would show the path to the `ls` command's executable file.

#### Uname

* `uname -a` - Print all system information
* `uname -s` - Print kernel name
* `uname -r` - Print kernel release

***

#### Date

**Show current date and time:**

```bash
date
```

*Output: Thu Oct 2 16:45:23 AQTT 2025*

**Show date in specific format:**

```bash
date +%Y-%m-%d
```

*Output: 2025-10-02*

**Show time only:**

```bash
date +%H:%M:%S
```

*Output: 16:45:23*

***

#### CURL

Transfer data to/from servers (Client URL).

**What it does:** `curl` is a command-line tool for making HTTP/HTTPS requests. It's like a browser, but for the terminal. You can download files, interact with APIs, send data, and test web services.

**Think of it as:** A programmable web browser for the command line

Basic GET request (fetch webpage):

```bash
curl https://example.com
```

Download data from a URL:

```bash
curl -o dataset.json "https://api.example.com/data?limit=1000"
```

*Download JSON data from API and save it to dataset.json file*

Save with original filename:

```bash
curl -O https://example.com/dataset.csv
```

*Downloads and saves as 'dataset.csv' (keeps original name)*

**Working with APIs**

**GET request with headers:**

```bash
curl -H "Authorization: Bearer YOUR_TOKEN" \
     -H "Accept: application/json" \
     https://api.example.com/users
```

*Sends request with authentication and specifies JSON response*

**POST request with JSON data:**

```bash
curl -X POST https://api.example.com/users \
     -H "Content-Type: application/json" \
     -d '{"name":"Alice","email":"alice@example.com"}'
```

*Creates new user by sending JSON data*

**POST data from file:**

```bash
curl -X POST https://api.example.com/upload \
     -H "Content-Type: application/json" \
     -d @data.json
```

*The @ symbol reads data from file*

**PUT request (update):**

```bash
curl -X PUT https://api.example.com/users/123 \
     -H "Content-Type: application/json" \
     -d '{"name":"Alice Updated"}'
```

*Updates user with ID 123*

**DELETE request:**

```bash
curl -X DELETE https://api.example.com/users/123 \
     -H "Authorization: Bearer TOKEN"
```

*Deletes user with ID 123*

**Why it's essential for data engineers:**

* Fetch data from APIs
* Download datasets from URLs
* Test API endpoints
* Automate data ingestion
* Monitor web services

***

#### Read

**Bash style:**

```bash
read -p "Enter database name: " db_name
```

**With timeout:**

```bash
read -t 10 -p "Continue? (yes/no): " answer
if [ -z "$answer" ]; then echo "Timeout! Proceeding with defaults"; fi
```

*Waits 10 seconds for input, if no response, continues with defaults*

**Zsh style:**

```bash
echo -n "Enter database name: "
read db_name
```

***

#### Find

`find` - Search for files based on criteria

**Example:**

```bash
find /data -name "*.csv" -size +100M -mtime -7
```

*Finds all CSV files larger than 100MB modified in the last 7 days. mtime stands for “modified time”*

**Delete old log files:**

```bash
find /var/log/app -name "*.log" -mtime +30 -delete
```

*Finds and deletes log files older than 30 days*

**Find and process files:**

```bash
find ./data -name "*.json" -exec wc -l {} \; | awk '{sum+=$1} END {print sum}'
```

*Finds all JSON files, counts lines in each, then sums them up*

***

#### Tee

The `tee` command in Bash reads from standard input and writes to both standard output AND one or more files simultaneously. Think of it like a "T" pipe fitting in plumbing - the data flow splits in two directions.

**Syntax:**

* `tee [-ai] [file ...]`
  * `-a` - append the output to the files rather than overwriting them
  * `-i` - ignore the SIGINT signal
  * `file` - a pathname of an output file

`tee` is **almost always** used with an upstream source because its whole purpose is to duplicate data flowing through a pipeline.

**Typical usage pattern:**

```bash
upstream-command | tee file.txt | downstream-command
```

The data flow looks like:

```
upstream → tee → stdout (to screen or next command)
            ↓
          file.txt
```

**Common examples:**

```bash
# Save output to a file while still seeing it on screen
ls -la | tee directory-listing.txt

# Append instead of overwriting with -a
echo "new log entry" | tee -a logfile.txt

# Write to multiple files
echo "data" | tee file1.txt file2.txt file3.txt

# Combine with sudo to write to protected files
echo "config line" | sudo tee /etc/some-config-file

# Capture build output while watching it
make | tee build.log

# Save curl response while piping to jq
curl https://api.example.com/data | tee response.json | jq '.results'

# Log script output
./my-script.sh | tee script-output.log

# Debug a pipeline by saving intermediate results
cat data.csv | process1 | tee after-process1.txt | process2 | tee final.txt
```

This is incredibly useful for logging command output while still monitoring it in real-time, or when you need to save intermediate results in a pipeline.

**Can you use `tee` without a pipe?**

Technically yes, but it's uncommon:

```bash
tee file.txt
# Then you type input manually, and it echoes to screen + saves to file
# Press Ctrl+D to end
```

**SIGINT**

SIGINT is a signal (Signal Interrupt) sent to a process, typically when you press **Ctrl+C** in the terminal. It's a request for the program to terminate gracefully. The process can catch this signal and handle it (e.g., clean up resources before exiting) or ignore it.

Common signals include:

* **SIGINT (2)**: Interrupt from keyboard (Ctrl+C)
* **SIGTERM (15)**: Termination request
* **SIGKILL (9)**: Forceful kill (cannot be caught or ignored)

**Explanation of the `-i` option:**

* **By default**: If `tee` receives a signal like SIGINT (Ctrl+C), it does what any normal program would do - it terminates immediately
* **With the `-i` option**: The `-i` flag tells `tee` to **ignore** SIGINT signals

**Why is this useful?**

Imagine you have a long-running command pipeline:

```bash
some-long-process | tee output.txt | another-process
```

If you press Ctrl+C, SIGINT goes to all processes in the pipeline. Without `-i`, `tee` would stop immediately, breaking the pipeline. With `-i`:

```bash
some-long-process | tee -i output.txt | another-process
```

Now `tee` will ignore Ctrl+C and keep running, allowing the data flow to continue even if you accidentally hit Ctrl+C or intentionally want to stop only certain parts of the pipeline.

***

#### Tar

Archive and compress files

**Create compressed backup with exclusions:**

```bash
tar -czf backup_$(date +%Y%m%d).tar.gz --exclude='*.tmp' --exclude='cache/' /home/user/project
```

*Creates compressed archive excluding temp files and cache folder, with date in filename*

**Extract to specific directory:**

```bash
tar -xzf data_archive.tar.gz -C /destination/folder
```

*Extracts compressed archive to a specific location*

**List contents without extracting:**

```bash
tar -tzf backup.tar.gz | grep '\.csv$'
```

*Shows only CSV files inside the archive without extracting*
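You can also pull a single member back out of an archive without extracting everything. A minimal round trip (the filenames are just for illustration):

```bash
mkdir -p demo
printf 'hello\n' > demo/a.txt
printf 'world\n' > demo/b.txt
tar -czf demo.tar.gz demo         # archive the folder

rm -r demo                        # lose the originals...
tar -xzf demo.tar.gz demo/a.txt   # ...then restore just one member
cat demo/a.txt                    # hello
rm -r demo demo.tar.gz
```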

***

#### `rsync` - Remote/local file synchronization tool

**What it does:** `rsync` is a file copying/syncing tool that only transfers the differences between source and destination. It's much smarter and faster than the regular `cp` command, especially for large files or when syncing repeatedly.

**Think of it as:** Smart copy that only updates what changed

**Key advantages over `cp`:**

* Only copies changed files (not everything)
* Can resume interrupted transfers
* Shows progress
* Works over network (SSH)
* Preserves permissions, timestamps, ownership
* Can delete files in destination that don't exist in source

**Basic syntax:**

```bash
rsync [options] source/ destination/
```

**Important note about trailing slashes:**

```bash
rsync source/ dest/     # Copies CONTENTS of source into dest
rsync source dest/      # Copies source FOLDER itself into dest
```

**Common Examples**

**Simple local sync:**

```bash
rsync -av /home/data/ /backup/data/
```

*Syncs data folder to backup (archive mode, verbose)*

**Sync with progress bar:**

```bash
rsync -avh --progress /large/dataset/ /backup/
```

*Shows progress, human-readable sizes*

**Sync to remote server:**

```bash
rsync -avz user@server:/remote/data/ /local/backup/
```

*Syncs from remote server to local machine (with compression)*

***

#### `zip` / `unzip`

Compress and extract zip files

**Create zip with password protection:**

```bash
zip -r -e secure_data.zip sensitive_files/
```

*Creates encrypted zip file (will prompt for password)*

**Zip multiple directories:**

```bash
zip -r archive.zip folder1/ folder2/ folder3/
```

*Combines multiple folders into one zip file*

**Unzip to specific directory:**

```bash
unzip data.zip -d /destination/folder
```

*Extracts zip contents to specific location*

**List contents without extracting:**

```bash
unzip -l archive.zip | grep '\.csv$'
```

**Unzip specific file:**

```bash
unzip archive.zip "data/important.csv" -d ./
```

*Extracts only one specific file from the zip*

***

#### `gzip` and `gunzip` - Compress and decompress files

**What they do:**

* `gzip` compresses files (makes them smaller)
* `gunzip` decompresses files (restores original)

**Think of it as:** ZIP files for Linux (but only for single files)

**File extension:** `.gz`

**`gzip` - Compress files**

**Basic syntax:**

```bash
gzip filename
```

**What happens:**

* Original file gets compressed
* Creates `filename.gz`
* **Original file is DELETED** (replaced with compressed version)

**Simple Examples**

**Compress a file:**

```bash
gzip data.csv
```

*Creates data.csv.gz, deletes data.csv*

**Keep original file:**

```bash
gzip -k data.csv
```

*Creates data.csv.gz, KEEPS data.csv*

**Compress multiple files:**

```bash
gzip file1.txt file2.txt file3.txt
```

*Each file becomes file1.txt.gz, file2.txt.gz, file3.txt.gz*

**`gunzip` - Decompress files**

**Basic syntax:**

```bash
gunzip filename.gz
```

**What happens:**

* Compressed file gets decompressed
* Creates original filename
* **Compressed file is DELETED**

**Simple Examples**

**Decompress a file:**

```bash
gunzip data.csv.gz
```

*Creates data.csv, deletes data.csv.gz*

**Keep compressed file:**

```bash
gunzip -k data.csv.gz
```

***

#### `top`

Real-time system monitoring

**Basic usage:**

```bash
top
```

*Shows live view of processes, CPU, memory usage*

**Once inside `top`:**

* Press `M` - sort by memory usage
* Press `P` - sort by CPU usage
* Press `k` - kill a process (then enter PID)
* Press `q` - quit

**Run top in batch mode (for logging):**

```bash
top -b -n 1 | head -20 > system_snapshot.txt
```

*Takes one snapshot of system state and saves the first 20 lines to a file*

**Monitor specific user's processes:**

```bash
top -u username
```

*Shows only processes belonging to specific user*

**Show only specific number of processes:**

```bash
top -n 1 -b | head -15
```

*Prints the first 15 lines of one snapshot (summary header plus the top processes - useful for scripts)*

**Alternative: `htop` (more user-friendly if installed):**

```bash
htop
```

*Interactive, colorful, easier to use than top*

***

#### `awk` - Pattern scanning and text processing tool

**What it does:** `awk` is a powerful programming language designed for processing text files, especially structured data like CSV files. It works by reading files line-by-line and letting you perform operations on specific columns (fields).

**Think of it as:** Excel formulas for the command line

**Best for:**

* Extracting specific columns from CSV/tab-delimited files
* Performing calculations on data (sum, average, count)
* Filtering rows based on conditions
* Reformatting structured data

**Simple example:**

```bash
awk -F',' '{print $1, $3}' data.csv
```

*Prints columns 1 and 3 from a CSV file*

**More complex example:**

```bash
awk -F',' '$3 > 100 {sum += $4; count++} END {print "Average:", sum/count}' sales.csv
```

*For rows where column 3 > 100, calculate the average of column 4*

**How it works:**

* `-F','` = field separator is comma (for CSV files)
* `$1, $2, $3` = column 1, column 2, column 3
* `$0` = entire line
* You can use conditions, loops, and calculations
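A tiny self-contained run, feeding awk a few inline CSV rows via a here-document (the data is made up for illustration):

```bash
# sum column 2 for rows where column 1 is "sale"
awk -F',' '$1 == "sale" {sum += $2; n++} END {print "total:", sum, "rows:", n}' <<'EOF'
sale,100
refund,40
sale,250
EOF
# Output: total: 350 rows: 2
```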

***

#### `sed` - Stream editor for find/replace and text transformation

**What it does:** `sed` is a tool for editing text in a stream (line by line). It's most commonly used for find-and-replace operations, but can also delete lines, insert text, and transform data.

**Think of it as:** Find and Replace on steroids

**Best for:**

* Finding and replacing text in files
* Deleting specific lines
* Extracting specific line ranges
* Modifying text without opening an editor

**Simple example:**

```bash
sed 's/old/new/g' file.txt
```

*Replaces all occurrences of "old" with "new"*

**More complex example:**

```bash
sed -E 's/([0-9]{4})-([0-9]{2})-([0-9]{2})/\3\/\2\/\1/g' dates.txt
```

*Converts date format from YYYY-MM-DD to DD/MM/YYYY*

**Common operations:**

* `s/find/replace/g` = substitute (find and replace)
* `/pattern/d` = delete lines matching pattern
* `10,20d` = delete lines 10-20
* `-i` = edit the file in-place (modify the actual file)
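A quick demonstration of these operations on a throwaway file. One portability caveat: in-place editing is `sed -i 's/old/new/' file` with GNU sed (Linux) but `sed -i '' 's/old/new/' file` with BSD sed (macOS).

```bash
printf 'alpha\nbeta\ngamma\ndelta\n' > sample.txt

sed 's/a/A/g' sample.txt   # substitute every "a" with "A"
sed '/beta/d' sample.txt   # delete lines matching "beta"
sed -n '2,3p' sample.txt   # print only lines 2-3 (-n suppresses everything else)
rm sample.txt
```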

***

#### Time

**What it does:** `time` measures how long a command takes to run. It shows three different time measurements.

**Basic usage:**

```bash
time command
```

**Example:**

```bash
time ls -R /
```

**Output explanation:**

```bash
real    0m2.456s    # Total elapsed time (wall clock)
user    0m1.234s    # CPU time spent in user mode
sys     0m0.890s    # CPU time spent in system/kernel mode
```

* **real** = actual time that passed (what you'd see on a stopwatch)
* **user** = time CPU spent running your program
* **sys** = time CPU spent on system operations (file I/O, etc.)

**Real-world examples:**

```bash
# Measure script execution
time python data_processing.py

# Compare performance of commands
time grep "error" huge.log
time awk '/error/' huge.log

# Measure data pipeline
time ./etl_pipeline.sh
```

**Save timing to variable:**

```bash
start=$(date +%s)
python script.py
end=$(date +%s)
echo "Took $((end - start)) seconds"
```

***

#### Diff

**Common Uses**

**Compare two files:**

```bash
diff old_config.txt new_config.txt
```

**Side-by-side comparison:**

```bash
diff -y file1.txt file2.txt
```

*Shows files next to each other*

**Unified format (like Git):**

```bash
diff -u original.py modified.py
```

*Shows context around changes*

**Ignore whitespace differences:**

```bash
diff -w file1.txt file2.txt
```

**Compare directories:**

```bash
diff -r dir1/ dir2/
```

*Shows which files are different*

**Brief output (just show which files differ):**

```bash
diff -q dir1/ dir2/
```

**Colorized output:**

```bash
diff --color file1.txt file2.txt
```

**Understanding diff Output**

**Format: `<line_number><action><line_number>`**

* `a` = add
* `c` = change
* `d` = delete

```bash
2c2       # Line 2 changed
< old     # < means from first file
---       # separator
> new     # > means from second file
```
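Here is that notation produced end to end (`diff` exits with status 1 when the files differ, hence the `|| true` so the snippet doesn't abort scripts running under `set -e`):

```bash
printf 'one\ntwo\nthree\n' > a.txt
printf 'one\n2\nthree\n' > b.txt
diff a.txt b.txt || true
# 2c2
# < two
# ---
# > 2
rm a.txt b.txt
```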

***

#### Grep

**`grep`** - Search text using patterns

Basic search

```bash
grep "pattern" file.txt
```

Case-insensitive search

```bash
grep -i "pattern" file.txt
```

Search recursively in directories

```bash
grep -r "pattern" /path/to/directory
```

Show line numbers

```bash
grep -n "pattern" file.txt
```

Invert match (show lines that don't match)

```bash
grep -v "pattern" file.txt
```

You can also search multiple files at once. For example, if we wanted to search for the word "hello" in `hello.txt` and `hello2.txt`, we could run:

```bash
grep "hello" hello.txt hello2.txt
```

**Recursive Search**

You can also search an entire directory, including all subdirectories. For example, to search for the word "hello" in the current directory and all subdirectories:

```bash
grep -r "hello" .
```

*The `.` is a special alias for the current directory.*
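Two more flags worth knowing are `-c` (count matching lines instead of printing them) and flag combinations, sketched here on a throwaway log file:

```bash
printf 'ERROR disk full\ninfo ok\nerror net down\n' > app.log

grep -c -i "error" app.log   # count matching lines, case-insensitive: prints 2
grep -in "error" app.log     # show matches with their line numbers
rm app.log
```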

***

#### Sort

The `sort` command in Linux/Unix is used to sort lines of text files or input in various ways.

Basic Usage

```bash
sort filename          # Sort file alphabetically
sort file1 file2       # Sort multiple files together
cat file | sort        # Sort output from another command
```

**Common Options**

**Sort Order:**

```bash
sort file              # Ascending order (default)
sort -r file           # Reverse order (descending)
```

**Numeric Sorting:**

```bash
sort -n file           # Sort numerically (10 comes after 2)
sort -h file           # Human-readable numbers (1K, 2M, 3G)
sort -g file           # General numeric (handles scientific notation)
```

**Case Sensitivity:**

```bash
sort -f file           # Case-insensitive (fold case)
sort file              # Case-sensitive (uppercase first by default)
```

**Unique Values:**

```bash
sort -u file           # Sort and remove duplicates
```

**Real-World Use Cases**

**Find top 10 largest files:**

```bash
du -h * | sort -hr | head -10
```

**Sort log entries by timestamp:**

```bash
sort -k 1,2 access.log
```

**Get unique IP addresses from logs:**

```bash
cat access.log | awk '{print $1}' | sort -u
```

**Sort processes by memory usage:**

```bash
ps aux | sort -k 4 -rn | head -10
```
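To sort delimited data by a particular column, combine `-t` (field delimiter) with `-k` (key field). A small illustration with made-up CSV rows:

```bash
printf 'bob,32\nalice,25\ncarol,41\n' > people.csv

sort -t ',' -k 2 -n people.csv   # numeric sort on the 2nd comma-separated field
# alice,25
# bob,32
# carol,41
rm people.csv
```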

***

#### Uniq

The `uniq` command filters out or reports repeated lines. **Important**: It only detects adjacent duplicates, so the input usually needs to be sorted first.

**Basic Syntax**

```bash
uniq [OPTION] [INPUT [OUTPUT]]
```

**Basic Usage**

```bash
uniq file.txt              # Remove adjacent duplicate lines
sort file.txt | uniq       # Remove all duplicates (sort first!)
```

**Common Options**

**Count occurrences:**

```bash
uniq -c file.txt           # Prefix lines with occurrence count
sort file.txt | uniq -c    # Count all duplicates
```

**Show only duplicates:**

```bash
uniq -d file.txt           # Show only duplicate lines
uniq -D file.txt           # Show all duplicate lines (not just one)
```

**Show only unique lines:**

```bash
uniq -u file.txt           # Show only lines that appear once
```

**Ignore case:**

```bash
uniq -i file.txt           # Case-insensitive comparison
```
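The classic frequency-count idiom chains two sorts around `uniq -c`: sort so duplicates become adjacent, count them, then sort by the count:

```bash
printf 'error\ninfo\nerror\nwarn\nerror\n' > levels.txt

sort levels.txt | uniq -c | sort -rn   # most frequent first: "3 error" tops the list
rm levels.txt
```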

***

#### Cut

The `cut` command extracts sections from each line of files - great for working with columnar data.

**Basic Syntax**

```bash
cut OPTION [FILE]
```

**Cutting by Characters**

```bash
cut -c 1-5 file.txt        # Extract characters 1-5
cut -c 1,3,5 file.txt      # Extract characters 1, 3, and 5
cut -c 5- file.txt         # Extract from character 5 to end
cut -c -10 file.txt        # Extract first 10 characters
```

**Cutting by Fields (Columns)**

```bash
cut -f 1 file.txt          # Extract 1st field (tab-delimited by default)
cut -f 1,3 file.txt        # Extract 1st and 3rd fields
cut -f 2-4 file.txt        # Extract fields 2 through 4
cut -f 3- file.txt         # Extract from field 3 to end
```

**Custom Delimiters**

```bash
cut -d ',' -f 2 data.csv   # Use comma as delimiter, get 2nd field
cut -d ':' -f 1 /etc/passwd    # Extract usernames (colon-delimited)
cut -d ' ' -f 1,3 file.txt     # Use space as delimiter
```

**Practical Examples**

```bash
# Get usernames from /etc/passwd
cut -d ':' -f 1 /etc/passwd

# Extract email domain
echo "user@example.com" | cut -d '@' -f 2

# Get second column from CSV
cut -d ',' -f 2 employees.csv

# Get filename without extension
echo "file.txt" | cut -d '.' -f 1

# Extract IP addresses (first 3 columns of dot-separated)
cut -d '.' -f 1-3 ips.txt
```

***

#### `jq` - JSON processor and query tool

**What it does:** `jq` is like `grep`, `sed`, and `awk` combined, but specifically for JSON data. It lets you parse, filter, transform, and extract data from JSON files or API responses.

**Think of it as:** SQL queries for JSON

**Why it's essential for data engineering:**

* Most APIs return JSON
* Modern logs are often in JSON format
* Easy to extract specific fields from complex JSON

**Basic syntax:**

```bash
jq 'filter' file.json
```

**Common Examples**

**Pretty print JSON:**

```bash
echo '{"name":"John","age":30}' | jq '.'
```

*Makes JSON readable with proper indentation*

**Extract a specific field:**

```bash
jq '.name' user.json
```

*Output: "John"*

**Extract nested field:**

```bash
jq '.user.address.city' data.json
```

*Gets city from nested structure*

**Extract from array:**

```bash
jq '.[0].name' users.json
```

*Gets name from first item in array*

**Extract multiple fields:**

```bash
jq '.name, .age' user.json
```

*Output: "John" and 30 on separate lines*

**Filter array based on condition:**

```bash
jq '.[] | select(.age > 25)' users.json
```

*Shows only users older than 25*

**Create new JSON structure:**

```bash
jq '{username: .name, user_age: .age}' user.json
```

*Transforms JSON with new field names*

**Extract to CSV:**

```bash
jq -r '.[] | [.id, .name, .email] | @csv' users.json
```

*Converts JSON array to CSV format*

**Count items in array:**

```bash
jq '. | length' array.json
```

*Returns number of items*

**Get all values of a specific field:**

```bash
jq '.[].name' users.json
```

*Extracts all names from array of users*

**Filter and transform:**

```bash
jq '[.[] | select(.status == "active") | {id, name}]' users.json
```

*Gets only active users, shows only id and name fields*

**Real-World Data Engineering Example**

1. **Extract data from API response:**

```bash
curl -s 'https://api.example.com/users' | jq '.data[].email'
```

*Fetches API data and extracts all emails*

2. **Convert JSON logs to CSV:**

```bash
cat logs.json | jq -r '[.timestamp, .level, .message] | @csv' > logs.csv
```

3. **Filter error logs:**

```bash
jq 'select(.level == "ERROR")' app.log
```

4. **Count errors per day:**

```bash
jq -r '.timestamp' errors.json | cut -d'T' -f1 | sort | uniq -c
```

5. **Extract nested data:**

```bash
jq '.results[] | {user: .user.name, score: .metrics.score}' data.json
```

6. **Combine multiple JSON files:**

```bash
jq -s '.' file1.json file2.json > combined.json
```

*The `-s` flag slurps all files into one array*

***

#### du

Disk Usage (check how much space files/folders use)

**What it does:** `du` shows how much disk space files and directories are using. It's essential for finding what's eating up your storage.

**Think of it as:** A disk space analyzer for the command line

**Simple Examples**

**Check size of current directory:**

```bash
du
```

*Shows size of current directory and all subdirectories (in kilobytes)*

**Check size of specific folder:**

```bash
du /home/user/data
```

**Human-readable sizes:**

```bash
du -h
```

*Shows sizes as 1K, 234M, 2G instead of kilobytes*

**Summary only (total size):**

```bash
du -sh folder_name
```

*Shows just one line with total size*

```bash
2.3G    folder_name
```

**Check multiple folders:**

```bash
du -sh folder1 folder2 folder3
```

```bash
1.2G    folder1
500M    folder2
3.4G    folder3
```

**Find largest datasets:**

```bash
du -sh /data/* | sort -hr
```

*Shows all folders in /data sorted by size*

**Check database size:**

```bash
du -sh /var/lib/postgresql/
```

**`du` Common Flags**

| Flag            | What it does                          |
| --------------- | ------------------------------------- |
| `-h`            | Human-readable (KB, MB, GB)           |
| `-s`            | Summary only (total)                  |
| `-a`            | Show all files (not just directories) |
| `-c`            | Show grand total at end               |
| `-d N`          | Max depth of N levels                 |
| `--max-depth=N` | Same as -d N                          |
| `-k`            | Show in kilobytes                     |
| `-m`            | Show in megabytes                     |
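
The flags above combine naturally. A sketch of a typical "what's eating my disk space here?" one-liner (the `/var/log` path is just an example target):

```shell
# Size of each item one level deep, largest first
du -h -d 1 . | sort -hr | head -5

# All files and directories under /var/log, with a grand total at the end
du -ahc /var/log 2>/dev/null | tail -3
```

The `sort -hr` pairs with `du -h`: `-h` makes sort understand human-readable sizes (so `2G` sorts above `500M`), and `-r` puts the biggest entries first.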

***

#### history

View history of previously run commands.
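
In an interactive shell, history is recorded automatically; in a script you have to switch it on. A small sketch (the interactive shortcuts are shown as comments, since they only make sense at a prompt):

```shell
# Non-interactive shells need history recording enabled explicitly
set -o history

echo "first command" > /dev/null
echo "second command" > /dev/null

history | tail -3    # the most recent commands, numbered (including this one)
history -c           # clear the in-memory history list

# Interactive shortcuts (type these at the prompt):
#   !!        re-run the previous command
#   !42       re-run command number 42
#   Ctrl+R    reverse-search through history as you type
```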

***

#### `ln` - Create links (shortcuts to files)

**What it does:** Creates links to files or directories. There are two types: **hard links** and **symbolic (soft) links**.

**Think of it as:** Creating shortcuts or aliases to files

**Symbolic Links (Soft Links) - Most Common**

**Create a symbolic link:**

```bash
ln -s /path/to/original /path/to/link
```

**Example:**

```bash
ln -s /home/user/documents/report.txt report_link.txt
```

*Creates a shortcut called report\_link.txt that points to the original file*

**Link to directory:**

```bash
ln -s /data/projects/current /home/user/current_project
```

**Check if it's a link:**

```bash
ls -l report_link.txt
```

*Output shows: lrwxr-xr-x ... report\_link.txt -> /home/user/documents/report.txt*

**Create shortcut to frequently used directory:**

```bash
ln -s /var/log/application ~/logs
cd ~/logs  # Now you can easily access logs
```
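
The section above uses symbolic links; to see how they differ from hard links, here's a sketch you can run in a throwaway directory:

```shell
cd "$(mktemp -d)"                  # work somewhere safe to delete
echo "hello" > original.txt

ln original.txt hard.txt           # hard link: another name for the same data
ln -s original.txt soft.txt        # symlink: a pointer to the *path*

ls -li                             # hard.txt shares original.txt's inode number

rm original.txt
cat hard.txt                       # hello — the data survives via the hard link
cat soft.txt || echo "soft link is broken"   # the path it pointed to is gone
```

This is the key practical difference: hard links keep the data alive as long as any name for it exists, while symlinks silently break when their target moves or is deleted.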

***

#### `su` - Switch User

**What it does:** Switches to another user account. Stands for "substitute user" or "switch user".

**Basic syntax:**

```bash
su username
```

**Switch to root:**

```bash
su
```

*Prompts for root password*

**Switch to root with root's environment:**

```bash
su -
```

*The dash (-) loads root's environment variables and home directory*

**Switch to specific user:**

```bash
su bob
```

*Prompts for bob's password*

**Exit back to your user:**

```bash
exit
```

***

#### `sudo` - Execute command as another user (usually root)

**What it does:** Runs a single command with elevated privileges (usually as root). Stands for "superuser do".

**Think of it as:** Temporary admin powers for one command

**Basic syntax:**

```bash
sudo command
```

**Example:**

```bash
sudo apt update
```

*Runs apt update as root*

**Common Uses**

**Install software:**

```bash
sudo apt install python3
```

**Edit system files:**

```bash
sudo nano /etc/hosts
```

**View protected files:**

```bash
sudo cat /var/log/auth.log
```

***

### Operators

#### Arithmetic Operators

* `+` - Addition
* `-` - Subtraction
* `*` - Multiplication
* `/` - Division
* `%` - Modulus (remainder)
* `**` - Exponentiation

Example:

```bash
result=$((5 + 3))
echo $result  # Outputs: 8
```
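
The remaining operators work the same way inside `$(( ))` - note that exponentiation is `**` and that division is integer-only:

```shell
echo $((10 - 3))    # 7
echo $((4 * 5))     # 20
echo $((7 / 2))     # 3  — integer division truncates the remainder
echo $((17 % 5))    # 2  — the remainder itself
echo $((2 ** 10))   # 1024
```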

#### Comparison Operators

**For numeric comparisons**

```bash
-eq # Equal to
-ne # Not equal to
-gt # Greater than
-lt # Less than
-ge # Greater than or equal to
-le # Less than or equal to
```

**For string comparisons**

```bash
==    # Equal to
!=    # Not equal to
<     # Less than (ASCII alphabetical order)
>     # Greater than (ASCII alphabetical order)
-z    # String is null (zero length)
-n    # String is not null
```

**Example:**

```bash
if [ "$a" -eq "$b" ]; then
    echo "a is equal to b"
fi

if [ "$str1" == "$str2" ]; then
    echo "Strings are equal"
fi
```
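
The `-z` and `-n` tests from the table deserve their own example, since they are the usual way to check whether a variable is empty:

```shell
name=""
if [ -z "$name" ]; then
    echo "name is empty"
fi

name="Alice"
if [ -n "$name" ]; then
    echo "name is set to $name"
fi
```

Always quote the variable (`"$name"`): if it's empty and unquoted, the test receives no argument at all and can misbehave.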

#### Logical Operators

```bash
&&    # AND
||    # OR
!     # NOT
```

Example:

```bash
if [ "$a" -gt 0 ] && [ "$a" -lt 10 ]; then
    echo "a is between 0 and 10"
fi
```

`&&` **- Run next command ONLY if previous succeeds**

**Note:** The `&&` operator signifies **conditional execution.** Its core function is to create a dependency between commands: the command to the right of `&&` will only run if the command to its left exits with a status of `0`. In Bash, an exit status of `0` conventionally signifies success, while any non-zero exit status indicates failure.

```bash
mkdir my_new_directory && cd my_new_directory
```

If `mkdir` fails (e.g., the directory already exists), `cd` will not be attempted.

**Short-circuiting:** The `&&` operator exhibits "short-circuiting" behavior. If the first command fails, Bash immediately stops evaluating the expression and does not execute the subsequent commands linked by `&&`. This is efficient as it avoids unnecessary operations.

**`||` - Run next command ONLY if previous fails**

**Syntax:**

```bash
command1 || command2
```

**Behavior:**

* command2 runs ONLY if command1 exits with non-zero (failure)
* If command1 succeeds, command2 never runs

**Examples:**

```bash
# Try to use preferred command, fallback to alternative
command -v python3 || echo "Python3 not found"

# Create directory or print error
mkdir mydir || echo "Failed to create directory"

# Try multiple alternatives
ping google.com || ping 8.8.8.8 || echo "No internet"
```

**`;` (Semicolon) - Run next command REGARDLESS**

**Syntax:**

```bash
command1 ; command2
```

**Behavior:**

* command2 runs no matter what
* Doesn't care if command1 succeeded or failed

**Examples:**

```bash
# Run both no matter what
cd /somewhere ; ls

# Execute sequence
mkdir temp ; cd temp ; touch file.txt
```

**`!` (NOT) - Negate exit status**

**Syntax:**

```bash
! command
```

**Examples:**

```bash
# Run if file does NOT exist
! [ -f myfile.txt ] && touch myfile.txt

# Invert grep result
! grep "error" log.txt && echo "No errors found"
```

**Combining Operators**

**AND then OR:**

```bash
command1 && command2 || command3
```

*If command1 succeeds, run command2; if either fails, run command3*

**Example:**

```bash
mkdir mydir && cd mydir || echo "Failed to create/enter directory"
```

**Grouping with parentheses:**

```bash
(command1 && command2) || command3
```

**Real-World Examples**

**Safe script execution:**

```bash
#!/bin/bash
# Exit if any command fails
cd /data/projects || exit 1
python script.py || exit 1
echo "Success!"
```

***

### Permissions

Each file and directory in Unix systems has **permissions** associated with them.

You have to ask 2 questions when talking about permissions:

1. **Who has the permissions?**
2. **What permissions do they have?**
   1. Any user accessing a specific file/directory may or may not have access to **read it, write to it, or execute it**.

Both aspects - the **who** and the **what** - are represented together by a 10-character string. Here are examples for each type of file:

* **Regular Files**

```bash
-rw-r--r-- # Regular file, owner: read/write, group/others: read-only
-rwxr-xr-x # Executable file, owner: all permissions, group/others: read/execute
-rw------- # Private file, only owner can read/write
-rwxrwxrwx # Full permissions for everyone (rarely used)
-r--r--r-- # Read-only for everyone
```

* **Directories**

```bash
drwxr-xr-x   # Standard directory, owner: full access, others: read/enter
drwx------   # Private directory, only owner can access
drwxrwxrwx   # Public directory, everyone has full access
dr-xr-xr-x   # Read-only directory (no write permission for anyone, including the owner)
```

* **Special Files**

```bash
lrwxrwxrwx   # Symbolic link (permissions shown are for the link itself)
crw-rw-rw-   # Character device file
brw-r-----   # Block device file
srwxrwxrwx   # Socket file
prw-r--r--   # Named pipe (FIFO)
```

**What do these characters mean?**

* The first character identifies the file type: `-` for a regular file, `d` for a directory (special files use other letters, such as `l` for symbolic links - see the examples above).

**Regular file** (e.g. `-rwxrwxrwx`)

**Directory** (e.g. `drwxrwxrwx`)

* The next 3 characters `r`, `w`, `x` represent the three permissions - read, write, execute. Who do they apply to? The **owner**: usually the user who created the file, unless ownership was changed afterwards.
  * Each permission is either granted or not. If granted, the letter is present; if not, there is a `-` in its place. Example: `r-x` means the owner can read and execute, but not write.
* Finally, the next 6 characters are another 2 sets of `rwx`. The second set of `rwx` applies to the group instead of the owner, and the last set applies to everyone else.

***

### Changing permissions

For more information: <https://www.stationx.net/linux-file-permissions-cheat-sheet/#def-per>

#### **chmod** command (stands for "change mode”)

**Example:** `chmod -R u=rwx,g=,o= DIRECTORY`. This means:

* The owner can read, write, and execute
* The group can do nothing
* Others can do nothing

In the command above, `u` means "user" (aka "owner"), `g` means "group", and `o` means "others". The `=` means "set the permissions to the following", and the `rwx` means "read, write and execute". The `g=` and `o=` mean "set group and other permissions to nothing". The `-R` means "recursively", which means "do this to all of the contents of the directory as well".

*Remember, `.` is a special alias for the current directory.*

There are symbolic and numeric notations for defining permissions:

**Symbolic notation:**

```bash
chmod u+x filename      # Add execute for owner
chmod g-w filename      # Remove write for group 
chmod o=r filename      # Set other to read-only
chmod a+r filename      # Add read for all
```

* `u` = user/owner
* `g` = group
* `o` = other
* `a` = all (user + group + other)

**Numeric notation:**

```bash
chmod 755 filename      # rwxr-xr-x
chmod 644 filename      # rw-r--r--
chmod 600 filename      # rw-------
```

* First digit = owner permissions (replaces the first `rwx` triplet)
* Second digit = group permissions (the second triplet)
* Third digit = other permissions (the third triplet)

So `chmod 755 file` means:

* `7` (rwx) for owner
* `5` (r-x) for group
* `5` (r-x) for other

#### Common Permission Patterns

* `755`: Executable files (owner can do everything, others can read/execute)
* `644`: Regular files (owner can read/write, others read-only)
* `600`: Private files (only owner can read/write)
* `777`: Full access for everyone (generally avoided for security)
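
A quick sketch in a throwaway directory showing the numeric patterns taking effect - `ls -l` confirms the permission string after each change:

```shell
cd "$(mktemp -d)"
touch script.sh

chmod 755 script.sh
ls -l script.sh      # -rwxr-xr-x ...

chmod 600 script.sh
ls -l script.sh      # -rw------- ...
```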

***

#### chown (stands for “change owner”)

**What it does:** `chown` changes who owns a file or directory. In Unix/Linux systems, every file has an **owner** (user) and a **group**. This command lets you change either or both.

**Think of it as:** Transferring ownership of files to different users

**Why it matters:**

* Control who can access/modify files
* Fix permission issues
* Set up proper access for web servers, databases, etc.
* Essential for multi-user systems and servers

**Basic Syntax**

```bash
chown user:group filename
```

or

```bash
chown user filename          # Change only owner
chown :group filename        # Change only group
chown user:group filename    # Change both
```

***

### Running scripts

#### Creating a script

A ["shebang"](https://en.wikipedia.org/wiki/Shebang_\(Unix\)) is a special line at the top of a script that tells your shell which program to use to execute the file - in other words, it names the script's interpreter.

The format of a shebang is:

```bash
#! interpreter [optional-arg]
```

For example, if your script is a Python script and you want to use Python 3, your shebang might look like this:

```bash
#!/usr/bin/python3
```

This tells the system to use the Python 3 interpreter located at `/usr/bin/python3` to run the script.

If you're writing a shell script, point the shebang at the shell the script is written for:

```bash
#!/bin/bash    # Or #!/bin/zsh
```

#### Running a script

If the program is in the current directory, you need to prefix it with `./` to run it:

* `./program.sh`

Some scripts also protect themselves from being closed by `ctrl+c`: a script may accept a special argument, e.g. `./program.sh force`, and ignore the interrupt signal when it's passed. This is behavior the script itself must implement (via `trap`) - it is not a built-in shell feature.
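
A hypothetical sketch of that pattern - the `force` argument name and the behavior are choices the script author makes, not anything the shell provides:

```shell
#!/bin/bash
# Sketch: ignore Ctrl+C (SIGINT) only when invoked as ./program.sh force
if [ "$1" = "force" ]; then
    trap '' INT          # empty handler: the interrupt signal is ignored
fi

for i in 1 2 3; do
    echo "working... $i"
    sleep 1
done
```

Run normally, `ctrl+c` stops the loop; run with `force`, the script shrugs it off and finishes.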

<details>

<summary><strong>Here are the main ways to run a bash script:</strong></summary>

**Directly executing it**:

```
./script.sh
```

Requires the file to have execute permissions (`chmod +x script.sh`).

**Calling the interpreter explicitly** — no execute permission needed:

```
bash script.sh
sh script.sh
```

**Source it** (runs in the *current* shell, so variables/functions persist in your session):

```
source script.sh
. script.sh       # shorthand, same thing
```

**With an absolute or relative path:**

```
/full/path/to/script.sh
../scripts/script.sh
```

**As a login/interactive shell argument:**

```
bash -c "commands here"   # run a string as a script
```

The key differences to remember: `bash script.sh` spawns a subshell and doesn't need execute permissions, while `source`/`.` runs the script inline in your current shell environment — useful when a script sets environment variables you want to keep.

</details>
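
The `source` vs `bash` distinction is easy to verify yourself - a sketch using a throwaway script that only sets a variable:

```shell
cd "$(mktemp -d)"
printf 'MYVAR="from script"\n' > setvar.sh

bash setvar.sh
echo "${MYVAR:-unset}"     # unset — the subshell's variable died with it

source setvar.sh
echo "${MYVAR:-unset}"     # from script — sourcing ran it in *this* shell
```

This is exactly why shell config files like `.bashrc` are *sourced* rather than executed: their variables and aliases have to land in your current session.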

#### Writing Professional Bash Script Headers

When creating Bash scripts for professional or collaborative environments, including a well-structured header makes your code more maintainable and easier to understand. Here's how to document your scripts effectively.

**Header Placement**

Place your documentation header immediately after the shebang line (`#!/bin/bash`) and before any executable code. Use the `#` symbol to create comments that won't be executed.

**Essential Header Information**

A professional script header should include these five key pieces of information:

1. **Author** - Who wrote the script (name or username)
2. **Creation Date** - When the script was originally created
3. **Last Modified** - When the script was last updated
4. **Description** - A brief explanation of what the script does
5. **Usage** - How to run the script, including any arguments or flags

**Why This Matters**

Including these details helps anyone who encounters your script (including your future self) quickly understand:

* Its purpose and functionality
* Who to contact with questions
* Whether it's current or potentially outdated
* How to execute it correctly

**Example Header**

```bash
#!/bin/bash
#
# Author: Jane Smith
# Created: 2025-01-15
# Last Modified: 2025-03-20
# Description: Backup script for user data directories
# Usage: ./backup.sh [--full|--incremental] <destination>

# Script begins here
SOURCE_DIR="/home/users"
LOG_FILE="/var/log/backup.log"

```

**Additional Considerations**

For more complex scripts, you might also include:

* **Version number** for tracking script evolution
* **Dependencies** listing required tools or packages
* **License information** for shared or open-source code
* **Contact information** such as email or support channels

Adopting this convention from the start establishes good habits and makes your scripts production-ready.

***

### Shell configuration

Bash and Zsh both have [configuration files](https://en.wikipedia.org/wiki/Unix_shell#Configuration_files) that **run automatically each time you start a new shell** session. These files are used to set up your shell environment. They can be used to set up aliases, functions, and environment variables.

These files are located in your home directory (`~`) and are hidden by default. The `ls` command has a `-a` flag that will show hidden files:

```bash
ls -a ~
```

* If you're using Bash, `.bashrc` is probably the file you want to edit.
* If you're using Zsh, `.zshrc` is probably the file you want to edit or create if it doesn't yet exist.

***

### Environment variables

Apart from regular variables, there is another type of variable called an [environment variable](https://en.wikipedia.org/wiki/Environment_variable). They are available to *all* programs that you run in your shell.

You can view all of the environment variables that are currently set in your shell with the `env` command.

To set a variable in your shell, use the `export` command:

```bash
export NAME="Nariman" # declare a variable
echo $NAME # and use it
```

***

What's particularly useful is that any **programs or scripts you execute in your shell will inherit access to these environment variables.**

To demonstrate this, let's create a simple script file named `greet.sh`:

```bash
#!/bin/bash
echo "Hello, my name is $NAME"
```

Now we can make it executable and run it:

```bash
chmod +x greet.sh
./greet.sh
# Hello, my name is Nariman
```

***

You can also temporarily **set a variable for a single command, instead of exporting it** (exporting means the variable will persist until you close the shell).

For example:

```bash
WARN_MESSAGE="this works too" ./warn.sh
```
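
You can see the difference between the two scopes directly - the child process sees the variable, but your shell never keeps it:

```shell
# Set only for the one command
GREETING="hi there" bash -c 'echo "child sees: $GREETING"'

echo "parent sees: ${GREETING:-nothing}"   # nothing — never set in this shell
```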

***

Your shell comes with several environment variables that are essentially "standard" - meaning various programs and system components recognize and utilize them automatically. The `PATH` variable is a prime example of this.

#### Why is the `PATH` Variable Important?

Without the `PATH` variable, you'd need to specify the complete filesystem location for every command you want to execute. Rather than simply typing `ls`, you'd be forced to type `/bin/ls` (or wherever the `ls` program lives on your particular system). This would be extremely tedious.

The `PATH` variable contains a collection of directory paths that your shell searches through whenever you enter a command. When you type `ls`, your shell examines each directory listed in `PATH` looking for an executable file named `ls`. Once found, it executes that program. If no matching executable is discovered, you'll receive a "command not found" error.

You can view your **current `PATH` setting** with this command:

```bash
echo $PATH
```

This will display a long string of directory paths separated by colons (`:`). Each path represents a location where your shell searches for executable programs.
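
To make that list easier to read, swap the colons for newlines; `command -v` then shows which of those directories a given command was found in:

```shell
echo "$PATH" | tr ':' '\n'     # one search directory per line

command -v ls                  # where the shell found ls, e.g. /bin/ls
```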

*Note: Restarting your shell session will reset the PATH variable to its default.*

**Adding a directory to PATH**

To add a directory to your `PATH` without overwriting all of the existing directories, use the `export` command and reference the existing `PATH` variable:

```bash
export PATH="$PATH:/path/to/new"
```

As noted above, this change is temporary: once the session is closed, the directory is dropped from your `PATH` again, and its executables stop being available by name.

**Permanently adding a directory to PATH**

The most common way to do this is to add the same `export` command that you used in the last lesson to your shell's configuration file.

***

### Man command

The [man](https://www.ibm.com/docs/en/aix/7.3?topic=m-man-command) command is short for "manual". It's a program that displays the manual for other programs.

The `man` command functions only with programs that have documentation available in the manual system, though fortunately this includes most shell built-ins and standard Unix utilities. To use it, simply provide the command name as an argument. The logical starting point is to examine the manual for the manual system itself:

```bash
# open the man pages for the 'man' command
man man
```

How to search for what you need:

```bash
man ls
# type '/-r' to start searching

# press 'n' to jump to the next result

# press 'N' to go back if you went too far
```

***

### Command flag conventions

The availability and nature of command flags depends entirely on how each program's developer designed it. However, most Unix commands follow established patterns:

* Single-letter flags use one dash as a prefix (e.g., `-v`)
* Word-based flags use two dashes as a prefix (e.g., `--version`)
* Many commands offer both short and long versions of the same option (e.g., `-v` and `--version`)

#### Help flag

Standard practice among mature command-line applications is to include a "help" feature that displays usage instructions. This assistance is typically accessible through one of these methods:

* `--help` (long flag format)
* `-h` (short flag format)
* `help` (as the initial argument)

The help output tends to be more digestible than comprehensive `man` documentation. Rather than serving as exhaustive reference material, it functions more like a concise getting-started tutorial.

***

### Nano editor

* `Ctrl+O` to save the file (confirm any prompts with "enter")
* `Ctrl+X` to exit the editor.

There should be a list of commands at the bottom of the screen.

***

### **Program Exit Codes**

Exit codes (also known as "return codes" or "status codes") serve as a communication mechanism for programs to indicate whether their execution completed successfully.

A program returns `0` to signal successful completion. All other exit codes indicate some form of failure or error condition. In most cases when something goes wrong, you'll see exit code `1`, which serves as a general-purpose error indicator.

These exit codes enable programs to monitor and respond to the success or failure of other programs they execute. For instance, at Boot.dev, our monitoring system checks the exit code of our server application - if it terminates with a non-zero code, our monitoring automatically restarts the service and records the failure for investigation.

Within your shell environment, you can examine the exit code from the most recently executed command using the special variable `$?`. Here are some practical examples:

```bash
ls ~
echo $?
# 0
```

```bash
ls /invalid/directory/path
echo $?
# non-zero value (specific number varies by system)
```

***

### Standard output (stdout), Standard error (stderr), Standard input (stdin)

#### **Redirecting Streams**

You can redirect stdout and stderr to different places using the `>` and `2>` operators. `>` redirects stdout, and `2>` redirects stderr.

**Capturing Standard Output to a File**

```bash
date > current_time.txt
cat current_time.txt
# Wed Sep 24 14:30:15 UTC 2025
```

Note: use `>>` to append to a file instead of overwriting it.

**Capturing Error Output to a File**

```bash
ls /nonexistent/path 2> errors.log
cat errors.log
# ls: cannot access '/nonexistent/path': No such file or directory
```

In this demonstration, `ls` is used to deliberately trigger an error message (attempting to list a directory that doesn't exist), and this error output gets redirected into `errors.log`.
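
Often you want both streams in the same place; `2>&1` ("send stderr wherever stdout currently goes") is the standard idiom for that:

```shell
cd "$(mktemp -d)"
# one real path, one bogus: ls produces both stdout and stderr
ls /tmp /nonexistent > all.log 2>&1 || true   # ls exits non-zero; that's expected

cat all.log                    # both the listing and the error ended up here

# or discard errors entirely:
ls /nonexistent 2> /dev/null || true
```

The order matters: `> all.log 2>&1` works, while `2>&1 > all.log` sends stderr to the terminal, because redirections are applied left to right.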

#### Standard input

Since we have standard *output*, it makes sense that there would also be standard *input*, correct?

"Standard Input," commonly referred to as "stdin," represents the default source from which programs *receive* their input data. It functions as a data stream that applications can consume during their execution.

*Note: The `read` command prompts for and accepts user input from stdin (standard input).*
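
A minimal sketch of stdin in action - here `read` consumes a line arriving through a pipe rather than from the keyboard:

```shell
# read takes one line from stdin into a variable
printf 'Alice\n' | {
    read -r name
    echo "Hello, $name"
}
```

Interactively, `read -r name` would simply pause and wait for you to type a line; the `-r` flag stops backslashes from being interpreted as escapes.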

***

### Piping

Among the shell's most elegant features is the ability to chain programs together by sending one program's output directly into another program's input. This single mechanism enables remarkably sophisticated automation workflows.

#### **The Pipe Operator**

The pipe symbol is `|` - a vertical line character typically found on the same key as the backslash (`\`) above your enter key. This operator captures the stdout from the command on its left side and feeds it as stdin to the command on its right side.

```bash
echo "I find your lack of faith disturbing" | wc -w
# 7
```

In this demonstration, the `echo` command produces the text "I find your lack of faith disturbing" as its output. Rather than displaying this text in your terminal, the pipe operator redirects it to the `wc` (word count) utility. The `wc` program tallies the words in whatever input it receives, and the `-w` flag instructs it to report only the word count.

This functionality works because `wc`, like most command-line utilities, can accept input from stdin as an alternative to reading from a file path.
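
Pipes chain as long as you like. A classic example is a word-frequency pipeline, where each stage transforms the previous stage's output:

```shell
printf 'apple\nbanana\napple\norange\napple\n' |
    sort |            # alphabetical, so duplicate lines become adjacent
    uniq -c |         # collapse duplicates, prefixing each with its count
    sort -rn          # numeric sort, reversed: most frequent first
```

The first line of output is `3 apple` - three stages, each a tiny single-purpose tool, composed into something none of them could do alone.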

***

### Xargs

`xargs` is a powerful bash command that builds and executes commands from standard input. It's particularly useful for handling situations where you need to pass a large number of arguments to a command, or when you want to convert input into arguments for another command.

**Basic Concept**

`xargs` reads items from standard input (separated by spaces or newlines) and passes them as arguments to another command. Think of it as a bridge that converts input lines into command arguments.

***Simple Examples***

```bash
# Find all .txt files and delete them
find . -name "*.txt" | xargs rm

# The above is roughly equivalent to running:
# rm file1.txt file2.txt file3.txt ...
```

```bash
# Count lines in multiple files
ls *.log | xargs wc -l
```
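
Two `xargs` flags worth knowing early: `-0` pairs with `find -print0` so filenames containing spaces survive the trip, and `-I` gives each input item a placeholder name (the sketch below runs in a throwaway directory):

```shell
cd "$(mktemp -d)"
printf 'one\ntwo\n' > "my notes.txt"

# NUL-delimited: safe for names containing spaces
find . -name '*.txt' -print0 | xargs -0 wc -l    # 2 ./my notes.txt

# -I {}: run the command once per input line, substituting {}
printf 'alpha\nbeta\n' | xargs -I {} echo "item: {}"
```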

***

### Interrupt and Kill

#### Interrupt

Occasionally, a running program will become unresponsive or you'll need to terminate it. This typically happens when:

* The command contains an error and isn't behaving as expected
* The program is attempting network operations while you're offline
* You're processing large datasets and decide not to wait for completion
* A software defect is causing the application to freeze

When you encounter these situations, you can terminate the program using `ctrl + c`. This keyboard combination sends a "SIGINT" (interrupt signal) to the running process, instructing it to terminate gracefully.

#### Kill

Occasionally, a program becomes *completely* unresponsive (or behaves maliciously) and ignores the `SIGINT` signal entirely. When this occurs, your best approach is to open a separate shell session (another terminal window) and forcibly terminate the problematic process.

**Command Format**

```bash
kill <PID>
```

`PID` represents "process ID" - a unique numerical identifier assigned to every running process on your system. To discover the process IDs currently active on your machine, you can use the `ps` ("process status") command:

```bash
ps aux
```

The "aux" flags specify "display all processes, including those belonging to other users, with detailed information for each process".
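
You can watch the whole lifecycle with a throwaway background process, using `$!` (the PID of the most recent background command):

```shell
sleep 300 &              # start a long-running process in the background
pid=$!                   # $! holds its PID

kill -0 "$pid" && echo "process $pid is alive"   # signal 0 = existence check only

kill "$pid"              # polite termination (sends SIGTERM)

# if a process ignores that, escalate — SIGKILL cannot be caught or ignored:
kill -9 "$pid" 2>/dev/null || true
```

Plain `kill` sends SIGTERM, which a process may catch to clean up before exiting; `kill -9` (SIGKILL) is the last resort, since the process gets no chance to clean up at all.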

***

### More about Scripting

#### Positional arguments

Positional arguments allow your script to accept input from the command line when executed.

**Basic Positional Parameters**

When you run a script like `./script.sh arg1 arg2 arg3`, Bash automatically assigns these values to special variables:

* `$0` : The script name itself
* `$1` : First argument
* `$2` : Second argument
* `$3` : Third argument
* ... and so on up to `$9`
* `${10}` : Tenth argument and beyond (use braces)

**Example:**

```bash
#!/bin/bash
# Usage: ./greet.sh John 25

echo "Script name: $0"
echo "Name: $1"
echo "Age: $2"
```

Running `./greet.sh Alice 30` outputs:

```bash
Script name: ./greet.sh
Name: Alice
Age: 30
```

#### Special Parameter Variables

* `$#` : Number of arguments passed to the script
* `$@` : All arguments as separate words
* `$*` : All arguments as a single word
* `$?` : Exit status of the last command
* `$$` : Process ID of the current script
* `$!` : Process ID of the last background command

**Example:**

```bash
#!/bin/bash

echo "Number of arguments: $#"
echo "All arguments: $@"
echo "Script PID: $$"

if [ $# -eq 0 ]; then
    echo "No arguments provided"
    exit 1
fi
```

#### Looping Through Arguments

```bash
#!/bin/bash

echo "Processing all arguments:"
for arg in "$@"; do
    echo "- $arg"
done
```

***

### Data types

Bash is **not a strongly-typed language** - it treats almost everything as strings by default. However, it does support some data structures:

#### 1. Variables (Strings/Numbers)

**Basic variables:**

```bash
name="John"
age=25
price=19.99

```

**Everything is a string unless you do math:**

```bash
x="10"
y="20"
echo $x$y     # Output: 1020 (string concatenation)

# Do math with $(( ))
result=$((x + y))
echo $result  # Output: 30

```

***

#### 2. Arrays (Indexed)

**What they are:** Ordered lists of values, accessed by numeric index (0, 1, 2, ...)

**Creating Arrays**

**Method 1: Direct assignment**

```bash
fruits=("apple" "banana" "orange")
```

**Method 2: Individual assignment**

```bash
fruits[0]="apple"
fruits[1]="banana"
fruits[2]="orange"

```

**Method 3: Empty array**

```bash
my_array=()

```

**Accessing Array Elements**

**Get single element:**

```bash
fruits=("apple" "banana" "orange")
echo ${fruits[0]}    # apple
echo ${fruits[1]}    # banana
echo ${fruits[2]}    # orange

```

**Get all elements:**

```bash
echo ${fruits[@]}    # apple banana orange
echo ${fruits[*]}    # apple banana orange

```

**Get array length:**

```bash
echo ${#fruits[@]}   # 3

```

**Get length of specific element:**

```bash
echo ${#fruits[0]}   # 5 (length of "apple")

```

**Modifying Arrays**

**Add element:**

```bash
fruits+=("grape")
echo ${fruits[@]}    # apple banana orange grape

```

**Update element:**

```bash
fruits[1]="mango"
echo ${fruits[@]}    # apple mango orange grape

```

**Remove element:**

```bash
unset 'fruits[2]'    # quotes stop the shell from treating [2] as a glob pattern
echo ${fruits[@]}    # apple mango grape

```

**Remove entire array:**

```bash
unset fruits

```

**Looping Through Arrays**

**Method 1: For loop**

```bash
fruits=("apple" "banana" "orange")
for fruit in "${fruits[@]}"; do
    echo "I like $fruit"
done

```

**Method 2: Index-based loop**

```bash
for i in "${!fruits[@]}"; do
    echo "Index $i: ${fruits[$i]}"
done

```

**Method 3: C-style loop**

```bash
for ((i=0; i<${#fruits[@]}; i++)); do
    echo "${fruits[$i]}"
done

```

**Array Slicing**

**Get subset:**

```bash
numbers=(1 2 3 4 5 6 7 8 9 10)
echo ${numbers[@]:2:4}    # 3 4 5 6 (start at index 2, take 4 elements)
echo ${numbers[@]:5}      # 6 7 8 9 10 (from index 5 to end)

```

***

#### 3. Associative Arrays (Bash 4.0+)

**What they are:** Key-value pairs (like dictionaries in Python or objects in JavaScript)

**Must declare first:**

```bash
declare -A person

```

**Creating Associative Arrays**

**Method 1: Individual assignment**

```bash
declare -A person
person[name]="John"
person[age]="30"
person[city]="New York"

```

**Method 2: All at once**

```bash
declare -A person=( [name]="John" [age]="30" [city]="New York" )

```

**Accessing Associative Arrays**

**Get value by key:**

```bash
echo ${person[name]}     # John
echo ${person[age]}      # 30

```

**Get all keys:**

```bash
echo ${!person[@]}       # name age city

```

**Get all values:**

```bash
echo ${person[@]}        # John 30 New York

```

**Check if key exists:**

```bash
if [[ -v person[name] ]]; then
    echo "Name exists"
fi

```

**Looping Through Associative Arrays**

**Loop through keys and values:**

```bash
for key in "${!person[@]}"; do
    echo "$key: ${person[$key]}"
done

```

*Output:*

```
name: John
age: 30
city: New York

```

***

#### 4. Strings

**String operations:**

```bash
text="Hello World"

# Length
echo ${#text}              # 11

# Substring
echo ${text:0:5}           # Hello

# Replace
echo ${text/World/Bash}    # Hello Bash

# Uppercase
echo ${text^^}             # HELLO WORLD

# Lowercase
echo ${text,,}             # hello world

```

***

#### 5. Integers (with declare)

**Declare as integer:**

```bash
declare -i number=10
number=number+5
echo $number    # 15 (no need for $(( )))

```
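**Integer caveat:** assigning a non-numeric string to an integer variable doesn't raise an error - the string is evaluated arithmetically, and an unset name evaluates to 0 (a small sketch; the variable name is illustrative):

```bash
declare -i count=10
count+=5          # += does arithmetic addition on integer variables
echo $count       # 15

count="oops"      # unset name "oops" evaluates to 0
echo $count       # 0
```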

***

#### Real-World Examples

**Example 1: Processing Files**

```bash
files=("data1.csv" "data2.csv" "data3.csv")

for file in "${files[@]}"; do
    echo "Processing $file..."
    python process.py "$file"
done

```

**Example 2: Configuration**

```bash
declare -A config
config[host]="localhost"
config[port]="5432"
config[database]="mydb"
config[user]="admin"

echo "Connecting to ${config[host]}:${config[port]}"

```

**Example 3: Log Levels**

```bash
declare -A log_levels
log_levels[DEBUG]=0
log_levels[INFO]=1
log_levels[WARN]=2
log_levels[ERROR]=3

current_level=${log_levels[INFO]}
echo "Current log level: $current_level"

```

**Example 4: Data Pipeline**

```bash
# List of data sources
sources=("api1" "api2" "database1" "file_system")

# Process each source
for source in "${sources[@]}"; do
    echo "Extracting from $source..."
    ./extract.sh "$source"
done

```

**Example 5: Environment Variables**

```bash
declare -A environments
environments[dev]="development.server.com"
environments[staging]="staging.server.com"
environments[prod]="production.server.com"

env="prod"
echo "Deploying to ${environments[$env]}"

```

**Example 6: User Data**

```bash
declare -A users
users[john]="john@example.com"
users[alice]="alice@example.com"
users[bob]="bob@example.com"

# Send email to all users
for username in "${!users[@]}"; do
    email="${users[$username]}"
    echo "Sending email to $username at $email"
done

```

**Example 7: Counting**

```bash
declare -A word_count

# Count words in array
words=("apple" "banana" "apple" "orange" "banana" "apple")

for word in "${words[@]}"; do
    ((word_count[$word]++))
done

# Display counts
for word in "${!word_count[@]}"; do
    echo "$word: ${word_count[$word]}"
done

```

*Output:*

```
apple: 3
banana: 2
orange: 1

```

***

#### Array vs Associative Array

| Feature     | Indexed Array      | Associative Array     |
| ----------- | ------------------ | --------------------- |
| Keys        | Numbers (0,1,2...) | Strings               |
| Declaration | Optional           | `declare -A` required |
| Access      | `${arr[0]}`        | `${arr[key]}`         |
| Use case    | Lists, sequences   | Key-value pairs       |
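
**Same data, both ways** (a minimal sketch; the names are illustrative):

```bash
# Indexed array: order matters, numeric keys are assigned automatically
servers=("web1" "web2" "web3")
echo ${servers[0]}        # web1

# Associative array: explicit string keys, declaration required
declare -A ports=( [http]=80 [https]=443 )
echo ${ports[https]}      # 443
```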

***

#### Common Patterns

**Read file into array:**

```bash
mapfile -t lines < file.txt
# or
readarray -t lines < file.txt

echo "Line 1: ${lines[0]}"
echo "Total lines: ${#lines[@]}"

```

**Split string into array:**

```bash
text="one,two,three,four"
IFS=',' read -ra parts <<< "$text"
echo ${parts[0]}    # one
echo ${parts[1]}    # two

```

**Command output to array:**

```bash
files=($(ls *.txt))    # note: breaks on names with spaces; prefer files=(*.txt)
echo "Found ${#files[@]} text files"

```

**Check if element exists:**

```bash
fruits=("apple" "banana" "orange")

if [[ " ${fruits[*]} " =~ " banana " ]]; then    # surrounding spaces prevent partial matches
    echo "Found banana!"
fi

```

**Remove duplicates:**

```bash
original=("a" "b" "a" "c" "b")
unique=($(printf '%s\n' "${original[@]}" | sort -u))
echo ${unique[@]}    # a b c

```

***

#### Limitations

**No multi-dimensional arrays (natively):**

```bash
# Workaround: use associative array
declare -A matrix
matrix[0,0]=1
matrix[0,1]=2
matrix[1,0]=3
matrix[1,1]=4

echo ${matrix[0,1]}    # 2

```

**No true objects:**

```bash
# Bash doesn't have objects/classes;
# an associative array can stand in for a simple record
declare -A user=( [name]="Ann" [email]="ann@example.com" )
echo "${user[name]} <${user[email]}>"

```

***

#### Quick Reference

**Indexed Arrays:**

```bash
arr=("a" "b" "c")          # Create
echo ${arr[0]}             # Access element
echo ${arr[@]}             # All elements
echo ${#arr[@]}            # Length
arr+=("d")                 # Append
unset arr[1]               # Remove element

```

**Associative Arrays:**

```bash
declare -A map             # Declare
map[key]="value"           # Set
echo ${map[key]}           # Get
echo ${!map[@]}            # All keys
echo ${map[@]}             # All values
echo ${#map[@]}            # Length

```

***

### Bash Control Structures: Loops, Conditionals, and More

This guide covers the essential control structures in Bash scripting that allow you to create dynamic, decision-making scripts.

#### Conditionals

**If Statements**

The `if` statement lets you execute code based on conditions.

**Basic syntax:**

```bash
if [ condition ]; then
    # code to execute if condition is true
fi
```

**If-else:**

```bash
if [ condition ]; then
    # code if true
else
    # code if false
fi
```

**If-elif-else:**

```bash
if [ condition1 ]; then
    # code if condition1 is true
elif [ condition2 ]; then
    # code if condition2 is true
else
    # code if all conditions are false
fi
```

**Example:**

```bash
#!/bin/bash
age=25

if [ $age -lt 18 ]; then
    echo "Minor"
elif [ $age -lt 65 ]; then
    echo "Adult"
else
    echo "Senior"
fi
```

**String comparisons:**

* `=` : equal to
* `!=` : not equal to
* `-z` : string is empty
* `-n` : string is not empty

**File tests:**

* `-f` : file exists and is a regular file
* `-d` : directory exists
* `-r` : file is readable
* `-w` : file is writable
* `-x` : file is executable
* `-e` : file exists (any type)

**Example:**

```bash
if [ -f "/etc/passwd" ]; then
    echo "File exists"
fi

if [ "$name" = "Alice" ]; then
    echo "Hello, Alice!"
fi
```
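The empty/non-empty and directory tests follow the same pattern (a small sketch; the variable names are illustrative):

```bash
name=""
if [ -z "$name" ]; then
    echo "name is empty"
fi

dir="/tmp"
if [ -n "$dir" ] && [ -d "$dir" ]; then
    echo "$dir exists"
fi
```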

#### Case Statements

Use `case` for multiple conditions based on pattern matching.

```bash
case $variable in
    pattern1)
        # code for pattern1
        ;;
    pattern2)
        # code for pattern2
        ;;
    *)
        # default case
        ;;
esac
```

**Example:**

```bash
#!/bin/bash
fruit="apple"

case $fruit in
    apple)
        echo "It's an apple"
        ;;
    banana|orange)
        echo "It's a banana or orange"
        ;;
    *)
        echo "Unknown fruit"
        ;;
esac
```

### Loops

#### For Loop

Iterate over a list of items.

**Basic syntax:**

```bash
for variable in list; do
    # code to execute
done
```

**Examples:**

```bash
# Loop through items
for fruit in apple banana cherry; do
    echo "I like $fruit"
done

# Loop through numbers
for i in {1..5}; do
    echo "Number: $i"
done

# Loop through files
for file in *.txt; do
    echo "Processing $file"
done

# C-style for loop
for ((i=0; i<5; i++)); do
    echo "Count: $i"
done
```

#### While Loop

Execute code while a condition is true.

```bash
while [ condition ]; do
    # code to execute
done
```

**Example:**

```bash
#!/bin/bash
counter=1

while [ $counter -le 5 ]; do
    echo "Counter: $counter"
    ((counter++))
done
```

#### Until Loop

Execute code until a condition becomes true (opposite of while).

```bash
until [ condition ]; do
    # code to execute
done
```

**Example:**

```bash
#!/bin/bash
counter=1

until [ $counter -gt 5 ]; do
    echo "Counter: $counter"
    ((counter++))
done
```

#### Loop Control

* **break** : Exit the loop entirely
* **continue** : Skip to the next iteration

**Example:**

```bash
for i in {1..10}; do
    if [ $i -eq 5 ]; then
        continue  # Skip 5
    fi
    if [ $i -eq 8 ]; then
        break  # Stop at 8
    fi
    echo $i
done
# Output: 1 2 3 4 6 7
```

#### Functions

Define reusable blocks of code.

**Basic syntax:**

```bash
function_name() {
    # code
    return value  # optional, returns exit status (0-255)
}
```

**Example:**

```bash
#!/bin/bash

greet() {
    local name=$1  # local variable
    echo "Hello, $name!"
}

add_numbers() {
    local sum=$(($1 + $2))
    echo $sum
}

# Calling functions
greet "Alice"
result=$(add_numbers 5 3)
echo "Sum: $result"
```
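Since `return` sets an exit status rather than a real return value, a common pattern is to use the function directly in a condition, or inspect `$?` right after the call (a minimal sketch; `is_even` is an illustrative name):

```bash
is_even() {
    (( $1 % 2 == 0 ))    # the arithmetic result becomes the exit status
}

if is_even 4; then
    echo "4 is even"
fi

is_even 3
echo "Exit status for 3: $?"    # 1 (non-zero means false)
```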

**Practical examples**

**Example 1: File Backup Script**

```bash
#!/bin/bash

backup_dir="/backup"
source_dir="/home/user/documents"

if [ ! -d "$backup_dir" ]; then
    mkdir -p "$backup_dir"
    echo "Created backup directory"
fi

for file in "$source_dir"/*.txt; do
    if [ -f "$file" ]; then
        cp "$file" "$backup_dir/"
        echo "Backed up: $(basename "$file")"
    fi
done
```

**Example 2: Menu System**

```bash
#!/bin/bash

while true; do
    echo "1. List files"
    echo "2. Show date"
    echo "3. Exit"
    read -p "Choose option: " choice
    
    case $choice in
        1)
            ls -l
            ;;
        2)
            date
            ;;
        3)
            echo "Goodbye!"
            break
            ;;
        *)
            echo "Invalid option"
            ;;
    esac
    echo ""
done
```

***

### Practical Combined Example

**Complete backup workflow:**

```bash
#!/bin/bash
# Automated backup script

# Get today's date
backup_date=$(date +%Y%m%d)

# Find large files
echo "Finding files larger than 50MB..."
find /data -size +50M > large_files_$backup_date.txt

# Create compressed archive
echo "Creating backup..."
tar -czf backup_$backup_date.tar.gz /data --exclude='*.tmp'

# Sync to remote server
echo "Syncing to backup server..."
rsync -avz backup_$backup_date.tar.gz user@backup-server:/backups/

# Check system resources
echo "System status:"
top -b -n 1 | head -5

echo "Backup complete!"
```

***
