Commands and Scripting

Guide to Bash commands and scripting

This is not an exhaustive guide, so here are some additional sources of information in case you need them:


What is a Shell?

A shell is a command-line interpreter - it's the program that takes the commands you type and translates them into actions the operating system can understand. It's called a "shell" because it wraps around the operating system kernel, providing a user interface to access system functions.

Bash (Bourne Again Shell)

Bash is the most widely used shell, especially on Linux systems:

  • Default on most Linux distributions and older macOS versions

  • Written in C

  • Highly compatible - most shell scripts you find online are written for bash

  • Rich scripting capabilities with good documentation

  • Stable and mature - been around since 1989

  • Extensive history and tab completion

Zsh (Z Shell)

Zsh is a more modern shell with enhanced features:

  • Default on newer macOS (since Catalina)

  • Better autocompletion - more intelligent suggestions

  • Advanced globbing - more powerful pattern matching

  • Themes and plugins - highly customizable (especially with Oh My Zsh)

  • Better interactive features - spelling correction, shared history

You’ll likely use Bash or Zsh if you’re using macOS. To switch between them temporarily, just type the shell's name: bash or zsh.

But if you want to permanently change the default shell, use these commands:

Check which one you’re currently using:
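A quick sketch of both steps (chsh is commented out because it prompts for your password and changes your account settings):

```shell
# Print the login shell recorded for your account
echo $SHELL

# Print the shell that is actually running right now
echo $0

# Permanently change your default shell (uncomment to use):
#   chsh -s /bin/zsh
#   chsh -s /bin/bash
```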


Expansions

All kinds of expansions

Let's consider one type of expansion.

Brace Expansion

Brace expansion is a convenient Bash feature that generates multiple strings from a pattern containing braces. It happens before any other expansions and allows you to create multiple arguments or strings efficiently.

  • Increment: You can specify an increment in the brace expansion, such as {1..10..2} to get 1 3 5 7 9

  • Zero-padding: You can prefix numbers with 0 to force consistent width, e.g., {01..10} would expand to 01 02 ... 10.

Example 1

{1..10} utilizes brace expansion to generate a sequence of numbers from 1 to 10.

Printing the sequence.

This command outputs the numbers 1 through 10 separated by spaces.

Using in a for loop.

This loop will iterate, assigning each number from 1 to 10 to the variable i in turn, and print a message for each.
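The steps above can be sketched as:

```shell
# Printing the sequence
echo {1..10}
# Output: 1 2 3 4 5 6 7 8 9 10

# With an increment of 2
echo {1..10..2}
# Output: 1 3 5 7 9

# Zero-padded for consistent width
echo {01..10}

# Using the sequence in a for loop
for i in {1..10}; do
    echo "Processing number $i"
done
```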

Example 2

Nested Braces

You can nest brace expansions for more complex patterns:
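For example, the inner braces expand first and then combine with the outer pattern:

```shell
echo {a,b{1,2},c}         # a b1 b2 c
echo file{A,B}{1..2}.txt  # fileA1.txt fileA2.txt fileB1.txt fileB2.txt
```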

Practical Uses

Creating directories:

Backing up files:

Batch renaming or operations:
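A minimal sketch of all three uses (file and folder names here are made up for the demo):

```shell
# Creating directories: one command builds the whole tree
mkdir -p project/{data/{raw,processed},logs}

# Backing up a file: "config.txt{,.bak}" expands to "config.txt config.txt.bak"
echo "settings" > config.txt
cp config.txt{,.bak}

# Batch operations: create a set of similarly named files
touch report_{jan,feb,mar}.csv
```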

Important notes:

  • No variables in brace expansion. You cannot directly use variables within brace expansion for the start and end values. For example, echo {$from..$to} where from=1 and to=10 will not work as expected; it would literally output {$from..$to}. For variable-based ranges, consider using the seq command or a traditional for ((i=start; i<=end; i++)) loop.

  • Brace expansion doesn't use wildcards or match existing files—it just generates text

  • No spaces should appear inside the braces unless you want them in the output


Simple commands

Echo

echo - Print text to the screen

Appends a timestamped message to a log file
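For example (the log file name is arbitrary):

```shell
# Print text to the screen
echo "Hello, world"

# Append a timestamped message to a log file
echo "$(date '+%Y-%m-%d %H:%M:%S') pipeline started" >> pipeline.log
```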


Ls

ls - List the contents of a directory (shows files and folders).

Combinations

  • -l - Long format (detailed)

  • -a - Show all (including hidden)

  • -h - Human-readable sizes

  • -t - Sort by time

  • -S - Sort by size

  • -r - Reverse order

  • -R - Recursive (subdirectories)


Pwd

Print current working directory


Cd

cd - Change directory (move to a different folder)

Go to data pipeline folder and immediately list all files


Mkdir

mkdir - Make a new directory (create a folder)

Creates nested folder structure: data/raw, data/processed, data/archive
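A sketch of that command, combining -p with brace expansion:

```shell
# -p creates parent directories as needed; the braces expand to three paths
mkdir -p data/{raw,processed,archive}
```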


Rmdir

Remove empty directories only.

You can specify one or many directories to remove.

Remove nested empty directories:


Mv

mv - Move or rename files/folders

Move all CSV files to backup folder and count how many were moved
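A self-contained sketch (the CSV names are assumptions for the demo):

```shell
# Sample files and a backup folder
touch sales.csv users.csv notes.txt
mkdir -p backup

# Move all CSV files to backup, then count how many landed there
mv *.csv backup/ && ls backup/*.csv | wc -l
```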


Rm

The remove command (rm) deletes a file or empty directory:

Remove directory with contents (DANGEROUS):

You can optionally add a -r flag to tell the rm command to delete a directory and all of its contents recursively. "Recursively" is just a fancy way of saying "do it again on all of the subdirectories and their contents".

Remove with confirmation (safer practice):

Asks before deleting each file
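A sketch of all three forms, kept safe by operating only on scratch files it creates itself:

```shell
# Scratch files to delete
mkdir -p scratch
touch scratch/a.txt scratch/b.txt

# Remove a single file
rm scratch/a.txt

# Remove a directory and ALL of its contents recursively (dangerous!)
rm -r scratch

# -i asks before each deletion; here we pipe "y" in to answer the prompt
touch victim.txt
echo y | rm -i victim.txt
```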


Cp

cp - Copy files or folders

Copy entire data folder to backup with today's date in the name


Touch

touch - Create an empty file or update timestamp

Creates 10 files: file1.txt, file2.txt, ... file10.txt
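For example:

```shell
# Create ten empty files at once (brace expansion)
touch file{1..10}.txt

# Running touch on an existing file just refreshes its timestamps
touch file1.txt
```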

Every file has metadata that includes timestamps:

  • Last modified time - when the file content was last changed

  • Last accessed time - when the file was last opened

When you use touch on an existing file, it updates these timestamps to the current time WITHOUT changing the file's content.


Cat

The cat command is used to view the contents of a file. It's short for "concatenate", which is a fancy way of saying "put things together". It can feel like a confusing name if you're using the cat command to view a single file, but it makes more sense when you're using it to view multiple files at once.

You can do something like this:

This would read the contents of error.log and pipe it into the grep command, which searches for the word “date”.

Or this:

This would read the contents of example.txt and pipe it into wc command.
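The two pipelines above can be sketched as follows (the sample file contents are made up):

```shell
# Sample files matching the names used above
printf 'date: 2025-10-02\nERROR: disk full\n' > error.log
printf 'hello\nworld\n' > example.txt

# Pipe error.log into grep to search for "date"
cat error.log | grep date

# Pipe example.txt into wc (lines, words, bytes)
cat example.txt | wc

# And the original meaning of cat: concatenating several files
cat error.log example.txt
```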


Head/tail

Sometimes you don't want to print everything in a file. Files can be really big after all.

The head Command

The head command prints the first n lines of a file, where n is a number you specify.

If you don't specify a number, it will default to 10.

The tail Command

The tail command prints the last n lines of a file, where n is a number you specify.
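For example:

```shell
seq 1 20 > sample.txt    # a 20-line file to play with

head sample.txt          # first 10 lines (the default)
head -n 3 sample.txt     # first 3 lines
tail -n 3 sample.txt     # last 3 lines
```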


Less/more

less and more - they're both commands for viewing files, but less is the more powerful one.

The more and less commands let you view the contents of a file, one page (or line) at a time.

In the context of these commands, less is literally more. The less command does everything that the more command does but also has more features. As a general rule, you should use less instead of more.

You would only use more if you're on a system that doesn't have less installed.

more Command

The older, simpler file viewer:

What you can do:

  • Press Space - go to next page

  • Press Enter - go down one line

  • Press q - quit

  • That's basically it!

Limitations:

  • You can only scroll DOWN (not back up)

  • Once you pass something, you can't go back to see it

  • Less features overall

less Command

The newer, better file viewer:

What you can do:

  • Press Space or Page Down - go to next page

  • Press b or Page Up - go BACK up a page

  • Press Arrow keys - move up/down line by line

  • Press /searchterm - search for text

  • Press n - go to next search result

  • Press N - go to previous search result

  • Press g - go to beginning of file

  • Press G - go to END of file

  • Press q - quit

Why it's better:

  • You can scroll both up AND down

  • You can search within the file

  • It doesn't load the entire file into memory (great for huge files)

  • Much more control


Which

The which command is used in Unix-like systems (Linux, macOS) to find the full path of an executable file that would be run when you type a command. It searches through directories listed in your system's PATH environment variable to locate the specified program. For example, typing which ls would show the path to the ls command's executable file.

Uname

  • uname -a - Print all system information

  • uname -s - Print kernel name

  • uname -r - Print kernel release


Date

Show current date and time:

Output: Thu Oct 2 16:45:23 AQTT 2025

Show date in specific format:

Output: 2025-10-02

Show time only:

Output: 16:45:23
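The three commands behind those outputs:

```shell
date              # full date and time
date +%Y-%m-%d    # e.g. 2025-10-02
date +%H:%M:%S    # e.g. 16:45:23
```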


CURL

Transfer data to/from servers (Client URL).

What it does: curl is a command-line tool for making HTTP/HTTPS requests. It's like a browser, but for the terminal. You can download files, interact with APIs, send data, and test web services.

Think of it as: A programmable web browser for the command line

Basic GET request (fetch webpage):

Download data from a URL:

Download JSON data from API and save it to dataset.json file

Save with original filename:

Downloads and saves as 'dataset.csv' (keeps original name)

Working with APIs

GET request with headers:

Sends request with authentication and specifies JSON response

POST request with JSON data:

Creates new user by sending JSON data

POST data from file:

The @ symbol reads data from file

PUT request (update):

Updates user with ID 123

DELETE request:

Deletes user with ID 123
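The request shapes above can be sketched as follows. The base URL, paths, and token are all placeholders, and each command ends in `|| true` so the demo keeps going on a machine without network access:

```shell
BASE="https://api.example.com"   # placeholder; substitute a real endpoint

# Basic GET request
curl -s --max-time 5 "$BASE/users" || true

# Save the response to a named file, or keep the server's filename (-O)
curl -s --max-time 5 -o dataset.json "$BASE/dataset" || true
curl -s --max-time 5 -O "$BASE/dataset.csv" || true

# GET with headers (the token is a placeholder)
curl -s --max-time 5 -H "Authorization: Bearer TOKEN" \
     -H "Accept: application/json" "$BASE/users" || true

# POST JSON inline, or read the body from a file with @
curl -s --max-time 5 -X POST -H "Content-Type: application/json" \
     -d '{"name": "John"}' "$BASE/users" || true
curl -s --max-time 5 -X POST -H "Content-Type: application/json" \
     -d @payload.json "$BASE/users" || true

# PUT updates, DELETE removes
curl -s --max-time 5 -X PUT -d '{"name": "Jane"}' "$BASE/users/123" || true
curl -s --max-time 5 -X DELETE "$BASE/users/123" || true
```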

Why it's essential for data engineers:

  • Fetch data from APIs

  • Download datasets from URLs

  • Test API endpoints

  • Automate data ingestion

  • Monitor web services


Read

Bash style:

With timeout:

Waits 10 seconds for input; if there is no response, it continues with the defaults.

Zsh style:
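A sketch of both styles. The here-strings stand in for interactive typing so the example runs unattended:

```shell
# Bash style: -p shows a prompt
read -r -p "Enter your name: " name <<< "Alice"
echo "Hello, $name"

# With a timeout: wait up to 10 seconds, then fall back to a default
read -r -t 10 -p "Enter region: " region <<< "eu-west"
echo "Region: ${region:-us-east}"

# Zsh uses a different prompt syntax:
#   read "name?Enter your name: "
```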


Find

find - Search for files based on criteria

Example:

Finds all CSV files larger than 100MB modified in the last 7 days. mtime stands for “modified time”

Delete old log files:

Finds and deletes log files older than 30 days

Find and process files:

Finds all JSON files, counts lines in each, then sums them up
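The three patterns above, sketched against a small sample tree (file names are made up):

```shell
# Sample tree
mkdir -p logs
touch logs/app.json logs/db.json logs/old.log

# CSV files over 100MB modified in the last 7 days (-mtime -7)
find . -name "*.csv" -size +100M -mtime -7

# Delete log files older than 30 days
find logs -name "*.log" -mtime +30 -delete

# Count lines in every JSON file, then sum them up
find logs -name "*.json" -exec wc -l {} + | tail -n 1
```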


Tee

The tee command in Bash reads from standard input and writes to both standard output AND one or more files simultaneously. Think of it like a "T" pipe fitting in plumbing - the data flow splits in two directions.

Syntax:

  • tee [-ai] [file ...]

    • -a Append the output to the files rather than overwriting them.

    • -i Ignore the SIGINT signal.

    • file A pathname of an output file.

tee is almost always used with an upstream source because its whole purpose is to duplicate data flowing through a pipeline.

Typical usage pattern:

Common examples:

This is incredibly useful for logging command output while still monitoring it in real-time, or when you need to save intermediate results in a pipeline.
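For example:

```shell
# Print to the screen AND save to a file at the same time
echo "run started" | tee run.log

# -a appends instead of overwriting
echo "run finished" | tee -a run.log

# Save an intermediate result mid-pipeline while continuing downstream
printf '3\n1\n2\n' | sort | tee sorted.txt | head -n 1
```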

Can you use tee without a pipe?

Technically yes, but it's uncommon:

SIGINT

SIGINT is a signal (Signal Interrupt) sent to a process, typically when you press Ctrl+C in the terminal. It's a request for the program to terminate gracefully. The process can catch this signal and handle it (e.g., clean up resources before exiting) or ignore it.

Common signals include:

  • SIGINT (2): Interrupt from keyboard (Ctrl+C)

  • SIGTERM (15): Termination request

  • SIGKILL (9): Forceful kill (cannot be caught or ignored)

Explanation of the -i option:

  • By default: If tee receives a signal like SIGINT (Ctrl+C), it does what any normal program would do - it terminates immediately

  • With the -i option: The -i flag tells tee to ignore SIGINT signals

Why is this useful?

Imagine you have a long-running command pipeline:

If you press Ctrl+C, SIGINT goes to all processes in the pipeline. Without -i, tee would stop immediately, breaking the pipeline. With -i:

Now tee will ignore Ctrl+C and keep running, allowing the data flow to continue even if you accidentally hit Ctrl+C or intentionally want to stop only certain parts of the pipeline.


Tar

Archive and compress files

Create compressed backup with exclusions:

Creates compressed archive excluding temp files and cache folder, with date in filename

Extract to specific directory:

Extracts compressed archive to a specific location

List contents without extracting:

Shows only CSV files inside the archive without extracting
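The three operations above, sketched end to end (folder and file names are assumptions):

```shell
# Sample data with a file we want excluded
mkdir -p data
echo "a,b" > data/report.csv
touch data/cache.tmp

# Create a compressed archive, excluding temp files, with the date in the name
tar -czf "backup_$(date +%Y%m%d).tar.gz" --exclude="*.tmp" data

# List only the CSV files inside, without extracting
tar -tzf backup_*.tar.gz | grep '\.csv$'

# Extract into a specific directory (-C changes there first)
mkdir -p restore
tar -xzf backup_*.tar.gz -C restore
```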


rsync - Remote/local file synchronization tool

What it does: rsync is a file copying/syncing tool that only transfers the differences between source and destination. It's much smarter and faster than the regular cp command, especially for large files or when syncing repeatedly.

Think of it as: Smart copy that only updates what changed

Key advantages over cp:

  • Only copies changed files (not everything)

  • Can resume interrupted transfers

  • Shows progress

  • Works over network (SSH)

  • Preserves permissions, timestamps, ownership

  • Can delete files in destination that don't exist in source

Basic syntax:

Important note about trailing slashes:

Common Examples

Simple local sync:

Syncs data folder to backup (archive mode, verbose)

Sync with progress bar:

Shows progress, human-readable sizes

Sync to remote server:

Syncs from remote server to local machine (with compression)


Zip / unzip

Compress and extract zip files

Create zip with password protection:

Creates encrypted zip file (will prompt for password)

Zip multiple directories:

Combines multiple folders into one zip file

Unzip to specific directory:

Extracts zip contents to specific location

List contents without extracting:

Unzip specific file:

Extracts only one specific file from the zip


Gzip and gunzip - Compress and decompress files

What they do:

  • gzip compresses files (makes them smaller)

  • gunzip decompresses files (restores original)

Think of it as: ZIP files for Linux (but only for single files)

File extension: .gz

gzip - Compress files

Basic syntax:

What happens:

  • Original file gets compressed

  • Creates filename.gz

  • Original file is DELETED (replaced with compressed version)

Simple Examples

Compress a file:

Creates data.csv.gz, deletes data.csv

Keep original file:

Creates data.csv.gz, KEEPS data.csv

Compress multiple files:

Each file becomes file1.txt.gz, file2.txt.gz, file3.txt.gz

gunzip - Decompress files

Basic syntax:

What happens:

  • Compressed file gets decompressed

  • Creates original filename

  • Compressed file is DELETED

Simple Examples

Decompress a file:

Creates data.csv, deletes data.csv.gz

Keep compressed file:
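A round trip through all four forms:

```shell
echo "id,value" > data.csv

gzip data.csv           # creates data.csv.gz, deletes data.csv
gunzip data.csv.gz      # restores data.csv, deletes data.csv.gz

gzip -k data.csv        # -k keeps the original next to data.csv.gz
gunzip -kf data.csv.gz  # decompress but keep the .gz (-f overwrites data.csv)
```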


Top

Real-time system monitoring

Basic usage:

Shows live view of processes, CPU, memory usage

Once inside top:

  • Press M - sort by memory usage

  • Press P - sort by CPU usage

  • Press k - kill a process (then enter PID)

  • Press q - quit

Run top in batch mode (for logging):

Takes one snapshot of system state and saves the top 20 lines to a file

Monitor specific user's processes:

Shows only processes belonging to specific user

Show only specific number of processes:

Shows top 15 processes once (useful for scripts)

Alternative: htop (more user-friendly if installed):

Interactive, colorful, easier to use than top


Awk - Pattern scanning and text processing tool

What it does: awk is a powerful programming language designed for processing text files, especially structured data like CSV files. It works by reading files line-by-line and letting you perform operations on specific columns (fields).

Think of it as: Excel formulas for the command line

Best for:

  • Extracting specific columns from CSV/tab-delimited files

  • Performing calculations on data (sum, average, count)

  • Filtering rows based on conditions

  • Reformatting structured data

Simple example:

Prints columns 1 and 3 from a CSV file

More complex example:

For rows where column 3 > 100, calculate the average of column 4
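The two examples sketched against a small sample CSV (the columns name, region, units, price are assumptions for the demo):

```shell
printf 'alice,eu,150,10\nbob,us,90,20\ncarol,eu,200,30\n' > sales.csv

# Print columns 1 and 3
awk -F',' '{print $1, $3}' sales.csv

# For rows where column 3 > 100, compute the average of column 4
awk -F',' '$3 > 100 {sum += $4; count++} END {print sum / count}' sales.csv
# Output: 20
```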

How it works:

  • -F',' = set the field separator to a comma (for CSV files)

  • $1, $2, $3 = column 1, column 2, column 3

  • $0 = entire line

  • You can use conditions, loops, and calculations


Sed - Stream editor for find/replace and text transformation

What it does: sed is a tool for editing text in a stream (line by line). It's most commonly used for find-and-replace operations, but can also delete lines, insert text, and transform data.

Think of it as: Find and Replace on steroids

Best for:

  • Finding and replacing text in files

  • Deleting specific lines

  • Extracting specific line ranges

  • Modifying text without opening an editor

Simple example:

Replaces all occurrences of "old" with "new"

More complex example:

Converts date format from YYYY-MM-DD to DD/MM/YYYY
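Both examples sketched on a small sample file:

```shell
printf 'old value\ndate: 2025-10-02\n' > notes.txt

# Replace every "old" with "new" (writes to stdout, file untouched)
sed 's/old/new/g' notes.txt

# Convert YYYY-MM-DD to DD/MM/YYYY with capture groups
sed -E 's|([0-9]{4})-([0-9]{2})-([0-9]{2})|\3/\2/\1|' notes.txt

# -i edits the file in place (on macOS the form is: sed -i '')
sed -i 's/old/new/g' notes.txt
```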

Common operations:

  • s/find/replace/g = substitute (find and replace)

  • /pattern/d = delete lines matching pattern

  • 10,20d = delete lines 10-20

  • -i = edit the file in place (modify the actual file)


Time

What it does: time measures how long a command takes to run. It shows three different time measurements.

Basic usage:

Example:

Output explanation:

  • real = actual time that passed (what you'd see on a stopwatch)

  • user = time CPU spent running your program

  • sys = time CPU spent on system operations (file I/O, etc.)

Real-world examples:

Save timing to variable:
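A minimal sketch, including capturing the timing output (which time writes to stderr):

```shell
# "real" = wall clock, "user"/"sys" = CPU time
time sleep 0.2

# Capture the timing with a brace group and a stderr redirect
{ time sleep 0.2 ; } 2> timing.txt
cat timing.txt
```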


Diff

Common Uses

Compare two files:

Side-by-side comparison:

Shows files next to each other

Unified format (like Git):

Shows context around changes

Ignore whitespace differences:

Compare directories:

Shows which files are different

Brief output (just show which files differ):

Colorized output:

Understanding diff Output

Format: <line_number><action><line_number>

  • a = add

  • c = change

  • d = delete
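A demo of that output format. diff exits non-zero when files differ, so each call ends in `|| true` to keep the example running:

```shell
printf 'one\ntwo\nthree\n' > a.txt
printf 'one\nTWO\nthree\nfour\n' > b.txt

diff a.txt b.txt || true
# "2c2" = line 2 changed; "3a4" = after line 3, a line was added

diff -u a.txt b.txt || true   # unified format (like Git)
diff -y a.txt b.txt || true   # side by side
diff -w a.txt b.txt || true   # ignore whitespace differences
```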


Grep

grep - Search text using patterns

Basic search

Case-insensitive search

Search recursively in directories

Show line numbers

Invert match (show lines that don't match)

You can also search multiple files at once. For example, if we wanted to search for the word "hello" in hello.txt and hello2.txt, we could run:

Recursive Search

You can also search an entire directory, including all subdirectories. For example, to search for the word "hello" in the current directory and all subdirectories:

The . is a special alias for the current directory.
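All the searches above, sketched against two small sample files:

```shell
printf 'Hello there\ngoodbye\nhello again\n' > hello.txt
printf 'hello world\n' > hello2.txt

grep "hello" hello.txt              # basic, case-sensitive search
grep -i "hello" hello.txt           # case-insensitive
grep -n "hello" hello.txt           # with line numbers
grep -v "hello" hello.txt           # invert: lines NOT matching
grep "hello" hello.txt hello2.txt   # several files at once
grep -r "hello" .                   # recursive through the current directory
```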


Sort

The sort command in Linux/Unix is used to sort lines of text files or input in various ways.

Basic Usage

Common Options

Sort Order:

Numeric Sorting:

Case Sensitivity:

Unique Values:
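The options above, sketched on small sample inputs:

```shell
printf '10\n2\n33\n2\n' > nums.txt

sort nums.txt      # lexicographic: 10, 2, 2, 33
sort -n nums.txt   # numeric: 2, 2, 10, 33
sort -nr nums.txt  # numeric, descending
sort -nu nums.txt  # numeric, duplicates removed

printf 'Banana\napple\n' > words.txt
sort -f words.txt  # fold case: apple sorts before Banana
```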

Real-World Use Cases

Find top 10 largest files:

Sort log entries by timestamp:

Get unique IP addresses from logs:

Sort processes by memory usage:


Uniq

The uniq command filters out or reports repeated lines. Important: It only detects adjacent duplicates, so the input usually needs to be sorted first.

Basic Syntax

Basic Usage

Common Options

Count occurrences:

Show only duplicates:

Show only unique lines:

Ignore case:
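The options above in one sketch; note every pipeline sorts first, since uniq only collapses adjacent duplicates:

```shell
printf 'a\nb\na\na\nc\n' > letters.txt

sort letters.txt | uniq      # a b c
sort letters.txt | uniq -c   # prefix each line with its count
sort letters.txt | uniq -d   # only repeated lines: a
sort letters.txt | uniq -u   # only lines appearing exactly once: b c
```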


Cut

The cut command extracts sections from each line of files - great for working with columnar data.

Basic Syntax

Cutting by Characters

Cutting by Fields (Columns)

Custom Delimiters

Practical Examples
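A sketch covering characters, fields, and a custom delimiter (the sample CSV is made up):

```shell
printf 'alice,30,paris\nbob,25,london\n' > people.csv

cut -d',' -f1 people.csv     # first field of each line
cut -d',' -f1,3 people.csv   # fields 1 and 3
cut -c1-3 people.csv         # first three characters of each line
```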


jq - JSON processor and query tool

What it does: jq is like grep, sed, and awk combined, but specifically for JSON data. It lets you parse, filter, transform, and extract data from JSON files or API responses.

Think of it as: SQL queries for JSON

Why it's essential for data engineering:

  • Most APIs return JSON

  • Modern logs are often in JSON format

  • Easy to extract specific fields from complex JSON

Basic syntax:

Common Examples

Pretty print JSON:

Makes JSON readable with proper indentation

Extract a specific field:

Output: "John"

Extract nested field:

Gets city from nested structure

Extract from array:

Gets name from first item in array

Extract multiple fields:

Output: "John" and 30 on separate lines

Filter array based on condition:

Shows only users older than 25

Create new JSON structure:

Transforms JSON with new field names

Extract to CSV:

Converts JSON array to CSV format

Count items in array:

Returns number of items

Get all values of a specific field:

Extracts all names from array of users

Filter and transform:

Gets only active users, shows only id and name fields
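The common patterns above, sketched against a made-up users.json (field names are assumptions). The whole demo is guarded because jq isn't installed everywhere:

```shell
cat > users.json <<'EOF'
[{"name": "John", "age": 30, "active": true},
 {"name": "Ann", "age": 22, "active": false}]
EOF

if command -v jq > /dev/null; then
    jq '.' users.json                        # pretty-print
    jq '.[0].name' users.json                # "John"
    jq '.[].name' users.json                 # every name
    jq 'length' users.json                   # 2
    jq '.[] | select(.age > 25)' users.json  # filter by condition
    jq -r '.[] | [.name, .age] | @csv' users.json   # JSON -> CSV rows
fi
```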

Real-World Data Engineering Example

  1. Extract data from API response:

Fetches API data and extracts all emails

  1. Convert JSON logs to CSV:

  1. Filter error logs:

  1. Count errors per day:

  1. Extract nested data:

  1. Combine multiple JSON files:

The -s flag slurps all files into one array


du

Disk Usage (check how much space files/folders use)

What it does: du shows how much disk space files and directories are using. It's essential for finding what's eating up your storage.

Think of it as: A disk space analyzer for the command line

Simple Examples

Check size of current directory:

Shows size of current directory and all subdirectories (in kilobytes)

Check size of specific folder:

Human-readable sizes:

Shows sizes as 1K, 234M, 2G instead of kilobytes

Summary only (total size):

Shows just one line with total size
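For example:

```shell
mkdir -p data
echo "some content" > data/file.txt

du data      # per-directory usage in 1K blocks
du -h data   # human-readable units
du -sh data  # one summary line with the total
```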

Check multiple folders:

Find largest datasets:

Shows all folders in /data sorted by size

Check database size:

du Common Flags

  • -h - Human-readable (KB, MB, GB)

  • -s - Summary only (total)

  • -a - Show all files (not just directories)

  • -c - Show grand total at end

  • -d N - Max depth of N levels

  • --max-depth=N - Same as -d N

  • -k - Show in kilobytes

  • -m - Show in megabytes


history

View history of previously run commands.


ln - Create links

What it does: Creates links to files or directories. There are two types: hard links and symbolic (soft) links.

Think of it as: Creating shortcuts or aliases to files

Symbolic Links (Soft Links) - Most Common

Create a symbolic link:

Example:

Creates a shortcut called report_link.txt that points to the original file

Link to directory:

Check if it's a link:

Output shows: lrwxr-xr-x ... report_link.txt -> /home/user/documents/report.txt

Create shortcut to frequently used directory:
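The examples above, sketched with made-up names:

```shell
mkdir -p documents
echo "quarterly numbers" > documents/report.txt

# Symbolic link to a file; reading the link reads the target
ln -s documents/report.txt report_link.txt
cat report_link.txt

# Symbolic link to a directory
ln -s documents docs_link

# Long listing shows the arrow to the target
ls -l report_link.txt
```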


su - Switch User

What it does: Switches to another user account. Stands for "substitute user" or "switch user".

Basic syntax:

Switch to root:

Prompts for root password

Switch to root with root's environment:

The dash (-) loads root's environment variables and home directory

Switch to specific user:

Prompts for bob's password

Exit back to your user:


sudo - Execute command as another user (usually root)

What it does: Runs a single command with elevated privileges (usually as root). Stands for "superuser do".

Think of it as: Temporary admin powers for one command

Basic syntax:

Example:

Runs apt update as root

Common Uses

Install software:

Edit system files:

View protected files:


Operators

Arithmetic Operators

  • + - Addition

  • - - Subtraction

  • * - Multiplication

  • / - Division

  • % - Modulus (remainder)

  • ** - Exponentiation

Example:
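All six operators inside Bash arithmetic expansion:

```shell
a=7; b=3
echo $((a + b))   # 10
echo $((a - b))   # 4
echo $((a * b))   # 21
echo $((a / b))   # 2 (integer division truncates)
echo $((a % b))   # 1
echo $((a ** b))  # 343
```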

Comparison Operators

For numeric comparisons

For string comparisons

Example:
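A sketch of both kinds of comparison:

```shell
a=5; b=10

# Numeric comparisons: -eq, -ne, -gt, -lt, -ge, -le
if [ "$a" -lt "$b" ]; then
    echo "$a is less than $b"
fi

# String comparisons: = and != (quote variables to survive empty values)
name="admin"
if [ "$name" = "admin" ]; then
    echo "welcome, admin"
fi
```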

Logical Operators

Example:

&& - Run next command ONLY if previous succeeds

Note: && operator signifies conditional execution. The core function of && is to create a dependency between commands. The command to the right of && will only run if the command to its left exits with a status of 0. In Bash, an exit status of 0 conventionally signifies success, while any non-zero exit status indicates failure.

If mkdir fails (e.g., the directory already exists), cd will not be attempted.

Short-circuiting: The && operator exhibits "short-circuiting" behavior. If the first command fails, Bash immediately stops evaluating the expression and does not execute the subsequent commands linked by &&. This is efficient as it avoids unnecessary operations.
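For example (the second line previews || as a fallback so the demo always exits cleanly):

```shell
# cd is attempted only if mkdir succeeds
mkdir reports && cd reports && pwd
cd ..

# Here mkdir fails (the directory exists), so "created" never prints
mkdir reports 2>/dev/null && echo "created" || echo "already exists"
```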

|| - Run next command ONLY if previous fails

Syntax:

Behavior:

  • command2 runs ONLY if command1 exits with non-zero (failure)

  • If command1 succeeds, command2 never runs

Examples:

; (Semicolon) - Run next command REGARDLESS

Syntax:

Behavior:

  • command2 runs no matter what

  • Doesn't care if command1 succeeded or failed

Examples:

! (NOT) - Negate exit status

Syntax:

Examples:

Combining Operators

AND then OR:

If command1 succeeds, run command2; if either fails, run command3

Example:

Grouping with parentheses:

Real-World Examples

Safe script execution:
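The operators above combined in one sketch (patterns and file names are made up):

```shell
# || runs the right side only when the left side fails
grep -q "no-such-pattern" /etc/hosts || echo "pattern not found"

# ; runs the next command regardless of success
echo "first" ; echo "second"

# ! negates an exit status
if ! grep -q "no-such-pattern" /etc/hosts; then
    echo "definitely not there"
fi

# command1 && command2 || command3: success path, then fallback
[ -f config.txt ] && echo "config found" || echo "no config, using defaults"

# Parentheses group commands (they run in a subshell)
( echo "backing up" && mkdir -p backup ) || echo "backup failed"
```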


Permissions

Each file and directory in Unix systems has permissions associated with them.

You have to ask 2 questions when talking about permissions:

  1. Who has the permissions?

  2. What permissions do they have?

    1. Any user accessing a specific file/directory may or may not have access to read it, write to it, or execute it.

Both aspects, i.e. the who and the what, are represented by a 10-character string. Here are examples for each type of file:

  • Regular Files

  • Directories

  • Special Files

What do these characters mean?

  • The first character is either - or d (or another letter for special files, e.g. l for a symbolic link), so a user can recognize whether it's a directory or not.

Regular file (e.g. -rwxrwxrwx)

Directory (e.g. drwxrwxrwx)

  • The next 3 characters, r, w, and x, represent the three permissions - read, write, and execute. Who do they apply to? The owner, i.e. the user who created the file (unless ownership was changed manually afterwards).

    • Each permission has a state: granted or not granted. If it's granted, the letter is present; if not, a - appears in its place. Example: r-x means the owner can read and execute but not write.

  • Finally, the next 6 characters are another 2 sets of rwx. The second set of rwx applies to the group instead of the owner. And the last set applies to everyone else.


Changing permissions

For more information: https://www.stationx.net/linux-file-permissions-cheat-sheet/#def-per

chmod command (stands for "change mode")

Example: chmod -R u=rwx,g=,o= DIRECTORY. This means:

  • The owner can read, write, and execute

  • The group can do nothing

  • Others can do nothing

In the command above, u means "user" (aka "owner"), g means "group", and o means "others". The = means "set the permissions to the following", and the rwx means "read, write and execute". The g= and o= mean "set group and other permissions to nothing". The -R means "recursively", which means "do this to all of the contents of the directory as well".

Remember, . is a special alias for the current directory.

There is symbolic and numeric notations for permission definition:

Symbolic notation:

  • u = user/owner

  • g = group

  • o = other

  • a = all (user + group + other)

Numeric notation:

  • First digit = owner permissions (instead of the first three letters)

  • Second digit = group permissions (instead of the second three letters)

  • Third digit = other permissions (instead of the third three letters)

So chmod 755 file means:

  • 7 (rwx) for owner

  • 5 (r-x) for group

  • 5 (r-x) for other

Common Permission Patterns

  • 755: Executable files (owner can do everything, others can read/execute)

  • 644: Regular files (owner can read/write, others read-only)

  • 600: Private files (only owner can read/write)

  • 777: Full access for everyone (generally avoided for security)
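Both notations sketched on scratch files:

```shell
touch script.sh private.txt

# Symbolic: owner gets rwx, group and others get nothing
chmod u=rwx,g=,o= script.sh

# Numeric equivalents
chmod 755 script.sh    # rwxr-xr-x
chmod 600 private.txt  # rw-------

ls -l script.sh private.txt
```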


chown (stands for “change owner”)

What it does: chown changes who owns a file or directory. In Unix/Linux systems, every file has an owner (user) and a group. This command lets you change either or both.

Think of it as: Transferring ownership of files to different users

Why it matters:

  • Control who can access/modify files

  • Fix permission issues

  • Set up proper access for web servers, databases, etc.

  • Essential for multi-user systems and servers

Basic Syntax

or


Running scripts

Creating a script

A "shebang" is a special line at the top of a script that tells your shell which program to use to execute the file - in other words, which interpreter to run it with.

The format of a shebang is:

For example, if your script is a Python script and you want to use Python 3, your shebang might look like this:

This tells the system to use the Python 3 interpreter located at /usr/bin/python3 to run the script.

If you're writing scripts that need to work on bash and zsh shells, use the portable version:
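For example, a Python 3 script would start with #!/usr/bin/python3, and the portable Bash form is #!/usr/bin/env bash (env finds bash wherever it is installed). A minimal end-to-end sketch:

```shell
# Create a tiny script with a portable shebang, make it executable, run it
cat > hello.sh <<'EOF'
#!/usr/bin/env bash
echo "Hello from $0"
EOF
chmod +x hello.sh
./hello.sh
```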

Running a script

If the program is in the current directory, you need to prefix it with ./ to run it:

  • ./program.sh

Some scripts are written so they can't be stopped with Ctrl+C: the script traps the SIGINT signal itself. A common pattern is to enable this only when an extra argument is passed, e.g. ./program.sh force. Note that this behavior comes from logic inside the script (a trap), not from the shell.

Writing Professional Bash Script Headers

When creating Bash scripts for professional or collaborative environments, including a well-structured header makes your code more maintainable and easier to understand. Here's how to document your scripts effectively.

Header Placement

Place your documentation header immediately after the shebang line (#!/bin/bash) and before any executable code. Use the # symbol to create comments that won't be executed.

Essential Header Information

A professional script header should include these five key pieces of information:

  1. Author - Who wrote the script (name or username)

  2. Creation Date - When the script was originally created

  3. Last Modified - When the script was last updated

  4. Description - A brief explanation of what the script does

  5. Usage - How to run the script, including any arguments or flags

Why This Matters

Including these details helps anyone who encounters your script (including your future self) quickly understand:

  • Its purpose and functionality

  • Who to contact with questions

  • Whether it's current or potentially outdated

  • How to execute it correctly

Example Header
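A template with the five pieces above; the name, dates, and script purpose are placeholders to fill in:

```shell
#!/bin/bash
# Author:        Jane Doe
# Created:       2025-10-02
# Last Modified: 2025-10-02
# Description:   Backs up the data directory into a dated archive.
# Usage:         ./backup.sh [target_directory]
```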

Additional Considerations

For more complex scripts, you might also include:

  • Version number for tracking script evolution

  • Dependencies listing required tools or packages

  • License information for shared or open-source code

  • Contact information such as email or support channels

Adopting this convention from the start establishes good habits and makes your scripts production-ready.


Shell configuration

Bash and Zsh both have configuration files that run automatically each time you start a new shell session. These files are used to set up your shell environment. They can be used to set up aliases, functions, and environment variables.

These files are located in your home directory (~) and are hidden by default. The ls command has a -a flag that will show hidden files:

  • If you're using Bash, .bashrc is probably the file you want to edit.

  • If you're using Zsh, .zshrc is probably the file you want to edit or create if it doesn't yet exist.


Environment variables

Apart from regular variables, there is another type of variable called an environment variable. They are available to all programs that you run in your shell.

You can view all of the environment variables that are currently set in your shell with the env command.

To set a variable in your shell, use the export command:


What's particularly useful is that any programs or scripts you execute in your shell will inherit access to these environment variables.

To demonstrate this, let's create a simple script file named greet.sh:

Now we can make it executable and run it:
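All of the above in one sketch, including the single-command form of setting a variable (GREETING is an arbitrary example name):

```shell
# Export a variable so child processes inherit it
export GREETING="Hello from the environment"

# greet.sh simply prints the variable it inherits
cat > greet.sh <<'EOF'
#!/usr/bin/env bash
echo "$GREETING"
EOF
chmod +x greet.sh
./greet.sh

# Set a variable for a single command only (not kept in the session)
GREETING="just this once" ./greet.sh
```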


You can also temporarily set a variable for a single command, instead of exporting it (exporting means the variable will persist until you close the shell).

For example:


Your shell comes with several environment variables that are essentially "standard" - meaning various programs and system components recognize and utilize them automatically. The PATH variable is a prime example of this.

Why is the PATH Variable Important?

Without the PATH variable, you'd need to specify the complete filesystem location for every command you want to execute. Rather than simply typing ls, you'd be forced to type /bin/ls (or wherever the ls program lives on your particular system). This would be extremely tedious.

The PATH variable contains a collection of directory paths that your shell searches through whenever you enter a command. When you type ls, your shell examines each directory listed in PATH looking for an executable file named ls. Once found, it executes that program. If no matching executable is discovered, you'll receive a "command not found" error.

You can view your current PATH setting with this command:

This will display a long string of directory paths separated by colons (:). Each path represents a location where your shell searches for executable programs.

Note: Restarting your shell session will reset the PATH variable to its default.

Adding a directory to PATH

To add a directory to your PATH without overwriting all of the existing directories, use the export command and reference the existing PATH variable:
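For example, using ~/bin as a placeholder directory:

```shell
# View the current search path
echo $PATH

# Append a directory without clobbering the existing entries
mkdir -p "$HOME/bin"
export PATH="$PATH:$HOME/bin"
echo $PATH
```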

As you know, this is a temporary change that lasts until your session is closed; after that, the directory is dropped from PATH again and its executables can no longer be run by name alone.

Permanently adding a directory to PATH

The most common way to do this is to add the same export command shown above to your shell's configuration file (e.g. ~/.bashrc for bash or ~/.zshrc for zsh).
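A sketch for zsh (use ~/.bashrc instead of ~/.zshrc if your shell is bash; the directory is illustrative):

```shell
# Append the export line to the config file, then reload it
echo 'export PATH="$PATH:$HOME/bin"' >> ~/.zshrc
source ~/.zshrc
```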


Man command

The man command is short for "manual". It's a program that displays the manual for other programs.

The man command functions only with programs that have documentation available in the manual system, though fortunately this includes most shell built-ins and standard Unix utilities. To use it, simply provide the command name as an argument. The logical starting point is to examine the manual for the manual system itself:

How to search for what you need:
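A sketch of common ways to find and search documentation:

```shell
man man                # read the manual for man itself
man -k copy || true    # keyword search across page descriptions (same as apropos); may find nothing

# While a page is open: type /pattern to search forward, n for the next match, q to quit
```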


Command flag conventions

The availability and nature of command flags depends entirely on how each program's developer designed it. However, most Unix commands follow established patterns:

  • Single-letter flags use one dash as a prefix (e.g., -v)

  • Word-based flags use two dashes as a prefix (e.g., --version)

  • Many commands offer both short and long versions of the same option (e.g., -v and --version)

Help flag

Standard practice among mature command-line applications is to include a "help" feature that displays usage instructions. This assistance is typically accessible through one of these methods:

  • --help (long flag format)

  • -h (short flag format)

  • help (as the initial argument)

The help output tends to be more digestible than comprehensive man documentation. Rather than serving as exhaustive reference material, it functions more like a concise getting-started tutorial.
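For example, with grep (the exact output wording varies between implementations):

```shell
grep --help | head -n 3   # long flag form: prints a concise usage summary
# Many tools also accept -h; for bash builtins, use e.g.: help cd
```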


Nano editor

  • Ctrl+O to save the file (confirm any prompts with "enter")

  • Ctrl+X to exit the editor.

There should be a list of commands at the bottom of the screen.


Program Exit Codes

Exit codes (also known as "return codes" or "status codes") serve as a communication mechanism for programs to indicate whether their execution completed successfully.

A program returns 0 to signal successful completion. All other exit codes indicate some form of failure or error condition. In most cases when something goes wrong, you'll see exit code 1, which serves as a general-purpose error indicator.

These exit codes enable programs to monitor and respond to the success or failure of other programs they execute. For instance, at Boot.dev, our monitoring system checks the exit code of our server application - if it terminates with a non-zero code, our monitoring automatically restarts the service and records the failure for investigation.

Within your shell environment, you can examine the exit code from the most recently executed command using the special variable $?. Here are some practical examples:
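```shell
ls /tmp             # a command that succeeds
echo $?             # prints 0

ls /no/such/dir     # a command that fails
echo $?             # prints a non-zero code (e.g. 2 for GNU ls)
```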


Standard output (stdout), Standard error (stderr), Standard input (stdin)

Redirecting Streams

You can redirect stdout and stderr to different places using the > and 2> operators. > redirects stdout, and 2> redirects stderr.

Capturing Standard Output to a File
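A sketch (the file path is illustrative):

```shell
echo "hello" > /tmp/out.txt   # stdout is written to the file instead of the screen
cat /tmp/out.txt              # hello
```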

Note: use >> to append to a file instead of rewriting it.

Capturing Error Output to a File
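A sketch of the idea:

```shell
ls /does/not/exist 2> errors.log   # stderr is redirected into errors.log
cat errors.log                     # shows the "No such file or directory" message
```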

In this demonstration, ls is used to deliberately trigger an error message (attempting to list a directory that doesn't exist), and this error output gets redirected into errors.log.

Standard input

Since we have standard output, it makes sense that there would also be standard input, correct?

"Standard Input," commonly referred to as "stdin," represents the default source from which programs receive their input data. It functions as a data stream that applications can consume during their execution.

Note: The read command prompts for and accepts user input from stdin (standard input).
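A sketch of read consuming stdin (here supplied by a pipe rather than the keyboard):

```shell
echo "Alice" | {
  read name               # read takes one line from stdin into the variable
  echo "Hello, $name"     # prints: Hello, Alice
}
```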


Piping

Among the shell's most elegant features is the ability to chain programs together by sending one program's output directly into another program's input. This single mechanism enables remarkably sophisticated automation workflows.

The Pipe Operator

The pipe symbol is | - a vertical line character typically found on the same key as the backslash (\) above your enter key. This operator captures the stdout from the command on its left side and feeds it as stdin to the command on its right side.
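```shell
echo "I find your lack of faith disturbing" | wc -w   # prints 7
```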

In this demonstration, the echo command produces the text "I find your lack of faith disturbing" as its output. Rather than displaying this text in your terminal, the pipe operator redirects it to the wc (word count) utility. The wc program tallies the words in whatever input it receives, and the -w flag instructs it to report only the word count.

This functionality works because wc, like most command-line utilities, can accept input from stdin as an alternative to reading from a file path.


Xargs

xargs is a powerful bash command that builds and executes commands from standard input. It's particularly useful for handling situations where you need to pass a large number of arguments to a command, or when you want to convert input into arguments for another command.

Basic Concept

xargs reads items from standard input (separated by spaces or newlines) and passes them as arguments to another command. Think of it as a bridge that converts input lines into command arguments.

Simple Examples
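A couple of sketches (the input values are illustrative):

```shell
# Collect all input lines into the argument list of a single echo
printf 'a\nb\nc\n' | xargs echo          # runs: echo a b c

# Run the command once per item, substituting {} with each input line
printf '1\n2\n' | xargs -I{} echo "item {}"
```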


Interrupt and Kill

Interrupt

Occasionally, a running program will become unresponsive or you'll need to terminate it. This typically happens when:

  • The command contains an error and isn't behaving as expected

  • The program is attempting network operations while you're offline

  • You're processing large datasets and decide not to wait for completion

  • A software defect is causing the application to freeze

When you encounter these situations, you can terminate the program using ctrl + c. This keyboard combination sends a "SIGINT" (interrupt signal) to the running process, instructing it to terminate gracefully.

Kill

Occasionally, a program becomes completely unresponsive (or behaves maliciously) and ignores the SIGINT signal entirely. When this occurs, your best approach is to open a separate shell session (another terminal window) and forcibly terminate the problematic process.

Command Format

PID represents "process ID" - a unique numerical identifier assigned to every running process on your system. To discover the process IDs currently active on your machine, you can use the ps ("process status") command:

The "aux" flags specify "display all processes, including those belonging to other users, with detailed information for each process".
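A sketch of the full workflow, using a background sleep as a harmless stand-in for a stuck process:

```shell
sleep 300 &               # start a long-running background process
pid=$!                    # $! holds the PID of the last background command

ps aux | grep sleep | head -n 1   # locate it in the process list

kill "$pid"               # sends SIGTERM; use kill -9 "$pid" (SIGKILL) as a last resort
```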


More about Scripting

Positional arguments

Positional arguments allow your script to accept input from the command line when executed.

Basic Positional Parameters

When you run a script like ./script.sh arg1 arg2 arg3, Bash automatically assigns these values to special variables:

  • $0 : The script name itself

  • $1 : First argument

  • $2 : Second argument

  • $3 : Third argument

  • ... and so on up to $9

  • ${10} : Tenth argument and beyond (use braces)

Example:

Running ./greet.sh Alice 30 outputs:
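A sketch of what greet.sh might contain, together with the run above (the exact output wording is illustrative):

```shell
cat > /tmp/greet.sh << 'EOF'
#!/bin/bash
echo "Script name: $0"
echo "Hello, $1! You are $2 years old."
EOF
chmod +x /tmp/greet.sh

/tmp/greet.sh Alice 30
# Script name: /tmp/greet.sh
# Hello, Alice! You are 30 years old.
```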

Special Parameter Variables

  • $# : Number of arguments passed to the script

  • $@ : All arguments as separate words

  • $* : All arguments as a single word

  • $? : Exit status of the last command

  • $$ : Process ID of the current script

  • $! : Process ID of the last background command

Example:
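A sketch exercising a few of these variables (the script path is illustrative):

```shell
cat > /tmp/info.sh << 'EOF'
#!/bin/bash
echo "Argument count: $#"
echo "All arguments: $@"
echo "Script PID: $$"
EOF
chmod +x /tmp/info.sh

/tmp/info.sh one two three   # Argument count: 3
```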

Looping Through Arguments
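A sketch, wrapped in a function so the loop has arguments to iterate over:

```shell
process_all() {
  for arg in "$@"; do          # "$@" expands each argument as a separate word
    echo "Processing: $arg"    # quoting preserves spaces inside arguments
  done
}

process_all "first" "second item"
```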


Data types

Bash is not a strongly-typed language - it treats almost everything as strings by default. However, it does support some data structures:

1. Variables (Strings/Numbers)

Basic variables:

Everything is a string unless you do math:
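Sketches for both points:

```shell
name="Alice"          # no spaces around = in assignments
count=5

sum=$((count + 3))    # $(( )) performs integer arithmetic
echo "$sum"           # 8

text=$count+3         # outside $(( )), + is just a character
echo "$text"          # 5+3
```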


2. Arrays (Indexed)

What they are: Ordered lists of values, accessed by numeric index (0, 1, 2, ...)

Creating Arrays

Method 1: Direct assignment

Method 2: Individual assignment

Method 3: Empty array
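The three methods above, sketched with illustrative values (indexed arrays require bash, not plain sh):

```shell
# Method 1: direct assignment
fruits=("apple" "banana" "cherry")

# Method 2: individual assignment by index
colors[0]="red"
colors[1]="green"

# Method 3: start empty, then append
nums=()
nums+=(1 2 3)
```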

Accessing Array Elements

Get single element:

Get all elements:

Get array length:

Get length of specific element:
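Sketches for each access pattern above:

```shell
fruits=("apple" "banana" "cherry")

echo "${fruits[0]}"      # single element: apple
echo "${fruits[@]}"      # all elements: apple banana cherry
echo "${#fruits[@]}"     # array length: 3
echo "${#fruits[0]}"     # length of one element: 5
```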

Modifying Arrays

Add element:

Update element:

Remove element:

Remove entire array:
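Sketches for each modification above (note that unset on one element leaves a gap in the indices):

```shell
fruits=("apple" "banana")

fruits+=("cherry")       # add an element
fruits[1]="blueberry"    # update an element
unset 'fruits[0]'        # remove one element (index 0 is now a gap)
unset fruits             # remove the entire array
```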

Looping Through Arrays

Method 1: For loop

Method 2: Index-based loop

Method 3: C-style loop
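The three looping methods above, sketched:

```shell
fruits=("apple" "banana" "cherry")

# Method 1: iterate over the values
for fruit in "${fruits[@]}"; do
  echo "$fruit"
done

# Method 2: iterate over the indices
for i in "${!fruits[@]}"; do
  echo "$i: ${fruits[$i]}"
done

# Method 3: C-style counter loop
for ((i = 0; i < ${#fruits[@]}; i++)); do
  echo "${fruits[$i]}"
done
```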

Array Slicing

Get subset:
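```shell
letters=(a b c d e)
echo "${letters[@]:1:3}"   # b c d  (3 elements starting at index 1)
```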


3. Associative Arrays (Bash 4.0+)

What they are: Key-value pairs (like dictionaries in Python or objects in JavaScript)

Must declare first:

Creating Associative Arrays

Method 1: Individual assignment

Method 2: All at once
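Sketches for the declaration and both creation methods (keys and values are illustrative):

```shell
declare -A ages            # associative arrays must be declared with -A

# Method 1: individual assignment
ages["alice"]=30
ages["bob"]=25

# Method 2: all at once
declare -A capitals=([france]="Paris" [japan]="Tokyo")
```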

Accessing Associative Arrays

Get value by key:

Get all keys:

Get all values:

Check if key exists:
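Sketches for each access pattern above (the -v key test needs bash 4.3+):

```shell
declare -A ages=([alice]=30 [bob]=25)

echo "${ages[alice]}"    # value by key: 30
echo "${!ages[@]}"       # all keys (order is not guaranteed)
echo "${ages[@]}"        # all values

if [[ -v ages[alice] ]]; then
  echo "alice exists"
fi
```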

Looping Through Associative Arrays

Loop through keys and values:

Output:
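A sketch of the loop and its output (key order is not guaranteed in associative arrays):

```shell
declare -A ages=([alice]=30 [bob]=25)

for name in "${!ages[@]}"; do
  echo "$name is ${ages[$name]} years old"
done
# Possible output:
# alice is 30 years old
# bob is 25 years old
```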


4. Strings

String operations:
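A sketch of common string operations (case conversion needs bash 4+):

```shell
s="Hello, World"

echo "${#s}"             # length: 12
echo "${s:7}"            # substring from index 7: World
echo "${s/World/Bash}"   # replace first match: Hello, Bash
echo "${s^^}"            # uppercase: HELLO, WORLD
```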


5. Integers (with declare)

Declare as integer:
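```shell
declare -i num       # num now only holds integers
num=5+3              # arithmetic is evaluated automatically on assignment
echo "$num"          # 8

num="hello"          # non-numeric strings become 0
echo "$num"          # 0
```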


Real-World Examples

Example 1: Processing Files

Example 2: Configuration

Example 3: Log Levels

Example 4: Data Pipeline

Example 5: Environment Variables

Example 6: User Data

Example 7: Counting

Output:
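A few compact sketches in the spirit of the labeled examples above (all names and values are illustrative):

```shell
# Configuration as key-value pairs
declare -A config=([host]="localhost" [port]=8080)
echo "Connecting to ${config[host]}:${config[port]}"

# Environment variable with a fallback default
log_level="${LOG_LEVEL:-info}"

# Counting occurrences with an associative array
declare -A count
for word in apple banana apple; do
  count[$word]=$(( ${count[$word]:-0} + 1 ))
done
echo "apple: ${count[apple]}"   # apple: 2
```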


Array vs Associative Array

| Feature     | Indexed Array        | Associative Array   |
| ----------- | -------------------- | ------------------- |
| Keys        | Numbers (0, 1, 2...) | Strings             |
| Declaration | Optional             | declare -A required |
| Access      | ${arr[0]}            | ${arr[key]}         |
| Use case    | Lists, sequences     | Key-value pairs     |


Common Patterns

Read file into array:

Split string into array:

Command output to array:

Check if element exists:

Remove duplicates:
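Sketches for the patterns above (file paths and values are illustrative; mapfile needs bash 4+):

```shell
# Read a file into an array, one line per element
printf 'a\nb\nc\n' > /tmp/lines.txt
mapfile -t lines < /tmp/lines.txt

# Split a string into an array on a delimiter
IFS=',' read -r -a parts <<< "one,two,three"

# Capture command output into an array (word-split on whitespace)
words=($(echo "x y z"))

# Check whether an element exists
found=0
for p in "${parts[@]}"; do
  [ "$p" = "two" ] && found=1
done

# Remove duplicates via sort -u
dedup=($(printf '%s\n' b a b | sort -u))
```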


Limitations

No multi-dimensional arrays (natively):

No true objects:
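A common workaround for both limitations is to fake structure with composite keys in an associative array:

```shell
declare -A matrix
matrix[0,0]="a"      # "0,0" is just a string key, not a real 2D index
matrix[0,1]="b"
matrix[1,0]="c"
echo "${matrix[0,1]}"   # b

declare -A user      # "object fields" as prefixed keys
user[name]="Alice"
user[age]=30
```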


Quick Reference

Indexed Arrays:

Associative Arrays:
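A condensed cheat sheet for both kinds:

```shell
arr=(a b c)              # create indexed array
echo "${arr[0]}"         # access by index
echo "${#arr[@]}"        # length
arr+=(d)                 # append

declare -A map=([k]="v") # create associative array
echo "${map[k]}"         # access by key
echo "${!map[@]}"        # list keys
```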


Bash Control Structures: Loops, Conditionals, and More

This guide covers the essential control structures in Bash scripting that allow you to create dynamic, decision-making scripts.

Conditionals

If Statements

The if statement lets you execute code based on conditions.

Basic syntax:

If-else:

If-elif-else:

Example:
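A sketch covering the basic, if-else, and if-elif-else forms in one example (the threshold values are illustrative):

```shell
age=20

if [ "$age" -lt 13 ]; then
  echo "child"
elif [ "$age" -lt 18 ]; then
  echo "teenager"
else
  echo "adult"          # this branch runs for age=20
fi
```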

String comparisons:

  • = : equal to

  • != : not equal to

  • -z : string is empty

  • -n : string is not empty

File tests:

  • -f : file exists and is a regular file

  • -d : directory exists

  • -r : file is readable

  • -w : file is writable

  • -x : file is executable

  • -e : file exists (any type)

Example:
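A sketch using two of the file tests (the paths are illustrative):

```shell
file="/etc/hosts"

if [ -f "$file" ]; then
  echo "$file is a regular file"
fi

if [ -d /tmp ]; then
  echo "/tmp is a directory"
fi
```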

Case Statements

Use case for multiple conditions based on pattern matching.

Example:
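A sketch (the patterns are illustrative; | separates alternatives and * is the catch-all):

```shell
fruit="apple"

case "$fruit" in
  apple)           echo "It's an apple" ;;
  banana|plantain) echo "It's banana-like" ;;
  *)               echo "Unknown fruit" ;;
esac
```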

Loops

For Loop

Iterate over a list of items.

Basic syntax:

Examples:
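Sketches of the basic form and a brace-expansion variant:

```shell
for name in Alice Bob Carol; do
  echo "Hello, $name"
done

for i in {1..5}; do    # brace expansion: 1 2 3 4 5
  echo "$i"
done
```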

While Loop

Execute code while a condition is true.

Example:
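```shell
count=1
while [ "$count" -le 3 ]; do
  echo "count is $count"
  count=$((count + 1))   # without this, the loop would never end
done
```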

Until Loop

Execute code until a condition becomes true (opposite of while).

Example:
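```shell
n=1
until [ "$n" -gt 3 ]; do    # runs while the condition is still false
  echo "n is $n"
  n=$((n + 1))
done
```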

Loop Control

  • break : Exit the loop entirely

  • continue : Skip to the next iteration

Example:
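```shell
for i in 1 2 3 4 5; do
  if [ "$i" -eq 2 ]; then
    continue          # skip 2, keep looping
  fi
  if [ "$i" -eq 4 ]; then
    break             # stop the loop entirely before 4
  fi
  echo "$i"           # prints 1, then 3
done
```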

Functions

Define reusable blocks of code.

Basic syntax:

Example:
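A sketch showing definition, a local variable, and arguments (functions receive them as $1, $2, ... just like scripts):

```shell
greet() {
  local name="$1"      # local keeps the variable scoped to the function
  echo "Hello, $name"
}

greet "Alice"          # prints: Hello, Alice
```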

Practical examples

Example 1: File Backup Script

Example 2: Menu System
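Sketches for both examples (all paths are illustrative; the menu reads its choice from stdin):

```shell
# Example 1: back up a directory into a timestamped archive
backup() {
  local src="$1"
  local dest="/tmp/backup-$(date +%Y%m%d-%H%M%S).tar.gz"
  tar -czf "$dest" "$src" && echo "Backed up $src to $dest"
}

# Example 2: a simple menu driven by read and case
menu() {
  echo "1) greet  2) date  3) quit"
  read -r choice
  case "$choice" in
    1) echo "Hello!" ;;
    2) date ;;
    3) echo "Bye" ;;
    *) echo "Invalid choice" ;;
  esac
}

mkdir -p /tmp/demo-src && touch /tmp/demo-src/file.txt
backup /tmp/demo-src
echo 1 | menu
```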


Practical Combined Example using some of the commands

Complete backup workflow:
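A sketch combining redirection, exit codes, and pipes from the sections above (all paths are illustrative):

```shell
#!/bin/bash
src="/tmp/demo-data"
archive="/tmp/demo-backup.tar.gz"
log="/tmp/backup.log"

mkdir -p "$src" && echo "sample" > "$src/data.txt"

# tar's stderr is appended to the log; its exit code drives the branch
if tar -czf "$archive" "$src" 2>> "$log"; then
  echo "backup ok: $(date)" >> "$log"
else
  echo "backup FAILED: $(date)" >> "$log"
fi

# Inspect the most recent log entries through a pipe
tail -n 5 "$log" | grep "backup"
```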

