# The gcloud Crash Course

***

### Google Cloud CLI (often called `gcloud`)

#### The "Grammar" of the CLI

The commands might look long, but they follow a strict hierarchy that makes them easy to guess. They almost always follow this pattern:

`gcloud [GROUP] [SUB-GROUP] [ACTION] [FLAGS]`

* Group: The high-level product (e.g., `compute`, `storage`, `dataproc`, `functions`).
* Sub-Group: The specific resource (e.g., `instances`, `buckets`, `clusters`).
* Action: What you want to do (e.g., `list`, `create`, `delete`, `describe`).

Example:

* *English:* "I want to create a virtual machine instance in Compute Engine."
* *CLI:* `gcloud compute instances create my-server`

#### Getting Started (The Setup)

Before you run anything, you usually need to tell the CLI who you are and which project you are working on.

* `gcloud auth login`

  Opens a browser window to log you into your Google account.
* `gcloud init`

  Runs a wizard that helps you select your default Project, Region (e.g., us-central1), and Zone.
* `gcloud config set project [PROJECT_ID]`

  Switches your active project. Crucial if you work on multiple environments (e.g., dev vs prod).

#### The "Must-Know" Commands for Data Engineering

As a Data Engineer, these are the commands you will use 90% of the time.

**A. Storage (The "gsutil" legacy)**

*Note: For a long time, GCP used a separate tool called `gsutil` for storage. The modern `gcloud storage` commands replace it, but you will still see `gsutil` in most older tutorials. The commands map almost one-to-one (e.g., `gsutil ls` → `gcloud storage ls`).*

* List files:

  `gcloud storage ls gs://my-bucket-name/`
* Copy a file (upload/download):

  `gcloud storage cp ./local-file.csv gs://my-bucket-name/data/`
* Make a bucket:

  `gcloud storage buckets create gs://my-new-data-lake`
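When uploads run inside a script (say, a Python ETL job) rather than an interactive shell, one common pattern is to build the `gcloud storage cp` command programmatically and hand it to `subprocess`. A minimal sketch; the directory layout, bucket name, and helper name are hypothetical:

```python
from pathlib import Path

def upload_commands(local_dir: str, bucket_uri: str) -> list[list[str]]:
    """Build one `gcloud storage cp` argv per CSV found in local_dir.

    bucket_uri is a destination like 'gs://my-bucket-name/data/'
    (hypothetical bucket). Returns the commands without running them.
    """
    return [
        ["gcloud", "storage", "cp", str(path), bucket_uri]
        for path in sorted(Path(local_dir).glob("*.csv"))
    ]

# To actually execute (requires gcloud installed and authenticated):
# import subprocess
# for cmd in upload_commands("./exports", "gs://my-bucket-name/data/"):
#     subprocess.run(cmd, check=True)
```

Interactively you would just use a wildcard (`gcloud storage cp *.csv gs://my-bucket-name/data/`) or `-r`; building the argv list is mainly useful when the upload is one step in a larger pipeline.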

**B. Compute (VMs)**

* List running servers:

  `gcloud compute instances list`
* SSH into a server (The Magic Command):

  `gcloud compute ssh [INSTANCE_NAME]`

  (This is amazing because it handles SSH keys for you automatically. No need to manage `.pem` files like in AWS.)

**C. Dataproc (Spark/Hadoop)**

* List clusters:

  `gcloud dataproc clusters list --region=us-central1`
* Submit a Spark job:

  `gcloud dataproc jobs submit spark --cluster=[CLUSTER_NAME] --region=[REGION] --jar=[YOUR_JAR_FILE]`

#### The "Safety Net" Flags

If you are ever confused, use these flags at the end of any command:

* `--help`: Prints the manual for that specific command.
  * *Example:* `gcloud compute instances create --help`
* `--dry-run`: (Only supported by a handful of commands) Validates the command without actually creating resources (saving you money). Check `--help` to see if a command supports it.
* `--format=json`: Outputs the result in JSON. This is critical when you are writing Python scripts to read the output of the CLI.
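Once a command emits JSON, parsing it from Python is straightforward. A sketch using a trimmed sample of the shape `gcloud compute instances list --format=json` returns (the real output has many more fields, and these instance names are made up):

```python
import json

# Trimmed sample of `gcloud compute instances list --format=json` output.
sample = """
[
  {"name": "etl-worker-1", "status": "RUNNING",
   "zone": ".../zones/us-central1-a"},
  {"name": "dev-box", "status": "TERMINATED",
   "zone": ".../zones/us-central1-b"}
]
"""

instances = json.loads(sample)

# Pull out just the names of the VMs that are currently running.
running = [inst["name"] for inst in instances if inst["status"] == "RUNNING"]
print(running)  # ['etl-worker-1']
```

In a real script you would capture the JSON with something like `subprocess.run([..., "--format=json"], capture_output=True, text=True)` instead of a hardcoded string.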

#### Pro Tip: Cloud Shell

You don't actually need to install the CLI on your laptop to try it.

1. Go to the GCP Console (website).
2. Click the `>_` icon in the top right.
3. This opens a terminal at the bottom of your screen with `gcloud` pre-installed and authorized. It is the fastest way to practice.

***

### Cheatsheet

I have organized this by the "lifecycle" of a typical data project: Authentication -> Storage -> Compute -> Big Data. You can save this as a reference.

#### Setup & Auth (The Basics)

*You must run these before doing anything else.*

| **Command**                              | **Description**                                                         |
| ---------------------------------------- | ----------------------------------------------------------------------- |
| `gcloud auth login`                      | Opens browser to log in with your Google account.                       |
| `gcloud init`                            | Interactive setup to pick your default Project, Region, and Zone.       |
| `gcloud config set project [PROJECT_ID]` | Switch active project (critical when working on multiple environments). |
| `gcloud config list`                     | "Who am I?" - Shows current active user and project settings.           |

***

#### Storage (GCS) - `gcloud storage`

*Used for: Data Lakes, uploading CSVs, downloading logs.*

| **Command**                                 | **Description**                            |
| ------------------------------------------- | ------------------------------------------ |
| `gcloud storage ls`                         | List all buckets in your project.          |
| `gcloud storage ls gs://[BUCKET_NAME]/`     | List files inside a specific bucket.       |
| `gcloud storage buckets create gs://[BUCKET_NAME]` | Creates a new bucket (the `gsutil` equivalent was `mb`, "make bucket"). |
| `gcloud storage cp [LOCAL_FILE] gs://[DST]` | Copy (Upload) a file from laptop to cloud. |
| `gcloud storage cp -r [FOLDER] gs://[DST]`  | Recursive Copy (Upload an entire folder).  |
| `gcloud storage rm gs://[BUCKET]/[FILE]`    | Remove (Delete) a file.                    |

***

#### Compute Engine (VMs) - `gcloud compute`

*Used for: Hosting databases, running scripts, dev environments.*

| **Command**                              | **Description**                                                               |
| ---------------------------------------- | ----------------------------------------------------------------------------- |
| `gcloud compute instances list`          | Show all VMs (running and stopped).                                           |
| `gcloud compute instances create [NAME]` | Launch a new default Linux server.                                            |
| `gcloud compute instances start [NAME]`  | Turn a stopped server back on.                                                |
| `gcloud compute instances stop [NAME]`   | Turn a server off (stops compute charges, keeps data).                        |
| `gcloud compute ssh [NAME]`              | The Magic Command. Log into the server terminal (handles keys automatically). |

***

#### Dataproc (Spark/Hadoop) - `gcloud dataproc`

*Used for: Heavy data processing, running Spark jobs.*

| **Command**                                                | **Description**                                   |
| ---------------------------------------------------------- | ------------------------------------------------- |
| `gcloud dataproc clusters list --region=[REGION]`          | See active Spark clusters.                        |
| `gcloud dataproc clusters create [NAME] --region=[REGION]` | Spin up a standard cluster (Master + 2 Workers).  |
| `gcloud dataproc jobs submit spark ...`                    | Submit a job to the cluster (see template below). |
| `gcloud dataproc clusters delete [NAME] --region=[REGION]` | Tear Down. Deletes the cluster to stop billing.   |

Template for Submitting a Spark Job:

```bash
gcloud dataproc jobs submit spark \
    --cluster=[CLUSTER_NAME] \
    --region=[REGION] \
    --jar=gs://[BUCKET]/my-job.jar \
    -- [ARGUMENTS_FOR_YOUR_CODE]
```

*(Use `--jar` when the jar's manifest declares a main class; otherwise pass `--class=[MAIN_CLASS]` together with `--jars=gs://[BUCKET]/my-job.jar`. Everything after the bare `--` goes to your Spark application, not to gcloud.)*
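If job submissions happen from a scheduler or pipeline rather than by hand, it can help to build the argv for this template programmatically. A minimal sketch (the helper name is ours, and it assumes the jar's manifest declares a main class, hence `--jar`):

```python
def build_spark_submit(cluster: str, region: str,
                       jar_uri: str, *job_args: str) -> list[str]:
    """Build the argv for `gcloud dataproc jobs submit spark`.

    Mirrors the template above: gcloud flags first, then a bare `--`
    separating the arguments destined for the Spark application itself.
    """
    cmd = [
        "gcloud", "dataproc", "jobs", "submit", "spark",
        f"--cluster={cluster}",
        f"--region={region}",
        f"--jar={jar_uri}",
    ]
    if job_args:
        cmd.append("--")
        cmd.extend(job_args)
    return cmd

# To actually submit (requires gcloud installed and authenticated):
# import subprocess
# subprocess.run(
#     build_spark_submit("my-cluster", "us-central1",
#                        "gs://my-bucket/my-job.jar",
#                        "--input", "gs://my-bucket/data/"),
#     check=True,
# )
```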

***

#### Useful Flags (Append these to any command)

* `--help` : Detailed manual for that command (e.g., `gcloud compute instances create --help`).
* `--project=[ID]` : Run this *one* command on a different project without switching config.
* `--format="json"` : Output data as JSON (great for Python scripts parsing output).
* `--format="table(name,status)"` : Force output into a readable table with the columns you choose.

#### Pro Tip: Autocomplete

If you are using Cloud Shell, you can type `gcloud comp` and hit TAB, and it will finish the word to `compute` for you. It works for flags and resource names too!

***
