Glossary of terms

Glossary of miscellaneous topics


Many programming concepts and their descriptions: https://dev.to/chhunneng/100-computer-science-concepts-you-should-know-2pgk

Many DE terms: https://dagster.io/glossary


Two Generals problem

About the Two Generals problem: https://linuxblog.io/the-two-generals-problem


Race condition

A race condition in programming occurs when multiple threads or processes concurrently access and modify shared resources, and the final outcome depends on the unpredictable timing and order of these operations. This can lead to non-deterministic behavior, where the program's output varies between executions even with the same input, making such bugs difficult to reproduce and debug. Key characteristics of race conditions:

  • Shared Resources: The presence of a shared resource (e.g., a variable, data structure, file, or database) that multiple threads or processes can access.

  • Concurrent Access: Multiple threads or processes attempt to access or modify the shared resource simultaneously.

  • Unpredictable Timing: The relative timing of these accesses is not guaranteed, and the operating system or runtime environment can schedule threads in an arbitrary order.

  • Non-deterministic Outcome: The final state of the shared resource and the program's behavior can vary depending on the precise order of operations, leading to incorrect or unexpected results.

Example: Consider a shared counter being incremented by multiple threads. If two threads read the current value of the counter, both increment it, and then both write the new value back, the final value might be incorrect if the writes overlap in an unfavorable way. For instance, if the counter is 10, both threads read 10, Thread A writes 11, and then Thread B writes 11: the final value is 11 instead of the expected 12.

Mitigation techniques: To prevent race conditions, synchronization mechanisms are employed to ensure that only one thread or process can access a shared resource at a time, or that operations on shared resources are atomic:

  • Locks/Mutexes: Provide exclusive access to a critical section of code, ensuring only one thread can execute it at a time.

  • Semaphores: Control access to a limited number of resources, allowing a specified number of threads to proceed concurrently.

  • Atomic Operations: Use hardware-supported atomic instructions for simple operations like increments or decrements, guaranteeing they are indivisible.

  • Critical Sections: Identify and protect code blocks that access shared resources, ensuring mutual exclusion.
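The shared-counter scenario above can be sketched in Python. This is a minimal illustration using threading.Lock as the mitigation; the thread and increment counts are arbitrary:

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        # Without the lock, the "read, add, write" sequence could
        # interleave between threads and lose updates.
        with lock:
            counter += 1

threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000 with the lock; without it, possibly less
```

Removing the `with lock:` line reintroduces the race: the final count may fall short of 40000, and by a different amount on each run.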


CQRS

CQRS (Command Query Responsibility Segregation) is an architectural pattern that separates an application's write operations (commands) from its read operations (queries). This separation allows different data models, scaling strategies, and data stores to be used for each side, leading to improved performance, scalability, and flexibility, especially in complex and high-performance systems.

  • Database Design Implications: While CQRS doesn't dictate a specific database technology, it often leads to the use of different database designs or even entirely separate databases optimized for either reads or writes.

    • Write-optimized databases (Command side): These often prioritize data integrity, transactional consistency, and normalized schemas to facilitate updates and prevent data duplication. Relational databases are a common choice here.

    • Read-optimized databases (Query side): These might employ denormalized schemas, materialized views, or even different database types (e.g., NoSQL databases) to achieve high read performance and cater to specific query patterns.

  • Flexibility in Database Choices: CQRS allows for the use of different database technologies for the command and query sides, enabling you to choose the best tool for each specific need (e.g., a relational database for commands and a document database for queries).

In essence, CQRS is a higher-level architectural pattern that informs and guides database design decisions to achieve optimized performance and scalability for both read and write operations.
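A toy in-memory sketch of the split (all names are hypothetical; a real system would typically use separate data stores and an event or messaging layer to update the read side asynchronously):

```python
# Command side: normalized "write model" keyed by order id.
write_model = {}   # order_id -> {"customer": ..., "items": [...]}
# Query side: denormalized "read model" shaped for one query pattern.
read_model = {}    # customer -> order count

def place_order(order_id, customer, items):
    """Command: mutates the write model, then updates the projection."""
    write_model[order_id] = {"customer": customer, "items": items}
    read_model[customer] = read_model.get(customer, 0) + 1

def order_count(customer):
    """Query: reads only the denormalized view, never the write model."""
    return read_model.get(customer, 0)

place_order(1, "alice", ["book"])
place_order(2, "alice", ["pen"])
print(order_count("alice"))  # 2
```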


RBAC - Role-Based Access Control


🔐 RBAC (Role-Based Access Control)

The Core Concept: Instead of giving permission to specific people, you give permissions to specific job titles (Roles). People are then assigned those titles.

The Analogy: The Hospital Badge

Imagine a hospital security system.

  • Without RBAC: You have to program every single door to open for "Dr. Smith," "Nurse Jones," and "Janitor Bob" individually. If Dr. Smith quits, you have to find every door she had access to and remove her.

  • With RBAC: You create a "Doctor" badge. You program the doors to open for anyone holding a "Doctor" badge. When Dr. Smith is hired, you just hand her the badge. If she quits, you take it back. You never touch the door programming.

The Three Pillars

RBAC separates "Who you are" from "What you can do" using a middle layer.

  1. User (Who): The individual person (e.g., alice@company.com).

  2. Role (The Bridge): A label that groups permissions (e.g., Admin, Editor, Viewer).

  3. Permission (What): The specific action allowed (e.g., READ table, DELETE file, EXECUTE query).

The Flow:

User ➔ Assigned to ➔ Role ➔ Has ➔ Permissions

Why use it? (The "Scale" Argument)

  • Efficiency: If you hire 50 new Junior Engineers, you don't assign 50 sets of permissions. You just assign the Junior_Eng role 50 times.

  • Least Privilege: It makes it easier to ensure users only have the access they strictly need for their job function (a core security principle).

  • Auditing: It is easier to answer "Who can delete production data?" by looking at the Admin role than by checking every single user account.

Example: A Database Setup

  • Role A: Data_Analyst

    • Permissions: SELECT on tables. (Can look, but cannot touch).

  • Role B: Data_Engineer

    • Permissions: SELECT, INSERT, UPDATE, CREATE TABLE. (Can build and change things).

  • Scenario: Alice is promoted from Analyst to Engineer.

    • Action: Revoke Data_Analyst role ➔ Grant Data_Engineer role.

    • Result: Her permissions update instantly across the entire system.
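The database setup above might be sketched in Python roughly like this (role and permission names are illustrative, not tied to any real database):

```python
# Role -> set of permissions (the "What").
ROLE_PERMISSIONS = {
    "Data_Analyst": {"SELECT"},
    "Data_Engineer": {"SELECT", "INSERT", "UPDATE", "CREATE TABLE"},
}

# User -> set of roles (the "Bridge").
user_roles = {"alice@company.com": {"Data_Analyst"}}

def can(user, permission):
    # A user may perform an action if any of their roles grants it.
    return any(permission in ROLE_PERMISSIONS[r]
               for r in user_roles.get(user, ()))

assert can("alice@company.com", "SELECT")
assert not can("alice@company.com", "INSERT")

# Promotion: revoke Data_Analyst, grant Data_Engineer.
user_roles["alice@company.com"] = {"Data_Engineer"}
assert can("alice@company.com", "INSERT")  # updated everywhere at once
```

Note that no per-user permission was ever touched: only the user-to-role assignment changed.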


Big-O Notation

https://blog.algomaster.io/p/big-o-notation-explained-in-8-minutes

📝 Big O Notation: Crash Course

The Core Idea: Big O Notation doesn't tell you the speed in seconds. It tells you how the number of operations grows as the input size (n) grows. It measures the worst-case scenario.

The Analogy: Simple Search vs. Binary Search

Imagine you have a list of 100 items.

  • Simple Search: You check every single item one by one. In the worst case, you check 100 items. If the list doubles to 200, you check 200. This is linear.

  • Binary Search: You split the list in half every time. For 100 items, it takes ~7 steps. If the list doubles to 200, it only takes 1 more step (8 steps). This is logarithmic.
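The step counts above can be checked with a small Python sketch (a simplified model that only counts worst-case comparisons, not a full search implementation):

```python
def simple_search_steps(n):
    # Worst case for linear search: check every item.
    return n

def binary_search_steps(n):
    # Worst case for binary search: halve the remaining
    # range until it is empty, counting each halving.
    lo, hi, steps = 0, n - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        lo = mid + 1  # assume the target sits in the upper half
    return steps

print(simple_search_steps(100), binary_search_steps(100))  # 100 7
print(simple_search_steps(200), binary_search_steps(200))  # 200 8
```

Doubling the input doubles the linear count but adds only one halving step, which is the whole point of O(log n).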

Common Big O Run Times (Fastest to Slowest)

  • O(1), Constant Time: accessing an array index; takes the same time regardless of size. Growth: flat line.

  • O(log n), Logarithmic Time: Binary Search, the "Divide and Conquer" approach. Growth: very slow.

  • O(n), Linear Time: Simple Search (looping through a list); reading every page of a book. Growth: steady.

  • O(n log n), Log-Linear Time: Quicksort or Mergesort, the fast sorting algorithms. Growth: slightly steeper than O(n).

  • O(n²), Quadratic Time: Selection Sort; nested loops (a loop inside a loop). Growth: fast; dangerous for big data.

  • O(n!), Factorial Time: the Traveling Salesperson Problem, checking every possible route. Growth: explodes immediately; impractical for large n.

Image Source: "Grokking Algorithms" book by Aditya Y. Bhargava

[Chart omitted: it visualizes how the run times above grow with increasing workload, contrasting efficient algorithms (the calm "Fast" duck) with inefficient ones (the sweating "Slow" duck).]

Key Takeaways

  • Ignore the Constants: Big O focuses on growth. O(2n) and O(100n) are both just O(n) because the curve shape is the same.

  • Worst-Case Matters: When comparing algorithms, we usually care about the worst-case scenario (e.g., searching for an item that is at the very end of the list).

  • Space Complexity: Algorithms also take up memory. Big O can measure memory usage (space) just like it measures time.

Note: average-case run time is also important, not only worst-case run time.


💡 Visual Mnemonic

  • O(log n) is like repeatedly folding a piece of paper in half: even a huge sheet takes only a few folds to shrink.

  • O(n) is like reading a book page by page.

  • O(n²) is like a handshake line where everyone shakes hands with everyone else.

Source: "Grokking Algorithms" book by Aditya Y. Bhargava


RFC

RFC process: https://medium.com/juans-and-zeroes/a-thorough-team-guide-to-rfcs-8aa14f8e757c


RACI

https://en.wikipedia.org/wiki/Responsibility_assignment_matrix


First-class citizens

https://en.wikipedia.org/wiki/First-class_citizen


CDN - Content Delivery Network

A CDN is a geographically distributed network of servers that caches and serves content (static assets, media, scripts) from locations close to end users, reducing latency and load on the origin server.


Heredoc

A heredoc (short for "here document") is a way to write multi-line strings in programming without dealing with a bunch of quote marks and escape characters. Think of it as a cleaner way to handle longer blocks of text.

Instead of writing something messy like this:
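For instance, embedding newlines and quotes with escape sequences (a contrived sketch):

```shell
# Every newline and quote has to be escaped by hand:
message="Line one.\nLine two with \"quotes\".\nLine three."
printf "%b\n" "$message"   # %b interprets the \n escapes
```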

You can use a heredoc to write it more naturally:
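The same text as a heredoc, with no escaping at all:

```shell
cat <<EOF
Line one.
Line two with "quotes".
Line three.
EOF
```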

How It Works

The heredoc uses a special marker (the exact syntax depends on the language):

  1. You start with a marker that says "here comes a multi-line string"

  2. You write your content across multiple lines, exactly as you want it to appear

  3. You end with a closing marker

In Shell (Bash/sh), heredocs use a special syntax with << followed by a delimiter. Here's how it works:

Basic Syntax
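A minimal example:

```shell
# << starts the heredoc; everything up to a line containing only
# the delimiter (here EOF) is fed to the command's standard input.
cat <<EOF
Hello from a heredoc!
EOF
```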

The EOF (End Of File) is just a marker - you can use any word you want, but EOF is the most common convention.

Common Uses

Assigning to a variable:
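One common way is via command substitution:

```shell
# Capture the heredoc's contents in a variable.
message=$(cat <<EOF
First line
Second line
EOF
)
echo "$message"
```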

Writing to a file:
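For instance (the file path is just an example):

```shell
# Redirect the heredoc into a file instead of the terminal.
cat <<EOF > /tmp/heredoc_example.txt
line 1
line 2
EOF
```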

Piping to a command:
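Here the heredoc becomes the command's standard input:

```shell
# grep reads the heredoc body and prints the matching line.
grep "error" <<EOF
info: all good
error: something failed
EOF
```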

Useful Variations

Suppress leading tabs (use <<-):
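A sketch (note that the indentation inside the heredoc body below must be real tab characters; <<- does not strip spaces):

```shell
# <<- strips leading TAB characters from each body line (and may also
# be used on the closing delimiter), so heredocs can be indented to
# match surrounding code.
if true; then
	cat <<-EOF
	This line is indented with a tab in the script,
	but printed without it.
EOF
fi
```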

Prevent variable expansion (quote the delimiter):
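Quoting the delimiter makes the body completely literal:

```shell
# <<'EOF' turns off $variable and $(command) expansion:
# the body is passed through byte-for-byte.
cat <<'EOF'
$HOME and $(date) appear literally here.
EOF
```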

Allow variable expansion (default behavior):
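With an unquoted delimiter, variables are substituted inside the body:

```shell
# Unquoted EOF: $name and $HOME are expanded before cat sees the text.
name="Alice"
cat <<EOF
Hello, $name!
Your home directory is $HOME.
EOF
```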

This prints: Hello, Alice! and your actual home directory path.


TDD - Test Driven Development

https://en.wikipedia.org/wiki/Test-driven_development


Data Residency and Data Sovereignty

https://www.splunk.com/en_us/blog/learn/data-sovereignty-vs-data-residency.html


