Glossary of terms
Glossary of miscellaneous topics
Many programming concepts and their descriptions: https://dev.to/chhunneng/100-computer-science-concepts-you-should-know-2pgk
Many DE terms: https://dagster.io/glossary
Two Generals problem
About Two Generals problem: https://linuxblog.io/the-two-generals-problem
Race condition
A race condition in programming occurs when multiple threads or processes concurrently access and modify shared resources, and the final outcome depends on the unpredictable timing and order of these operations. This can lead to non-deterministic behavior, where the program's output varies between executions even with the same input, making such bugs difficult to reproduce and debug. Key characteristics of race conditions:
Shared Resources: The presence of a shared resource (e.g., a variable, data structure, file, or database) that multiple threads or processes can access.
Concurrent Access: Multiple threads or processes attempt to access or modify the shared resource simultaneously.
Unpredictable Timing: The relative timing of these accesses is not guaranteed, and the operating system or runtime environment can schedule threads in an arbitrary order.
Non-deterministic Outcome: The final state of the shared resource and the program's behavior can vary depending on the precise order of operations, leading to incorrect or unexpected results.
Example: Consider a shared counter being incremented by multiple threads. If two threads read the current value of the counter, both increment it, and then both write the new value back, the final value can be incorrect if the writes overlap in an unfavorable way. For instance, if the counter is 10, both threads read 10, Thread A writes 11, and then Thread B writes 11: the final value is 11 instead of the expected 12.
Mitigation techniques: To prevent race conditions, synchronization mechanisms are employed to ensure that only one thread or process can access a shared resource at a time, or that operations on shared resources are atomic:
Locks/Mutexes: Provide exclusive access to a critical section of code, ensuring only one thread can execute it at a time.
Semaphores: Control access to a limited number of resources, allowing a specified number of threads to proceed concurrently.
Atomic Operations: Use hardware-supported atomic instructions for simple operations like increments or decrements, guaranteeing they are indivisible.
Critical Sections: Identify and protect code blocks that access shared resources, ensuring mutual exclusion.
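The counter scenario above can be sketched in Python with threads. This is a minimal illustration (iteration and thread counts are arbitrary), not a production pattern:

```python
import threading

counter = 0
lock = threading.Lock()

def unsafe_increment(n):
    # Unsynchronized read-modify-write: a thread switch between the read
    # and the write can lose an increment, as in the 10 -> 11 example.
    global counter
    for _ in range(n):
        tmp = counter       # read
        counter = tmp + 1   # write back (may overwrite another thread's write)

def safe_increment(n):
    # The lock makes each increment atomic with respect to other threads.
    global counter
    for _ in range(n):
        with lock:
            counter += 1

def run(worker, n=50_000, threads=4):
    global counter
    counter = 0
    ts = [threading.Thread(target=worker, args=(n,)) for _ in range(threads)]
    for t in ts:
        t.start()
    for t in ts:
        t.join()
    return counter

print("with lock:", run(safe_increment))       # always 200000
print("without lock:", run(unsafe_increment))  # may be less than 200000
```

Note the non-determinism: the unlocked run may happen to produce the correct total on some executions, which is exactly what makes race conditions hard to reproduce.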
CQRS
CQRS (Command Query Responsibility Segregation) is an architectural pattern that separates an application's write operations (commands) from its read operations (queries). This separation allows different data models, scaling strategies, and data stores to be used for each side, leading to improved performance, scalability, and flexibility, especially in complex, high-performance systems.
Database Design Implications: While CQRS doesn't dictate a specific database technology, it often leads to the use of different database designs or even entirely separate databases optimized for either reads or writes.
Write-optimized databases (Command side): These often prioritize data integrity, transactional consistency, and normalized schemas to facilitate updates and prevent data duplication. Relational databases are a common choice here.
Read-optimized databases (Query side): These might employ denormalized schemas, materialized views, or even different database types (e.g., NoSQL databases) to achieve high read performance and cater to specific query patterns.
Flexibility in Database Choices: CQRS allows for the use of different database technologies for the command and query sides, enabling you to choose the best tool for each specific need (e.g., a relational database for commands and a document database for queries).
In essence, CQRS is a higher-level architectural pattern that informs and guides database design decisions to achieve optimized performance and scalability for both read and write operations.
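A minimal in-memory sketch of the split (class, field, and event names here are made up for illustration; a real system would typically use separate data stores and an event bus to keep the sides in sync):

```python
# Write side: validates commands and keeps a normalized record.
class CommandSide:
    def __init__(self):
        self.orders = {}  # order_id -> normalized order row

    def place_order(self, order_id, customer, items):
        if order_id in self.orders:
            raise ValueError("duplicate order id")
        self.orders[order_id] = {"customer": customer, "items": items}
        # Emit an event the read side can consume to update its view.
        return {"type": "OrderPlaced", "order_id": order_id,
                "customer": customer, "items": items}

# Read side: a denormalized view shaped for one query pattern.
class QuerySide:
    def __init__(self):
        self.by_customer = {}  # customer -> list of order summaries

    def apply(self, event):
        if event["type"] == "OrderPlaced":
            self.by_customer.setdefault(event["customer"], []).append(
                {"order_id": event["order_id"],
                 "item_count": len(event["items"])})

    def orders_for(self, customer):
        return self.by_customer.get(customer, [])

writes, reads = CommandSide(), QuerySide()
reads.apply(writes.place_order("o1", "alice", ["widget", "gadget"]))
print(reads.orders_for("alice"))  # [{'order_id': 'o1', 'item_count': 2}]
```

The point of the sketch: the write side stays normalized and enforces invariants, while the read side precomputes exactly the shape one query needs.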
RBAC - Role-Based Access Control
🔐 RBAC (Role-Based Access Control)
The Core Concept: Instead of giving permission to specific people, you give permissions to specific job titles (Roles). People are then assigned those titles.
The Analogy: The Hospital Badge
Imagine a hospital security system.
Without RBAC: You have to program every single door to open for "Dr. Smith," "Nurse Jones," and "Janitor Bob" individually. If Dr. Smith quits, you have to find every door she had access to and remove her.
With RBAC: You create a "Doctor" badge. You program the doors to open for anyone holding a "Doctor" badge. When Dr. Smith is hired, you just hand her the badge. If she quits, you take it back. You never touch the door programming.
The Three Pillars
RBAC separates "Who you are" from "What you can do" using a middle layer.
User (Who): The individual person (e.g., alice@company.com).
Role (The Bridge): A label that groups permissions (e.g., Admin, Editor, Viewer).
Permission (What): The specific action allowed (e.g., READ table, DELETE file, EXECUTE query).
The Flow:
User ➔ Assigned to ➔ Role ➔ Has ➔ Permissions
Why use it? (The "Scale" Argument)
Efficiency: If you hire 50 new Junior Engineers, you don't assign 50 sets of permissions. You just assign the Junior_Eng role 50 times.
Least Privilege: It makes it easier to ensure users only have the access they strictly need for their job function (a core security principle).
Auditing: It is easier to answer "Who can delete production data?" by looking at the Admin role than by checking every single user account.
Example: A Database Setup
Role A: Data_Analyst
Permissions: SELECT on tables. (Can look, but cannot touch.)
Role B: Data_Engineer
Permissions: SELECT, INSERT, UPDATE, CREATE TABLE. (Can build and change things.)
Scenario: Alice is promoted from Analyst to Engineer.
Action: Revoke the Data_Analyst role ➔ Grant the Data_Engineer role.
Result: Her permissions update instantly across the entire system.
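The database setup above can be sketched as a role-to-permission mapping. Role and permission names are taken from the example; the `has_permission` helper is hypothetical:

```python
# Permissions attach to roles, never directly to users.
ROLE_PERMISSIONS = {
    "Data_Analyst":  {"SELECT"},
    "Data_Engineer": {"SELECT", "INSERT", "UPDATE", "CREATE TABLE"},
}

# Users hold roles; their effective permissions are the union over their roles.
user_roles = {"alice": {"Data_Analyst"}}

def has_permission(user, permission):
    return any(permission in ROLE_PERMISSIONS[role]
               for role in user_roles.get(user, set()))

print(has_permission("alice", "INSERT"))  # False: analysts can only SELECT

# Promotion: revoke Data_Analyst, grant Data_Engineer.
user_roles["alice"] = {"Data_Engineer"}
print(has_permission("alice", "INSERT"))  # True, with no per-user permission edits
```

Notice that the promotion touches only the user-to-role assignment; the role-to-permission table (the "door programming") is never modified.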
Big-O Notation
https://blog.algomaster.io/p/big-o-notation-explained-in-8-minutes
📝 Big O Notation: Crash Course
The Core Idea: Big O Notation doesn't tell you the speed in seconds. It tells you how the number of operations grows as the input size (n) grows. It measures the worst-case scenario.
The Analogy: Simple Search vs. Binary Search
Imagine you have a list of 100 items.
Simple Search: You check every single item one by one. In the worst case, you check 100 items. If the list doubles to 200, you check 200. This is linear.
Binary Search: You split the list in half every time. For 100 items, it takes ~7 steps. If the list doubles to 200, it only takes 1 more step (8 steps). This is logarithmic.
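A quick sketch that counts the steps each search takes on a sorted list, reproducing the numbers above:

```python
def linear_search(items, target):
    # Check each item in turn; the worst case touches every element: O(n).
    for steps, value in enumerate(items, start=1):
        if value == target:
            return steps
    return len(items)

def binary_search(items, target):
    # Halve the search range each step; the worst case is O(log n).
    lo, hi, steps = 0, len(items) - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if items[mid] == target:
            return steps
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps

print(linear_search(list(range(100)), 99))   # 100 steps
print(binary_search(list(range(100)), 99))   # 7 steps
print(binary_search(list(range(200)), 199))  # 8 steps: doubling n adds one step
```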
Common Big O Run Times (Fastest to Slowest)
Notation | Name | Analogy / Example | Growth Rate
O(1) | Constant Time | Accessing an array index. It takes the same time regardless of size. | Flat line.
O(log n) | Logarithmic Time | Binary Search. The "Divide and Conquer" approach. | Grows very slowly.
O(n) | Linear Time | Simple Search (looping through a list). Reading every page of a book. | Grows steadily.
O(n * log n) | Log Linear Time | Quicksort or Mergesort. Fast sorting algorithms. | Slightly steeper than O(n).
O(n²) | Quadratic Time | Selection Sort. Nested loops (a loop inside a loop). | Grows fast; dangerous for big data.
O(n!) | Factorial Time | The Traveling Salesperson Problem. Calculating every possible route. | Explodes immediately; impossible for large n.

[Chart omitted. Source: "Grokking Algorithms" by Aditya Y. Bhargava. It visualizes how the run times above grow with increasing workloads, contrasting efficient algorithms (the calm "Fast" duck) with inefficient ones (the sweating "Slow" duck).]
Key Takeaways
Ignore the Constants: Big O focuses on growth. O(2n) and O(100n) are both just O(n) because the curve shape is the same.
Worst-Case Matters: When comparing algorithms, we usually care about the worst-case scenario (e.g., searching for an item that is at the very end of the list).
Space Complexity: Algorithms also take up memory. Big O can measure memory usage (space) just like it measures time.
Note: average-case run time is also important, not only worst-case run time.
💡 Visual Mnemonic
O(log n) is like folding a piece of paper in half repeatedly: each fold halves what remains, so the number of folds grows very slowly as the paper grows.
O(n) is like reading a book page by page.
O(n²) is like a handshake line where everyone shakes hands with everyone else.
Source: "Grokking Algorithms" book by Aditya Y. Bhargava
RFC
RFC process: https://medium.com/juans-and-zeroes/a-thorough-team-guide-to-rfcs-8aa14f8e757c
RACI
https://en.wikipedia.org/wiki/Responsibility_assignment_matrix
First-class citizens
https://en.wikipedia.org/wiki/First-class_citizen
CDN - Content Delivery Network
Heredoc
A heredoc (short for "here document") is a way to write multi-line strings in programming without dealing with a bunch of quote marks and escape characters. Think of it as a cleaner way to handle longer blocks of text.
Instead of writing something messy like this:
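In Bash, for instance, a multi-line message without a heredoc ends up crammed into one string full of escape sequences (the message text is made up for illustration):

```shell
# Multi-line text forced into a single quoted string with \n escapes
# and escaped inner quotes.
message="Dear user,\n\nYour account \"admin\" expires soon.\nPlease renew it.\n"
printf "%b" "$message"
```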
You can use a heredoc to write it more naturally:
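For example, in Bash, the same kind of message (again made up for illustration) reads exactly as it will print:

```shell
# The text appears across multiple lines, with no escapes needed.
cat << EOF
Dear user,

Your account "admin" expires soon.
Please renew it.
EOF
```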
How It Works
The heredoc uses a special marker (the exact syntax depends on the language):
You start with a marker that says "here comes a multi-line string"
You write your content across multiple lines, exactly as you want it to appear
You end with a closing marker
In Shell (Bash/sh), heredocs use a special syntax with << followed by a delimiter. Here's how it works:
Basic Syntax
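A minimal sketch:

```shell
# << introduces the heredoc; the string ends at a line containing
# only the delimiter (EOF here).
cat << EOF
This is line one.
This is line two.
EOF
```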
The EOF (End Of File) is just a marker - you can use any word you want, but EOF is the most common convention.
Common Uses
Assigning to a variable:
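For example (the variable name is illustrative):

```shell
# Command substitution captures the heredoc's output into a variable.
message=$(cat << EOF
First line
Second line
EOF
)
echo "$message"
```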
Writing to a file:
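For example (the path and settings are made up for illustration):

```shell
# The > redirection after the heredoc sends the text into a file.
cat << EOF > /tmp/app.conf
host=localhost
port=8080
EOF
```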
Piping to a command:
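For example:

```shell
# The heredoc becomes the standard input of the command after the pipe.
cat << EOF | grep "error"
info: service started
error: disk almost full
EOF
```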
Useful Variations
Suppress leading tabs (use <<-):
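For example (note: the indentation in the body must be TAB characters; spaces are not stripped):

```shell
# <<- strips leading tabs from each body line and from the closing
# delimiter, so heredocs can be indented inside functions and loops.
cat <<- EOF
	Usage: myscript [options]
	-h  show this help
EOF
```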
Prevent variable expansion (quote the delimiter):
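For example:

```shell
user="alice"
# Quoting the delimiter ('EOF') turns off variable and command
# expansion: $user below stays literal.
cat << 'EOF'
The variable $user is NOT expanded here.
EOF
```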
Allow variable expansion (default behavior):
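For example (the name "Alice" is illustrative):

```shell
# With an unquoted delimiter, $name and $HOME are expanded as usual.
name="Alice"
cat << EOF
Hello, $name!
Your home directory is $HOME.
EOF
```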
This prints: Hello, Alice! and your actual home directory path.
TDD - Test Driven Development
https://en.wikipedia.org/wiki/Test-driven_development
Data Residency and Data Sovereignty
https://www.splunk.com/en_us/blog/learn/data-sovereignty-vs-data-residency.html