Notes
- HDFS integration (big data)
- Types of file storage
- Object Storage
Andy Pavlo CMU Notes
- Avoid mmap for database file I/O: it leaves paging decisions to the OS instead of the DBMS.
- A DB heap file is an unordered collection of pages; tuples are stored in no particular order. Two ways to track the pages: linked list or page directory (better).
- The page directory is kept in main memory. The bigger the page size, the fewer entries the directory needs. Think of it like the OS's TLB (translation lookaside buffer).
- Different fragmentation techniques
- Denormalized tuple data: inline (embed) related tuples from two different tables linked via a foreign key, packing them together. Protocol Buffers API. Akiban (about 10 years ago) sold a MySQL storage engine with automatic denormalization.
- Slotted pages and the log-structured model: two ways to organize tuples in files
- LevelDB (Google) -> RocksDB (Facebook) -> basis for many Go databases (all use a log-structured model for storing tuples)
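- Number representation: floating point (FLOAT/REAL) follows the IEEE 754 standard; fixed-point decimals (NUMERIC/DECIMAL) are exact and use a DBMS-specific format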
- Overflow pages for storing large values; sometimes the value lives in an external file instead
- Storing thumbnails in the DB can sometimes be faster than the traditional approach of keeping them as files
- Microsoft Research: BLOBs vs. files for big data
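Sketch for the heap file / page directory / slotted page notes above. Not from the lecture: the 4 KiB page size, the header and slot layouts, and the names `SlottedPage` and `HeapFile` are all assumptions made up for illustration. The slot array grows forward from the header, tuple bytes grow backward from the end of the page, and the directory only remembers how much free space each page has left.

```python
import struct

PAGE_SIZE = 4096                 # assumed page size
HEADER = struct.Struct("<HH")    # assumed header: (slot_count, free_end)
SLOT = struct.Struct("<HH")      # assumed slot entry: (offset, length)


class SlottedPage:
    """Hypothetical slotted page: slots grow forward, tuple data grows backward."""

    def __init__(self):
        self.buf = bytearray(PAGE_SIZE)
        HEADER.pack_into(self.buf, 0, 0, PAGE_SIZE)

    def free_space(self) -> int:
        slot_count, free_end = HEADER.unpack_from(self.buf, 0)
        return free_end - (HEADER.size + slot_count * SLOT.size)

    def insert(self, data: bytes) -> int:
        # Need room for the tuple bytes plus one new slot entry.
        if self.free_space() < len(data) + SLOT.size:
            raise ValueError("page full")
        slot_count, free_end = HEADER.unpack_from(self.buf, 0)
        start = free_end - len(data)
        self.buf[start:free_end] = data
        SLOT.pack_into(self.buf, HEADER.size + slot_count * SLOT.size,
                       start, len(data))
        HEADER.pack_into(self.buf, 0, slot_count + 1, start)
        return slot_count  # slot id of the new tuple

    def read(self, slot_id: int) -> bytes:
        slot_count, _ = HEADER.unpack_from(self.buf, 0)
        if not 0 <= slot_id < slot_count:
            raise IndexError("no such slot")
        offset, length = SLOT.unpack_from(
            self.buf, HEADER.size + slot_id * SLOT.size)
        return bytes(self.buf[offset:offset + length])


class HeapFile:
    """Hypothetical heap file: page directory maps page id -> free bytes."""

    def __init__(self):
        self.pages = []       # page id is the list index
        self.directory = {}   # page id -> free bytes remaining

    def insert(self, data: bytes):
        need = len(data) + SLOT.size
        if need > PAGE_SIZE - HEADER.size:
            raise ValueError("tuple too large for one page (would need overflow)")
        # Consult the directory instead of scanning page contents.
        for page_id, free in self.directory.items():
            if free >= need:
                break
        else:
            page_id = len(self.pages)
            self.pages.append(SlottedPage())
        slot_id = self.pages[page_id].insert(data)
        self.directory[page_id] = self.pages[page_id].free_space()
        return page_id, slot_id   # record id = (page id, slot id)

    def read(self, rid):
        page_id, slot_id = rid
        return self.pages[page_id].read(slot_id)
```

Usage: `rid = HeapFile().insert(b"alice")` returns a (page id, slot id) pair, which is how a tuple is addressed in this sketch. Deletes, compaction and overflow pages are left out.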
Postgres
- EXPLAIN ANALYZE command
- DUMP Clause
- Time calc
- Andy's terminal
- DECIMAL and REAL numbers are not the same (exact fixed-point vs. approximate floating point)
- VARCHAR, VARBINARY, TEXT and BLOB are mostly the same on disk: [header (length and metadata)][sequence of data bytes]
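Quick check of the last two points, in Python rather than SQL. Python's `float` is an IEEE 754 double and `decimal.Decimal` is an exact decimal type, so they stand in for REAL and DECIMAL here. The 4-byte little-endian length prefix in `encode_varlen` is an assumed layout for illustration only, not the actual Postgres on-disk format.

```python
import struct
from decimal import Decimal

# Floating point: 0.1 and 0.2 have no exact binary representation.
float_sum = 0.1 + 0.2
float_is_exact = (float_sum == 0.3)

# Exact decimal arithmetic: the same sum compares equal.
decimal_sum = Decimal("0.1") + Decimal("0.2")
decimal_is_exact = (decimal_sum == Decimal("0.3"))


def encode_varlen(data: bytes) -> bytes:
    """Hypothetical variable-length value: [4-byte length header][data bytes]."""
    return struct.pack("<I", len(data)) + data


def decode_varlen(buf: bytes) -> bytes:
    """Read the length header, then return exactly that many data bytes."""
    (length,) = struct.unpack_from("<I", buf, 0)
    return buf[4:4 + length]


print(float_is_exact, decimal_is_exact)
print(decode_varlen(encode_varlen(b"hello")))
```

The same header-plus-bytes shape works whether the payload is text or binary, which is why the four types look alike at the storage level.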
Topics to write on
- Talking to databases 1
- OLAP vs OLTP: which one to go with? 3
- Cloud pricing models 2
- Practical hashing 4
- Do you really need large storage space?
- Building your own database (new hardware)
Project topics to write on
- Intrashare 1
- Aakshar 2
- Vector 3
- Replica (at scale) 4
- Offline Verification