towardsdatascience.com/demystifying-the-parquet-file-format-13adb0206705
Demystifying the Parquet File Format
Technical TLDR: Apache Parquet is an open-source file format that provides efficient storage and fast reads. It uses a hybrid storage layout that stores chunks of columns sequentially, which gives high performance when selecting and filtering data. On top of strong compression support (Snappy, gzip, LZO), it also provides some clever tricks for reducing file scans and encoding repeated values.
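To make the summary concrete, here is a toy, standard-library-only sketch of the three ideas it mentions: a hybrid layout (rows split into groups, each group stored column by column), per-chunk min/max statistics that let a reader skip chunks, and run-length encoding of repeated values. It is not Parquet's actual implementation or on-disk format, and the function names (`to_row_groups`, `select_where_equals`, `run_length_encode`) and the sample data are all hypothetical, chosen only for illustration.

```python
from itertools import groupby


def to_row_groups(rows, columns, group_size):
    """Split row-oriented records into row groups of column chunks.

    Toy model only: each group keeps one list per column plus the
    (min, max) of that list, loosely mirroring column-chunk statistics.
    """
    groups = []
    for start in range(0, len(rows), group_size):
        batch = rows[start:start + group_size]
        chunks = {c: [r[c] for r in batch] for c in columns}
        stats = {c: (min(v), max(v)) for c, v in chunks.items()}
        groups.append({"chunks": chunks, "stats": stats})
    return groups


def select_where_equals(groups, select_col, filter_col, value):
    """Return (matching values, number of row groups actually scanned)."""
    matches, scanned = [], 0
    for group in groups:
        low, high = group["stats"][filter_col]
        if value < low or value > high:
            continue  # the statistics prove no row here can match
        scanned += 1
        selected = group["chunks"][select_col]
        filtered = group["chunks"][filter_col]
        matches.extend(s for s, f in zip(selected, filtered) if f == value)
    return matches, scanned


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(v, len(list(run))) for v, run in groupby(values)]


# Hypothetical sample data: 8 rows, stored as 2 row groups of 4 rows each.
rows = [{"id": i, "country": "US" if i < 4 else "DE"} for i in range(8)]
groups = to_row_groups(rows, ["id", "country"], group_size=4)

matches, scanned = select_where_equals(groups, "country", "id", 6)
encoded = run_length_encode(groups[0]["chunks"]["country"])
```

In this sketch the query for `id == 6` touches only the second row group, because the first group's `id` range of 0 to 3 rules it out, and only the two columns involved are read. The repeated `country` values in the first group collapse to a single `("US", 4)` pair.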