XComs should be used to pass small amounts of data between tasks.
Airflow serializes XComs to JSON by default, and supports Pandas DataFrame serialization in version 2.6 and later.
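Because each XCom value is serialized (to JSON by default) and stored in Airflow's metadata database, only small, serializable payloads are appropriate. A plain-Python illustration of that round trip (no Airflow required; the task functions and the S3 path are hypothetical):

```python
import json

def extract():
    # Hypothetical upstream task: its return value would be pushed as an XCom.
    return {"row_count": 1250, "path": "s3://my-bucket/raw/2023-01-01.json"}

# Serialization step Airflow performs when the XCom is pushed.
pushed = json.dumps(extract())

def load(metadata):
    # Hypothetical downstream task: receives the deserialized XCom value.
    return f"loaded {metadata['row_count']} rows from {metadata['path']}"

# Deserialization step performed when the downstream task pulls the XCom.
print(load(json.loads(pushed)))
```

Anything returned by a task must survive this JSON round trip, which is why passing metadata (counts, paths, timestamps) works well, while passing the dataset itself does not.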
What if the data you need to pass is somewhat larger, for example a small DataFrame? The best way to handle this case is intermediary data storage: save your data to a system external to Airflow at the end of one task, then read it back from that system in the next task.
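A minimal sketch of the pattern, using a temporary local file as a stand-in for the external store (in practice this would typically be object storage such as S3 or GCS, accessed via an Airflow hook; the task names and schema here are hypothetical):

```python
import csv
import os
import tempfile

# Stand-in for an external store such as s3://my-bucket/orders.csv.
STORE = os.path.join(tempfile.gettempdir(), "orders.csv")

def write_task():
    # End of task 1: persist the dataset to the external system.
    rows = [{"order_id": 1, "amount": 9.99}, {"order_id": 2, "amount": 24.50}]
    with open(STORE, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["order_id", "amount"])
        writer.writeheader()
        writer.writerows(rows)
    # Only the small path string would travel between tasks (e.g. via XCom).
    return STORE

def read_task(path):
    # Start of task 2: read the dataset back from the external system.
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

path = write_task()
data = read_task(path)
```

The key point is that the tasks exchange only a small reference (the storage path), while the data itself lives outside Airflow.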
Airflow is meant to be an orchestrator, not an execution framework. If your data is very large, it is probably a good idea to complete any processing using a framework like Spark.
What do you do when one of your downstream tasks requires metadata about an upstream task, or processes the results of the task immediately before it?
XComs are one method of passing data between tasks, but they are only appropriate for small amounts of data.
Large data sets require an approach that uses intermediate storage, and possibly an external processing framework.