Refreshing Data

recover_partitions(): Recovers all the partitions in the directory of a table and updates the catalog. This works only for partitioned tables, not for unpartitioned tables or views.
refresh_by_path(): Invalidates and refreshes all cached data (and the associated metadata) for any Dataset that contains the given data source path. Paths are matched by prefix, so "/" would invalidate everything that is cached.
refresh_table(): Invalidates and refreshes all the cached data and metadata of the given table. For performance reasons, Spark SQL or the external data source library it uses might cache certain metadata about a table, such as the location of blocks. When those change outside of Spark SQL, users should call this function to invalidate the cache. If the table is cached as an InMemoryRelation, the original cached version is dropped and the new version is cached lazily.

Usage

recover_partitions(sc, table)

refresh_by_path(sc, path)

refresh_table(sc, table)

Value

NULL, invisibly. These functions are called mainly for their side effects.
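
Examples

The following is a minimal sketch of how the three functions might be called, not a definitive recipe. It assumes that sparklyr is installed with a local Spark available, that the functions are exported by the catalog package, and that the table names and path shown already exist in the session. The names "sales_partitioned", "sales", and "/data/sales" are hypothetical and used only for illustration.

```r
library(sparklyr)
library(catalog)  # assumption: the package that exports these functions

# Assumes a local Spark installation is available.
sc <- spark_connect(master = "local")

# Hypothetical partitioned table: register partition directories that were
# added to the table's location outside of Spark SQL.
recover_partitions(sc, "sales_partitioned")

# Hypothetical path: invalidate cached data for every Dataset whose source
# path starts with this prefix.
refresh_by_path(sc, "/data/sales")

# Hypothetical table: invalidate cached data and metadata after the
# underlying files changed outside of Spark SQL.
refresh_table(sc, "sales")

spark_disconnect(sc)
```

Each call returns NULL invisibly, so nothing is printed at the console. A narrow prefix passed to refresh_by_path() keeps the invalidation limited to the data that actually changed.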