Looking for advice: I want to experiment with some data repository functionality, and to do this I need some public-domain (e.g. CC0), heterogeneous datasets to ingest for testing. Mainly I'm looking for things like:

- datasets - various file formats & data structures
- text (may try Project Gutenberg for this?)
- documents/papers
- images

I don't need huge quantities or data volumes - just enough to play with some processes.

Can anyone suggest some useful collections I can "borrow"?