feat: add utility function (and/or data) for URL datasets if necessary

# A Light Discussion about Dataset Choices for URL (at least)
Besides a small subset of (m)C4, I would prefer to look for datasets shared by the metadata (URL at least), promptsource, and evaluation WGs.
- [ ] TyDi QA (primary task) is probably the only dataset common to all three

Datasets shared with only one of the other two WGs (i.e., excluding us, the metadata WG):
- From evaluation
  - [ ] News Category Dataset: https://www.kaggle.com/rmisra/news-category-dataset
  - GEM (from the eval WG), specifically:
    - [ ] MLSum
    - [ ] WikiLingua
- From promptsource
  - [ ] app_reviews: not really a URL/URI, but essentially a namespace plus a date
  - [ ] CC-News: virtually a subset of C4
  - Probably a few more
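The issue asks for a URL utility function "if necessary". As a minimal sketch, assuming URL metadata would be consumed as a host plus path segments (the name `url_to_metadata` and its output fields are hypothetical, not an agreed interface), such a helper needs only the standard library's `urllib.parse.urlsplit`:

```python
from urllib.parse import urlsplit


def url_to_metadata(url: str) -> dict:
    """Hypothetical helper: split a URL into fields usable as metadata.

    Assumption: the host (without a leading "www.") and the non-empty
    path segments are the parts worth keeping.
    """
    parts = urlsplit(url)
    # .hostname is lowercased by urllib and is None when there is no netloc
    host = parts.hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    # Drop empty segments produced by leading/trailing/double slashes
    segments = [s for s in parts.path.split("/") if s]
    return {"scheme": parts.scheme, "host": host, "path_segments": segments}


if __name__ == "__main__":
    print(url_to_metadata("https://www.kaggle.com/rmisra/news-category-dataset"))
```

Stripping `www.` is a design choice so that `www.example.com` and `example.com` map to the same host; whether that normalization is desirable for these datasets is still open.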
