Movies Datasets
This xet contains movie datasets from a few different places.
data/movies.csv
This dataset comes from Kaggle and has been augmented using the Jupyter Notebook in the code subdirectory.
The notebook augments the Kaggle dataset by getting the 'plot outline' from IMDB and adding that to the 'plot' column in the dataset.
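The augmentation step can be pictured roughly as follows. This is a minimal sketch, not the notebook's actual code: the `title` column name, the `add_plots` helper, and the `fetch_plot` lookup are all hypothetical, and the choice to fill only empty `plot` cells is an assumption. The sample rows are made-up placeholders.

```python
import csv
import io


def add_plots(rows, fetch_plot, title_key="title", plot_key="plot"):
    """Return copies of rows with plot_key filled in from fetch_plot(title).

    fetch_plot is a hypothetical callable standing in for whatever IMDB
    lookup the notebook performs; it returns a string, or None if no
    outline is found. Rows that already have a plot are left unchanged
    (an assumption, not confirmed by the README).
    """
    augmented = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not mutated
        if not new_row.get(plot_key):
            outline = fetch_plot(new_row[title_key])
            new_row[plot_key] = outline or ""
        augmented.append(new_row)
    return augmented


# Placeholder data; real input would be the rows of data/movies.csv.
sample_csv = "title,plot\nExample Movie A,\nExample Movie B,\n"
rows = list(csv.DictReader(io.StringIO(sample_csv)))

# A dict lookup stands in for the IMDB query.
fake_outlines = {"Example Movie A": "placeholder outline for A"}
augmented = add_plots(rows, fake_outlines.get)
```

Passing the lookup in as a function keeps the sketch independent of any particular IMDB client library.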
code
This directory contains a Jupyter Notebook used to take the Kaggle dataset and augment it with IMDB data containing movie plots.
data/imdb - IMDB Datasets
These datasets are rooted at data/imdb.
2023-03-31
Official IMDB data current as of this date.
These files were downloaded from IMDB, specifically from here.
Then, these files were ingested into a SQLite database using this Python package, resulting in imdb.db.
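Since the README does not describe the schema of imdb.db, a reasonable first step is to inspect it. The sketch below uses only Python's standard `sqlite3` module and makes no assumption about table names. The helper names `list_tables` and `row_counts` are hypothetical, and the in-memory database with its `demo_titles` table is a placeholder so the example runs without the real file.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in the database, sorted by name."""
    cur = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in cur]


def row_counts(conn):
    """Return a dict mapping each table name to its number of rows."""
    counts = {}
    for table in list_tables(conn):
        # Quote the identifier, since table names cannot be bound as parameters.
        quoted = '"' + table.replace('"', '""') + '"'
        counts[table] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return counts


# Placeholder database so the sketch is self-contained.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE demo_titles (id INTEGER, name TEXT)")
demo.executemany(
    "INSERT INTO demo_titles VALUES (?, ?)",
    [(1, "Example A"), (2, "Example B")],
)
demo_tables = list_tables(demo)
demo_counts = row_counts(demo)
```

To inspect the real database, replace `demo` with `sqlite3.connect(...)` pointed at wherever imdb.db sits in your checkout.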
2017-frozen
These files were downloaded from an FTP mirror described here. These files were not ingested into a relational database (attempts to ingest into SQLite failed).
The formatting of these files is very different, so using a Python package like this will make consuming these files much easier.
Kaggle
This is a single Kaggle user's movie dataset, found here. It was presented as two files, and they are simply concatenated into movies.csv. This file is the initial file for this dataset.
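The concatenation can be sketched as below. This assumes both source files start with the same header row, which the README does not confirm; the `concat_csv` helper is hypothetical, and the in-memory inputs are placeholders for the two Kaggle files.

```python
import csv
import io


def concat_csv(sources, dest):
    """Append the data rows of each source to dest, writing the header once.

    sources: iterable of text file objects (open real files with newline="").
    dest: writable text file object.
    Returns the number of data rows written.
    """
    writer = csv.writer(dest, lineterminator="\n")
    header = None
    total = 0
    for src in sources:
        reader = csv.reader(src)
        file_header = next(reader, None)
        if file_header is None:
            continue  # skip empty inputs
        if header is None:
            header = file_header
            writer.writerow(header)
        elif file_header != header:
            raise ValueError("input files do not share the same header row")
        for row in reader:
            writer.writerow(row)
            total += 1
    return total


# Placeholder inputs standing in for the two Kaggle files.
part_a = io.StringIO("title,year\nExample A,2001\n")
part_b = io.StringIO("title,year\nExample B,2002\n")
combined = io.StringIO()
rows_written = concat_csv([part_a, part_b], combined)
```

Checking the headers before appending guards against silently merging files whose columns are ordered differently.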