
Playing around & building movie dataset.


Movies Datasets

This xet contains movie datasets from a few different places.

data/movies.csv

This dataset comes from Kaggle and has been augmented using the Jupyter Notebook in the code subdirectory.

The notebook augments the Kaggle dataset by getting the 'plot outline' from IMDB and adding that to the 'plot' column in the dataset.
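The augmentation step can be pictured roughly as follows. This is a minimal sketch, not the notebook's actual code: `add_plots`, the `fetch_plot_outline` callable, and the `title` column name are hypothetical, and the sketch assumes rows that already have a plot are left alone. Only the 'plot' column name comes from the description above. The lookup is passed in as a function, so any IMDB client can be plugged in.

```python
from typing import Callable, Dict, Iterable, List, Optional


def add_plots(
    rows: Iterable[Dict[str, str]],
    fetch_plot_outline: Callable[[str], Optional[str]],
) -> List[Dict[str, str]]:
    """Return copies of rows with an empty 'plot' filled from the lookup."""
    augmented = []
    for row in rows:
        row = dict(row)  # copy so the caller's data is not mutated
        if not row.get("plot"):
            # fetch_plot_outline is a hypothetical stand-in for the IMDB lookup;
            # 'title' is an assumed column name.
            row["plot"] = fetch_plot_outline(row["title"]) or ""
        augmented.append(row)
    return augmented


# Stand-in lookup table used only for this demonstration (not real IMDB data).
fake_outlines = {"Example Movie": "A placeholder outline."}
movies = [
    {"title": "Example Movie", "plot": ""},
    {"title": "Unknown", "plot": ""},
]
result = add_plots(movies, fake_outlines.get)
```

Movies the lookup cannot resolve keep an empty 'plot' value rather than raising, which keeps a long batch run from stopping on a single miss.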

code

This directory contains a Jupyter Notebook used to take the Kaggle dataset and augment it with IMDB data containing movie plots.

data/imdb - IMDB Datasets

These datasets are rooted at data/imdb

2023-03-31

Official IMDB data current as of this date.

These files were downloaded from IMDB, specifically from here.

These files were then ingested into a SQLite database using this Python package, resulting in imdb.db.
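Since the table layout of imdb.db is not described here, a reasonable first step is to ask SQLite which tables exist. The sketch below uses only the standard-library `sqlite3` module; `list_tables` is a hypothetical helper name, and it makes no assumption about the schema the ingestion package produced.

```python
import sqlite3
from typing import List


def list_tables(conn: sqlite3.Connection) -> List[str]:
    """Return the names of all tables in the database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]
```

To use it, open the database with `sqlite3.connect(...)`, pass the connection to `list_tables`, and close the connection afterwards. The location of imdb.db inside data/imdb is not stated above, so adjust the path to wherever the file sits in your checkout.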

2017-frozen

These files were downloaded from an FTP mirror described here. These files were not ingested into a relational database (attempts to ingest into SQLite failed).

The formatting of these files is very different, so using a Python package like this will make consuming these files much easier.

Kaggle

This is a single Kaggle user's movie dataset, found here. It was presented as two files, and they are simply concatenated into movies.csv. This file is the initial file for this dataset.
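The concatenation can be sketched as below. This is an illustration, not the code that produced movies.csv: `concat_csv` is a hypothetical helper, and it assumes both source files share the same header row, which is not stated above. The header is written once and a mismatch raises an error instead of silently misaligning columns.

```python
import csv
import io
from typing import Iterable, List, Optional, TextIO


def concat_csv(inputs: Iterable[TextIO], output: TextIO) -> None:
    """Append the data rows of each input CSV to output, writing the header once."""
    writer = csv.writer(output)
    header: Optional[List[str]] = None
    for handle in inputs:
        reader = csv.reader(handle)
        file_header = next(reader, None)
        if file_header is None:
            continue  # skip empty inputs
        if header is None:
            header = file_header
            writer.writerow(header)
        elif file_header != header:
            raise ValueError("CSV headers differ; refusing to concatenate")
        writer.writerows(reader)


# In-memory stand-ins for the two source files (made-up rows, assumed columns).
part1 = io.StringIO("title,year\nA,2001\n")
part2 = io.StringIO("title,year\nB,2002\n")
combined = io.StringIO()
concat_csv([part1, part2], combined)
```

With real files, open each one with `newline=""` as the `csv` module documentation recommends, and pass the open handles in the order they should appear in the combined file.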
