=====================================
git clone https://github.com/OSU-IDEA-Lab/Join-Game.git
Create a new directory called "executables" outside of the Join-Game directory. In the commands below, replace each quoted placeholder path, along with its quotation marks, with the actual path on your machine.
./configure --prefix="/path/to/executables/directory" --enable-depend --enable-assert --enable-debug
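For example, with a hypothetical executables directory at /home/alice/executables, the configure line becomes:
./configure --prefix=/home/alice/executables --enable-depend --enable-assert --enable-debug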
make
make install
export PATH="/path/to/executables/directory"/bin:$PATH
export PGDATA=DemoDir
initdb
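Since PGDATA is set to DemoDir, initdb will create the cluster there; equivalently, you can pass the data directory explicitly:
initdb -D DemoDir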
In the command below, replace portNumber with a port number of your choice.
"/path/to/executables/directory"/bin/pg_ctl -D "/path/to/Join-Game"/DemoDir -o "-p portNumber" -l logfile start
psql -p portNumber template1
Replace databaseName with the name of your database.
create database databaseName;
\q
"/path/to/executables/directory"/bin/pg_ctl -D "path/to/Join-Game"/DemoDir -o "-p 1997" -l logfile stop
lsof -i :portNumber
psql -p portNumber databaseName
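As a worked example, assuming a hypothetical port 5433 and database name tpch, the create-and-connect steps look like this:
psql -p 5433 template1
create database tpch;
\q
psql -p 5433 tpch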
=====================================
To run similarity joins, follow the steps below:
- make world
- make install-world
- restart the postgres server
- connect to psql
- CREATE EXTENSION fuzzystrmatch;
Now, you can run the similarity joins.
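As a quick sanity check that fuzzystrmatch is installed, run a standalone levenshtein call; it should return 3:
SELECT levenshtein('kitten', 'sitting');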
=====================================
Linux users: run make -f makefile_linux.original. You will see the dbgen and dists.dss files.
These two are used for TPC-H data generation.
- Inside the dbgen folder, run the command below. It generates data with skew z = 0 at scale factor 10 (about 10 GB).
  ./dbgen -s 10.0 -z 0
- Run the command below to see the first few rows of a table.
  head customer.tbl
  If you observe carefully, each tuple has a | symbol at the end. For query loading and processing, the trailing | is not required, so we remove it in the next step.
- Remove the "|" at the end of the tuple in all the tables:
  sed -i 's/|$//' *.tbl
  After running the above command, the | is removed for all the tuples in all the tables. You can also run the script remove.sh for multiple tables.
- Shuffle each table. If you do not shuffle, the tuples will be in sorted order. Run the script shuffle_tables.sh for multiple tables; a minimal sketch of this step follows this list.
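As an illustration, a minimal sketch of the shuffle-and-load steps might look like the following, assuming GNU shuf is available and a customer table already exists in databaseName; the actual remove.sh and shuffle_tables.sh scripts may differ:
shuf customer.tbl -o customer.tbl   # in-place shuffle; shuf reads all input before writing
psql -p portNumber databaseName -c "\copy customer FROM 'customer.tbl' WITH (FORMAT csv, DELIMITER '|')"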
- To upload the Cars, WDC and Movies datasets, just run the data_uploader.bash file (see the configuration note at the end of this section).
- You can download these datasets at the links below.
- Cars Dataset: parking_tickets table: http://www.kaggle.com/datasets/new-york-city/nyc-parking-tickets and Car_brands table: http://www.back4app.com/database/back4app/car-make-model-dataset
  Query used (with threshold set to 1, 2, and 3 in separate runs):
  SELECT Car_brands1.make, parking_tickets1.vehicle_make FROM Car_brands1 JOIN parking_tickets1 ON levenshtein(trim(Car_brands1.make::varchar(10)), trim(parking_tickets1.vehicle_make::varchar(10))) <= threshold;
- WDC Dataset: webdatacommons.org/largescaleproductcorpus/v2 You can divide the data into two separate tables, WDC1 and WDC2, to join them.
  Query used (threshold = 1, 2, and 3):
  SELECT wdc1Brands.brand, wdc2Brands.brand FROM wdc1Brands JOIN wdc2Brands ON levenshtein(trim(wdc1Brands.brand::varchar(10)), trim(wdc2Brands.brand::varchar(10))) <= threshold;
- Movies Dataset: IMDB table: https://developer.imdb.com/non-commercial-datasets/ and OMDB table: https://www.omdbapi.com/
  Query used (threshold = 9, 10, and 11):
  EXPLAIN ANALYSE SELECT imdb.title, omdbMovies.title FROM imdb JOIN omdbMovies ON levenshtein(trim(imdb.title::varchar(50)), trim(omdbMovies.title::varchar(50))) <= threshold;
- Open the data_uploader.bash file and replace "database_name", "username" and "port" with your database name, username and port. Also open the python files referenced in the bash file and replace the paths of the data files for all these tables with their correct paths on your machine.
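For reference, a typical invocation inside such a bash script looks like the sketch below; the table name and CSV path here are assumptions for illustration, not the script's actual contents:
# hypothetical example; adjust to match data_uploader.bash
psql -d "database_name" -U "username" -p "port" -c "\copy parking_tickets FROM '/path/to/parking_tickets.csv' WITH (FORMAT csv, HEADER)"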