How to run the three python scripts with a virtual enviroment

Fist of all, you need to have pip installed. Pip is a package manager for installing and managing Python software packages

sudo apt install python3-pip

Then you will like to create a virtual enviroment to run the scripts, for this, follow the next steps:

1. After you’ve downloaded the contents of this repository, open your terminal and change directories until you are located inside the ‘2ndAssigment’ directory (you’ll need to find the ‘AIandOpenScienceInResearchSoftwareEngineering-main’ directory wherever you dowloaded this repository first because the ‘2ndAssigment’ directory is inside of it).

For example:

terminalone:

  1. type the next command to install virtual enviroments:

python3 -m pip install --user virtualenv
  1. Set up the virtual enviroment (give it any name you like, in this next command the name is ‘test’):

python3 -m venv test

terminaltwo:

  1. Activate your enviroment:

source test/bin/activate

Now your terminal should look similar to the the one on the next figure:

terminalthree:

  1. You’ll need to install the next libraries inside the virtual enviroment, so after you see the parenthesis in your console with the name of your enviroment, just type the commands shown for each of the libraries requiered listed next, or to make this step easier and faster just type the next command in your terminal (you should be inside the ‘2ndAssigment’ directory):

pip install -r requirements.txt

Installing the libraries inside the enviroment should looks like this:

terminalfour:

If you want to install one by one, run the next commands: - Installing BeautifulSoup

pip install beautifulsoup4
  • Installing lxml

pip install lxml
  • Installing Image

pip install image
  • Installing NumPy

pip install numpy
  • Installing WordCloud

pip install wordcloud

How to get your xmls from your own pdfs

  1. Make sure you have GROBID server running, you can do this using Docker (make sure you have it installed), and the run the next commands in another terminal:

  • Pull the GROBID image:

docker pull lfoppiano/grobid:0.7.2
  • Run the image:

docker run -t --rm -p 8070:8070 lfoppiano/grobid:0.7.2

The web service will be running in http://localhost:8070/

Your terminal will likely look like the one on the next figure when the GROBID server is running:

On your terminal it can look like this when GROBID server is running:

  1. Now copy and paste your pdfs inside the directory called ‘Papers’ that’s inside the directory ‘2ndAssigment’ that you downloaded when downloading this repository.

    papers:

  1. Now if you are not inside the ‘2ndAssigment’ directory in your terminal already, change directories until you are inside the ‘./AIandOpenScienceInResearchSoftwareEngineering-main/2ndAssigment’ directory and run the next command:

python request_grobid.py

This runs a Python script that’ll connect to the GROBID server and download all of the XML of the pdfs inside the ‘Papers’ directory, it might take a few minutes. NOTE: if the grobid server it’s taking long just stop the container and start it again by running the prior command once more.

After running the python script, your terminal should look like this one on the next figure, it won’t finish executing until all of the pdfs inside the ‘Papers’ directory have been turned into XMLs:

grobid2:

  1. Check the ‘Papers’ directory to make sure that all of the xmls are there and stop the docker container, after that, you can run each of the python scripts for each task, mentioned on the next section. The directory should look similar to this now:

    xml: