Scrapers

Base

class scrapers.base.BaseScraper(teiid_user=None, teiid_password=None, kerberos=False, neo4j_user='neo4j', neo4j_password='neo4j', neo4j_server='localhost', neo4j_scheme='bolt')[source]

Base scraper class to standardize the main scraper functionality.

default_since = '2020-05-18'
default_until = '2021-05-19'

is_container_build(build_info)[source]

Check whether a Koji build is a container build.

Parameters

build_info (KojiBuild) – build info from Teiid

Returns

boolean value indicating whether the build is a container build

Return type

bool

is_module_build(build_info)[source]

Check whether a Koji build is a module build.

Parameters

build_info (KojiBuild) – build info from Teiid

Returns

boolean value indicating whether the build is a module build

Return type

bool

run(since=None)[source]

Run the scraper.

Parameters

since (str) – a datetime to start scraping data from

Raises

NotImplementedError – if the function is not overridden
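
The run() contract described above, where the base implementation raises NotImplementedError until a subclass overrides it, can be sketched as follows. This is a minimal illustration with stand-in classes, not the project's code: SketchBaseScraper and ExampleScraper are hypothetical names, the fallback to default_since is an assumed behavior, and the real BaseScraper also configures the Teiid and Neo4j connections.

```python
class SketchBaseScraper:
    """Hypothetical stand-in that mirrors the documented run() contract."""

    default_since = '2020-05-18'

    def run(self, since=None):
        # The base class only defines the interface.
        raise NotImplementedError()


class ExampleScraper(SketchBaseScraper):
    """Hypothetical subclass that overrides run()."""

    def run(self, since=None):
        # Assumed behavior: fall back to the class default when no start is given.
        start = since or self.default_since
        return 'scraping from {0}'.format(start)
```

Calling ExampleScraper().run() uses the default start date, while calling run() on the stand-in base class raises NotImplementedError.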

teiid_host = 'virtualdb.engineering.redhat.com'
teiid_port = 5432

Bugzilla

class scrapers.bugzilla.BugzillaScraper(teiid_user=None, teiid_password=None, kerberos=False, neo4j_user='neo4j', neo4j_password='neo4j', neo4j_server='localhost', neo4j_scheme='bolt')[source]

Scrapes the Bugzilla tables in Teiid.

create_user_node(email)[source]

Create a User node in Neo4j.

Parameters

email (str) – the user’s email

Returns

User object

get_bugzilla_bugs(start_date, end_date)[source]

Get the Bugzilla bug information from Teiid.

Parameters
Returns

list of dictionaries containing bug info

Return type

list

run(since=None, until=None)[source]

Run the Bugzilla scraper.

Parameters
  • since (str) – a datetime to start scraping data from

  • until (str) – a datetime to scrape data until
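
One way the since and until strings could be turned into a query window is sketched below. This is an assumption about the handling, not the scraper's actual logic: resolve_window is a hypothetical helper, the ISO 8601 input format is assumed, and the two defaults are simply the values listed for BaseScraper.

```python
from datetime import datetime

# Values listed as BaseScraper.default_since and BaseScraper.default_until.
DEFAULT_SINCE = '2020-05-18'
DEFAULT_UNTIL = '2021-05-19'


def resolve_window(since=None, until=None):
    """Return (start, end) datetimes, using the defaults for missing values."""
    # fromisoformat accepts both 'YYYY-MM-DD' and 'YYYY-MM-DD HH:MM:SS'.
    start = datetime.fromisoformat(since or DEFAULT_SINCE)
    end = datetime.fromisoformat(until or DEFAULT_UNTIL)
    if start > end:
        raise ValueError('since must not be later than until')
    return start, end
```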

update_neo4j(bugs)[source]

Update Neo4j with Bugzilla bugs information from Teiid.

Parameters

bugs (list) – a list of dictionaries

DistGit

class scrapers.distgit.DistGitScraper(teiid_user=None, teiid_password=None, kerberos=False, neo4j_user='neo4j', neo4j_password='neo4j', neo4j_server='localhost', neo4j_scheme='bolt')[source]

Scrapes the GitBZ tables in Teiid.

get_distgit_data(since, until)[source]

Query Teiid for the dist-git commit and Bugzilla information.

Parameters
Returns

a list of dictionaries

Return type

list

run(since=None, until=None)[source]

Run the dist-git scraper.

Parameters
  • since (str) – a datetime to start scraping data from

  • until (str) – a datetime to scrape data until

Errata

class scrapers.errata.ErrataScraper(teiid_user=None, teiid_password=None, kerberos=False, neo4j_user='neo4j', neo4j_password='neo4j', neo4j_server='localhost', neo4j_scheme='bolt')[source]

Scrapes the Errata Tool tables in Teiid.

get_advisories(since, until)[source]

Query Teiid for the Errata Tool advisories.

Parameters
Returns

a list of dictionaries

Return type

list

get_associated_builds(advisory_id)[source]

Query Teiid to find the Brew builds associated with a specific advisory.

Parameters

advisory_id (int) – the advisory ID

Returns

a list of dictionaries

Return type

list

get_attached_bugs(advisory_id)[source]

Query Teiid to find the Bugzilla bugs attached to a specific advisory.

Parameters

advisory_id (int) – the advisory ID

Returns

a list of dictionaries

Return type

list

get_koji_build(build_id)[source]

Query Teiid to find the Koji build attached to a specific advisory.

Parameters

build_id (int) – the build ID

Returns

a list of dictionaries

Return type

list

run(since=None, until=None)[source]

Run the Errata Tool scraper.

Parameters
  • since (str) – a datetime to start scraping data from

  • until (str) – a datetime to scrape data until

update_neo4j(advisories)[source]

Update Neo4j with Errata Tool advisories from Teiid.

Parameters

advisories (list) – a list of dictionaries of advisories

Freshmaker

class scrapers.freshmaker.FreshmakerScraper(teiid_user=None, teiid_password=None, kerberos=False, neo4j_user='neo4j', neo4j_password='neo4j', neo4j_server='localhost', neo4j_scheme='bolt')[source]

Scrapes the Freshmaker API.

freshmaker_url = 'https://freshmaker.engineering.redhat.com/api/2/events/?per_page=50'

get_koji_task_result(task_id)[source]

Query Teiid for a Koji task’s result attribute.

Parameters

task_id (int) – the Koji task ID to query

Returns

an XML string

Return type

str
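
If the returned XML string is XML-RPC encoded, which is an assumption here and not something this page states, it could be decoded with the standard library as sketched below. decode_task_result is a hypothetical helper, and the sample payload and its koji_builds key are invented purely for illustration.

```python
import xmlrpc.client


def decode_task_result(result_xml):
    """Decode an XML-RPC response string into Python objects."""
    params, _method = xmlrpc.client.loads(result_xml)
    # A method response carries exactly one value.
    return params[0]


# Invented sample data, encoded the same way so the sketch is self-contained.
sample_xml = xmlrpc.client.dumps(({'koji_builds': ['123456']},), methodresponse=True)
decoded = decode_task_result(sample_xml)
```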

query_api_and_update_neo4j()[source]

Scrape the Freshmaker API and upload the data to Neo4j.

Parameters

start_date (str) – a datetime to start scraping data from
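
Because freshmaker_url requests 50 events per page, scraping the API presumably means following pages until none remain. The sketch below shows that loop under stated assumptions: iter_events and fetch_json are hypothetical names, and the response shape (an items list plus a meta.next URL) is assumed, not confirmed by this page.

```python
def iter_events(first_url, fetch_json):
    """Yield events from every page, following the assumed meta.next links."""
    url = first_url
    while url:
        page = fetch_json(url)  # hypothetical callable returning decoded JSON
        for item in page.get('items', []):
            yield item
        # Stop when the page carries no link to a following page.
        url = (page.get('meta') or {}).get('next')


# Invented two-page response used in place of the real API.
fake_pages = {
    'page1': {'items': [{'id': 1}, {'id': 2}], 'meta': {'next': 'page2'}},
    'page2': {'items': [{'id': 3}], 'meta': {'next': None}},
}
event_ids = [event['id'] for event in iter_events('page1', fake_pages.get)]
```

Passing the fetch function in keeps the loop testable without network access; in practice it could wrap a session's get() call and decode the JSON body.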

run(since=None, until=None)[source]

Run the Freshmaker scraper.

Parameters
  • since (str) – a datetime to start scraping data from

  • until (str) – a datetime to scrape data until

Koji

class scrapers.koji.KojiScraper(teiid_user=None, teiid_password=None, kerberos=False, neo4j_user='neo4j', neo4j_password='neo4j', neo4j_server='localhost', neo4j_scheme='bolt')[source]

Scrapes the Koji tables in Teiid.

get_build_info(build_ids)[source]

Query Teiid for build info.

Parameters

build_ids (list) – IDs of Koji builds

Returns

a list of dictionaries

Return type

list

get_build_tags(build_id)[source]

Query Teiid for all tags a build is tagged in.

Parameters

build_id (int) – the Koji build’s ID

Returns

a list of dictionaries

Return type

list

get_koji_builds(start_date, end_date)[source]

Query Teiid for Koji builds.

Parameters
Returns

a list of dictionaries

Return type

list

get_tag_info(tag_name)[source]

Query Teiid for the tag_id of a tag and the build_ids associated with it.

Parameters

tag_name (str) – tag name

Returns

a list of dictionaries

Return type

list

get_task(task_id)[source]

Query Teiid for a Koji task.

Parameters

task_id (int) – the Koji task ID to query

Returns

a list of dictionaries

Return type

list

run(since=None, until=None)[source]

Run the Koji scraper.

Parameters
  • since (str) – a datetime to start scraping data from

  • until (str) – a datetime to scrape data until

update_neo4j(builds)[source]

Update Neo4j with Koji build information from Teiid.

Parameters

builds (list) – a list of dictionaries

Teiid

class scrapers.teiid.Teiid(host, port, username, password)[source]

Abstracts interfacing with Teiid to simplify connections and queries.

get_connection(db_name, force_new=False, retry=None)[source]

Return an existing psycopg2 connection, establishing one if needed.

Parameters
  • db_name (str) – the database name to get a connection to

  • force_new (bool) – forces a new database connection even if one already exists

  • retry (int) – the number of times to retry a failed connection. If this is not set, then the Teiid connection attempt will be repeated until it is successful.

Returns

a connection to Teiid

Return type

psycopg2 connection
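
The caching and retry semantics described above can be sketched without a database. This is an illustration under assumptions, not the Teiid class itself: ConnectionCache is a hypothetical name, the connect callable stands in for the psycopg2 connection call, and ConnectionError stands in for whatever error the real driver raises.

```python
import time


class ConnectionCache:
    """Hypothetical cache holding one connection per database name."""

    def __init__(self, connect, delay=0.0):
        self._connect = connect  # callable taking db_name, returns a connection
        self._delay = delay      # seconds to wait between attempts
        self._connections = {}

    def get_connection(self, db_name, force_new=False, retry=None):
        # Reuse the cached connection unless a new one is forced.
        if not force_new and db_name in self._connections:
            return self._connections[db_name]
        failures = 0
        while True:
            try:
                conn = self._connect(db_name)
                break
            except ConnectionError:
                failures += 1
                # retry=None repeats until success; an int limits the retries.
                if retry is not None and failures > retry:
                    raise
                time.sleep(self._delay)
        self._connections[db_name] = conn
        return conn
```

With retry=2 the sketch makes at most three attempts (the initial one plus two retries) before re-raising the last error.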

query(sql, db='public', retry=None)[source]

Send the SQL query to Teiid and return the rows as a list.

Parameters
  • sql (str) – the SQL query to send to the database

  • db (str) – the database name to query on

  • retry (int) – the number of times to retry a failed query. If this is not set, then the Teiid query will be repeated until it is successful.

Returns

a list of rows from Teiid. Each row is a dictionary with the column headers as the keys.

Return type

list
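
The documented return shape, a list of dictionaries keyed by column header, can be built from any DB-API cursor. The sketch below uses sqlite3 only so that it runs without a Teiid server; rows_as_dicts and the builds table are hypothetical, and the real query() may construct its rows differently.

```python
import sqlite3


def rows_as_dicts(cursor):
    """Convert the rows of an executed DB-API cursor to dictionaries."""
    # cursor.description holds one entry per column; the name comes first.
    columns = [column[0] for column in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# In-memory stand-in database with invented data.
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute('CREATE TABLE builds (id INTEGER, name TEXT)')
cursor.execute("INSERT INTO builds VALUES (1, 'example-build')")
cursor.execute('SELECT id, name FROM builds')
rows = rows_as_dicts(cursor)
conn.close()
```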

Utils

scrapers.utils.retry_session()[source]

Create a python-requests session that retries on connection failures.

Returns

a configured session object

Return type

requests.Session
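
A session of this kind is commonly built by mounting an HTTPAdapter that carries a urllib3 Retry policy. The sketch below shows that pattern; it is not necessarily how retry_session() is implemented, and make_retry_session, the retry count, and the backoff factor are assumptions chosen for illustration.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_retry_session(total=3, backoff_factor=1.0):
    """Return a requests session that retries failed connections."""
    session = requests.Session()
    retry = Retry(
        total=total,
        connect=total,
        read=total,
        backoff_factor=backoff_factor,  # waits grow between attempts
    )
    adapter = HTTPAdapter(max_retries=retry)
    # Apply the same policy to both URL schemes.
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    return session
```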