Scrapers

Base

class scrapers.base.BaseScraper(teiid_user=None, teiid_password=None, kerberos=False, neo4j_user='neo4j', neo4j_password='neo4j', neo4j_server='localhost')[source]

Base scraper class to standardize the main scraper functionality.

default_since = '2019-03-03'
default_until = '2020-03-03'
is_container_build(build_info)[source]

Check whether a Koji build is a container build.

Parameters:build_info (KojiBuild) – build info from Teiid
Returns:boolean value indicating whether the build is a container build
Return type:bool
is_module_build(build_info)[source]

Check whether a Koji build is a module build.

Parameters:build_info (KojiBuild) – build info from Teiid
Returns:boolean value indicating whether the build is a module build
Return type:bool
run(since=None)[source]

Run the scraper.

Parameters:since (str) – a datetime to start scraping data from
Raises:NotImplementedError – if the function is not overridden
teiid_host = 'virtualdb.engineering.redhat.com'
teiid_port = 5432
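Since run raises NotImplementedError unless overridden, a concrete scraper subclasses BaseScraper and supplies its own run. A minimal sketch of how a subclass might handle the since string (the class name and the YYYY-MM-DD parse format are assumptions; only default_since comes from the docs above):

```python
from datetime import datetime

class ExampleScraper:
    """Hypothetical subclass standing in for scrapers.base.BaseScraper."""

    default_since = '2019-03-03'

    def run(self, since=None):
        # Fall back to the class default when no start date is given,
        # then parse the date string (the exact format is an assumption)
        return datetime.strptime(since or self.default_since, '%Y-%m-%d')
```

A real subclass would go on to query Teiid and push the results into Neo4j.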

Bugzilla

class scrapers.bugzilla.BugzillaScraper(teiid_user=None, teiid_password=None, kerberos=False, neo4j_user='neo4j', neo4j_password='neo4j', neo4j_server='localhost')[source]

Scrapes the Bugzilla tables in Teiid.

create_user_node(email)[source]

Create a User node in Neo4j.

Parameters:email (str) – the user’s email
Returns:User object
get_bugzilla_bugs(start_date, end_date)[source]

Get the Bugzilla bug information from Teiid.

Parameters:
  • start_date (str) – a datetime to start scraping data from
  • end_date (str) – a datetime to scrape data until
Returns:list of dictionaries containing bug info
Return type:list

run(since=None, until=None)[source]

Run the Bugzilla scraper.

Parameters:
  • since (str) – a datetime to start scraping data from
  • until (str) – a datetime to scrape data until
update_neo4j(bugs)[source]

Update Neo4j with Bugzilla bugs information from Teiid.

Parameters:bugs (list) – a list of dictionaries

DistGit

class scrapers.distgit.DistGitScraper(teiid_user=None, teiid_password=None, kerberos=False, neo4j_user='neo4j', neo4j_password='neo4j', neo4j_server='localhost')[source]

Scrapes the GitBZ tables in Teiid.

get_distgit_data(since, until)[source]

Query Teiid for the dist-git commit and Bugzilla information.

Parameters:
  • since (str) – a datetime to start scraping data from
  • until (str) – a datetime to scrape data until
Returns:a list of dictionaries
Return type:list

run(since=None, until=None)[source]

Run the dist-git scraper.

Parameters:
  • since (str) – a datetime to start scraping data from
  • until (str) – a datetime to scrape data until

Errata

class scrapers.errata.ErrataScraper(teiid_user=None, teiid_password=None, kerberos=False, neo4j_user='neo4j', neo4j_password='neo4j', neo4j_server='localhost')[source]

Scrapes the Errata Tool tables in Teiid.

get_advisories(since, until)[source]

Query Teiid for the Errata Tool advisories.

Parameters:
  • since (str) – a datetime to start scraping data from
  • until (str) – a datetime to scrape data until
Returns:a list of dictionaries
Return type:list

get_associated_builds(advisory_id)[source]

Query Teiid to find the Brew builds associated with a specific advisory.

Parameters:advisory_id (int) – the advisory ID
Returns:a list of dictionaries
Return type:list
get_attached_bugs(advisory_id)[source]

Query Teiid to find the Bugzilla bugs attached to a specific advisory.

Parameters:advisory_id (int) – the advisory ID
Returns:a list of dictionaries
Return type:list
get_koji_build(build_id)[source]

Query Teiid to find the Koji build with a specific build ID.

Parameters:build_id (int) – the build ID
Returns:a list of dictionaries
Return type:list
run(since=None, until=None)[source]

Run the Errata Tool scraper.

Parameters:
  • since (str) – a datetime to start scraping data from
  • until (str) – a datetime to scrape data until
update_neo4j(advisories)[source]

Update Neo4j with Errata Tool advisories from Teiid.

Parameters:advisories (list) – a list of dictionaries of advisories

Freshmaker

class scrapers.freshmaker.FreshmakerScraper(teiid_user=None, teiid_password=None, kerberos=False, neo4j_user='neo4j', neo4j_password='neo4j', neo4j_server='localhost')[source]

Scrapes the Freshmaker API.

freshmaker_url = 'https://freshmaker.engineering.redhat.com/api/2/events/?per_page=50'
get_koji_task_result(task_id)[source]

Query Teiid for a Koji task’s result attribute.

Parameters:task_id (int) – the Koji task ID to query
Returns:an XML string
Return type:str
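Koji stores task results as XML-RPC payloads, which matches the XML string returned above; the standard library can decode one. A round-trip sketch (the payload contents here are made up for illustration):

```python
import xmlrpc.client

# Encode a fake task-result payload the way an XML-RPC response looks
# (the dict contents are invented for illustration).
xml = xmlrpc.client.dumps(({'koji_builds': ['123']},), methodresponse=True)

# Decode it back; loads() returns (params, methodname)
params, _ = xmlrpc.client.loads(xml)
result = params[0]
```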
query_api_and_update_neo4j()[source]

Scrape the Freshmaker API and upload the data to Neo4j.

run(since=None, until=None)[source]

Run the Freshmaker scraper.

Parameters:
  • since (str) – a datetime to start scraping data from
  • until (str) – a datetime to scrape data until
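The freshmaker_url above requests events 50 at a time, so scraping the full event list means walking pages until a short page signals the end. A generic sketch of that loop (fetch_page is a hypothetical callable standing in for the HTTP request):

```python
def paginate(fetch_page, per_page=50):
    # Yield items page by page; a page shorter than per_page is the last,
    # mirroring the per_page=50 query parameter in freshmaker_url.
    page = 1
    while True:
        items = fetch_page(page)
        yield from items
        if len(items) < per_page:
            return
        page += 1
```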

Koji

class scrapers.koji.KojiScraper(teiid_user=None, teiid_password=None, kerberos=False, neo4j_user='neo4j', neo4j_password='neo4j', neo4j_server='localhost')[source]

Scrapes the Koji tables in Teiid.

get_build_info(build_ids)[source]

Query Teiid for build info.

Parameters:build_ids (list) – IDs of Koji builds
Returns:a list of dictionaries
Return type:list
get_build_tags(build_id)[source]

Query Teiid for all tags a build is tagged in.

Parameters:build_id (int) – the Koji build’s ID
Returns:a list of dictionaries
Return type:list
get_koji_builds(start_date, end_date)[source]

Query Teiid for Koji builds.

Parameters:
  • start_date (str) – a datetime to start scraping data from
  • end_date (str) – a datetime to scrape data until
Returns:a list of dictionaries
Return type:list

get_tag_info(tag_name)[source]

Query Teiid for the tag_id of a tag and the build_ids associated with it.

Parameters:tag_name (str) – tag name
Returns:a list of dictionaries
Return type:list
get_task(task_id)[source]

Query Teiid for a Koji task.

Parameters:task_id (int) – the Koji task ID to query
Returns:a list of dictionaries
Return type:list
run(since=None, until=None)[source]

Run the Koji scraper.

Parameters:
  • since (str) – a datetime to start scraping data from
  • until (str) – a datetime to scrape data until
update_neo4j(builds)[source]

Update Neo4j with Koji build information from Teiid.

Parameters:builds (list) – a list of dictionaries

Teiid

class scrapers.teiid.Teiid(host, port, username, password)[source]

Abstracts interfacing with Teiid to simplify connections and queries.

get_connection(db_name, force_new=False, retry=None)[source]

Return an existing psycopg2 connection, establishing a new one if needed.

Parameters:
  • db_name (str) – the database name to get a connection to
  • force_new (bool) – forces a new database connection even if one already exists
  • retry (int) – the number of times to retry a failed connection. If this is not set, then the Teiid connection attempt will be repeated until it is successful.
Returns:a connection to Teiid
Return type:psycopg2 connection
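The retry semantics described above — repeat until successful when retry is not set, otherwise give up after a bounded number of attempts — can be sketched as a plain loop (connect_once, the delay, and the exception type are placeholders, not the actual implementation):

```python
import time

def connect_with_retry(connect_once, retry=None, delay=0):
    # retry=None means repeat until successful, as get_connection documents;
    # an integer bounds the number of failed attempts before re-raising.
    failures = 0
    while True:
        try:
            return connect_once()
        except OSError:
            failures += 1
            if retry is not None and failures > retry:
                raise
            time.sleep(delay)
```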

query(sql, db='public', retry=None)[source]

Send the SQL query to Teiid and return the rows as a list.

Parameters:
  • sql (str) – the SQL query to send to the database
  • db (str) – the database name to query on
  • retry (int) – the number of times to retry a failed query. If this is not set, then the Teiid query will be repeated until it is successful.
Returns:a list of rows from Teiid. Each row is a dictionary with the column headers as the keys.
Return type:list
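query is documented to return each row as a dictionary keyed by column header; with a psycopg2 cursor that mapping is typically built from cursor.description. A sketch using a stand-in cursor (the real implementation may differ):

```python
def rows_to_dicts(cursor):
    # cursor.description holds one (name, ...) tuple per column;
    # zip each row tuple against the column names.
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]

class FakeCursor:
    # Minimal stand-in exposing the two cursor attributes used above
    description = [('id',), ('name',)]
    def fetchall(self):
        return [(1, 'glibc'), (2, 'bash')]
```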

Utils

scrapers.utils.retry_session()[source]

Create a python-requests session that retries on connection failures.

Returns:a configured session object
Return type:requests.Session
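Sessions that retry on connection failures commonly back off exponentially; urllib3's Retry, which python-requests uses under the hood, sleeps roughly backoff_factor * 2**(n - 1) before the n-th retry. The schedule can be computed directly (whether retry_session uses these exact values is an assumption):

```python
def backoff_schedule(retries=3, backoff_factor=1.0):
    # Sleep before the n-th retry, following the common
    # backoff_factor * 2**(n - 1) exponential schedule
    return [backoff_factor * (2 ** (n - 1)) for n in range(1, retries + 1)]
```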