Scrapers

Base

class scrapers.base.BaseScraper(teiid_user=None, teiid_password=None, kerberos=False, neo4j_user='neo4j', neo4j_password='neo4j', neo4j_server='localhost')[source]

Base scraper class to standardize the main scraper functionality.

default_since = '2019-03-03'
default_until = '2020-03-03'
is_container_build(build_info)[source]

Check whether a Koji build is a container build.

Parameters:build_info (KojiBuild) – build info from Teiid
Returns:boolean value indicating whether the build is a container build
Return type:bool
is_module_build(build_info)[source]

Check whether a Koji build is a module build.

Parameters:build_info (KojiBuild) – build info from Teiid
Returns:boolean value indicating whether the build is a module build
Return type:bool
run(since=None)[source]

Run the scraper.

Parameters:since (str) – a datetime to start scraping data from
Raises:NotImplementedError – if the function is not overridden
teiid_host = 'virtualdb.engineering.redhat.com'
teiid_port = 5432
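Since run raises NotImplementedError unless overridden, a concrete scraper subclasses BaseScraper and supplies its own run. A minimal sketch of how a subclass might handle the since string (the class name and the YYYY-MM-DD parse format are assumptions; only default_since comes from the docs above):

```python
from datetime import datetime

class ExampleScraper:
    """Hypothetical subclass standing in for scrapers.base.BaseScraper."""

    default_since = '2019-03-03'

    def run(self, since=None):
        # Fall back to the class default when no start date is given,
        # then parse the date string (the exact format is an assumption)
        return datetime.strptime(since or self.default_since, '%Y-%m-%d')
```

A real subclass would go on to query Teiid and push the results into Neo4j.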

Bugzilla

class scrapers.bugzilla.BugzillaScraper(teiid_user=None, teiid_password=None, kerberos=False, neo4j_user='neo4j', neo4j_password='neo4j', neo4j_server='localhost')[source]

Scrapes the Bugzilla tables in Teiid.

create_user_node(email)[source]

Create a User node in Neo4j.

Parameters:email (str) – the user’s email
Returns:User object
get_bugzilla_bugs(start_date, end_date)[source]

Get the Bugzilla bug information from Teiid.

Parameters:
  • start_date (str) – a datetime to start scraping data from
  • end_date (str) – a datetime to scrape data until
Returns:list of dictionaries containing bug info
Return type:list

run(since=None, until=None)[source]

Run the Bugzilla scraper.

Parameters:
  • since (str) – a datetime to start scraping data from
  • until (str) – a datetime to scrape data until
update_neo4j(bugs)[source]

Update Neo4j with Bugzilla bugs information from Teiid.

Parameters:bugs (list) – a list of dictionaries

DistGit

class scrapers.distgit.DistGitScraper(teiid_user=None, teiid_password=None, kerberos=False, neo4j_user='neo4j', neo4j_password='neo4j', neo4j_server='localhost')[source]

Scrapes the GitBZ tables in Teiid.

get_distgit_data(since, until)[source]

Query Teiid for the dist-git commit and Bugzilla information.

Parameters:
  • since (str) – a datetime to start scraping data from
  • until (str) – a datetime to scrape data until
Returns:a list of dictionaries
Return type:list

run(since=None, until=None)[source]

Run the dist-git scraper.

Parameters:
  • since (str) – a datetime to start scraping data from
  • until (str) – a datetime to scrape data until

Errata

class scrapers.errata.ErrataScraper(teiid_user=None, teiid_password=None, kerberos=False, neo4j_user='neo4j', neo4j_password='neo4j', neo4j_server='localhost')[source]

Scrapes the Errata Tool tables in Teiid.

get_advisories(since, until)[source]

Query Teiid for the Errata Tool advisories.

Parameters:
  • since (str) – a datetime to start scraping data from
  • until (str) – a datetime to scrape data until
Returns:a list of dictionaries
Return type:list

get_associated_builds(advisory_id)[source]

Query Teiid to find the Brew builds associated with a specific advisory.

Parameters:advisory_id (int) – the advisory ID
Returns:a list of dictionaries
Return type:list
get_attached_bugs(advisory_id)[source]

Query Teiid to find the Bugzilla bugs attached to a specific advisory.

Parameters:advisory_id (int) – the advisory ID
Returns:a list of dictionaries
Return type:list
get_koji_build(build_id)[source]

Query Teiid to find the Koji build with a specific build ID.

Parameters:build_id (int) – the build ID
Returns:a list of dictionaries
Return type:list
run(since=None, until=None)[source]

Run the Errata Tool scraper.

Parameters:
  • since (str) – a datetime to start scraping data from
  • until (str) – a datetime to scrape data until
update_neo4j(advisories)[source]

Update Neo4j with Errata Tool advisories from Teiid.

Parameters:advisories (list) – a list of dictionaries of advisories

Freshmaker

class scrapers.freshmaker.FreshmakerScraper(teiid_user=None, teiid_password=None, kerberos=False, neo4j_user='neo4j', neo4j_password='neo4j', neo4j_server='localhost')[source]

Scrapes the Freshmaker API.

freshmaker_url = 'https://freshmaker.engineering.redhat.com/api/2/events/?per_page=50'
get_koji_task_result(task_id)[source]

Query Teiid for a Koji task’s result attribute.

Parameters:task_id (int) – the Koji task ID to query
Returns:an XML string
Return type:str
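Koji stores task results as XML-RPC payloads, which matches the XML string returned above; the standard library can decode one. A round-trip sketch (the payload contents here are made up for illustration):

```python
import xmlrpc.client

# Encode a fake task-result payload the way an XML-RPC response looks
# (the dict contents are invented for illustration).
xml = xmlrpc.client.dumps(({'koji_builds': ['123']},), methodresponse=True)

# Decode it back; loads() returns (params, methodname)
params, _ = xmlrpc.client.loads(xml)
result = params[0]
```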
query_api_and_update_neo4j()[source]

Scrape the Freshmaker API and upload the data to Neo4j.

run(since=None, until=None)[source]

Run the Freshmaker scraper.

Parameters:
  • since (str) – a datetime to start scraping data from
  • until (str) – a datetime to scrape data until
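The freshmaker_url above requests events 50 at a time, so scraping the full event list means walking pages until a short page signals the end. A generic sketch of that loop (fetch_page is a hypothetical callable standing in for the HTTP request):

```python
def paginate(fetch_page, per_page=50):
    # Yield items page by page; a page shorter than per_page is the last,
    # mirroring the per_page=50 query parameter in freshmaker_url.
    page = 1
    while True:
        items = fetch_page(page)
        yield from items
        if len(items) < per_page:
            return
        page += 1
```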

Koji

class scrapers.koji.KojiScraper(teiid_user=None, teiid_password=None, kerberos=False, neo4j_user='neo4j', neo4j_password='neo4j', neo4j_server='localhost')[source]

Scrapes the Koji tables in Teiid.

get_build_info(build_ids)[source]

Query Teiid for build info.

Parameters:build_ids (list) – IDs of Koji builds
Returns:a list of dictionaries
Return type:list
get_build_tags(build_id)[source]

Query Teiid for all tags a build is tagged in.

Parameters:build_id (int) – the Koji build’s ID
Returns:a list of dictionaries
Return type:list
get_koji_builds(start_date, end_date)[source]

Query Teiid for Koji builds.

Parameters:
  • start_date (str) – a datetime to start scraping data from
  • end_date (str) – a datetime to scrape data until
Returns:a list of dictionaries
Return type:list

get_tag_info(tag_name)[source]

Query Teiid for the tag_id of a tag and the build_ids associated with it.

Parameters:tag_name (str) – tag name
Returns:a list of dictionaries
Return type:list
get_task(task_id)[source]

Query Teiid for a Koji task.

Parameters:task_id (int) – the Koji task ID to query
Returns:a list of dictionaries
Return type:list
run(since=None, until=None)[source]

Run the Koji scraper.

Parameters:
  • since (str) – a datetime to start scraping data from
  • until (str) – a datetime to scrape data until
update_neo4j(builds)[source]

Update Neo4j with Koji build information from Teiid.

Parameters:builds (list) – a list of dictionaries

Teiid

class scrapers.teiid.Teiid(host, port, username, password)[source]

Abstracts interfacing with Teiid to simplify connections and queries.

get_connection(db_name, force_new=False, retry=None)[source]

Return an existing psycopg2 connection, establishing a new one if needed.

Parameters:
  • db_name (str) – the database name to get a connection to
  • force_new (bool) – forces a new database connection even if one already exists
  • retry (int) – the number of times to retry a failed connection. If this is not set, then the Teiid connection attempt will be repeated until it is successful.
Returns:a connection to Teiid
Return type:psycopg2 connection
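The retry semantics described above — repeat until successful when retry is not set, otherwise give up after a bounded number of attempts — can be sketched as a plain loop (connect_once, the delay, and the exception type are placeholders, not the actual implementation):

```python
import time

def connect_with_retry(connect_once, retry=None, delay=0):
    # retry=None means repeat until successful, as get_connection documents;
    # an integer bounds the number of failed attempts before re-raising.
    failures = 0
    while True:
        try:
            return connect_once()
        except OSError:
            failures += 1
            if retry is not None and failures > retry:
                raise
            time.sleep(delay)
```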

query(sql, db='public', retry=None)[source]

Send the SQL query to Teiid and return the rows as a list.

Parameters:
  • sql (str) – the SQL query to send to the database
  • db (str) – the database name to query on
  • retry (int) – the number of times to retry a failed query. If this is not set, then the Teiid query will be repeated until it is successful.
Returns:a list of rows from Teiid. Each row is a dictionary with the column headers as the keys.
Return type:list
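query is documented to return each row as a dictionary keyed by column header; with a psycopg2 cursor that mapping is typically built from cursor.description. A sketch using a stand-in cursor (the real implementation may differ):

```python
def rows_to_dicts(cursor):
    # cursor.description holds one (name, ...) tuple per column;
    # zip each row tuple against the column names.
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]

class FakeCursor:
    # Minimal stand-in exposing the two cursor attributes used above
    description = [('id',), ('name',)]
    def fetchall(self):
        return [(1, 'glibc'), (2, 'bash')]
```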

Utils

scrapers.utils.retry_session()[source]

Create a python-requests session that retries on connection failures.

Returns:a configured session object
Return type:requests.Session
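Sessions that retry on connection failures commonly back off exponentially; urllib3's Retry, which python-requests uses under the hood, sleeps roughly backoff_factor * 2**(n - 1) before the n-th retry. The schedule can be computed directly (whether retry_session uses these exact values is an assumption):

```python
def backoff_schedule(retries=3, backoff_factor=1.0):
    # Sleep before the n-th retry, following the common
    # backoff_factor * 2**(n - 1) exponential schedule
    return [backoff_factor * (2 ** (n - 1)) for n in range(1, retries + 1)]
```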