selenium_scholar.py
This module provides classes for querying Google Scholar using selenium and parsing returned results. It currently only processes the first results page. It is not a recursive crawler.
-
class AddArticleTask(result, article)
Bases: object
Task that adds an article to the result.
-
class ParseTask(result)
Bases: snowballing.scholar.ScholarArticleParser120726
Task that parses articles.
-
class ScholarSettingsTask(pages=10, citform=0, new_window=False, collections=1)
Bases: object
This class lets you adjust the Scholar settings for your session.
-
CITFORM_BIBTEX = 4
-
CITFORM_ENDNOTE = 3
-
CITFORM_NONE = 0
-
CITFORM_REFMAN = 2
-
CITFORM_REFWORKS = 1
-
COLLECTIONS_ARTICLES_AND_PATENTS = 1
-
COLLECTIONS_ARTICLES_ONLY = 0
-
COLLECTIONS_CASE_LAW = 2
-
SETTINGS_URL = 'http://scholar.google.com/scholar_settings?hl=en&as_sdt=0,5&sciodt=0,5'
-
citform
-
collections
-
new_window
-
per_page_results
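As a rough illustration of how the CITFORM_* values might be used, the sketch below copies the documented constants into a standalone module and maps them to readable names. The `CITFORM_NAMES` table and the `describe_citform` helper are hypothetical and are not part of ScholarSettingsTask; only the numeric values come from the listing above.

```python
# Hypothetical stand-in: only the numeric values are taken from the
# documented ScholarSettingsTask constants. Everything else is illustrative.
CITFORM_NONE = 0
CITFORM_REFWORKS = 1
CITFORM_REFMAN = 2
CITFORM_ENDNOTE = 3
CITFORM_BIBTEX = 4

# Assumed human-readable labels for each citation export format.
CITFORM_NAMES = {
    CITFORM_NONE: "none",
    CITFORM_REFWORKS: "RefWorks",
    CITFORM_REFMAN: "RefMan",
    CITFORM_ENDNOTE: "EndNote",
    CITFORM_BIBTEX: "BibTeX",
}


def describe_citform(citform):
    """Return a readable label, rejecting values outside the documented set."""
    if citform not in CITFORM_NAMES:
        raise ValueError("unknown citform value: %r" % (citform,))
    return CITFORM_NAMES[citform]


if __name__ == "__main__":
    print(describe_citform(CITFORM_BIBTEX))
```

Checking the value before passing it on is one way to catch a mistyped constant early; whether the real class performs such a check is not stated here.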
-
-
class SearchScholarQuery
Bases: snowballing.scholar.ScholarQuery
This version represents the search query parameters the user can configure on the Scholar website, in the advanced search options.
-
SCHOLAR_QUERY_URL = 'http://scholar.google.com/scholar?'
-
get_url()
Returns a complete, submittable URL string for this particular query instance. The URL and its arguments will vary depending on the query.
Sets names that must be on the result’s author list.
-
set_scope(title_only)
Sets a Boolean indicating whether to search the entire article or the title only.
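To make the idea of a submittable URL concrete, here is a minimal sketch of how a query string could be appended to SCHOLAR_QUERY_URL. The function `build_search_url` is hypothetical, and the parameter names `as_q`, `as_occt`, and `as_sauthors` are assumptions about Google Scholar's advanced search form, not something this page confirms. The real get_url() may assemble its arguments differently.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Base URL as documented for SearchScholarQuery.
SCHOLAR_QUERY_URL = 'http://scholar.google.com/scholar?'


def build_search_url(words, author=None, title_only=False):
    """Hypothetical helper: encode search options into a Scholar URL."""
    params = {
        'as_q': words,  # assumed name of the "all of the words" field
        # assumed scope switch: title only versus anywhere in the article
        'as_occt': 'title' if title_only else 'any',
    }
    if author:
        params['as_sauthors'] = author  # assumed name of the author field
    return SCHOLAR_QUERY_URL + urlencode(params)


if __name__ == "__main__":
    url = build_search_url('snowballing review', author='Wohlin',
                           title_only=True)
    print(url)
    # Decode the query again to inspect the individual arguments.
    print(parse_qs(urlsplit(url).query))
```

Using `urlencode` keeps spaces and special characters in author names or search words properly escaped.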
-
-
class SeleniumScholarQuerier(driver=None)
Bases: object
ScholarQuerier instances can conduct a search on Google Scholar with subsequent parsing of the resulting HTML content. The articles found are collected in the articles member, a list of ScholarArticle instances.
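The fetch-then-parse flow described above can be sketched without a browser. In the example below, `FakeDriver` is a stand-in that imitates only the `get()` method and `page_source` attribute of a Selenium WebDriver. `SketchQuerier`, `send_query`, and `extract_titles` are hypothetical names, and the `<h3>` markup is an invented sample. This is not the SeleniumScholarQuerier implementation.

```python
import re


class FakeDriver:
    """Stand-in for a Selenium WebDriver: serves canned HTML per URL."""

    def __init__(self, pages):
        self._pages = pages
        self.page_source = ''

    def get(self, url):
        # A real driver would load the page in a browser here.
        self.page_source = self._pages.get(url, '')


def extract_titles(html):
    """Hypothetical parser: pull the text of each <h3> element."""
    return re.findall(r'<h3[^>]*>(.*?)</h3>', html)


class SketchQuerier:
    """Hypothetical querier: load a URL, parse it, collect the results."""

    def __init__(self, driver):
        self.driver = driver
        self.articles = []

    def send_query(self, url, parse):
        self.driver.get(url)
        self.articles.extend(parse(self.driver.page_source))


if __name__ == "__main__":
    pages = {'http://example.invalid/q': '<h3>Paper A</h3><h3>Paper B</h3>'}
    querier = SketchQuerier(FakeDriver(pages))
    querier.send_query('http://example.invalid/q', extract_titles)
    print(querier.articles)
```

Passing the driver in through the constructor, as the documented `driver=None` parameter suggests, makes it straightforward to substitute a stub like this in tests.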
-
class URLQuery(url, start=None)
Bases: snowballing.scholar.ScholarQuery
Represents a Google Scholar query given as a generic URL. It is used to navigate through citations.
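One plausible reading of the `start` argument is a result offset added to the given URL. The sketch below shows that idea with the standard library only. The helper `with_start` is hypothetical, and treating `start` as a query parameter of the same name is an assumption, not documented behavior of URLQuery.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_start(url, start=None):
    """Hypothetical helper: return url with a start offset set or replaced."""
    if start is None:
        return url  # nothing to add, keep the URL untouched
    parts = urlsplit(url)
    # Keep every existing argument except a previous start value.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != 'start']
    query.append(('start', str(start)))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    base = 'http://scholar.google.com/scholar?cites=123&hl=en'
    print(with_start(base))
    print(with_start(base, 10))
```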