selenium_scholar.py
This module provides classes for querying Google Scholar with Selenium and parsing the returned results. It currently processes only the first results page; it is not a recursive crawler.
- class AddArticleTask(result, article)
  Bases: object
  Task that adds an article to the result.
- class ParseTask(result)
  Bases: snowballing.scholar.ScholarArticleParser120726
  Task that parses articles.
- class ScholarSettingsTask(pages=10, citform=0, new_window=False, collections=1)
  Bases: object
  This class lets you adjust the Scholar settings for your session.
  - CITFORM_BIBTEX = 4
  - CITFORM_ENDNOTE = 3
  - CITFORM_NONE = 0
  - CITFORM_REFMAN = 2
  - CITFORM_REFWORKS = 1
  - COLLECTIONS_ARTICLES_AND_PATENTS = 1
  - COLLECTIONS_ARTICLES_ONLY = 0
  - COLLECTIONS_CASE_LAW = 2
  - SETTINGS_URL = 'http://scholar.google.com/scholar_settings?hl=en&as_sdt=0,5&sciodt=0,5'
  - citform
  - collections
  - new_window
  - per_page_results
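The constants above read as small integer codes for two independent settings: the citation export format and the collections to search. A minimal sketch follows, assuming the documented values. `SettingsSketch` is a hypothetical stand-in, not part of this module, and both the range checks and the mapping of `pages` onto `per_page_results` are assumptions.

```python
class SettingsSketch:
    """Hypothetical stand-in that mirrors the documented constants."""

    # Citation export formats, values as documented above.
    CITFORM_NONE = 0
    CITFORM_REFWORKS = 1
    CITFORM_REFMAN = 2
    CITFORM_ENDNOTE = 3
    CITFORM_BIBTEX = 4

    # Collections to search, values as documented above.
    COLLECTIONS_ARTICLES_ONLY = 0
    COLLECTIONS_ARTICLES_AND_PATENTS = 1
    COLLECTIONS_CASE_LAW = 2

    def __init__(self, pages=10, citform=0, new_window=False, collections=1):
        # Assumption: out-of-range codes are rejected here; the real
        # class may accept or handle them differently.
        if citform not in range(0, 5):
            raise ValueError("unknown citation format: %r" % (citform,))
        if collections not in range(0, 3):
            raise ValueError("unknown collection: %r" % (collections,))
        # Assumption: 'pages' is stored as 'per_page_results'.
        self.per_page_results = pages
        self.citform = citform
        self.new_window = new_window
        self.collections = collections


# Ask for BibTeX export links, keeping the other defaults.
settings = SettingsSketch(citform=SettingsSketch.CITFORM_BIBTEX)
```

Using the named constants rather than bare integers keeps call sites readable if the numeric codes ever change.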
- class SearchScholarQuery
  Bases: snowballing.scholar.ScholarQuery
  This version represents the search query parameters the user can configure on the Scholar website, in the advanced search options.
  - SCHOLAR_QUERY_URL = 'http://scholar.google.com/scholar?'
  - get_url()
    Returns a complete, submittable URL string for this particular query instance. The URL and its arguments will vary depending on the query.
Sets names that must be on the result’s author list.
  - set_scope(title_only)
    Sets a Boolean indicating whether to search the entire article or the title only.
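To illustrate how a URL can "vary depending on the query", here is a minimal sketch that assembles one from optional fields on top of the documented `SCHOLAR_QUERY_URL`. `build_query_url` is a hypothetical helper, and the parameter names `as_q`, `as_sauthors` and `as_occt` are assumptions chosen for illustration; check the module source for the names it actually emits.

```python
from urllib.parse import urlencode

SCHOLAR_QUERY_URL = 'http://scholar.google.com/scholar?'  # as documented above


def build_query_url(words=None, author=None, title_only=False):
    """Hypothetical helper: assemble a query URL from optional fields."""
    # Assumption: these parameter names are illustrative only.
    params = {
        'as_q': words or '',
        'as_sauthors': author or '',
        'as_occt': 'title' if title_only else 'any',
    }
    # Drop empty fields so the URL reflects only what the caller set.
    params = {key: value for key, value in params.items() if value}
    return SCHOLAR_QUERY_URL + urlencode(params)


url = build_query_url(words='snowballing', author='Wohlin', title_only=True)
```

`urlencode` takes care of escaping, so author names or phrases containing spaces and punctuation are safe to pass in.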
- class SeleniumScholarQuerier(driver=None)
  Bases: object
  ScholarQuerier instances can conduct a search on Google Scholar with subsequent parsing of the resulting HTML content. The articles found are collected in the articles member, a list of ScholarArticle instances.
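The load, parse, and collect flow described above can be sketched without a browser. Everything in this example is hypothetical: `FakeDriver` stands in for a Selenium WebDriver (it imitates only `get()` and `page_source`), `QuerierSketch` and `send_query` are illustrative names, the `<h3>` title markup is an assumption, and the collected items are plain title strings rather than ScholarArticle instances.

```python
from html.parser import HTMLParser


class FakeDriver:
    """Stand-in for a Selenium WebDriver so the sketch runs offline."""

    def __init__(self, html):
        self._html = html
        self.page_source = ''

    def get(self, url):
        # A real WebDriver would load the page; this serves canned HTML.
        self.page_source = self._html


class TitleParser(HTMLParser):
    """Collects the text inside <h3> elements (assumed title markup)."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == 'h3':
            self._in_title = True
            self.titles.append('')

    def handle_endtag(self, tag):
        if tag == 'h3':
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.titles[-1] += data


class QuerierSketch:
    """Hypothetical querier: load one page, parse it, collect results."""

    def __init__(self, driver):
        self.driver = driver
        self.articles = []

    def send_query(self, url):
        self.driver.get(url)
        parser = TitleParser()
        parser.feed(self.driver.page_source)
        parser.close()
        self.articles.extend(parser.titles)


html = '<div><h3><a href="#">First paper</a></h3><h3>Second paper</h3></div>'
querier = QuerierSketch(FakeDriver(html))
querier.send_query('http://scholar.google.com/scholar?as_q=test')
```

Passing the driver in through the constructor, as the documented `driver=None` parameter suggests, makes it possible to swap in a stand-in like this one for testing.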
- class URLQuery(url, start=None)
  Bases: snowballing.scholar.ScholarQuery
  Represents a Google Scholar query using a generic query. We use it to navigate the citations.
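One way to read the `url` and `start` arguments is as a base URL plus an optional result offset. The sketch below works under that assumption. `with_start` is a hypothetical helper, the query parameter name `start` is assumed, and the `cites` value is a made-up placeholder.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_start(url, start=None):
    """Hypothetical helper: return url with a 'start' offset, if given."""
    if start is None:
        return url
    parts = urlsplit(url)
    # Keep existing parameters, replacing any previous 'start' value.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != 'start'
    ]
    params.append(('start', str(start)))
    return urlunsplit(parts._replace(query=urlencode(params)))


# Placeholder "cited by" URL; the id 12345 is invented for illustration.
cited_by = 'http://scholar.google.com/scholar?cites=12345&hl=en'
second_page = with_start(cited_by, start=10)
```

Replacing an existing `start` rather than appending a second one keeps the URL unambiguous when the helper is applied repeatedly.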