selenium_scholar.py
This module provides classes for querying Google Scholar using Selenium and parsing the returned results. It currently processes only the first results page; it is not a recursive crawler.
- class AddArticleTask(result, article)
  Bases: object
  Task that adds an article to the result.
- class ParseTask(result)
  Bases: snowballing.scholar.ScholarArticleParser120726
  Task that parses articles.
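The two task classes suggest a small command-style design in which each task holds a reference to a shared result and performs one unit of work. The sketch below illustrates that idea in plain Python under stated assumptions: ScholarResult, the run() methods and the (title, url) records are hypothetical stand-ins, not the snowballing API, and the real ParseTask inherits its HTML parsing from ScholarArticleParser120726 rather than reading ready-made records.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ScholarResult:
    """Hypothetical container shared by the tasks; not part of snowballing."""
    articles: List[Dict[str, str]] = field(default_factory=list)


class AddArticleTask:
    """Sketch: appends one already-parsed article to the shared result."""

    def __init__(self, result: ScholarResult, article: Dict[str, str]) -> None:
        self.result = result
        self.article = article

    def run(self) -> None:  # hypothetical method name
        self.result.articles.append(self.article)


class ParseTask:
    """Sketch: turns parsed records into AddArticleTask instances.

    A list of (title, url) pairs stands in for the Scholar HTML that the
    real class parses.
    """

    def __init__(self, result: ScholarResult) -> None:
        self.result = result

    def run(self, records: List[Tuple[str, str]]) -> List[AddArticleTask]:  # hypothetical
        return [
            AddArticleTask(self.result, {"title": title, "url": url})
            for title, url in records
        ]


result = ScholarResult()
for task in ParseTask(result).run([("Paper A", "http://example.org/a")]):
    task.run()  # each task mutates the shared result
print(result.articles)
```

Keeping parsing and insertion in separate tasks means each step can be queued, retried or inspected independently, which is the usual motivation for this pattern.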
- class ScholarSettingsTask(pages=10, citform=0, new_window=False, collections=1)
  Bases: object
  This class lets you adjust the Scholar settings for your session.
  - CITFORM_BIBTEX = 4
  - CITFORM_ENDNOTE = 3
  - CITFORM_NONE = 0
  - CITFORM_REFMAN = 2
  - CITFORM_REFWORKS = 1
  - COLLECTIONS_ARTICLES_AND_PATENTS = 1
  - COLLECTIONS_ARTICLES_ONLY = 0
  - COLLECTIONS_CASE_LAW = 2
  - SETTINGS_URL = 'http://scholar.google.com/scholar_settings?hl=en&as_sdt=0,5&sciodt=0,5'
  - citform
  - collections
  - new_window
  - per_page_results
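The constructor defaults of ScholarSettingsTask only make sense alongside the integer codes listed above. The following sketch mirrors those defaults in a hypothetical SettingsSketch dataclass and rejects codes outside the documented ranges. The assumption that the pages argument ends up in per_page_results is inferred from the attribute names only; the class does not talk to Scholar at all.

```python
from dataclasses import dataclass

# Integer codes as listed in the ScholarSettingsTask reference.
CITFORM_NONE = 0
CITFORM_REFWORKS = 1
CITFORM_REFMAN = 2
CITFORM_ENDNOTE = 3
CITFORM_BIBTEX = 4

COLLECTIONS_ARTICLES_ONLY = 0
COLLECTIONS_ARTICLES_AND_PATENTS = 1
COLLECTIONS_CASE_LAW = 2


@dataclass
class SettingsSketch:
    """Hypothetical stand-in mirroring ScholarSettingsTask's constructor defaults."""
    per_page_results: int = 10  # assumed to hold the `pages` argument
    citform: int = CITFORM_NONE
    new_window: bool = False
    collections: int = COLLECTIONS_ARTICLES_AND_PATENTS

    def __post_init__(self) -> None:
        # Reject codes that do not appear among the documented constants.
        if self.citform not in range(CITFORM_NONE, CITFORM_BIBTEX + 1):
            raise ValueError(f"unknown citform: {self.citform}")
        if self.collections not in range(COLLECTIONS_ARTICLES_ONLY,
                                         COLLECTIONS_CASE_LAW + 1):
            raise ValueError(f"unknown collections: {self.collections}")
        if self.per_page_results < 1:
            raise ValueError("per_page_results must be positive")


# Request BibTeX citation links while keeping the other defaults.
settings = SettingsSketch(citform=CITFORM_BIBTEX)
print(settings)
```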
class
SearchScholarQuery
[source]¶ Bases:
snowballing.scholar.ScholarQuery
This version represents the search query parameters the user can configure on the Scholar website, in the advanced search options.
  - SCHOLAR_QUERY_URL = 'http://scholar.google.com/scholar?'
  - get_url()
    Returns a complete, submittable URL string for this particular query instance. The URL and its arguments will vary depending on the query.
  - Sets names that must be on the result’s author list.
  - set_scope(title_only)
    Sets a Boolean indicating whether to search the entire article or only its title.
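The behavior of get_url() and set_scope() is easier to follow with a concrete URL. The sketch below appends URL-encoded arguments to the documented SCHOLAR_QUERY_URL. The helper build_query_url is hypothetical, and the field names as_q, as_sauthors and as_occt (with values 'title' or 'any') are assumptions based on Scholar's advanced-search form, not on snowballing's own code.

```python
from typing import Optional
from urllib.parse import urlencode

SCHOLAR_QUERY_URL = 'http://scholar.google.com/scholar?'  # value documented above


def build_query_url(words: str, author: Optional[str] = None,
                    title_only: bool = False) -> str:
    """Hypothetical helper sketching what get_url() might assemble.

    Field names are assumptions taken from Scholar's advanced-search form.
    """
    params = {
        'as_q': words,                                # all of these words
        'as_occt': 'title' if title_only else 'any',  # the set_scope() choice
        'hl': 'en',
    }
    if author:
        params['as_sauthors'] = author  # names required in the author list
    # urlencode percent-escapes the values and encodes spaces as '+'.
    return SCHOLAR_QUERY_URL + urlencode(params)


print(build_query_url('snowballing literature review',
                      author='Wohlin', title_only=True))
# http://scholar.google.com/scholar?as_q=snowballing+literature+review&as_occt=title&hl=en&as_sauthors=Wohlin
```

Because only non-empty options are added, the same helper yields a shorter URL for a plain keyword search, which matches the documented note that the URL and its arguments vary with the query.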
class
SeleniumScholarQuerier
(driver=None)[source]¶ Bases:
object
ScholarQuerier instances can conduct a search on Google Scholar with subsequent parsing of the resulting HTML content. The articles found are collected in the articles member, a list of ScholarArticle instances.
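The querier combines a browser driver with HTML parsing, and, as the module description notes, only the first results page is processed. The sketch below shows that flow under assumptions: query_first_page and TitleParser are hypothetical helpers, Scholar is assumed to wrap each result title in an h3 element with class gs_rt, and the real module parses with ScholarArticleParser120726 instead. The driver only needs Selenium WebDriver's get(url) method and page_source attribute, so an offline FakeDriver is used here for demonstration.

```python
from html.parser import HTMLParser
from typing import List


class TitleParser(HTMLParser):
    """Collects the text inside <h3 class="gs_rt"> elements (assumed markup)."""

    def __init__(self) -> None:
        super().__init__()
        self.titles: List[str] = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get('class') or '').split()
        if tag == 'h3' and 'gs_rt' in classes:
            self._in_title = True
            self.titles.append('')

    def handle_endtag(self, tag):
        if tag == 'h3':
            self._in_title = False

    def handle_data(self, data):
        # Text from nested tags such as <a> or <b> is concatenated.
        if self._in_title:
            self.titles[-1] += data


def query_first_page(driver, url: str) -> List[str]:
    """Hypothetical helper: load one results page and return its titles.

    No further pages are requested, mirroring the module's non-recursive scope.
    """
    driver.get(url)
    parser = TitleParser()
    parser.feed(driver.page_source)
    parser.close()
    return [title.strip() for title in parser.titles]


class FakeDriver:
    """Offline stand-in for a selenium WebDriver."""

    def __init__(self, html: str) -> None:
        self._html = html
        self.page_source = ''

    def get(self, url: str) -> None:
        self.page_source = self._html


html = ('<div><h3 class="gs_rt"><a href="#">Paper <b>A</b></a></h3></div>'
        '<div><h3 class="gs_rt">Paper B</h3></div>')
print(query_first_page(FakeDriver(html), 'http://scholar.google.com/scholar?q=x'))
# ['Paper A', 'Paper B']
```

With a real browser, a Selenium driver such as webdriver.Firefox() could be passed in place of FakeDriver, since it provides the same get() method and page_source property.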
- class URLQuery(url, start=None)
  Bases: snowballing.scholar.ScholarQuery
  Represents a Google Scholar query given as a generic query URL. It is used to navigate through citations.
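The optional start argument hints at paging through a citation listing. The sketch below shows one way such an offset could be applied to an arbitrary query URL. The helper with_start is hypothetical, and it assumes the offset maps to Scholar's start parameter (the zero-based index of the first result shown, so 10 is the second page at 10 results per page). The cites value in the example URL is a made-up placeholder.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def with_start(url: str, start: Optional[int] = None) -> str:
    """Hypothetical helper: return `url` with its `start` offset set.

    With start=None the URL is returned unchanged.
    """
    if start is None:
        return url
    parts = urlsplit(url)
    query = parse_qs(parts.query, keep_blank_values=True)
    query['start'] = [str(start)]  # add the offset, or replace an existing one
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


cites_url = 'http://scholar.google.com/scholar?cites=123&hl=en'  # placeholder ID
print(with_start(cites_url, 10))
# http://scholar.google.com/scholar?cites=123&hl=en&start=10
```

Parsing and re-encoding the query, rather than appending "&start=..." as a string, avoids producing a URL that carries two conflicting start values.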