pysolrq package

class pysolrq.solr.SolrClient(host, version=4.7)[source]
__init__(host, version=4.7)[source]

Constructor for SolrClient class

Parameters:
  • host (str) – Solr host. Example: http://example.company.com:8983/solr/
  • version (float) – Current version of Solr host, default=4.7
__module__ = 'pysolrq.solr'
__weakref__

list of weak references to the object (if defined)

get_collection(collection, max_rows=50000)[source]

Factory method to return SolrCollection object

Parameters:
  • collection (str) – name of Solr collection
  • max_rows (int) – maximum rows to fetch, default=50,000
Return type: SolrCollection

get_control(collection)[source]

Factory method to return SolrControl object

Parameters: collection (str) – name of Solr collection
Return type: SolrControl
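
The two factory methods above suggest the following usage pattern. This is only a minimal sketch: the helper name open_handles is hypothetical, and client is assumed to be a pysolrq.solr.SolrClient created with SolrClient(host, version=4.7).

```python
def open_handles(client, collection_name, max_rows=50000):
    """Return (SolrCollection, SolrControl) handles for one collection.

    `client` is assumed to be a pysolrq.solr.SolrClient. This helper is
    hypothetical and only calls the two documented factory methods.
    """
    # Query-side handle, capped at max_rows rows per fetch.
    collection = client.get_collection(collection_name, max_rows=max_rows)
    # Admin/indexing-side handle for the same collection.
    control = client.get_control(collection_name)
    return collection, control
```

Going through the client keeps the host in one place, which matches the note that SolrCollection should not be instantiated directly.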
class pysolrq.solr.SolrCollection(host, collection, max_rows=50000)[source]

SolrCollection class

Should not be instantiated directly. Use the get_collection method of a SolrClient object to obtain a SolrCollection object.

__init__(host, collection, max_rows=50000)[source]

Constructor for SolrCollection class

Parameters:
  • host (str) – Solr host. Example: http://example.company.com:8983/solr/
  • collection (str) – name of Solr collection
  • max_rows (int) – maximum rows to fetch
__module__ = 'pysolrq.solr'
__repr__() <==> repr(x)[source]
__str__() <==> str(x)[source]
facet_range(query, field_params)[source]

Get facet results using Solr Facets

Parameters:
  • query (str) – Query string. Example: 'field1':'val1' AND 'field2':'val2'
  • field_params (dict) – Example: {field_1:[start, end, gap, include], field_2:[start, end, gap, include]}
  • bins (int) –
Return type: dict
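
To make the field_params layout concrete, here is a sketch of how such a dict could be translated into Solr range-facet request parameters. It assumes the standard Solr parameter names (facet.range and the per-field f.<field>.facet.range.* keys). Whether pysolrq builds its request this way is not stated here, and the function name range_facet_params is hypothetical.

```python
from urllib.parse import urlencode


def range_facet_params(query, field_params):
    """Build a URL query string for a Solr range-facet request.

    field_params maps each field name to [start, end, gap, include],
    mirroring the layout documented for facet_range. This is an
    illustrative assumption, not pysolrq's actual implementation.
    """
    # rows=0 because only the facet counts are of interest here.
    params = [("q", query), ("rows", 0), ("facet", "true")]
    for field, (start, end, gap, include) in field_params.items():
        prefix = "f.%s.facet.range." % field
        params.append(("facet.range", field))
        params.append((prefix + "start", start))
        params.append((prefix + "end", end))
        params.append((prefix + "gap", gap))
        params.append((prefix + "include", include))
    return urlencode(params)
```

For example, range_facet_params("*:*", {"price": [0, 100, 10, "lower"]}) produces a query string that requests ten-unit buckets on a price field.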

fetch(query, fields=None, num_rows=None)[source]

Fetches all rows of the results returned by a query against your Solr collection

Parameters:
  • query (str) – Query string. Example: 'field1':'val1' AND 'field2':'val2'
  • fields (list of str) – list of field names. Example: ['field1', 'field3']
  • num_rows (int) – number of rows to fetch
Returns:

  • list – a list of dicts
  • None – if self.num_found exceeds self.max_rows
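
Because fetch can return None instead of a list, callers need to handle both cases. The sketch below shows one way to do that. The helper name fetch_rows is hypothetical, and collection is assumed to be a SolrCollection obtained from SolrClient.get_collection.

```python
def fetch_rows(collection, query, fields=None, num_rows=None):
    """Call fetch and turn the documented None result into an empty list.

    `collection` is assumed to be a pysolrq SolrCollection; `fetch_rows`
    is a hypothetical convenience wrapper, not part of the package.
    """
    rows = collection.fetch(query, fields=fields, num_rows=num_rows)
    if rows is None:
        # Documented case: num_found exceeded max_rows, nothing was fetched.
        return []
    return rows
```

Returning an empty list hides the "too many rows" condition, so a caller that needs to distinguish it from "no matches" should check for None directly instead.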

pre_fetch(query, fields)[source]

Fetches the first 10 rows of the results returned by a query against your Solr collection

Parameters:
  • query (str) – Query string. Example: 'field1':'val1' AND 'field2':'val2'
  • fields (list of str) – list of field names. Example: ['field1', 'field3']
Return type: None

stats(query, fields, metrics=['min', 'max', 'sum', 'count', 'missing', 'sumOfSquaresmean', 'stddev', 'percentiles', 'distinctValues', 'countDistinct', 'cardinality'], percentiles='25,50,75')[source]

Gets basic statistics from Solr

Parameters:
  • query (str) – Query string. Example: 'field1':'val1' AND 'field2':'val2'
  • fields (list of str) – list of field names. Example: ['field1', 'field3']
  • metrics (list of str) – metrics to compute. Available metrics are: ‘min’, ‘max’, ‘sum’, ‘count’, ‘missing’, ‘sumOfSquares’, ‘mean’, ‘stddev’, ‘percentiles’, ‘distinctValues’, ‘countDistinct’, ‘cardinality’
  • percentiles (str) – A string of percentile values separated by commas. Example: "25,50,75". Note: uses the t-digest approximation algorithm.
Returns:

A dictionary with metrics as keys

Return type:

dict
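
As an illustration of how metrics and percentiles might reach Solr, the sketch below builds request parameters using the local-parameter syntax of Solr's Stats Component. It is an assumption about the wire format, not a description of pysolrq internals, and the function name stats_params is hypothetical.

```python
def stats_params(query, fields, metrics=("min", "max", "mean"),
                 percentiles="25,50,75"):
    """Build (key, value) request parameters for a Solr stats query.

    Each metric becomes a `name=true` local parameter, except
    'percentiles', which carries the comma-separated percentile string.
    Illustrative only; pysolrq may construct its request differently.
    """
    local = []
    for metric in metrics:
        if metric == "percentiles":
            local.append("percentiles='%s'" % percentiles)
        else:
            local.append("%s=true" % metric)
    prefix = "{!%s}" % " ".join(local)

    # rows=0 because only the statistics are of interest here.
    params = [("q", query), ("rows", 0), ("stats", "true")]
    params.extend(("stats.field", prefix + field) for field in fields)
    return params
```

The list of pairs can be passed to urllib.parse.urlencode to obtain the final query string.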

class pysolrq.solr.SolrControl(host, collection)[source]

SolrControl class can be used to make collections and perform indexing of your data.

The data can be in a delimited file such as CSV, or in a Solr-acceptable XML format such as:

<add>
    <doc>
        <field name="id">001</field>
        <field name="food">milk</field>
        <field name="talk">meow</field>
    </doc>
    <doc>
        <field name="id">002</field>
        <field name="food">bone</field>
        <field name="talk">bark</field>
    </doc>
</add>
__init__(host, collection)[source]

Constructor for SolrControl class

Parameters:
  • host (str) – Solr host. Example: http://example.company.com:8983/solr/
  • collection (str) – name of Solr collection
__module__ = 'pysolrq.solr'
_clean(values)[source]

Cleans the data in values

Parameters: values (list) – list of some data
Returns: A list of the values in values with leading and trailing whitespace removed
Return type: list
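
The cleaning step described above amounts to stripping each value. A minimal sketch, assuming every element of values is a str (the actual implementation is not shown here, and the name clean is used only for illustration):

```python
def clean(values):
    """Return a new list with leading/trailing whitespace stripped.

    Assumes every element is a str; the input list is left unchanged.
    """
    return [value.strip() for value in values]
```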
_csv_iter(filename, delimiter=', ')[source]

Returns a generator over the rows of a delimited file

Parameters:
  • filename (str) – A delimited file
  • delimiter (str) – Example: ","
Yields:

list – The next list of values read in a row in the given delimited file
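
A row generator of this kind can be sketched with the standard csv module. This is an assumption about the approach rather than the package's own code; note that csv.reader only accepts a one-character delimiter, so the sketch defaults to ",".

```python
import csv


def csv_iter(filename, delimiter=","):
    """Yield each row of a delimited file as a list of str.

    Illustrative stand-in for a row generator; rows are read lazily,
    so large files are never loaded into memory at once.
    """
    with open(filename, newline="") as handle:
        for row in csv.reader(handle, delimiter=delimiter):
            yield row
```

Because it is a generator, the file stays open only while rows are being consumed.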

_data_iter(file_path, delimiter=None, fields=None, unique_id=True, keep_row=False)[source]

Returns a generator that reads a delimited file and yields each row as an XML-formatted string

Parameters:
  • file_path (str) – A delimited file
  • delimiter (str) – Example: ","
  • fields (tuple of str) – Field names to be used for indexing. Example: ('field1', 'field3')
Yields:

str – The next XML-formatted str, built from the values read from one row of the file_path file. Example: if a delimited file contains a row such as:

"Hi", "Hello"

this method will yield:

<add>
    <doc>
        <field name="id">3d144141</field>
        <field name="food">Hi</field>
        <field name="talk">Hello</field>
    </doc>
</add>

assuming the given fields are ["food", "talk"]

_get_data(values, fields, unique_id=True)[source]

Given the values and fields, returns a str in Solr-acceptable XML format

Parameters:
  • values (list) – list of some data
  • fields (list of str) – A list of field names to be used for indexing. Example: ['field1', 'field3']
Return type: str

_get_doc(d, unique_id=True)[source]

Given a dictionary of fields and values, returns a str to be used by the _get_data method
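
The relationship between these two helpers can be sketched as follows. This is a hypothetical reconstruction under stated assumptions: values are paired with fields by position, the id is a random UUID hex string when unique_id is True, and text is XML-escaped. The real methods may differ in layout and id format.

```python
import uuid
from xml.sax.saxutils import escape, quoteattr


def get_doc(d, unique_id=True):
    """Render a {field: value} dict as one Solr <doc> element."""
    fields = dict(d)
    if unique_id:
        # Assumption: a random UUID serves as the autogenerated id.
        fields = {"id": uuid.uuid4().hex, **fields}
    lines = ["<doc>"]
    for name, value in fields.items():
        lines.append("    <field name=%s>%s</field>"
                     % (quoteattr(name), escape(str(value))))
    lines.append("</doc>")
    return "\n".join(lines)


def get_data(values, fields, unique_id=True):
    """Pair values with fields and wrap the resulting doc in <add>."""
    doc = get_doc(dict(zip(fields, values)), unique_id=unique_id)
    return "<add>\n%s\n</add>" % doc
```

Escaping matters here because raw values containing & or < would otherwise produce XML that Solr rejects.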

_post_to_collection(data)[source]

Given the data in Solr-acceptable XML format, posts the data to the Solr collection
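
For orientation, posting XML to Solr is usually an HTTP POST to the collection's update handler. The sketch below only builds the request object and does not send it. The URL layout (<host>/<collection>/update?commit=true) and the function name build_update_request are assumptions, not details taken from pysolrq.

```python
from urllib.request import Request


def build_update_request(host, collection, data):
    """Build (but do not send) a POST request carrying Solr XML.

    Assumes the update handler lives at <host>/<collection>/update and
    that an immediate commit is wanted; both are illustrative choices.
    """
    url = "%s/%s/update?commit=true" % (host.rstrip("/"), collection)
    return Request(
        url,
        data=data.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would perform the actual upload.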

_transform(line, fields)[source]
_transform_partition(partition, fields)[source]
_xmltostr(file_path)[source]

Reads a solrxml file and converts it to a string

Parameters: file_path (str) – An xml file
Return type: str
make_collection(num_shards)[source]

Makes a new collection. This assumes that the user has already uploaded the collection’s configuration to ZooKeeper.

Parameters: num_shards (int) – number of shards for the collection
Return type: None
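
In SolrCloud, collections are created through the Collections API. As a sketch of what a call like make_collection might issue, the function below builds the CREATE URL. This is an assumption about the underlying request: create_collection_url and its config_name argument are hypothetical names.

```python
from urllib.parse import urlencode


def create_collection_url(host, collection, num_shards, config_name=None):
    """Build a Solr Collections API CREATE URL.

    config_name, when given, names a configuration already uploaded to
    ZooKeeper. Illustrative only; not taken from pysolrq's source.
    """
    params = {"action": "CREATE", "name": collection, "numShards": num_shards}
    if config_name is not None:
        params["collection.configName"] = config_name
    return "%s/admin/collections?%s" % (host.rstrip("/"), urlencode(params))
```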
start_index(file_path_or_spark_df, file_format='solrxml', delimiter=None, fields=None, unique_id=True, batch_size=1, keep_row=False, cleaner_func=None)[source]

Indexes data to the collection

Parameters:
  • file_path_or_spark_df (str) – Points to a file with data to be indexed
  • file_format (str) – Available choices are ‘solrxml’ or ‘csv’.
  • delimiter (str) – Required when file_format='csv'. Example: ","
  • fields (tuple of str) – Field names to be used for indexing. Example: ('field1', 'field2')
  • unique_id (bool) – If True, autogenerates a field named id with a unique UUID value for each doc. If False, modify the Solr config so that id is not a unique key.
Return type: None
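
Putting make_collection and start_index together, indexing a CSV file might look like the sketch below. The helper name index_csv is hypothetical, and control is assumed to be a SolrControl obtained from SolrClient.get_control. Only documented parameters are passed.

```python
def index_csv(control, csv_path, fields, num_shards=1):
    """Create the collection, then index a comma-delimited file into it.

    `control` is assumed to be a pysolrq SolrControl; the collection's
    configuration is assumed to be in ZooKeeper already.
    """
    control.make_collection(num_shards)
    control.start_index(
        csv_path,
        file_format="csv",
        delimiter=",",      # required when file_format is 'csv'
        fields=fields,
        unique_id=True,     # let the library autogenerate the id field
    )
```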