finds.unstructured.unstructured
Classes for unstructured and textual datasets
Copyright 2022, Terence Lim
MIT License
- class finds.unstructured.unstructured.Unstructured(mongodb: MongoDB, database: str)
Bases: object
Base class for unstructured datasets
- Parameters:
mongodb – connection to MongoClient where data collection is stored
database – name of the database in MongoDB
- Variables:
db – pymongo.database.Database connection
Examples:
>>> fomc = Unstructured(mongodb, 'fomc')  # connect to database named 'fomc'
>>> fomc.show()
>>> fomc.select('minutes', where_clause)
>>> fomc.delete('minutes', where_clause)
>>> fomc.insert('minutes', doc)
>>> fomc['minutes'].estimated_document_count()  # count docs in collection
>>> fomc['minutes', 'field']
Notes:
- sudo apt-get install -y mongodb-org  # install latest community version
- sudo systemctl start mongod  # start and stop mongodb server
- sudo systemctl status mongod
- sudo systemctl restart mongod
- sudo systemctl stop mongod
- delete(collection: str, where: str | Dict | List) → int
Delete all docs in collection satisfying where clause
- Parameters:
collection – name of collection in database to delete from
where – where clause describing documents to delete
- Returns:
number of documents deleted, -1 if collection not in database
Notes: the where clause may be given as one of:
- a str filter (passed on directly to pymongo)
- a dict of {keys:values}
- a list of key names (to delete documents where the key name $exists)
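The where-clause forms listed above can be pictured as a small normalization step. The following is only a minimal sketch, not the library's implementation: `where_to_filter` is a hypothetical helper name, and it assumes that a list of several key names means all of them must exist (MongoDB combines the top-level fields of a filter with an implicit AND).

```python
from typing import Any, Dict, List, Union


def where_to_filter(where: Union[str, Dict[str, Any], List[str]]) -> Any:
    """Hypothetical helper: turn a where clause into a MongoDB-style filter."""
    if isinstance(where, dict):
        # dict of {key: value}: match documents holding exactly these values
        return dict(where)
    if isinstance(where, list):
        # list of key names: match documents where each listed key exists
        return {key: {"$exists": True} for key in where}
    # str: handed on unchanged, as the notes describe
    return where


print(where_to_filter({"date": 20220126}))
print(where_to_filter(["date", "text"]))
```

The resulting dict has the shape that pymongo filter arguments expect, so the same normalization would plausibly serve both delete and select.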
- get(collection: str, field: str) → Any
Return the value of a field from the first doc that contains that field name
- Parameters:
collection – name of collection in database to retrieve from
field – key field name
- Returns:
value of the key field in the first document where that field name exists
- insert(collection: str, doc: Dict, keys: List[str] = [])
Insert one doc; optionally remove existing duplicate document first
- Parameters:
collection – name of collection in database to insert into
doc – dict of {key:value} representing document
keys – list of field names; existing docs with the same values in these fields are deleted first
- Returns:
number of existing documents (with same key values) deleted
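To make the delete-then-insert behaviour concrete, here is a minimal in-memory sketch. It uses a plain Python list as a stand-in for a MongoDB collection, and `insert_doc` is a hypothetical name; it illustrates the documented semantics only and is not the library's code.

```python
from typing import Any, Dict, List, Optional


def insert_doc(store: List[Dict[str, Any]], doc: Dict[str, Any],
               keys: Optional[List[str]] = None) -> int:
    """Hypothetical stand-in: insert doc into store, removing duplicates first.

    Returns the number of existing documents removed.
    """
    keys = keys or []
    deleted = 0
    if keys:
        def is_duplicate(existing: Dict[str, Any]) -> bool:
            # duplicate = same value as doc in every key field
            return all(k in existing and k in doc and existing[k] == doc[k]
                       for k in keys)

        kept = [d for d in store if not is_duplicate(d)]
        deleted = len(store) - len(kept)
        store[:] = kept  # drop the duplicates in place
    store.append(dict(doc))
    return deleted


minutes = [{"date": 20220126, "text": "draft"}]
n = insert_doc(minutes, {"date": 20220126, "text": "final"}, keys=["date"])
print(n, minutes)
```

With an empty `keys` list nothing is removed, so repeated inserts simply accumulate documents.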
- load_dataframe(collection: str, df: DataFrame, keys: List[str] = [], update: bool = False)
Insert records from the rows of a DataFrame into a collection (insert_many)
- Parameters:
collection – Name of collection in database to insert into
df – Each row of the DataFrame is a document, with column names as key fields
keys – Field names; existing docs with the same values in these fields are updated or replaced
update – If True, update existing docs whose key fields have the same values; otherwise replace them
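The difference between updating and replacing can be sketched with plain dicts. This is an illustration under stated assumptions, not the library's implementation: `load_records` is a hypothetical name, a Python list stands in for the collection, the input rows are what `df.to_dict(orient="records")` would produce from a DataFrame, and "update" is assumed to mean merging the new fields into the existing document while "replace" discards the old fields.

```python
from typing import Any, Dict, List, Optional


def load_records(store: List[Dict[str, Any]], records: List[Dict[str, Any]],
                 keys: Optional[List[str]] = None, update: bool = False) -> None:
    """Hypothetical stand-in: load row dicts into store, matching on key fields."""
    keys = keys or []
    for rec in records:
        match = None
        if keys:
            # first existing document with the same value in every key field
            match = next(
                (d for d in store
                 if all(k in d and k in rec and d[k] == rec[k] for k in keys)),
                None,
            )
        if match is None:
            store.append(dict(rec))  # no match: plain insert
        elif update:
            match.update(rec)        # update: merge, keeping other old fields
        else:
            match.clear()            # replace: drop all old fields first
            match.update(rec)


rows = [{"date": 1, "text": "new"}]  # stand-in for df.to_dict(orient="records")

updated = [{"date": 1, "text": "old", "tag": "x"}]
load_records(updated, rows, keys=["date"], update=True)
print(updated)

replaced = [{"date": 1, "text": "old", "tag": "x"}]
load_records(replaced, rows, keys=["date"], update=False)
print(replaced)
```

In this sketch the `tag` field survives an update but not a replace, which is the practical distinction the `update` flag controls.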
- select(collection, where: str | List | Dict = [], include_id: bool = False) → List
Retrieve docs in collection satisfying where clause
- Parameters:
collection – Name of collection in database to retrieve from
where – Where clause describing documents to retrieve
include_id – If True, then include the _id field in the returned documents
- Returns:
Documents satisfying the where clause, as a list of dicts