In this post we will set up a replica node for an existing MongoDB deployment or for a new one. We will also see how to access a MongoDB replica set from Python. A replica set in MongoDB is a group of mongod processes that maintain the same data set. Replica sets provide redundancy and high availability, and are the basis for all production deployments.
To start, we need two instances to host MongoDB. One instance will act as the primary, and the other will act as the secondary. More about primary and secondary can be found here. Once…
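As a rough sketch of the Python side, a client usually reaches a replica set through a connection URI that lists the members and names the set. The host names, the replica set name rs0, and the helper replica_set_uri below are all hypothetical; replicaSet and readPreference are standard MongoDB URI options.

```python
from urllib.parse import urlencode


def replica_set_uri(hosts, replica_set, database="", **options):
    """Build a mongodb:// URI that lists every member of the replica set."""
    params = {"replicaSet": replica_set, **options}
    return f"mongodb://{','.join(hosts)}/{database}?{urlencode(params)}"


uri = replica_set_uri(
    ["primary.example.com:27017", "secondary.example.com:27017"],  # hypothetical hosts
    replica_set="rs0",  # hypothetical replica set name
    readPreference="secondaryPreferred",  # allow reads from the secondary
)
print(uri)
```

With the pymongo driver installed, a URI like this would typically be passed to pymongo.MongoClient(uri); that step is left out here so the sketch runs without a database.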
Bidirectional Encoder Representations from Transformers (BERT) is a Transformer-based machine learning technique for natural language processing (NLP) pre-training developed by Google. BERT and other NLP models that are trained on large datasets use a lot of memory. For example, BERT-base requires around 450 MB of memory, whereas BERT-large requires around 1.2 GB of memory.
While using any of these NLP models for fine-tuning on our dataset, we save multiple versions of the fine-tuned model. In production we might need to load any of these versions at runtime. As a general practice we might consider loading the NLP…
In this post we will see how to generate word embeddings and plot a chart for the corresponding words.
First we need a paragraph or text for which to find the embeddings. I took a paragraph from here for this post.
paragraph = '''Jupiter is the fifth planet from the Sun and the largest in the Solar System.
It is a gas giant with a mass one-thousandth that of the Sun,
but two-and-a-half times that of all the…
In this article we will use a GPU for training a spaCy model in a Windows environment. spaCy is an open-source software library for advanced natural language processing, written in the programming languages Python and Cython. spaCy is designed to help you do real work: to build real products, or gather real insights. spaCy excels at large-scale information extraction tasks, and is the best way to prepare text for deep learning. It interoperates seamlessly with TensorFlow, PyTorch, scikit-learn, Gensim and the rest of Python’s awesome AI ecosystem.
Let’s first install the GPU dependencies for spaCy.
In this post we will see how to convert BIO tagged text to original text. The BIO / IOB format (short for inside, outside, beginning) is a common tagging format for tagging tokens in a chunking task in computational linguistics (e.g. named-entity recognition). The B- prefix before a tag indicates that the tag is the beginning of a chunk, and an I- prefix before a tag indicates that the tag is inside a chunk. The B- tag is used only when a tag is followed by a tag of the same type without O tokens between them. …
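The conversion described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the post's own code: the helper name bio_to_text_and_entities is hypothetical, tokens are assumed to be whitespace-separated words, and an I- tag that does not continue the current chunk is treated as the start of a new one.

```python
from typing import List, Tuple


def bio_to_text_and_entities(
    tokens: List[str], tags: List[str]
) -> Tuple[str, List[Tuple[str, str]]]:
    """Rebuild the sentence and collect (entity text, label) pairs from BIO tags."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label = None

    def flush():
        # Close the chunk being built, if any, and reset the state.
        nonlocal current_tokens, current_label
        if current_tokens:
            entities.append((" ".join(current_tokens), current_label))
        current_tokens, current_label = [], None

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_label):
            # A B- tag, or an I- tag with a new label, starts a new chunk.
            flush()
            current_tokens, current_label = [token], label
        elif prefix == "I":
            current_tokens.append(token)
        else:
            # An O tag ends any open chunk.
            flush()
    flush()

    # Assumption: the original text is the tokens joined by single spaces.
    return " ".join(tokens), entities


tokens = ["Alex", "is", "going", "to", "Los", "Angeles"]
tags = ["B-PER", "O", "O", "O", "B-LOC", "I-LOC"]
text, entities = bio_to_text_and_entities(tokens, tags)
print(text)
print(entities)
```

Joining on single spaces loses the original spacing around punctuation; keeping character offsets alongside the tokens would be needed to restore the text exactly.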
I recently got a new HP laptop, and it does not have a built-in keyboard shortcut for Suspend, and I couldn’t find it in the Power-off option either. I Googled a lot, and finally found the terminal command to suspend the system.
In this article, we will learn how to create a keyboard shortcut to Suspend a PC or laptop running Ubuntu 17.10.
Step 1: Create a shell script to Suspend the system. I named the shell script file as
Step 2: Give executable permission for the created shell script in any directory, I created…
In this post we will make a context manager in Python. Context managers come in handy when we use the with statement in Python. We will make a context manager for writing an HDFS file with Pandas. This is a continuation of my previous post about HDFS using Python Pandas.
We will start by creating a class for our context manager:
__enter__ and __exit__ are the magic functions of the context manager. __enter__ is called immediately after the __init__ function, and __exit__ is called at the end of the with block.
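The call order can be seen in a minimal sketch. It writes to a local temporary file rather than HDFS so that it runs anywhere; the class name ManagedWriter and the events list are hypothetical and exist only to record which method ran when.

```python
import os
import tempfile


class ManagedWriter:
    """Context manager that opens a file for writing and always closes it."""

    def __init__(self, path):
        # Runs first, when the object is created.
        self.path = path
        self.handle = None
        self.events = ["init"]

    def __enter__(self):
        # Runs when the with block is entered; the return value is bound by "as".
        self.events.append("enter")
        self.handle = open(self.path, "w", encoding="utf-8")
        return self.handle

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs when the with block ends, even if the block raised an exception.
        self.events.append("exit")
        self.handle.close()
        return False  # do not swallow exceptions


path = os.path.join(tempfile.mkdtemp(), "demo.csv")  # local stand-in for an HDFS path
manager = ManagedWriter(path)
with manager as handle:
    handle.write("a,b\n1,2\n")

print(manager.events)
```

Because __exit__ closes the handle, the file is flushed and closed as soon as the with block finishes, whether or not the write succeeded.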
Now let’s use our context manager class:
This will be our entire code:
This can also be used for opening databases, files, and a lot more.
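For a database, the same idea might look like the sketch below. It uses the standard library's sqlite3 and contextlib.contextmanager instead of a class; the function name open_database and the posts table are hypothetical, and the in-memory database is used only so the example runs without any setup.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def open_database(path=":memory:"):
    """Yield a SQLite connection; commit on success, roll back on error, always close."""
    connection = sqlite3.connect(path)
    try:
        yield connection
        connection.commit()
    except Exception:
        connection.rollback()
        raise
    finally:
        connection.close()


with open_database() as db:
    db.execute("CREATE TABLE posts (title TEXT)")
    db.execute("INSERT INTO posts VALUES (?)", ("Context managers",))
    titles = [row[0] for row in db.execute("SELECT title FROM posts")]

print(titles)
```

The code before yield plays the role of __enter__, and the code after it plays the role of __exit__.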
Happy coding !!!
In this post I will show how to make heat-maps with a dendrogram using Python’s Matplotlib library. I came across a post about heat-maps with a dendrogram using R and tried it in R, but I found R a bit tough because of my lack of exposure to it. That’s when I decided to do the same using Python.
I searched the web and did some research, and was finally able to do it using Python.
The dependencies are as follows:
The following code will give us the heat-map with dendrogram:
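One possible way to do this is sketched below, assuming NumPy, SciPy and Matplotlib are installed. It is not the post's original code: the function name plot_heatmap_with_dendrogram, the random sample data, the average-linkage clustering and the panel layout are all illustrative choices.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage


def plot_heatmap_with_dendrogram(data, labels, out_path=None):
    """Cluster the rows of data and draw a dendrogram beside the reordered heat-map."""
    data = np.asarray(data, dtype=float)
    # Average-linkage clustering on Euclidean distances (an assumed choice).
    links = linkage(data, method="average", metric="euclidean")

    fig = plt.figure(figsize=(8, 6))
    # Left panel: dendrogram drawn sideways so its leaves line up with heat-map rows.
    ax_tree = fig.add_axes([0.05, 0.1, 0.2, 0.8])
    tree = dendrogram(links, orientation="left", no_labels=True, ax=ax_tree)
    ax_tree.set_axis_off()

    # Middle panel: heat-map with rows reordered to follow the dendrogram leaves.
    order = tree["leaves"]
    ax_heat = fig.add_axes([0.27, 0.1, 0.55, 0.8])
    image = ax_heat.imshow(data[order], aspect="auto", origin="lower", cmap="viridis")
    ax_heat.set_yticks(range(len(order)))
    ax_heat.set_yticklabels([labels[i] for i in order])
    ax_heat.yaxis.tick_right()
    ax_heat.set_xticks([])

    # Right panel: colour bar for the heat-map values.
    ax_bar = fig.add_axes([0.93, 0.1, 0.02, 0.8])
    fig.colorbar(image, cax=ax_bar)

    if out_path:
        fig.savefig(out_path)
    return fig, order


rng = np.random.default_rng(0)
sample = rng.random((6, 4))  # made-up data: 6 rows, 4 columns
names = [f"row {i}" for i in range(6)]
fig, order = plot_heatmap_with_dendrogram(sample, names)
```

Passing out_path, for example "heatmap.png", saves the figure to disk; dropping the matplotlib.use("Agg") line and calling plt.show() displays it interactively instead.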
This is how our dendrogram will look:
In this post I will share how to recover a corrupt Excel file (.xls or .xlsx) using Python. We were analysing a large data-set from a pharmaceutical company that was using SAP for its ERP. Their sales data was auto-generated from SAP and provided to us, but all of the data-sets were corrupt. Corrupt in the sense that we were able to view the files in Excel, but could not open them using Python. The following error was displayed while opening the file in Excel.
ReactJS makes life simpler. If you are already working with it, you know this. I just wanted to demonstrate how file upload works and how to send the uploaded file as a POST request.
For the demonstration I used the React Boilerplate and followed the Redux architecture. I will only share the components that I used for the file upload and the POST request. This should help you understand the component and add it to your project.
Senior Data Scientist