My sister recently asked me for help running a comparison job between some large files spanning gigabytes of text. The most efficient solution I could find was BigQuery.
So I tried some sample jobs by manually copying the data onto a Google Compute Engine instance, then to Google Cloud Storage, and finally loading it into a BigQuery dataset. I had to go through Compute Engine because the large file was only available on a remote server with SSH-only access.
Running and benchmarking Python code is easier if you have a standard place to run and share it. Kaggle fits this profile perfectly. It is a hosted service where your code can be run and shared online.
Developing in Cloud
https://www.kaggle.com/tusharm567
As an added advantage, I get direct access to loads of public datasets to play with and sharpen my skills on.
Benchmarking Code
from math import *
%timeit tan(atan(exp(log(sqrt(1*1)))))
This kind of thing is helpful while learning and testing out algorithms, or when comparing the compute cost of different approaches to a solution.
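The %timeit magic only works inside IPython or a notebook. Outside of one, roughly the same measurement can be made with the standard library's timeit module. The following is a minimal sketch, not the notebook's actual code: the helper names round_trip and benchmark, and the repeat and number defaults, are assumptions made up for illustration.

```python
import timeit
from math import atan, exp, log, sqrt, tan


def round_trip(x: float = 1.0) -> float:
    """The same chain of calls as the %timeit one-liner (hypothetical helper)."""
    return tan(atan(exp(log(sqrt(x * x)))))


def benchmark(func, repeat: int = 5, number: int = 100_000) -> float:
    """Return the best observed time per call, in seconds.

    timeit.repeat runs `func` `number` times per round, for `repeat` rounds.
    Taking the minimum round reduces noise from other processes.
    """
    timings = timeit.repeat(func, repeat=repeat, number=number)
    return min(timings) / number


if __name__ == "__main__":
    per_call = benchmark(round_trip)
    print(f"round_trip: {per_call * 1e9:.0f} ns per call (best of 5 rounds)")
```

Two approaches to the same problem can then be compared by passing each one to benchmark and looking at the per-call times side by side.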
Context
Recently, I started working on some Raspberry Pi projects, but the most frustrating part of the experience was accessing the Raspberry Pi during development.
I had the following options:
Use the television at my place as a monitor over HDMI (not great pixel quality) with a wired keyboard and mouse for control. This was troublesome since I could only develop when I was at my place and the television was free.
After a long delay I finally mustered up the courage to build the query ranking module. Some scary stuff. Here are the problems I’ve been facing whenever I start building this.
The ranking requires the tables’ foreign key structure.
Once the query generation is done, the recursive calls combined with JavaScript’s callbacks are a nightmare.
Callbacks are still being received even after the output has been sent.
Some relations between the tables are only discovered after ranking.
For now, I have chosen Packet for the trial server even though the cost is high. The specs are good (even for entry-level containers) and I got some credits to work with initially.
So I set up the SSH keys for login and got to work.
The GitHub repository for Parsey McParseface and SyntaxNet is available here, and it also lists the steps required to set up SyntaxNet and get it working.
Before I start building up the server for the application, I wanted to pen down the structure for the app. The structure is represented below.
Abbreviations:
WSA: Web Speech API (Speech to Text)
NLP: Natural Language Processing Tool (SyntaxNet or Google Cloud NLP API)
DIS: Database Indexed Search
DDI: Database Dynamic Indexing
DES: Database Exhaustive Search
NES: NoSQL Database Elastic Search
SFT: SQL Database Full Text Search
Converting the Natural Language to Database Query project over to JavaScript requires implementing the Web Speech API in a single JavaScript file. I will discuss the steps taken for the integration in this blog post. It is a bit unorthodox, but here’s a reference before the content, in case you want to experiment with the Web Speech API using the official guide.
First, we check if the API is available in the browser using the following code:
GSoC 2016 Work Submission
As a part of the final evaluation, GSoC 2016 accepted students are required to provide a permanent link with details about the work done along with the documentation for that work. This post is for that content.
Here are the details about my GSoC 2016 project once again:
Title: Apache SYNCOPE-809
Organization: Apache Software Foundation
Mentor: Francesco Chicchiriccò
Description:
The SYNCOPE-809 feature request points out the lack of a plugin for IDEs to allow users to create and edit mail templates and report stylesheets in the IDE itself instead of doing so using their dashboard.
htmlhelpers package
The following files are available in the htmlhelpers package:
AssistInfo.java: Provides replace functionality for template content
AttributeInfo.java: Identifies attributes for HTML tags
AutoIndentAction.java: Provides logic for auto indentation of HTML code
CSSBlockScanner.java: Provides logic for CSS block identification
CSSRule.java: Provides logic for CSS code identification
DocTypeRule.java: Provides logic for HTML doctype tag identification
HTMLAutoEditStrategy.java: Provides logic for HTML content assist
HTMLCompletionProcessor.java: Provides logic for HTML code auto-completion
HTMLContextType.