
I could not find a source for the magic z40 in the .git/hooks/pre-push.sample file. A quick Google search turned up this:

“z40 is a regular expression matching the empty blob/commit/tree SHA: "0000000000000000000000000000000000000000".” Found here: https://github.com/git-lfs/git-lfs/blob/master/git/rev_list_scanner.go

Hopefully somebody can comment and let me know of a more detailed explanation.
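
For context, the sample hook reads lines of the form `<local ref> <local sha> <remote ref> <remote sha>` from standard input and compares each SHA against z40, which is simply 40 zeros. Below is a minimal Python sketch of that comparison, assuming SHA-1 object names (40 hex digits). The function name classify_push_line is made up for illustration and is not part of git.

```python
# All-zero object name; git passes it when a ref does not exist on one side.
Z40 = "0" * 40


def classify_push_line(line):
    """Classify one line of pre-push hook input (hypothetical helper)."""
    local_ref, local_sha, remote_ref, remote_sha = line.split()
    if local_sha == Z40:
        return "delete"      # nothing local to push: the remote ref is being deleted
    if remote_sha == Z40:
        return "new branch"  # the remote ref does not exist yet
    return "update"          # both sides exist: an ordinary update


if __name__ == "__main__":
    sample = "refs/heads/topic " + "a" * 40 + " refs/heads/topic " + Z40
    print(classify_push_line(sample))
```

So the zeros act as a sentinel for "no object here" rather than naming a real blob, commit, or tree.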


I could not easily find a solution for this problem, hence this post.

Here are two ways of checking that all rows in a pandas DataFrame are identical:
df.apply(pd.Series.nunique, axis=0).unique().tolist() == [1]
and
df.groupby(df.columns.tolist()).ngroups == 1

The groupby method was *slightly* faster in my timing tests, which used the following code:

import pandas as pd
import timeit

data = [
dict(zip(list('abc'), range(1, 4))),
dict(zip(list('abc'), range(1, 4))),
]

df = pd.DataFrame(data)

print(df.apply(pd.Series.nunique, axis=0).unique().tolist() == [1])

print(df.groupby(df.columns.tolist()).ngroups == 1)

print(timeit.timeit('df.apply(pd.Series.nunique, axis=0).unique().tolist() == [1]',
number=1000, globals=globals()))

print(timeit.timeit('df.groupby(df.columns.tolist()).ngroups == 1', number=1000,
globals=globals()))
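
A third variant, not timed in the post and offered only as a sketch: drop duplicate rows and check that exactly one row remains. The helper name all_rows_identical is made up here. The three approaches can disagree on frames containing NaN, since nunique and groupby skip missing values by default while drop_duplicates treats them as equal.

```python
import pandas as pd


def all_rows_identical(df):
    """Return True when every row of df equals every other row."""
    # drop_duplicates keeps one copy of each distinct row
    return len(df.drop_duplicates()) == 1


same = pd.DataFrame([{"a": 1, "b": 2, "c": 3}] * 2)
different = pd.DataFrame([{"a": 1, "b": 2, "c": 3}, {"a": 1, "b": 2, "c": 4}])

print(all_rows_identical(same))
print(all_rows_identical(different))
```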

TL;DR fullchain.pem is the concatenation of cert.pem and chain.pem

I get the TLS certificates for my nginx web server via Let's Encrypt. It provides the following 4 files on successful authentication:

  • cert.pem
  • chain.pem
  • fullchain.pem
  • privkey.pem

Here is how you verify that fullchain.pem is just a concatenation of cert.pem and chain.pem. Please note that the order of those two files is important. The command prints nothing when the files match.

diff fullchain.pem <(cat cert.pem chain.pem)
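
The diff command above relies on process substitution, which not every shell supports. As a hedged alternative, here is a small Python sketch of the same byte-for-byte comparison. The function name is_concatenation is made up for illustration.

```python
from pathlib import Path


def is_concatenation(fullchain, cert, chain):
    """Return True if fullchain is byte-for-byte cert followed by chain."""
    combined = Path(cert).read_bytes() + Path(chain).read_bytes()
    return Path(fullchain).read_bytes() == combined
```

Calling is_concatenation("fullchain.pem", "cert.pem", "chain.pem") from the directory that holds the files should return True if the claim holds for your certificates. Swapping the last two arguments should return False, which shows that the order matters.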

pip install --user spacy 2

this is NOT about installing spacy and models to a virtual environment. i faced no problems in doing that.

this is NOT about installing spacy and models system wide using sudo pip install.

i want to use pip install --user to install spacy and models to ~/.local/lib/python3.x/site-packages

first: install spacy using pip install --user -U spacy OR python -m pip install --user -U spacy.

second: find the shortcut name of the model you want to install. i want to install the model en_core_web_lg https://spacy.io/models/#available-models

third: issue the following command in a terminal and as soon as you see the URL to the model hit Control-c to cancel the progress. the command is python -m spacy download en_core_web_lg.
the reason we don’t want the command to complete is that it will try to install the model to the system wide site-packages directory
the other reason why i do it this way is that spacy automatically tells me the name of the compatible model. i don’t want to download a model that is incompatible with the installed version of spacy.
in my case the URL was https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz

fourth: download that model using curl or wget or some other program.
curl -L -O https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz

fifth: install the downloaded model with pip install --user /path/to/downloaded/model. in my case that was pip install --user -U en_core_web_lg-2.0.0.tar.gz

sixth: now link the model with some shortcut name. this name is to be used later to load the model. python -m spacy link en_core_web_lg en. here en is the shortcut name i chose while en_core_web_lg is the name of the installed model.

finally: verify the whole thing using python -m spacy validate. you should be able to load the model using

import spacy
nlp = spacy.load('en') # shortcut name
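
To confirm where pip install --user puts packages on your machine, the standard library can report the per-user site-packages directory. This is only a quick sanity check and does not involve spacy itself:

```python
import site

# Per-user site-packages directory that `pip install --user` targets
user_site = site.getusersitepackages()
print(user_site)

# Whether Python adds that directory to sys.path
# (typically False inside a virtual environment)
print(site.ENABLE_USER_SITE)
```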

multiple colons in makefiles

more than one colon in gnu make makefile rule

gnu make makefiles more than one colon

i could not find the following page while googling for the search queries mentioned above. hope google is able to send people to this page at which point they will find what they are looking for

https://www.gnu.org/software/make/manual/html_node/Static-Usage.html#Static-Usage

default user agent in scrapy


scrapy settings --get=USER_AGENT

Scrapy/1.2.1 (+http://scrapy.org)


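# public import path; the sklearn.feature_extraction.stop_words module
# is no longer importable in newer scikit-learn releases
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

print(ENGLISH_STOP_WORDS)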

currently there are 318 words in that frozenset.

NLTK also has its own stopwords


from nltk.corpus import stopwords
print(stopwords.words('english'))

there are 153 words in that


select a/sum(a) from foo; -- WRONG!

This will not work: sum(a) would be evaluated on each row and return the number present in that row, so every result is 1 (sum(n) = n when n is a single number).

This is where we can make good use of Hive's windowing and analytics functions.

the solution is to use the following:


select a/sum(a) over () from foo; -- RIGHT !!!

here we are instructing that the sum be performed over the entire column ‘a’.
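
To try the windowed query without a Hive cluster, here is a small sketch using Python's built-in sqlite3 module as a stand-in. This is an assumption-laden substitute, not Hive: it needs SQLite 3.25 or newer for window functions, the table contents are made up, and the multiplication by 1.0 is only there because SQLite divides integers as integers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table foo (a integer)")
conn.executemany("insert into foo values (?)", [(1,), (2,), (3,), (4,)])

# sum(a) over () is computed across every row of foo.
# Multiplying by 1.0 avoids SQLite's integer division.
rows = conn.execute(
    "select a, a * 1.0 / sum(a) over () from foo order by a"
).fetchall()

for a, fraction in rows:
    print(a, fraction)

conn.close()
```

Each row keeps its own value of a while the denominator is the total for the whole column, so the fractions add up to 1.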

you could of course make 2 queries and calculate the sum of a first and then hardcode the sum in another query but that means you are performing 2 passes over the table for sure. now i can’t be sure how many passes over the table are being made in the right query above. if you know the answer please post in the comments below.

It has been a while since I played with Archlinux. Meanwhile AUR has transitioned and now uses version controlled PKGBUILDs. So here is how to go about it.

Let us take the example of the package cower.

If you visit that page you will find a “Download snapshot” link under the Package Actions box to the right of the page near the top of the page. Just click on it and you will download a compressed tarball; cower.tar.gz in this case. Uncompress that to find the actual PKGBUILD in it. I also noticed a hidden file called .SRCINFO in the same folder. Now you can simply issue the command “makepkg -irs” in the same directory and you are all set.

The other way is to git clone the repo. The repo link is right at the top of the page under Git Clone URL. If you clone the repo you will find the PKGBUILD and .SRCINFO and .git directory in there. Again use “makepkg -irs” to install the package.

If one launches Hive using the “-S” or “--silent” option, then Hive does not print progress information. However, if you are already inside the Hive command line shell, then you can control this behaviour by setting the value of silent.


set silent=on; -- make hive silent

set silent=false; -- make hive print progress information