tag:blogger.com,1999:blog-15120699921572909092024-03-13T14:08:42.760-07:00Jonathan GriepThis blog is primarily about software development issues we uncover as we build products or services.JGhttp://www.blogger.com/profile/12732475709171849823noreply@blogger.comBlogger10125tag:blogger.com,1999:blog-1512069992157290909.post-70216662082379075862018-07-28T12:10:00.000-07:002018-07-28T12:16:23.775-07:00'pip' is not recognized as an internal or external command, operable program or batch file.<h2>
Windows fails to find pip after installing Python</h2>
<div>
After installing Python and adding the Python folder to your PATH environment variable, you open a cmd window and type the pip command to install a library:</div>
<br />
<code>
c:\Users\myusername> pip install matplotlib
</code>
<br />
<br />
and you get back this error:<br />
<br />
<code>
'pip' is not recognized as an internal or external command,
operable program or batch file. </code><br />
<br />
Assuming that you have added the Python folder to your PATH environment variable, you can run pip through the Python interpreter instead:<br />
<br />
<code>c:\Users\myusername> python -m pip install matplotlib
</code>
<br />
<br />
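This works because pip.exe is usually located in the "Scripts" folder directly under the folder where Python is installed, and that folder is not necessarily on your path. The -m option tells Python to run the pip module itself, so only the Python folder needs to be on the path.<br />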
<br />
Alternatively, you can add the Scripts folder to your PATH environment variable; the plain pip command will then work in any newly opened cmd window.<br />
JGhttp://www.blogger.com/profile/12732475709171849823noreply@blogger.com1tag:blogger.com,1999:blog-1512069992157290909.post-47136328144605459552017-04-05T10:54:00.001-07:002017-04-06T11:49:42.436-07:00Indexing Attachments with Elasticsearch<h2>
Introduction</h2>
<div>
<a href="https://www.elastic.co/" target="_blank">Elasticsearch</a> is a great open source tool for indexing many different types of content and providing a fast search capability. I have been working with version 5.3 (on a CentOS 7 virtual machine) to build a tool to migrate or search NSF files using Elasticsearch as a NoSQL data store. The information provided in this post can be used to get you started indexing common files from any source.</div>
<div>
<br /></div>
<div>
One important feature is the ability to index attachments. This post walks through the steps needed to get this working with the latest <a href="https://www.elastic.co/guide/en/elasticsearch/plugins/master/ingest-attachment.html" target="_blank">ingest-attachment</a> plugin and describes some of its current limitations. The earlier mapper-attachments plugin was deprecated in version 5.0.0.</div>
<h2>
Supported File Formats</h2>
<div>
The ingest-attachment plugin uses the <a href="http://tika.apache.org/" target="_blank">Apache Tika content analysis toolkit</a> to extract text from each file as it is processed. The supported file formats are listed in this source module on <a href="https://github.com/elastic/elasticsearch/blob/5.3/plugins/ingest-attachment/src/main/java/org/elasticsearch/ingest/attachment/TikaImpl.java#L57-L71">GitHub</a>. There are eleven popular file formats, including PDF, HTML, XLS and PPT. Notably, the .eml format (for mail messages) is not supported in this release, even though Tika has a parser for that format. </div>
<h2>
Installation</h2>
<div>
First you need to download and install <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/setup.html">Elasticsearch</a>. Then follow the <a href="https://www.elastic.co/guide/en/elasticsearch/plugins/5.3/ingest-attachment.html">instructions to install</a> the ingest-attachment plugin. A Docker image is also available; I will explore its use in a later post.</div>
<div>
<br /></div>
<div>
Note that the command to install the plugin depends on your installation configuration. The script you need to run (bin/elasticsearch-plugin) is relative to where you installed Elasticsearch. On CentOS the default location is /usr/share/elasticsearch/bin/elasticsearch-plugin. </div>
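If you script the installation, a small helper can pick the right elasticsearch-plugin path for you. This is only a sketch, assuming either the CentOS package location mentioned above or an archive install whose root you export as ES_HOME; the find_plugin_script name and the use of ES_HOME here are my own choices for illustration, not part of Elasticsearch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the path of the elasticsearch-plugin script.
# Checks $ES_HOME/bin first (archive install; ES_HOME is assumed to be set by you),
# then the CentOS package default under /usr/share/elasticsearch.
find_plugin_script() {
  local candidate
  for candidate in "${ES_HOME:+$ES_HOME/bin/elasticsearch-plugin}" \
                   /usr/share/elasticsearch/bin/elasticsearch-plugin; do
    # Skip the empty entry produced when ES_HOME is unset.
    if [ -n "$candidate" ] && [ -x "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  echo "elasticsearch-plugin not found" >&2
  return 1
}

# Usage (a package install needs root):
#   plugin=$(find_plugin_script) && sudo "$plugin" install ingest-attachment
```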
<div>
<br /></div>
<div>
In the examples below, my Elasticsearch installation is listening at http://localhost:9200.</div>
<div>
<br /></div>
<h2>
Set up the pipeline</h2>
<div>
Next, set up a pipeline to process the attachment data. This is a configuration step that only needs to be done once, using an HTTP PUT request:</div>
<div>
<br /></div>
<div>
curl -XPUT 'localhost:9200/_ingest/pipeline/attachment' -H 'Content-Type: application/json' -d'</div>
<div>
{</div>
<div>
"description" : "Extract attachment content", </div>
<div>
"processors" : [</div>
<div>
{</div>
<div>
"attachment" : {</div>
<div>
"field" : "data",</div>
<div>
"indexed_chars" : -1</div>
<div>
}</div>
<div>
}</div>
<div>
]</div>
<div>
}</div>
<div>
'</div>
<div>
To verify that it is set up, you can list the current pipelines with an HTTP GET request:</div>
<div>
<br /></div>
<div>
curl -XGET 'localhost:9200/_ingest/pipeline'</div>
<div>
<br /></div>
<div>
In the above example, "data" is the name of the field that will be treated as attachment data. Setting "indexed_chars" to -1 allows the entire file to be indexed (which can be resource intensive). There are <a href="https://www.elastic.co/guide/en/elasticsearch/plugins/5.3/using-ingest-attachment.html">other options available.</a> When you PUT your document content as JSON, the value of the data field is the Base64-encoded content of your file. It's also possible to avoid Base64-encoding the file by using the CBOR format, which I will explore in another post.</div>
<div>
<br /></div>
<h2>
Index and search for a file</h2>
<div>
As an example, suppose you want to search the contents of a text file named sampleattachment.txt.</div>
<div>
To create the file:</div>
<div>
<br /></div>
<div>
echo "I like to go on the Pelham Parkway to cross the Bronx." > sampleattachment.txt</div>
<div>
<br /></div>
<div>
To add the content of the file to an index named "myindex", with a type named "media" and a document ID of "99", you can use this bash script:</div>
<div>
<br /></div>
<div>
#!/bin/bash</div>
<div>
filePath='sampleattachment.txt'</div>
<div>
b64encoding=$(base64 --wrap=0 "$filePath")</div>
<div>
curl -XPUT 'localhost:9200/myindex/media/99?pipeline=attachment' -H 'Content-Type: application/json' -d "</div>
<div>
{</div>
<div>
\"data\" : \"$b64encoding\"</div>
<div>
}</div>
<div>
"</div>
<div>
The pipeline=attachment query parameter refers to the attachment pipeline that we set up above.</div>
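For larger attachments, the Base64 string in the script above is passed to curl as a command-line argument and can run into the shell's argument length limit. One way around this is to write the JSON body to curl's standard input instead. This is a minimal sketch, assuming GNU base64 (as on CentOS) and the attachment pipeline defined above; attachment_json and index_file are hypothetical helper names. Base64 output never contains quotes or backslashes, so it can be dropped into the JSON string as-is.

```shell
#!/usr/bin/env bash
# Sketch: stream the ingest request body to curl on stdin instead of passing
# the Base64 string as a command-line argument.
# Assumes GNU base64 (for --wrap=0) and the "attachment" pipeline set up above.
set -euo pipefail

# Hypothetical helper: print {"data":"<Base64 of the file>"} on stdout.
attachment_json() {
  printf '{"data":"'
  base64 --wrap=0 "$1"
  printf '"}\n'
}

# Hypothetical helper: index one file under the given document ID.
index_file() {
  local file=$1 id=$2
  attachment_json "$file" |
    curl -s -XPUT "localhost:9200/myindex/media/$id?pipeline=attachment" \
         -H 'Content-Type: application/json' --data-binary @-
}

# Usage:
#   index_file sampleattachment.txt 99
```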
<div>
<br /></div>
<div>
Next, verify that the content is indexed by performing a search:</div>
<div>
<br /></div>
<div>
curl 'localhost:9200/myindex/media/_search?pretty=true' -d '</div>
<div>
{</div>
<div>
"query" : { "query_string" : { "query" : "Pelham" } }</div>
<div>
}</div>
<div>
'</div>
<div>
This returns the results of a search for the word "Pelham".</div>
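The query_string query above searches across the document's fields by default. If you only want to match the extracted text, a match query on the attachment.content field is one option. Here is a minimal sketch that builds that request body for an arbitrary search term; search_body is a hypothetical helper, and it only escapes backslashes and double quotes, so terms containing newlines or other control characters would need more work.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a match query on the extracted attachment text.
search_body() {
  local term
  # Escape backslashes first, then double quotes, so the term is a valid JSON string.
  term=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"query":{"match":{"attachment.content":"%s"}}}\n' "$term"
}

# Usage: send the body on stdin to the same _search endpoint used above.
#   search_body 'Pelham' |
#     curl -s 'localhost:9200/myindex/media/_search?pretty=true' \
#          -H 'Content-Type: application/json' --data-binary @-
```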
<div>
<br /></div>
<div>
Here are the JSON-formatted search results (any number of JavaScript-based frameworks can be used to display them). You can see that Elasticsearch has recognized the content_type of the attachment as plain text.</div>
<div>
<br /></div>
<div>
<div>
{</div>
<div>
"took" : 20,</div>
<div>
"timed_out" : false,</div>
<div>
"_shards" : {</div>
<div>
"total" : 5,</div>
<div>
"successful" : 5,</div>
<div>
"failed" : 0</div>
<div>
},</div>
<div>
"hits" : {</div>
<div>
"total" : 1,</div>
<div>
"max_score" : 0.25124598,</div>
<div>
"hits" : [</div>
<div>
{</div>
<div>
"_index" : "myindex",</div>
<div>
"_type" : "media",</div>
<div>
"_id" : "99",</div>
<div>
"_score" : 0.25124598,</div>
<div>
"_source" : {</div>
<div>
"data" : "SSBsaWtlIHRvIGdvIG9uIHRoZSBQZWxoYW0gUGFya3dheSB0byBjcm9zcyB0aGUgQnJvbnguCg==",</div>
<div>
"attachment" : {</div>
<div>
"content_type" : "text/plain; charset=ISO-8859-1",</div>
<div>
"language" : "en",</div>
<div>
"content" : "I like to go on the Pelham Parkway to cross the Bronx.",</div>
<div>
"content_length" : 56</div>
<div>
}</div>
<div>
}</div>
<div>
}</div>
<div>
]</div>
<div>
}</div>
<div>
}<br />
<br />
<h2>
Filtering Returned Fields</h2>
</div>
</div>
<div>
You can filter which fields are returned in the search results by using a <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-source-filtering.html">source include or exclude filter</a>. For example, to return only the attachment fields, without the extracted content (and without the data field):</div>
<div>
<br /></div>
<div>
<div>
curl 'localhost:9200/myindex/media/_search?pretty=true' -d '</div>
<div>
{<br />
"_source" : {<br />
"includes" : [ "attachment.*" ],<br />
"excludes" : [ "attachment.content" ]<br />
},</div>
<div>
"query" : { "query_string" : { "query" : "Pelham" } }</div>
<div>
}</div>
<div>
'</div>
</div>
JGhttp://www.blogger.com/profile/12732475709171849823noreply@blogger.com0tag:blogger.com,1999:blog-1512069992157290909.post-5383150283836328992016-10-06T07:18:00.000-07:002016-10-06T07:32:12.704-07:00High Physical Memory Usage Issue<h2>
Background</h2>
I investigated an issue where a new Hyper-V virtual machine running Windows 7 would consume most of the physical memory after exactly 5 minutes of uptime. This would occur even if no applications were running.<br />
<br />
<h2>
Investigation</h2>
<div>
I tried using the "Processes" tab of Windows Task Manager to look at the memory in use, but none of the listed processes (mostly services) came anywhere close to accounting for the 8 GB of physical memory allocated.</div>
<div>
<br /></div>
<div>
After some initial searching, I found a great Sysinternals utility called RAMMap: <a href="https://technet.microsoft.com/en-us/sysinternals/rammap.aspx" target="_blank">https://technet.microsoft.com/en-us/sysinternals/rammap.aspx</a></div>
<div>
<br /></div>
<div>
Running the RAMMap utility showed that most of the memory was "Driver Locked". A Google search led me to this post: <a href="https://social.technet.microsoft.com/Forums/office/en-US/d4f97391-a70c-47b1-ab05-bab4754868ac/hyperv-dynamic-memory-driver-locked?forum=winserverhyperv" target="_blank">https://social.technet.microsoft.com/Forums/office/en-US/d4f97391-a70c-47b1-ab05-bab4754868ac/hyperv-dynamic-memory-driver-locked?forum=winserverhyperv</a></div>
<div>
<br /></div>
<div>
I found that the Windows 7 virtual machine was configured to use "Dynamic Memory".</div>
<div>
<br /></div>
<h2>
Solution </h2>
<div>
After shutting down the virtual machine, I unchecked the "Enable Dynamic Memory" option in the virtual machine's Memory settings and set the startup memory to the fixed size I wanted. After restarting the virtual machine, I found that the physical memory usage no longer grew after 5 minutes.</div>
<div>
<br /></div>
JGhttp://www.blogger.com/profile/12732475709171849823noreply@blogger.com0tag:blogger.com,1999:blog-1512069992157290909.post-1247942235636280342016-10-03T13:46:00.000-07:002016-10-03T13:46:18.019-07:00Useful TypeScript LinksTypeScript is a strongly typed open source language which compiles into JavaScript. TypeScript was originally developed by Microsoft. The language supports interfaces, classes (including inheritance), generics and modules.<br />
<br />
Using TypeScript lets the developer validate code contracts at design/compile time instead of waiting until the code runs in the browser. This can reduce development and testing costs and result in a more reliable site.<br />
<br />
This YouTube video provides a good introduction to the language including how to integrate jQuery with TypeScript: <a href="https://www.youtube.com/watch?v=hd6vXJJC8no" target="_blank">Getting Started with TypeScript</a><br />
<div>
<br /></div>
There is a browser-based playground at <a href="http://www.typescriptlang.org/Playground" target="_blank">http://www.typescriptlang.org/Playground</a> which you can use to try out the language.<br />
<br />
Many TypeScript type definitions (which are very useful when incorporating other JavaScript frameworks such as jQuery) are available on GitHub: <a href="https://github.com/DefinitelyTyped/DefinitelyTyped" target="_blank">https://github.com/DefinitelyTyped/DefinitelyTyped</a><br />
<br />
<br />
<br />
<br />JGhttp://www.blogger.com/profile/12732475709171849823noreply@blogger.com0tag:blogger.com,1999:blog-1512069992157290909.post-85519449930680819312016-02-03T17:54:00.000-08:002018-05-23T19:55:12.174-07:00Debugging Network File System (NFS) connections<h2>
Introduction</h2>
<br />
There are a number of useful command-line utilities that can be used to debug Network File System (NFS) connections. These utilities are available on most Linux and other Unix-variant operating systems.<br />
<br />
<h2>
rpcinfo</h2>
<div>
Client computers can use this utility to find out which RPC services and protocols a given server supports, so it is a good starting point for checking whether the NFS services are enabled on a computer. Running rpcinfo -p followed by the server name lists the programs registered with that server's portmapper. Three services are typically needed for NFSv3 connections: portmapper, mount and nfs. The man page (man rpcinfo) provides more information about the available options.</div>
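To turn that check into something you can rerun, the rpcinfo -p listing can be scanned for the three services. This is a minimal sketch, assuming the usual Linux rpcinfo -p output format (a header line, then one registration per line with the service name in the last column, where the mount service appears as "mountd"); missing_nfs_services is a hypothetical helper and the server name is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `rpcinfo -p <server>` output on stdin and print
# each NFSv3 service that is not registered. The service name is taken from
# the last column; the MOUNT service shows up there as "mountd".
missing_nfs_services() {
  local listing svc
  # Skip the header line and keep the unique service names.
  listing=$(awk 'NR > 1 { print $NF }' | sort -u)
  for svc in portmapper mountd nfs; do
    grep -qx "$svc" <<<"$listing" || printf '%s\n' "$svc"
  done
}

# Usage:
#   rpcinfo -p nfsserver | missing_nfs_services
# No output means all three services are registered.
```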
<div>
<br /></div>
<h2>
showmount</h2>
<div>
This utility lists the directories that a given NFS server has exported (showmount -e followed by the server name). See the man page for the command-line options on your system.</div>
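The output of showmount -e starts with an "Export list for ..." header line, followed by one exported directory per line. Here is a minimal sketch that checks whether a particular directory is in that list; is_exported is a hypothetical helper, and the server name and path in the usage line are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the directory given in $1 appears in
# `showmount -e` output read from stdin (the first line is a header).
is_exported() {
  awk -v dir="$1" 'NR > 1 && $1 == dir { found = 1 } END { exit !found }'
}

# Usage:
#   showmount -e nfsserver | is_exported /export/home && echo "exported"
```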
<div>
<br /></div>
<h2>
Network Protocol Analyzers</h2>
<div>
There are a number of free utilities that can be used to analyze network transactions, such as Wireshark and tcpdump. These utilities let you capture and log the network traffic between the client and the server. You can then review the log to see the individual commands sent from the client and the server's responses.<br />
<br />
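For example, you can capture the traffic with tcpdump on either machine and open the capture file later in Wireshark. This sketch only builds the capture filter; it assumes the usual portmapper (111) and NFS (2049) ports, and since mountd typically listens on a port chosen at startup, it is only captured if you pass the port that rpcinfo -p reports. The nfs_capture_filter helper and the server name are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a tcpdump/pcap filter for NFS traffic with one server.
# Ports 111 (portmapper) and 2049 (nfs) are the usual defaults; pass extra
# ports (for example mountd's, from `rpcinfo -p`) as additional arguments.
nfs_capture_filter() {
  local host=$1 port ports="port 111 or port 2049"
  shift
  for port in "$@"; do
    ports="$ports or port $port"
  done
  printf 'host %s and (%s)\n' "$host" "$ports"
}

# Usage (capturing needs root; stop with Ctrl-C, then open nfs.pcap in Wireshark):
#   sudo tcpdump -i any -s 0 -w nfs.pcap "$(nfs_capture_filter nfsserver 20048)"
```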
<h2>
CentOS</h2>
</div>
<div>
On CentOS, you can enable NFS debug logging using rpcdebug:</div>
<div>
<br /></div>
<div>
rpcdebug -m nfsd -s all</div>
<div>
<br /></div>
<div>
will send debug logging for the NFS server (nfsd) to /var/log/messages</div>
<div>
<br /></div>
<div>
and </div>
<div>
<br /></div>
<div>
rpcdebug -m nfsd -c all</div>
<div>
<br /></div>
<div>
will disable logging again. See "man rpcdebug" for more info.<br />
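Since it is easy to forget to turn the logging off again, the two rpcdebug commands can be wrapped around whatever you do while reproducing the problem. This is a minimal sketch; with_nfsd_debug is a hypothetical helper, and it has to run as root on the NFS server.

```shell
#!/usr/bin/env bash
# Hypothetical helper: enable nfsd debug logging, run a command, then
# clear the debug flags again and return the command's exit status.
with_nfsd_debug() {
  rpcdebug -m nfsd -s all || return
  "$@"
  local status=$?
  rpcdebug -m nfsd -c all
  return "$status"
}

# Usage (as root): reproduce the problem from the client while this waits,
# then look for the nfsd messages in /var/log/messages.
#   with_nfsd_debug read -r -p "Reproduce the problem, then press Enter "
```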
<br />
If you are trying to debug a command such as df:<br />
<br />
strace df -h<br />
<br />
will print out the system calls made while executing the command. Because strace writes its trace to standard error, you can save it to a file by redirecting stderr:<br />
<br />
strace df -h 2> traceout.txt<br />
<br /></div>
JGhttp://www.blogger.com/profile/12732475709171849823noreply@blogger.com0tag:blogger.com,1999:blog-1512069992157290909.post-40865981794926019352013-02-24T08:32:00.000-08:002013-02-24T08:32:51.203-08:00undefined symbol: tdb_transaction_start_nonblockWhen enabling Samba on an openSUSE instance, I received the above error when I tried to use:<br />
<br />
net join <br />
<br />
to join a domain.<br />
<br />
This appears to be a library dependency issue. To resolve it, I used YaST to find and install libtdb, and the error went away.<br />
<br />
However, Samba then failed to start at boot. I found this second error by looking in /var/log/samba/log.smbd:<br />
<br />
/var/sbin/smbd: symbol lookup error: /usr/sbin/smbd: undefined symbol: wbcSidsToUnixIds<br />
<br />
I found that this symbol comes from libwbclient0, so I used YaST to install it (version 3.6.3-115.1), and this second error went away. You may first have to stop the nmb daemon using:<br />
<br />
rcnmb stop<br />
<br />
<br />
After rebooting, I checked the status of both nmb and smb using the following commands:<br />
<br />
rcsmb status<br />
and<br />
rcnmb status<br />
<br />
and now both daemons are running.<br />
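Errors like these can also be spotted before restarting the daemon: ldd -r asks the dynamic linker to resolve every symbol a binary needs and reports the ones it cannot find. This is a minimal sketch, assuming a glibc-based system such as openSUSE; undefined_symbols is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list unresolved symbols for a dynamically linked binary.
# `ldd -r` performs data and function relocations and reports an
# "undefined symbol: ..." line for each symbol it cannot resolve.
undefined_symbols() {
  ldd -r "$1" 2>&1 | grep 'undefined symbol' || true
}

# Usage:
#   undefined_symbols /usr/sbin/smbd
# Empty output means every symbol resolved; otherwise the symbol names
# (such as wbcSidsToUnixIds above) point to the library that needs updating.
```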
JGhttp://www.blogger.com/profile/12732475709171849823noreply@blogger.com1tag:blogger.com,1999:blog-1512069992157290909.post-67877967633226045222012-10-11T03:25:00.000-07:002012-10-11T03:25:14.263-07:00Viewing the .NET Finalizer QueueIn .NET, memory is managed via a garbage collector. The collector works by processing the "Finalizer Queue". Sometimes the queue can back up (the overall system is so busy that it can't release the items fast enough) and so you may need to come up with a new resource deallocation strategy. <br />
<br />
To find the problematic objects, it's useful to look at the queue at certain points while the system is under load, to see which objects could be released sooner by implementing the IDisposable interface and disposing of them explicitly in your code. A Dispose method that calls GC.SuppressFinalize tells the runtime not to run that object's finalizer, so the object does not have to wait for the finalizer thread.<br />
<br />
There are third-party tools that let you view this information, but there is also a way to do it with Microsoft Visual Studio, described in this article by Tess Ferrandez: <a href="http://blogs.msdn.com/b/tess/archive/2007/10/19/net-finalizer-memory-leak-debugging-with-sos-dll-in-visual-studio.aspx">.Net finalizer memory leak debugging with sos dll in visual studio</a>.<br />
<br />
There were a couple of noteworthy gotchas:<br />
<br />
<ol>
<li>When you attach the debugger to the running executable, you need to make sure that Native code debugging is enabled.</li>
<li>The sos.dll extension is loaded from the Immediate window, which is different from the Command window. To open the Immediate window, type <i>immed</i> into the Command window. From the Immediate window, use the <i>.load</i> command highlighted in the article.</li>
</ol>
<div>
The best part about this approach is that there doesn't appear to be anything you need to install: sos.dll ships with .NET 2.0 and later.</div>
JGhttp://www.blogger.com/profile/12732475709171849823noreply@blogger.com0tag:blogger.com,1999:blog-1512069992157290909.post-10778638854412218542012-07-31T06:04:00.002-07:002012-07-31T06:10:25.232-07:000x1B1 - Version mismatch between executable and preexisting shared memory versions! EXITING.I recently did some debugging on this Lotus Notes issue and posted a response in the IBM dW forum:<br />
<br />
<br />
<br />
<a href="http://www-10.lotus.com/ldd/nd85forum.nsf/5f27803bba85d8e285256bf10054620d/993bb695ac0e190785257a4c00441c76?OpenDocument&Highlight=0,tw%3F,notes,processes">http://www-10.lotus.com/ldd/nd85forum.nsf/5f27803bba85d8e285256bf10054620d/993bb695ac0e190785257a4c00441c76?OpenDocument&Highlight=0,tw%3F,notes,processes</a>JGhttp://www.blogger.com/profile/12732475709171849823noreply@blogger.com0tag:blogger.com,1999:blog-1512069992157290909.post-6575731591820406102011-11-05T12:20:00.000-07:002011-11-05T13:47:02.322-07:00Debugging Enterprise Vault 10 IndexingIn debugging some indexing related issues with Symantec Enterprise Vault I found this link <a href="http://www.symantec.com/docs/TECH160420">http://www.symantec.com/docs/TECH160420</a> which provides some useful information about debugging indexing. <br />
<br />
I have found the EV Dtrace facility very useful for debugging in the past. For indexing, one of the tasks that generates output is EVIndexVolumesProcessor, so you should set it to generate verbose output while using Dtrace.<br />
<br />
The dtrace was instrumental in leading me to this post: <a href="http://www.symantec.com/connect/forums/problems-indexing-after-upgrading#comment-5964601">http://www.symantec.com/connect/forums/problems-indexing-after-upgrading#comment-5964601</a> The indexer was having trouble communicating with the StorageCrawler even though they were on the same machine. Disabling the firewall resolved the issue so there must be a port that needs to be opened. More details as they become available.JGhttp://www.blogger.com/profile/12732475709171849823noreply@blogger.com0tag:blogger.com,1999:blog-1512069992157290909.post-1385612960780634252011-04-29T15:28:00.000-07:002011-11-06T22:26:19.691-08:00About MeI run a software development company: JPG Consulting, Inc. We develop software on a contract basis as well as sell our own products on <a href="http://www.notesconnectors.com/">http://www.notesconnectors.com/</a>.<br />
<br />
This blog is going to be devoted to API issues we uncover as we do our development. Often we find the documentation and samples lacking and wind up doing a lot of digging on the web or code experimentation to find the solution.<br />
<br />
I will try to label the relevant API on each post to make searching easier.JGhttp://www.blogger.com/profile/12732475709171849823noreply@blogger.com0