Discover Top Posts Tagged with #couchdb node.js stream

Streaming CouchDB's _all_docs to a plain list of documents

CouchDB's "_all_docs" endpoint gets you all the documents in database in this form

e.g

http://mycouchdbserver?all_docs?include_docs=true

{ "total_rows": 93985131, "offset": 0, "rows": [ { "id": "0000230a35e724e12b8c18a8f700065d", "key": "0000230a35e724e12b8c18a8f700065d", "value": { "rev": "1-adf8311047fcdd953543118e7d501fa1" }, "doc": { "_id": "0000230a35e724e12b8c18a8f700065d", "_rev": "1-adf8311047fcdd953543118e7d501fa1", "a": "1", "b": "2", "c": "3" } }, { "id": "0000230a35e724e12b8c18a8f7000ccd", "key": "0000230a35e724e12b8c18a8f7000ccd", "value": { "rev": "1-5ce610ff79bc1cfe62b4a1a68e5b09cf" }, "doc": { "_id": "0000230a35e724e12b8c18a8f7000ccd", "_rev": "1-5ce610ff79bc1cfe62b4a1a68e5b09cf", "a": "2", "b": "5", "c": "6" } } ] }

Notice the documents themselves are contained inside an object inside an array. In real life, the data comes out like this:

{"total_rows":93985131,"offset":0,"rows":[ {"id":"0000230a35e724e12b8c18a8f700065d","key":"0000230a35e724e12b8c18a8f700065d","value":{"rev":"1-adf8311047fcdd953543118e7d501fa1"},"doc":{"_id":"0000230a35e724e12b8c18a8f700065d","_rev":"1-adf8311047fcdd953543118e7d501fa1","a":"1","b":"2","c":"3"}}, {"id":"0000230a35e724e12b8c18a8f7000ccd","key":"0000230a35e724e12b8c18a8f7000ccd","value":{"rev":"1-5ce610ff79bc1cfe62b4a1a68e5b09cf"},"doc":{"_id":"0000230a35e724e12b8c18a8f7000ccd","_rev":"1-5ce610ff79bc1cfe62b4a1a68e5b09cf","a":"2","b":"5","c":"6"}} ]}

with each object on its own line.

If you are wanting to export the data and put it in Redshift, for example, the JSON needs to be in this form:

{"_id":"0000230a35e724e12b8c18a8f700065d","_rev":"1-adf8311047fcdd953543118e7d501fa1","a":"1","b":"2","c":"3"} {"_id":"0000230a35e724e12b8c18a8f7000ccd","_rev":"1-5ce610ff79bc1cfe62b4a1a68e5b09cf","a":"2","b":"5","c":"6"}

Solution 1 - jq

The jq utility allows JSON to be parsed and reformatted on the command-line. e.g.

curl 'http://mycouchdbserver?_all_docs?include_docs=true_' | jq '.rows[].doc'

(Thanks to Cloudant Support for this solution). Unfortunately it is not suitable for large data sets as jq requires all of the data to be in-memory.

Solution 2 - Use this docstream.js utility

DocStream takes _all_docs data in on stdin and outputs just the "doc" section:

curl 'http://mycouchdbserver?_all_docs?include_docs=true_' | node docstream.js'

This is should work with any size of data set, as long as each document appears per line.

e.g.

cat sample.txt | ./docstream.js | gzip > output.txt.gz

#CouchDB node.js stream

Streaming CouchDB's _all_docs to a plain list of documents

CouchDB's "_all_docs" endpoint gets you all the documents in database in this form

e.g

http://mycouchdbserver?all_docs?include_docs=true

Notice the documents themselves are contained inside an object inside an array. In real life, the data comes out like this:

{"total_rows":93985131,"offset":0,"rows":[ {"id":"0000230a35e724e12b8c18a8f700065d","key":"0000230a35e724e12b8c18a8f700065d","value":{"rev":"1-adf8311047fcdd953543118e7d501fa1"},"doc":{"_id":"0000230a35e724e12b8c18a8f700065d","_rev":"1-adf8311047fcdd953543118e7d501fa1","a":"1","b":"2","c":"3"}}, {"id":"0000230a35e724e12b8c18a8f7000ccd","key":"0000230a35e724e12b8c18a8f7000ccd","value":{"rev":"1-5ce610ff79bc1cfe62b4a1a68e5b09cf"},"doc":{"_id":"0000230a35e724e12b8c18a8f7000ccd","_rev":"1-5ce610ff79bc1cfe62b4a1a68e5b09cf","a":"2","b":"5","c":"6"}} ]}

with each object on its own line.

If you are wanting to export the data and put it in Redshift, for example, the JSON needs to be in this form:

Solution 1 - jq

The jq utility allows JSON to be parsed and reformatted on the command-line. e.g.

curl 'http://mycouchdbserver?_all_docs?include_docs=true_' | jq '.rows[].doc'

(Thanks to Cloudant Support for this solution). Unfortunately it is not suitable for large data sets as jq requires all of the data to be in-memory.

Solution 2 - Use this docstream.js utility

DocStream takes _all_docs data in on stdin and outputs just the "doc" section:

curl 'http://mycouchdbserver?_all_docs?include_docs=true_' | node docstream.js'

This is should work with any size of data set, as long as each document appears per line.

e.g.

cat sample.txt | ./docstream.js | gzip > output.txt.gz

#CouchDB node.js stream

#couchdb node.js stream

Trending Tags

Recently Viewed Tags

#couchdb node.js stream