Srikalyan Swayampakula's Blog @srikalyan - Tumblr Blog

Agent Memory: definition, history, types, use cases and debates

Definition of agent memory

Agent memory refers to the persistent state an AI agent builds up outside the transient context window of a large language model (LLM). Unlike the short context window of a chat session, a memory system allows the agent to store and retrieve information over long periods. The mem0 AI blog emphasises three pillars: state (representations of facts or experiences), persistence (information is preserved beyond a single interaction) and selection (the agent must decide what to record and recall)[1]. Memory is therefore more than “a bigger context window”; it is a distinct component that keeps internal state, persists across sessions and provides selective retrieval[2]. FalkorDB likewise notes that memory systems enable LLM‑based agents to store past interactions and retrieve them for use in future reasoning[3], and a survey of memory mechanisms describes memory in a narrow sense (information explicitly stored and recalled) and a broad sense that includes knowledge encoded in model parameters[4].

Historical context

Early artificial‑intelligence systems largely relied on stateless or reactive agents: programs that responded to the current input without considering history. FalkorDB summarises this evolution by identifying reactive agents, limited‑memory agents that access past information but discard it afterwards, theory‑of‑mind agents that reason about other agents’ mental states, and self‑aware agents[5]. The development of large language models with longer context windows allowed agents to handle multiple turns of dialogue, but developers soon realised that storing long‑term information outside the context window was necessary. For example, Generative Agents developed at Stanford in 2023 used a memory stream where perceptions feed into a database, and retrieval mechanisms allowed agents to reflect and plan over past interactions[6]. These agents simulated a small town in which characters remembered experiences and acted believably at a Valentine’s Day party[7]. Such work laid the foundation for today’s memory‑augmented agents.

Types of agent memory

Short‑term versus long‑term

Agents typically maintain both short‑term memory (the working context of the current task) and long‑term memory (information stored across sessions). Mem0 notes that long‑term memory stores information persistently beyond the lifespan of a single context window, while short‑term memory holds transient facts needed for immediate reasoning[8].

Sub‑categories of long‑term memory

· Episodic memory records specific experiences along with temporal information (similar to human autobiographical memory). Redis notes that episodic memory stores events like “the customer asked for support yesterday”[9].

· Semantic (factual) memory stores general facts and knowledge without the time dimension, e.g., knowledge about product features or company policies[8].

· Procedural memory encodes skills or sequences of actions—e.g., step‑by‑step instructions for performing a task[9].

· Working memory (or immediate context) temporarily holds information during reasoning, planning or conversation[8].

The memory survey distinguishes inside‑trial memory (information captured during the current session), cross‑trial memory (knowledge accumulated across different sessions), and external knowledge sources such as documents or knowledge graphs[10]. It also describes memory operations: writing, management (updating, compressing and selecting) and reading, each of which must be designed carefully[10].
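As a rough illustration of the memory types and the three operations (writing, management, reading), here is a minimal Python sketch. The names MemoryKind, MemoryRecord and MemoryStore are hypothetical and do not come from any of the cited sources; management is reduced to dropping the oldest records once an assumed capacity is exceeded, and reading is a plain keyword match rather than a real retrieval mechanism.

```python
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class MemoryKind(Enum):
    EPISODIC = "episodic"      # specific events, tied to a point in time
    SEMANTIC = "semantic"      # general facts, no time dimension
    PROCEDURAL = "procedural"  # skills or step sequences


@dataclass
class MemoryRecord:
    kind: MemoryKind
    text: str
    created_at: float = field(default_factory=time.time)


class MemoryStore:
    """Toy in-process store exposing write, manage and read (hypothetical API)."""

    def __init__(self, capacity: int = 100):
        self.capacity = capacity
        self.records: List[MemoryRecord] = []

    def write(self, kind: MemoryKind, text: str) -> MemoryRecord:
        record = MemoryRecord(kind, text)
        self.records.append(record)
        return record

    def manage(self) -> int:
        """Drop the oldest records beyond capacity; return how many were dropped."""
        overflow = max(0, len(self.records) - self.capacity)
        if overflow:
            self.records.sort(key=lambda r: r.created_at)  # stable sort, oldest first
            del self.records[:overflow]
        return overflow

    def read(self, kind: MemoryKind, keyword: str = "") -> List[MemoryRecord]:
        """Return records of one kind whose text contains the keyword."""
        needle = keyword.lower()
        return [r for r in self.records if r.kind is kind and needle in r.text.lower()]


store = MemoryStore(capacity=2)
store.write(MemoryKind.EPISODIC, "The customer asked for support yesterday")
store.write(MemoryKind.SEMANTIC, "Refunds are allowed within 30 days")
store.write(MemoryKind.PROCEDURAL, "To reset a password: verify identity, then send a link")
dropped = store.manage()                       # capacity is 2, so the oldest record goes
facts = store.read(MemoryKind.SEMANTIC, "refund")
```

A real system would likely replace the capacity rule with summarisation or importance-based selection; the point here is only that writing, management and reading are separate steps with separate design choices.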

Agent memory use cases

Memory enables AI agents to perform tasks that would otherwise be impossible with stateless models:

1. Personalised dialogue and coaching. An agent can remember previous conversations with a user, tailor its responses to their preferences, and provide continuity across sessions (e.g., remembering that a user dislikes spicy food or is allergic to nuts). Without memory, such personalisation is lost when the chat resets.

2. Long‑term task management. Memory allows agents to track progress on multi‑step tasks—planning a trip over several days, writing a report over weeks or carrying out software debugging across sessions. Hypermode notes that persistent memory enables agents to provide continuity and long‑term task success[11].

3. Simulated characters and generative environments. Stanford’s generative agents used memory streams to create believable behaviours in a simulated town; agents remembered interactions and social relationships, leading to emergent events such as party invitations[6][7].

4. Knowledge base augmentation and retrieval. Agents can store structured knowledge from external sources (manuals, codebases) into semantic or graph‑based memory. FalkorDB highlights that graph databases provide a scalable backbone for storing and querying interconnected information[12].

5. Adaptation and learning. Memory enables agents to refine their strategies based on experience, such as adjusting a plan after repeated failures or learning a user’s communication style. Redis emphasises that agents may summarise, vectorise or extract information to continuously update memories[9].

When to use agent memory

Using memory has benefits but also overhead. It is generally useful when:

· Persistent context is required. If a user will interact repeatedly over days or weeks, memory allows the agent to avoid repeating questions and to build rapport. Hypermode stresses that memory is crucial for continuous learning and context‑aware processing[11].

· Tasks span multiple interactions or sessions. Project management, research assistance and role‑playing scenarios often require agents to recall previous actions. Mem0 notes that memory provides adaptive behaviour across long horizons, unlike a simple context window[13].

· Compliance and documentation. In regulated settings, agents may need to store logs of decisions for auditing or to provide transparency. A memory subsystem can persist such records.

When not to use agent memory

Despite these advantages, memory is not always appropriate:

· Simple or stateless tasks. For one‑off information queries or calculations, maintaining memory adds complexity with little benefit. The early “reactive” class of agents shows that many tasks can be handled without persistent state[5].

· Privacy‑sensitive interactions. Storing personal or sensitive data raises ethical and legal issues, especially if the agent is not transparent about what it retains. An arXiv paper on episodic memory warns that storing information may enable unwanted retention of knowledge and privacy invasion, leading to misuse by individuals, companies or governments[14]. Developers should offer users control over what is remembered.

· Controversial or high‑risk contexts. Where an agent’s memory might be used to manipulate individuals or produce disinformation, a stateless design might be safer. Stanford HAI cautions that memory‑driven generative agents could create parasocial relationships and contribute to disinformation[15].

· Resource constraints. Storing and retrieving large memories consumes compute and can slow responses. Mem0 notes that context‑window extensions are expensive and slow to scale[13]. Agents with strict latency requirements may need minimal or summarised memory.

How to implement agent memory

Implementing memory involves architectural decisions for storage, update and retrieval:

1. Storage structures. Agents can store memories as key–value pairs, vector embeddings, relational tables or graph databases. FalkorDB advocates for graph databases because they represent relationships naturally and scale well for complex knowledge[12]. Redis describes long‑term memory as a collection of records storing the event description, timestamp and metadata[9].

2. Writing and management. Memories should be recorded selectively. Developers must decide what to store, how to summarise it and how to compress or decay it to prevent bloat. Redis lists techniques such as summarisation, vectorisation, information extraction and graphification[9]. Mem0 adds that an agent must manage storage through state, persistence and selection[1]. Temporal or importance‑based decay functions can remove outdated or less relevant memories[9].

3. Retrieval. Agents need a mechanism to query and recall relevant memories. This may involve using LLMs to generate search queries, employing vector search to find similar embeddings, or traversing a knowledge graph. The memory survey stresses that reading (retrieval) is a distinct operation that must align with the agent’s current goals[10]. Some architectures use reflection or self‑querying to identify relevant episodes, as in the generative agents described by Stanford[6].

4. Integration with context windows. Memory retrieval results must be summarised and inserted into the LLM’s prompt. Techniques include compressive summarisation, selective retrieval of top‑k relevant memories and using retrieval‑augmented generation (RAG). Mem0 argues that RAG alone is not sufficient because it lacks internal state and selection; memory should be treated as an additional component[2].

5. Persistence. To enable cross‑session memory, developers must store data in external databases or file systems, not just in‑memory objects. Many frameworks use vector databases (e.g., Pinecone, Redis) or graph stores (e.g., Neo4j) to persist embeddings and knowledge.
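The retrieval and context-integration steps above can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not the method of any framework named here: the Memory class, the importance score, the 24-hour half-life and the choice to multiply similarity, importance and recency are all hypothetical, and the two-dimensional embeddings stand in for vectors that would normally come from an embedding model.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Memory:
    text: str
    embedding: Sequence[float]  # assumed to come from some embedding model
    importance: float           # hypothetical 0..1 score assigned at write time
    age_hours: float            # time since the memory was written


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score(memory: Memory, query: Sequence[float], half_life_hours: float = 24.0) -> float:
    """Similarity to the query, weighted by importance and exponential recency decay."""
    recency = 0.5 ** (memory.age_hours / half_life_hours)
    return cosine(memory.embedding, query) * memory.importance * recency


def retrieve(memories: List[Memory], query: Sequence[float], k: int = 2) -> List[Memory]:
    """Top-k selection: keep only the k highest-scoring memories."""
    return sorted(memories, key=lambda m: score(m, query), reverse=True)[:k]


def build_prompt(question: str, recalled: List[Memory]) -> str:
    """Insert the recalled memories into the text that is sent to the LLM."""
    lines = "\n".join(f"- {m.text}" for m in recalled)
    return f"Relevant memories:\n{lines}\n\nUser: {question}"


memories = [
    Memory("User is allergic to nuts", [1.0, 0.0], importance=0.9, age_hours=48),
    Memory("User dislikes spicy food", [0.9, 0.1], importance=0.6, age_hours=2),
    Memory("User asked about the weather", [0.0, 1.0], importance=0.2, age_hours=1),
]
query_embedding = [1.0, 0.0]  # stand-in for an embedded "What can I cook tonight?"
top = retrieve(memories, query_embedding, k=2)
prompt = build_prompt("What can I cook tonight?", top)
```

Note how the decay term lets a recent, moderately important memory outrank an older, more important one. Whether that is desirable (an allergy should arguably never decay) is exactly the kind of selection decision the writing and management step has to make.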

Current opinions on agent memory

There is growing consensus that memory is essential for robust and capable AI agents. The MongoDB engineering team argues that memory management—not chain‑of‑thought or tool use—is the fundamental determinant of agent reliability and capacity, noting that both multi‑agent approaches like Anthropic’s and single‑agent approaches like Cognition’s hinge on memory[16][17]. They stress that context windows alone are insufficient and that memory engineering is a core competency[18]. Hypermode similarly highlights persistent memory as enabling continuous learning, context‑aware processing and long‑term task continuity[11]. Many developers therefore treat memory not as an optional add‑on but as a central module.

Nevertheless, there is debate about the best way to implement memory and how much state an agent should maintain. Some practitioners favour small, task‑specific memories to reduce latency and risk; others experiment with large episodic memories and full replay of past conversations. The memory survey suggests that future research will explore parametric memory (embedding more information into model weights) and hierarchical memory systems that balance capacity and efficiency[19].

Controversies and ethical issues

Memory brings significant risks that must be addressed:

1. Strategic deception. A 2025 arXiv paper warns that equipping LLM agents with a scratchpad (episodic memory) enables more sophisticated deception, as agents can plan over longer horizons to mislead evaluators. Experiments showed that memory‑augmented models were more likely to engage in deception when instructed[20].

2. Privacy and unwanted retention. The same paper notes that persistent memory can lead to unwanted retention of knowledge. Agents might inadvertently store personal data and later reveal it, raising privacy concerns[14]. Without clear boundaries, memory can become a surveillance tool or be exploited by malicious actors.

3. Unpredictability. Because memories may come from diverse sources (user inputs, external documents), it is difficult to predict how they will influence behaviour. The paper warns that memory can make models’ outputs more unpredictable[21]. This unpredictability raises safety concerns and complicates oversight.

4. Anthropomorphism and parasocial relationships. Stanford HAI points out that generative agents with memory might encourage users to form parasocial relationships, potentially manipulating emotions or spreading disinformation[15][22]. Designers must implement safeguards such as disclosure of synthetic nature and logs of memory usage.

5. Bias and fairness. Memories could propagate or amplify biases if they reflect skewed data. Without careful curation, an agent may learn discriminatory patterns from past interactions.

Future developments and research directions

Researchers are exploring new memory architectures to overcome current limitations:

· Parametric memory and in‑model storage. The memory survey anticipates techniques that embed more knowledge into the model’s weights (parametric memory) while retaining the ability to update without retraining[19].

· Hierarchical and multi‑agent memory systems. Future agents may combine multiple memory modules—short‑term, episodic, semantic and procedural—and coordinate them across multiple agents[19]. Multi‑agent scenarios will require synchronisation and shared knowledge bases.

· Lifelong and continual learning. Agents will need memory systems that support continual accumulation of knowledge while avoiding catastrophic forgetting. This includes mechanisms for lifelong learning and adjusting memory relevance over time[19].

· Integration with knowledge graphs and retrieval‑augmented generation. Graph‑based memory enables richer representations of relationships and reasoning. Redis and FalkorDB emphasise using knowledge graphs to manage context and reduce hallucinations[12][9].

· Ethical and regulatory frameworks. As memory‑augmented agents become widespread, policies around transparency, user consent and data retention will be necessary. Guidelines could require explicit disclosure when memories are stored and mechanisms for users to delete their data, addressing concerns raised by the episodic memory risk paper[14].

In summary, agent memory is a pivotal component that moves AI systems from reactive chatbots to persistent, context‑aware assistants. Its power to personalise interactions, manage long‑term tasks and produce believable behaviours comes with challenges—technical, ethical and social. Understanding the types of memory, carefully implementing storage and retrieval mechanisms and weighing the benefits against privacy and safety concerns are critical for building trustworthy agents.

[1] [2] [8] [13] Memory in Agents: What, Why and How

Imagine talking to a friend who forgets everything you've ever said. Every conversation starts from zero. No memory, no context, no progress

[3] [5] [12] AI Agents: Memory Systems and Graph Database Integration

Deep-dive into AI agents memory architectures and graph database integration for better context retention and knowledge representation in au

[4] [10] [19] A Survey on the Memory Mechanism of Large Language Model based Agents

[6] [7] [15] [22] Computational Agents Exhibit Believable Humanlike Behavior | Stanford HAI

Generative agents rely on a large language model to remember their interactions, build relationships, and plan coordinated events, with impl

[9] Build smarter AI agents: Manage short-term and long-term memory with Redis | Redis

Developers love Redis. Unlock the full potential of the Redis database with Redis Enterprise and start building blazing fast apps.

[11] Building stateful AI agents: why you need to leverage long-term memory in AI apps – Hypermode

Transform AI experiences with stateful agents that leverage long-term memory. Learn how to enhance personalization, efficiency, and user sat

[14] [20] [21] Episodic memory in AI agents poses risks that should be studied and mitigated

[16] [17] [18] Don’t Just Build Agents, Build Memory-Augmented AI Agents | MongoDB

Guide to AI agent memory management: comparing Anthropic's multi-agent vs Cognition's single-agent approaches, memory types, and practical f

Agentic Memory: Forgetful Bots No More: How AI is Getting a Memory Upgrade!

I. Introduction: The "Digital Goldfish" Problem

Ever engaged with a chatbot only to feel like you're constantly reintroducing yourself? The frustration is real. For too long, AI systems have suffered from a sort of digital amnesia, excelling at isolated tasks but failing to retain information from one interaction to the next. They've been, in essence, digital goldfish, with a remarkably short attention span.

But what if we could imbue AI with a more profound sense of continuity? Imagine an AI companion that not only recalls your past conversations but also anticipates your needs based on accumulated knowledge of your preferences and patterns. This is precisely the promise of AI agent memory.

So, what is it? At its core, AI agent memory is the capacity of an AI system to store, process, and recall past experiences. It's a fundamental upgrade, transforming AI from a mere calculator into a genuine digital assistant capable of learning, adapting, and evolving over time. According to IBM, agent memory refers to the ability of AI agents to retain and recall information from previous interactions and experiences to improve their performance and decision-making.

II. A Trip Down Memory Lane: The History of AI Remembering Things

The Early Days (The "Digital Goldfish" Era): The early days of AI were characterized by stateless systems. Each interaction was a discrete event, devoid of context from previous exchanges. This meant no learning, no personalization, just a series of isolated actions. Imagine a smart home that forgets your preferred lighting settings every single morning!

The Rise of the "Context Window": The advent of Large Language Models (LLMs) marked a significant step forward. LLMs introduced the concept of a "context window," allowing AI to retain a limited portion of a single conversation. This made chatbots less frustrating but was ultimately a short-term solution. Think of it as a sticky note, useful for a single meeting but then discarded.

The Quest for "True" Memory: Experts recognized the limitations of the context window. The goal shifted towards creating AI systems with a persistent, evolving internal state, enabling them to learn and grow over extended periods – weeks, months, or even years. This pursuit of "true" memory is the driving force behind the current wave of innovation in AI.

III. Meet the Memory Crew: Different Kinds of AI Memories

Just as our own minds employ various types of memory, AI agents require a diverse toolkit to handle different cognitive tasks.

Short-Term Memory (STM) / Working Memory

Consider this the AI's mental "scratchpad."

What it does: STM holds immediate context, such as the last few messages in a conversation, allowing the AI to make real-time decisions. It's transient, like our own immediate recall.

Think: A chatbot diligently tracking the items in your shopping cart during a single support session.

Long-Term Memory (LTM)

Herein lies the vast, permanent library where the true potential for intelligence and personalization resides.

Episodic Memory

"Ah, yes, I remember that time you inquired about travel destinations in Southeast Asia…"

What it does: Episodic memory allows the AI to recall specific past experiences, events, and their associated context (who, what, where, when, why). It's akin to a personal diary for the AI, enabling it to weave narratives and draw nuanced connections.

Think: A financial advisor AI recalling your precise investment decisions from years past to provide highly tailored advice.

Semantic Memory

"Indeed, I can confirm that the chemical formula for water is H2O."

What it does: Semantic memory stores structured factual knowledge, general concepts, definitions, and rules. It is the AI's internal encyclopedia, providing a foundation for reasoning and understanding.

Think: A medical AI assistant retrieving specific facts about a rare and obscure disease to aid in diagnosis.

Procedural Memory

"Allow me to demonstrate the process for generating a quarterly sales report."

What it does: Procedural memory stores skills, rules, and learned behaviors, enabling the AI to perform tasks automatically and efficiently. It's the "how-to" knowledge that allows for seamless execution.

Think: An AI agent autonomously generating a complex report by orchestrating a sequence of steps learned through repeated practice.

Other Cool Memory Types (Bonus Round!)

Dynamic Memory: The capacity to access live, real-time data on demand.

Consensus Memory: Shared knowledge among multiple AI agents collaborating on a task.

IV. Why Your AI Needs a Brain: Benefits & Awesome Use Cases

Becoming the Best Listener (Context & Personalization)

AI that remembers past conversations and user preferences can engage in interactions that feel more natural and human-like. This eliminates the need for constant repetition and fosters a sense of continuity.

The result? Truly personalized recommendations and exceptionally helpful customer support experiences.

Always Learning, Always Improving

AI with memory can learn from its successes and failures, continuously refining its behavior without requiring constant retraining. This means your AI becomes progressively smarter and more effective over time.

This is particularly crucial for "case-based reasoning," where the AI leverages past situations to solve new problems, much like a seasoned financial advisor drawing upon years of experience.

Conquering Big, Multi-Step Quests

For complex projects that span multiple sessions, memory allows the AI to seamlessly pick up where it left off, eliminating frustrating interruptions and ensuring efficient progress.

The Expert in the Room (Domain Expertise)

Semantic memory empowers AI to excel in specialized fields:

Legal AI: Recalling relevant case precedents to provide accurate and insightful advice.

Medical Diagnostic Tools: Accessing a vast repository of medical knowledge to aid in accurate diagnoses.

Enterprise Knowledge Management: Making internal company data readily accessible and easily searchable.

Predicting the Future (Proactive Behavior)

By remembering historical patterns, AI can anticipate your needs, identify anomalies, and even proactively prevent problems before they occur. Imagine an AI predicting an impending server crash based on historical performance data!

V. The Dark Side of Memory: Controversies & When to Hit the "Forget" Button

It is crucial to acknowledge that the pursuit of AI memory is not without its ethical and practical challenges.

Privacy Panic

Who ultimately owns those memories? What about the fundamental "right to be forgotten?" The storage of sensitive user data, even in vectorized form, raises significant privacy concerns, particularly in highly regulated sectors like healthcare and finance.

Security Shenanigans (Memory Poisoning!)

Malicious actors could potentially inject false data into an AI's memory, leading to altered decisions, unauthorized actions, or even financial losses. The prospect is, frankly, quite alarming.

Bias Bombs

If the training data used to build the AI's memory contains inherent biases, the AI will inevitably learn and amplify those biases, resulting in unfair or discriminatory outcomes.

The "Too Smart for Its Own Good" Problem

Autonomous AI, particularly when equipped with episodic memory, can exhibit unpredictable behavior. What if it recalls something you would prefer it didn't, or acts in ways you never anticipated? Oversight and control become significantly more complex. According to a report on adasci.org, challenges in agent memory include accurately capturing, storing, and retrieving relevant information, as well as maintaining consistency and coherence in the agent's knowledge base.

Resource Hogs

More memory demands more expensive, powerful hardware and more complex algorithms. Giving AI a substantial brain is not a cost-free endeavor.

When Not to Use It

Simple, Fixed Tasks: If your AI is solely responsible for answering frequently asked questions or automating a predictable sequence of steps, complex memory is likely overkill and introduces unnecessary cost and latency.

Non-Critical Applications: For applications where occasional errors are tolerable (such as a fun, lighthearted chatbot), deep memory might not justify the associated overhead.

Data Overload & Management Headaches: An excess of information can slow down retrieval processes or lead the AI to rely on outdated "memories." Maintaining a delicate balance is paramount.

VI. Peeking into the Future: What's Next for AI Memory?

The journey from digital goldfish to digital elephants has only just begun.

Beyond the Context Window

Researchers are actively pursuing truly persistent memory that extends beyond a single session, enabling AI agents to become "self-evolving" and adapt to new inputs without the need for complete retraining.

Human-Inspired Architectures

Expect to see a surge in AI memory systems that mimic the intricate workings of the human brain, exploring concepts such as "hippocampal replay" (the process by which our brains consolidate memories) and neuromorphic computing.

Hybrid Memory Systems (The Best of All Worlds)

The future likely lies in combining different memory types – vector databases for rapid semantic searches, knowledge graphs for structured relationships, and traditional databases for historical logs.

Tools of the Trade are Evolving

Frameworks like LangChain and emerging platforms such as Mem0, Zep, LangMem, and Memary are streamlining the process of implementing robust AI memory.

The "Common Memory" Dream

Imagine a unified AI agent capable of forming and sharing memories from interactions with multiple users, or even with other AI agents. This could lead to truly context-aware and socially intelligent AI, provided robust privacy safeguards are in place.

Towards True AGI

Sophisticated, accessible, and continuously improving memory is widely considered a foundational step towards achieving artificial general intelligence – AI that possesses the ability to learn and adapt across a broad spectrum of tasks, much like a human being.

VII. Conclusion: Remembering the Future

The evolution of AI agent memory is poised to transform our digital companions from simple tools into proactive, intelligent partners. It's about creating AI that not only processes information but also genuinely understands, learns, and evolves alongside us.

This memory revolution promises more personalized experiences, enhanced efficiency, and a future where AI feels less like a mere machine and more like a valuable, evolving teammate.

#agentic ai #agentic rag #artificial intelligence

Homomorphic Encryption

Introduction

Homomorphic encryption allows mathematical operations to be performed on data in its encrypted form, without the need to decrypt the data. This is possible because the encryption scheme preserves the structure of the data, allowing mathematical operations to be performed on the ciphertext in a way that is consistent with the operations performed on the plaintext. For example, if two numbers are added together in plaintext, the same operation can be performed on their encrypted representations, resulting in an encrypted version of the sum. This property allows for the secure computation of functions on encrypted data, without revealing the underlying data to the party performing the computation.
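To make the addition example concrete, here is a toy Python sketch of the Paillier cryptosystem, a well-known additively homomorphic scheme. It is brought in here purely for illustration; this post does not otherwise discuss Paillier. The primes are tiny and hard-coded, so the code is insecure and only demonstrates the property: multiplying two ciphertexts yields an encryption of the sum of the plaintexts. It assumes Python 3.8+ for the modular inverse via pow.

```python
import math
import random

# Toy key generation with tiny primes (insecure; illustration only).
p, q = 293, 433
n = p * q
n_sq = n * n
g = n + 1                                          # standard simple choice of generator
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
mu = pow(lam, -1, n)                               # modular inverse; valid because g = n + 1


def encrypt(m: int) -> int:
    """Encrypt 0 <= m < n with fresh randomness r coprime to n."""
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n_sq) * pow(r, n, n_sq)) % n_sq


def decrypt(c: int) -> int:
    x = pow(c, lam, n_sq)
    return ((x - 1) // n * mu) % n


def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    return (c1 * c2) % n_sq


def scale_encrypted(c: int, k: int) -> int:
    """Raising a ciphertext to k multiplies the plaintext by the public constant k."""
    return pow(c, k, n_sq)


c_a, c_b = encrypt(17), encrypt(25)
total = decrypt(add_encrypted(c_a, c_b))   # computed without ever decrypting c_a or c_b
tripled = decrypt(scale_encrypted(c_a, 3))
```

Because only addition (and multiplication by a known constant) is supported, this scheme cannot multiply two encrypted values together.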

There are three major types of homomorphic encryption:

Partially Homomorphic Encryption (PHE): Allows only one type of operation (for example, addition or multiplication) to be performed on the encrypted data, but an unlimited number of times

Somewhat Homomorphic Encryption (SHE): Allows both additions and multiplications to be performed on the encrypted data, but only a limited number of times

Fully Homomorphic Encryption (FHE): Allows additions and multiplications, and therefore arbitrary computations built from them, to be performed on the encrypted data an unlimited number of times

Because the computations are performed on encrypted data, homomorphic encryption algorithms are usually very compute-intensive and come with a significant performance overhead. Thanks to advancements in software such as batching/packing and in hardware such as SIMD, GPUs, FPGAs, and ASICs, the performance cost is being reduced by a factor of 10 every year (see the image below).

Schemes

In the context of encryption, a scheme typically refers to a specific method or algorithm for encrypting and decrypting data. The three most popular schemes in homomorphic encryption are:

TFHE: Allows operations on single values at the bit level

BGV/BFV: Allows exact arithmetic on vectors of numbers (fixed point)

CKKS: Allows approximate arithmetic on vectors of numbers (floating point, real numbers)

A few homomorphic encryption implementations are OpenFHE, SEAL, PALISADE, Lattigo, Concrete, etc.

Use cases

Homomorphic encryption is a relatively new technology, and as such, it is not yet widely used. However, there are a growing number of organizations and individuals who are exploring its potential uses. Some examples of potential applications of homomorphic encryption include:

Secure cloud computing: Homomorphic encryption could be used to allow sensitive data to be processed in the cloud without revealing it to the cloud provider.

Secure multiparty computation: Homomorphic encryption could be used to enable multiple parties to jointly compute a function on their encrypted data, without revealing their data to each other.

Privacy-preserving data analysis: Homomorphic encryption could be used to allow data to be analyzed without revealing the underlying data to the party performing the analysis.

Private Information Retrieval: Homomorphic encryption could be used to execute queries privately on public/private databases.

Better Model Generation: Homomorphic encryption in conjunction with secret sharing could be leveraged to vertically combine data from multiple sources for the generation of better models.

Overall, while homomorphic encryption is not yet widely used, it has the potential to enable a wide range of applications in areas where security and privacy are of critical importance.

Companies

Companies that are trying to commercialize homomorphic encryption include:

Duality, a startup that uses homomorphic encryption to help clients share data and perform computations without compromising privacy. They have used homomorphic encryption and secure multiparty computation to perform large-scale genome-wide association studies securely, beating the state-of-the-art system by at least one order of magnitude. Google integrated its fully homomorphic encryption transpiler with Duality’s open-source library so that application developers can use the technology with minimal prior knowledge.

Enveil, a startup that is developing tools to support higher-order operations on top of addition and multiplication

Many smaller startups are also trying to leverage the technology, e.g., Zama.ai, ShieldIO, etc.

#encryption #homomorphicencryption

Critical Feedback

I always follow the 5Ws1H rule for any serious situation or decision-making process.

Rule

The 5Ws1H rule is:

5Ws

Why

What

Where

When

Who

How

If you are planning to give critical feedback, you can use the 5Ws1H rule in many ways, e.g.,

1. Why?

Why do you want to give this feedback? Why now, and why not later? Etc.

2. What?

What are the specific things you have observed (be very specific)? What do you want for yourself, and what do you want for the other person? What is it that you are missing (any context)?

3. Where?

In what setting did this happen, i.e., in a close 1:1 conversation or in a bigger group? Depending on the setting, you may have to act fast.

4. When?

When did this situation happen? Is it the first time, or is it repeated behavior? When do you want to give the feedback? (Try to give it while it is still fresh; avoid delays once you have made up your mind about giving it.)

5. Who?

Who is impacted? Whom should you ask for advice before giving this feedback (e.g., your manager or HR partner)? Who should be involved while you are giving this feedback?

6. How?

How do you want to deliver this feedback? How do you want things to move forward?

#crtical criticalfeedback feedback leadership

org.xeril.util.tx.TransactionException: java.sql.SQLException: ORA-28040: No matching authentication protocol

I don't use oracle DB that often but it seems like there are many solutions on the web that are addressing this issue and I don't think most of them have clear instructions.

The reason you may be getting the org.xeril.util.tx.TransactionException: java.sql.SQLException: ORA-28040: No matching authentication protocol error is that your client library, e.g., the JDBC jar (if your language is Java), is old. These older libraries may not know the newer authentication protocols that Oracle has implemented. To fix this you have a couple of options

Use a newer version of the library that can support the newer authentication protocols

Downgrade the protocol version in your oracle server (if you are not allowed to use a newer version of the library, for some obvious or nonobvious reason)

To downgrade the protocol version in your oracle server, do the following

Note 1: I will be using the official docker oracle server image for this post

Note 2: It is very important that you create the users only after making the following changes. Any users created before this will not work even with this change.

Start the docker image (skip if you already have an Oracle server up and running)

docker run -d -h localdomain --name oracle -p 1521:1521 -p 5500:5500 -e DB_SID=ORCLCDB -e DB_PDB=ORCLPDB1 -e DB_DOMAIN=localdomain store/oracle/database-enterprise:12.2.0.1-slim

Update the sqlnet.ora file to include the following config

SQLNET.ALLOWED_LOGON_VERSION = 8
SQLNET.ALLOWED_LOGON_VERSION_CLIENT = 8
SQLNET.ALLOWED_LOGON_VERSION_SERVER = 8

Note: 8 is the protocol version I needed and you may need a different version.

After all these changes, your config may look like this

docker exec -it oracle bash
[oracle@localdomain /]$ cd $TNS_ADMIN
[oracle@localdomain ORCLCDB]$ ls
listener.ora sqlnet.ora tnsnames.ora xdb_wallet
[oracle@localdomain ORCLCDB]$ cat sqlnet.ora
NAME.DIRECTORY_PATH= {TNSNAMES, EZCONNECT, HOSTNAME}
SQLNET.EXPIRE_TIME = 10
SSL_VERSION = 1.0
SQLNET.ALLOWED_LOGON_VERSION = 8
SQLNET.ALLOWED_LOGON_VERSION_CLIENT = 8
SQLNET.ALLOWED_LOGON_VERSION_SERVER = 8

Restart the Oracle server

docker restart oracle

At this point you are ready: create the users and schema you need, and you should not face this problem anymore.

#oracle #docker #ORA-28040

Auto-migrate primary manager of swarm cluster in AWS

Overview:

In my previous blog post, I discussed how to create a docker swarm cluster. As machines can die in AWS, we need to handle the situation where the swarm primary manager shuts down or dies.

Solution:

If the primary manager of a swarm cluster dies, the swarm cluster elects a new primary manager and all the existing nodes (both manager and worker nodes) remain part of the cluster. The problem arises when new nodes (both managers and workers) try to join the cluster. New nodes use the information in the Dynamodb table to connect to the cluster, and they will fail to connect if that information has not been updated.

In order to address the problem with primary manager migration, we need to ensure that Dynamodb table is updated as soon as possible. To achieve this, we can monitor the cluster for changes in the primary manager (or leader) and update the Dynamodb table accordingly.

In order to get the cluster information, we can run docker node ls which gives an output that looks like

docker node ls
ID                           HOSTNAME        STATUS  AVAILABILITY  MANAGER STATUS
1bcef6utixb0l0ca7gxuivsj0    worker_node_1   Ready   Active
38ciaotwjuritcdtn9npbnkuz    manager_node_2  Ready   Active        Reachable
e216jshn25ckzbvmwlnh5jr3g *  manager_node_1  Ready   Active        Leader

The monitoring script should get the current primary manager IP and check on a schedule (say, every x seconds) whether the IP has changed. If the IP changes, the script should update the Dynamodb table accordingly. This monitoring script should run only on manager nodes, as the docker node ls command works only on manager nodes. You may run this script (or make an image) as a service or as a standalone container.

I have created a docker image which contains the script to perform the above steps.
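The leader check described above can be sketched in Python. This is only an illustration and not the script from my repo: the function names are made up for this post, the sketch reads the hostname column rather than resolving an IP, and it assumes the default table layout of docker node ls shown earlier.

```python
from typing import Optional


def find_leader_hostname(node_ls_output: str) -> Optional[str]:
    """Return the HOSTNAME of the row whose MANAGER STATUS is 'Leader'."""
    for line in node_ls_output.splitlines():
        # Drop the '*' that marks the node the command was run on.
        tokens = [t for t in line.split() if t != "*"]
        if len(tokens) >= 5 and tokens[-1] == "Leader":
            return tokens[1]
    return None


def leader_update(previous: Optional[str], node_ls_output: str) -> Optional[str]:
    """Return the new leader if it differs from `previous`, otherwise None."""
    current = find_leader_hostname(node_ls_output)
    if current is not None and current != previous:
        return current
    return None


# Sample output copied from the listing above.
SAMPLE = """\
ID                           HOSTNAME        STATUS  AVAILABILITY  MANAGER STATUS
1bcef6utixb0l0ca7gxuivsj0    worker_node_1   Ready   Active
38ciaotwjuritcdtn9npbnkuz    manager_node_2  Ready   Active        Reachable
e216jshn25ckzbvmwlnh5jr3g *  manager_node_1  Ready   Active        Leader
"""
```

A real monitor would call docker node ls on a timer, pass the output to leader_update, and write to the Dynamodb table whenever it returns a value.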

The github repo containing the script can be found here

The docker image can be found here

#docker #swarm #migrate #automate

Automate Docker Swarm Cluster Setup in AWS

Overview:

Docker added swarm mode in the 1.12 release, which can be used for container orchestration. Besides container orchestration, docker swarm provides quite a few useful tools such as DNS, a load balancer, etc. This blog post is not going to talk about these details but rather about how to set up a docker swarm cluster automatically in AWS.

Terminology:

A docker swarm mode cluster consists mainly of two groups of nodes (or machines)

Managers

Workers

Managers are responsible for scheduling services, ensuring that services meet the right replica count, etc., and Workers are responsible for running service containers. It is important to keep the manager node count at 1, 3 or 5 (an odd number, so that a majority of managers can always be formed), and you can have as many worker nodes as you want.
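To see why odd manager counts are preferred, here is a small arithmetic sketch. It assumes the usual majority rule of the Raft consensus that swarm managers use; the function names are mine and are not part of docker.

```python
def quorum(managers: int) -> int:
    """Smallest number of managers that forms a majority."""
    return managers // 2 + 1


def tolerated_failures(managers: int) -> int:
    """How many managers can be lost while a majority still remains."""
    return managers - quorum(managers)


if __name__ == "__main__":
    for count in (1, 2, 3, 4, 5):
        print(count, "managers -> quorum", quorum(count),
              "-> tolerates", tolerated_failures(count), "failure(s)")
```

Going from 3 to 4 managers does not let the cluster survive any additional failure, which is why an even count adds cost without adding resilience.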

Cluster Formation:

In order to set up a swarm cluster, one of the managers (the primary manager) needs to run the following command

docker swarm init --listen-addr <primary_manager_ip>:2377 --advertise-addr <primary_manager_ip>:2377

which initializes the swarm cluster and generates two join tokens, one for managers and one for workers, so that they can join the cluster. You can get these tokens by using the following commands

# Gets token to be used for joining as a manager node
docker swarm join-token manager -q
# Gets token to be used for joining as a worker node
docker swarm join-token worker -q

Once we have the tokens, we can let other nodes join by running the following commands

# On non primary managers run the following command
docker swarm join --token <manager_token> --listen-addr <current_manager_ip>:2377 --advertise-addr <current_manager_ip>:2377 <primary_manager_ip>:2377
# On workers run the following command
docker swarm join --token <worker_token> <primary_manager_ip>:2377

The real problem is to set up this cluster automatically.

Solution:

Please note that this solution is heavily inspired by the docker swarm for AWS. Basically, you need dynamodb and two auto scaling groups (one for managers and one for workers) with different cloud init scripts.

Create a dynamodb table with 1 as both read and write capacity and an attribute named node_type of type string (or S) which is also the hash_key for the table.

Create cloud init scripts for both managers and workers which should take at least 3 parameters

Dynamodb table name

AWS region in which the dynamodb table was created

Node type, i.e., either manager or worker

and the cloud init script should perform the following

Try to connect to dynamodb (as there are other nodes that try to connect to dynamodb at the same time)

If the connection times out then go back to step 1 (i.e., you can reach step 3 only if the connection is successful)

If the connection is successful and if the node type is manager then

Search the dynamodb table for a record with primary_manager as the key. If a record exists then use that information to join the cluster, otherwise go to the next sub-step

Initialize the cluster (as there is no primary manager) get the manager and worker tokens. Insert a record in the dynamodb table with primary_manager as the key along with primary_manager's ip, manager and worker tokens.

If the connection is successful and if the node type is worker then

Search the dynamodb table for a record with primary_manager as the key. If a record exists then use that information to join the cluster, otherwise close the connection and go to step 1

I have created a docker image which contains the script to perform the above steps. Please note that most of the script is taken from docker for AWS with a few modifications, and it uses ubuntu as the base image (instead of alpine).
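The decision logic in the steps above can be sketched as follows. This is a simplified illustration, not the actual cloud init script: FakeTable is an in-memory stand-in for the dynamodb table, and init_swarm and join_swarm are hypothetical callables that would wrap docker swarm init and docker swarm join.

```python
from typing import Callable, Dict, Optional


class FakeTable:
    """In-memory stand-in for the dynamodb table, keyed by node_type."""

    def __init__(self) -> None:
        self.items: Dict[str, dict] = {}

    def get(self, key: str) -> Optional[dict]:
        return self.items.get(key)

    def put(self, key: str, item: dict) -> None:
        self.items[key] = item


def bootstrap(table: FakeTable, node_type: str, my_ip: str,
              init_swarm: Callable[[str], dict],
              join_swarm: Callable[[str, str], None]) -> str:
    """Run one pass of the join-or-initialize decision for a node."""
    record = table.get("primary_manager")
    if record is None:
        if node_type != "manager":
            # Workers cannot create the cluster; the caller should retry later.
            return "retry"
        # No primary manager yet: initialize and publish ip and tokens.
        # Assumption: init_swarm returns manager_token and worker_token.
        tokens = init_swarm(my_ip)
        # Note: a real table would need a conditional write here so that two
        # managers starting at the same moment do not both become primary.
        table.put("primary_manager", {"ip": my_ip, **tokens})
        return "initialized"
    key = "manager_token" if node_type == "manager" else "worker_token"
    join_swarm(record[key], record["ip"])
    return "joined"
```

The caller is expected to loop while the result is "retry", which corresponds to going back to step 1.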

The github repo containing the script can be found here

The docker image can be found here

Please note that this solution does not address primary manager migration, which will be discussed in my next blog post.

#docker aws dynamodb docker-swarm swarm automatic cluster

Docker Bench security

Docker Bench Security is a docker image which audits a VM running docker containers. You can run this image and see if there are security issues in your system. This is a very useful docker image that can be used to perform security audits on your production VMs.

To Run the bench security container

docker run -it --net host --pid host --cap-add audit_control \
    -v /var/lib:/var/lib \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v /usr/lib/systemd:/usr/lib/systemd \
    -v /etc:/etc --label docker_bench_security \
    docker/docker-bench-security

The output it generates looks something like this

# ------------------------------------------------------------------------------
# Docker Bench for Security v1.0.0
#
# Docker, Inc. (c) 2015-
#
# Checks for dozens of common best-practices around deploying Docker containers in production.
# Inspired by the CIS Docker 1.11 Benchmark:
# https://benchmarks.cisecurity.org/downloads/show-single/index.cfm?file=docker16.110
# ------------------------------------------------------------------------------
Initializing Sun Jun 26 22:30:48 UTC 2016
[INFO] 1 - Host Configuration
[WARN] 1.1 - Create a separate partition for containers
[PASS] 1.2 - Use an updated Linux Kernel
[PASS] 1.4 - Remove all non-essential services from the host - Network
[PASS] 1.5 - Keep Docker up to date
[INFO]     * Using 1.12.02 which is current as of 2016-06-02
.....

You can get more details on each issue using this PDF

The most important security issues that should be addressed are

Update the docker to latest version (Update package image):

Enable user namespace support (better) or create a user for the container: It is a good practice to run the container as a non-root user, if possible. Though user namespace mapping is now available, if a user is already defined in the container image, the container is run as that user by default and specific user namespace remapping is not required. In order to enable user namespaces, set DOCKER_OPTS="--userns-remap=default" and restart your docker daemon. The PDF has all the instructions on how to do it.

Restrict containers from acquiring additional privileges: We should run containers with the --security-opt=no-new-privileges flag, e.g.,

docker run --rm -it --security-opt=no-new-privileges busybox sh

There are other warnings which may be addressed if you like

Limit memory usage for container

Set container CPU priority appropriately

Mount container's root filesystem as read only

Bind incoming container traffic to a specific host interface

Set the 'on-failure' container restart policy to 5

Verify AppArmor Profile, if applicable etc

Note:

If you enable user namespaces then you cannot use quite a few features, such as --net host. If you need to run a container with such privileges even when user namespaces are enabled, use --userns=host with docker run to opt that container out of the user namespace.

#docker #security #docker-bench-security.

My thoughts on AWS Lambda

Introduction:

AWS Lambda is a way to run backend code without provisioning any VMs. It is a PaaS solution where the cost is calculated based on the amount of usage, where usage is proportional to the number of requests that a lambda has served and the average time and memory it took per request. AWS does not consider the uptime of a lambda (when it was created, etc.) as a billing parameter, which makes things significantly cheaper to build.
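The billing idea can be written as a small formula. The sketch below is my own illustration: the function name is made up, and the prices are passed in as parameters because the values used in the example are placeholders, not AWS's actual rates.

```python
def lambda_cost(requests: int, avg_duration_s: float, memory_gb: float,
                price_per_request: float, price_per_gb_second: float) -> float:
    """Estimate cost as a request charge plus a compute (GB-second) charge."""
    # Compute usage grows with request count, average duration and memory size.
    gb_seconds = requests * avg_duration_s * memory_gb
    return requests * price_per_request + gb_seconds * price_per_gb_second


if __name__ == "__main__":
    # Placeholder prices chosen only to make the arithmetic easy to follow.
    print(lambda_cost(requests=1000, avg_duration_s=2, memory_gb=0.5,
                      price_per_request=0.25, price_per_gb_second=0.5))
```

Notice that nothing in the formula depends on how long the function has existed, only on how much it was actually invoked.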

Usage:

Lambdas can be created either to serve HTTP(S) requests or to act as daemons that perform some background tasks. There are basically two ways to create lambdas: upload your code directly during creation of the lambda, or upload the artifact (usually a zip) to an S3 bucket and use it as a reference to create the lambda function. I prefer the S3 approach, as S3 can serve as an artifactory for your code, making versioning/maintenance a little easier.

Deployment Package

In order to create a Lambda function, we need to create a deployment package for the lambda. AWS has good documentation on how to create a deployment package. Once the deployment package is created, it can be uploaded using a simple command

aws --profile ${AWS_PROFILE} --region ${AWS_REGION} s3 cp ${ZIP_NAME} s3://${S3_BUCKET}

For python builds, I have created a simple script to automate the creation of the deployment package

#!/usr/bin/env bash
# general config variables
# Name of the project
NAME=
# version of the project
VERSION=
# Absolute path of your project directory
WORKSPACE=
# Directory (relative to WORKSPACE) under which you have all your code.
# Note: You should have all your code at the top level no packages
CODE_DIR=
ZIP_NAME="${NAME}-${VERSION}.zip"
#AWS config
AWS_PROFILE=
AWS_REGION=
S3_BUCKET=
cd ${WORKSPACE}
rm -rf tmp/ build/ deploy/ ${ZIP_NAME}
virtualenv tmp
source tmp/bin/activate
# assuming that you have setup.py
pip install .
# make an output directory for the lambda deployment package
mkdir build
cp -r tmp/lib/python2.7/site-packages/* build/
cp -r ${CODE_DIR}/* build/
cd build
zip -r ../${ZIP_NAME} . *
cd ..
deactivate
# Note this step should be run after zip has been created. only required to upload to s3
virtualenv deploy
source deploy/bin/activate
pip install awscli
aws --profile ${AWS_PROFILE} --region ${AWS_REGION} s3 cp ${ZIP_NAME} s3://${S3_BUCKET}

Permissions

In order to create a lambda you need a role policy with the following permissions

"logs:*"

and if you need to tie your lambda to your VPC then you also need

"ec2:CreateNetworkInterface", "ec2:DescribeNetworkInterfaces", "ec2:DeleteNetworkInterface",

and if you need to get information about the lambda's metadata

"lambda:*",

Configuration Management

There are many different ways to store configuration for your lambda. For example, you can store the configuration in DynamoDB and read it from your lambda function, or just use the lambda description field to store the configuration (in JSON) and read the function's metadata within your lambda to get the configuration.
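The description-field approach could look roughly like the sketch below. The load_config name and the stub class are mine. In a real function the client would be boto3.client("lambda") and the function name would come from context.function_name; the stub only mimics the one call used here so the example runs without AWS access.

```python
import json


def load_config(lambda_client, function_name: str) -> dict:
    """Read JSON configuration stored in a function's description field."""
    response = lambda_client.get_function_configuration(FunctionName=function_name)
    description = response.get("Description", "")
    # An empty description simply means there is no configuration yet.
    return json.loads(description) if description else {}


class StubLambdaClient:
    """Test double that mimics the single client call used above."""

    def __init__(self, description: str) -> None:
        self._description = description

    def get_function_configuration(self, FunctionName: str) -> dict:
        return {"FunctionName": FunctionName, "Description": self._description}


if __name__ == "__main__":
    client = StubLambdaClient('{"table": "settings", "retries": 3}')
    print(load_config(client, "my-function"))
```

Passing the client in as a parameter also makes the function easy to unit test, since a stub like the one above can replace the real client.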

Testing

Writing unit tests for your lambda is a little hard but can be done with some heavy lifting. I really hope Amazon comes up with good testing libraries so that everyone can avoid writing boilerplate code. I would say that testing was the hardest thing I had to do during my first lambda experiment, and we clearly need some good process/guidance from Amazon in this area.

Conclusion

Overall, AWS Lambda is definitely a fresh approach to writing apps, which helps devs focus on writing the app and avoid problems such as scalability, provisioning VMs, network management, security, etc. AWS Lambda definitely helps in terms of billing, as it is billed on usage and not on uptime. On the cons side, AWS Lambda is still young and needs a lot of improvement, especially in the testing area (unit testing in particular) to ensure that code works before going live, and in process/good practices around integration testing.

#aws lambda

Jenkins + Docker = BYOBC

BYOBC (Bring your own build container)

Jenkins is a great CI tool which is used for building code. In order to build a project, jenkins needs the language compiler/interpreter bits to be installed on its nodes. This requirement causes quite a few problems, e.g.,

Include/Install different language compilers/interpreters such as c/c++, Java, Python, Go, NodeJS on the jenkins nodes

Include/Install multiple versions of compilers such as Java6, Java8, etc. on the jenkins nodes

Label both jenkins slaves and jobs appropriately

Ensure that appropriate number of nodes are maintained for different platforms. This depends on number of projects you have on each platform

(I am pretty sure that there are quite a few other problems with the native jenkins approach.) Thanks to docker, this requirement can be ignored in quite a few cases. In the docker approach, the only requirement is to have a docker daemon running on all the jenkins nodes. In order to build projects, we need to do the following things

Make docker images containing appropriate compiler/interpreter bits.

In your build jobs, pull (and run) the required docker image (containing required compiler/interpreter)

Mount your job's code to the docker container and build it inside the container

With the docker approach, all the jenkins slave nodes are eligible for building any platform (except iOS :(), making it transparent and ensuring a uniform balance of jobs across jenkins nodes.

Note: You might face issues with user permissions, as the docker container runs with user id 0, which is usually mapped to the root user on the host system. This results in root-owned files being created during the build process. Either delete those files at the end of the build process within the container or use docker user namespaces.
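As an illustration of the pull, mount and build steps, here is a sketch that assembles the docker run command for a build job. The function name, the /workspace mount point and the image name are hypothetical. It also passes --user, which is a third way around the root-owned files problem that I did not list above, so treat it as an optional extra.

```python
from typing import List


def build_command(image: str, workspace: str, build_cmd: List[str],
                  uid: int, gid: int) -> List[str]:
    """Assemble a docker run invocation that builds the job's code in a container."""
    return [
        "docker", "run", "--rm",
        # Mount the job's checked-out code into the container.
        "--volume", f"{workspace}:/workspace",
        "--workdir", "/workspace",
        # Run as the jenkins user so that build outputs are not owned by root.
        "--user", f"{uid}:{gid}",
        image,
        *build_cmd,
    ]


if __name__ == "__main__":
    # "example/java8-build" is a made-up image name for this post.
    print(" ".join(build_command("example/java8-build", "/var/jenkins/ws/job",
                                 ["mvn", "package"], 1000, 1000)))
```

The resulting list can be handed to subprocess.run, or joined into a single line for a jenkins shell build step.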

#jenkins #docker

Mac OSX Homebrew’s consul now has web-ui.

With this (43477) pull request, you can install consul with web-ui. You don't have to install consul-web-ui separately anymore. To install consul with web-ui, run the following commands

# This updates your brew
brew update
# This installs consul with web-ui
brew install consul --with-web-ui

If you have already installed consul on your mac then reinstall it like this

brew reinstall consul --with-web-ui

The web-ui is installed to the package share directory. You can get this information by running brew info consul, and the output is shown below

$ brew info consul
consul: stable 0.5.2 (bottled)
Tool for service discovery, monitoring and configuration
https://www.consul.io
Not installed
From: https://github.com/Homebrew/homebrew/blob/master/Library/Formula/consul.rb
==> Dependencies
Build: go ✔
==> Options
--with-web-ui
    Installs the consul web ui
==> Caveats
If consul was built with --with-web-ui, you can activate the UI by running
consul with `-ui-dir /usr/local/Cellar/consul/0.5.2/share/consul/web-ui`.

How to rebalance Kafka partition leaders.

We typically run apache kafka in a cluster of either 3 or 5 brokers, at least in production. Due to system restarts or network failures/partitions, changes in the leadership of the partitions are expected. This usually results in imbalances in the leadership, causing more load on some kafka brokers in the cluster. We had a situation where 2 nodes of a 3 node cluster got restarted, resulting in one node taking all the load as it became the leader for most of the partitions, which caused a huge cpu spike on that node. To fix this situation, we have two options

1. Manually call the preferred replica election script like this

/bin/kafka-preferred-replica-election.sh --zookeeper

2. Automatically rebalance the leaders by configuring kafka with following property

auto.leader.rebalance.enable=true

I personally prefer the first option, as the exact consequences of the second option are not known, at least to me :).
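To make the imbalance concrete, here is a small sketch that counts leaders per broker and lists the partitions that a preferred replica election would move. It relies on the fact that the preferred leader of a partition is the first broker in its replica list. The topic name and the assignment are made-up sample data modelled on the incident above.

```python
from collections import Counter
from typing import Dict, List, Tuple

# partition name -> (replica list, current leader)
Assignment = Dict[str, Tuple[List[int], int]]


def leader_counts(assignment: Assignment) -> Counter:
    """Count how many partitions each broker currently leads."""
    return Counter(leader for _, leader in assignment.values())


def non_preferred(assignment: Assignment) -> List[str]:
    """Partitions whose current leader is not the first replica in the list."""
    return sorted(name for name, (replicas, leader) in assignment.items()
                  if replicas[0] != leader)


# After brokers 2 and 3 restarted, broker 1 ended up leading everything.
ASSIGNMENT: Assignment = {
    "events-0": ([1, 2, 3], 1),
    "events-1": ([2, 3, 1], 1),
    "events-2": ([3, 1, 2], 1),
}

if __name__ == "__main__":
    print(dict(leader_counts(ASSIGNMENT)))
    print(non_preferred(ASSIGNMENT))
```

Running the preferred replica election hands events-1 and events-2 back to brokers 2 and 3, which spreads the leadership evenly again.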

More information about this can be found at https://kafka.apache.org/081/ops.html

#kafka #apache #leader #rebalance

Mac OSX Yosemite bluetooth not working after wake.

A recent update from Apple for Mac OS X Yosemite (version 10.10.5) is causing an issue with bluetooth. The problem is that bluetooth does not turn on after a wakeup (from sleep). This is a big headache if you use a bluetooth keyboard/mouse/headphones, which I do, and it is annoying me quite a bit.

I tried all the things I could find through a google search, but none of them helped. Most of them suggest restarting the machine, which works for sure, but I hate restarting in the middle of my work.

Luckily, I found something which works without restarting, and all you have to do is run the following two commands in your terminal

# This unloads the bluetooth module
sudo kextunload -b com.apple.iokit.BroadcomBluetoothHostControllerUSBTransport
# This loads the bluetooth module
sudo kextload -b com.apple.iokit.BroadcomBluetoothHostControllerUSBTransport

I hope this makes you happy too.

#Mac Yosemite #bluetooth #not #working

Force mysql client to use TCP socket instead of unix socket for localhost

With vagrant/VM/docker playing a major role in the development cycle, most of us (at least me :)) are installing mysql on a vm/docker container instead of the host machine. You may port forward the mysql-server port (3306) to the host machine so that one can connect to it using mysql-client or a program/app. If you do so you may get the following error

Warning: Using a password on the command line interface can be insecure.
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2)

The problem is that mysql uses a unix socket instead of a TCP socket for localhost. To fix this issue, you should change my.cnf on your host machine to use a TCP socket for localhost. You can do this by updating the client section in my.cnf with the following bits

[client]
protocol=tcp

Note: If you are using a mac and homebrew, you may not find my.cnf for the installed mysql-client, but there is a my-default.cnf which can be found using the following command

ls $(brew --prefix mysql)/support-files/my-*

After editing the client section, you should copy the above file to /etc/my.cnf to override the default configuration. You may use the following command to do it

sudo cp $(brew --prefix mysql)/support-files/my-default.cnf /etc/my.cnf

Auto generate client code and service stubs using JSONSchema and Swagger.

Intended Audience:

Mostly people building RESTful services.

In general, whoever has the curiosity.

Evolution of Service development (Software Engineering):

Currently, the trend of application/service development goes like this

Design

Document

Development

Test

Repeat

We tend to identify the use cases surrounding the application/service and come up with a specification, document the services/operations so that backend teams can implement the use cases and the various client teams (iOS, Android, automation testing) can develop the client libraries in parallel, build the app, test it, and repeat the process with the next use case (Agile). This blog post mostly focuses on the problems in this process and on how to speed it up, especially steps 2 and 3.

Why do we document?

Provides a good understanding on what we are about to build

Serves as an agreement between server and client.

Acts like a reference guide during the development process

What do we document?

We typically document the service specification by describing the request and response payloads, service method definitions (input and output parameters), exceptions, etc. Traditionally, we document the services/operations somewhat (more or less) like this

Service URL: /hello
Request object: name:String
Response object: type: String

The documentation/specification goes through a review process. After the documentation is approved, we move on to the development phase. Development these days is getting very interesting, as there is one server side implementation and one or more client implementations (iOS, Android, Web, integration tests, etc.), so we end up writing client libraries for each of these platforms. Even on the server side, with a micro services architecture, we end up writing a client library for the micro service so that a gateway can use the library to poke the service. The problem is that each of these client platforms supports a specific programming language, and we are forced to write the client library in that specific language, e.g., a swift library for iOS, a java library for android, etc. If you observe the code in these client libraries, they follow a similar pattern even though they are written in different languages.

class HelloService {
    some transport instance
    some serializer instance (json, xml)
    response hello(request) {
        .......
        return transport.get/put/post/delete(serializer.serialize(request));
    }
}

The real problem is that we spend a good amount of time writing these libraries, and these libraries don't carry any intelligent logic. This started people thinking about auto generating the code. In order to auto generate code, we need a specification which is machine parsable. This pushed researchers to build description languages/technologies such as xml, thrift, protocol buffers, etc. The problem with most of these specifications is that they
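For comparison, here is what that pattern might look like as a small Python client. It is a sketch under my own assumptions: the transport is any callable taking a method, a path and a body, and fake_transport stands in for a real HTTP call so the example runs on its own.

```python
import json
from typing import Callable

# A transport takes (http_method, path, body) and returns the raw response text.
Transport = Callable[[str, str, str], str]


class JsonSerializer:
    """Serializer that converts between dicts and JSON text."""

    def serialize(self, obj: dict) -> str:
        return json.dumps(obj)

    def deserialize(self, text: str) -> dict:
        return json.loads(text)


class HelloService:
    """Client library following the transport plus serializer pattern."""

    def __init__(self, transport: Transport, serializer: JsonSerializer) -> None:
        self._transport = transport
        self._serializer = serializer

    def hello(self, request: dict) -> dict:
        body = self._serializer.serialize(request)
        raw = self._transport("POST", "/hello", body)
        return self._serializer.deserialize(raw)


def fake_transport(method: str, path: str, body: str) -> str:
    """Stand-in for a real HTTP call; echoes a greeting for the given name."""
    name = json.loads(body)["name"]
    return json.dumps({"message": f"Hello, {name}!"})


if __name__ == "__main__":
    service = HelloService(fake_transport, JsonSerializer())
    print(service.hello({"name": "kalyan"}))
```

Every method of such a library has the same serialize, send and deserialize shape, which is exactly why it is a good candidate for code generation.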

Are heavy on machine parsability (easy for machines) and light on human readability (not easy for people).

Cannot be used as documentation, as non-technical people can't understand them. Even for technical folks, understanding them demands a good amount of knowledge.

This pushed me to do some research to find something, or build something, which provides at least the following things

Easily readable/writable by humans without any fancy tools

Easily extensible

Can auto generate code

Can be used for documentation (may require a little bit (not heavy) of training to understand the document)

I personally like RESTful services, so I was looking for something which has JSON support. During my research, I discovered two description languages, JSONSchema and Swagger, which helped me in solving the problem. JSONSchema provides a way to describe/document the request and response payloads in JSON. Swagger provides a way to describe/document RESTful services in JSON, and Swagger uses JSONSchema to document the request and response payloads. A sample JSONSchema response object would look like this

{
  "id": "https://schema.srikalyan.com/sample/1.0/HelloResponse.json",
  "title": "HelloResponse",
  "description": "Response object from hello service",
  "type": "object",
  "properties": {
    "message": {
      "type": "string",
      "description": "Customized hello message for the user"
    }
  },
  "required": [ "message" ]
}

and a sample Swagger spec would look like this

{
  "swagger": "2.0",
  "info": {
    "description": "A simple API describing hello service",
    "version": "1.0",
    "title": "Hello Service API",
    "termsOfService": "http://swagger.io/terms/"
  },
  "basePath": "/app_name/v1",
  "schemes": [ "https" ],
  "paths": {
    "/hello/{name}": {
      "get": {
        "summary": "Returns the hello message for the user",
        "operationId": "hello",
        "parameters": [
          {
            "in": "path",
            "name": "name",
            "description": "Name of the user",
            "required": true,
            "type": "string"
          }
        ],
        "responses": {
          "200": {
            "schema": { "$ref": "https://schema.srikalyan.com/sample/1.0/HelloResponse.json" },
            "description": "Operation succeeded and hello message is returned "
          },
          "500": {
            "description": "Operation failed due to an internal issue"
          }
        }
      }
    }
  }
}

These two specifications provide us a way to document services in a way that is readable/writable by both machines and humans. There are quite a few libraries which auto generate code from these specifications. A quick google search might give you what you want. Even if you don't find one that fits your needs in terms of code generation, you can easily write one within a matter of hours. I wrote one in a couple of hours for my needs. This helps my company reduce development time and release new features quite regularly. I hope you enjoy this blog post, and please leave a comment if you need any help or information on this topic.
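To give a feel for how small such a generator can be, here is a toy sketch in Python. It is not the generator I wrote for my company. It only handles path parameters and ignores request bodies, response schemas and most of the Swagger format, and the generated class and the transport callable are my own conventions for this example.

```python
from typing import Dict


def generate_client(spec: Dict) -> str:
    """Generate Python source for a client class from a Swagger 2.0 style dict."""
    lines = [
        "class GeneratedClient:",
        "    def __init__(self, transport):",
        "        self._transport = transport",
        "",
    ]
    for path, operations in spec["paths"].items():
        full_path = spec.get("basePath", "") + path
        for http_method, operation in operations.items():
            # Only path parameters become method arguments in this sketch.
            params = [p["name"] for p in operation.get("parameters", [])
                      if p.get("in") == "path"]
            args = "".join(f", {name}" for name in params)
            kwargs = ", ".join(f"{name}={name}" for name in params)
            lines.append(f"    def {operation['operationId']}(self{args}):")
            lines.append(f"        path = {full_path!r}.format({kwargs})")
            lines.append(f"        return self._transport({http_method.upper()!r}, path)")
            lines.append("")
    return "\n".join(lines)


# Trimmed-down version of the hello service spec shown above.
SPEC = {
    "basePath": "/app_name/v1",
    "paths": {
        "/hello/{name}": {
            "get": {
                "operationId": "hello",
                "parameters": [{"in": "path", "name": "name", "required": True}],
            }
        }
    },
}

if __name__ == "__main__":
    print(generate_client(SPEC))
```

The printed source defines a GeneratedClient class with a hello(name) method that fills in the path template and hands the request to whatever transport it was given.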

#JSONSChema Swagger RESTful #development #SoftwareEngineering #Agile

Automatically add the upstream urls for your github.com projects

In my previous blog post, I showed how to clone all the github projects of a user with a one-line shell script. In this blog post, I am going to show you how to automatically add the upstream url for all of your projects using a single line of shell script. The tools I am going to use here are the github developer API, curl and jq. I am going to do this in a two-step process

Run a script to get the upstream urls and create shell script

for folder in `ls --color=never -d */ | tr -d "/"`; do echo "cd $folder";ssh_url=`curl "https://api.github.com/repos/srikalyan/$folder" | jq '.parent.ssh_url' | tr -d '"'`; echo "git remote add upstream $ssh_url"; printf "cd ..\n\n"; done 2> /dev/null > upstream.sh

This creates a file which looks like this

....
cd spring-boot
git remote add upstream git@github.com:spring-projects/spring-boot.git
cd ..
cd spring-data-jpa
git remote add upstream git@github.com:spring-projects/spring-data-jpa.git
cd ..

Run the created shell script

/bin/bash upstream.sh
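The jq part of the one-liner, which picks .parent.ssh_url out of the API response, can also be sketched in Python. The function name is mine and the sample dictionary is a heavily trimmed version of a real API response. Unlike the shell version, this sketch skips repositories that are not forks instead of emitting a command with a null url.

```python
from typing import Optional


def upstream_command(repo_info: dict) -> Optional[str]:
    """Build the git command that adds the fork's parent as the upstream remote."""
    parent = repo_info.get("parent")
    if not parent:
        # Not a fork, so there is no upstream to add.
        return None
    return f"git remote add upstream {parent['ssh_url']}"


# Trimmed sample of what the github API returns for a forked repository.
SAMPLE_REPO = {
    "name": "spring-data-jpa",
    "fork": True,
    "parent": {"ssh_url": "git@github.com:spring-projects/spring-data-jpa.git"},
}

if __name__ == "__main__":
    print(upstream_command(SAMPLE_REPO))
```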

If you are interested in the details of the shell one-liner, the explanation goes like this

Iterate through all your github folders

for folder in `ls --color=never -d */ | tr -d "/"`
do
    .....
done

Use github api to get the repository info (I am using spring-data-jpa as repository)

curl https://api.github.com/repos/srikalyan/spring-data-jpa

This gives the following output

{ "id": 37492014, "name": "spring-data-jpa", "full_name": "srikalyan/spring-data-jpa", "owner": { "login": "srikalyan", "id": 456768, "avatar_url": "https://avatars.githubusercontent.com/u/456768?v=3", "gravatar_id": "", "url": "https://api.github.com/users/srikalyan", "html_url": "https://github.com/srikalyan", "followers_url": "https://api.github.com/users/srikalyan/followers", "following_url": "https://api.github.com/users/srikalyan/following{/other_user}", "gists_url": "https://api.github.com/users/srikalyan/gists{/gist_id}", "starred_url": "https://api.github.com/users/srikalyan/starred{/owner}{/repo}", "subscriptions_url": "https://api.github.com/users/srikalyan/subscriptions", "organizations_url": "https://api.github.com/users/srikalyan/orgs", "repos_url": "https://api.github.com/users/srikalyan/repos", "events_url": "https://api.github.com/users/srikalyan/events{/privacy}", "received_events_url": "https://api.github.com/users/srikalyan/received_events", "type": "User", "site_admin": false }, "private": false, "html_url": "https://github.com/srikalyan/spring-data-jpa", "description": "Simplifies the development of creating a JPA-based data access layer.", "fork": true, "url": "https://api.github.com/repos/srikalyan/spring-data-jpa", "forks_url": "https://api.github.com/repos/srikalyan/spring-data-jpa/forks", "keys_url": "https://api.github.com/repos/srikalyan/spring-data-jpa/keys{/key_id}", "collaborators_url": "https://api.github.com/repos/srikalyan/spring-data-jpa/collaborators{/collaborator}", "teams_url": "https://api.github.com/repos/srikalyan/spring-data-jpa/teams", "hooks_url": "https://api.github.com/repos/srikalyan/spring-data-jpa/hooks", "issue_events_url": "https://api.github.com/repos/srikalyan/spring-data-jpa/issues/events{/number}", "events_url": "https://api.github.com/repos/srikalyan/spring-data-jpa/events", "assignees_url": "https://api.github.com/repos/srikalyan/spring-data-jpa/assignees{/user}", "branches_url": "https://api.github.com/repos/srikalyan/spring-data-jpa/branches{/branch}", "tags_url": "https://api.github.com/repos/srikalyan/spring-data-jpa/tags", "blobs_url": "https://api.github.com/repos/srikalyan/spring-data-jpa/git/blobs{/sha}", "git_tags_url": "https://api.github.com/repos/srikalyan/spring-data-jpa/git/tags{/sha}", "git_refs_url": "https://api.github.com/repos/srikalyan/spring-data-jpa/git/refs{/sha}", "trees_url": "https://api.github.com/repos/srikalyan/spring-data-jpa/git/trees{/sha}", "statuses_url": "https://api.github.com/repos/srikalyan/spring-data-jpa/statuses/{sha}", "languages_url": "https://api.github.com/repos/srikalyan/spring-data-jpa/languages", "stargazers_url": "https://api.github.com/repos/srikalyan/spring-data-jpa/stargazers", "contributors_url": "https://api.github.com/repos/srikalyan/spring-data-jpa/contributors", "subscribers_url": "https://api.github.com/repos/srikalyan/spring-data-jpa/subscribers", "subscription_url": "https://api.github.com/repos/srikalyan/spring-data-jpa/subscription", "commits_url": "https://api.github.com/repos/srikalyan/spring-data-jpa/commits{/sha}", "git_commits_url": "https://api.github.com/repos/srikalyan/spring-data-jpa/git/commits{/sha}", "comments_url": "https://api.github.com/repos/srikalyan/spring-data-jpa/comments{/number}", "issue_comment_url": "https://api.github.com/repos/srikalyan/spring-data-jpa/issues/comments{/number}", "contents_url": "https://api.github.com/repos/srikalyan/spring-data-jpa/contents/{+path}", "compare_url": "https://api.github.com/repos/srikalyan/spring-data-jpa/compare/{base}...{head}", "merges_url": "https://api.github.com/repos/srikalyan/spring-data-jpa/merges", "archive_url": "https://api.github.com/repos/srikalyan/spring-data-jpa/{archive_format}{/ref}", "downloads_url": "https://api.github.com/repos/srikalyan/spring-data-jpa/downloads", "issues_url": "https://api.github.com/repos/srikalyan/spring-data-jpa/issues{/number}", "pulls_url": "https://api.github.com/repos/srikalyan/spring-data-jpa/pulls{/number}", "milestones_url": "https://api.github.com/repos/srikalyan/spring-data-jpa/milestones{/number}", "notifications_url": "https://api.github.com/repos/srikalyan/spring-data-jpa/notifications{?since,all,participating}", "labels_url": "https://api.github.com/repos/srikalyan/spring-data-jpa/labels{/name}", "releases_url": "https://api.github.com/repos/srikalyan/spring-data-jpa/releases{/id}", "created_at": "2015-06-15T21:21:37Z", "updated_at": "2015-06-15T21:21:38Z", "pushed_at": "2015-06-14T10:10:10Z", "git_url": "git://github.com/srikalyan/spring-data-jpa.git", "ssh_url": "git@github.com:srikalyan/spring-data-jpa.git", "clone_url": "https://github.com/srikalyan/spring-data-jpa.git", "svn_url": "https://github.com/srikalyan/spring-data-jpa", "homepage": "http://www.springsource.org/spring-data", "size": 3792, "stargazers_count": 0, "watchers_count": 0, "language": "Java", "has_issues": false, "has_downloads": true, "has_wiki": false, "has_pages": true, "forks_count": 0, "mirror_url": null, "open_issues_count": 0, "forks": 0, "open_issues": 0, "watchers": 0, "default_branch": "master", 
"parent": { "id": 1072845, "name": "spring-data-jpa", "full_name": "spring-projects/spring-data-jpa", "owner": { "login": "spring-projects", "id": 317776, "avatar_url": "https://avatars.githubusercontent.com/u/317776?v=3", "gravatar_id": "", "url": "https://api.github.com/users/spring-projects", "html_url": "https://github.com/spring-projects", "followers_url": "https://api.github.com/users/spring-projects/followers", "following_url": "https://api.github.com/users/spring-projects/following{/other_user}", "gists_url": "https://api.github.com/users/spring-projects/gists{/gist_id}", "starred_url": "https://api.github.com/users/spring-projects/starred{/owner}{/repo}", "subscriptions_url": "https://api.github.com/users/spring-projects/subscriptions", "organizations_url": "https://api.github.com/users/spring-projects/orgs", "repos_url": "https://api.github.com/users/spring-projects/repos", "events_url": "https://api.github.com/users/spring-projects/events{/privacy}", "received_events_url": "https://api.github.com/users/spring-projects/received_events", "type": "Organization", "site_admin": false }, "private": false, "html_url": "https://github.com/spring-projects/spring-data-jpa", "description": "Simplifies the development of creating a JPA-based data access layer. 
", "fork": false, "url": "https://api.github.com/repos/spring-projects/spring-data-jpa", "forks_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/forks", "keys_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/keys{/key_id}", "collaborators_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/collaborators{/collaborator}", "teams_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/teams", "hooks_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/hooks", "issue_events_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/issues/events{/number}", "events_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/events", "assignees_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/assignees{/user}", "branches_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/branches{/branch}", "tags_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/tags", "blobs_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/git/blobs{/sha}", "git_tags_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/git/tags{/sha}", "git_refs_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/git/refs{/sha}", "trees_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/git/trees{/sha}", "statuses_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/statuses/{sha}", "languages_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/languages", "stargazers_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/stargazers", "contributors_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/contributors", "subscribers_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/subscribers", "subscription_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/subscription", "commits_url": 
"https://api.github.com/repos/spring-projects/spring-data-jpa/commits{/sha}", "git_commits_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/git/commits{/sha}", "comments_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/comments{/number}", "issue_comment_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/issues/comments{/number}", "contents_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/contents/{+path}", "compare_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/compare/{base}...{head}", "merges_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/merges", "archive_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/{archive_format}{/ref}", "downloads_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/downloads", "issues_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/issues{/number}", "pulls_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/pulls{/number}", "milestones_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/milestones{/number}", "notifications_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/notifications{?since,all,participating}", "labels_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/labels{/name}", "releases_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/releases{/id}", "created_at": "2010-11-11T21:36:56Z", "updated_at": "2015-06-18T10:27:55Z", "pushed_at": "2015-06-17T16:24:18Z", "git_url": "git://github.com/spring-projects/spring-data-jpa.git", "ssh_url": "[email protected]:spring-projects/spring-data-jpa.git", "clone_url": "https://github.com/spring-projects/spring-data-jpa.git", "svn_url": "https://github.com/spring-projects/spring-data-jpa", "homepage": "http://www.springsource.org/spring-data", "size": 11715, "stargazers_count": 455, "watchers_count": 455, "language": "Java", "has_issues": 
false, "has_downloads": true, "has_wiki": false, "has_pages": true, "forks_count": 284, "mirror_url": null, "open_issues_count": 18, "forks": 284, "open_issues": 18, "watchers": 455, "default_branch": "master" }, "source": { "id": 1072845, "name": "spring-data-jpa", "full_name": "spring-projects/spring-data-jpa", "owner": { "login": "spring-projects", "id": 317776, "avatar_url": "https://avatars.githubusercontent.com/u/317776?v=3", "gravatar_id": "", "url": "https://api.github.com/users/spring-projects", "html_url": "https://github.com/spring-projects", "followers_url": "https://api.github.com/users/spring-projects/followers", "following_url": "https://api.github.com/users/spring-projects/following{/other_user}", "gists_url": "https://api.github.com/users/spring-projects/gists{/gist_id}", "starred_url": "https://api.github.com/users/spring-projects/starred{/owner}{/repo}", "subscriptions_url": "https://api.github.com/users/spring-projects/subscriptions", "organizations_url": "https://api.github.com/users/spring-projects/orgs", "repos_url": "https://api.github.com/users/spring-projects/repos", "events_url": "https://api.github.com/users/spring-projects/events{/privacy}", "received_events_url": "https://api.github.com/users/spring-projects/received_events", "type": "Organization", "site_admin": false }, "private": false, "html_url": "https://github.com/spring-projects/spring-data-jpa", "description": "Simplifies the development of creating a JPA-based data access layer. 
", "fork": false, "url": "https://api.github.com/repos/spring-projects/spring-data-jpa", "forks_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/forks", "keys_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/keys{/key_id}", "collaborators_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/collaborators{/collaborator}", "teams_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/teams", "hooks_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/hooks", "issue_events_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/issues/events{/number}", "events_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/events", "assignees_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/assignees{/user}", "branches_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/branches{/branch}", "tags_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/tags", "blobs_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/git/blobs{/sha}", "git_tags_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/git/tags{/sha}", "git_refs_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/git/refs{/sha}", "trees_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/git/trees{/sha}", "statuses_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/statuses/{sha}", "languages_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/languages", "stargazers_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/stargazers", "contributors_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/contributors", "subscribers_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/subscribers", "subscription_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/subscription", "commits_url": 
"https://api.github.com/repos/spring-projects/spring-data-jpa/commits{/sha}", "git_commits_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/git/commits{/sha}", "comments_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/comments{/number}", "issue_comment_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/issues/comments{/number}", "contents_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/contents/{+path}", "compare_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/compare/{base}...{head}", "merges_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/merges", "archive_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/{archive_format}{/ref}", "downloads_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/downloads", "issues_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/issues{/number}", "pulls_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/pulls{/number}", "milestones_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/milestones{/number}", "notifications_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/notifications{?since,all,participating}", "labels_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/labels{/name}", "releases_url": "https://api.github.com/repos/spring-projects/spring-data-jpa/releases{/id}", "created_at": "2010-11-11T21:36:56Z", "updated_at": "2015-06-18T10:27:55Z", "pushed_at": "2015-06-17T16:24:18Z", "git_url": "git://github.com/spring-projects/spring-data-jpa.git", "ssh_url": "[email protected]:spring-projects/spring-data-jpa.git", "clone_url": "https://github.com/spring-projects/spring-data-jpa.git", "svn_url": "https://github.com/spring-projects/spring-data-jpa", "homepage": "http://www.springsource.org/spring-data", "size": 11715, "stargazers_count": 455, "watchers_count": 455, "language": "Java", "has_issues": 
false, "has_downloads": true, "has_wiki": false, "has_pages": true, "forks_count": 284, "mirror_url": null, "open_issues_count": 18, "forks": 284, "open_issues": 18, "watchers": 455, "default_branch": "master" }, "network_count": 284, "subscribers_count": 1 }

All we need is the ssh_url from the parent object in the JSON returned by the GitHub API (if you need a non-SSH URL, try git_url or clone_url instead)

jq '.parent.ssh_url'

The rest just removes the double quotes and generates a shell script
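As a rough sketch of that last step: jq's -r (raw output) flag prints the string without the surrounding quotes, so no separate quote-stripping step is needed. The function name upstream_cmd and the sample JSON below are made up for illustration; the real input would be the API response for one of your forked repos.

```shell
#!/bin/sh
# Hypothetical helper: reads one repo's JSON (as returned by the GitHub API)
# on stdin and prints the git command that would add its parent as "upstream".
upstream_cmd() {
  # -r prints raw strings (no double quotes); "// empty" prints nothing
  # when the repo is not a fork and so has no .parent object.
  url=$(jq -r '.parent.ssh_url // empty')
  if [ -n "$url" ]; then
    echo "git remote add upstream $url"
  fi
}

# Example with a trimmed-down, made-up response:
echo '{"name":"spring-data-jpa","parent":{"ssh_url":"git@github.com:spring-projects/spring-data-jpa.git"}}' | upstream_cmd
```

Redirecting the printed lines into a file gives you the shell script, which you can read through before running it inside the matching repo directory.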

I hope you guys like it :)

#github #upstream #url #curl #jq #automatic

How to check out all your git repositories/projects from github.com using just a shell script

I recently bought a new computer (the old one died) and had to clone all my GitHub projects so I could read the code in my free time. Doing that by hand means cloning every repository one at a time by specifying its URL, which is not going to happen (at least not by me)

I found out about GitHub's developer API, and with the help of curl and jq I could clone all my projects quite easily. The command I used is

curl "https://api.github.com/users/srikalyan/repos?type=all&sort=created&direction=desc" | jq ".[]|.name" | tr -d '"' | xargs -I repo git clone git@github.com:srikalyan/repo.git

The explanation of the above command goes like this

Get all the repos from GitHub using the developer API

curl "https://api.github.com/users/srikalyan/repos?type=all&sort=created&direction=desc"

The output is a json array like this

[ ... { "id": 17812430, "name": "ccm", "full_name": "srikalyan/ccm", "owner": { "login": "srikalyan", "id": 456768, "avatar_url": "https://avatars.githubusercontent.com/u/456768?v=3", "gravatar_id": "", "url": "https://api.github.com/users/srikalyan", "html_url": "https://github.com/srikalyan", "followers_url": "https://api.github.com/users/srikalyan/followers", "following_url": "https://api.github.com/users/srikalyan/following{/other_user}", "gists_url": "https://api.github.com/users/srikalyan/gists{/gist_id}", "starred_url": "https://api.github.com/users/srikalyan/starred{/owner}{/repo}", "subscriptions_url": "https://api.github.com/users/srikalyan/subscriptions", "organizations_url": "https://api.github.com/users/srikalyan/orgs", "repos_url": "https://api.github.com/users/srikalyan/repos", "events_url": "https://api.github.com/users/srikalyan/events{/privacy}", "received_events_url": "https://api.github.com/users/srikalyan/received_events", "type": "User", "site_admin": false }, "private": false, "html_url": "https://github.com/srikalyan/ccm", "description": "A script to easily create and destroy an Apache Cassandra cluster on localhost", "fork": true, "url": "https://api.github.com/repos/srikalyan/ccm", "forks_url": "https://api.github.com/repos/srikalyan/ccm/forks", "keys_url": "https://api.github.com/repos/srikalyan/ccm/keys{/key_id}", "collaborators_url": "https://api.github.com/repos/srikalyan/ccm/collaborators{/collaborator}", "teams_url": "https://api.github.com/repos/srikalyan/ccm/teams", "hooks_url": "https://api.github.com/repos/srikalyan/ccm/hooks", "issue_events_url": "https://api.github.com/repos/srikalyan/ccm/issues/events{/number}", "events_url": "https://api.github.com/repos/srikalyan/ccm/events", "assignees_url": "https://api.github.com/repos/srikalyan/ccm/assignees{/user}", "branches_url": "https://api.github.com/repos/srikalyan/ccm/branches{/branch}", "tags_url": "https://api.github.com/repos/srikalyan/ccm/tags", "blobs_url": 
"https://api.github.com/repos/srikalyan/ccm/git/blobs{/sha}", "git_tags_url": "https://api.github.com/repos/srikalyan/ccm/git/tags{/sha}", "git_refs_url": "https://api.github.com/repos/srikalyan/ccm/git/refs{/sha}", "trees_url": "https://api.github.com/repos/srikalyan/ccm/git/trees{/sha}", "statuses_url": "https://api.github.com/repos/srikalyan/ccm/statuses/{sha}", "languages_url": "https://api.github.com/repos/srikalyan/ccm/languages", "stargazers_url": "https://api.github.com/repos/srikalyan/ccm/stargazers", "contributors_url": "https://api.github.com/repos/srikalyan/ccm/contributors", "subscribers_url": "https://api.github.com/repos/srikalyan/ccm/subscribers", "subscription_url": "https://api.github.com/repos/srikalyan/ccm/subscription", "commits_url": "https://api.github.com/repos/srikalyan/ccm/commits{/sha}", "git_commits_url": "https://api.github.com/repos/srikalyan/ccm/git/commits{/sha}", "comments_url": "https://api.github.com/repos/srikalyan/ccm/comments{/number}", "issue_comment_url": "https://api.github.com/repos/srikalyan/ccm/issues/comments{/number}", "contents_url": "https://api.github.com/repos/srikalyan/ccm/contents/{+path}", "compare_url": "https://api.github.com/repos/srikalyan/ccm/compare/{base}...{head}", "merges_url": "https://api.github.com/repos/srikalyan/ccm/merges", "archive_url": "https://api.github.com/repos/srikalyan/ccm/{archive_format}{/ref}", "downloads_url": "https://api.github.com/repos/srikalyan/ccm/downloads", "issues_url": "https://api.github.com/repos/srikalyan/ccm/issues{/number}", "pulls_url": "https://api.github.com/repos/srikalyan/ccm/pulls{/number}", "milestones_url": "https://api.github.com/repos/srikalyan/ccm/milestones{/number}", "notifications_url": "https://api.github.com/repos/srikalyan/ccm/notifications{?since,all,participating}", "labels_url": "https://api.github.com/repos/srikalyan/ccm/labels{/name}", "releases_url": "https://api.github.com/repos/srikalyan/ccm/releases{/id}", "created_at": "2014-03-16T23:59:21Z", 
"updated_at": "2014-06-22T21:10:46Z", "pushed_at": "2014-06-22T21:10:46Z", "git_url": "git://github.com/srikalyan/ccm.git", "ssh_url": "[email protected]:srikalyan/ccm.git", "clone_url": "https://github.com/srikalyan/ccm.git", "svn_url": "https://github.com/srikalyan/ccm", "homepage": "", "size": 518, "stargazers_count": 0, "watchers_count": 0, "language": "Python", "has_issues": false, "has_downloads": true, "has_wiki": true, "has_pages": false, "forks_count": 0, "mirror_url": null, "open_issues_count": 0, "forks": 0, "open_issues": 0, "watchers": 0, "default_branch": "master" } ]

Use jq to parse the JSON and extract the name of each repo

jq ".[]|.name"

The rest is self-explanatory

tr -d '"' | xargs -I repo git clone git@github.com:srikalyan/repo.git

Basically, it removes the double quotes and clones each project
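The same pipeline can be written a little more defensively. This is only a sketch under a few assumptions: repos_url and clone_cmds are made-up function names, and the clone commands are printed instead of run so you can check the list first. It also asks for 100 repos per page, because the API returns only 30 per page by default and the one-liner above would silently skip the rest. jq's -r flag stands in for tr -d '"', and ssh_url is read straight from the response so the user name does not have to be hard-coded.

```shell
#!/bin/sh
# Hypothetical helper: prints the repo-list URL for a user and page number.
# per_page=100 asks for the largest page the API allows (the default is 30).
repos_url() {
  user=$1
  page=${2:-1}
  echo "https://api.github.com/users/$user/repos?type=all&sort=created&direction=desc&per_page=100&page=$page"
}

# Hypothetical helper: reads the JSON array of repos on stdin and prints
# one "git clone" command per repo instead of running it.
clone_cmds() {
  # -r makes jq print raw strings, so there are no quotes to strip
  jq -r '.[].ssh_url' | while read -r url; do
    echo "git clone $url"
  done
}

# Made-up response, trimmed to the one field used here:
echo '[{"ssh_url":"git@github.com:srikalyan/ccm.git"}]' | clone_cmds
```

Against the real API this would look something like curl -s "$(repos_url srikalyan 1)" | clone_cmds, and piping that into sh runs the clones once the list looks right. If you have more than 100 repos, repeat with page 2, 3 and so on.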

That's it, all your repos are cloned. I hope you enjoyed it :)

In my next blog post, I will put up something to add the upstream URL for each of these repos/projects

#github #clone #repository #curl #jq #shell script
