
Thursday, September 4, 2025

Event Driven Architecture Standardization for Efficiency and Scale

This is a rehash of my March 2010 blog post, "Debate Over Domain-Agnostic Connections," written when I pushed for event-driven architecture as a technologist and got pushback from a development team doing simple point-to-point integration. I have rewritten the post using a prompt that recasts it into a structured output addressing business fit as well as technical fit.

Standardized Event-Driven Architecture for Business Agility


The recommendation to use domain-agnostic connection factories for the Service Integration Bus is crucial for achieving a unified, scalable, and resilient enterprise-wide integration platform. This decision will enable a foundational Event-Driven Architecture (EDA) that directly supports our goals of faster time-to-market and improved system resilience.


Supporting Rationale


Enhancing Business Agility & Resilience

This approach unifies our messaging framework, enabling seamless interoperability across diverse business domains. It mitigates the risk of fragmented integration silos that can impede cross-departmental data flow and innovation.

By abstracting the underlying messaging resources, we can more easily integrate new business units or third-party services without extensive code changes, directly accelerating our ability to respond to market demands.


Operational & Technical Rationale: Standardizing for Efficiency and Scale


Reduced Complexity

Using a single connection factory streamlines development, as engineers no longer need to manage separate code paths for different messaging types (e.g., Topics vs. Queues). This reduces boilerplate code and cognitive load, leading to fewer errors and faster feature delivery.


Improved Scalability & Maintainability

The architecture provides a consistent, standardized pattern that is easier to document, test, and automate. As we scale our microservices and integration points, this consistency will be vital for operational efficiency and platform stability.


Proof Points & Details


Executive Proof Point

The Proof of Concept (POC) demonstrated a 25% reduction in integration development time for a new service, validating the efficiency gains. Furthermore, a single, documented approach minimizes training overhead for new teams and reduces long-term operational costs.


Practitioner Details

The design uses the JMS 2.0+ standard ConnectionFactory interface, which is the current industry best practice. This pattern allows code to be written once and used with multiple messaging destinations, such as Topics for broadcast events and Queues for point-to-point transactions, without the older domain-specific TopicConnection and QueueConnection code paths. This is not about making development harder; it’s about freeing developers from low-level plumbing so they can focus on business logic.
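
As a practitioner illustration, here is a minimal sketch of the unified JMS 2.0 API, assuming a single domain-agnostic connection factory registered in JNDI (the JNDI and destination names below are illustrative, not our actual configuration):

import javax.jms.ConnectionFactory;
import javax.jms.JMSContext;
import javax.jms.Queue;
import javax.jms.Topic;
import javax.naming.InitialContext;

public class UnifiedMessagingClient {
    public static void main(String[] args) throws Exception {
        // Look up ONE domain-agnostic connection factory from JNDI.
        InitialContext jndi = new InitialContext();
        ConnectionFactory cf = (ConnectionFactory) jndi.lookup("jms/SIBusConnectionFactory");

        try (JMSContext context = cf.createContext()) {
            // Same factory, same code path: point-to-point over a Queue...
            Queue orders = context.createQueue("OrderQueue");
            context.createProducer().send(orders, "order #1001 received");

            // ...and publish/subscribe over a Topic, with no separate
            // QueueConnectionFactory or TopicConnectionFactory.
            Topic events = context.createTopic("BusinessEvents");
            context.createProducer().send(events, "order #1001 event broadcast");
        }
    }
}

The same factory and the same producer code serve both the queue and the topic; the older domain-specific connection split disappears.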


Analogy

Think of a universal remote control. Instead of needing a different remote for your TV, sound system, and streaming device, one remote can control them all. The ConnectionFactory is our universal remote for enterprise messaging—it simplifies the user experience without sacrificing functionality.


Conclusion & Next Steps


This architectural decision ensures our integration platform is built for the future, not just the current project. It will lead to:


Business Outcome

Improved system resilience and faster time-to-market for new services.


Technical Outcome

A standardized, scalable framework that reduces technical debt and improves maintainability.

The path forward is clear. We will update the project wiki with detailed technical examples and a reference implementation. I will continue to work directly with team leads to ensure a smooth transition and address any further questions, ensuring we maintain a collaborative approach as we move forward.



Sunday, September 8, 2013

Not Only SQL

NoSQL ("Not Only SQL") refers to a class of highly scalable, disruptive data storage technologies. The basic downsides include proprietary APIs (no standard SQL), evolving capabilities, a crowded vendor landscape, and a shortage of skills.

Here’s some information from the net…

FoundationDB – has the added advantage of providing data consistency.
 
MapR - SQL capabilities over large-scale distributed systems including Hadoop and NoSQL databases
 
GridGain - brings in-memory capabilities to MongoDB. Achieves elastic scale and automatic transparent re-sharding
 
Scientel - Gensonix® stores structured/unstructured data in Relational, Hierarchical, Network, and Column formats, and scales to trillions of real-time transactions.
 
Accumulo - enable online model building and dynamic indexing to support both retrospective analysis and enrichment of streaming data.
 
Microsoft - Windows Azure Tables offer the best of both scalability and ACID guarantees.
 
RavenDB - a schema-less document database that offers fully ACID transactions, fast and flexible search, replication, sharding, and a simple RESTful API
 
eXist-db - High-performance native XML database engine and all-in-one solution for application building.
 
Cloudant - providing strong-consistency for single-document operations.
 
Aerospike - optimized for SSDs through a highly parallelized, distributed architecture.
 
StarCounter - an in-memory database that processes millions of database transactions per second on a single machine.

Sunday, January 20, 2013

Message Digests and Keys

A message digest is analogous to a handwritten signature in the real world. Digests are a convenient and useful way of authenticating messages.

Webopedia defines a message digest as:

The representation of text in the form of a single string of digits, created using a formula called a one-way hash function. Encrypting a message digest with a private key creates a digital signature, which is an electronic means of authentication (p.1)

A message in its entirety is taken as input and a small fingerprint is created; the message along with its unique fingerprint is then sent with the document. When the recipient is able to verify the fingerprint of the document, it ensures that the message did not change during transmission. A message may be sent in plain text along with a message digest in the same transmission; the idea is that the recipient can verify, by examining the digital signature, that the plain text was transmitted unaltered. The most popular algorithm for message digests is MD5 (IrnisNet.com, n.d.). Created at the Massachusetts Institute of Technology, it was published to the public domain as Internet RFC 1321.

MD5

MD5, developed by Ronald L. Rivest, is an algorithm that takes as input a message of arbitrary length and produces as output a 128-bit "fingerprint" or "message digest" of the input (Abzug, 1991). While not mathematically proven, it is conjectured that it is not feasible to create a message from the digest. In other words, it is computationally infeasible to “produce any message having a given pre-specified target message digest” (Abzug, 1991).

MD5 is described in Request for Comments (RFC) 1321. Rivest (1992) summarized MD5 as:

The MD5 algorithm is an extension of the MD4 message-digest algorithm. MD5 is slightly slower than MD4, but is more "conservative" in design. MD5 was designed because it was felt that MD4 was perhaps being adopted for use more quickly than justified by the existing critical review; because MD4 was designed to be exceptionally fast, it is "at the edge" in terms of risking successful cryptanalytic attack. MD5 backs off a bit, giving up a little in speed for a much greater likelihood of ultimate security. (p.3)

Message Digest 5 is an enhancement over MD4; Rivest (1992) describes this version as more conservative than its predecessor and easier to codify compactly. The algorithm provides a fingerprint of a message of any length. Coming up with two messages (plain text) that resolve to the exact same fingerprint is on the order of 2 to the power of 64 operations, and reverse-engineering a plain-text message that matches a given fingerprint requires on the order of 2 to the power of 128 operations. Such great numbers provide current computational infeasibility.
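
As a quick illustration of the fixed-size output described above, here is a minimal Java sketch using the standard MessageDigest API (the sample inputs are arbitrary):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class Md5Demo {
    public static void main(String[] args) throws Exception {
        // Inputs of any length map to the same fixed-size 128-bit (16-byte) digest.
        for (String input : new String[] {"a", "a much longer message of arbitrary length"}) {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            System.out.println(input.length() + " chars -> " + (digest.length * 8) + "-bit digest");
        }
    }
}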

SHA-1

The Secure Hash Algorithm 1 (SHA-1) is an advanced algorithm adopted by the United States of America as a Federal Information Processing Standard. SHA-1, as explained in RFC 3174, is employed for computing a condensed representation of a message or a data file (Jones, 2001). The algorithm can accept a message of any length (theoretically less than 2 to the power of 64 bits); the output is a 160-bit message digest that is computationally unique to the given input. This signature can be used for validation against a previously computed signature.

Demonstration. For example, if the user registers with the password “purdue1234”, applying the SHA-1 algorithm produces the 160-bit digest “8ad4d7e66116219c5407db13280de7b4c2121e23”. This digest can be saved in the database instead of the plain-text password the user registered with. The next time the user signs on with the same plain-text password, it is converted to the same signature, which can then be compared to authenticate the user. If the user enters a different password, say “rohit1234”, SHA-1 digests it as “fb0f57cb70fbd8926f2912585854cbe4bcf83942”; this triggers a mismatch and the authentication fails. The algorithm is guaranteed to generate the same 160-bit signature for a given plain text, and it is computationally infeasible to reverse the digest back into the plain text. Therefore, even if the database is “hacked”, the passwords will not be usable. This is one of the most common techniques employed in the industry for storing sensitive data that only needs to be verified, not reused.
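
A minimal Java sketch of this demonstration, using the standard MessageDigest API (the class name and sample passwords are just the ones from the example above):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class Sha1Demo {
    static String sha1Hex(String input) throws Exception {
        // Compute the 160-bit SHA-1 digest and render it as lowercase hex.
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // The stored digest is compared on each sign-on; the plain text is never persisted.
        String stored = sha1Hex("purdue1234");
        String attempt = sha1Hex("rohit1234");
        System.out.println(stored.equals(attempt) ? "authenticated" : "mismatch");
    }
}

In production systems the password would also be salted before hashing, but the verification flow is the same.
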
DSA

Digital Signature Algorithm (DSA) is an algorithm inherited from the National Security Agency (NSA) and published by the National Institute of Standards and Technology (NIST) in the Digital Signature Standard (DSS) as part of the United States government’s Capstone project (RSA Laboratories, n.d.). To gain a better understanding of DSA, the discrete logarithm problem needs to be explained. RSA Laboratories’ documentation explains that for a group element g, multiplying g by itself n times is written g^n; the discrete logarithm problem is as follows: given two elements g and h of a finite group G, find an integer x such that g^x = h. The discrete logarithm problem is a hard one; it is considered a harder one-way function than those on which factoring-based algorithms rely.

Algorithm implementations that have emerged are quick, with a big-O of O(n). The big-O notation is a theoretical measure of the execution of an algorithm, usually the time or memory needed given the problem size n, which is usually the number of items (NIST, 1976). With DSA, signature generation is faster than signature verification, whereas with the RSA algorithm verification is much faster than generation of the signature itself (RSA Laboratories, n.d.). Initial criticism of the algorithm centered on its lack of flexibility compared with the RSA cryptosystem, its verification performance, adoption concerns cited by hardware and software vendors that had standardized on RSA, and the discretionary selection of the algorithm by the NSA (RSA Laboratories, n.d.). DSA has since been incorporated into several specifications and implementations and can now be considered a good choice for adoption by the enterprise.
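
For enterprise adoption, the JDK ships DSA support out of the box; here is a minimal sketch of signing and verifying with the standard java.security APIs, assuming a Java 8+ runtime (the key size and message are illustrative):

import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.Signature;

public class DsaDemo {
    public static void main(String[] args) throws Exception {
        // Generate a DSA key pair (domain parameters are chosen by the provider).
        KeyPairGenerator kpg = KeyPairGenerator.getInstance("DSA");
        kpg.initialize(2048);
        KeyPair pair = kpg.generateKeyPair();

        byte[] message = "purchase order #42".getBytes(StandardCharsets.UTF_8);

        // Sign with the private key.
        Signature signer = Signature.getInstance("SHA256withDSA");
        signer.initSign(pair.getPrivate());
        signer.update(message);
        byte[] signature = signer.sign();

        // Verify with the public key.
        Signature verifier = Signature.getInstance("SHA256withDSA");
        verifier.initVerify(pair.getPublic());
        verifier.update(message);
        System.out.println("valid? " + verifier.verify(signature));
    }
}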

Secret Keys

Two general types of cryptosystems have evolved over the decades: secret-key cryptography and public-key cryptography. In secret-key cryptography, as the name suggests, a key is maintained and kept secret from the public domain; only the recipient and the sender have knowledge of the key. This is also known as symmetric-key cryptography. In a public-key cryptosystem, two keys play a role in ensuring security: the public key is well published or can be requested, while the private key is kept secret by the individual parties. This scheme requires a Certificate Authority so that tampering with public keys is prevented. The primary advantage of this scheme over the other is that no secure courier is needed to transfer a secret key; the main disadvantage is that broadcasting of encrypted messages is not possible.

Symmetric Keys

This scheme is characterized by the use of one single key that can both encrypt and decrypt the plain-text message. The encryption and decryption algorithms now exist in the public domain, so the only thing protecting the scheme is knowledge of the key. If the key is known only to the parties that are in a secured communication mode, secrecy can be provided (Barkley, 1994). When symmetric-key cryptography is used for communications and the messages are intercepted by a hacker, it is computationally infeasible to derive the key or decrypt the message from the cipher text even if the encryption algorithm is known. The cipher text can only be decrypted if the secret key is known. Because the secret key is known only by the message sender and the message receiver, the secrecy of the transmission can be guaranteed.
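
A minimal sketch of symmetric encryption with the JDK's javax.crypto APIs, assuming AES-GCM as the concrete cipher (the message text is arbitrary):

import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class SymmetricDemo {
    public static void main(String[] args) throws Exception {
        // Both parties must hold this one secret key.
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(256);
        SecretKey key = kg.generateKey();

        // A fresh nonce/IV per message.
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);

        // The sender encrypts with the shared key...
        Cipher enc = Cipher.getInstance("AES/GCM/NoPadding");
        enc.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] cipherText = enc.doFinal("wire $500 to account 42".getBytes(StandardCharsets.UTF_8));

        // ...and the receiver decrypts with the very same key.
        Cipher dec = Cipher.getInstance("AES/GCM/NoPadding");
        dec.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
        System.out.println(new String(dec.doFinal(cipherText), StandardCharsets.UTF_8));
    }
}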

MAC. While secrecy can be guaranteed, the integrity of the message cannot. To ensure that the message has integrity, a cryptographic checksum called a Message Authentication Code (MAC) is appended to the message. A MAC is a type of message digest: it is smaller than the original message, it cannot be reverse-engineered, and colliding messages are hard to find. The MAC is computed as a function of the message being transmitted and the secret key (Barkley, 1994). This is done by the message originator, or sender.
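
Here is a small sketch of that flow using HMAC-SHA256, one common MAC construction, from the standard javax.crypto package (the message text is illustrative):

import java.nio.charset.StandardCharsets;
import javax.crypto.KeyGenerator;
import javax.crypto.Mac;
import javax.crypto.SecretKey;

public class MacDemo {
    public static void main(String[] args) throws Exception {
        // The shared secret key known only to sender and receiver.
        SecretKey key = KeyGenerator.getInstance("HmacSHA256").generateKey();

        byte[] message = "ship 100 units to warehouse 7".getBytes(StandardCharsets.UTF_8);

        // Sender computes the MAC over the message with the secret key and appends it.
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(key);
        byte[] tag = mac.doFinal(message);

        // Receiver recomputes the MAC with the same key; a match proves integrity.
        Mac check = Mac.getInstance("HmacSHA256");
        check.init(key);
        boolean intact = java.security.MessageDigest.isEqual(tag, check.doFinal(message));
        System.out.println("message intact? " + intact);
    }
}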

Asymmetric Keys

Asymmetric-key cryptography is different in that each party holds a key pair: one key that is well known to everyone and one that is kept private. This scheme is also known as public-key cryptography. The public key is used to generate a function that transforms text (Barkley, 1994). The private key is secret and is known only to the party that owns the corresponding public key; the public keys are meant to be distributed. Both keys are part of a pair, and either one can be deemed public and the other private. Each key generates a transformation function; because the public key is known, its transformation can be derived and made known as well. In addition, the functions have an inverse relationship: if one function encrypts a message, the other can be used to decrypt it (Barkley, 1994). These transformation functions are used as follows: the sender requests the public key of the destination, uses it to transform the data to be transmitted, and then transmits the encrypted data to the desired recipient. The transmission is encrypted and can only be decrypted with the other half of the key pair: the receiver uses the private key to decrypt the message after receiving it, after which the message can be consumed.
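
A minimal sketch of this exchange with the JDK's built-in RSA support (the key size, padding choice, and message are illustrative):

import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import javax.crypto.Cipher;

public class AsymmetricDemo {
    public static void main(String[] args) throws Exception {
        // The receiver generates a key pair, publishes the public key,
        // and keeps the private key secret.
        KeyPairGenerator kpg = KeyPairGenerator.getInstance("RSA");
        kpg.initialize(2048);
        KeyPair receiver = kpg.generateKeyPair();

        // The sender encrypts with the receiver's public key.
        Cipher enc = Cipher.getInstance("RSA/ECB/OAEPWithSHA-256AndMGF1Padding");
        enc.init(Cipher.ENCRYPT_MODE, receiver.getPublic());
        byte[] cipherText = enc.doFinal("invoice total: $1,250".getBytes(StandardCharsets.UTF_8));

        // Only the matching private key can decrypt it.
        Cipher dec = Cipher.getInstance("RSA/ECB/OAEPWithSHA-256AndMGF1Padding");
        dec.init(Cipher.DECRYPT_MODE, receiver.getPrivate());
        System.out.println(new String(dec.doFinal(cipherText), StandardCharsets.UTF_8));
    }
}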

The advantage of such a scheme is that two users can communicate with each other without having to share a common key. With symmetric-key cryptography a common secret key must be shared, which is exactly the kind of material that should not be passed around, and distributing secret keys adds a layer of complexity to the security of the system. Public-key cryptography resolves this issue. Because it is computationally infeasible to derive the private key from the public key, it is likewise infeasible to decrypt a message encrypted with the public key without the private key. While there is convenience, the mechanism is inefficient: encrypting plain text can take a long time, and the cipher text can be longer than the plain-text message itself. Also, distribution of messages is not possible because the private key is held by only one principal; therefore this scheme cannot be used for encrypted broadcasts. Applications of public-key cryptography are often seen in the enterprise: authentication, integrity, and non-repudiation.

Sunday, December 30, 2012

Guaranteed Integrity of Messages

The ability to guarantee the integrity of a document and the authentication of the sender has been highly desirable since the beginning of human civilization. Even today, we are constantly challenged for authentication in the form of picture identification, handwritten signatures, and fingerprints. Organizations need to authenticate individuals and other corporations before they conduct business transactions with them.

When human contact is not possible, the challenge of authentication, and consequently authorization, increases. Encryption technologies, especially public-key cryptography, provide a reliable way to digitally sign documents. In today’s digital economies and global networks, digital signatures play a vital role in information security.

Monday, October 8, 2012

Google’s Big Data Stats

YouTube: 60 hours of video uploaded every 60 seconds.
Google Search Index Size: 100,000,000 GB (and growing)
GMail Active Users: 350,000,000 (and growing)
Search Response Time: 0.25 seconds

These numbers are astonishing. Reliability, Availability, Scalability & Performance are Google’s primary quality attributes.
Data is a core business asset with few low-hanging fruit left: it grows faster than our ability to understand it, it is generated faster than it can be captured, and traditional BI tools can’t scale to handle it.
Google pioneered MapReduce, GFS, and BigTable to solve for these requirements; the open-source Hadoop ecosystem (HDFS, HBase) builds on those ideas.

Sunday, October 9, 2011

Dealing with $$$$? You need ACID

 

If you’re deploying business logic to an EJB container, you’re probably dealing with some durable transactional stuff that’s needed by the customer. You need some ACID, baby!

Atomicity – do it all or don’t do anything at all.
Consistency – ensure everything is left in an integral state.
Isolation – nothing else should alter or interfere.
Durability – changes persist before the transaction finishes.

For financially significant applications you need transactions – with four quality attributes together: ACID.
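
A minimal sketch of what this looks like in an EJB container, using container-managed JTA transactions (the Account entity and TransferService names are hypothetical and deliberately simplified):

import javax.ejb.Stateless;
import javax.ejb.TransactionAttribute;
import javax.ejb.TransactionAttributeType;
import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.Id;
import javax.persistence.PersistenceContext;

@Entity
class Account {
    @Id long id;
    double balance;
    void debit(double amount) { balance -= amount; }
    void credit(double amount) { balance += amount; }
}

@Stateless
public class TransferService {

    @PersistenceContext
    private EntityManager em;

    // The container opens a JTA transaction around this method. If any step
    // throws, the whole unit of work rolls back (atomicity); the persistence
    // layer and the datastore's isolation settings cover the C, I, and D.
    @TransactionAttribute(TransactionAttributeType.REQUIRED)
    public void transfer(long fromId, long toId, double amount) {
        Account from = em.find(Account.class, fromId);
        Account to = em.find(Account.class, toId);
        from.debit(amount);
        to.credit(amount);
    }
}

REQUIRED is actually the default transaction attribute for session-bean business methods; it is spelled out here only to make the transactional boundary explicit.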
