Sunday, May 19, 2013

Understanding Architectures with Pictures

 

“Pictures speak 1024 words.” – a quote I have used a lot for the past 10 years or so. Why? Because architectures need to be visualized. Bredemeyer Consulting’s Visual Architecting Process, SEI’s SAPP, RUP’s UML, Zachman, TOGAF, etc. all dwell on visualizing abstractions. But how do you translate that to real projects, especially those that claim to be agile and misunderstand that to mean no design and no plans? As one of the many signatories of the agile manifesto, it is clear to me that projects that do not know what their architecture is cannot deliver software in a timely manner with good quality attributes.

Software architecture should be represented by a set of views that support its analysis. The following views are used most often:

Advice: at a minimum, use the three views recommended by the SEI:

1. Module View

2. Component View

3. Deployment View

A sequence diagram can be added as the fourth.

Multiple views of a software architecture allow the entire team and its stakeholders to understand it without confusion.

Sunday, May 12, 2013

Architect with security in mind as a first thought

So if you’re doing a solution architecture review, make sure you first look at the security design of the system, including authentication, digital signatures, secret-key cryptography, public-key cryptography, authorization, and non-repudiation from the perspective of a digital firm. Authentication and authorization are the foundation stones of security and need to be understood and deployed across the enterprise.

The use of digital signatures has seen tremendous growth in recent years and, with the onset of new technologies, in particular Web services, promises to be a dominant area in security. Corporate espionage is on the rise, and security cannot be overlooked.
Ensure your system is checked for vulnerabilities; cross-site scripting seems to be the worst offender in modern systems. Make sure your internet-facing applications are hosted on supported and patched platforms. Approach it with an outside-in, basics-first strategy for your IT department: instead of focusing first on obscure details like encryption bit lengths, prioritize defenses against the most probable threat vectors.

Sunday, May 5, 2013

Software Architectures need to be evaluated.

What constitutes an architecture?

“You employ stone, wood and concrete, and with these materials you build houses and palaces. That is construction. Ingenuity is at work.

But suddenly you touch my heart, you do me good, I am happy and I say ‘This is beautiful’. That is Architecture.”

- Le Corbusier, 1923

- Quoted in Architecture: From Prehistory to Post-modernism

Well, then what is software architecture?

There is no universally agreed-upon formal definition of software architecture; however, the Software Engineering Institute (SEI) has defined it as follows:

“The software architecture of a system is the structure or structures of the system, which comprise software components, the externally visible properties of those components, and the relationships among them.”

- It is a vehicle for communication among stakeholders.

- It is the manifestation of the earliest design decisions.

- It is a reusable, transferable abstraction.

Software elements are modules, components, and the like. Externally visible properties still provide for internal flexibility; for example, a contract is externally visible while its implementation is not.

All designs involve tradeoffs. Architecture is the earliest life-cycle artifact that embodies significant design decisions: choices and tradeoffs.

We can predict a system’s quality attributes by studying its architecture. Analyzing an architecture for the achievement of quality attributes determines risk, not a “grade”.

Bottom line: an evaluation should result in architectural “risk themes”. See the SEI’s web site for details.

Evaluation of Software Architecture is essential to determine Risks

I love change for positive growth and innovation because it makes me excited and feel like I am making a difference to the people using the product that was once in my head and is now in their hands.

Sometimes I encounter software architectures that just “evolved” out of need. At times teams “end up” with architectures that simply happened to them; other times projects are proposed and designs sketched to deliver the software. Evaluation of the software architecture is essential in all cases.

Consider a building: to me it may look really ugly, to the contractor it may be the most lucrative structure, and to the people living inside it may not matter at all. Risks, non-risks, tradeoffs, and sensitivity points are great ways to highlight risk themes so that a design decision can be made once they are understood.

The point is: no architecture is good or bad; there are simply risk themes which, when elaborated, give a person the information to judge it based on their own needs.

Monday, April 29, 2013

ATAM

The ATAM process is a short, facilitated interaction between the stakeholders to conduct the activities outlined in the method, leading to the identification of risks, sensitivities, and tradeoffs:

• risks can be the focus of mitigation activities, e.g. further design, further analysis, prototyping

• sensitivities and tradeoffs can be explicitly documented

Architecture reviews are not repeatable without a process. The ATAM provides a defined process that makes architecture evaluation repeatable.

The federally funded Software Engineering Institute at Carnegie Mellon University pioneered this method for the evaluation of software architectures.

Saturday, April 6, 2013

Milk Adulteration in India

This post is a departure from my normal range of topics.


When I was a child, my grandfather considered cow’s milk to be the elixir of the Gods. The entire multi-generational household, including the neighbors, got their milk free of cost for decades.

I have done some research on milk contamination in India. In the past two weeks, during my visit to India, I have personally encountered “suspicious milk”, either in chai at various dhabas or in paneer in various meals: milk that smells funny, looks a little weird, and tastes “synthetic”; and paneer that is just “too white” when I cut into it, and pastier or chewier than what I remember paneer to be.

I spent a few hours researching this - and here is what I found and I thought it was worth sharing.

Milk samples were taken state by state across the country and subjected to various chemical tests; conformity to milk standards varied widely across states.

I was shocked to see that 100% of the West Bengal milk sampled by the government of India is adulterated and contaminated. In Punjab 81%, in Delhi 70%, and in Maharashtra 65% of the milk is contaminated; see the report link in PDF.


This means the milk from suppliers to brand-name milk marketers like Mother Dairy, Amul, etc. is adulterated, just as “loose milk” is contaminated. Profit over health; see the video report.


Here is what the scientific tests done by the Govt. of India report:

"The non-conforming sample in the descending order of percentage with respect to the total sample collected in different states were as follows: Bihar (100%), Chhattisgarh (100%), Daman and Diu (100%), Jharkhand (100%), Orissa (100%), West Bengal (100%), Mizoram (100%), Manipur (96%), Meghalaya (96%), Tripura (92%), Gujarat (89%), Sikkim (89%), Uttrakhand (88%), Uttar Pradesh (88%), Nagaland (86%), Jammu & Kashmir (83%), Punjab (81%), Rajasthan (76%), Delhi (70%), Haryana (70%), Arunachal Pradesh (68%), Maharashtra (65%), Himachal Pradesh (59%), Dadra and Nagar Haveli (58%), Assam (55%), Chandigarh (48%), Madhya Pradesh (48%), Kerala (28%), Karnataka (22%), Tamil Nadu (12%), and Andhra Pradesh (6.7%)."

Reference: 

Executive Report from FSSAI http://www.fssai.gov.in/Portals/0/Pdf/sample_analysed(02-01-2012).pdf

News Report http://www.youtube.com/watch?v=ZSFogugc0-w

What actions or behavior modifications should be taken? I think we need to drop consumption of all milk-based products (curd, lassi, paneer, butter, ghee) and switch to green tea and black coffee. It may be too extreme a step, but I believe there is a real risk of contamination and ill health.

If you must drink milk, make sure you see it come out of the cow and bring it home yourself, or else find organic certifications that are reliable.

Sunday, February 10, 2013

Non-repudiation–not a non-issue

To understand electronic non-repudiation, we must understand traditional non-repudiation from a legal perspective. A legal repudiation of a manual signature can succeed only if the signature is a forgery, or if an authentic signature was obtained via unconscionable conduct by a party to the transaction, fraud instigated by a third party, or undue influence exerted by a third party (McCullagh & Caelli, 2000).

From a technical perspective, non-repudiation (NR) is essentially proof that a certain principal sent or received the message in question; every message exchange can be tied to a principal with a guarantee. An NR token is generated and verified for each message sent by the principal, so the principal cannot deny sending that message. In the same way, an NR token is created for each message received by the principal, so receipt of the message cannot be denied either.

The technical meaning of non-repudiation shifts the onus of proof from the recipient to the alleged signatory or entirely denies the signatory the right to repudiate a digital signature (McCullagh & Caelli, 2000). The use of a trusted system can solve the authentication, authorization and consequently non-repudiation issues by leveraging digital signatures.

Web services. With more and more e-commerce and business-to-business transactions being conducted on the Web, non-repudiation and digital signatures have gained a lot of importance. In the future, digital signatures will commonly be used in this area to provide non-repudiation services to the enterprise.

Sunday, February 3, 2013

Authorization–Legal Drinking Age?

Authorization is the process by which valuable resources are protected and only limited access is provided to principals who have been authenticated. Principals are entities that request access to resources; they can be people or other servers. It is important to note that authorization can take place only after authentication of the principal has occurred. This makes sense because principals who are unable to prove their identity should not be given permission to access sensitive information.


Authorization in the Enterprise

In the enterprise environment, access control comes in many flavors, including discretionary, role-based, mandatory, and firewall types of access control. Discretionary authorization is the process by which two principals can be given different levels of access to the same resource. For example, principal A can be given read-only access to resource C while principal B is given full access to the same resource. Usually such access control mechanisms are hierarchical in nature.

Access Control List

Discretionary access-control mechanisms typically maintain a list of principals and their associated permissions in an access control list (ACL). ACLs can be stored separately and accessed during the authentication or authorization process. Principals can also be members of groups and have group access permissions applied. Role-based access control is applied when a usage role has to be applied across several principals: if there are multiple system users, a user group is created and a common ACL applied. Once the ACL is applied to the group, all principals that belong to the group automatically inherit its permissions. In most cases it is still possible to override, overload, or otherwise specialize the permissions applied to individual principals. Applying access controls to security groups and principals works well in most cases.
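As a minimal sketch of the ACL mechanics described above (the class, principal, and group names here are hypothetical, not from any particular product), the following shows direct grants, group grants, and inheritance of group permissions by members:

```python
# Illustrative ACL sketch (not a production access-control system):
# principals can hold direct permissions and inherit permissions from groups.
class ACL:
    def __init__(self):
        self.group_perms = {}      # group name -> set of permissions
        self.principal_perms = {}  # principal  -> set of direct permissions
        self.membership = {}       # principal  -> set of group names

    def grant_group(self, group, perm):
        self.group_perms.setdefault(group, set()).add(perm)

    def add_member(self, principal, group):
        self.membership.setdefault(principal, set()).add(group)

    def grant_principal(self, principal, perm):
        self.principal_perms.setdefault(principal, set()).add(perm)

    def permissions(self, principal):
        # Effective permissions = direct grants plus the union of group grants.
        perms = set(self.principal_perms.get(principal, set()))
        for group in self.membership.get(principal, set()):
            perms |= self.group_perms.get(group, set())
        return perms

acl = ACL()
acl.grant_group("users", "read")           # common ACL applied to the group
acl.add_member("alice", "users")           # alice inherits the group grant
acl.grant_principal("alice", "write")      # direct grant extends her access
print(sorted(acl.permissions("alice")))    # ['read', 'write']
print(sorted(acl.permissions("bob")))      # []
```

A deny-list or permission override could be layered on the same structure; the point is simply that group membership and direct grants compose.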

Classifications. Classification levels may be used to specify authorization levels. In this scheme the resource, principal, and groups are all assigned a pre-defined authorization level, and the comparison of levels determines the actual access. For example, if resource C is tagged as classified, resource D as unclassified, principal A as classified, and principal B as unclassified, then principal A can access both C and D while principal B can access only D. Such parallel hierarchies can determine the access logic with ease. In general, if a principal’s classification level is at least as high as that of the resource, the principal is given access to the resource.
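The comparison rule above reduces to a one-line check; this toy sketch (level names and numeric values are illustrative assumptions) walks through the A/B versus C/D example from the text:

```python
# Toy illustration of classification-based access control: a principal may
# access a resource when its clearance level is at least the resource's
# classification level. Levels and names are hypothetical.
LEVELS = {"unclassified": 0, "classified": 1}

def can_access(principal_level, resource_level):
    return LEVELS[principal_level] >= LEVELS[resource_level]

# Principal A (classified) can access both C (classified) and D (unclassified);
# principal B (unclassified) can access only D.
assert can_access("classified", "classified")        # A -> C allowed
assert can_access("classified", "unclassified")      # A -> D allowed
assert can_access("unclassified", "unclassified")    # B -> D allowed
assert not can_access("unclassified", "classified")  # B -> C denied
```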

Firewalls. Inter-network communication in the enterprise is often protected by a firewall. A firewall is a mechanism by which access to particular Transmission Control Protocol/Internet Protocol (TCP/IP) ports on a network of computers is restricted based on the location of the incoming connection request. Firewalls often act as gateways that connect two or more networks. Rules can be applied to firewalls to block certain ports, protocols, and Internet protocol (IP) addresses from accessing the network. Proxy servers that typically bypass the firewall are sometimes installed inside corporate networks.

Trust domains. Domains can be defined and used to protect sensitive resources. This is accomplished by grouping all servers and processes that have the same access control policy into a domain. A trust domain can interact at the micro level with a level of trust defined by the ACL. IP addresses with specific ports and communications can also be included in the domain. Security policy domains are also sometimes called realms.

Java technology. Java employs stringent security standards in the Java Virtual Machine (JVM); when security domains are pre-defined, code can be executed from uniform resource locators (URLs) within the trust domains. Multiple domains can also be defined with trust established at a certain level, so that code executing in one domain can trust, and make useful calls to, code running in another domain; such a domain is called a trusted domain. Sub-domains can be created, and each sub-domain can have one or more parent domains. Partitioning domains by creating sub-domains provides the ability to assign more restrictive permissions at the sub-domain level, but not higher access levels. Domains can also be federated; the federation of domains allows permissions to be assigned across domains and sub-domains.

Auditing. Authorization requests can be logged by servers, gateways, and firewalls. Audit logs can help reconstruct the sequence of events in particular threads of activity, and such investigation can uncover suspected unauthorized attempts to reach protected resources. A lot of information can be written to the security log files. Typical information logged during a security audit includes the audit event type, the timestamp of the event, the identity of the principal requesting access, the identification of the target resource being requested, the permission being requested on the target resource, the location from which the target resource is being requested, and any protocol-specific information (Jaworski, Perrone & Chaganti, 2001). Due to the sensitive nature of security logs, access to them should be restricted to authorized principals.

Sunday, January 27, 2013

Protocols for the Security Stickler

Data communications channels are often insecure, subjecting messages transmitted over them to passive and active threats (Barkley, 1994). Internet protocols connect various networks, and data packets are transmitted over them; an entire protocol stack exists over which computers exchange messages. For example, Web browsers send Hyper-Text Markup Language (HTML) documents over the Hyper-Text Transfer Protocol (HTTP), which sits on top of the TCP/IP stack. Additional protocols are now in place that create secure channels for such communication: the Secure Sockets Layer (SSL) sits between HTTP and TCP/IP, so secure Web-page transfers use the standard SSL port 443 rather than the unsecured port 80 assigned to HTTP. Together this results in HTTPS (HTTP over SSL) communication. SSL and TLS are security protocols primarily used for the network transport of messages.

Secure Sockets Layer

The Secure Sockets Layer (SSL) protocol is a security protocol that provides communication privacy over the Internet by allowing client-server applications to communicate in a way that is designed to prevent eavesdropping, tampering, or message forgery (Freier, Karlton & Kocher, 1996). SSL is composed of a handshake protocol and a record protocol, which typically sit on top of a reliable communication protocol like TCP. SSL evolved through its latest version, 3.0, which became the basis for the Transport Layer Security protocol.

Transport Layer Security

The primary goal of the Transport Layer Security (TLS) protocol is to provide privacy and data integrity between two communicating applications; it is used for the encapsulation of various higher-level protocols (Dierks & Allen, 1999, p. 3). TLS is actually a combination of two layers, the TLS Record Protocol and the TLS Handshake Protocol. The TLS Record Protocol has two basic properties: connection privacy and reliability. The TLS Handshake Protocol has three basic properties: peer identity authentication, shared secret negotiation, and negotiation reliability.

One advantage of TLS is that it is independent of the application protocol (Dierks & Allen, 1999, p. 4). Higher-level protocols can be layered on top of it, which leaves decisions about TLS handshake initiation and authentication certificate exchange to the judgment of higher-level protocol designers. The primary goals of the TLS protocol, thus, are to provide cryptographic security, interoperability, and extensibility. These are fundamental requirements of enterprise security.
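To make the layering concrete, here is a minimal sketch using Python’s standard ssl module: a TLS client context is configured with certificate verification and hostname checking enabled, and would then be wrapped around an ordinary TCP socket (the host name in the comment is a placeholder):

```python
import ssl

# Sketch of configuring a TLS client context with Python's ssl module.
# create_default_context() enables certificate verification and hostname
# checking, matching the privacy and peer-authentication goals above.
context = ssl.create_default_context()
print(context.verify_mode == ssl.CERT_REQUIRED)  # True
print(context.check_hostname)                    # True

# To connect, the context wraps a plain TCP socket, e.g.:
#   import socket
#   with socket.create_connection(("example.com", 443)) as sock:
#       with context.wrap_socket(sock, server_hostname="example.com") as tls:
#           tls.sendall(b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n")
```

This mirrors the text’s point that TLS is independent of the application protocol: the same wrapped socket could carry HTTP, SMTP, or any other higher-level protocol.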

 

Sunday, January 20, 2013

Message Digests and Keys

A message digest is analogous to a fingerprint of a document in the real world. Digests are a convenient and useful way of authenticating messages.

Webopedia defines a message digest as:

The representation of text in the form of a single string of digits, created using a formula called a one-way hash function. Encrypting a message digest with a private key creates a digital signature, which is an electronic means of authentication (p.1)

A message in its entirety is taken as input and a small fingerprint is created; the message along with its unique fingerprint is then sent with the document. When the recipient is able to verify the fingerprint of the document, it ensures that the message did not change during transmission. A message may be sent in plain text along with its message digest in the same transmission; the idea is that the recipient can verify that the plain text arrived unaltered by examining the digital signature. The most popular algorithm for message digests is MD5 (IrnisNet.com, n.d.). Created at the Massachusetts Institute of Technology, it was published to the public domain as Internet RFC 1321.

MD5

MD5, developed by Dr. Ronald L. Rivest, is an algorithm that takes as input a message of arbitrary length and produces as output a 128-bit "fingerprint" or "message digest" of the input (Abzug, 1991). While not mathematically proven, it is conjectured that it is not feasible to create a message from the digest. In other words, it is computationally infeasible to “produce any message having a given pre-specified target message digest” (Abzug, 1991).

MD5 is described in Request for Comments (RFC) 1321. Rivest (1992) summarized MD5 as:

The MD5 algorithm is an extension of the MD4 message-digest algorithm. MD5 is slightly slower than MD4, but is more "conservative" in design. MD5 was designed because it was felt that MD4 was perhaps being adopted for use more quickly than justified by the existing critical review; because MD4 was designed to be exceptionally fast, it is "at the edge" in terms of risking successful cryptanalytic attack. MD5 backs off a bit, giving up a little in speed for a much greater likelihood of ultimate security. (p.3)

Message digest 5 is an enhancement over MD4; Rivest (1992) describes this version as more conservative than its predecessor and easier to codify compactly. The algorithm provides a fingerprint of a message of any length. Coming up with two messages (plain text) that resolve to the exact same fingerprint is on the order of 2 to the power of 64 operations, and reverse-engineering a fingerprint into a matching plain-text message requires on the order of 2 to the power of 128 operations. Such large numbers provide current computational infeasibility.

SHA-1

The Secure Hash Algorithm 1 (SHA-1) is an advanced algorithm adopted by the United States of America as a Federal Information Processing Standard. SHA-1, as explained in RFC 3174, is employed for computing a condensed representation of a message or a data file (Jones, 2001). The algorithm can accept a message of any length (theoretically less than 2 to the power of 64 bits); the output is a 160-bit message digest that is computationally unique to the given input. This digest can then be validated against a previously computed digest.

Demonstration. For example, if a user registers with the password “purdue1234”, the SHA-1 algorithm can be applied, resulting in the 160-bit digest “8ad4d7e66116219c5407db13280de7b4c2121e23”. This digest can be saved in the database instead of the plain-text password the user registered with. The next time the user signs on with the same plain-text password, it is converted to the same digest, which can then be compared to authenticate the user. If the user enters a different password, say “rohit1234”, SHA-1 digests it as “fb0f57cb70fbd8926f2912585854cbe4bcf83942”; this triggers a mismatch and the authentication fails. The algorithm guarantees the same 160-bit digest for a given plain text, and it is computationally infeasible to reverse the digest into the plain text. Therefore, even if the database is “hacked”, the passwords will not be usable. This is one of the most common techniques employed in the industry for storing sensitive data that only needs to be verified, not reused.
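The demonstration above can be reproduced with Python’s standard hashlib module. One caveat, added here rather than taken from the text: modern practice favors salted, slow password hashes (bcrypt, scrypt, PBKDF2); bare SHA-1 is used below only to mirror the example:

```python
import hashlib

# Store the SHA-1 digest of a password rather than the plain text, then
# compare digests at sign-on. (Real systems should use a salted, slow hash;
# bare SHA-1 here simply mirrors the demonstration in the text.)
def digest(password):
    return hashlib.sha1(password.encode("utf-8")).hexdigest()

stored = digest("purdue1234")          # saved at registration

print(digest("purdue1234") == stored)  # True  - authentication succeeds
print(digest("rohit1234") == stored)   # False - mismatch, access denied
print(len(stored))                     # 40 hex characters = 160 bits
```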
DSA

Digital Signature Algorithm (DSA) is an algorithm inherited from the National Security Agency (NSA) and published by the National Institute of Standards and Technology (NIST) in the Digital Signature Standard (DSS) as part of the United States government’s Capstone project (RSA Laboratories, n.d.). To gain a better understanding of DSA, the discrete logarithm problem needs to be explained. RSA Laboratories’ documentation explains that for a group element g, if g is multiplied by itself n times, the result is represented by g^n; the discrete logarithm problem is as follows: given two group elements g and h belonging to a finite group G, find an integer x such that g^x = h. The discrete logarithm problem is a complex one; it is considered a harder one-way function than those used by algorithms based on the factoring problem.

Quick algorithm implementations have emerged. (Big-O notation is a theoretical measure of the execution of an algorithm, usually the time or memory needed given the problem size n, which is usually the number of items.) With DSA, signature generation is faster than signature verification, whereas with the RSA algorithm verification is much faster than generation of the signature itself (RSA Laboratories, n.d.). Initial criticism of the algorithm centered on the lack of flexibility compared with the RSA cryptosystem, the verification performance, adoption issues cited by hardware and software vendors that had standardized on RSA, and the discretionary selection of the algorithm by the NSA (RSA Laboratories, n.d.). DSA has since been incorporated into several specifications and implementations and can now be considered a good choice for adoption by the enterprise.

Secret Keys

Two general types of cryptosystems have evolved over the decades: secret-key cryptography and public-key cryptography. In secret-key cryptography, as the name suggests, a key is kept secret from the public domain; only the recipient and the sender have knowledge of the key. This is also known as symmetric-key cryptography. In a public-key cryptosystem, two keys play a role in ensuring security: the public key is well published or can be requested, while the private key is kept secret by the individual parties. This scheme requires a certificate authority so that tampering with public keys is prevented. The primary advantage of this scheme over the other is that no secure courier is needed to transfer a secret key. The main disadvantage is that broadcasting of encrypted messages is not possible.

Symmetric Keys

This scheme is characterized by the use of a single key that can both encrypt and decrypt the plain-text message. The encryption and decryption algorithms exist in the public domain, so the scheme depends entirely on knowledge of the key. If the key is known only to the parties in a secured communication, secrecy can be provided (Barkley, 1994). When symmetric-key cryptography is used for communications and the messages are intercepted, it is computationally infeasible to derive the key or decrypt the message from the cipher text even if the encryption algorithm is known. The cipher text can only be decrypted if the secret key is known; because the secret key is known only by the message sender and the message receiver, the secrecy of the transmission can be guaranteed.
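The single-key idea can be illustrated with a deliberately simplified toy cipher. This XOR-keystream sketch is NOT a secure algorithm (real systems use vetted ciphers such as AES); it only shows that the same key both encrypts and decrypts:

```python
from itertools import cycle

# Toy symmetric cipher: XOR each byte of the message with a repeating key.
# Applying the same operation twice with the same key restores the plain text.
# Insecure by design; for illustration of the symmetric-key property only.
def xor_cipher(data: bytes, key: bytes) -> bytes:
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

key = b"shared-secret"                       # known only to sender & receiver
cipher = xor_cipher(b"meet at noon", key)    # sender encrypts
plain = xor_cipher(cipher, key)              # receiver decrypts with same key
print(plain)                                 # b'meet at noon'
print(cipher != b"meet at noon")             # True: ciphertext is transformed
```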

MAC. While secrecy can be guaranteed, the integrity of the message cannot. To ensure that the message has integrity, a cryptographic checksum called a Message Authentication Code (MAC) is appended to the message. A MAC is a type of message digest: it is smaller than the original message, it cannot be reverse-engineered, and colliding messages are hard to find. The MAC is computed as a function of the message being transmitted and the secret key (Barkley, 1994). This is done by the message originator, the sender.
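A modern instance of this keyed checksum is an HMAC; the sketch below (key and message contents are illustrative) uses Python’s standard hmac module to show the sender computing the tag and the receiver recomputing and comparing it:

```python
import hashlib
import hmac

# MAC sketch: the tag is a function of the message AND the shared secret key,
# so a party without the key cannot forge a valid tag for a tampered message.
key = b"shared-secret-key"
message = b"transfer 100 to account 42"

tag = hmac.new(key, message, hashlib.sha256).digest()   # sender appends this

# Receiver recomputes the MAC over what arrived and compares;
# compare_digest() does a constant-time comparison.
ok = hmac.compare_digest(tag, hmac.new(key, message, hashlib.sha256).digest())
print(ok)  # True: message integrity verified

tampered = b"transfer 900 to account 42"
bad = hmac.compare_digest(tag, hmac.new(key, tampered, hashlib.sha256).digest())
print(bad)  # False: tampering detected
```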

Asymmetric Keys

Asymmetric-key cryptography is different in that each party holds a key pair: one key that is well known to everyone and one that is kept private. This scheme is also known as public-key cryptography. Each key is used to generate a function that transforms text (Barkley, 1994). The private key is secret and known only to the party that owns the key pair; the public keys are meant to be distributed. Both keys are part of a pair, and either one can be deemed public and the other private. Each key generates a transformation function; because the public key is known, its transformation can be derived and made known also. In addition, the functions have an inverse relationship: if one function encrypts a message, the other can be used to decrypt it (Barkley, 1994). The transformation functions are used as follows: the sender requests the public key of the destination, transforms the data to be transmitted using it, and then transmits the encrypted data to the intended recipient. The transmission is encrypted and can only be decrypted with the other key of the pair, the receiver’s private key. After receiving the encrypted message, the receiver uses the private key to decrypt it, after which the message can be consumed.

The advantage of such a scheme is that two users can communicate without having to share a common key, whereas with symmetric-key cryptography a common secret key must be stored, and a secret key is not something that should be shared in the first place. Moreover, distribution of secret keys adds a layer of complexity to the security of the system. Public-key cryptography resolves this issue easily: because it is computationally infeasible to derive the private key from the public key, it is also infeasible for anyone without the private key to decrypt a message encrypted with the public key. Alongside this convenience there is an inefficiency: encrypting the plain text can take a long time, and the cipher text can be longer than the plain-text message itself. Also, broadcast of encrypted messages is not possible because the private key is held by only one principal. Applications of public-key cryptography are often seen in the enterprise: authentication, integrity, and non-repudiation.
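The inverse relationship between the two keys can be seen in a textbook RSA example with tiny primes (the classic p=61, q=53 parameters; real keys are thousands of bits, so this toy is purely illustrative and completely insecure):

```python
# Textbook RSA with tiny primes, illustrating the public/private key pair.
p, q = 61, 53
n = p * q                 # 3233, the public modulus
phi = (p - 1) * (q - 1)   # 3120
e = 17                    # public exponent
d = pow(e, -1, phi)       # 2753, private exponent (modular inverse of e)

public_key, private_key = (e, n), (d, n)

def transform(m, key):
    # The same modular-exponentiation function serves as both the encryption
    # and decryption transformation; the keys are inverses of each other.
    exp, mod = key
    return pow(m, exp, mod)

message = 65
cipher = transform(message, public_key)     # anyone can encrypt with (e, n)
recovered = transform(cipher, private_key)  # only the holder of d can decrypt
print(cipher)     # 2790
print(recovered)  # 65
```

Applying the private key first and the public key second works equally well, which is the basis of digital signatures and hence non-repudiation.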
