Apache CXF, JAX-RS, and injecting dependencies into your resources without Spring

I find myself (for Apache Tika) wanting to do some work on an Apache CXF-powered, JAX-RS RESTful API. For a few reasons, we're not using Spring, and we're not using a configuration XML file. It's all defined and run in code.

For this, we now need to pass in some dependencies to our Resource classes. Ask around, and everyone says "that's easy with Spring", but in this case we're not using Spring... After quite a bit of reading around, I've found the answer. What you need to do is use a SingletonResourceProvider to control what object gets returned when CXF wants your resource.

Using the code below, you can run a simple server, and then fetch http://localhost:8765/test from your web browser / curl etc. Because of the singleton, you get "I am a singleton!", and not "I was re-created, not a singleton" which you'd get if CXF created a new instance each time.

So, if you find yourself needing to inject some dependencies into your CXF resource classes, and you're not in Spring, you can use this pattern to create the instance yourself, with the dependencies passed in, and have CXF use it.

/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *     http://www.apache.org/licenses/LICENSE-2.0
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import java.util.ArrayList;
import java.util.List;

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

import org.apache.cxf.binding.BindingFactoryManager;
import org.apache.cxf.jaxrs.JAXRSBindingFactory;
import org.apache.cxf.jaxrs.JAXRSServerFactoryBean;
import org.apache.cxf.jaxrs.lifecycle.ResourceProvider;
import org.apache.cxf.jaxrs.lifecycle.SingletonResourceProvider;

public class TestCLI {
  public static final int DEFAULT_PORT = 8765;
  public static final String DEFAULT_HOST = "localhost";

  @Path("/test")
  public static class TestResource {
      private String msg;

      public TestResource(String msg) {
          this.msg = msg;
      }
      public TestResource() {
          this("I was re-created, not a singleton");
      }

      @GET
      @Produces(MediaType.TEXT_PLAIN)
      public Response getStr() {
          return Response.ok(msg, MediaType.TEXT_PLAIN_TYPE).build();
      }
  }

  public static void main(String[] args) throws Exception {
      JAXRSServerFactoryBean sf = new JAXRSServerFactoryBean();
      sf.setResourceClasses(TestResource.class);

      // The singleton provider hands CXF our pre-built instance,
      // dependencies and all, rather than letting it call the no-arg constructor
      List<ResourceProvider> rProviders = new ArrayList<ResourceProvider>();
      rProviders.add(new SingletonResourceProvider(new TestResource("I am a singleton!")));
      sf.setResourceProviders(rProviders);

      BindingFactoryManager manager = sf.getBus().getExtension(BindingFactoryManager.class);
      JAXRSBindingFactory factory = new JAXRSBindingFactory();
      factory.setBus(sf.getBus());
      manager.registerBindingFactory(JAXRSBindingFactory.JAXRS_BINDING_ID, factory);

      sf.setAddress("http://" + DEFAULT_HOST + ":" + DEFAULT_PORT + "/");
      sf.create();
  }
}

When run, we see what we hoped for!

$ wget -q -O - http://localhost:8765/test
I am a singleton!

GOTO Conference Aarhus - My Notes

Last week, I was at the GOTO Aarhus conference. My general overview of it is available here. For my much less exciting notes, read on...

Now, before I go onto my notes, a small apology. These are just basic notes and pointers on the things I either wanted to read up on later, or things I thought were important. They're not a complete summary of a session. GOTO had a lot of students there helping out, and some of them did an amazing job at note taking and summarising!

There and Back Again - Software Security in the 21st Century - Brian Chess
  • To err is human

  • #1 education

  • #2 code review

Why Agile doesn't scale, and what you can do about it - Dan North
  • Local optimisations don't roll up

  • Agile doesn't have an opinion

  • Book Suggestion: Agile adoption patterns - Richard Durnall

  • Book Suggestion: The leprechauns of software engineering - ebook

  • Book Suggestion: Beyond budgeting

  • Contextual consistency

  • Scaling is more than just small things bigger

  • Guiding principles, strong leadership

  • Crossing the chasm of credibility is hard

Why code in Node.js often gets rejected by SoundCloud
Inspired by this: I need to stop just reading little bits on Scala and Clojure! Need to try some things in them

Conway's Law - organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations

Something on data vis stuff (I forget which, sorry!)
Data description language (dsl)
Then from that to neo4j or similar to actually query
Allows for easy changes to db format
Clojure macros for DSLs
Small team wins over big team when lots of knowledge needs to be shared
Ola Bini / @olabini

"Mocking without hangover"
"Hiding selenium in ui tests"

How to think about parallel programming - not!
~= an introduction to functional programming
Tail calls

javap - bytecode printer
Non capturing lambdas faster than inner classes
Capturing lambdas same as inner classes, but may improve later!

Code golf
Kingdom of Nouns

Scott Murray - d3 tutorials for non programmers
D3 academic paper

Risk Management
Architecture covers hard parts, critical parts, prototype then fix
Executable requirements and feedback
If you can't identify a consumer for it, you don't need to deliver it

Put it in the mission statement
Consider social justice issues
Start with program committee - needs to be diverse from there on down
Doesn't mean lowering the bar! Plenty of top notch speakers of all kinds
Women tend to say no more - tend to be more busy
Need to do outreach, work hard to get women to submit
CFP alone not enough to increase diversity
Speakers isn't a hard problem, though other parts are

Scala and Play
Syntax, naming, versioning and white space are fertile ground for bike shedding
"Marketing is a moral imperative"

"Out of the Tar Pit" - Moseley and Marks
State spoils testing. We test it anyway, as what else can we do?
"Generative testing"
Oop - less helpful when constraints must be enforced across objects
Immutable state instead of mutable state makes reasoning easier
Complexity through control still exists in FP
"Java repl"
"Functional reactive programming" - functional for UIs

"5 second rule" for presentation slides

5k / 10k art - tiny sized web apps and mobile
Pursue simplicity - avoid cognitive load

Discussion with Trisha Gee
Spock - groovy, bdd test tool

Overall - I need to give more talks!

GotoCon Aarhus 2013 round-up

Last week, I was lucky enough to attend GOTO Aarhus 2013, the original and biggest of the GOTO Conferences. In short, it was excellent!

During the course of the conference, I took notes of interesting things (with varying levels of detail and comprehensiveness...), which I've put in another post. I started off with one post, but it got a bit big, so I've opted instead to split them up. This one's all about the conference itself. But, before we get to that...

My first involvement with GOTO was a couple of years ago, when I attended + spoke at the first GOTO Amsterdam. I largely enjoyed that one, but it wasn't quite what I'd expected, both in terms of the audience and the things covered. Unfortunately, this expectations gap meant that my talk there wasn't quite the right one... While everyone who came really liked it and learnt a lot from it, sadly it didn't get the numbers it might have had. This time round, I knew what to expect, and knew the kind of audiences and speakers there, so was better prepared to make the most out of it!

For those who've not made it to a GOTO Conference before, they cover a mixture of topics, but a mixture that works well together. You learn what you need today and what you might need tomorrow, with a fun bunch of people to discuss it all with between sessions. Everything is related to Software Development, but in a broad sense, not just pure tech in isolation. So, along with the talks on technology stuff, you also have project management, architecture, careers, technology stacks, general approaches and more. Some of the best talks weren't the most immediately obvious ones, so it's certainly worth trying out some tracks outside of your normal role.

GOTO Aarhus ran with 6 tracks, so it's not a small event. I'd say that at least half the time, I wanted to attend 3 of the talks, so you're not short on interesting things to attend! Possibly my record was one slot where I quite fancied going to 5 out of the 6 talks on offer! The event took place in the Aarhus Musikhuset, the main concert hall, so for many of the sessions you were sat in nice tiered seating, in a medium or big room. It took me a little while to work out which room was where, but then I discovered a very nifty locator in the mobile app. That worked much better than I expected, and showed you where in the building you were, where the room was, and what direction to head in. The future had arrived! One or two more signs might've been helpful though, as you do look a little bit silly navigating your way round a concert hall using your iPad...

The conference ran over 3 days, and despite fitting a lot in, didn't feel too busy. Partly I think this was due to the 20 minute break between every (50 minute) session, which gave a chance to pause, have a drink, snack on something and re-charge. It saved a mad dash between sessions, and certainly helped me keep going to the end! The sponsors seemed to like it too, as there were more people about for them to chat to. The break area was sunny, and you could walk outside for fresh air, which made a big difference to energy levels vs being buried in some deep basement for 3 days!

Despite there being a large number of people there, everyone seemed very friendly, and I had lots of interesting chats with speakers and attendees alike. Possibly I should've got a tshirt saying "I'm not Danish", as almost everyone tried speaking Danish to me first, but everyone was happy and able to switch to English! As well as the Danes, there were people from all over Scandinavia there, along with a smattering of other European countries, so there was a good mix.

GOTO had a lot of students there helping out, and some of them did an amazing job at note taking and summarising! Check #gotoaar for some, but the one that most caught my eye was Mathilde Hoeg, who did some amazing summaries like these:

For my much less exciting notes than those... Move onto part 2 here.

If you missed Aarhus, as well as next year's one, there's still time to make it to GOTO Berlin in a couple of weeks, which looks to have an excellent line-up too! See here for details and signup.

Why might you want Alfresco's Cloud-Style Multi Tenancy on-premise?

In part 1 we looked at what Alfresco Multi-Tenancy (MT) is, and why you might want to use it. In part 2, we looked at how collaboration makes a strict one-tenant-per-user model no longer work well, and saw how the Alfresco Cloud has solved this to permit efficient and effective collaboration.

Let's have a quick review of the situation, to remind ourselves how it works:

Each team / department / company / etc has full and complete control and administration over their tenant (as identified by their email domain). When they need to collaborate with someone from outside of their tenant, they invite them in for that one site. The person invited in retains full control of their own tenant, but gets only the limited rights granted to them in the sites they've been invited to in other tenants. They can work with others on documents in those sites, collaborate, discuss, comment, but not pierce the veil of the security or protection.

Now, that sort of thing can be done without the need of Multi-Tenancy. Alfresco is heavily customisable and extensible, and allows you to override or replace all sorts of parts. If you know what you're doing, it's perfectly possible to achieve the same thing in a single tenant setup. It's not a quick job though, and you'll need to do things like:
  • Define your own way of recording which "virtual tenants" people can access, and what belongs to what
  • Override various People searches (services and webscripts) to check/filter by that
  • Override the Category/Tagging services, along with various webscripts, to give each "virtual tenant" their own tag cloud
  • Override the default permissions, to be more restrictive
  • And a few other bits too...

OK, so it can be done, what's the problem? Well, a few things, including:
  • The initial work takes a while (reviewing then coding), which means less other work
  • Upgrades are slow, as you often can't cleanly override the bits you want to, so you often have to port your code forward + re-review
  • Testing and Validating takes longer, as there are more "moving parts" (so to speak)
  • You have to write custom reports to list who has access to what, so that access reviews can be done, rather than just listing tenant users
  • You have to do the work, rather than getting it included in your Enterprise subscription!

I know for a fact that at least one company would prefer to use Cloud-Style MT on-premise, rather than achieving the same thing with permissions + customisations (mine!), but I strongly suspect we're not the only ones. In fact, I'd rather suspect that there are huge numbers of potential customers out there who struck Alfresco off their "potential systems" list early on because of a lack of such support "out of the box".

Let us consider a typical collaboration environment, which could easily be different departments within a large organisation, or small companies working together, or a mixture of big company departments and their outsourced providers helping:

In this situation, Users 1 and 2 work for the main company, but invite in two different smaller agencies to help them out with different things. We could even imagine a case where Users 3 and 4 collaborate together, without the others!

How about some simple cases of this setup?
  • Two different departments use the same install of Alfresco, but want complete control of their own spaces. They need to use the same marketing agency, and sometimes both departments and the marketing agency need to work together on an advert
  • Three different departments each have their own intranet, but also all work on the internet site. They need to collaborate with external designers and web agencies on sites relating to their joint external website, as well as each department with just the agency for their respective intranets
  • Operations and Fixed Fee both have their own areas, but they sometimes need to work with Legal and Purchasing, who both have their own areas too

I'm sure I could go on with ideas involving federated companies, or other cases with strong inter-departmental divisions but an increasing need to collaborate across them on certain things. Now, let us consider a few cases where there are many companies, who might all want to share one collaboration system, but currently Alfresco Enterprise won't fit their needs because of a lack of the Cloud Style Multi-Tenancy.

Business Incubator. There need to be common sites for submitting quarterly reports, as well as for general discussions about the space. Some companies will collaborate with each other to win funding, so need shared sites. In other cases, companies are secretive about their partners, and don't want to leak information about who they are working with. Not only do sites need to be private, but they need independent Tags and User lists to prevent one company being able to find out things on another one by checking tags or doing people searches.

Pharma. Pharma A have WonderDrug A, Pharma B have WonderDrug B, but both are also working together with Pharma C on WonderDrug Z. All 3 use the same DM CRO, A and C use the same PV provider, B and C the same MW provider while A do their MW in-house. On WonderDrug Z the three pharma companies want to share their data, but A doesn't want B or C to know who they use for their in-house MW to prevent poaching, no-one wants anyone else to know their tagging schemes, and the DM CRO shouldn't have access to everything even if they work with everyone. (You can do this setup with permissions, but as previously mentioned it has notable upgrade costs)

Banking. The Legal and Compliance teams need strong separation between their areas and those of the rest of the business. IBD and Equities need to be able to prove that the Chinese Wall remains unbroken. All of them need to collaborate on the Charitable Works project, and some of all of them need to offer input on the re-tendering of the catering contract. Passwords need to change often, and the idea of having multiple logins won't go down well.

I'm sure there are a number of other cases too, where the need for strict separation between Tenants is required most of the time, but controlled access between them to collaborate is needed in others. Those were just the three that first sprang to mind. In all 3 cases, unless you have a very deep knowledge of Alfresco, you would do an initial evaluation of the product then reject it as unsuitable, since the feature isn't there out of the box. How many lost deals are there out there from this feature not being present? How many possible deals never got into an Alfresco prospects list, because the lack of collaborative MT was spotted before anyone talked to sales or a partner? How many times have organisations really wanted the openness and community that come with Alfresco, but had to go for a different solution because of this one missing feature? I'm going to say the answer to all of those is "rather a lot".

The crazy thing about these missed sales, these missed opportunities, these skilled Alfresco customers having to roll the solution themselves, is it's not some impossible new feature that would take man-years for Alfresco to add in. This is a feature that Alfresco have already written, tested, documented, and supported for getting on for 2 years! This wouldn't be some new area to develop, this is something Alfresco already support, and effectively support twice!

How much work would be involved in migrating this existing feature over from Cloud to Enterprise? Well, some parts of it are already done. As mentioned in this 4.2 blog post, the new Public API on 4.2 uses cloud-style URL patterns, eg http://localhost:8080/alfresco/api/-default-/public/cmis/versions/1.1/atom
The -default- in the URL is an explicit tenant selection, which is the first step in the move away from the old-style implicit tenant selection. What else? Well, there would be a lot of testing involved - Alfresco Enterprise has a much wider set of supported stacks/platforms than Cloud. Coding/Merging wise, it wouldn't actually be too bad, certainly a lot easier than some of the other Cloud -> Enterprise merges that have happened of late. Documentation would need updating too, and probably some compatibility layers that maintain implicit tenant selection on old-style URL patterns.
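As a quick sketch of that URL pattern, the explicit tenant is just one segment of the path that can be swapped out (the helper name below is mine for illustration, not part of any Alfresco API):

```java
public class PublicApiUrls {
    // Builds a 4.2-style Public API URL with an explicit tenant segment
    public static String publicApiUrl(String host, String tenant, String suffix) {
        return "http://" + host + "/alfresco/api/" + tenant + "/public/" + suffix;
    }

    public static void main(String[] args) {
        // The example URL from the 4.2 blog post, for the default tenant
        System.out.println(publicApiUrl("localhost:8080", "-default-", "cmis/versions/1.1/atom"));
        // The same endpoint for a hypothetical example.org tenant
        System.out.println(publicApiUrl("localhost:8080", "example.org", "cmis/versions/1.1/atom"));
    }
}
```

Once the tenant is explicit in the URL like this, the server no longer needs to infer it from who you logged in as, which is exactly the decoupling that secondary-tenant access requires.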

All in all though, it's not something that could be knocked out in a couple of i-days, but it's not a huge undertaking, much smaller than others recently. It would reduce support and maintenance costs for Alfresco, by reducing from 2 to 1 the number of MT platforms supported. It would greatly reduce the support/maintenance costs for those currently forced to implement their own system. Perhaps much more importantly, a huge number of new sales opportunities would arise, and arguably a number of whole new sectors would suddenly be able to consider using Alfresco. I may be a little biased, but I'd say that's well worth going for!

Cloud-Style Alfresco Multi Tenancy

In the first part of this Alfresco Multi-Tenancy series, I tried to cover what MT is, and how Alfresco implements it, concentrating on the On-Premise / One / Community / Enterprise versions of Alfresco. In this part, I'll try to explain a little of what changed with the Cloud version of Alfresco.

Let us consider again our situation from the end of part 1. Each team / department / company / etc has their own tenant within our single managed HA install of Alfresco. Each one has their own set of users, and are fully isolated from each other. Our setup looks a little like:

At this point, User 1 asks User 4 to help out with a document in Repository 1. Maybe it's a cross-team project, and they need to collaborate on the presentation to give to management. Maybe User 1 has been working on a press release, and needs to get User 4 in from the PR team to sign it off. Maybe Users 1 and 2 are a company, and they've hired in freelancer User 4 to help with some work. There are many reasons, but what to do?

Well, one option is to email the document over, or copy it on a USB stick or something. This wouldn't be much fun, creates a tracking nightmare, and hey - didn't you buy this Content Management thing to avoid that sort of nasty mess in the first place? Clearly this isn't what we want...

Hmm, well, Alfresco supports Replication and Transfers, could we maybe use that? Let's see how that would work:

The problem here is that we still have two copies. That means two lots of hassle, constant issues with working out which version to use, who has the latest changes, what filename to use, how to keep in sync, all that sort of thing. In short, you've recreated most of the problems that drove you to a Content Repository in the first place!

Well, what instead if we gave User 4 an account on our Repository 1 tenant? Wouldn't that work?

Now, our first issue is how to let them log in. Normally, Alfresco uses the domain part of the username to identify the tenant, so we can't just let jane.smith@pr.agency.example.com log into @company.example.org, as the domain is wrong. This means we either have to use non-emails (eg jane@pr.agency and jim@company), or we have to play with a load of settings, or both. Next up, what about poor Jane and where to go? If she logs into her normal account, she can't see this Press Release she's supposed to work on. Then she remembers and logs in with a different, unusual account, and accesses the Company tenant with the thing to work on. In here, she can't see her normal content, can't see her normal changes, can't easily access her stock content in her site, and is generally confused about where she is (it looks the same but isn't!). The upshot is user confusion, annoyance, people rarely logging into some systems due to the faff, and a lot of admin overhead (creating all those duplicate accounts). This just doesn't fit how people work today.
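To make the "domain part picks the tenant" rule concrete, here's a minimal sketch of that resolution. The class and method names are my own illustration, not Alfresco's actual TenantService API, and I'm assuming a username with no domain falls through to the default tenant:

```java
// Illustrative only: shows why jane.smith@pr.agency.example.com
// can only ever land in the pr.agency.example.com tenant
public class TenantResolver {
    public static String tenantOf(String username) {
        int at = username.indexOf('@');
        // No domain part: assume the default tenant
        return at < 0 ? "-default-" : username.substring(at + 1);
    }

    public static void main(String[] args) {
        System.out.println(tenantOf("jane.smith@pr.agency.example.com"));
        System.out.println(tenantOf("admin"));
    }
}
```

With this implicit scheme, the login name and the tenant are welded together, which is precisely what makes cross-tenant collaboration so awkward.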

At this point in our story - enter Alfresco Cloud! A triumph of vision over a lot of things... The Alfresco Cloud version is built on top of the regular Alfresco Enterprise codebase (ish...), with customisations to both the underlying services, and the UI. Part of what it changes / introduces is a brand new model of Multi-Tenancy.

When someone talks about Alfresco Cloud-Style Multi-Tenancy, or Cloud MT, or something like that, there's one key bit they're probably referring to. Well, two related bits. But first, two screenshots:

Leaving aside for a moment that one is using a largely-stock Alfresco 4.1 theme, and the other is using the very snazzy new 4.2 one, there are two things to see. Firstly, the URL. In the local version, we have a common URL, with just a user's email in the final part of the URL for their dashboard. For the cloud one, we see the domain (tenant) included in the URL. Secondly, on the local one, there's no way to pick a tenant. On the cloud one, we see our tenant listed at the top. Clicking that...

We then have the option to access additional (secondary) tenants to which we have been invited. Not all tenants on the system, mind you, just the ones to which access has been granted. Also, not all resources within those secondary tenants, only sites to which we have been invited. (In your home tenant, you can access all public sites by default; in secondary tenants you can only access sites to which you have explicitly been granted access, and no others)

The practical upshot of all this - modern style collaboration can work, and can work well! We can largely maintain the required "silos" between data, each group (tenant) can manage things for their own team / department / company / etc, but they can invite specific people to collaborate with them on one specific thing (and only that). The invited user can access that small subset using their existing login, without needing to do anything special, can see what's going on, can work on that document alongside their own ones, and it's all good!

Before we end this post, a little bit on the technical side. (Note - I didn't write any of the Cloud-Style MT code, but I have been to talks on it)

What exactly is involved in the cloud-style multi-tenancy, from a coding side? I'd say the main things are:
  • Ability to specify the tenant of interest independent of the login (so you can pick an invited tenant)

  • Store/check/retrieve/manage additional (secondary) tenants for a given user

  • Different permissions when in an invited (secondary) tenant to your home one (can't see non-invited sites, search users etc)

  • Cross-tenant invitations

  • A whole bunch of tenant creation stuff, self sign-up, tenant management, billing, metrics etc (required to run the cloud scale service)

I'll largely ignore the last one, as that's specific to running The Alfresco Cloud, as opposed to running something like the Alfresco Cloud. (This part is pretty much independent of the rest, and while important for the Alfresco Cloud, has no interest to anyone else.)

So, what's the scope of these changes? The first one was actually the one that took much of the work. The WebScript framework on both ends needed changes, these had to be fed down to the Tenant Service, and the URLs changed too. Since people can be in multiple tenants, you can't just auto-detect their active tenant from their username. Instead, the URLs hold the current tenant of interest, and change based on the tenant you're in.

An extended Tenant Service was also needed, to hold the details of the additional / secondary / etc tenants for a user, allow querying of them, management of them etc. This part is quite small. Then, you need some logic that pulls out the requested tenant from the URL / webscripts, checks it's allowed for the user, and sets up the context for the request. You also need to change some permissions things, as someone who's only a guest in a tenant (accessing it as a secondary tenant) shouldn't be able to do everything as in their own tenant. We also need to have the support for inviting someone to a different tenant, so you can invite User 4 to join a site in Repository 1, with the tenant change happening inside the invite transaction stuff. Oh, and you need to maintain this whole set of Multi-Tenancy code alongside the existing On-Premise single MT code, bug fix and enhance both etc...
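The "pull the tenant from the URL, check it's allowed, then set up the context" flow can be sketched as below. Every name here is mine, purely to illustrate the shape of the check described above; the real Alfresco code is of course far more involved:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of a secondary-tenant access check
public class SecondaryTenantCheck {
    // user -> the set of tenants they may enter: their home tenant
    // plus any secondary tenants they have been invited into
    private final Map<String, Set<String>> allowedTenants =
            new HashMap<String, Set<String>>();

    public void grant(String user, String tenant) {
        Set<String> tenants = allowedTenants.get(user);
        if (tenants == null) {
            tenants = new HashSet<String>();
            allowedTenants.put(user, tenants);
        }
        tenants.add(tenant);
    }

    // The gate applied before switching the request into the tenant's context
    public boolean mayAccess(String user, String requestedTenant) {
        Set<String> tenants = allowedTenants.get(user);
        return tenants != null && tenants.contains(requestedTenant);
    }
}
```

The key point is that the requested tenant comes from the URL, while the set of permitted tenants comes from the (extended) Tenant Service, and only the intersection is ever entered.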

All of this is hard, tricky, and took a lot of work! However, it's also all done... Alfresco have tackled the problem, and come out victorious! Yey for the Alfresco Engineers!

In the final part, we'll look at why people might want all of this loveliness On-Premise, outside of the Cloud.

Alfresco Multi Tenancy - An Introduction

Alfresco has supported Multi Tenancy for a number of years now. A little over a year ago, Alfresco added support to the Cloud version which expanded the MT support, but thus far hasn't brought those improvements back to the On-Premise (One) version. In this series of blog posts, I'll try to explain what the traditional Alfresco MT does (and doesn't) do, what new things the Cloud MT handles, and finally why I feel everyone would benefit from the Cloud MT on-premise.

Right, so, Multi Tenancy, what is it? The Alfresco Enterprise Documentation covers part, and the Wiki has a lot of details. Neither of them seem to cover much of the basics, which can cause some confusion (it certainly has when I've tried to explain it to others). I'll try to give a quick intro, feel free to skip the next few paragraphs if it's all known to you already!

With a simple content store system, you typically have a single installation, which hosts a single set of content. One set of users, one set of files, everything in together. Works just fine for your team / department / company / etc. Something like:

Your new system is great, and everyone wants in! You start adding accounts for a couple of people from another team / department / etc, so they can use your system. You create them their own sites to collaborate in, but it's still your system, and you retain full control. After a while, they decide they want their own system, which they themselves have full control over. To do this, they install their own brand new copy of the system, set it up, create their own set of users and sites, grant permissions, tell people the new URL etc. You now have two independent installations, each holding their own independent repository stores.

Having two independent systems may seem great to start with, as each team / department / company etc has their own area. They can do whatever they want, they have full control over their own things, they don't have to worry about checking permissions to prevent someone in one team having access to another team's things, there's a nice big gap between them. Then, you start needing to upgrade, to apply customisations, to develop new in-house features. Suddenly, there's two different systems that both need to be updated, both need to be tested and validated. Two different systems need to be maintained, two different systems need two sets of resources, twice the monitoring, more IT work, more co-ordination. Trying to add some more teams / departments / companies / etc in is going to make this even worse. Federation / common interfaces / CMIS can help a little with the common access, but the work grows....

At this point, some bright spark starts asking for Multi Tenancy. Instead of having one complete install for each individual group, why not have them share a common one, but have the system pretend to them that each one is independent? From a maintenance and resourcing perspective, you pretty much just have the one system to look after. From a user's perspective, they each have their own totally independent content / users / permissions / admins / etc, as before, but it's simpler and cheaper to run.

In this model, there's generally a "super-admin" somewhere who can create or delete new tenants within the common (multi-tenant) system, and who can reset admin passwords within each tenant. Otherwise, each sub-part of the system (each tenant) is independent of each other, doing their own things in blissful ignorance and fully protected from what's going on in the other tenants. The cost of hosting / maintaining is shared, so everyone wins on cost. It isn't quite a fully independent system, as things have to be the same until you have identified which tenant you belong to (perhaps by a URL, or more commonly based on your username on login), but for old-style working this was fine.

This sort of multi-tenant model was very popular with hosting companies, and large companies with strong divisions. Each team / department / company had their own sub-area that they could work in, that was "theirs" which they had full control of, but there's only one installation. For big setups, you could also get extra resilience and availability, since rather than 3 teams each having their own one server, they could club together and get a HA cluster with 3 machines in it that they all shared.

As previously mentioned, Alfresco has supported this style of Multi-Tenancy for quite some time. (I believe there was one big OEM deal that funded a lot of the initial work on it, but there had been plenty of call for it from the field before this). It used to be a little fiddly to set up, but as of 4.0.2 it is pretty simple to get going. There's information on the wiki and in the enterprise docs. Let's have a go at enabling it for our Enterprise 4.1 install to see!

First, log into the Repo (Explorer) interface as an admin. Next, go to the tenant admin console at http://localhost:8080/alfresco/faces/jsp/admin/tenantadmin-console.jsp (assuming a default install). You should see a very simple Explorer console / shell thingy to interact with:

When we run
show tenants
we see that we have no tenants by default. Let's create one! Type in
create example.org adminpass
and wait while it churns away. (Your first tenant is a little slow, as a load of setup happens, but new ones tend to be faster). Next, head over to Share. Instead of logging in as admin, use admin@example.org and a password of adminpass (we set the domain and admin password at creation time). When we log in, we see an empty repository, without any of the sites we previously had.

Create a new site here, and upload a few files. Then, log out, and log in with the normal admin account. Search for the site you created, but you won't find it. Alfresco keeps the different tenants independent, including the default tenant and the one you just created.

Finally, before we look at the problems that modern styles of working pose to this, let's do a quick round-up of what the Alfresco On-Premise / Enterprise / One implementation of Multi-Tenancy does and doesn't do.

You can use Multi-Tenancy with the following:
  • Alfresco Explorer (JSF-based web client)
  • Alfresco Share (since 3.2)
  • WebDAV
  • FTP
  • Web Scripts
  • CMIS (since 3.2)
  • tenant 'guest' authentication is allowed, but must be within the context of a tenant (such as 'guest@acme')

Unfortunately though, the following are not supported/implemented/tested in a multi-tenant enterprise environment:
  • CIFS
  • WCM
  • RM
  • Portlets
  • LDAP, NTLM and authentication methods other than "alfresco"
  • Inbound Email
  • Content Replication
  • IMAP
  • SPP / VTI (SharePoint Protocol)
  • reloadable Share config (if using dynamic models)
  • GoogleDocs integration

In Part 2, we look at what the Alfresco Cloud offers in place of this.

Enabling replication with OpenLDAP 2.4 on RHEL 6 with cn=config (olc....)

The last time I set up replication on OpenLDAP, it was on Debian with a single slapd.conf file. This time both machines were running Red Hat Enterprise Linux (RHEL) 6, using the OLC / cn=config style of configuration, with ldif files. I found quite a lot of information on how to set up replication, quite a lot on using ldapmodify to change cn=config, but not a lot on using cn=config ldif files for replication. Having worked out which bits to follow and which to ignore, I thought I'd document it for others to hopefully learn from!

This guide assumes you're familiar with OpenLDAP replication, want to use Syncrepl, are using RHEL 6 for your master and your slave(s), and have already got your directory working on the master, set permissions etc.

Step 1 - Create a read only replication account

Using your favourite tools, create a new account (objectclass = account) in your directory, with a suitable name (eg uid=replicate or some such). Set a secure password on it, and make a note of that. Next, edit /etc/openldap/slapd.d/cn=config/olcDatabase={2}bdb.ldif and grant read permissions to this user on all attributes. Any attributes it can't see won't get replicated, so tweak your olcAccess statements as needed. A basic setup would look like:
olcAccess: {0}to attrs=userPassword
  by self =xw
  by dn.exact="uid=pwreset,dc=example,dc=org" =xw
  by dn.exact="uid=replicate,dc=example,dc=org" read
  by anonymous auth
  by * none
olcAccess: {1}to *
  by anonymous auth
  by self write
  by dn.exact="uid=replicate,dc=example,dc=org" read
  by users read
  by * none
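It's worth checking that the replication account really can read everything before going further; a quick test from the shell (the bind DN and search base here assume the example setup above) might be:

```shell
# Bind as the replication account and list the DNs it can see;
# anything missing from this output won't get replicated
ldapsearch -x -D "uid=replicate,dc=example,dc=org" -W \
    -b "dc=example,dc=org" -LLL dn
```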

Step 2 - Enable the syncprov module

On all machines (master and slaves), create a new file in /etc/openldap/slapd.d/cn=config/ called cn=module{0}.ldif . Into it place:
dn: cn=module{0}
objectClass: olcModuleList
cn: module{0}
olcModulePath: /usr/lib64/openldap
olcModuleLoad: {0}back_bdb
olcModuleLoad: {1}syncprov
Note that if you're on a 32 bit system, you'll need /usr/lib/openldap rather than lib64. This file will trigger the loading of the syncprov module, and the bdb one if needed. If you want to add more modules later for other things, you can either add them to the ordered olcModuleLoad list, or create cn=module{1}.ldif and list them in there.

Step 3 - Turn on syncprov for each directory

syncprov needs to be enabled for each directory, which in the default config would mean for olcDatabase={2}bdb, and possibly olcDatabase={0}config too. For now, I've opted to enable syncprov for both, but only pull the former, but I may change that in time.

Firstly, create two new directories, /etc/openldap/slapd.d/cn=config/olcDatabase={0}config/ and /etc/openldap/slapd.d/cn=config/olcDatabase={2}bdb/ - each directory matches the .ldif file of the same name, just without the .ldif extension. Into each one, create a file olcOverlay={0}syncprov.ldif which will hold the syncprov config for that directory. The file should look something like (depending on your chosen settings):
dn: olcOverlay={0}syncprov
objectClass: olcOverlayConfig
objectClass: olcSyncProvConfig
# Sync Setup for the main LDAP Database
olcOverlay: {0}syncprov
# Sync Checkpoints every 20 changes or 1 hour
olcSpCheckpoint: 20 60
# Keep a fair number of operations in the log
olcSpSessionlog: 1000
Restart slapd on the master server, and ensure it starts without error.
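If you want a sanity check before restarting, slaptest can be pointed at the config directory (paths and service name as per a default RHEL 6 install):

```shell
# Verify the cn=config directory parses cleanly, then restart
slaptest -F /etc/openldap/slapd.d
service slapd restart
```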

Step 4 - Configure the slave(s) to poll the master

Finally, on each slave we need to configure the directory to pull from the master. This uses the syncprov module load we set up earlier in step 2, which needs to be done on every server!

Edit your database config file, eg /etc/openldap/slapd.d/cn=config/olcDatabase={2}bdb.ldif and add to the bottom of it an entry like:
olcSyncrepl: rid=135
  retry="60 30 300 +"
The rid needs to be unique per slave, and needs to be a three digit number; I've found a suitable part of the slave's IP address to be a good option to go for!
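Those two lines are just the skeleton of the entry; a fuller olcSyncrepl sketch (the provider URL, search base, bind DN and password here are placeholders for your own setup) would look something like:

```
olcSyncrepl: rid=135
  provider=ldap://ldap-master.example.org
  type=refreshAndPersist
  searchbase="dc=example,dc=org"
  bindmethod=simple
  binddn="uid=replicate,dc=example,dc=org"
  credentials=replicationpassword
  retry="60 30 300 +"
```

The binddn should be the read only replication account from step 1, so the slave can pull everything that account is allowed to see.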

On each slave, ensure the database directory (eg /var/lib/ldap) is empty, then start slapd. Within a short while, a search should show all the data from the master, and you're good to go!

If you hit problems, try running /usr/sbin/slapd -h ldap:/// ldaps:/// ldapi:/// -u ldap -d 255 to start the server in debug mode with logging to the console. The logs can be a little cryptic, but with googling you ought to be able to work out what's wrong and fix!

Authenticating to Alfresco Share using Apache Auth (eg for SSO)

This post carries on from my earlier one on Alfresco Repo SSO, and builds on the information on how the Alfresco WebScript authentication methods work.

At this point, we can hopefully log into the Alfresco repository (Explorer or /alfresco/wcservice) automatically by authenticating to Apache. Our next step, before moving onto SSO in Share, is to also allow header based authentication to Alfresco. We need to do this because of how Share will talk to the Repo - it will know the user's username from Apache, but it won't have their password (and equally won't have their client cert if doing SSL auth). Share needs to be able to tell the repo who you are, and have that trusted. For that, we need to tweak our External Authentication SubSystem.

Before doing that, be aware that by turning on header based auth, anyone who can make HTTP requests to the repo could add that header and impersonate anyone, so take great care! You should do one of the following: use firewall / auth rules so that only Share can send the header, use something like an Apache module to strip the header from all requests except those from Share, or use SSL + client auth + proxyUserName.

Currently, our external-authentication.properties file looks like:
To support header based auth, we add two more lines, giving us:
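To make that concrete, here's a sketch of the resulting file, assuming the custom header used in the curl examples below (your other property values may well differ):

```
# Enable the External Authentication
external.authentication.enabled=true
# Which users should be treated as an Admin?
external.authentication.defaultAdministratorUserNames=admin

# The two extra lines: don't require a proxy user, and
# name the HTTP header that will carry the username
external.authentication.proxyUserName=
external.authentication.proxyHeader=X-My-Custom-SSO-Header
```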

If you don't specify a proxyHeader, the default is X-Alfresco-Remote-User (that comes from here).

With that in place, from a machine that is allowed to talk to Alfresco with the header, try executing something like:
    $ curl -X GET -L -H "X-My-Custom-SSO-Header: nick" http://localhost:8080/alfresco/ | less
Take a look through the HTML, and check you see something like Logout (nick) and not Login (guest)

Now, we can turn to Share. There is information on this in the wiki and the Enterprise docs, but they're a bit short on explanation of why you need to do things.

First, two things we can safely ignore now. One of those is the KeyStore entry. That's used if you want Share to talk to the Repo over SSL with client Auth. You may well want to turn that on for production, but not for getting started. (If you do, you should set the SSL client cert username as your proxyUserName, then the Repo will only check for the proxyHeader HTTP Header for the username for requests from that user). Secondly, we're going to be using a HTTP Header to carry the username, so we don't need the alfrescoCookie connector.

If you don't have one already, create an alfresco/web-extension/share-config-custom.xml file, typically either in a module, or in the Tomcat shared classes. If you're creating a fresh one, use share-config-custom.xml.sample as a basis (it has nearly all the config you need, along with other bits you won't want now, so remove / comment those out). The key bit for us is to add/modify a Remote config section. This is where you define how Share talks to other systems over http/https, especially to the repository (via the alfresco endpoint). Define a Remote section as follows:
   <config evaluator="string-compare" condition="Remote">
      <remote>
         <connector>
            <id>alfrescoHeader</id>
            <name>Alfresco Connector</name>
            <description>Connects to an Alfresco instance using header and cookie-based authentication</description>
            <class>org.alfresco.web.site.servlet.SlingshotAlfrescoConnector</class>
            <userHeader>X-My-Custom-SSO-Header</userHeader>
         </connector>

         <endpoint>
            <id>alfresco</id>
            <name>Alfresco - user access</name>
            <description>Access to Alfresco Repository WebScripts that require user authentication</description>
            <connector-id>alfrescoHeader</connector-id>
            <endpoint-url>http://localhost:8080/alfresco/wcs</endpoint-url>
            <identity>user</identity>
            <external-auth>true</external-auth>
         </endpoint>
      </remote>
   </config>
With that in place, if you go to Share in your browser via the Apache connection - http://local-alfresco/share/ if you followed part 1 - you get an Apache basic auth prompt, then you should be automatically logged into Share with the user details you gave to Alfresco. SSO done!

One other thing to try before we dig into how it works (and before you lock down the security!): we'll try curl again:
$ curl -X GET -L -H "X-My-Custom-SSO-Header: nick" http://localhost:8080/share/ | less
Look for something near the top showing Alfresco.constants.USERNAME = "nick" and not guest to confirm header based auth worked for Share.

So, what's going on, and how does it all work? At this point, you'll probably want to grab the Spring Surf source code if you want to dig around, as most of the logic isn't in Share itself, but the underlying Surf framework.

Looking at the config, there are a few things to note. In the alfresco endpoint, we have <endpoint-url>http://localhost:8080/alfresco/wcs</endpoint-url> - note that it has changed to the WCServices URL from the regular Services one. As seen in the how the Alfresco WebScript authentication methods work blog post, this is so that we can use the Authentication subsystems via the filters to authenticate, while services just does basic auth + tickets. The next thing is <external-auth>true</external-auth>, which tells Share that we're using External Authentication and it should hide a few bits. AuthenticationUtil exposes this via isExternalAuthentication, and a few bits of Share use that. Finally, the connector-id ties us back to the Header and some Java.

In the connector section of the config, the route taken is that Share finds an endpoint with the id alfresco, then uses that to find the connector to use by the connector-id defined there. For us, that's our new alfrescoHeader one. That includes <userHeader>X-My-Custom-SSO-Header</userHeader> which is the HTTP Header used by Share and the Repo to pass along the username to use. The other thing in there is the class that drives it all, SlingshotAlfrescoConnector.

At this point, I should perhaps point out a gotcha. The underlying External Authentication SubSystem supports external.authentication.userIdPattern to use only part of the header as the username, but due to ALF-18393 Share doesn't, and it all goes a bit wrong... The option isn't (currently!) Share compatible.

Looking some more at the SlingshotAlfrescoConnector we defined above, there's one key method in there, applyRequestHeaders. This is the part where Share tweaks the headers on any connections it makes to the Repo. In it, it checks to see if you authenticated without a password (i.e. with SSO). If you have, your username is sent to the repo in the X-Alfresco-Remote-User header and, if you set the userHeader config entry, in that header as well. There's also setConnectorSession, which feeds your custom header down the stack.

So, if you want to send some extra headers to the Repo to prove to it you are really Share, and you don't want to do it with SSL client certs (for which there is an example in share-config-custom.xml.sample), then overriding SlingshotAlfrescoConnector + applyRequestHeaders + specifying that class instead is likely the way to go.

Where does the username come from to pass along? That's within Spring Surf, in AbstractUserFactory inside initialiseUser

The final bit of magic is explaining how specifying our magic header gets passed through Share to the Repo. The solution to that is to be found in SSOAuthenticationFilter. As well as handling Kerberos and NTLM challenge stuff, it also does a bit of magic in the case of custom username headers. Specifically, in wrapHeaderAuthenticatedRequest it makes the value of the custom header available as the result of getRemoteUser(), which allows the rest of Surf to know who you are.

Phew, and we're done! You should now have a working Alfresco + Share SSO setup, which can work either through Apache with Apache handling the Authentication, or direct to Tomcat with special headers. We (hopefully) understand how it works, why, and have a testbed. At this point, we can tweak it for our final SSO setup, put in any security around it we need, and go live!

Authenticating to Alfresco using Apache Auth (eg for SSO)

I'm currently setting up SSO (Single Sign On) for an Alfresco + Share install. I know I'm going to need to do a bit of customising, but I've struggled a bit with understanding what's going on with the various SSO tutorials. They give instructions, but not always a lot of explanation, which can be tricky if you know you want to go off piste. So, having reminded myself how the WebScript /services/ + /wcservices/ authentication works, my next step is to get some basic Apache Authentication in place, and have Alfresco trust that.

Why basic Apache Authentication? Well, it's the basis on which things like Alfresco + CAS and Alfresco + Shibboleth work, it can be used for SSL client cert auth, it's fairly easy to set up, and I know most of the pieces already (always a bonus!). My aim in this: get Alfresco Explorer and the /wcservices/ WebScripts trusting Apache provided authentication. I'm not looking at Share; that's a follow-on post.

So, what components do we need? Firstly, there's the Alfresco External Authentication SubSystem. Next there's Tomcat's AJP/1.3 connector. Then there's Apache mod_jk to talk AJP/1.3. Finally, there's basic Apache Authentication. Because I just want to test, I'm using a simple file based Apache auth setup. You likely won't use that in production, but it'll get you most of the way to a working SSO setup, and is easy to test with.

Our first thing to do is tell Alfresco that it can trust the remote authentication from Apache. (The process for this is similar to enabling Header based external Authentication). To do this, we need to enable + configure another Authentication SubSystem, specifically the External Auth SubSystem. Create a directory alfresco/extension/subsystems/Authentication/external/external-apache in your usual customisation spot (typically a module or the Tomcat Shared Classes). Into this new directory, create external-authentication.properties, and populate it with something like:
# Which users should be treated as an Admin?
external.authentication.defaultAdministratorUserNames=admin
# Enable the External Authentication
external.authentication.enabled=true
# A blank proxyUserName means trust the container (AJP) supplied remote user
external.authentication.proxyUserName=
That's not all though, you also need to add this new authentication subsystem to the active list. Somewhere, quite possibly in your alfresco-global.properties file you should have an authentication.chain line. Prepend this with external-apache:external, something like:
# This is what we used to have, LDAP + Built-in
#authentication.chain=ldap1:ldap,alfrescoNtlm1:alfrescoNtlm

# Try the External Auth if we can, otherwise the previous ones
authentication.chain=external-apache:external,ldap1:ldap,alfrescoNtlm1:alfrescoNtlm
Restart Alfresco, and check that you can log in as before. You should also see (assuming the default logging settings) something like INFO [org.alfresco.repo.management.subsystems.ChildApplicationContextFactory] Starting 'Authentication' subsystem, ID: [Authentication, managed, external-apache] in your logs to show it was found and used.

Next up, we need to enable AJP-13 in Tomcat, and tell it to trust Apache supplied Authentication. I'm assuming you have two Tomcats, one for Alfresco on port 8080, and one for Share on 8081. If you only have one, skip the Share/8081/8010 bits. Edit [Tomcat Root]/conf/server.xml for both Tomcats. Around line 90, you should come across an AJP Connector section, which may be commented out. Uncomment it, set the Alfresco one to port 8009 (the default), and the Share one to 8010. Finally, add the attribute tomcatAuthentication="false" to the Connector. This should give you something on the Alfresco Tomcat looking like:
   <!-- Define an AJP 1.3 Connector on port 8009 -->
    <Connector port="8009" protocol="AJP/1.3" redirectPort="8443" tomcatAuthentication="false" />
And for Share it'll be:
   <!-- Define an AJP 1.3 Connector on port 8010 (Alternate Port) -->
    <Connector port="8010" protocol="AJP/1.3" redirectPort="8443" tomcatAuthentication="false" />
Restart your tomcats, and ensure they come up without error. Check you can telnet to localhost 8009 and 8010 (there's no welcome banner) to ensure the Tomcats are listening and no local firewalling breaks.

Half way!

If you haven't already, install a copy of Apache (latest 2.4, or perhaps even 2.5/2.6 if that's out by the time you read this!), and make sure you can get the welcome page up. If you're on Unix and doing a package install, make sure you have the Apache developer tools package installed too. If you can, install a packaged version of the Tomcat Connectors (mod_jk) module. Failing that, grab the source code and build + install the module.

There are two parts to configuring mod_jk: the first is a JK specific worker.properties file, the second is the regular Apache config. For the former, create a new file somewhere sensible (but make sure it's not somewhere that Apache auto-loads config files from, or you'll get errors). FWIW, I created a new directory under /etc/apache2/ called conf.other, and created the file there as tomcat-workers.properties, but it'll depend on your distro. Assuming you set up your Tomcat AJP as above, create the file with:
# For communicating with Tomcat via AJP13

# We have two Tomcats, and we'd like status please
worker.list=jkstatus,jkalfresco,jkshare

# Enable the status worker
worker.jkstatus.type=status

# Alfresco is on 8080 (http) / 8009 (AJP)
worker.jkalfresco.type=ajp13
worker.jkalfresco.host=localhost
worker.jkalfresco.port=8009

# Share is on 8081 (http) / 8010 (AJP)
worker.jkshare.type=ajp13
worker.jkshare.host=localhost
worker.jkshare.port=8010

Now, define a new vhost to Apache, either by tacking it on the end of your httpd.conf, or putting it in a new file in conf.d, or in sites-enabled, or .... as appropriate for your setup. To start with, go for something like:
# Load the mod_jk module (your path to it may be different)
LoadModule jk_module /usr/lib/apache2/modules/mod_jk.so

# These paths may need changing for your setup too
# Where the file we created above lives
JkWorkersFile /etc/apache2/conf.other/tomcat-workers.properties
# Where to put jk shared memory
JkShmFile     /var/log/apache2/mod_jk.shm
# Where to put jk logs
JkLogFile     /var/log/apache2/mod_jk.log
# Set the jk log level [debug/error/info]
JkLogLevel    info
# Select the timestamp log format
JkLogStampFormat "[%a %b %d %H:%M:%S %Y] "

# Define the vhost that'll expose Tomcat
<VirtualHost *:80>
	ServerAdmin webmaster@localhost
	ServerName local-alfresco

        JkMount /status     jkstatus
        JkMount /alfresco   jkalfresco
        JkMount /alfresco/* jkalfresco
        JkMount /share      jkshare
        JkMount /share/*    jkshare
</VirtualHost>
The ServerName needs to be unique, so either make it your machine name (if this is the only vhost), or create a new alias for it in your /etc/hosts file and use that. Restart Apache, and try visiting [server name]/status to check that mod_jk is loaded and running. Then, visit [server name]/alfresco and check that you get Explorer as guest.

Finally, before we consider how it all fits together, we need to turn on Authentication in Apache, so it has some auth to pass to Tomcat. Full details on this are in the Apache docs, but basically you'll want to use htpasswd to create a password file, something like:
  $ # Create the file, add admin (-b takes the password on the command line)
  $ htpasswd -c -b /etc/apache2/conf.other/alfresco.htpasswd admin admin
  $ # Create another user (don't use these passwords in production!)
  $ htpasswd -b /etc/apache2/conf.other/alfresco.htpasswd nick password

Now, add to our vhost some auth lines:
        <Location />
          AuthType Basic
          AuthName "Test Alfresco Access Auth"
          AuthBasicProvider file
          AuthUserFile /etc/apache2/conf.other/alfresco.htpasswd
          Require valid-user
        </Location>
Restart Apache, and re-visit [server name]/alfresco . You should be prompted for a password. Specify admin and admin (as given to htpasswd), and you should be taken to Explorer logged in as admin automatically. We're basically there!

At this point, make sure you're only using a test server, have suitable firewalling etc in place. We're just trying to get some basics in place to test with, so we have a working base before making our changes. More is needed to make this production...

So, how does this all work? Apache is doing the authentication, and passing that through to Tomcat. We can see that Tomcat sees the username by creating a small test jsp somewhere in the Alfresco webapp, containing:
User: <%=request.getRemoteUser()%>
Run that, and you'll see the username you gave to the Apache auth come through. That username is passed into the Servlet Request.

What does Alfresco do with it? When we enabled our External Authentication SubSystem, we effectively pulled in the external-filter-context.xml context file, along with partly overriding the default external-filter.properties. The webscriptAuthenticationFilter isn't really doing that much for us, the key bit is the remoteUserMapper. This is configured up to use DefaultRemoteUserMapper

When the request comes in to Alfresco, for either Explorer or /wcservices/ (but not /services/, as that's different), the global Authentication Filter fires. In doFilter it calls getSessionUser, which in turn checks for a Remote User Mapper. If one is found (and we've enabled one with our new Authentication SubSystem), getRemoteUser is called. If you look at that method in DefaultRemoteUserMapper, you'll see it checks the Remote User on the request, and returns that. So, no login is needed at the Tomcat / Alfresco layer, because the value from Apache is trusted.

We can also see how we could do http header auth at this point if we wanted, and also see some of the security issues which mean you shouldn't blindly enable this in production...!

So, we can now use our test Apache auth setup to be auto logged-in to Alfresco Explorer and wcservices. In the next part, we'll see how we enable Share auto-login with the same setup, and how that actually works under the hood.

Alfresco WebScripts (Services) and Authentication

I'm currently doing some work on enabling SSO for Alfresco, and I've come across some slightly confused wiki pages and blog posts out there around how webscripts and authentication fit together. As I've done quite a bit of fighting with this in the past, I thought it might be worth trying to summarise what works and why.

Firstly, WebScript URL prefixes. The Alfresco webscripts are available under three pairs of URL prefixes, also known as WebScript runtimes. As detailed on the wiki, the three are:
  • /alfresco/service/ or /alfresco/s/ - HTTP Auth + Tickets, "apiServlet"

  • /alfresco/wcservice/ or /alfresco/wcs/ - Explorer Style Auth, "wcapiServlet"

  • /alfresco/168service/ or /alfresco/168s/ - JSR-168 Portlet Auth, "portalapiServlet"

I'll skip the portlet stuff, as I don't use it, and concentrate on the other two. (Note that all three expose the same set of webscripts; it's just the authentication in front of them that changes.)

The first thing to note is that describing /alfresco/service/ as using HTTP Auth misses a few bits out. There are in fact several ways to authenticate to it. One option, most commonly used when debugging and admining, is to use HTTP Basic Auth (but make sure you're using it over SSL in production!). If no other authentication is sent, and the webscript requires it, the webscript framework will prompt you for your username and password with basic auth.

Another option is to use Alfresco Tickets, as detailed in the wiki here. You can get a ticket a few different ways, but posting JSON of the username and password to /alfresco/service/api/login is often the most common. The ticket can be supplied with a request via the alf_ticket URL parameter, or by sending HTTP basic auth with a username of ROLE_TICKET and the ticket as the basic auth password. Another option is to send the URL parameter of guest=true to explicitly request guest access.
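As a worked example of the ticket flow (against a default localhost install, with hypothetical credentials, and the ticket value standing in for whatever the login call returns):

```shell
# Request a ticket by POSTing the credentials as JSON
curl -H "Content-Type: application/json" \
     -d '{"username":"admin","password":"admin"}' \
     http://localhost:8080/alfresco/service/api/login

# The response contains a ticket, which can then be passed
# on subsequent requests via the alf_ticket URL parameter
curl "http://localhost:8080/alfresco/service/api/people/admin?alf_ticket=TICKET_..."
```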

If you want to see how the authentication to /alfresco/service/ works, the main class to look at is BasicHttpAuthenticatorFactory

When looking at /alfresco/wcservice/ , most things will describe it as using Explorer authentication. That isn't quite the full story though: if you take a look at web.xml, you'll see that there are actually two Authentication Filters that apply to requests through wcservice. The Global Authentication Filter is used (that's the one shared with Explorer), but before that a special WebScript Authentication Filter is also tried.

The latter is passed off to a spring bean webscriptAuthenticationFilter. Not all authentication subsystems implement one of these, so often it'll do nothing. If the authentication filter declines, then it'll fall back to the default Global Authentication Filter, which ends up as the bean globalAuthenticationFilter, and from there the normal Explorer authentication kicks in.

Quite a few of the instructions out there for doing SSO involve specifying some custom share config, to change the alfresco Remote Endpoint to use /wcs . If you see one of these, it will almost always be coupled with changes to your Authentication subsystem(s) to do something with the webscriptAuthenticationFilter. When setting this sort of thing up / debugging, be aware that you'll need to test with /wcs and not /service, and that normally you can't test with Explorer as that doesn't use the same filter!